English Pages 707 [678] Year 2023
Smart Innovation, Systems and Technologies 370
Vikrant Bhateja · Xin-She Yang · Marta Campos Ferreira · Sandeep Singh Sengar · Carlos M. Travieso-Gonzalez Editors
Evolution in Computational Intelligence Proceedings of the 11th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2023)
Smart Innovation, Systems and Technologies Volume 370
Series Editors Robert J. Howlett, KES International Research, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
Editors
Vikrant Bhateja, Department of Electronics Engineering, Faculty of Engineering and Technology (UNSIET), Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India
Xin-She Yang, Middlesex University London, UK
Marta Campos Ferreira, Faculty of Engineering, University of Porto, Porto, Portugal
Sandeep Singh Sengar, Cardiff Metropolitan University, Cardiff, Warwickshire, UK
Carlos M. Travieso-Gonzalez, Institute for Technological Development and Innovation in Communications, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-99-6701-8 ISBN 978-981-99-6702-5 (eBook) https://doi.org/10.1007/978-981-99-6702-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Organization
Chief Patron President and Vice-Chancellor, Cardiff Metropolitan University, UK.
Patron Prof. Jon Platts, Dean CST, Cardiff Metropolitan University, UK.
Honorary Chair Prof. Rajkumar Buyya, University of Melbourne, Australia.
Steering Committee Chairs Prof. Suresh Chandra Satapathy, KIIT Deemed to be University, Bhubaneswar, Odisha, India. Prof. Siba K. Udgata, University of Hyderabad, Telangana, India. Dr. Xin-She Yang, Middlesex University London, UK.
General Chairs Dr. Jinshan Tang, George Mason University, Virginia, USA. Dr. João Manuel R. S. Tavares, Faculdade de Engenharia da Universidade do Porto, Portugal.
Organizing Chair Dr. Sandeep Singh Sengar, Cardiff Metropolitan University, UK.
Publication Chairs Dr. Fiona Carroll, Cardiff Metropolitan University, UK. Dr. Peter Peer, Faculty of Computer and Information Science, University of Ljubljana, Slovenia. Dr. Vikrant Bhateja, Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India. Dr. Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Bergen, Norway.
Publicity Chairs Dr. Rajkumar Singh Rathore, Cardiff Metropolitan University, UK. Dr. Elochukwu Ukwandu, Cardiff Metropolitan University, UK. Dr. Hewage Chaminda, Cardiff Metropolitan University, UK. Dr. Catherine Tryfona, Cardiff Metropolitan University, UK. Dr. Priyatharshini Rajaram, Cardiff Metropolitan University, UK.
Advisory Committee Aimé Lay-Ekuakille, University of Salento, Lecce, Italy. Amira Ashour, Tanta University, Egypt. Aynur Unal, Stanford University, USA. Bansidhar Majhi, IIIT Kancheepuram, Tamil Nadu, India. Dariusz Jacek Jakobczak, Koszalin University of Technology, Koszalin, Poland. Edmond C. Prakash, University for the Creative Arts, UK.
Ganpati Panda, IIT Bhubaneswar, Odisha, India. Isah Lawal, Noroff University College, Norway. Jagdish Chand Bansal, South Asian University, New Delhi, India. João Manuel R. S. Tavares, Universidade do Porto (FEUP), Porto, Portugal. Jyotsana Kumar Mandal, University of Kalyani, West Bengal, India. K. C. Santosh, University of South Dakota, USA. Le Hoang Son, Vietnam National University, Hanoi, Vietnam. Milan Tuba, Singidunum University, Belgrade, Serbia. Naeem Hanoon, Multimedia University, Cyberjaya, Malaysia. Nilanjan Dey, TIET, Kolkata, India. Noor Zaman, Universiti Tecknologi, PETRONAS, Malaysia. Pradip Kumar Das, IIT Guwahati, India. Rahul Paul, Harvard Medical School and Massachusetts General Hospital, USA. Roman Senkerik, Tomas Bata University in Zlin, Czech Republic. Sachin Sharma, Technological University Dublin, Ireland. Sriparna Saha, IIT Patna, India. Swagatam Das, Indian Statistical Institute, Kolkata, India. Siba K. Udgata, University of Hyderabad, Telangana, India. Tai Kang, Nanyang Technological University, Singapore. Valentina Balas, Aurel Vlaicu University of Arad, Romania. Vishal Sharma, Nanyang Technological University, Singapore. Yu-Dong Zhang, University of Leicester, UK.
Technical Program Committee Chairs Dr. Mufti Mahmud, Nottingham Trent University, Nottingham, UK. Dr. Paul Angel, Cardiff Metropolitan University, UK. Dr. Steven L. Fernandes, Creighton University, USA. Ioannis Kypraios, De Montfort University, Leicester, UK. Jasim Uddin, Cardiff Metropolitan University, Cardiff, UK.
Technical Program Committee A. K. Chaturvedi, IIT Kanpur, India. Abdul Wahid, Telecom Paris, Institute Polytechnique de Paris, Paris, France. Ahit Mishra, Manipal University, Dubai Campus, Dubai. Ahmad Al-Khasawneh, The Hashemite University, Jordan. Alexander Christea, University of Warwick, London, UK. Anand Paul, The School of Computer Science and Engineering, South Korea. Anish Saha, NIT Silchar, India. Bhavesh Joshi, Advent College, Udaipur, India.
Brent Waters, University of Texas, Austin, Texas, USA. Catherine Tryfona, Cardiff Metropolitan University, UK. Chhavi Dhiman, Delhi Technological University, India. Dan Boneh, Stanford University, California, USA. Debanjan Konar, Helmholtz-Zentrum Dresden-Rossendorf, Germany. Dipankar Das, Jadavpur University, India. Feng Jiang, Harbin Institute of Technology, China. Gayadhar Panda, NIT Meghalaya, India. Ginu Rajan, Cardiff Metropolitan University, UK. Gengshen Zhong, Jinan, Shandong, China. Hewage Chaminda, Cardiff Metropolitan University, UK. Imtiaz Ali Khan, Cardiff Metropolitan University, UK. Issam Damaj, Cardiff Metropolitan University, UK. Jean Michel Bruel, Department Informatique IUT de Blagnac, Blagnac, France. Jeny Rajan, National Institute of Technology Surathkal, India. Krishnamachar Prasad, Auckland University, New Zealand. Korhan Cengiz, University of Fujairah, Turkey. Lorne Olfman, Claremont, California, USA. Martin Everett, University of Manchester, UK. Massimo Tistarelli, Dipartimento Di Scienze Biomediche, Viale San Pietro. Milan Sihic, RHIT University, Australia. M. Ramakrishna, ANITS, Vizag, India. Ngai-Man Cheung, University of Technology and Design, Singapore. Philip Yang, Price Water House Coopers, Beijing, China. Praveen Kumar Donta, Institut für Information Systems Engineering, Austria. Prasun Sinha, Ohio State University Columbus, Columbus, OH, USA. Priyatharshini Rajaram, Cardiff Metropolitan University, UK. Sami Mnasri, IRIT Laboratory Toulouse, France. Shadan Khan Khattak, Cardiff Metropolitan University, UK. Ting-Peng Liang, National Chengchi University Taipei, Taiwan. Uchenna Diala, University of Derby, UK. V. Rajnikanth, St. Joseph’s College of Engineering, Chennai, India. Wai-Keung Fung, Cardiff Metropolitan University, UK. Xiaoyi Yu, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Yun-Bae Kim, Sungkyunkwan University, South Korea. Yang Zhang, University of Liverpool, UK.
Student Ambassadors Kandala Abhigna, KIIT Deemed to be University, India. Rasagna T., KIIT Deemed to be University, India.
Preface
This book is a collection of high-quality peer-reviewed and selected research papers presented at the 11th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA-2023) held at Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff (Wales), UK, during 11–12 April 2023. The idea of this conference series was conceived by a few eminent professors and researchers from premier institutions of India. The first three editions of this conference, FICTA-2012, 2013, and 2014, were organized by Bhubaneswar Engineering College (BEC), Bhubaneswar, Odisha, India. The fourth edition, FICTA-2015, was held at NIT Durgapur, West Bengal, India. The fifth and sixth editions, FICTA-2016 and FICTA-2017, were consecutively organized by KIIT University, Bhubaneswar, Odisha, India. FICTA-2018 was hosted by Duy Tan University, Da Nang City, Vietnam. The eighth edition, FICTA-2020, was held at NIT Karnataka, Surathkal, India. The ninth and tenth editions, FICTA-2021 and FICTA-2022, were held at NIT Mizoram, Aizawl, India. All past editions of the FICTA conference proceedings are published by Springer. The FICTA conference series aims to bring together researchers, scientists, engineers, and practitioners to exchange and share their theories, methodologies, new ideas, experiences, and applications in all areas of intelligent computing theories and their applications to various engineering disciplines like Computer Science, Electronics, Electrical, Mechanical, Bio-medical Engineering, etc. FICTA-2023 received a good number of submissions from the different areas relating to computational intelligence, intelligent data engineering, data analytics, decision sciences, and associated applications in the arena of intelligent computing. These papers have undergone a rigorous peer-review process with the help of our technical program committee members (from the country as well as abroad).
The review process has been crucial, with a minimum of two reviews each and, in many cases, 3–5 reviews, along with due checks on similarity and content overlap. This conference witnessed a huge number of submissions, including the main track as well as special sessions. The conference featured many special sessions on various cutting-edge technologies of specialized focus, which were organized and chaired by eminent professors. The total pool of papers included submissions received from across the country as well as from many overseas countries. Out of this pool, only 109 papers were accepted and divided into two volumes for publication in the proceedings. This volume consists of 55 papers from diverse areas of Evolution in Computational Intelligence. The conference featured many distinguished keynote addresses in different spheres of intelligent computing by eminent speakers: Dr. Frank Langbein, Cardiff University, Cathays, Cardiff, Wales, UK, spoke on “Control and Machine Learning for Magnetic Resonance Spectroscopy”; Mr. Aninda Bose, Executive Editor, Springer Nature, London, UK, discussed “Nuances in Scientific Publishing”. Prof. Yu-Dong Zhang, University of Leicester, UK, spoke on “Intelligent Computing Theories and Applications for Infectious Disease Diagnosis”, whereas Dr. Chaminda Hewage, Cardiff Metropolitan University, UK, discussed “Data protection in the era of ChatGPT”. Dr. Imtiaz Khan, Cardiff Metropolitan University, UK, delivered a talk on “Artificial Intelligence and Blockchain in Health Care 4.0”, while Dr. Siba K. Udgata, University of Hyderabad, India, spoke on “WiSE-Tech: Wi-Fi Sensing Environment for Various technological and Societal Applications”; Dr. R. Chinnaiyan, Presidency University, Bengaluru, Karnataka, India, delivered an invited address on “AI, Digital Twin and Blockchain for Health Care and Agriculture”. Lastly, the keynote sessions were concluded with the felicitation of Dr. Xin-She Yang (Reader at Middlesex University London, UK, and also Steering Chair of FICTA-2023). These sessions received ample applause from the vast audience of delegates, budding researchers, faculty, and students. We thank the advisory chairs and steering committees for rendering mentor support to the conference. A special note of gratitude goes to our Organizing Chair and Publicity and TPC Chairs for playing a lead role in the entire process of organizing this conference.
We take this opportunity to thank the authors of all submitted papers for their hard work, adherence to the deadlines, and patience with the review process. The quality of a refereed volume depends mainly on the expertise and dedication of the reviewers. We are indebted to the technical program committee members, who not only produced excellent reviews but also did so within short time frames. We would also like to thank the participants of this conference, who took part despite all hardships.
Dr. Vikrant Bhateja, Jaunpur, India
Dr. Xin-She Yang, London, UK
Dr. Marta Campos Ferreira, Porto, Portugal
Dr. Sandeep Singh Sengar, Cardiff (Wales), UK
Dr. Carlos M. Travieso-Gonzalez, Las Palmas de Gran Canaria, Spain
Contents
1 Towards the Development of a Decision-Making Framework: A Contribution Inform of a Decision Support Aid for Complex Technical Organization
  Akinola Kila and Penny Hart
2 Multi-attention TransUNet—A Transformer Approach for Image Description Generation
  Pradumn Mishra, Mayank Shrivastava, Urja Jain, Abhisek Omkar Prasad, and Suresh Chandra Satapathy
3 Empirical Review of Oversampling Methods to Handle the Class Imbalance Problem
  Ritika Kumari, Jaspreeti Singh, and Anjana Gosain
4 Automatic COVID Protocols-Based Human Entry Check System
  Annapareddy V. N. Reddy, Chinthalapudi Siva Vara Prasad, Oleti Prathyusha, Duddu Sai Praveen Kumar, and Jangam Sneha Madhuri
5 Effect of Machine Translation on Authorship Attribution
  S. Ouamour and H. Sayoud
6 Smart Hospitality: Understanding the ‘Green’ Challenges of Hotels and How IoT-Based Sustainable Development Could be the Answer
  Nick Kalsi, Fiona Carroll, Katarzyna Minor, and Jon Platts
7 A Novel Knowledge Distillation Technique for Colonoscopy and Medical Image Segmentation
  Indrajit Kar, Sudipta Mukhopadhyay, Rishabh Balaiwar, and Tanmay Khule
8 AI-Enabled Smart Monitoring Device for Image Capturing and Recognition
  Valluri Padmapriya, M. Prasanna, Kuruba Manasa, Rallabandi Shivani, and Sai Bhargav
9 A Compact Formulation for the mDmSOP: Theoretical and Computational Time Analysis
  Ravi Kant and Abhishek Mishra
10 Keywords on COVID-19 Vaccination: An Application of NLP into Macau Netizens’ Social Media Comments
  Xi Chen, Vincent Xian Wang, Lily Lim, and Chu-Ren Huang
11 Ensemble Machine Learning-Based Network Intrusion Detection System
  K. Indra Gandhi, Sudharsan Balaji, S. Srikanth, and V. Suba Varshini
12 Household Power Consumption Analysis and Prediction Using LSTM
  Varsha Singh, Abhishek Karmakar, Sharik Gazi, Shashwat Mishra, Shubhendra Kumar, and Uma Shanker Tiwary
13 Indian Stock Price Prediction Using Long Short-Term Memory
  Himanshu Rathi, Ishaan Joardar, Gaurav Dhanuka, Lakshya Gupta, and J. Angel Arul Jothi
14 On the Impact of Temperature for Precipitation Analysis
  Madara Premawardhana, Menatallah Abdel Azeem, Sandeep Singh Sengar, and Soumyabrata Dev
15 Vision-Based Facial Detection and Recognition for Attendance System Using Reinforcement Learning
  Siginamsetty Phani and Ashu Abdul
16 Cross-Modal Knowledge Distillation for Audiovisual Speech Recognition
  L. Ashok Kumar, D. Karthika Renuka, V. Dineshraja, and Fatima Abdul Jabbar
17 Classification of Autism Spectrum Disorder Based on Brain Image Data Using Deep Neural Networks
  Polavarapu Bhagya Lakshmi, V. Dinesh Reddy, Shantanu Ghosh, and Sandeep Singh Sengar
18 Transformer-Based Attention Model for Email Spam Classification
  V. Sri Vinitha, D. Karthika Renuka, and L. Ashok Kumar
19 Agriculture Land Image Classification Using Machine Learning Algorithms and Deep Learning Techniques
  Yarlagadda Mohana Bharghavi, C. S. Pavan Kumar, Yenduri Harshitha Lakshmi, and Kuncham Pushpa Sri Vyshnavi
20 A Comprehensive Machine Learning Approach in Detecting Coronary Heart Disease
  ElaproluSai Prasanna, T. Anuradha, Vara Swetha, and Jangam Pragathi
21 Evaluation of Crime Against Women Through Modern Data Visualization Techniques for Better Understanding of Alarming Circumstances Across India
  P. Chandana, Kotha Sita Kumari, Ch. Devi Likhitha, and Sk. Shahin
22 Artificial Neural Network-Based Malware Detection Model Among Shopping Apps to Increase the App Security
  N. Manasa, Kotha Sita Kumari, P. Bhavya Sri, and D. Sadhrusya
23 A Satellite-Based Rainfall Prediction Model Using Convolution Neural Networks
  T. Lakshmi Sujitha, T. Anuradha, and G. Akshitha
24 A Smart Enhanced Plant Health Monitoring System Suitable for Hydroponics
  I. N. Venkat Kiran, G. Kalyani, B. Mahesh, and K. Rohith
25 A Survey: Classifying and Predicting Features Based on Facial Analysis
  J. Tejaashwini Goud, Nuthanakanti Bhaskar, Voruganti Naresh Kumar, Suraya Mubeen, Jonnadula Narasimharao, and Raheem Unnisa
26 Satellite Ortho Image Mosaic Process Quality Verification
  Jonnadula Narasimharao, P. Priyanka Chowdary, Avala Raji Reddy, G. Swathi, B. P. Deepak Kumar, and Sree Saranya Batchu
27 Early Prediction of Sepsis Utilizing Machine Learning Models
  J. Sasi Kiran, J. Avanija, Avala Raji Reddy, G. Naga Rama Devi, N. S. Charan, and Tabeen Fatima
28 Detection of Malicious URLs Using Gradient Boosting Classifier
  Saba Sultana, K. Reddy Madhavi, G. Lavanya, J. Swarna Latha, Sandhyarani, and Balijapalli Prathyusha
29 License Plate Recognition Using Neural Networks
  D. Satti Babu, T. V. Prasad, B. Revanth Sai, G. D. N. S. Sudheshna, N. Venkata Kishore, and P. Chandra Vamsi
30 Dehazing of Satellite Images with Low Wavelength and High Distortion: A Comparative Analysis
  Amrutha Sajeevan and B. A. Sabarish
31 Automated Segmentation of Tracking Healthy Organs from Gastrointestinal Tumor Images
  Sanju Varghese John and Bibal Benifa
32 Collision Free Energy Efficient Multipath UAV-2-GV Communication in VANET Routing Protocol
  Mohamed Ayad Alkhafaji, Nejood Faisal Abdulsattar, Mohammed Hasan Mutar, Ahmed H. Alkhayyat, Waleed Khalid Al-Azzawi, Fatima Hashim Abbas, and Muhammet Tahir Guneser
33 Intelligent Data Transmission Through Stability-Oriented Multi-agent Clustering in VANETs
  Ali Alsalamy, Mustafa Al-Tahai, Aryan Abdlwhab Qader, Sahar R. Abdul Kadeem, Sameer Alani, and Sarmad Nozad Mahmood
34 Improved Chicken Swarm Optimization with Zone-Based Epidemic Routing for Vehicular Networks
  Nejood Faisal Abdulsattar, Ahmed H. Alkhayyat, Fatima Hashim Abbas, Ali S. Abosinnee, Raed Khalid Ibrahim, and Rabei Raad Ali
35 Trust Management Scheme-Based Intelligent Communication for UAV-Assisted VANETs
  Sameer Alani, Aryan Abdlwhab Qader, Mustafa Al-Tahai, Hassnen Shakir Mansour, Mazin R. AL-Hameed, and Sarmad Nozad Mahmood
36 Wideband and High Gain Antenna of Flowers Patches for Wireless Application
  Qahtan Mutar Gatea, Mohammed Ayad Alkhafaji, Muhammet Tahir Guneser, and Ahmed J. Obaid
37 Certain Investigations on Solar Energy Conversion System in Smart Grid
  C. Ebbie Selva Kumar, R. Brindha, and R. Eveline Pregitha
38 Stationary Wavelet-Oriented Luminance Enhancement Approach for Brain Tumor Detection with Multi-modality Images
  A. Ahilan, M. Anlin Sahaya Tinu, A. Jasmine Gnana Malar, and B. Muthu Kumar
39 Multi Parameter Machine Learning-Based Maternal Healthiness Classification System
  Rajkumar Ettiyan and V. Geetha
40 Machine Learning-Based Brain Disease Classification Using EEG and MEG Signals
  A. Ahilan, J. Angel Sajani, A. Jasmine Gnana Malar, and B. Muthu Kumar
41 Performance Comparison of On-Chain and Off-Chain Data Storage Model Using Blockchain Technology
  E. Sweetline Priya and R. Priya
42 Performance Analysis of Skin Cancer Diagnosis Model Using Deep Learning Algorithm with and Without Segmentation Techniques
  A. Bindhu and K. K. Thanammal
43 Security for Software Defined Vehicular Networks
  P. Golda Jeyasheeli, J. Deepika, and R. R. Sathya
44 Design of Multiband Antenna with Staircase Effect in Ground for Multiband Application
  Sonali Kumari, Y. K. Awasthi, and Dipali Bansal
45 Affine Non-local Means Image Denoising
  Rohit Anand, Valli Madhavi Koti, Mamta Sharma, Supriya Sanjay Ajagekar, Dharmesh Dhabliya, and Ankur Gupta
46 A Novel LRKS-WSQoS Model for Web Service Quality Estimation Using Machine Learning-Based Linear Regression and Kappa Methods
  K. Prakash and Kalaiarasan
47 Deep Learning-Based Optimised CNN Model for Early Detection and Classification of Potato Leaf Disease
  R. Chinnaiyan, Ganesh Prasad, G. Sabarmathi, Swarnamugi, S. Balachandar, and R. Divya
48 Detection of Parkinson’s Disease in Brain MRI Images Using Deep Learning Algorithms
  N. S. Kalyan Chakravarthy, Ch. Hima Bindu, S. Jafar Ali Ibrahim, Sukhminder Kaur, S. Suresh Kumar, K. Venkata Ratna Prabha, P. Ramesh, A. Ravi Raja, Chandini Nekkantti, and Sai Sree Bhavana
49 Malicious Bot Detection in Large Scale IoT Network Using Unsupervised Machine Learning Technique
  S. Pravinth Raja, Shaleen Bhatnagar, Ruchi Vyas, Thomas M. Chen, and Mithileysh Sathiyanarayanan
50 LSTM with Attention Layer for Prediction of E-Waste and Metal Composition
  T. S. Raghavendra, S. R. Nagaraja, and K. G. Mohan
51 ADN-BERT: Attention-Based Deep Network Model Using BERT for Sarcasm Classification
  Pallavi Mishra, Omisha Sharma, and Sandeep Kumar Panda
52 Prediction of Personality Type with Myers–Briggs Type Indicator Using DistilBERT
  Suresh Kumar Grandh, K. Adi Narayana Reddy, D. Durga Prasad, and L. Lakshmi
53 A Novel Deep Learning Approach to Find Similar Stocks Using Vector Embeddings
  Rohini Pinapatruni and Faizan Mohammed
54 Investigating Vulnerabilities of Information Solicitation Process in RPL-Based IoT Networks
  Rashmi Sahay and Cherukuri Gaurav Sushant
55 Computer-Based Numerical Analysis of Bioconvective Heat and Mass Transfer Across a Nonlinear Stretching Sheet with Hybrid Nanofluids
  Madhu Aneja, Manoj Gaur, Tania Bose, Pradosh Kumar Gantayat, and Renu Bala
Author Index
About the Editors
Vikrant Bhateja is associate professor in the Department of Electronics Engineering, Faculty of Engineering and Technology (UNSIET), Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India. He holds a doctorate in ECE (Bio-Medical Imaging) and has 20 years of academic teaching experience, with around 190 publications in reputed international conferences, journals and online book chapter contributions, of which 39 papers are published in SCIE-indexed, high-impact-factor journals. One of his papers published in Review of Scientific Instruments (RSI) Journal (under American International Publishers) was selected as “Editor Choice Paper of the Issue” in 2016. Among the international conference publications, four papers have received “Best Paper Award”. He has been instrumental in chairing/co-chairing around 30 international conferences in India and abroad as Publication/TPC chair and has edited 52 book volumes from Springer-Nature as a corresponding/co-editor/author to date. He has delivered nearly 22 keynotes and invited talks in international conferences, ATAL, TEQIP and other AICTE sponsored FDPs and STTPs. He was Editor-in-Chief of the IGI Global International Journal of Natural Computing and Research (IJNCR), an ACM & DBLP indexed journal, from 2017 to 2022. He has guest edited Special Issues in reputed SCIE indexed journals under Springer-Nature and Elsevier. He is Senior Member of IEEE and Life Member of CSI.

Xin-She Yang obtained his D.Phil. in Applied Mathematics from the University of Oxford. He then worked at Cambridge University and National Physical Laboratory (UK) as a senior research scientist. Now, he is a reader at Middlesex University London, and a co-editor of the Springer Tracts in Nature-Inspired Computing. He is also an elected fellow of the Institute of Mathematics and its Applications. He was the IEEE Computational Intelligence Society (CIS) chair for the Task Force on Business Intelligence and Knowledge Management (2015–2020). He has published more than 300 peer-reviewed research papers with more than 70,000 citations, and he has been on the prestigious list of highly-cited researchers (Clarivate Analytics/Web of Science) for seven consecutive years (2016–2022).
Marta Campos Ferreira is a researcher and invited assistant professor at Faculty of Engineering of University of Porto. She holds a Ph.D. in Transportation Systems from the Faculty of Engineering of University of Porto (MIT Portugal Program), an M.Sc. in Service Engineering and Management from the Faculty of Engineering of University of Porto, and a Lic. in Economics from the Faculty of Economics of University of Porto. She is the co-founder and co-editor of the Topical Collection “Research and Entrepreneurship: Making the Leap from Research to Business” with SN Applied Sciences and an associate editor of the International Journal of Management and Decision Making. She has been involved in several R&D projects in areas such as technology enabled services, transport, and mobility. Her current research interests include service design, human–computer interaction, data science, knowledge extraction, sustainable mobility, and intelligent transport systems.

Sandeep Singh Sengar is Lecturer in Computer Science at Cardiff Metropolitan University, UK. Before joining this position, he worked as Postdoctoral Research Fellow at the Machine Learning Section of Computer Science Department, University of Copenhagen, Denmark. He holds a Ph.D. degree in Computer Science and Engineering from Indian Institute of Technology (ISM), Dhanbad, India, and an M.Tech. degree in Information Security from Motilal Nehru National Institute of Technology, Allahabad, India. His current research interests include medical image segmentation, motion segmentation, visual object tracking, object recognition, and video compression. His broader research interests include machine/deep learning, computer vision, image/video processing, and their applications. He has published several research articles in reputed international journals and conferences in the field of computer vision and image processing. He is a reviewer for several reputed international transactions, journals, and conferences including IEEE Transactions on Systems, Man and Cybernetics: Systems, Pattern Recognition, Neural Computing and Applications, and Neurocomputing.

Carlos M. Travieso-Gonzalez received the M.Sc. degree in Telecommunication Engineering in 1997 from the Polytechnic University of Catalonia (UPC), Spain, and the Ph.D. degree in 2002 from the University of Las Palmas de Gran Canaria (ULPGC, Spain). He is a full professor of Signal Processing and Pattern Recognition and the head of the Signals and Communications Department at ULPGC, where he has taught subjects on signal processing and learning theory since 2001. His research lines are biometrics, biomedical signals and images, data mining, classification systems, signal and image processing, machine learning, and environmental intelligence. He has taken part in 50 international and Spanish research projects, some of them as the head researcher. He has 440 papers published in international journals and conferences. He has published 7 patents with the Spanish Patent and Trademark Office.
Chapter 1
Towards the Development of a Decision-Making Framework: A Contribution Inform of a Decision Support Aid for Complex Technical Organization
Akinola Kila and Penny Hart
Abstract Technical organizations with complex engineering activities possess strong methods for conceptualizing, designing and managing complex engineered systems. This type of organization could involve critical resources, and the project could be time critical. Technical managers often rely on their intuitions to make critical decisions in this domain. If this intuition is not mindfully managed, it can result in a single-point failure of the project. To address this, the author proposes the use of an organizational decision-making framework (D’MHAS model) which conceptualizes decision-making as a perceived human activity system. The framework adopts a systems perspective in combination with the concept of requisite variety to guide intuition-led decisions in technical organizations. D’MHAS is made up of concrete systems (processes, behaviours, structures and meaning) and conceptual systems (concepts and ideas) which are in continuous flux through time; it incorporates the idea of requisite variety to allow for effective management of the system. The framework allows for thinking, reflection and learning while taking action. In this paper, Vickers’s idea of appreciation and an appreciative system is revisited as the theoretical basis for the development of D’MHAS model. The discussion of the operational factors of the model and how it was validated are presented. The domain of the Nigerian Space Agency (NASRDA) was used as an example to validate the model.
A. Kila (B) · P. Hart
School of Computing, University of Portsmouth, Portsmouth, UK
e-mail: [email protected]
P. Hart
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_1
1.1 Introduction
Vickers critiqued what he considered the “mechanical and mathematical models of decision-making” which emphasised action at the expense of judgement. [1, p. 107]
Decision-making is a process of problem-solving that aims to remove barriers to achieving organizational or individual goals [2, p. 119]. It is influenced by different yet connected factors, including the use of information technology, the set goals of the organization, and the will and purpose of the individuals involved. The challenges of identifying ways to improve decision-making in an engineering/technical organization are both technical and social in nature [3, pp. 317–320]. A ‘technical organization’ in the context of this research is one that involves sets of individuals with formal training and received knowledge in the technical and scientific disciplines; it is characterized by specialized knowledge workers, specifically scientists, engineers and specific software programmers [4, p. 1387]. Most of the workforce comprises highly qualified and skilled staff focusing on developing cutting-edge technological innovations; these staff form the core value chain and are supported by corporate/admin management and governance structures, as in any other form of organization [4, pp. 1387–1388]. Systems engineering principles are reflected in this type of organization; the engineering activities comprehensively include designing and managing engineered systems [5, p. 678]. As these engineered systems become increasingly complex, the complexity of the activity grows with them. When this complexity exceeds its threshold (i.e. the magnitude of activities is beyond what the set system can manage), the management tools and methods begin to fail [5]. At this stage, the problem is partly due to the limited integration of organizational, social and human activity into the engineering activity: the tools and methods for managing engineering activities tend to be rigid and do not adapt appropriately to complex engineering environments [5, p. 678].
These methods can be characterized as comprising a series of pathways followed in a step-by-step progression from one fixed point to a predefined end. NASA’s Challenger disaster is an example: the tragedy was first thought to have been caused by an error in the O-ring design, but it was later found to have resulted from a managerial decision [6]. To address this, a soft systems perspective on engineering activity that views organizational decision-making as a purposeful human activity is proposed.
1.2 Literature
In a bid to catch up with an evolving world of globalization and dynamic change, technical organizations are in search of improved management approaches to decision-making. Hayward and Preston [7, pp. 175–179] argue that linear rational models do not perform satisfactorily for businesses operating under ambiguity and rising pressure. Nutt [8] reported that rational decision-making model strategies struggle to reach the 50% success mark in managerial decision-making. Because many of the requirements for bounded rationality are becoming difficult to satisfy, [9, 10] suggest that organizations have begun to embrace holistic approaches to human or non-programmed decisions. Much of the existing reviewed literature [8, 11–14] takes a deterministic (mathematical) approach to addressing decision-making, treating humans as rational beings and decisions as an asset. Mohsen et al. [15, p. 6] adopted a hybrid method of multi-criteria decision-making and analytic hierarchical process (AHP) in a study to rank power supply systems by a government organization. Seyman et al. [11, p. 211] also adopted the multi-criteria decision-making method (MCDM) for the evaluation and selection of relevant equipment for corresponding athletes. Other methods identified in the literature for addressing decision-making in a technical organization include iterative techniques [16], linear optimization [17] and probabilistic methods [18]. With these methods, drawbacks such as being stuck in a ‘local optimum’ (either minimal or maximal objective solutions) were identified [19]. Although Mohsen et al. [15, p. 3] identified the ‘meta-heuristic’ approach as a way to mitigate this drawback, it still does not address the human aspect that contributes to sub-optimal decisions. It appeared that assumptions were being made about how decisions should be treated, without necessarily engaging the perspectives of the individuals actively involved in them. In this paper, the process involved in making decisions in the chosen technical organization is examined as a system, specifically a human activity system that can be conceptualized and designed to achieve low entropy (minimal organizational disturbance), in order to identify areas of concern within it and what can be done about them.
Technical organizations such as the Nigerian Space Agency are drawn towards mathematical or evidence-based decision-making tools and methodologies, but personal experience and recent literature show the importance of the human activity system approach [5, pp. 678–680], which is comparatively neglected or tacit in practice [20, p. 4].
1.3 Methodology (Interpretative Approach to Decision-Making)
It is important to examine the author’s interpretative and epistemological perspectives on the approach adopted in this paper. Spender [21, p. 161], in his discussion of epistemological considerations, suggested that a pragmatic view (dealing with practical evidence rather than just theory) is most applicable with regard to knowledge as practice, based not just on the definition of the focus of interest (in this case ‘decision-making’) but on its use (the practice). From a different perspective on social reality, interpretivism is more useful when investigating and thinking about situations/problems of human activity. The emphasis here is on deliberate individual action, access to the meaning which is subjective and attributed by the individual to the action, and the effect it has on others. This is in contrast to universally applicable rules emerging as manifestations of society and restraining or influencing the behaviour of individuals. Based on interpretivism, social reality is described as knowledge from a particular point of view [22]; general laws with regard to social settings, in contrast to Durkheim’s view (the functionalist approach), are not ontologically real but are instead ways of thinking about reality. If we accept the notion of ‘subjectivity’, namely that social reality is developed upon an interplay of purposeful individual action to which observers and actors attach varying meanings, and that individuals can change their minds regarding sense-making and their perspectives on what reality is, then methodologies underpinned by reductionist or positivist philosophical thinking are not appropriate for inquiring about it. Methodologies that can be used to explore individuals’ perceptions of their own worldview appear suitable. In Blackburn [23], Dilthey proposed a method tagged ‘Verstehen’, by which one can understand the meaning of words or even actions; this is achieved by reliving the individual’s mental state, which is inferred by comparison with the inquirer’s own experience [23, p. 100]. This method was further developed to incorporate social and historical components in the inquiry into the ‘objectification’ of the mind of individuals as culture. This approach was used in the ‘hermeneutic cycle’, which is an interactive and interpretive inquiry into social reality. As practised in this context, the hermeneutic tradition is about the philosophy of interpretation and the careful consideration of text, culture and tradition to derive meaning [23, p. 165].
If what is to be inquired about is influenced by physical, traditional and cultural elements, there cannot be a single and universal account of social reality [24, p. 143]. This paper favours this approach and suggests that social reality should be studied by determining the subjective meaning individuals link with their action, by placing oneself as an observer in the inquiry process. In doing this, the observer gains a deeper layer of subjective understanding, and this can be recognized. This approach is interpretivism: the analysis of meaningful action, interpreted by the observer in terms of the means-ends scheme of rationality [25, p. 267]. The term ‘Weltanschauung’ or ‘World view’ encapsulates this subjective perspective of the individual as an important element of this approach; it is described as a ‘comprehensive apprehension or conception of the world most importantly from a specific standpoint’ [26]. This can also be said to be a set of beliefs and ideas through which an individual or group of people interprets the world. It is suggested that Weltanschauung comprises three key elements, namely individuals’ personal representation of the world around them, their ideals regarding the conduct of life and their evaluation of life. This idea of Weltanschauung is commonly used in interpretative research, where it helps to represent each participant’s position [27]. In sum, interpretivism views social reality as constructed by each individual, who can influence and can be influenced by others.
1.3.1 Epistemological Basis of Interpretivism
Checkland [28, 29] considered the application of systems thinking to organizational and management research. Checkland [29] reviewed the positivist approach and the contribution of systems engineering to organizational research, and he concluded that the approach was inadequate for addressing the issues. He raised concerns about the ontological perception of systems as existing in the real world, as opposed to the epistemological perception of systems as ways of thinking about reality. He also highlighted a fixation on objective problem definition that did not consider the difficulty of defining problems in a human setting, did not treat human activities as the primary concern, and disregarded the social and cultural influences on the situation. The idea that organizational issues can be explored and formulated in this way is the distinctive factor of ‘hard systems’ as against ‘soft systems’ thinking [29, p. 138]. Humans are said to have many qualities besides rationality. Checkland [29, p. A7] describes ‘human activity systems’ as human situations where people are attempting to take purposeful action which is meaningful to them, and notes that these situations are messy and complex. Checkland [29, p. 150] also examined the extent to which systems ideas could be used for assessing ill-defined problems in organizational research or social systems. Through different inquiries, key realizations were made: thinking about the problem to be solved in a situation, as for a ‘hard system’, was not as useful as recognizing the potential ‘conditions to be alleviated’. These, however, were identifiable by considering subjective perceptions, which may change through time; this may be conceptualized as a ‘human activity system’.
This interpretivist approach and epistemological perspective were expressed in soft systems methodology (SSM) as a set of ‘principles of method’ in order to aid the thinking about the situation the inquirer is interested in [29]. These systems’ ideas are underpinned by the phenomenological tradition, with its emphasis on the mental processes of the inquirer/observer, followed by the ideas of their worldview in relation to their subjective positions.
1.3.2 Vickers’ Notion of an Appreciative System
Vickers’s interest in how policy formulation is affected led to the postulation of the term ‘appreciation’, which refers to how people gain understanding about a situation of concern, enabling appropriate policies to be designed and relevant action to be taken [30, 31, p. 51]. Also, the notion of ‘appreciation’ appears to relate not only to policy-making but to social processes in general [28]. The appreciative inquiry method is presented as an attempt to produce an approach that operationalizes Vickers’s notion and the process of appreciation [30]. Vickers was critical of what he categorized as mathematical and mechanical models of decision-making, which focus on action at the expense of judgement [1, p. 107]. The idea of a cycle of learning by a group or individual within an organization, as a result of being in the world, was integral to Vickers’s work. It provides an explanation of how an individual participates in social reality, learns to form it and, in turn, influences further participation [30, pp. 54–56].
Thus human history is a two-stranded rope; the history of events and the history of ideas develop an intimate relationship with each other yet each according to its own logic and its own time scale; and each condition both its own future and the future of the other. [30, p. 54]
The learning cycle was broken down into three activities: perceiving, judging and acting [32]. This learning cycle is constrained by a phenomenon described by Vickers as ‘the readiness of the mind to perceive’ due to the previous history of the appreciative cycle [33, p. 143]. In the development of the theory of appreciative systems, it was noted that Vickers did not represent this idea diagrammatically or as a model, but Casar and Checkland [32, p. 3] argued that a model would offer the richest means of expressing the idea, making it operational and subjecting it to scrutiny. The appreciative system model starts with the illustration of the interaction of events and ideas in a flux unfolding through time. Appreciation in this context is achieved from the ability to select and choose. It perceives reality and then makes corresponding judgements before contributing to the stream of ideas, which leads to actions that become part of the flux of events. It is a recursive loop: the flux of events generates appreciation, and this appreciation contributes to the flux of events. The epistemology of judgement management is one of relationship management rather than goal-seeking, and both value and reality judgements stem from standards of both value and fact: standards of what is, and standards of what is good or bad [34]. Vickers’s notion of appreciation contrasts with the idea of seeking to achieve a common goal. Vickers [30] argues that there is no ultimate source of standards, as their source is the previous history of the system itself. The current operation of the system may change its current and future operation via its effect on the standards. Vickers [30] claims to have constructed an epistemology that can provide convincing accounts of the process by which individuals and groups deliberate and act. A model of an appreciative system (developed by Checkland and Casar [32]) is shown in Fig. 1.1.
This paper adopts an epistemological approach, which follows Churchman’s [35] use of the systemic thinking idea. Instead of viewing the situation of concern as a set of systems, the process itself is considered to be a system.
1.4 Experimental Work: Contribution to Knowledge (Decision-Making as a Human Activity System, D’MHAS)
Obtaining valid inputs and validating outputs are key steps in modelling and in simulation [36]. This informed the thinking behind the study activities and validation method in this paper. The approach to the participants/respondents’ validation is well documented in practice [36, 37].
In this section, the development and operational principles of D’MHAS are discussed and presented as the contribution to knowledge. Decision-making as a human activity system (D’MHAS), shown in Fig. 1.2, is a representation of the human behavioural activity of decision-making (concrete systems/happenings and ideas: behaviours, processes, structures and meanings, as discussed in Sect. 1.3.2 of this paper) taking place through time (T), driven by a purpose or shared interest to achieve a common goal. It is a way of operationalizing Vickers’ idea of an appreciative system, which was first modelled by Checkland and Casar [32]. The three activities, namely perceive, judge and discern, which make up the process of appreciation [32], and the standards of fact and value are discussed in the following sections. The operational principle of D’MHAS is also provided. The standards and activities represented in the circle which leads onto the discern activity represent the requisite variety required to meet the varieties generated by the system, that is, the process of decision-making. These standards can be varied depending on the organization’s requirements. It is important to note that these activities are presented as an idea to frame the process of intuition-led decision-making and not as a fixed model. To make sense of the D’MHAS model, it is imperative to define its operational principles and theoretical underpinnings.
1.4.1 Perceive
In the D’MHAS model, perceive is the pre-processing step that depends on the intuition of the expert. It is conceptualized in the model as the monitoring of signs from the flux of events through time. The ‘perceive’ activity helps to determine the preliminary decision-making point with regard to the type of decision (routine or critical) as conceived by the expert (the technical manager, in the case of NASRDA). Decisions perceived as routine are everyday structured and defined tasks which have low entropy and maintain a stable flux (see Figs. 1.1 and 1.2). In the study conducted at the Nigerian Space Agency, the participants described decisions perceived as critical as those with a core implication for management and finance, thereby having high entropy/concern/disturbance that requires further consideration.
Fig. 1.1 A model of appreciative system. Source Checkland and Casar [32]
Fig. 1.2 Decision-making as a human activity system (D’MHAS)
An important aspect of this model is addressing decisions recognized as critical by the technical manager.
1.4.2 Judge
In the judge activity, the categorization of a decision as critical or routine is confirmed. It is about judging perceptions/concerns by integrating them with the project deliverables and organizational mandate. Once the decision is confirmed as critical, it goes to the next activity, tagged discern. Based on the description of critical decisions by the participants, one can deduce that a critical decision is a novel decision about a situation not encountered before, which requires careful thought on the implications of possible courses of action. The process can be reapplied once the situation is recognized again. The description of a critical decision given by the participants applies to the perceive activity of this model (see Fig. 1.2). After a technical manager perceives a critical decision, it is processed further in the judge activity by re-evaluating it to determine whether it is actually a novel decision or a recognizable situation (a critical decision processed in the past) to which the previous process can be reapplied. In the latter situation, the decision can be processed like a routine decision, and a decision on action can be taken without it going into the discern activity. Before discussing the discern activity in detail, it is important to note that the judge and discern activities are jointly vital in an appreciative system. In D’MHAS, they are both involved in addressing critical decisions. If a decision is perceived as routine, it will still need to be carefully judged as such before a decision on action is taken; if not, it will go to the discern stage for further consideration. At this stage, it is evident that the process between these activities focuses more on managing judgement and not just on goal-seeking. A more detailed breakdown of the discern activity from Fig. 1.1 is discussed below:
1.4.3 Discern
As established by Vickers [30] and espoused by Checkland and Casar [32], the ‘discern’ activity in the model above is a key element of the mindful intuition the framework seeks to achieve before a decision is made. It is an activity that helps to envisage what actions are required and how to implement them before a decision to act. As discussed in the operational factors of D’MHAS, the inquiry questions are used to address critical-decision judgement-making. It is after these two activities (judge and discern) that a decision on action is made and action is taken. Decision on action refers to how to act to maintain the relevant terms of the regulatable relationships or resolve from the discern activity and sub-activities, especially the requisite variety. The decision to act can be referred to as a way of granting causal powers, while the action is an application of causal powers to alleviate concerns in the flux of events and ideas [5, p. 685].
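The flow through the perceive, judge and discern activities described above can be sketched in a few lines of Python. This is only an illustrative reading of the text, not part of the D’MHAS model itself: the names `Category`, `Decision`, `judge` and `route`, and the single boolean flag `is_novel` used to stand in for the manager’s judgement, are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ROUTINE = "routine"
    CRITICAL = "critical"


@dataclass
class Decision:
    name: str
    perceived: Category   # the manager's intuitive first reading (perceive)
    is_novel: bool        # hypothetical flag: situation not encountered before


def judge(decision: Decision) -> Category:
    """Confirm or revise the perceived category (the judge activity).

    Assumption: a decision is confirmed as critical only when it is novel.
    A recognizable situation is processed like a routine decision, and a
    decision perceived as routine that turns out to be novel is escalated.
    """
    return Category.CRITICAL if decision.is_novel else Category.ROUTINE


def route(decision: Decision) -> list[str]:
    """Return the ordered activities the decision passes through."""
    path = ["perceive", "judge"]
    if judge(decision) is Category.CRITICAL:
        path.append("discern")  # inquiry questions are applied at this point
    path += ["decision on action", "action"]
    return path


if __name__ == "__main__":
    # Hypothetical decisions, used only to exercise the sketch.
    novel = Decision("novel critical decision", Category.CRITICAL, is_novel=True)
    known = Decision("recognized critical decision", Category.CRITICAL, is_novel=False)
    print(route(novel))  # includes the discern activity
    print(route(known))  # skips discern, handled like a routine decision
```

In practice the judge activity is a matter of human judgement rather than a boolean test; the flag merely marks where that judgement enters the flow.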
1.4.4 Operational Factors of D’MHAS
Judgement-making and action-taking are guided by important responses to signs representing operational factors in the perceived human activity system [5, p. 684]. The operational or influencing factors represent the standards of fact and value (the appreciation shown in Fig. 1.1) and can be updated through feedback received from the thinking phase, which allows for an epistemological pluralism that provides several ways of knowing or inquiry before taking action, as shown in Fig. 1.1. When a concern arises, the thinking about the conceptual and concrete systems (ideas and events) in Fig. 1.3 allows for the accommodation of variability of opinions, judgement, purposes and operational practices; it is a way of managing judgements while aligning the operational systems to serve the organization’s purpose. The thinking phase is where the analysis of the interplay between conceptual and concrete systems is conducted. The doing phase is where the necessary changes to bring the system back to stability are made. As an example to operationalize the operational principles of D’MHAS, previous study findings at the Nigerian Space Agency are used as key activities to help discern what needs to be considered before action is taken. For the Nigerian Space Agency (NASRDA), what can be considered the organizational principles or considerations to follow before a technical manager takes a critical decision? This line of reasoning and questioning was also used in the validation stage of the study and model. The findings are summarized below: ‘Technical know-how’ and ‘experience’ of the technical manager (decision-maker) generate effective results. Also, ‘integration’ with other departments through clear and transparent communication confirms and conveys intent and respect for key stakeholders. Seeking a superior officer’s point of view achieves management-level support and provides a requisite for efficient and effective decisions based on experience.
Fig. 1.3 A process that enables epistemological pluralism
Table 1.1 shows inquiry questions for critical-decision judgement-making as key activities that map to each of the organizational principles or considerations (e.g. for NASRDA’s technical managers). This idea of developing inquiry questions as a metric of key activities of concern was used by Calvo-Amodio [5, pp. 684–686]. In the contexts of decision-making and action, what is taken to be true and right implies value judgements; the consideration of operational factors suggests rationality, but this is difficult to achieve when the values of the parties involved differ. Ideas from critical systems thinking and critical systems heuristics, which are about promoting reflective professional practice and a discursive understanding of the systems approach, help to address this [38, p. 325]. When it comes to judging an issue of concern, the underlying concept of rationality and expertise is monological (leading to a monologue of experts) instead of dialogical [38, p. 325]. Although the D’MHAS model involves discourse amongst those considered experts in the technical organization, dialogue rather than expertise is the driving factor. This is to address critical decisions which have implications for the collective. The inquiry questions (Table 1.1) are a way to establish a normative foundation (established through behaviour) of knowledge about the critical decision and the action to be taken. It should be noted that these principles refer to an organization’s operational factors, which may differ between organizations. The study findings from the Nigerian Space Agency were the factors used as an example in this paper. The following sections further discuss the D’MHAS principles and the set of activities involved.
Table 1.1 Inquiry questions for critical-decision judgement-making

| Inquiry questions | Comments and questions (information) |
| --- | --- |
| What needs to be correct or valid for critical decision to be addressed? | What is important to know about the areas of activities where the critical decision has arisen? What type of change could be needed to alleviate the critical decision and thus improve the performance of the key activity? |
| Who cares about this critical decision and why? | Why is the critical-decision meaningful? How does the issue relate to organization and/or social dimensions in the flux of ideas and events? |
| What is good enough for this critical decision to be satisfied? | What is the best balance to solve the issue around making the critical decision and the burden the solution would place on the organizational flux of events and ideas? Does the potential solution address the critical decision in alignment with the D’MHAS model? |
1.4.5 Validation of the Model: D’MHAS
As stated by Wilson [39, p. 8] and espoused by Harwood [40, p. 754], a model must be useful. The essence of the D’MHAS framework is to provide a model as a guide for decision-makers, especially in technical organizations where critical decisions can have significant consequences. Having developed a model for social processes in a participatory context (D’MHAS), the agreement of the participating stakeholders on the validity of the model can be a reasonable indicator of its validity [41, p. 6]. Also, obtaining valid inputs and validating outputs are key steps in modelling and in simulation [36]. This informed the thinking behind the study activities and validation method in this paper. The approach to participant/respondent validation is well documented in practice [36, 37, 41, 42]. The validation of the developed D’MHAS model was conducted with participants within the Nigerian Space Agency, through feedback based on its use in practice. The approach adopted by the author in the development of D’MHAS is a simultaneous way of doing participatory research/inquiry with the experts and taking action, linked together through critical reflection. While more details of the feedback are presented in the appendix, an excerpt from the feedback of one participant reads:
the model D’MHAS is very useful because key elements of the model are utilized in my office where critical decisions are discussed in my office before actions are taken.
1.4.6 The Concept of Requisite Variety
What does the Law of Requisite Variety (LRV) contribute to systems practice? At a superficial level, it reminds the practitioner or researcher when carrying out an investigation into a problem area that any changes to the situation will mean simplification (or added complexity) of the means of control. [24, p. 120]
The law of requisite variety refers to the notion that only variety can absorb/nullify variety [43, p. 207]. Experts/managers cannot effectively manage everything within a system; they need to choose what to manage effectively [44, p. 53; 45, p. 142]. This means that any variation in the situation should be addressed by an equivalent variety in the model/solution devised to address the situation. Beer [46, p. 53] proposes that since managing or controlling the whole system is impossible, we should only attempt to control some of the system. He argues that thinking in terms of heuristics rather than algorithms is a key way of coping with increasing variety [44, p. 53]. This line of thinking echoes Vickers’s notion of maintaining relationships in an appreciative setting. In the literature on organizational research [47, 48], the concept of requisite variety has been used in different ways, which do not necessarily conform with an interpretative research framework of ideas. The law of requisite variety often informs the argument that top management teams with greater experiential variety draw upon a rich reserve of cognitive resources and behavioural routines to enact adaptive competitive repertoires that enhance firm performance [47, p. 545]. In this case, experiential variety refers to top management’s (superior officers’) high levels of diversity and experience [47, p. 547], which allow them to better comprehend the full complexity of the competitive environment and enact adaptive repertoires that increase performance. Studies such as Bell et al. [49] and Connelly et al. [48] also used requisite variety to explain why top management team variety shapes competitive repertoires and, in turn, performance. The concept of requisite variety was useful in the conceptualization and development of a decision-making model to frame the idea of accommodating different yet relevant views and judgements in addressing critical decisions. It is a way of enhancing and managing judgements.
1.4.7 D’MHAS as a Relationship Maintaining Model
Vickers [50] pointed out that each human individual’s experience of life is unique. Humans make sense of the world around them through unique perception and sensemaking processes: As they reflect upon their experience, they gradually develop individual sets of values and goals and these, in turn, colour their future perceptions so that they notice some things but screen out others [24, p. 36]. Vickers termed this phenomenon ‘appreciative settings’. The author conceptualizes D’MHAS as a relationship-maintaining framework that retains the open systems perspective but shifts the emphasis from optimization and the maintenance of equilibrium between a system and its environment towards learning and adapting to maintain a relationship with a constantly changing environment. The framework attempts to represent the system of interest (decision-making within the technical organization) and its environment from the perspective of those actively involved within the problem situation. The majority of models of organization have been developed from within a framework of thinking which is judged to be functionalist [51]. The development of D’MHAS is based on considering a technical organization as something ever-changing, not a fixed entity, whose survival depends upon maintaining its relationship with its environment.
1.5 Discussion
The essence of the D’MHAS framework is to provide a model as a mental guide for decision-makers, especially in technical organizations where critical decisions can have dire consequences. Notable literature on the application of Vickers’ idea of an appreciative system in organizations [5, 52] shares some similarities with D’MHAS. PHAS (Purposeful Human Activity System [5]) does not address decision-making or how the dichotomy between routine and critical decisions can be handled, especially within a technical organization. Nor does it make provision for accommodating different opinions or judgements when dealing with critical decisions. In the ‘discern’ element of D’MHAS, different factors are identified that will influence an intuition-led decision. For intuition to be properly informed, these factors, which highlight different aspects of the decision to be made, need to be considered, and the decision-making process needs to take them into account, which reintroduces the notion of requisite variety. From the findings of the studies, the interplay between experience and authority is further examined within the organizational power structure based on the technical managers’/experts’ views. One can consider the superior officer’s view/opinion as a form or manifestation of casual or soft power, but to the participants it presents the decision-maker with the relevant options based on experience and on the implications for other departments of the organization. The superior’s point of view provides not only an option of judgement for the decision-maker but also a layer of security and the feeling of being supported regardless of the outcome of the decision. While this situation might present itself as an influence or a soft or causal power play, it can be viewed as requisite for efficient and effective decision-making within the organization. That is, it is a measure to enhance the various judgements available at a particular time, since superiors are usually more experienced in envisaging the possible outcomes of a decision and have an idea of the corresponding impact on integrated departments.
This does not necessarily mean that what the superior says must be adhered to; rather, it helps improve the options (varieties) available to the technical manager/decision-maker from a position of experience rather than a position of authority. It also resolves the concern amongst technical managers who worry that their director/superior would not support an individualistic action on critical decisions even if the technical manager feels it is the right call. For example, once a decision has been perceived and judged to be critical, it is taken into the discern activity, where key sub-activities are considered together with the possibility of unifying understanding and accommodating different ideas on judgements amongst the key stakeholders involved (in the integration with other departments sub-activity: other technical managers and the superior officer), as shown in the D’MHAS diagram (Fig. 1.2). After the whole process of consideration and regulation, a specific or consensus judgement is made on a decision. A consensus judgement, which is due to the decision’s impact across other relevant departments, must be adhered to and driven by a common purpose to achieve a common goal. An example of this type of decision is to adopt carbon fibre as the material of choice for a rocket, or a specific propellant to achieve a certain mass; both will impact the structures department, the propellant and rocket fuel department, the thrust engines department, etc. The dichotomy between routine and critical decisions brings to the fore the delicate boundary of autonomy that the technical managers/decision-makers possess. While a significant level of autonomy is required for the technical managers to make decisions and discharge their duties effectively, there is also the need to understand that it is not absolute, and that ideas, judgements and understanding need to be unified for critical decisions that could have a major impact on the overall project. The author denotes this phenomenon as ‘autonomy of function’. This is different from the autonomy of purpose [46]. Autonomy of purpose conceptualizes the system as a purposeful one, in which the actors are driven by a shared interest, while autonomy of function addresses the responsibility that comes with their autonomy at a personal level in relation to other actors (e.g. a superior officer).
1.6 Conclusion
D’MHAS was developed to aid the mindful intuitions of experts in technical organizations, where critical decisions are novel decisions about a situation not encountered before. Such decisions require careful thought on the implications of possible courses of action, and the process can be reapplied once the situation is recognized again. Key ideas behind the usefulness of the model are mindful intuition, dialogue and accommodation of relevant views. The model might not be able to cope with a disagreement or a sub-optimal decision insisted upon for political reasons. Future work should test the developed model (D’MHAS) in another similar or not-so-similar technical organization to further examine how it works. It would also be useful to know how the model would accommodate experts who do not consider mindful intuition or who use an independent declarative approach to decision-making (a not-up-for-debate approach). Also, issues are not always transparent to engineering managers, as they are too near to them. Common human behaviours such as reliance on past experience and cultural bias may also be a barrier in complex decision-making. These considerations would be interesting to explore in future research.
Appendix Some participants’ feedback forms
References
1. Blunden, M.: Vickers’ contribution to management thinking. J. Appl. Syst. Anal. 12, 107–112 (1985)
2. Guo, K.: DECIDE: a decision-making model for more effective decision making by health care managers. Health Prog. 39(3), 133–141 (2020). https://doi.org/10.1097/HCM.0000000000000299
3. Mumford, E.: The story of socio-technical design: reflections on its successes, failures and potential. Inf. Syst. J. 16(4), 317–342 (2006)
4. Ramkhelawan, T., Barry, M.L.: Leading a technical organization through change: a focus on the key drivers affecting communication. In: 2010 IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1386–1390. IEEE (2010)
5. Calvo-Amodio, J.: Using principles as activity drivers in human activity systems. Syst. Res. Behav. Sci. 36(5), 678–686 (2019)
6. Vaughan: The challenger launch decision: risky technology, culture, and deviance at NASA. University of Chicago Press (1996). http://search.ebscohost.com/login.aspx?direct=true&db=edsacl&an=edsacl.miu01000000000000003603043&site=eds-live
7. Hayward, T., Preston, J.: Chaos theory, economics and information: the implications for strategic decision-making. J. Inf. Sci. 5(3), 173–182 (1998)
8. Nutt, P.C.: Surprising but true: half the decisions in organizations fail. Acad. Manag. Executive 13(4), 75–89 (1999)
9. Langley, A., Mintzberg, H., Pitcher, P., Posada, E., Saint-Macary, J.: Opening up decision making: the view from the black stool. Organ. Sci. 6(3), 260–279 (1995)
10. Eisenhardt, K.M., Zbaracki, M.J.: Strategic decision making. Strateg. Manag. J. 13(Special Issue), 17–37 (1992)
11. Seyman, B.-U., Mustafa, S., Merve, V.-A.: Multi criteria decision making approaches for evaluation of equipment selection processes in rowing. J. Phys. Educ. Sports Sci. 19(2) (2021)
12. Mahmoudi, M., Pingle, M.: Bounded rationality, ambiguity, and choice. J. Behav. Exp. Econ. 75, 141–153 (2018). http://search.ebscohost.com/login.aspx?direct=true&db=ecn&an=1719963&site=eds-live
13. Ilori, M.O., Irefin, I.A.: Technology decision making in organisations. Technovation 17(3), 153–160 (1997). https://doi.org/10.1016/s0166-4972(96)00086-7
14. March, K., Weissinger-Baylon, R.: Ambiguity and Command: Organizational Perspectives on Military Decision Making, pp. 11–35. Addison Wesley Longman (1986)
15. Mohsen, R., Javad, S., Hossein, K., Ali, M.: A new hybrid decision-making framework to rank power supply systems for government organizations: a real case study. Sustain. Energy Technol. Assess. 41 (2020)
16. Ashok, S.: Optimised model for community-based hybrid energy system. Renew. Energy 32, 1155–1164 (2007)
17. Bartolucci, L., Cordiner, S., Mulone, V., Rossi, J.L.: Hybrid renewable energy systems for household ancillary services. Int. J. Electr. Power Energy Syst. 107, 282–297 (2019)
18. Tina, G., Gagliano, S., Raiti, S.: Hybrid solar/wind power system probabilistic modelling for long-term performance assessment. Sol. Energy 80(5), 578–588 (2006)
19. Tezer, T., Yaman, R., Yaman, G.: Evaluation of approaches used for optimization of stand-alone hybrid renewable energy systems. Renew. Sustain. Energy Rev. 73, 840–853 (2017)
20. Polanyi, M.: The tacit dimension. University of Chicago Press (2009). http://search.ebscohost.com/login.aspx?direct=true&db=cat01619a&an=up.1232260&site=eds-live
21. Spender: Organizational learning and knowledge management: whence and whither? Manag. Learn. 39(2), 159–176 (2008). https://doi.org/10.1177/1350507607087582
22. Morrison, K.: Marx, Durkheim, Weber: Formations of Modern Social Thought, vol. 2. Sage Publications, London (2006)
23. Blackburn, S.: The Oxford Dictionary of Philosophy, 2 rev. edn. Oxford University Press (2008). https://doi.org/10.1093/acref/9780199541430.001.0001
24. Stowell, F., Welch, C.: The manager’s guide to systems practice: making sense of complex problems. Wiley, Croydon (2012)
25. Checkland, P., Tsouvalis, C.: Reflecting on SSM: the link between root definitions and conceptual models. Syst. Res. Behav. Sci. Off. J. Int. Feder. Syst. Res. 14(3), 153–168 (1997)
26. Merriam-Webster dictionary. Retrieved 11th November 2012 from http://www.merriam-webster.com/
27. Checkland, P.: From framework through experience to learning: the essential nature of action research. Inf. Syst. Res. 397–403 (1991)
28. Checkland, P.: Systems Thinking, Systems Practice. Wiley, Chichester (1981)
29. Checkland, P.: Soft Systems Methodology: A 30-Year Retrospective. Wiley, Chichester (1999)
30. Vickers: The Art of Judgement. Chapman and Hall (1965)
31. Smith, S.A.: Modelling the discharge decision-making process in the domain of mental health care. Ph.D. dissertation, University of Paisley (2001)
32. Checkland, P., Casar, A.: Vickers’ concept of an appreciative system: a systemic account. J. Appl. Syst. Anal. 13(3), 3–17 (1986)
33. West, D.: The appreciative inquiry method: a systemic approach to information systems requirements analysis. In: Stowell, F.A. (ed.) Information Systems Provision: The Contribution of Soft Systems Methodology, pp. 140–158. McGraw-Hill, Maidenhead (1995)
34. West, D.: Knowledge elicitation as an inquiring system: towards a ‘subjective’ knowledge elicitation methodology. J. Inf. Syst. 2, 31–44 (1992)
35. Churchman, C.W.: The Design of Inquiring Systems: Basic Concepts of Systems and Organization. Basic Books, New York (1971)
36. Axelrod, R.: Advancing the art of simulation in the social sciences. In: Conte, R., Hegselmann, R., Terna, P. (eds.) Simulating Social Phenomena. Lecture Notes in Economics and Mathematical Systems, vol. 456. Springer, Berlin (1997). https://doi.org/10.1007/978-3-662-03366-1_2
37. Slettebø, T.: Participant validation: exploring a contested tool in qualitative research. Qual. Soc. Work. 20(5), 1223–1238 (2021). https://doi.org/10.1177/1473325020968189
38. Ulrich, W.: Beyond methodology choice: critical systems thinking as critically systemic discourse. J. Operat. Res. Soc. 54(4), 325–342 (2003)
39. Wilson, B.: Systems: Concepts, Methodologies and Applications. Wiley, Chichester (1984)
40. Harwood, S.A.: The management of change and the Viplan methodology in practice. J. Oper. Res. Soc. 63(6), 748–761 (2012)
41. Troitzsch, K.G.: Validating simulation models. In: Proceedings of the 18th European Simulation Multiconference, pp. 98–106. SCS, Erlangen, Germany (2004)
42. Bharathy, G.K., Silverman, B.: Validating agent based social systems models. In: Proceedings of the 2010 Winter Simulation Conference, Baltimore, MD, USA, pp. 441–453 (2010). https://doi.org/10.1109/WSC.2010.5679142
43. Ashby, W.R.: An Introduction to Cybernetics. Chapman & Hall (1961)
44. Beer, S.: Brain of the Firm, 2nd edn. Wiley, Chichester (1981)
45. Kila, A., Hart, P.: Towards building an intelligent system based on cybernetics and Viable system model. Science 1(40), 141–163 (2019). https://doi.org/10.1126/science.49.1259.170
46. Beer, S.: Diagnosing the System for Organizations. Wiley, New York (1985)
47. Fox, B.C., Simsek, Z., Heavey, C.: Top management team experiential variety, competitive repertoires, and firm performance: examining the law of requisite variety in the 3D printing industry (1986–2017). Acad. Manag. J. 65(2), 545–576 (2022). https://doi.org/10.5465/amj.2019.0734
48. Connelly, B.L., Tihanyi, L., Ketchen, D.J., Carnes, C.M., Ferrier, W.J.: Competitive repertoire complexity: governance antecedents and performance outcomes. Strateg. Manag. J. 38, 1151–1173 (2017)
49. Bell, S.T., Villado, A.J., Lukasik, M.A., Belau, L., Briggs, A.L.: Getting specific about demographic diversity variable and team performance relationships: a meta-analysis. J. Manag. 37, 709–743 (2011)
50. Vickers, G.: A classification of systems. General Syst. 15, 3–6 (1970)
51. Winograd, T., Flores, F.: Understanding Computers and Cognition: A New Foundation for Design. Ablex, Norwood, NJ (1986)
52. Smith, S.A.: Modelling complex decision-making: contribution towards the development of a decision support aid (2001)
Chapter 2
Multi-attention TransUNet—A Transformer Approach for Image Description Generation Pradumn Mishra, Mayank Shrivastava, Urja Jain, Abhisek Omkar Prasad, and Suresh Chandra Satapathy
Abstract Image description generation is a task that requires translating images into sentences that describe them. In deep learning terms, the task is an efficient connection of vision and language. This paper proposes an efficient and fast transformer-based approach for image description generation using TransUNet and a multi-attention decoder. The vision-based TransUNet extracts both local and global contexts of the image, thus providing detailed embedding information for the decoder architecture to predict words using context. This helps the multi-modal architecture to understand the relationship between the image and the sentences, and so to perform the task efficiently and faster than other transformer-based counterparts. The experimental results have been compared with existing published works, demonstrating that the approach used in this paper achieves state-of-the-art (SOTA) performance among multi-modal architectures for the given task.
2.1 Introduction
The task of generating descriptions of images requires the integration of vision and language, and has seen significant development of effective algorithms over the past few years. Recent advances in deep learning have inspired researchers to develop various multi-modal architectures that integrate computer vision with natural language processing (NLP). Most of these techniques revolve around CNN-RNN or encoder–decoder-based architectures involving a visual attention mechanism. Early research on the task relied on pre-trained convolutional neural networks (CNNs) to extract image features [1] (the encoder) and recurrent neural networks (RNNs) to perform the decoder task [2]. These techniques tend to capture region-level features, which do not cover the entire image and miss fine-grained information. On the decoder side, LSTM [3] with a soft attention mechanism [4] has been used over the past few years, but it suffers from poor training efficiency and limited expressive ability. Since the introduction of transformers, researchers have started introducing MSA into the existing LSTM decoder, but these models could not reach SOTA performance compared with transformer decoders.
In this paper, a multi-modal architecture based on transformers is proposed for the description generation of images. A modified TransUNet [5], which inherits from transformers and UNet [6], is used as the encoder. This enables fine extraction of image features, as it recovers localized spatial information. The transformer layer of the TransUNet encodes the tokenized image patches from the CNN, which enables better edge detection and extraction of global contexts from the images. A multi-attention decoder architecture has been implemented for the decoder, closely resembling the transformer decoder proposed by Vaswani et al. [7]. The high-level architecture of the model is shown in Fig. 2.1. The proposed architecture has been trained on the Flickr30k dataset and evaluated on the BLEU and CIDEr metrics. The encoder for image segmentation has been evaluated on the Jaccard metric.
The rest of the paper is organized as follows: (1) Section 2.2 reviews previous related work and research studies. (2) Section 2.3 presents the proposed method with the complete architecture at a low level. (3) Section 2.4 gives the training details, the results evaluated under different metrics and a comparison with other architectures. (4) Section 2.5 discusses future directions for the TransUNet-based model for image description.
Fig. 2.1 Proposed high-level architecture of the system
P. Mishra (B) · M. Shrivastava · U. Jain · A. O. Prasad · S. C. Satapathy
School of Computer Engineering, KIIT Deemed to be University, Bhubaneshwar, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_2
2.2 Related Work
Existing research on image description generation can be broadly categorized into CNN-RNN and CNN-transformer-based architectures. Luo et al. [8] and Xu et al. [4] adopted a Faster R-CNN encoder to encode images into features at the grid level. The pre-trained VGG-16 [9] was used to encode images into fixed-length features, and an LSTM with attention [10] was then used to generate descriptions. R-CNN combined with the pre-trained ResNet-101 model [11] was faster than its predecessors at encoding images in a way that correlates with human visual behaviour. These models had significant disadvantages in training efficiency and expressive ability. The lack of fine-grained information at the regional level of the images and the long training time remain disadvantages of R-CNN-based models, and using an LSTM as the decoder made these models less efficient. A further improvement on the encoder side was the introduction of graph convolution networks [12], which integrated semantic and spatial relationships [13] between image objects. The first transformer-based image captioning model was proposed by Herdade et al. [14] in 2019, leading to the Object Relation Transformer and the introduction of spatial information about regions within images. In 2020, Pan et al. proposed a new attention block (X-Linear Attention) [15] to capture second-order interactions in single- and multi-modal architectures, and integrated it into both the encoder and the decoder. Cornia et al. designed a meshed architecture [16] of encoders and decoders that attends to both high-level and low-level feature maps of the images in the encoder. A similar result was achieved by Luo et al. [8] with their proposed DLCT, which was able to process high-level and grid-level information from the images. After the introduction of BERT, researchers leveraged it to integrate vision and NLP tasks. For example, Zhou et al. [17] proposed single-stream transformer layers in which image and word inputs are combined in one flow, thus achieving a fusion of NLP and vision. Region-level feature extraction in all of these models was time-consuming; hence, most of them were trained on cached features instead of images.
Schlemper et al. designed additive attention gate modules integrated with skip connections, which were more time-efficient than previous approaches. However, due to the skip connections, they could not extract low-level features from images, bringing back the same disadvantage. TransUNet achieves superior performance as an image encoder by combining the UNet architecture with transformer layers. It is comparatively faster than its other transformer-based counterparts for image segmentation and encoding. Hence, it is used in this paper as the encoder, with some modifications, along with the proposed multi-attention decoder for generating sentence sequences for an image.
2.3 Proposed Model
TransUNet is still a relatively new and significant line of research in the image segmentation field. It has shown substantial improvement in medical image segmentation [18]. Since it involves 12 transformer layers, it can be used for language tasks when combined with an efficient decoder architecture. Hence, this paper introduces a multi-attention decoder for generating sentences from TransUNet-encoded images.
24
P. Mishra et al.
2.3.1 Encoder Architecture
An image of spatial resolution L × B with C channels is the required input of the model. Given an image with these dimensions, the encoder architecture performs pre-processing involving the following steps:
2.3.1.1 Image Tokenization
The input image x of dimensions L × B with C channels is split into a sequence of N flattened 2D patches, each of size p × p. Here N is the length of the input sequence and is defined by the equation below:
\[ N = \frac{LB}{p^{2}} \qquad (2.1) \]
2.3.1.2 Embedding the Patches
The vectorized patches generated by the above operation are mapped into an embedding space of dimension D via a trainable linear projection. The encoder model learns specific positional embeddings to encode the spatial information of the patches. These positional embeddings are added to the patch embeddings so that positional information is retained. The overall operation is summed up in the equation below:
\[ E_0 = [x_p^{1} e;\; x_p^{2} e;\; \ldots;\; x_p^{N} e] + e_{pos} \qquad (2.2) \]
where \(e\) is the trainable linear projection and \(e_{pos}\) is the positional embedding. After the pre-processing step, a hybrid CNN is applied to generate an input feature map. Each CNN block consists of a 3 × 3 convolution layer with ReLU activation. Here the CNN acts as a feature extractor, and patch embedding is applied to the extracted feature maps instead of the raw images. The linear projections obtained from these hidden features after patch embedding act as the input to the transformer layer.
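The tokenization and embedding steps of Eqs. (2.1) and (2.2) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names `tokenize` and `embed`, the toy sizes and the random initialisation are all assumptions, and the patches are cut from a raw nested-list image for simplicity, whereas the chapter applies the same operation to CNN feature maps.

```python
# Sketch of Eqs. (2.1)-(2.2): split an image into flattened p x p patches,
# project each patch to D dimensions and add a positional embedding.
# All names and toy sizes here are illustrative assumptions.
import random


def tokenize(image, p):
    """Split an L x B x C image (nested lists) into N = L*B / p**2 flattened patches."""
    L, B = len(image), len(image[0])
    assert L % p == 0 and B % p == 0, "p must divide both L and B"
    patches = []
    for top in range(0, L, p):
        for left in range(0, B, p):
            patch = []
            for i in range(top, top + p):
                for j in range(left, left + p):
                    patch.extend(image[i][j])  # append the C channel values
            patches.append(patch)
    return patches


def embed(patches, e, e_pos):
    """E_0 = [x_p^1 e; ...; x_p^N e] + e_pos, with e of shape (p*p*C) x D."""
    D = len(e[0])
    out = []
    for n, patch in enumerate(patches):
        projected = [sum(x * e[k][d] for k, x in enumerate(patch)) for d in range(D)]
        out.append([projected[d] + e_pos[n][d] for d in range(D)])
    return out


random.seed(0)
L, B, C, p, D = 4, 6, 3, 2, 5
image = [[[random.random() for _ in range(C)] for _ in range(B)] for _ in range(L)]
patches = tokenize(image, p)
N = (L * B) // (p * p)  # Eq. (2.1)
e = [[random.gauss(0, 0.02) for _ in range(D)] for _ in range(p * p * C)]
e_pos = [[random.gauss(0, 0.02) for _ in range(D)] for _ in range(N)]
E0 = embed(patches, e, e_pos)
print(len(patches), len(patches[0]), len(E0), len(E0[0]))  # 6 12 6 5
```

With L = 4, B = 6 and p = 2, Eq. (2.1) gives N = 6 patches, each flattened to p·p·C = 12 values before projection to D = 5.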
2.3.1.3 Transformer Layer
There are 12 transformer layers in the proposed architecture. Each layer contains a Masked Self-Attention (MSA) block. Whereas attention is typically defined over Q, K and V alone, attention over images (whose dimensions are given by L, B and H) can be defined by the equation below:
\[ \mathrm{Attention}_{image} = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{H_K}}\right) V \qquad (2.3) \]
where Q, K and V all have the same length l = L × B and dimension D, and \(H_K\) is the dimension of K. The Masked Self-Attention can therefore be summarized as follows:
\[ \mathrm{MSA}_{image} = \mathrm{Concatenate}(\mathrm{Attention}_1, \mathrm{Attention}_2, \ldots, \mathrm{Attention}_H) \qquad (2.4) \]
The concatenation performs a weighted sum of all the similarities obtained via attention to get the outputs, thus enabling a mechanism of global attention. A skip connection has been added between the Masked Self-Attention (MSA) and feedforward layers to ensure the recovery of the spatial order of the image. Multiple upsampling steps, along with cascading, are applied after the encoder to decode hidden-level features from the MSA, which enables the image blocks to reach the dimension of L × B. The skip connection present between the layers allows for aggregating features at multiple resolutions. The entire architecture of the encoder can be found in Fig. 2.2. This design allows the model to take advantage of high-resolution feature maps generated from CNN towards the decoder architecture.
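As a rough illustration of Eqs. (2.3) and (2.4), the sketch below computes scaled dot-product attention per head and concatenates the heads. It is a simplified assumption-laden example: the helper names are hypothetical, the learned Q/K/V projections and any masking are omitted, and each head simply attends over its own slice of the feature dimension.

```python
# Sketch of Eqs. (2.3)-(2.4) in plain Python. Names are illustrative;
# learned projections and masking are deliberately left out.
import math


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [x / total for x in exps]


def attention(Q, K, V):
    """Softmax(Q K^T / sqrt(H_K)) V for lists of row vectors."""
    h_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(h_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, V)) for d in range(len(V[0]))])
    return out


def multi_head(X, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate."""
    D = len(X[0])
    assert D % num_heads == 0, "feature dimension must divide evenly into heads"
    width = D // num_heads
    heads = []
    for h in range(num_heads):
        Xh = [row[h * width:(h + 1) * width] for row in X]
        heads.append(attention(Xh, Xh, Xh))
    # Concatenate(Attention_1, ..., Attention_H) along the feature axis
    return [sum((head[i] for head in heads), []) for i in range(len(X))]


tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
out = multi_head(tokens, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Because every output row is a softmax-weighted average over all value rows, each token can draw on information from every other token, which is the global attention behaviour described above.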
2.3.2 Pre-integration Module
The pre-integration module prepares the input for the decoder architecture. It involves steps that ensure information retrieval from the encoder, which helps to map specific parts of the segmented feature maps to the corresponding text accurately. In this module, the generated feature map is concatenated with the positional embeddings, and the result acts as the input to the multi-attention decoder, which generates the text output. Figure 2.3 summarizes the overall process.
2.3.3 Decoder Architecture
The purpose of the decoder is to generate a description of the image word by word, given the global and region-level features of the image from the encoder. This part of the task integrates the two modalities of the architecture with each other. There are 12 decoder layers in the proposed architecture. Each layer has a Masked MSA block, a Cross MSA block and a feed-forward layer, with layer normalization in between to facilitate skip connections and proper sampling of the vectors. The overall decoder layer is summarized in Fig. 2.4.
Fig. 2.2 Encoder consists of hybrid-CNN and transformer block
Fig. 2.3 Pre-integration block takes patch embedding from the first step and concatenates it with the positional encoding obtained from encoder block
The concatenated output from the pre-integration module is fed into the decoder, where it passes through the Masked Multi-Self Attention (MSA). In this block, the vectors Q, K and V are fed through a linear layer for proper scaling of the features, and its output is then processed by the dot product and the attention mechanism. This step upsamples the grid-level features of the image into high-level textual information. The whole grid is then concatenated and passed through the linear layer again for further upsampling in other blocks of the decoder. The mechanism of Masked MSA is illustrated by the flow diagram in Fig. 2.5.
Fig. 2.4 Decoder architecture consists of Masked and Cross MSA along with feed-forward network
Further processing of the vectors includes layer normalization and Cross MSA, to which the positional embedding vectors are also fed. This block facilitates the intermodal relationship (vision with words): it combines the information about words from the Masked MSA with the positional embedding of the image context. Given a word \(W_t\) at time frame t and visual context \(V_t\), the Cross MSA can be formulated as
\[ \mathrm{CrossMSA} = \mathrm{LayerNorm}\left(X_t + \mathrm{MSA}\left(W_t^{Q} X_t,\; W_t^{K} V_t,\; W_t^{V} V_t\right)\right) \qquad (2.5) \]
where \(W_t^{Q}\), \(W_t^{K}\) and \(W_t^{V}\) are the word context and also the learned parameters obtained from the Masked MSA block.
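Eq. (2.5) can be read as a residual connection around a cross-attention step, followed by layer normalization. The sketch below shows that structure for a single head; it is an illustrative assumption rather than the chapter's code. The names `cross_msa`, `project` and `layer_norm` are hypothetical, identity matrices stand in for the learned parameters, and the layer norm has no learnable gain or bias.

```python
# Single-head sketch of Eq. (2.5): LayerNorm(X_t + MSA(W^Q X_t, W^K V_t, W^V V_t)).
# Queries come from the word features, keys and values from the visual context.
import math


def project(rows, W):
    """Multiply each row vector by matrix W (shape: in_dim x out_dim)."""
    return [[sum(r[k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
            for r in rows]


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def cross_msa(X, visual, Wq, Wk, Wv):
    Q, K, values = project(X, Wq), project(visual, Wk), project(visual, Wv)
    d = len(K[0])
    out = []
    for x, q in zip(X, Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]
        attended = [sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))]
        # residual connection, then layer normalization
        out.append(layer_norm([a + b for a, b in zip(x, attended)]))
    return out


identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
X_t = [[0.2, 0.1, 0.7], [0.9, 0.3, 0.1]]            # two word vectors (toy values)
V_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]            # four visual vectors (toy values)
out = cross_msa(X_t, V_t, identity, identity, identity)
print(len(out), len(out[0]))  # 2 3
```

The output keeps one row per word, so the number of visual tokens can differ from the number of words without changing the shape of the decoder state.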
Fig. 2.5 Masked MSA block in detail
The output of the Cross MSA block is fed through the feed-forward network and then through a layer normalization block. Word generation is done by a linear block, which gives the final output as a generated description for a given image. Let the outcome after layer normalization be \(X_t\) and the learned parameter be \(W_t\); the final distribution over the vocabulary is then sampled according to the equation below:
\[ p(X_n) = \mathrm{Linear}\left(W_t X_n^{t-1}\right) \qquad (2.6) \]
where n is the length of the generated sequence.
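The word-generation step can be sketched as a linear projection onto the vocabulary followed by picking a word. Note the assumptions: Eq. (2.6) names only the linear block, so the softmax normalisation and the greedy choice below follow common transformer-decoder practice rather than anything stated in the chapter, and the vocabulary, weights and function name are made up for illustration.

```python
# Sketch of the final linear block: decoder state -> distribution over words.
# The softmax and greedy selection are assumptions, not taken from the chapter.
import math


def next_word_distribution(x, W, vocab):
    """Project state x with W (one row per vocabulary word) and normalise."""
    logits = [sum(w_k * x_k for w_k, x_k in zip(row, x)) for row in W]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return {word: e / total for word, e in zip(vocab, exps)}


vocab = ["a", "dog", "runs", "<end>"]               # hypothetical toy vocabulary
W = [[0.1, 0.0], [2.0, 0.5], [0.0, 1.0], [-1.0, -1.0]]
x = [1.0, 0.2]                                      # toy decoder state
probs = next_word_distribution(x, W, vocab)
best = max(probs, key=probs.get)
print(best)  # dog
```

Repeating this step, feeding each chosen word back into the decoder until an end token is produced, yields the description word by word.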
2.3.4 Model Optimization and Fine Tuning
The primary optimization of the model is done with the cross-entropy loss function on the images. The loss of the model with respect to the ground truth can be written as
\[ L(\theta) = -\sum_{i=1}^{t} \log\left(p(y_i \mid y_{i-1})\right) \qquad (2.7) \]
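A small worked example of Eq. (2.7) may help: the loss is the negative sum of the log-probabilities that the model assigns to each ground-truth word. The function name and the probability values below are illustrative assumptions only.

```python
# Worked sketch of Eq. (2.7): L(theta) = -sum_i log p(y_i | y_{i-1}).
import math


def caption_loss(step_probs):
    """step_probs[i] is the probability given to the i-th ground-truth word."""
    return -sum(math.log(p) for p in step_probs)


confident = caption_loss([0.9, 0.8, 0.95])   # model mostly agrees with the caption
unsure = caption_loss([0.3, 0.2, 0.25])      # model spreads probability elsewhere
print(confident < unsure)  # True
```

The loss is zero only when every ground-truth word receives probability 1, and it grows as the model assigns less probability to the reference caption.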
There is always a need to optimize a model based on the evaluation parameters. To optimize the model for CIDEr [19] evaluation, an unsupervised approach is adopted for the loss calculation as follows:
\[ L_U = -p[(C \mid y_t)] \qquad (2.8) \]
where C is the CIDEr score obtained for an output \(y_t\).
2.4 Experiments and Evaluation
The model is evaluated on the Flickr30k dataset using the BLEU [20] and CIDEr metrics. In our experiments, it tends to perform better than many CNN-LSTM-based models such as GCN-LSTM [21] and RFNet [22], and also better than transformer-only models such as DLCT and PureT.
2.4.1 Experimental Setup
The model is trained on the Flickr30k dataset [23], which contains 31,783 images collected from the website Flickr, each labelled with five sentences by human annotators. Hence, the dataset contains approximately 159,000 image descriptions in total (31,783 × 5 = 158,915). The whole dataset is divided into 23,000 for the training set, 2000 for validation and the rest for testing. All training, validation and testing description text is converted to lowercase before being passed into the model. The embedding size of the vector is set to 1024, and the number of encoder and decoder blocks is set to 12. The training hardware is a cluster of seven systems, each with an NVIDIA RTX 2060 GPU and an 11th-gen Intel Core i7 processor. The first iteration of the model is trained for 30 epochs with batch size 20 and 10,000 warm-up steps on the cross-entropy loss L(θ). The next iteration is trained on L_U for 50 epochs. The learning rate is kept constant at 3 × 10⁻⁵ throughout the training process. These hyperparameters were determined through cross-validation. The Adam optimizer [24] is used in both stages of the training process.
2.4.2 Analysis
The transformer-based approach had underlying advantages, both in object detection using TransUNet and in passing the details to the standard encoder- and decoder-based transformers. Both architectures were trained without self-critical training, and the model was exposed to approximately 31,728 captioned images. In order to get improved accuracy in the image description, first, the model
P. Mishra et al.
was trained for 30 epochs. During this phase, the generated image descriptions were relatively inaccurate and not close to the provided test image descriptions. After analysis, the model was trained for at least 20 more epochs, and the generated descriptions came somewhat closer to those of the test images. The cross-entropy loss versus training iteration graph is shown in Fig. 2.6. The model was then trained again for 50 epochs, with the learning rate kept constant at 3 × 10^−5 throughout the training process. Accuracy increased, and the generated descriptions were much more closely related to the test images. The CIDEr and METEOR scores were plotted throughout the training process to determine the right hyperparameters (Fig. 2.7a, b).
Fig. 2.6 Decreasing cross-entropy loss for training and validation sets during training iterations
Fig. 2.7 a Variation of CIDEr score with training epochs. b Distribution of METEOR in the course of training
2.4.3 Comparative Analysis with Similar Past Work
The model achieved better results than the other network-based frameworks considered. Table 2.1 compares the results of the proposed model with those of different approaches.
2.4.3.1 Comparison with CNN-RNN-Based Frameworks
To the human eye, an image is composed of colours that represent various scenes. From a computer's perspective, most images are stored as channels, three if the image is RGB. Within a neural network, however, every data modality is eventually converted into a feature vector, and the subsequent operations are performed on these features. In the CNN-RNN-based approach, the CNN provides a rich representation of the image. Based on the fixed-length embeddings provided by the CNN part, the RNN-based network obtains historical information via the continuous circulation of its hidden layers. In an RNN, however, the number of sequential operations is O(n), whereas in a transformer-based approach it is O(1). In most cases, because of this long path length, the RNN-based approach is not optimal.
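The O(n) versus O(1) contrast can be illustrated with a toy sketch. It is not the model from this chapter: the recurrence, the uniform attention weights, and both function names are simplifying assumptions chosen only to show where the sequential dependency comes from.

```python
def rnn_sequential_steps(tokens):
    """Toy recurrence: each hidden state needs the previous one first."""
    hidden, steps = 0.0, 0
    for x in tokens:
        hidden = 0.5 * hidden + x  # h_t depends on h_{t-1}, so no overlap
        steps += 1
    return hidden, steps


def attention_sequential_steps(tokens):
    """Toy self-attention with uniform weights over all positions.

    No output depends on another output, so all positions could be
    computed in one parallel step, whatever the sequence length.
    """
    mean = sum(tokens) / len(tokens)
    outputs = [mean for _ in tokens]
    return outputs, 1


_, rnn_steps = rnn_sequential_steps([1.0, 2.0, 3.0])
_, attn_steps = attention_sequential_steps([1.0, 2.0, 3.0])
```

The recurrent version needs one dependent step per token, whereas the attention version needs a single step because every output reads the inputs directly.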
2.4.3.2 Comparison with CNN-LSTM-Based Frameworks
Even though models like the LSTM network contain memory cells that are better able than an RNN to retain the extensive historical data of the sequence generation process, the memory cell is still updated at every time step, which makes long-term memory somewhat challenging. Recent studies, inspired by machine learning, have demonstrated the advantages of CNNs for the task of image captioning, and using a CNN for text generation in NLP is highly effective. It has been demonstrated that a convolutional (CNN) model can replace the recurrent (RNN) model for neural machine translation, because it not only outperforms the recurrent model in accuracy but also accelerates training by a factor of nine. In a CNN-CNN-based architecture, however, the first stage is a CNN used for object detection, which processes the input image pixel by pixel and identifies features such as corners by working its way up from local to global. In contrast, the transformer uses self-attention, which directly connects information between distant image locations. A further benefit of the transformer-based approach is that it is reported to outperform CNNs in both computational efficiency and accuracy by up to four times.

Table 2.1 Comparison of the three metrics of the model with other frameworks
Models                      BLEU-1          METEOR          CIDEr
                            c5      c40     c5      c40     c5      c40
SCST                        78.1    34.2    27      35.5    111     116.7
GCN-LSTM                    80      95      28      37.6    123     126
Up-Down                     80.2    95.5    27.5    36.7    117.9   120.5
SGAE                        81      95.3    28.2    37.2    123.8   126.5
AoANet                      81      95      29.1    38.8    126.6   129.6
X-Transformer               81.9    95.7    29.6    39.2    131.1   133.6
RSTNet                      82.1    96      29.6    39.1    131.9   132.1
GET                         81.6    96.4    29.4    38.8    130.3   134
DLCT                        82.4    96.1    29.8    39.6    133.3   132.5
PureT                       82.8    96.5    30.1    39.9    136.0   138.3
Multi-attention TransUNet   84.8    97.2    31.4    43.4    138.5   139.5
The bold text indicates the name and scores of the proposed model in this paper. It has been done to distinguish it from other architectures with which we are comparing
2.4.4 Sample Outputs Generated from Input Images
The model generates captions from the input images via the inference engine, and the actual and generated captions are shown together. Figure 2.8a, b shows two of the samples generated by the model without any external bias.
2.5 Conclusion and Future Scope
The limited capability of LSTM led to the introduction of transformers, which proved to be highly effective in NLP tasks. TransUNet improved on CNN-based algorithms for the task of image segmentation. The proposed encoder system captures global and grid-level features from the image. The Multi-Self Attention decoder performs text generation efficiently with minimal loss of information. In order to enhance the multi-modal interaction, a pre-integration module has been introduced in the architecture. The results on the Flickr30k dataset, evaluated on three metrics, indicate that the model achieves a new SOTA performance for the image description generation task. For further enhancement, a wider range of conditioning information can be investigated, such as free-form text, a previously proposed state-of-the-art model, and challenges that combine modalities, including language-driven image modification. These enhancements can be achieved with minimal deviation from the present architecture and its complexity. The real-life applications of the model include human–robot interaction, visual aids for blind people and the perception of humanoid robots.
Fig. 2.8 a Generated description for people outside a building. b Generated description of a girl playing
References
1. Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
2. Vinyals, O., et al.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
3. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
4. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the ICML, pp. 2048–2057 (2015)
5. Chen, J., et al.: Transunet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
6. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham (2015)
7. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
8. Ji, J., Luo, Y., Sun, X., Chen, F., Luo, G., Wu, Y., Gao, Y., Ji, R.: Improving image captioning by leveraging intra- and inter-layer global representation in transformer network. In: Proceedings of the AAAI, pp. 1655–1663. AAAI Press (2021)
9. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) Proceedings of the ICLR (2015)
10. Huang, L., Wang, W., Chen, J., Wei, X.: Attention on attention for image captioning. In: Proceedings of the ICCV, pp. 4633–4642 (2019)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the CVPR, pp. 770–778 (2016)
12. Yang, X., Tang, K., Zhang, H., Cai, J.: Auto-encoding scene graphs for image captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019)
13. Yao, T., Pan, Y., Li, Y., Mei, T.: Exploring visual relationship for image captioning. In: Proceedings of the European Conference on Computer Vision (2018)
14. Herdade, S., Kappeler, A., Boakye, K., Soares, J.: Image captioning: transforming objects into words. In: Proceedings of the NeurIPS, pp. 11135–11145 (2019)
15. Pan, Y., Yao, T., Li, Y., Mei, T.: X-linear attention networks for image captioning. In: Proceedings of the CVPR, pp. 10968–10977 (2020)
16. Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.: Meshed-memory transformer for image captioning. In: Proceedings of the CVPR, pp. 10575–10584 (2020)
17. Zhou, L., Palangi, H., Zhang, L., Hu, H., Corso, J.J., Gao, J.: Unified vision-language pretraining for image captioning and VQA. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)
18. Lin, A., et al.: Ds-transunet: dual swin transformer u-net for medical image segmentation. IEEE Trans. Instrum. Meas. 71, 1–15 (2022)
19. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: Cider: consensus-based image description evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
20. Papineni, K., Roukos, S., Ward, T., Zhu, W.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the ACL, pp. 311–318 (2002)
21. Yao, T., Pan, Y., Li, Y., Mei, T.: Exploring visual relationship for image captioning. In: Proceedings of the ECCV, pp. 711–727 (2018)
22. Shen, X., et al.: Rf-net: an end-to-end image matching network based on receptive field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019)
23. Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. TACL 2, 67–78 (2014)
24. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the ICLR (2015)
Chapter 3
Empirical Review of Oversampling Methods to Handle the Class Imbalance Problem
Ritika Kumari, Jaspreeti Singh, and Anjana Gosain
Abstract Many real-world applications, including fault detection and medical diagnostics, suffer from the Class Imbalance Problem (CIP), in which one class (the majority class) has more instances than the other (the minority class). CIP affects the learning of classifiers because they are designed to optimize accuracy over all training instances. As a result, these classifiers tend to label all instances as belonging to the majority class. To handle the CIP, we have reviewed six oversampling methods: SMOTE (Synthetic Minority Oversampling Technique), Random SMOTE (RS), Distance SMOTE (DS), Borderline SMOTE2 (BS2), Polynom fit SMOTE (PS) and Modified SMOTE (MSMOTE), using four classification models: Decision Tree (DT), Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF). We have used the area under the curve (AUC) to compare the performance of the classifiers with the different oversampling methods. For the majority of the datasets, the Polynom fit SMOTE method performed better than the other oversampling methods.
R. Kumari (B) · J. Singh · A. Gosain
USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
R. Kumari
Department of Artificial Intelligence and Data Sciences, IGDTUW, Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_3
3.1 Introduction
CIP exists when one class has more instances than the other class. A minority class is defined as one with fewer instances, whereas a majority class is defined as one with more instances [1]. CIP degrades the learning of classifiers, which become biased towards the over-represented class (majority class). Because traditional classifiers have an accuracy-oriented design, they tend to classify every instance as belonging to the majority class, ignoring the under-represented class (minority class). CIP is present in many real-world applications such as medical diagnosis, text classification and credit fraud detection, where the minority class is the one of greater interest [2–4]. Several ways of dealing with CIP have been described in the literature. These strategies can be categorized into three groups: Data Level (DL) approaches, Algorithm Level (AL) approaches and Hybrid Level (HL) approaches [1, 5]. DL approaches, also called external approaches, work at the data level: the dataset is balanced using either over-sampling or under-sampling techniques, independently of the classifier's logic. Under-sampling may result in the loss of potential information that is crucial for classifier learning, while over-sampling may result in issues with class overlap. Algorithm level techniques are also called internal approaches; here, an entirely new classification algorithm is created, or an existing one is enhanced, to handle the CIP. These approaches have two subclasses: cost-sensitive algorithms and ensemble methods. In the cost-sensitive approach, different weights are given to each class, whereas in ensemble learning several classifiers are trained to improve on the predictive performance of a single classifier. HL methods combine the DL and AL approaches with the goal of creating a classifier that can better handle the CIP.
The performance of six oversampling methods SMOTE, RS, DS, BS2, PS and MSMOTE employing 4 classification models DT, SVM, NB and RF over 10 datasets to handle the CIP has been analyzed in this paper. It is observed that Polynom fit SMOTE outperformed the other oversampling techniques. The organization of the paper is as follows: We discuss the material and methods in Sect. 3.2. The dataset’s properties are described in Sect. 3.3. In Sect. 3.4, we discuss the experiment and results, and Sect. 3.5 brings this study to a conclusion.
3.2 Material and Methods
An overview of six oversampling techniques, including SMOTE, RS, DS, BS2, PS, and MSMOTE, as well as classification models and performance measures, is provided in this section.
3.2.1 Oversampling Methods Used
SMOTE. To address the CIP, Chawla et al. [6] created SMOTE, which takes the samples of the underrepresented class and builds artificial samples by interpolating between an instance of the underrepresented class and its k nearest neighbors. SMOTE ignores the majority class when it generates synthetic examples, which leads to problems with class overlap [7].
Random SMOTE (RS). Dong et al. [8] proposed Random-SMOTE in 2011. RS addresses the sparsity of the underrepresented class by generating artificial instances within that class space. In such sparse areas of the sample space, by contrast, SMOTE finds it difficult to predict unknown cases.
Distance SMOTE (DS). De La Calleja et al. [9] proposed Distance-SMOTE, which performs similarly to SMOTE; however, instead of selecting a single nearest neighbor, DS averages the k nearest neighbors to obtain a mean example.
Borderline SMOTE. In 2005, Han et al. [10] proposed the Borderline-SMOTE1 and Borderline-SMOTE2 (BS2) over-sampling algorithms. In this work, we have used BS2, which takes both the majority and minority classes into account when creating the synthetic data points.
Polynom fit SMOTE (PS). Polynom fit SMOTE was introduced by Gazzah et al. [11] in 2008, using polynomial fitting functions for oversampling the minority class. To deal with the CIP, they presented four network topology-based solutions: star, bus, polynomial curve and mesh topology. The synthetic instances are created by finding the coefficients of a polynomial p(x) of degree n that fits the minority instances using curve fitting methods.
Modified SMOTE (MSMOTE). In 2009, Hu et al. [12] proposed a modified technique based on SMOTE, called MSMOTE, to address the CIP. MSMOTE uses a different strategy for selecting the nearest neighbor during the generation of synthetic examples. Unlike the SMOTE algorithm, MSMOTE also considers latent noise and recognizes the minority class better than SMOTE.
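The interpolation step at the core of SMOTE can be sketched in a few lines. This is an illustrative, standard-library-only sketch, not the implementation used in the paper; the function name `smote_sample`, the default `k`, and the example points are assumptions made for the illustration.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority instance and a neighbor."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # k nearest minority neighbors of the chosen instance (Euclidean distance).
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


# Hypothetical two-dimensional minority instances.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sample(minority_points)
```

Because the new point lies on the segment between two existing minority instances, it never leaves the region they span; the majority class is not consulted at all, which is the source of the overlap problem noted above.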
3.2.2 Classification Models
Decision Tree (DT). DT is a machine learning (ML) predictive model with a tree structure in which each internal node represents a decision, each branch represents the result of that decision, and each leaf node represents a class label [13]. The DT classifier is simple to understand since it mimics human decision-making behavior. However, selecting the attribute for node splitting is a time-consuming process that is suboptimal for datasets with inadequate information. Furthermore, even a minor change to the data can disrupt the entire tree structure.
Support Vector Machine (SVM). SVM is a statistical learning technique that can be used for both regression and classification problems [14]. SVM separates the data points using the maximum-margin hyperplane between the two classes. Support vectors are the data points or vectors that are critical for locating the hyperplane. SVM can handle high-dimensional datasets because of the kernel trick. SVM can also be used to solve multiclass classification problems by breaking them down into two-class problems. Choosing the correct kernel function, on the other hand, is a difficult task. The SVM is computationally intensive and takes a long time to train.
Naïve Bayes (NB). NB is a probabilistic classifier that assumes that the features are independent of one another [13, 14]. The probability of each attribute is examined independently when classifying unknown data into the defined classes. NB is easy to set up and use, and it works well with small datasets. However, NB is difficult to apply to large datasets.
Random Forest (RF). RF is a classifier that uses ensemble learning to handle classification problems. It builds multiple decision trees, takes their predictions into account and then makes the final prediction by majority vote [13]. RF mitigates overfitting by aggregating the predictions. Additionally, it can manage datasets with many dimensions. It generates more trees than DT, which lengthens the model training time [14].
3.2.3 Performance Measure
We used AUC to empirically evaluate the classifiers' performance. Repeated stratified k-fold cross-validation from the sklearn library in Python is employed in the paper. The metric is described as follows [15].
AUC. It is computed from the true positive rate (TPrate), the proportion of positive instances that were correctly identified, and the true negative rate (TNrate), the proportion of negative instances that were correctly identified. The performance of the classifier improves with increasing AUC. Equation (3.1) provides the formula.

$$\mathrm{AUC} = \frac{\mathrm{TPrate} + \mathrm{TNrate}}{2} \tag{3.1}$$
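Equation (3.1) can be evaluated directly from the four confusion-matrix counts. The sketch below is illustrative: the function name and the counts are hypothetical, with class sizes borrowed from a 35-versus-301 split to show why this measure is preferred over plain accuracy on imbalanced data.

```python
def auc_from_counts(tp, fn, tn, fp):
    """AUC as defined in Eq. (3.1): mean of TPrate and TNrate."""
    tp_rate = tp / (tp + fn)  # share of minority (positive) instances found
    tn_rate = tn / (tn + fp)  # share of majority (negative) instances found
    return (tp_rate + tn_rate) / 2


# A classifier that labels every instance as majority (35 positives missed).
always_majority = auc_from_counts(tp=0, fn=35, tn=301, fp=0)
accuracy = 301 / (35 + 301)

print(always_majority, round(accuracy, 3))
```

The always-majority classifier reaches an accuracy of about 0.896 but an AUC of only 0.5, which makes its failure on the minority class visible.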
3.3 Dataset
Ten imbalanced datasets were imported into the Jupyter notebook from the imbalanced_databases package, which is built-in. There are no missing values in these datasets. Instances with class labels 0 and 1 belong to the majority and minority classes, respectively. Table 3.1 lists the characteristics of the datasets.
Table 3.1 Properties of datasets
Datasets                  #Dimensions  Dataset size  #Minority class  #Majority class  Imbalance ratio
Ecoli3                    8            336           35               301              0.11
Glass6                    10           214           29               185              0.15
Glass0                    10           214           70               144              0.48
Glass_0_1_2_3_vs_4_5_6    10           214           51               163              0.31
Hypothyroid               25           3163          151              3012             0.05
Page-blocks0              11           5472          559              4913             0.11
Pima                      9            768           268              500              0.53
Vehicle0                  19           846           199              647              0.30
Yeast_2_vs_8              9            482           20               462              0.04
Page-blocks-1-3_vs_4      11           472           28               444              0.06
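The imbalance ratios in Table 3.1 can be recomputed from the class counts. The paper does not state the formula; the sketch below assumes the ratio is the minority count divided by the majority count, truncated (not rounded) to two decimals, which reproduces every value in the table. The function and dictionary names are illustrative.

```python
import math

# (minority, majority) counts copied from Table 3.1.
CLASS_COUNTS = {
    "Ecoli3": (35, 301),
    "Glass6": (29, 185),
    "Glass0": (70, 144),
    "Glass_0_1_2_3_vs_4_5_6": (51, 163),
    "Hypothyroid": (151, 3012),
    "Page-blocks0": (559, 4913),
    "Pima": (268, 500),
    "Vehicle0": (199, 647),
    "Yeast_2_vs_8": (20, 462),
    "Page-blocks-1-3_vs_4": (28, 444),
}


def imbalance_ratio(minority, majority):
    """Minority/majority ratio, truncated to two decimals (assumed convention)."""
    return math.floor(100 * minority / majority) / 100


ratios = {name: imbalance_ratio(*counts) for name, counts in CLASS_COUNTS.items()}
```

A ratio close to 0 (for example Yeast_2_vs_8) indicates severe imbalance, while a ratio near 0.5 (Pima, Glass0) indicates a comparatively mild one.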
3.4 Experiment and Results
This section presents the performance comparison of six oversampling methods (SMOTE, RS, DS, BS2, PS and MSMOTE) on ten datasets. The performance of DT, SVM, NB, and RF on the original imbalanced datasets (NOSMOTE) is also taken into consideration in this study. Table 3.2 shows the AUC values for DT, SVM, NB and RF. The highest values are highlighted in bold. The AUC values for the six oversampling techniques using DT, SVM, NB, and RF are shown in Figs. 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9 and 3.10 for the ten datasets. The results show that the Polynom fit SMOTE method performed better than the other oversampling methods. PS delivered the best results because it focuses on relatively remote samples of the minority class and creates samples along line segments between them; as a result, the created instances are more dispersed across the minority class's manifold.
Table 3.2 AUC values for different classifiers
Datasets                  Methods              DT    SVM   NB    RF
Ecoli3                    NOSMOTE              0.88  0.93  0.91  0.92
                          SMOTE                0.94  0.97  0.96  0.97
                          Random_SMOTE         0.93  0.97  0.96  0.97
                          Distance_SMOTE       0.93  0.98  0.96  0.98
                          Borderline_SMOTE2    0.93  0.96  0.94  0.96
                          Polynom fit_SMOTE    0.98  0.98  0.97  0.99
                          MSMOTE               0.94  0.97  0.96  0.97
Glass6                    NOSMOTE              0.86  0.92  0.91  0.95
                          SMOTE                0.97  0.97  0.97  0.99
                          Random_SMOTE         0.97  0.96  0.95  0.99
                          Distance_SMOTE       0.95  0.97  0.97  0.99
                          Borderline_SMOTE2    0.97  0.97  0.96  0.99
                          Polynom fit_SMOTE    0.97  0.97  0.97  0.99
                          MSMOTE               0.86  0.92  0.91  0.95
Glass0                    NOSMOTE              0.80  0.82  0.76  0.90
                          SMOTE                0.79  0.83  0.78  0.93
                          Random_SMOTE         0.84  0.84  0.81  0.93
                          Distance_SMOTE       0.81  0.82  0.79  0.92
                          Borderline_SMOTE2    0.79  0.78  0.78  0.86
                          Polynom fit_SMOTE    0.87  0.86  0.86  0.94
                          MSMOTE               0.81  0.82  0.79  0.91
Glass_0_1_2_3_vs_4_5_6    NOSMOTE              0.89  0.95  0.94  0.98
                          SMOTE                0.95  0.97  0.96  0.99
                          Random_SMOTE         0.95  0.98  0.96  0.99
                          Distance_SMOTE       0.95  0.98  0.96  0.99
                          Borderline_SMOTE2    0.95  0.97  0.95  0.99
                          Polynom fit_SMOTE    0.94  0.98  0.98  0.98
                          MSMOTE               0.96  0.97  0.96  0.99
Hypothyroid               NOSMOTE              0.92  0.96  0.79  0.97
                          SMOTE                0.98  0.99  0.83  0.99
                          Random_SMOTE         0.98  0.99  0.84  0.99
                          Distance_SMOTE       0.98  0.99  0.85  0.99
                          Borderline_SMOTE2    0.95  0.98  0.79  0.98
                          Polynom fit_SMOTE    0.99  0.99  0.92  0.99
                          MSMOTE               0.98  0.99  0.84  0.99
Page-blocks0              NOSMOTE              0.92  0.96  0.92  0.98
                          SMOTE                0.98  0.97  0.90  0.99
                          Random_SMOTE         0.98  0.97  0.90  0.99
                          Distance_SMOTE       0.98  0.97  0.91  0.99
                          Borderline_SMOTE2    0.98  0.96  0.89  0.98
                          Polynom fit_SMOTE    0.98  0.99  0.94  0.99
                          MSMOTE               0.98  0.97  0.91  0.99
Pima                      NOSMOTE              0.77  0.79  0.81  0.82
                          SMOTE                0.82  0.86  0.82  0.85
                          Random_SMOTE         0.79  0.85  0.82  0.85
                          Distance_SMOTE       0.81  0.85  0.82  0.86
                          Borderline_SMOTE2    0.76  0.81  0.78  0.82
                          Polynom fit_SMOTE    0.85  0.87  0.85  0.89
                          MSMOTE               0.81  0.85  0.83  0.85
Vehicle0                  NOSMOTE              0.94  0.99  0.81  0.98
                          SMOTE                0.97  0.99  0.83  0.99
                          Random_SMOTE         0.96  0.99  0.83  0.99
                          Distance_SMOTE       0.96  0.99  0.84  0.99
                          Borderline_SMOTE2    0.96  0.99  0.83  0.98
                          Polynom fit_SMOTE    0.96  0.99  0.91  0.99
                          MSMOTE               0.96  0.99  0.83  0.99
Yeast_2_vs_8              NOSMOTE              0.59  0.80  0.79  0.83
                          SMOTE                0.96  0.97  0.89  0.98
                          Random_SMOTE         0.96  0.97  0.91  0.98
                          Distance_SMOTE       0.96  0.98  0.93  0.98
                          Borderline_SMOTE2    0.96  0.99  0.96  0.99
                          Polynom fit_SMOTE    0.98  0.99  0.99  0.99
                          MSMOTE               0.97  0.99  0.97  0.99
Page-blocks-1-3_vs_4      NOSMOTE              0.96  0.92  0.93  0.99
                          SMOTE                0.99  0.98  0.95  0.99
                          Random_SMOTE         0.99  0.98  0.94  1.00
                          Distance_SMOTE       0.99  0.98  0.94  1.00
                          Borderline_SMOTE2    0.99  0.99  0.96  0.99
                          Polynom fit_SMOTE    0.99  0.99  0.97  1.00
                          MSMOTE               0.99  0.98  0.92  0.99
Winning times             NOSMOTE              0     1     0     0
                          SMOTE                4     3     1     5
                          Random_SMOTE         3     3     0     6
                          Distance_SMOTE       2     5     1     6
                          Borderline_SMOTE2    3     4     0     3
                          Polynom fit_SMOTE    8     10    10    9
                          MSMOTE               3     3     0     5

Fig. 3.1 Ecoli3
Fig. 3.2 Glass6
Fig. 3.3 Glass0
Fig. 3.4 Glass_0_1_2_3_vs_4_5_6
Fig. 3.5 Hypothyroid
Fig. 3.6 Page-blocks0
Fig. 3.7 Pima
Fig. 3.8 Vehicle0
Fig. 3.9 Yeast_2_vs_8
Fig. 3.10 Page-blocks-1-3_vs_4
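The "Winning times" rows of Table 3.2 can be recomputed from the AUC values. The paper does not define the counting rule; the sketch below assumes that every method tied for the best AUC on a dataset receives a win, which is consistent with the published totals. For brevity it uses only two of the ten datasets, and the names `AUC_VALUES` and `winning_times` are illustrative.

```python
from collections import Counter

CLASSIFIERS = ("DT", "SVM", "NB", "RF")

# AUC values (DT, SVM, NB, RF) copied from Table 3.2 for two datasets.
AUC_VALUES = {
    "Ecoli3": {
        "NOSMOTE": (0.88, 0.93, 0.91, 0.92),
        "SMOTE": (0.94, 0.97, 0.96, 0.97),
        "Random_SMOTE": (0.93, 0.97, 0.96, 0.97),
        "Distance_SMOTE": (0.93, 0.98, 0.96, 0.98),
        "Borderline_SMOTE2": (0.93, 0.96, 0.94, 0.96),
        "Polynom fit_SMOTE": (0.98, 0.98, 0.97, 0.99),
        "MSMOTE": (0.94, 0.97, 0.96, 0.97),
    },
    "Glass_0_1_2_3_vs_4_5_6": {
        "NOSMOTE": (0.89, 0.95, 0.94, 0.98),
        "SMOTE": (0.95, 0.97, 0.96, 0.99),
        "Random_SMOTE": (0.95, 0.98, 0.96, 0.99),
        "Distance_SMOTE": (0.95, 0.98, 0.96, 0.99),
        "Borderline_SMOTE2": (0.95, 0.97, 0.95, 0.99),
        "Polynom fit_SMOTE": (0.94, 0.98, 0.98, 0.98),
        "MSMOTE": (0.96, 0.97, 0.96, 0.99),
    },
}


def winning_times(auc_table):
    """Count, per classifier, how often each method reaches the best AUC."""
    wins = {clf: Counter() for clf in CLASSIFIERS}
    for methods in auc_table.values():
        for col, clf in enumerate(CLASSIFIERS):
            best = max(scores[col] for scores in methods.values())
            for method, scores in methods.items():
                if scores[col] == best:  # ties count for every tied method
                    wins[clf][method] += 1
    return wins


wins = winning_times(AUC_VALUES)
```

Extending `AUC_VALUES` with the remaining eight datasets from Table 3.2 would allow the full "Winning times" rows to be checked in the same way.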
3.5 Conclusion
This paper reviews six SMOTE-based oversampling methods for handling the CIP in a Jupyter notebook: SMOTE, Random SMOTE, Distance SMOTE, Borderline SMOTE2, Polynom fit SMOTE and MSMOTE. Ten imbalanced datasets were imported into the Jupyter notebook from the imbalanced_databases package, which is built-in. The four classification models DT, SVM, NB, and RF are then used to empirically compare the performance of the oversampling strategies, with the AUC value as the performance measure. Polynom fit SMOTE performs better than the other oversampling methods for the majority of the datasets, because it creates samples along line segments between relatively remote instances of the under-represented class; as a result, the created samples are more dispersed across the minority class's manifold. In future research, the evaluation can be extended to high-dimensional datasets and to multi-class datasets.
References
1. Gosain, A., Sardana, S.: Handling class imbalance problem using oversampling techniques: a review. In: International Conference on Advances in Computing, Communications and Informatics, pp. 79–85. IEEE (2017)
2. Nishant, P.S., Rohit, B., Chandra, B.S., Mehrotra, S.: HOUSEN: hybrid over–undersampling and ensemble approach for imbalance classification. In: Inventive Systems and Control, pp. 93–108. Springer, Singapore (2021)
3. Elyan, E., Moreno-Garcia, C.F., Jayne, C.: CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Comput. Appl. 33(7), 2839–2851 (2021)
4. Desuky, A.S., Hussain, S.: An improved hybrid approach for handling class imbalance problem. Arab. J. Sci. Eng. 46(4), 3853–3864 (2021)
5. Kaur, P., Gosain, A.: Comparing the behavior of oversampling and undersampling approach of class imbalance learning by combining class imbalance problem with noise. In: ICT Based Innovations, pp. 23–30. Springer, Singapore (2018)
6. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority oversampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
7. Jiang, Z., Pan, T., Zhang, C., Yang, J.: A new oversampling method based on the classification contribution degree. Symmetry 13(2), 194 (2021)
8. Dong, Y., Wang, X.: A new over-sampling approach: random-SMOTE for learning from imbalanced data sets. In: International Conference on Knowledge Science, Engineering and Management, pp. 343–352. Springer, Berlin (2011)
9. De La Calleja, J., Fuentes, O.: A distance-based over-sampling method for learning from imbalanced data sets. In: FLAIRS Conference, pp. 634–635 (2007)
10. Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International Conference on Intelligent Computing, pp. 878–887. Springer, Berlin (2005)
11. Gazzah, S., Amara, N.E.B.: New oversampling approaches based on polynomial fitting for imbalanced data sets. In: 2008 the Eighth IAPR International Workshop on Document Analysis Systems, pp. 677–684. IEEE (2008)
12. Hu, S., Liang, Y., Ma, L., He, Y.: MSMOTE: improving classification performance when training data is imbalanced. In: Second International Workshop on Computer Science and Engineering, vol. 2, pp. 13–17. IEEE (2009)
13. Verma, A.K., Pal, S., Tiwari, B.B.: Skin disease prediction using ensemble methods and a new hybrid feature selection technique. Iran J. Comput. Sci. 3(4), 207–216 (2020)
14. Thakkar, A., Lohiya, R.: Attack classification using feature selection techniques: a comparative study. J. Ambient Intell. Humaniz. Comput. 12(1), 1249–1266 (2021)
15. Gosain, A., Sardana, S.: Farthest SMOTE: a modified SMOTE approach. In: Computational Intelligence in Data Mining, pp. 309–320. Springer, Singapore (2019)
Chapter 4
Automatic COVID Protocols-Based Human Entry Check System
Annapareddy V. N. Reddy, Chinthalapudi Siva Vara Prasad, Oleti Prathyusha, Duddu Sai Praveen Kumar, and Jangam Sneha Madhuri
Abstract Many people succumb to the disease because they do not follow the standard operating procedures (SOPs) put forth by the government. The primary safety measures consist of wearing face masks properly, cleaning hands regularly with hand sanitizer, and maintaining social distancing. The primary aim of the project is therefore to monitor people and make them comply with safety precautions in crowded places by detecting face masks on their faces, checking their temperature, and spraying sanitizer, with video monitoring on a display and collection of the mask data as well. The system targets crowded places such as hospitals, department stores, temples, movie theatres, and ATMs, and the device is kept at the entrance of such places. With it, human intervention in checking precautions at the entrance can be minimized: the system detects the face mask and temperature of a person and sprays sanitizer. If the individual is taking each precaution, the door connected to the device opens and lets the person pass; if either precaution is not observed, the door does not open, a photo of the person is captured, and the monitor displays the temperature.
A. V. N. Reddy (B)
Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
C. Siva Vara Prasad · O. Prathyusha · D. Sai Praveen Kumar · J. Sneha Madhuri
Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_4

4.1 Introduction
The coronavirus began spreading in December 2019, and it first appeared in Wuhan, China. The disease is named COVID-19, and it is caused by a previously unknown zoonotic coronavirus called SARS-CoV-2. To control and prevent the spread of this virus, national governments, the World Health Organization, and most medical professionals advise people to wear a mask covering the nose and mouth if they have breathing trouble, a cold, or a cough; the same applies to doctors caring for infected people, including those with mild symptoms. If people who have symptoms do not follow the safety precautions and stay in groups, the virus spreads further. Several images from the MAFA dataset were used to train the model. Other publicly available datasets can only be used to detect whether or not a person is wearing a mask, whereas the MAFA dataset contains multiple images of people with face masks in various positions, as shown in Fig. 4.1 [1]. These are some of the images that we have taken from the MAFA dataset. In contrast to other masked-face datasets, it allows face masks to be checked against the guidelines given by the WHO. If someone does not wear a mask in the proper position, the virus can spread to other people and cause an epidemic; such viruses include COVID-19 and avian influenza. The images shown in Fig. 4.1 are examples of masked faces with correct and incorrect mask positions; the images marked in green show the correct positions. In this work, the masked-face dataset has been examined and annotated manually according to the position of the mask, so that the system can recognize where the mask is worn. The images are best viewed in colour. Most public services and shops hire people to check visitors manually with a handheld device at the entrance; we instead use an IR sensor to measure the temperature of a person's skin. The permitted temperature is limited, and the person must have a normal body temperature.
Fig. 4.1 Different positions of mask on face
The working model may also spray the sanitizer when those procedures are examined and the result needs to be positive, the gates that are linked to Raspberry Pi instruct to perform the door characteristic. Over time, the increase in COVID-19 cases made people more cautious, vigilant, and cautious of their surroundings. Safety always comes first in circumstances like this, where a single sneeze might endanger a large number of people. It is necessary to have a system that can detect whether a face mask is used in order to ensure the safety of all people. Not only would it protect the being, but also its neighbours. Enforcing such a system could be beneficial to society given access to cutting-edge technical trends.
4 Automatic COVID Protocols-Based Human Entry Check System
After evaluating the problem statement, numerous earlier studies were reviewed in order to start the exploration. The relevant literature was filtered, and a thorough understanding of its content was acquired. A number of datasets were also investigated, along with the available options. The approaches available in the literature were studied, and the various algorithms were then compared. The software that affects operation was also investigated. The proposed system performs four key functions: (i) detecting a face mask in the correct position, (ii) measuring the temperature, (iii) spraying sanitizer, and (iv) opening the door. When someone tries to enter a public area, the machine first determines whether their face mask is in the correct position and then checks the temperature. If the temperature is within the permitted range, the machine sprays sanitizer and opens the door.
4.2 Literature Survey
Priya Pandiyan (2020) proposed social distance monitoring [2] and face mask detection using a deep neural network. The main aim of the model is to check for the presence of a face mask while monitoring social distancing in public areas, using a six-foot distance criterion [3]. G. Jignesh Chowdary et al. (2020) designed a system for detecting a mask on the face. They automated the process of recognizing individuals who are not wearing masks by training the deep learning model InceptionV3, which is trained and evaluated on the Simulated Masked Face Dataset (SMFD). Amey Kulkarni et al. (2020) built a real-time, AI-based [4] face mask detection application for COVID-19. The DeepStream SDK is used to run the trained model efficiently and detect face masks in real time. Ivan Muhammad Siegfried (2020) presented a comparative study of deep learning methods for detecting [5] face mask use, applying transfer learning with MobileNetV2, ResNet50V2, and Xception. On the mask image dataset, detection based on ResNet50V2 and Xception is reported to give better accuracy and precision than the MobileNetV2 method. Debojit Ghosh et al. (2020) designed a deep learning based unmanned approach for real-time monitoring [6] of people wearing medical masks. The work applies well-known object detection algorithms, including YOLOv3, YOLOv3-Tiny, SSD, and Faster R-CNN, and evaluates them on the Moxa3K benchmark dataset.
Mohamed Loey et al. (2020) designed a hybrid of a deep transfer learning model with machine learning [7] methods for face mask detection in the era of the coronavirus pandemic.
A. V. N. Reddy et al.
The model has two main components: deep transfer learning as a feature extractor, and a classical machine learning classifier such as an SVM or an ensemble. Jiayang Chen et al. (2020) presented a rapid review of artificial intelligence for COVID-19 [8]. AI is applied to COVID-19 in the following areas: diagnosis, public health, clinical decision-making, therapeutics, surveillance, and care of COVID-19 patients, all of which may be combined with big data, as in other core clinical services. Mahmoud Melkemi et al. (2020) designed a mobile application that checks how a mask is worn in order to limit the spread of COVID-19 [9]. The approach uses Haar-like feature descriptors to detect the key features of the face, and a decision-making algorithm then checks whether or not the mask is in the right position. In 2020, Harco Leslie et al. developed a mask detector for preventing COVID-19 [10] in public service areas. The mask detection software uses a camera with image and video input, connected to a microprocessor that processes the data and shows the result on a display. Arjya Das et al. [11] described face mask detection using convolutional neural networks (CNN). This approach can detect face masks even when the subject is moving. To implement the model, the authors used the OpenCV, Keras, TensorFlow, and Scikit-learn packages and a CNN model for face mask recognition. The accuracy obtained is about 95%. Using basic machine learning tools, this approach achieved high accuracy, yet this accuracy can be improved by using our MobileNet-based model. Face detection and recognition was described by Maliha Khan et al.
[12] as the application of principal component analysis (PCA) to scan, select, and classify a person's face. Then, using Haar cascades, OpenCV, Eigenfaces, Fisherfaces, and local binary pattern histograms (LBPH), a camera-based real-time face recognition system is developed. Haar cascades are used for face detection; with this classifier, the face detection accuracy is above 95%, although it performs well only for frontal face detection. A face recognition system was described by Vijay Kumar Sharma et al. [13]. The face is extracted from the frame using the Viola-Jones object detection algorithm; LBP is used to extract the distinctive features of the face, after which the result is matched against the images already present in the dataset to recognize the face. For face detection, Haar features and the AdaBoost algorithm are used. This system also performs well at recognizing frontal faces, with an accuracy of 95–96%. Amritha Nag et al. [14] presented an IoT-based door access control system using face recognition. The door operates according to the user: if the face is recognized as belonging to an authorized user, the door opens. Authorized users are recognized using the Eigenfaces method.
Haar cascades are used for face detection. This system shows how to connect the results of face recognition with IoT. He et al. (2014) described partial face recognition. A partial face is classified using the dynamic feature matching approach. Additionally, the SRC technique is used to recognize a face hidden by occlusions. CNNs are used for face recognition, and the system can detect a face even in the presence of occlusions. The system is therefore aimed at partial faces rather than only at frontal faces. A face detection method that uses a boosted cascade of features was presented by Chandrappa D. N. et al. [15]. A face is detected using classifiers from the AdaBoost algorithm and Haar classifiers. Since the classifiers use different features, the performance of the method varies across datasets. This method also proves effective for frontal face detection. Gurucharan [16] demonstrated face mask detection using OpenCV and TensorFlow. A sequential CNN model is presented to determine whether or not a person wears a mask. The accuracy of the system is 97%, which is quite good. Nevertheless, there is room to reduce its long processing time. Hu et al. [17] described face identification using fog computing and IoT. A face identifier is generated, and a resolution scheme is used to determine a person's identity. The performance of the system is also analysed. Because the processing capability of the fog nodes is used to handle the data, the response time of the system has improved. The system is very effective given the variety of LBPs employed. Srinivasa Raju Rudraraju et al. [18] described face recognition using fog cluster computing.
Using the computational power of Raspberry Pi-based IoT nodes, Haar cascades are used to detect faces in an image, and the LBPH algorithm is used to recognize the face. The system can detect more than one face in a single image; its accuracy is about 92%, which is lower than that of other systems. Jun Kong et al. [19] described a face recognition method based on CSGF(2D)2PCANet. The network preserves high-level features and supplies additional information for recognition. The faces are classified using block-wise histograms, a linear SVM, and binary hashing. The system operates quite efficiently and reaches an accuracy of about 98%. This model works effectively even though it differs from a CNN.
4.3 Proposed System
4.3.1 Block Diagram
The schematic representation of our system is described below. As shown in Fig. 4.2, the model that we designed detects the mask of the person and classifies the landmarks on the face and its angle; it then adjusts the image, changing the angle and saturation as needed, using commands from a Python script. The mask detection result is displayed as a label on the computer used for video streaming, while a Raspberry Pi 4 carries out the operations [18]. Using the Pi camera, we detect the mask, verify it, and stream the live video to the computer. If the mask is present, the system then checks the temperature by commanding the MLX90614 sensor, which measures the infrared radiation coming from the person's skin and reports the temperature value. If the value is below the body-temperature limit (i.e., 36 °C) [20], the value is displayed in green on the computer. If both checks are passed, the Raspberry Pi starts spraying sanitizer using the water pump motor and opens the door using the servo motor, again through commands from the Python script [21]. If either the mask detection or the temperature check fails, the water pump motor and the servo motor are not operated. This means that the person is not allowed to enter the area [22].
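The decision flow described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the names `entry_decision` and `GateActions` are hypothetical, the mask result and temperature are assumed to come from the detector and the sensor, and the 36 °C limit is the value quoted in the text, kept configurable.

```python
from dataclasses import dataclass

# Limit quoted in the text (36 degrees Celsius); kept configurable here.
TEMP_LIMIT_C = 36.0


@dataclass
class GateActions:
    """What the controller should do for one visitor (hypothetical type)."""
    spray_sanitizer: bool
    open_door: bool
    message: str


def entry_decision(mask_ok: bool, temp_c: float,
                   limit_c: float = TEMP_LIMIT_C) -> GateActions:
    """Check the mask first, then the temperature.

    The pump and the servo are driven only when both checks pass.
    """
    if not mask_ok:
        return GateActions(False, False, "do not enter: mask missing or misplaced")
    if temp_c >= limit_c:
        return GateActions(False, False, "do not enter: temperature above limit")
    return GateActions(True, True, "enter")
```

Keeping the decision in a pure function like this separates it from the hardware calls, so the logic can be checked on a desktop machine before it is wired to the pump and servo outputs.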
4.4 Hardware Components
4.4.1 Raspberry Pi4
Specifications:
• Quad-core Cortex-A72 processor with a clock speed of 1.5 GHz
• SDRAM with a capacity of 2 GB, 4 GB, or 8 GB
• IEEE 802.11ac wireless
• Gigabit Ethernet
• 40-pin GPIO header
• Two micro-HDMI ports
• MIPI DSI display port with two lanes
• MIPI CSI camera port with two lanes
• 4-pole stereo audio and composite video connector
• Graphics using OpenGL
• 5 V DC power supply
Fig. 4.2 Schematic representation of the model
• PoE is supported.
4.4.2 Servomotor
Specifications:
• Metal-geared servo for longer life
• Double ball bearing design that is both stable and shock resistant
• High-speed rotation for rapid response
• Short control response time
• Constant torque throughout the servo travel range
• Exceptional holding power
• Weight: 55 g
• Operating voltage: 4.8–7.2 V
• Dead band width: 5 µs
• Operating temperature range: 0 °C to +55 °C
• Current draw at idle: 10 mA
• Operating current draw without load: 170 mA
• Maximum load current: 1200 mA
4.4.3 Liquid Pump Motor
Specifications:
• Dimensions: 90 mm × 40 mm × 35 mm
• Inner diameter: 6 mm; external diameter: 9 mm
• Recommended working voltage: DC 6–12 V (12 V 1 A, 9 V 1 A)
• Working current: 0.3–0.7 A
• Flow rate: 1.5–2 L/min
• Maximum suction range: 2 m
• Maximum pumping head: 3 m
• Maximum service life: 2500 h
• Water temperature: 80 degrees
• Note: four dry batteries cannot drive the pump. Voltage 6 V, power 6 W/H.
4.4.4 Temperature Sensor: MLX90614
Specifications:
• Voltage range: 3.6–5 V
• Supply current: 1.5 mA
• Object temperature range: −70 to 382.2 °C
• Resolution: 0.02 °C
• Field of view: 80°
• Distance between the object and the sensor: 2–5 cm.
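The 0.02 °C figure in the list above corresponds to the way the MLX90614 reports temperature: the register value is a count in steps of 0.02 K. A minimal sketch of the conversion is given below. The function names are hypothetical, reading the register over I2C/SMBus is not shown, and the 36 °C limit is the one quoted in the proposed-system description.

```python
def mlx90614_raw_to_celsius(raw: int) -> float:
    """Convert a raw MLX90614 temperature register value to degrees Celsius.

    The register counts in steps of 0.02 K, so the value is first scaled
    to kelvin and then shifted to Celsius.
    """
    kelvin = raw * 0.02
    return kelvin - 273.15


def within_limit(raw: int, limit_c: float = 36.0) -> bool:
    """True when the measured temperature is below the configured limit."""
    return mlx90614_raw_to_celsius(raw) < limit_c
```

For example, a raw value of 0x3AF7 (15095) corresponds to 301.9 K, i.e. about 28.75 °C.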
4.4.5 Raspberry Pi Cam
Specifications:
• Compatible with both Raspberry Pi models, A and B
• 5 MP camera module
• Still image resolution: 2592 × 1944 pixels
• Video: 1080p at 30 frames per second, 720p at 60 frames per second, and 640 × 480p at 60/90 frames per second
• 15-pin MIPI Camera Serial Interface that connects directly to the Raspberry Pi board
• Dimensions: 20 × 25 × 9 mm
• 3 g in weight.
4.5 Working Procedure
4.5.1 Flow Chart
If the person satisfies both the mask and the temperature requirements, entry is allowed: the system displays "enter" and automatic sanitization is performed. Otherwise, the system displays "do not enter" and no automatic sanitization takes place (Figs. 4.3 and 4.4).
4.6 Result
4.6.1 Result Face Mask Detection and Temperature Sensing
The figures show people wearing masks, together with the percentage of the face that is covered and the measured temperature (Figs. 4.5, 4.6, and 4.7). These people are wearing masks according to the safety precautions, covering the nose, mouth, and chin, so they pass on to the next step. As shown in the figures, the person's face mask is detected and the temperature is measured using the infrared sensor [23]. The sensor collects the infrared radiation coming from the skin and measures the temperature; if the value is within the limit, it is displayed in green and the person is allowed to proceed to the next step (Figs. 4.8 and 4.9). Here, we can see that the persons are not wearing face masks; the pixels are labelled in blue, indicating that no mask is worn.
4.6.2 Servomotor Result
If the person fails the face mask detection or the measured temperature is not within the limit, the Raspberry Pi signals the servo motor not to open the door. In order to prevent the spread of disease, the person is unable to enter the designated area (Figs. 4.10 and 4.11).
Fig. 4.3 Automatic human-based entry check system flow
If the person passes the mask detection and the measured temperature is within the limit, the Raspberry Pi signals the servo motor to open the gate, and the person can freely move into the area.
Fig. 4.4 Different positions of face mask image-1
Fig. 4.5 Different positions of face mask image-2
Fig. 4.6 Different positions of face mask image-3
If the person wears the mask as per the safety precautions and the body temperature is within the limit, the Raspberry Pi signals the water pump motor to spray sanitizer on the person's hands when a hand is detected.
Fig. 4.7 Different positions of face mask image-4
Fig. 4.8 Different positions of face mask image-5
Fig. 4.9 Different positions of face mask image-6
Fig. 4.10 Image of closed gate
Fig. 4.11 Image of open gate
If the person is not wearing a mask or is not wearing it in the correct position, then even if the temperature is within the permitted range, the Raspberry Pi sends an instruction not to spray the sanitizer.
4.7 Conclusion and Future Scope
4.7.1 Conclusion
SARS-CoV-2 is a virus that will very likely always be with us. Even as we learn to live with it over the long term, we still need to stay prepared and follow the basic protocols to avoid falling victim to the disease and to break the chain of this illness that has wreaked havoc on our lives. This is why the main goal of our investigation is to prevent the spread of this virus through automated rapid screening of individuals. Action must be taken to halt the spread of the coronavirus pandemic. Using a basic architecture and transfer learning, we built a neural-network face mask detector. We used a dataset of 306 masked face images and 314 unmasked face images to train and test the model. These images were drawn from a variety of datasets, including FMLD, Wider Face, and MAFA. The use of the Raspberry Pi and its operating system
adds further accuracy and precision. Optimization of the model is ongoing, and by tuning the hyperparameters we have obtained a notably accurate solution.
4.7.2 Future Scope
In future, the project could be extended to detect symptoms such as a cold or cough, and to perform other minor checks by scanning the throat and nose. The data of the person could then be sent to hospitals or higher officials, so that the person does not move around other places. This would help society by preventing the spread of the disease.
References
1. GoogleImages: [Online]. Available: https://www.google.ae/imghp?hl=en&authuser=0&ogbl
2. Siegfried, I.M.: Comparative study of deep learning methods in detection face mask utilization
3. Roy, B., Nandy, S., Ghosh, D., Dutta, D., Biswas, P., Das, T.: MOXA: a deep learning based unmanned approach for real-time monitoring of people wearing medical masks (2020)
4. Aruna, A., Mol, Y.B., Delcy, G., Muthukumaran, N.: Arduino powered obstacles avoidance for visually impaired person. Asian J. Appl. Sci. Technol. 2, 101–106 (2018)
5. Jayapradha, S., Vincent, P.D.R.: An IOT based human healthcare system using Arduino uno board. In: Proceedings of the IEEE 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kerala
6. Pawar, P.A.: Heart rate monitoring system using IR base sensor & Arduino uno. In: Proceedings of the IEEE 2014 Conference on IT in Business, Industry and Government (CSIBIG), pp. 1–3, Indore, India, 8–9 Mar 2014
Das, A., Ansari, M.W., Basak, R.: Covid-19 face mask detection using TensorFlow, Keras and OpenCV. IEEE (2020)
7. Khan, M., Chakraborty, S., Astya, R., Khepra, S.: Face detection and recognition using OpenCV. IEEE (2019)
8. Sharma, V.K.: Designing of face recognition system. IEEE (2019)
9. Nag, A., Nikhilendra, J.N., Kalmath, M.: IOT based door access control using face recognition. IEEE (2018)
10. He, L., Li, H., Zhang, Q.: Dynamic feature matching for partial face recognition. IEEE (2018)
11. Chandrappa, D.N., Akshay, G., Ravishankar, M.: Face Detection Using a Boosted Cascade of Features Using OpenCV. Springer, Berlin (2012)
12. Gurucharan, M.K.: COVID-19 face mask detection using TensorFlow and OpenCV. https://towardsdatascience.com/covid-19-face-mask-detection-using-tensorflow-and-opencv-702dd833515b
13. Hu, P., Ning, H., Qiu, T., Zhang, Y., Luo, X.: Fog computing based face identification and resolution scheme in internet of things. IEEE (2016)
14. Rudraraju, S.R., Suryadevara, N.K., Negi, A.: Face recognition in the fog cluster computing. IEEE (2019)
15. Rosebrock, A.: Face detection with OpenCV and deep learning
16. https://www.pyimagesearch.com/2018/02/26/face-detection-with-opencv-and-deep-learning/ (2018)
17. Dwivedi, D.: Face detection for beginners (2018). https://towardsdatascience.com/face-detection-for-beginners-e58e8f21aad9
18. https://doi.org/10.1007/s10586-022-03735-8
19. https://doi.org/10.1016/j.jjimei.2021.100036. https://www.sciencedirect.com/science/article/pii/S266709682100029X
20. Kazemi, V., Kth, J.S.: One millisecond face alignment with an ensemble of regression trees
21. Hapani, S., Prabhu, N., Parakhiya, N., Paghdal, M.: Automated attendance system using image processing. In: Proceedings—2018 4th International Conference on Computing Communication Control and Automation ICCUBEA 2018 (2018)
22. Aravindhan, K., Sangeetha, S.K.B., Periyakaruppan, K., Keerthana, K.P., Sanjaygiridhar, V., Shamaladevi, V.: Design of attendance monitoring system using RFID. In: 2021 7th International Conference on Advanced Computing and Communication Systems ICACCS 2021, pp. 1628–1631 (2021)
23. https://doi.org/10.1016/j.imavis.2022.104573
Chapter 5
Effect of Machine Translation on Authorship Attribution S. Ouamour and H. Sayoud
Abstract In this work, the effect of automatic machine translation on stylometry is investigated. For that purpose, an Arabic corpus called Hundred of Arabic Travelers (HAT), containing 100 authors, is used. The idea is to translate all the texts of this corpus, which are written in Arabic, into French by using Microsoft Office translate. An authorship attribution system is applied to both datasets in order to attribute the author identity of each text before and after translation. A comparative evaluation based on the author attribution score is then made between the two datasets. Several types of features are tested, namely: rare words, words, word bi-gram, word tri-gram, character bi-gram, character tri-gram, and character tetra-gram. Those features are used with a centroid nearest neighbor distance for classification. The experimental results have shown that machine translation reduces the stylometric identification performance but preserves some characteristics of the author, which makes the identification still possible even after translation. The best accuracy of authorship attribution, with 100 authors, on the translated documents is about 80%, while the best accuracy obtained on the original documents is 97%.
5.1 Introduction
Plagiarism represents a serious concern in intellectual property and academic publication regulation [1]. That is why several plagiarism detection tools have been implemented and proposed to the academic community [2]. A simple approach is a web search through a search engine such as Google or Bing. However, most existing tools do not perform plagiarism detection through a cross-language approach, which represents a serious problem. For example, if we take a paragraph from an article in English and translate it into French, conventional plagiarism tools do not detect any plagiarism in it.
S. Ouamour (B) · H. Sayoud
EDT, Laboratory of Speech Communication and Signal Processing, USTHB University, Algiers, Algeria
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_5
More broadly, author style identification can also be influenced by machine translation, which results in a loss of some of the author's style features across the translation process. In this context, machine translation can be defined as a fully automatic process that converts a document from one natural language into another natural language [3]. Different classifiers and features have been tested and evaluated in this specific application. For instance, Hedegaard and Simonsen investigated author identification using classifiers that are based on frame semantics. Their purpose was to discover whether adding semantic information to syntactic and lexical techniques for author identification could improve them, especially to address the complex problem of author identification of translated texts. Their result suggested that frame based classifiers are usable for author identification of both translated and untranslated documents, and that frame based classifiers usually perform worse than the baseline ones with untranslated texts [4]. Other researchers tried to see the impact of such translation on authorship attribution and reported several modifications in the document properties. For instance, in 2012, Caliskan et al. investigated the effect of translation tools on translated texts and on the accuracy of authorship attribution and translator attribution. They showed that the more translation a machine translation tool performs on a text, the more effects specific to that translator are noticed [5]. More recently, in 2021, Murauer et al. analyzed the possibility of detecting the source language from documents translated by 2 commercial machine translation systems, by using machine learning techniques with textual features such as Ngrams. Evaluations showed that the source language can be reconstructed accurately for texts containing a large quantity of translated text [6].
In the same context, in this investigation, we are interested in evaluating the impact of machine translation on author style analysis with a large number of authors. For that purpose, we have used a fairly large corpus with 100 authors, namely the HAT corpus. We built this dataset in 2019 [7] and called it "HAT" (i.e., Hundred of Arabic Travelers). This corpus has been proposed for the purpose of comparative evaluation in stylometry and is made available to researchers. It has been translated from Arabic to French by using the Microsoft Translate Engine embedded in Microsoft Office 2019 [8]. The adopted approach and experimental results are reported in the next sections. In Sect. 5.2, we present the HAT corpus. In Sect. 5.3, we define the authorship attribution approach used to identify the author of each text. The experimental results of this investigation are discussed in Sect. 5.4. At the end of the paper, a conclusion summarizes the important results of this investigation.
5.2 Corpus and Used Dataset
The HAT corpus is composed of 100 groups of Arabic documents corresponding to 100 authors. Each group is composed of three different texts written by one author, which means that each text group belongs to a unique author. The corpus was extracted from one hundred books written in Arabic [7]. In total, this corpus of 300 text documents was built in 2019 for the purpose of competitive evaluation of authorship attribution approaches in Arabic. We called it HAT or "Hundred of Arabic Travelers". Moreover, this corpus can serve as a reference corpus for author identification in Arabic [7], which can be used by scientists in the field of stylometry or NLP in general. In Fig. 5.1, one can see a piece of text belonging to Author 92 (N. Khasru). The documents have a short-medium size: the average length per document is about 1100 words. These are severe experimental conditions, because it has been shown in some research works [9] that a text should contain at least 2500 words in order to ensure good attribution performance. The HAT corpus is processed by the Microsoft Translate Engine, embedded in Microsoft Office 2019 [8], and translated into French. The translation is not perfect and contains several mistakes, since it is a machine translation process, but those mistakes are kept in the documents and are not corrected. The Microsoft translate engine was selected after a comparison with Google translate, in which we noticed that the former was more accurate on our textual corpus. At the end, we get two datasets: the original HAT corpus and the translated HAT corpus (denoted by FRHAT).
Fig. 5.1 Arabic text sample belonging to Author 92: known as Khasru
5.3 Approach of Authorship Attribution and Experimental Protocol
As mentioned previously, we use the HAT corpus in its original form and in its form translated into French (i.e., FRHAT). The translation was performed by Microsoft Translate Engine, which is integrated in Microsoft Office. Seven types of features are evaluated: Rare words, Word, Word bi-gram, Word tri-gram, Character bi-gram, Character tri-gram, and Character tetra-gram. Most of these features were selected because of their reliability in authorship attribution [10]. During our experiments, the centroid-based nearest neighbor technique is used with the Manhattan distance. This technique has been chosen because it does not require a lot of training data, which is quite suitable in our case since the dataset contains only 3 documents per author (see Fig. 5.2). According to Fig. 5.2, the general approach is based on the following successive steps:
• Feature extraction from the analyzed document and the reference ones;
• Classification per author by using Manhattan distance and the centroid-based technique;
• Evaluation of the identification result and comparison with the actual author;
• Performance assessment and computation of the accuracy.
These processes are performed on both HAT and FRHAT (the translated corpus).
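The steps listed above can be illustrated with a minimal sketch of centroid-based nearest-neighbor attribution using the Manhattan distance on character n-grams. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, features are taken to be relative n-gram frequencies, and the centroid is taken to be the feature-wise mean of an author's training profiles.

```python
from collections import Counter


def char_ngrams(text, n):
    """All overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def profile(text, n=3):
    """Relative frequency of each character n-gram (assumed feature weighting)."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values()) or 1
    return {gram: count / total for gram, count in counts.items()}


def centroid(profiles):
    """Feature-wise mean of several profiles."""
    keys = set().union(*profiles)
    return {k: sum(p.get(k, 0.0) for p in profiles) / len(profiles) for k in keys}


def manhattan(p, q):
    """Sum of absolute differences over the union of features."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def attribute(test_text, training, n=3):
    """training maps author -> list of texts; returns the nearest author."""
    test_profile = profile(test_text, n)
    centroids = {author: centroid([profile(t, n) for t in texts])
                 for author, texts in training.items()}
    return min(centroids, key=lambda a: manhattan(test_profile, centroids[a]))
```

With two training texts per author, as in the protocol of this chapter, `training` would hold 100 entries of two texts each. Word-level features could be handled the same way by replacing `char_ngrams` with a word n-gram extractor.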
Fig. 5.2 Approach of authorship attribution and experimental protocol (blocks: feature extraction, classification by authors with Manhattan distance, evaluation, and performances, applied to the original texts and to the texts after translation into French)
5.4 Experiments and Results
In this section, the authorship attribution experiments conducted on the HAT corpus and the FRHAT corpus are described, where the main goal is to identify the 100 authors of both datasets. For every author, two documents are used for training and the remaining document is used for testing. The accuracy of AA is evaluated for every feature and compared to the other features. All these features are employed with the Manhattan distance [11]. The Authorship Attribution Accuracy is defined by the following formula:

$$\text{AA Accuracy} = \frac{\text{Number of correctly identified texts}}{\text{Total number of texts}} \tag{5.1}$$

The obtained results are displayed in Table 5.1 and Fig. 5.3. As one can see in Fig. 5.3, the accuracy of AA depends on the feature used. Furthermore, it also varies with the size of the Ngram. For word Ngrams, the accuracy tends to decrease when the Ngram size increases (0.91 for words, 0.92 for word bi-grams, and 0.71 for word tri-grams on HAT), while for character Ngrams it increases with the Ngram size (0.91, 0.94, and 0.97 for bi-grams, tri-grams, and tetra-grams). Concerning the effect of machine translation, we notice that the identification of the translated texts is less accurate than the identification of the original ones. According to Table 5.1, the AA accuracy degradation due to the translation is about 0.15 for words and rare words, 0.18 for word bi-grams, 0.21 for character bi-grams, and about 0.17 for character tetra-grams, character tri-grams, and word tri-grams. The mean AA accuracy degradation due to the translation is about 0.17.
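The evaluation protocol and Eq. (5.1) can be sketched as follows. The function names are hypothetical, and since the text does not say which of an author's three documents is held out, holding out the last one is an assumption made only for illustration.

```python
def split_per_author(corpus):
    """corpus maps author -> list of three texts.

    Two texts per author go to training and one to testing; holding out
    the last text is an assumption, not something stated in the chapter.
    """
    train = {author: texts[:-1] for author, texts in corpus.items()}
    test = {author: texts[-1] for author, texts in corpus.items()}
    return train, test


def aa_accuracy(predicted, actual):
    """Eq. (5.1): number of correctly identified texts / total number of texts."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

With 100 authors and one test text each, an accuracy of 0.97 means that 97 of the 100 test texts were attributed to their true author.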
Table 5.1 Performances of authorship attribution
| Feature | AA accuracy on the original corpus (HAT) | AA accuracy on the translated corpus (FRHAT) | Accuracy degradation after translation |
|---|---|---|---|
| Rare words | 0.93 | 0.78 | 0.15 |
| Words | 0.91 | 0.76 | 0.15 |
| Word bi-grams | 0.92 | 0.74 | 0.18 |
| Word tri-grams | 0.71 | 0.54 | 0.17 |
| Char bi-grams | 0.91 | 0.70 | 0.21 |
| Char tri-grams | 0.94 | 0.77 | 0.17 |
| Char tetra-grams | 0.97 | 0.80 | 0.17 |
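As a quick check of the figures in Table 5.1, the sketch below recomputes the per-feature degradation and its mean, and implements Eq. (5.1) directly. The values are copied from the table; the names `results`, `aa_accuracy`, `degradation` and `mean_degradation` are illustrative, and the assumption that each corpus yields 100 test texts (one per author) is inferred from the protocol rather than stated explicitly.

```python
# (accuracy on HAT, accuracy on FRHAT) per feature, copied from Table 5.1
results = {
    "Rare words": (0.93, 0.78),
    "Words": (0.91, 0.76),
    "Word bi-grams": (0.92, 0.74),
    "Word tri-grams": (0.71, 0.54),
    "Char bi-grams": (0.91, 0.70),
    "Char tri-grams": (0.94, 0.77),
    "Char tetra-grams": (0.97, 0.80),
}


def aa_accuracy(predicted, actual):
    """Eq. (5.1): correctly identified texts divided by total texts."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Degradation = accuracy on the original corpus minus accuracy after translation.
degradation = {
    feature: round(hat - frhat, 2) for feature, (hat, frhat) in results.items()
}
mean_degradation = sum(degradation.values()) / len(degradation)

for feature, drop in degradation.items():
    print(f"{feature:<18}{drop:.2f}")
print(f"Mean degradation: {mean_degradation:.2f}")
```

The mean comes out at roughly 0.17, which agrees with the value quoted in the text, and the largest single drop is for character bi-grams.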
S. Ouamour and H. Sayoud
[Figure: bar chart of AA accuracy (vertical axis, 0 to 1) for each feature (rare words, words, word bigrams, word trigrams, char bigrams, char trigrams, char tetragrams), comparing the original texts in Arabic with the texts translated into French; the plotted values are those of Table 5.1.]
Fig. 5.3 Comparative performances of authorship attribution on HAT (in black) and FRHAT (in red)
5.5 Conclusion
In this investigation, we have examined the effect of machine translation (Microsoft Office Translate engine) on stylometry. For that purpose, the Arabic HAT corpus, representing 100 authors, has been used before and after machine translation from Arabic to French. Seven types of features have been evaluated and compared in authorship attribution on both datasets (i.e., the original corpus and the translated one). The experimental results have shown that machine translation reduces authorship attribution performance but, surprisingly, preserves some characteristics of the author, which keeps attribution possible even after translation. The best authorship attribution accuracy on the translated documents was 80%, while the best accuracy obtained on the original documents was 97%. The identification of translated texts is thus less accurate than that of the original ones: the mean degradation in accuracy due to translation was about 0.17. Finally, even though machine translation degrades AA performance, it does not remove all the characteristics of the author's style: with the best feature, about 80% of the documents were still correctly identified after translation.
References
1. Farahian, M., Avarzamani, F., Rezaee, M.: Plagiarism in higher education across nations: a case of language students. J. Appl. Res. High. Educ. (2021)
2. Sabeeh, M., Khaled, F.: Plagiarism detection methods and tools: an overview. Iraqi J. Sci. 2771–2783 (2021)
3. Pradhan, I., Mishra, S.P., Nayak, A.K.: A collation of machine translation approaches with exemplified comparison of Google and Bing translators. In: International Conference on Intelligent Computing and Communication Technologies, pp. 854–860. Springer, Singapore (2019)
4. Hedegaard, S., Simonsen, J.G.: Lost in translation: authorship attribution using frame semantics. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 65–70 (2011)
5. Caliskan, A., Greenstadt, R.: Translate once, translate twice, translate thrice and attribute: identifying authors and machine translation tools in translated text. In: 2012 IEEE Sixth International Conference on Semantic Computing, pp. 121–125 (2012). https://doi.org/10.1109/ICSC.2012.46
6. Murauer, B., Tschuggnall, M., Specht, G.: On the influence of machine translation on language origin obfuscation (2021). arXiv preprint arXiv:2106.12830
7. Sayoud, H., Ouamour, S.: HAT-A new corpus for experimental stylometric evaluation in Arabic. In: ExLing 2021, pp. 205–208 (2021)
8. Microsoft: Translate text into a different language (2022). https://support.microsoft.com/en-us/office/translate-text-into-a-different-language-287380e4-a56c-48a1-9977-f2dca89ce93f. Last visit in July 2022
9. Eder, M.: Does size matter? Authorship attribution, short samples, big problem. In: Digital Humanities 2010 Conference, pp. 132–135 (2010)
10. Sayoud, H., Hadjadj, H.: Authorship identification of seven Arabic religious books—a fusion approach. HDSKD J. 6(1), 137–157 (2021). ISSN 2437-069X. https://doi.org/10.5281/zenodo.6353805
11. Ouamour, S., Sayoud, H.: A comparative survey of authorship attribution on short Arabic texts. In: International Conference on Speech and Computer, pp. 479–489. Springer, Cham (2018)
Chapter 6
Smart Hospitality: Understanding the ‘Green’ Challenges of Hotels and How IoT-Based Sustainable Development Could be the Answer
Nick Kalsi, Fiona Carroll, Katarzyna Minor, and Jon Platts
Abstract As the world takes urgent action to combat climate change, going green has become a necessity for many areas of society, particularly the hotel hospitality sector. Future hotel experiences will need to have minimal impact on their environments, and this involves everyone from the housekeeper to the customer to the hotel director. However, adopting a greener approach to hospitality is not as easy as it seems. Hotel staff, customers and management need to be supported in changing their behaviours and day-to-day tasks so that they reduce their carbon footprint. Cutting-edge smart Internet of Things (IoT) sensors are currently being used by many hotels to track and monitor hotel operations and consumption. But this is only one side of the coin. For hotels to go green effectively, they need full buy-in from both their staff and customers. The authors of this paper advocate that the IoT technology implemented needs to give hotel staff and customers the ability and desire to adopt and cope with new ‘green’ practices. This paper discusses a study that probes in detail hotel housekeepers’ day-to-day activities in order to gain a more in-depth insight into their green needs. In detail, it shows that hospitality staff are willing to be part of their hotel’s green initiatives. However, it also shows that many of their attempted green practices are unsupported and left to estimates.

N. Kalsi (B) · F. Carroll · J. Platts
Cardiff School of Technologies, Cardiff Metropolitan University, Llandaff Campus, Western Avenue, Cardiff CF5 2YB, UK
e-mail: [email protected]
URL: https://www.cardiffmet.ac.uk/technologies/Pages/default.aspx
F. Carroll
e-mail: [email protected]
URL: https://www.cardiffmet.ac.uk/technologies/Pages/default.aspx
J. Platts
e-mail: [email protected]
URL: https://www.cardiffmet.ac.uk/technologies/Pages/default.aspx
K. Minor
Cardiff School of Management, Cardiff Metropolitan University, Llandaff Campus, Western Avenue, Cardiff CF5 2YB, UK
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_6
6.1 Introduction
The importance of having a thoroughly ‘green’ hotel has never been more apparent. Indeed, it is every hotel’s goal to reduce its consumption of water, electricity and gas. But now, it is also about attracting and keeping staff as well as guests. Sustainability has been a key objective for hoteliers in recent years as tourists are increasingly concerned about environmental issues. Green practices adopted by hotels have become critical factors for travellers when choosing and booking accommodation. The same could become true for staff choosing a ‘green’ workplace. Although hotels have implemented a range of green strategies in their operations, this has often been done with off-the-shelf technology and without a deep understanding of the green needs of staff and customers. Millar and Baloglu [1] suggested that companies need to make more informed decisions regarding spending on environmental initiatives and not ride the green wave without first understanding what their staff and customers need and want. This paper discusses a study that probes in detail hotel housekeepers’ day-to-day activities in order to gain a more in-depth insight into their green needs. In detail, it shows that hospitality staff are willing to be part of their hotel’s green initiatives. However, it also shows that many of their attempted green practices are unsupported and left to estimates. The following sections discuss what makes a hotel ‘green’ and the role of IoT in this process, while paying close attention to the ‘green’ needs and desires of the staff on the ground who have to make this happen.
6.2 Insight into the Hotel Hospitality Industry
Hotels are resource-intensive businesses; in order to deliver their products and services, they require large amounts of human capital, energy and monetary capital. Hotel practices therefore have a direct impact on sustainability, and whole supply and value chains need to be taken into consideration, as well as the lifecycle of the operations. To reflect this holistic approach, Sloan et al. [2] note the importance of the three-pillar sustainability framework, whereby a hotel needs to harmonise its day-to-day operations with the environment, society and operational profitability. Central to this paper is the issue of green practices, as recognised and delivered by staff within the day-to-day operations. The environmental dimension aims to minimise the negative environmental impacts of hotel consumption via monitoring [3]. Hotel green practices cut across resource use, e.g. water and energy consumption [4, 5]; resource conservation, e.g. recycling and reuse [6]; and impact monitoring, i.e. assessment of CO2 emissions [7] or sewage emissions [8]. Oxenswärdh [9] notes that a lack of staff training, combined with high staff turnover and a seasonal workforce, hinders sustainability efforts by hotels. Thus, the general awareness of environmental issues among employees (i.e. managers and operational staff), together with customer pressure for environmental management systems, needs to be the driving force for change and for the adoption of green practices.
6.2.1 The ‘Green’ Landscape in the Hospitality Industry
The UK hotel industry has implemented a variety of green practices in its operations, including the implementation of rigorous Leadership in Energy and Environmental Design (LEED) certification standards [10]. For example, the Radisson Hotel Group operates to high standards of performance and advocates socially and environmentally sustainable business practices. Its aim is to bring positive services through economic growth, environmental protection, community involvement and employment. Hilton Hotels has also focused on sustainability by introducing its eco-room, in which 97% of the materials are recyclable. The room also features energy and water conservation technology [11]. Among the reasons for hotels to go green were governmental pressure and the desire to preserve resources by reducing waste and conserving energy and water. Recent research finds that guests now expect environmentally friendly attributes in hotels. Berezan et al. [12] further elaborated that, because of this expectation, it is important for hotels to maintain environmental initiatives. A study by Millar and Baloglu [1] added that occupancy sensors and key cards that turn the power off when the room is vacant are attempts to achieve this. This study [1] found that customers regarded the reuse of linens and towels as a basic attribute that they expected from hotels. The study also showed that if a hotel used clean and renewable energy sources, customers considered it a ‘plus’ attribute. This means gaining and keeping hotel guests that have a positive attitude towards hotels implementing a wide spectrum of green practices. Guests’ preferences are changing to include more eco-friendly products and services when they choose a hotel room for their trip [10]. For example, 79 per cent of travellers are interested in going green when they choose a hotel. According to an eco-friendly survey by TripAdvisor [10], 57 per cent of travellers often make eco-friendly decisions when they choose accommodation. Meanwhile, research has shown that hotel guests are willing to pay more to stay at a green hotel because this can help them believe that they are saving the environment for future generations [10]. As mentioned, many hotels are involved in eco-sustainability and encourage guests to participate in their green practices, such as reusing towels. However, in some cases, hotels do not explain what their green practices are. To deliver a green image to hotel guests, some green hotels use terms such as ‘eco-’, ‘environmentally friendly’, ‘green’ or ‘sustainability’ without giving clear definitions, so these terms may not make the green practices clear to guests [10]. Moreover, the over-focus on the customer/guest and the ‘green’ image also tends to overlook the importance of getting ‘green’ buy-in from the hotel staff. As Lim et al. [13], p. 1, note: ‘the reform is difficult for green hotels because of traditional customs, consumers’ psychology, as well as the management concept’.
6.3 The Internet of Things (IoT) and Sustainable Development
The Internet of Things (IoT) is a network in which physical devices embedded with sensors, software, cloud computing and other technologies connect and communicate information [14]. As such, IoT ‘makes a significant contribution to development in economical, social and ecological terms’ [15], p. 1. Indeed, as people need to take more care of their environments, IoT has become the epitome of sustainable development [16]. By hosting different technologies, IoT has spearheaded the development of smart city systems for sustainable living, with increased comfort and productivity for citizens [17]. Importantly, IoT is also at the heart of ‘green’ hotel initiatives. As a result of climate change and high competition within tourism, enhancing sustainability through energy savings is a priority for many hotels [18]. Moreover, IoT has begun to be used in pursuit of customer satisfaction, cost savings and business profit [19]. The hotel industry has realised the potential of IoT to achieve these and further ‘green’ objectives. IoT can also play a major role in improving the hotel ‘green’ experience by providing customised services and tools to customers and staff. Sharma and Gupta [14] note the role of IoT in the hotel industry in improving the guest experience by providing customised services. Basana et al. [20] show that IoT has a significant impact on green hotel operation. In addition, Moyeenudin et al. [21] highlight how recent IoT innovations have been designed so that hotel administration can be built on these advanced technologies for guest satisfaction and better occupancy. This demonstrates how IoT has already arrived in hotels and is currently being pushed forward for green initiatives. The following section will explore the three pillars of sustainability that are required to ensure this.
6.4 Study
6.4.1 Introduction
This study forms part of a larger study aiming to understand the daily tasks of housekeeping managers working in luxury hotels located in central London. Its objective is to get a glimpse into the housekeeping department and into how employees perceive and understand their everyday activities.
6.4.2 Methods
This study takes a qualitative research approach to describe and explain how housekeeping staff interpret and manage their daily practices, particularly within a green context. The questions were designed around the three pillars of sustainability: social, economic and environmental. The questions are open-ended, allowing those taking part to express their true feelings and actions without direction. The interview comprised ten general and thirty-five detailed questions. All interviews were audio recorded.
6.4.3 Procedure
All participants were given information about the study’s purpose and were asked for their permission to be recorded digitally. They were also informed that no identifiable personal information would be asked for or stored. Each interview lasted between 30 and 60 min, and to guarantee a consistent process, questions were asked in the same order.
6.4.4 Participants
The four participants in this study were directly responsible for the housekeeping departments of their respective hotels. The participants are labelled P1 to P4. Their roles were: two head housekeepers, one deputy housekeeper and one senior housekeeper. All participants were female and had held their current positions for between 8 and 26 years. They were between 26 and 55 years old. All participants’ primary language was a European language. All participants in this study had working backgrounds in hospitality. Three of the participants were identified as extremely confident in using technology; one was identified as a moderately confident user. Three of the participants had secondary school certificates but no degrees. One of them had some certificates but no degree.
6.4.5 Results
Two of the participants interviewed had been working with the organisation for a long time; both had started in junior positions as housekeepers and worked their way up into management over the years [P1] [P2]. Participant 1 mentioned that dedication was one of her department’s most important points in looking after the staff. Participant 3 and participant 4 had both served the company for 8 years; however, participant 3 seemed a little nervous and was keen to reply quickly with short answers. Interestingly, all participants highlighted that staff and people were the most important parts of a hotel. Participant 1 discussed the company’s policy of 100% guest satisfaction, a bold statement used to market the hotel brand and to show that the hotel cares for the guest. Participant 2 mentioned that if the staff and the guest were happy, she was doing her job well. The data made it evident that all participants understood that staff and guests were important. From a social perspective, it was clear that each participant was kind to their co-workers and devoted to their employment, regardless of how long they had worked at the hotel. A major issue was the lack of workers to clean guest rooms. Although it was not part of their duties, several cleaning staff members were asked to clean additional rooms, and in some cases staff were not happy about the extra work. For hotel visitors who had trouble with English, staff speaking multiple languages and offering a translation in the visitor’s native tongue was useful. Additionally, consumers familiar with hotel management procedures are more likely to remain loyal than those who are not [22]. Social behaviours increase consumer satisfaction levels as well. It was also apparent from the data that the housekeepers interact socially by connecting with other team members via IT technology. Even when faced with difficulties, personnel felt more socially connected when utilising their smartphones.
To minimise time lost, smart gadgets were employed to report errors. It is clear that understanding and compassion for each hotel’s cleaning staff is crucial for hotel management and everyday operations. From an environmental perspective, participants were asked: Are you aware of any green policies in your department or at the hotel? All participants knew the company had green policies. Some talked about changing towels. For example, if guests do not want a towel changed, they leave it hanging on the hook. In detail, participant 1 and participant 2 discussed shampoo bottles being rescued and the use of plastic cups instead of glass cups. Also, the linen was noted as being changed every two days, which was part of the green policy. Participant 3 mentioned that they have a green card in each room, showing that the company has green policies. If a guest did not want their linen changed, the participants said that they make a note in the log book and then, each month, donate £1.50 per linen saving to charities the hotel supports. Moreover, it was mentioned that the sinks and showers are fitted with flow regulators, which help to control water use. Interestingly, according to the keywords generated from the data of this ‘green policy’ answer, all participants had different outcomes.
• Participant 1: words such as ‘guest’, ‘sometimes’, ‘water’, ‘use’, ‘log’, ‘green’, ‘room’, ‘policies’ and ‘heating’.
• Participant 2: words such as ‘linen’, ‘happy’, ‘lights’, ‘department’, ‘green’, ‘company’ and ‘waste’.
• Participant 3: words such as ‘technology’, ‘cleaning’, ‘maintenance’, ‘towels’, ‘job’, ‘think’ and ‘water’.
• Participant 4: words such as ‘water’, ‘job’, ‘guestroom’, ‘management company’, ‘blue cards’ and ‘people’.
Fig. 6.1 Word cloud representation of word frequencies from each participant’s answer to the green policy question
Furthermore, the combined four transcripts (see Fig. 6.1) show that each participant has their own working ethics, but that their understanding of the green policies differs. This was especially apparent when participants were asked about the quantity or duration of water used to clean the guest rooms. Participant 2 said, ‘I assume 3–4 L of water; it runs for 3–4 min’; this was an estimate. All participants found this question challenging, yet they were worried about water consumption. Furthermore, all participants were prompted to consider whether guests would like to have information on their impact from a water, heating and cooling viewpoint. Participant 1 was eager to make an optimistic assumption, saying that the visitor would undoubtedly be intrigued. In fact, they were all eager to spread this knowledge.
• Most definitely, the guest would be interested in their interest. Also, this will enhance the hotel brand and possibly help global warming. [P1]
• That is an excellent idea; this will help the guest to conserve water. Is that possible? [P2]
• This is up to the management, yes I would think it is a good idea. I do not believe we have an idea of how much water we are using. [P3]
• I am not sure if that is a good idea. Is it something expensive to have installed? [P4]
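The per-participant keyword lists reported above could be produced by a simple word-frequency count, which is also what a word cloud visualises. The source does not describe the tool the authors used, so the sketch below is only one plausible way to do it: the stop-word list and the function name `top_keywords` are hypothetical, and the sample input is the P3 quote from this section, used purely as illustrative text.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "this", "is", "it", "are", "to", "up", "of",
    "i", "we", "do", "not", "have", "how", "much", "would", "yes",
}


def top_keywords(answer, k=5, stop_words=STOP_WORDS):
    """Return the k most frequent non-stop-words in an interview answer."""
    words = re.findall(r"[a-z]+", answer.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(k)


p3_answer = (
    "This is up to the management, yes I would think it is a good idea. "
    "I do not believe we have an idea of how much water we are using."
)
print(top_keywords(p3_answer, k=3))
```

Running the same function over each participant's transcript separately, and then over the four transcripts combined, would give per-participant and overall frequency tables of the kind the figure summarises.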
6.4.6 Discussion and Conclusion
This study shows that everyone who participated was committed to their work and concerned about the environment. However, in reality, there is a shortage of housekeepers on a worldwide scale, and hiring has been extremely difficult in some circumstances. Most agency employees are not conversant with hotel green policies, including cleaning procedures. When there is not enough staff, it becomes necessary to restrict the number of services or guest rooms. Since agency staff are hired at short notice, the housekeeping department may have to cope with staff who are not fully trained and/or not aware of hotel green initiatives, including cleaning practices. In addition, it is evident that most of the staff guessed when it came to environmental issues. For instance, staff did not know how much water was used when cleaning rooms, the types of electrical appliances used, or the internal laundrette machines and their water, soap and drying mechanisms. In summary, the data from all participants’ answers highlight the word ‘guest’ as being the most important to them. The words ‘water’, ‘use’, ‘room’, ‘happy’, ‘company’ and ‘green’ also indicated that all participants were keen to adopt the green policy but, as the data also highlight, found this challenging.
6.4.7 Reflection on the Benefits of IoT Technology for Green Hotel Practices
Hotels going ‘green’ is not just a trend, but something many guests highlight and are concerned about. Guests tend to have a contradictory view of green hotel practices; they want to participate in green practices but are unlikely to give up luxury and suitability while staying in hotels [23]. However, some guests will pay extra to support the hotel’s environmental protection practices. This paper is interested in the hotel ‘green’ narrative from the housekeeping staff perspective and, in particular, in how IoT can support housekeeping staff in improving their green practices. It is the authors’ opinion that IoT technologies have numerous benefits for hotels, especially in encouraging green housekeeping practices. As discussed, IoT can support hotel staff in integrating green practices into hotel operations, promoting environmental conservation and sustainability. In detail, IoT technologies can be used to manage the heating and cooling of rooms and, in doing so, improve sustainability, primarily through energy efficiency and cost savings. Other areas include water conservation, smart linen management, energy usage such as lighting, and carbon footprint monitoring and reduction. The challenge now lies in designing the IoT infrastructure so that all three pillars of sustainability are addressed and work in sync with the hotel housekeeper’s ‘green’ wants and needs.
6.5 Conclusion
Hospitality is a critical sector for socioeconomic advancement. Indeed, it sustains other sectors, such as tourism, which is an essential social and economic pillar in multiple regions worldwide. These factors, along with environmental ones, indicate how essential it is to ensure the long-term sustainability of the hospitality sector. Interestingly, from a hotel housekeeping perspective, the staff are interested in being ‘green’, and they want to know how ‘green’ their practices are. Therefore, it is clear that hotels should not just focus on the customer but also make further efforts to communicate to staff their commitment towards sustainability. In addition, hotels need to ensure that their staff are supported around the three pillars of sustainability: social, economic and environmental [24]. As we have seen from the study, there are several areas of housekeeping work where staff need to make assumptions based on their practice (e.g. they state that they have used ‘X’ amount of water without having any evidence that this is the case). For example, we have seen estimates around water and energy usage as well as towel and bathrobe usage. IoT technology now exists to provide housekeepers with exact usage figures to support their ‘greenness’. The challenge lies in how this technology is designed and implemented, as ‘off-the-shelf’ solutions do not always meet the staff’s precise needs.
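To make the contrast between an estimate and a measured figure concrete, the sketch below turns a series of flow-sensor samples into a water volume for one room clean. It is a hypothetical illustration only: no specific sensor, device API or data format is described in this chapter, so the reading format (elapsed seconds, litres per minute) and the function name `litres_used` are assumptions, and trapezoidal integration is simply one reasonable way to sum the samples.

```python
def litres_used(readings):
    """Integrate flow-rate samples into a volume in litres.

    readings: list of (elapsed_seconds, litres_per_minute) tuples,
    sorted by time. Uses the trapezoidal rule between samples.
    """
    total = 0.0
    for (t0, f0), (t1, f1) in zip(readings, readings[1:]):
        mean_flow = (f0 + f1) / 2          # litres per minute over the interval
        total += mean_flow * (t1 - t0) / 60  # interval length converted to minutes
    return total


# Illustrative samples: a tap running at a steady 1 L/min for three minutes.
samples = [(0, 1.0), (60, 1.0), (120, 1.0), (180, 1.0)]
print(f"Water used: {litres_used(samples):.1f} L")
```

A logged figure of this kind could then be set against a housekeeper's own estimate, such as the 3–4 L quoted by participant 2, instead of leaving the value to guesswork.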
6.5.1 Future Work
In planning for future IoT-driven ‘green’ hotels, it is important to take the perspectives of all hotel stakeholders into account. This paper takes the first step by actively listening to hotel housekeepers and their needs. It is envisioned that the next step will be a more complete study that will further inform which aspects of the sustainability pillars housekeeping will need to focus on. In doing so, it will open up rich possibilities for IoT solutions to enhance green housekeeping practices. At the end of the day, nothing deters hotel guests more than poor housekeeping. In hotels, the room is at the heart of any sustainable practice, so ‘green’ housekeeping needs to be a top concern.
References 1. Millar, M., Baloglu, S.: Hotel guests’ preferences for green guest room attributes. Cornell Hosp. Quart. 52, 302–311 (2011). https://doi.org/10.1177/1938965511409031 2. Legrand, W., Sloan, P., Chen, J.S.: Sustainability in the Hospitality Industry, 3 edn. Routledge, London (2016). https://www.routledge.com/Sustainability-in-the-Hospitality-Industry-Princi ples-of-sustainable-operations/Legrand-Sloan-Chen/p/book/9781138915367 3. Vachon, S.: Green supply chain practices and the selection of environmental technologies. Int. J. Prod. Res. 45(18–19), 4357–4379 (2007). https://doi.org/10.1080/00207540701440303
82
N. Kalsi et al.
4. Bohdanowicz, P., Martinac, I.: Determinants and benchmarking of resource consumption in hotels—case study of Hilton international and Scandic in Europe. Energy Build. 39, 82–95 (2007). https://doi.org/10.1016/j.enbuild.2006.05.005 5. Bruns-Smith, A., Choy, V., Chong, H., Verma, R.: Environmental sustainability in the hospitality industry: best practices, guest participation, and customer satisfaction. https://hdl.handle. net/1813/71174 (2015). Accessed 03 Dec 2022 6. Dimara, E., Manganari, E., Skuras, D.: Don’t change my towels please: factors influencing participation in towel reuse programs. Tour. Manag. 59, 425–437 (2017). https://doi.org/10. 1016/j.tourman.2016.09.003 7. Gössling, S.: Global environmental consequences of tourism. Global Environ. Change 12(4), 283–302 (2002). https://doi.org/10.1016/S0959-3780(02)00044-4 8. Han, H., Hyun, S.S.: What influences water conservation and towel reuse practices of hotel guests? Tour. Manag. 64, 87–97 (2018) 9. Oxenswärdh, A.: Sustainability practice at hotels on the Island of Gotland in Sweden: an exploratory study. Eur. J. Tour. Hosp. Recreat. 10, 203–212 (2020). https://doi.org/10.2478/ ejthr-2020-0018 10. Lee, H., Jai, T.M.C., Li, X.: Guests’ perceptions of green hotel practices and management responses on tripadvisor. J. Hosp. Tour. Technol. 7, 182–199 (2016). https://doi.org/10.1108/ JHTT-10-2015-0038 11. Xu, X., Gursoy, D.: Influence of sustainable hospitality supply chain management on customers’ attitudes and behaviors. Int. J. Hosp. Manag. 49, 3 (2015). https://doi.org/10.1016/j.ijhm.2015. 06.003 12. Berezan, O., Millar, M., Raab, C.: Sustainable hotel practices and guest satisfaction levels. Int. J. Hosp. Tour. Admin. 15(1), 1–18 (2014). https://doi.org/10.1080/15256480.2014.872884 13. Lim, J.M., Madhoun, W.A., Yee, C.M., Nair, G., Siong, W.S., Isiyaka, H.A.: Factors influencing customer intention to stay in green hotel in Malaysia. 228, 012022 (2019). https://doi.org/10. 1088/1755-1315/228/1/012022 14. 
Sharma, U., Gupta, D.: Analyzing the Applications of Internet of Things in Hotel Industry. J. Phys. Conf. Ser. 1969, 012041 (2021). https://doi.org/10.1088/1742-6596/1969/1/012041 15. Lopez-Vargas, A., Fuentes, M., Vivar, M.: Challenges and opportunities of the internet of things for global development to achieve the United Nations sustainable development goals. IEEE Access 8, 472 (2020). https://doi.org/10.1109/ACCESS.2020.2975472 16. Haque, A.K., Tasmin, S.: Security threats and research challenges of IoT: a review. J. Eng. Adv. 1, 8 (2020) 17. Syed, A.S., Sierra-Sosa, D., Kumar, A., Elmaghraby, A.: Iot in smart cities: a survey of technologies, practices and challenges. Smart Cities 4, 24 (2021). https://doi.org/10.3390/smartciti es4020024 18. Eskerod, P., Hollensen, S., Morales-Contreras, M.F., Arteaga-Ortiz, J.: Drivers for pursuing sustainability through IoT technology within high-end hotels-an exploratory study. Sustainability 11, 5372 (2019). https://doi.org/10.3390/su11195372 19. Verma, A., Shukla, V.K., Sharma, R.: Convergence of IoT in Tourism Industry: A Pragmatic Analysis. J. Phys. Conf. Ser. 1714, 012037 (2021). https://doi.org/10.1088/1742-6596/1714/1/ 012037 20. Basana, S.R., Tarigan, Z.J.H., Suprapto, W., Andreani, F.: The effects of internet of things, strategic green purchasing and green operation on green employee behavior: evidence from hotel industry. Manag. Sci. Lett. 11, 6 (2021). https://doi.org/10.5267/j.msl.2021.4.006 21. Moyeenudin, H.M., Bindu, G., Anandan, R.: OTA and IoT influence the room occupancy of a hotel (2021). https://doi.org/10.1007/978-981-16-2934-1_17 22. Olya, H., Altinay, L., Farmaki, A., Kenebayeva, A., Gursoy, D.: Hotels’ sustainability practices and guests’ familiarity, attitudes and behaviours. J. Sustain. Tour. 29, 5622 (2021). https://doi. org/10.1080/09669582.2020.1775622
23. Chathoth, P.K., Mak, B., Jauhari, V., Manaktola, K.: Employees’ perceptions of organizational trust and service climate: a structural model combining their effects on employee satisfaction. J. Hosp. Tour. Res. 31, 338–357 (2007). https://doi.org/10.1177/1096348007299922 24. FutureLearn: The four pillars of sustainability (2017). https://www.futurelearn.com/info/cou rses/sustainable-business/0/steps/78337
Chapter 7
A Novel Knowledge Distillation Technique for Colonoscopy and Medical Image Segmentation
Indrajit Kar, Sudipta Mukhopadhyay, Rishabh Balaiwar, and Tanmay Khule
Abstract Since medical equipment has limited resources and little computational capacity, large segmentation models cannot be installed on it. Colonoscopy equipment, which has little computing power for deep learning models, is one such example. The solution is to compress cutting-edge models that have demonstrated outstanding diagnostic and prediction capabilities in segmentation. Knowledge distillation of large medical image segmentation models is the main focus of our study. Unlike other knowledge distillation papers, however, we search for the best student and teacher pair across varied network architectures. Even though the network architectures of our student and teacher pairs differ, the pairs still achieve better results in terms of segmentation score, pixel accuracy, intersection-over-union (IoU, Jaccard index), and Dice coefficient (F1 score). Overfitting is one of the main issues with this methodology, which we were able to minimize through depth-wise model pruning and hyperparameter tuning. Finally, we arrived at the Xception and EfficientNet-b0 pair for knowledge distillation, which can outperform state-of-the-art models' performance metrics on the Kvasir-SEG and CVC-ClinicDB datasets.
I. Kar · S. Mukhopadhyay (B) · R. Balaiwar · T. Khule Siemens Technology and Services Private Limited, Bengaluru, India e-mail: [email protected] I. Kar e-mail: [email protected] R. Balaiwar e-mail: [email protected] T. Khule e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_7
7.1 Introduction
Many deep learning technologies and algorithms are being explored for a wide range of applications as computer systems advance. Medical image segmentation is critical in many applications of computer-aided diagnosis because it allows for faster disease detection and treatment [1]. Medical image segmentation, which can be done manually, semi-automatically, or fully automatically, is the process of extracting an organ from a 2D or 3D medical image [2]. Colorectal cancer (CRC) is the second-deadliest cancer, and millions of people are estimated to die from it each year. As with any cancer, the longer treatment is delayed, the greater the risk [3]. Only a thorough examination of the colon and rectum can reveal polyps or cancer. Two important methods for detecting polyps are sigmoidoscopy and colonoscopy [4]. Even though colonoscopy is widely used and recommended for polyp detection, several studies have shown that polyps are frequently overlooked, with polyp miss rates ranging from 18 to 26% depending on the type and size of the polyps [5]. Polyp segmentation is a valuable yet difficult task: neoplastic lesions such as adenomas [6] can be discovered and removed before they develop into cancer, which lowers the morbidity and mortality of CRC. Despite colonoscopy's success in reducing the cancer burden, some issues remain [7]. By improving accuracy and precision and by minimizing manual involvement, a Computer-Aided Diagnosis (CADx) [8] system for polyp segmentation can help with monitoring and can improve diagnostic abilities. Artificial intelligence is well suited to this; however, both training and inference are challenging because:
• Deep neural networks depend upon millions or even billions of parameters. Training these networks for image segmentation is time-consuming and requires high computational power.
• To generate an immediate medical report, an edge AI-based diagnostics system that can perform real-time inference on resource-constrained sigmoidoscopy and colonoscopy machines would be required. Running AI workloads on these machines is difficult [9], especially segmentation workloads [10–13], because segmentation models are complex and time-consuming. Many of the best-performing semantic segmentation models are inappropriate for use on onboard platforms, where computational resources are limited and low-latency operation is critical.
The objective of the research is to develop compressed deep neural network models with the following characteristics.
• Small and robust: the model's capacity to consistently perform better on both simple and complicated images with a small network architecture.
• Generalization: the ability of a model trained with a particular procedure on a particular dataset to generalize to other polyp segmentation datasets.
• Computationally inexpensive: uses less computational power, achieving better results with fewer epochs.
To obtain an effective semantic segmentation model compression technique for medical images and videos, the authors use a compression technique called knowledge distillation. The main aim of our paper is to use different teacher and student networks for knowledge distillation. The authors have also carried out further substantial experiments: data evaluation, an in-depth examination of the best and worst scenarios, and a comparison of the suggested approach with other existing approaches. The literature review discusses existing polyp segmentation work in general; the methodology section describes how the proposed algorithm is implemented; the results section summarizes the observations and compares the efficiency of the proposed algorithm with SOTA techniques using performance metrics; and the conclusion section discusses the algorithm's scope and applications.
7.2 Literature Review
A body of studies applies deep learning to colonoscopy [14, 15], covering various state-of-the-art segmentation models. The challenge of obtaining smaller, faster, and more robust models for training and inference remains. We found numerous studies that performed knowledge distillation on medical images such as kidney, X-ray, MRI, and CT images. Few attempts have been made to distill knowledge for semantic segmentation on the Kvasir-SEG dataset. The closest related work we found is [16–18]. The authors of [16] propose a comprehensive student-teacher learning setup in which multiclass student networks are penalized using the class probabilities of a single-class teacher model trained on a larger dataset. In terms of mean average precision (mAP), their model performs better on the Kvasir-SEG dataset and the endoscopic disease detection (EDD2020) challenge dataset. However, they only detect bounding boxes using a Faster R-CNN algorithm. The XP-Net paper [17] proposes a hierarchical adversarial knowledge distillation method employing a deep learning network called “XP-Net” with an Effective Pyramidal Squeeze Attention (EPSA) module. The student network gains “complementary knowledge,” which helps to enhance network efficiency. By recording multi-scale spatial information of objects at a detailed level with long-range channel dependency, the compact EPSA block improves the current network design. The authors of [18] used knowledge distillation to enhance ResUNet++, which automatically segments polyps. According to their experiments, the KD-ResUNet++ model performs better than ResUNet++ in terms of the Jaccard index, Dice similarity coefficient, and recall. On the challenge’s official test dataset,
their top models obtained Jaccard index, Dice similarity coefficient, and FPS values of 0.6196, 0.7089, and 107.8797, respectively. Unlike our work, all of these studies perform KD on a single model. Knowledge distillation techniques can be roughly divided into categories based on the number of teachers: single-teacher, multi-teacher (co-distillation), and no teacher (commonly known as self-distillation). Single-teacher distillation is the simplest method: one teacher transmits knowledge to each student. In multi-teacher co-distillation, a student model draws on the knowledge of various pre-trained teachers. It can be used to address data privacy and the inability to share multiple data sources. Occasionally, multi-teacher KD can also be used to improve the performance of the student model. Our focus is on knowledge distillation from a single teacher, but our student models have distinct network architectures. The greatest difficulties in conducting this experimental study are overfitting and insufficient learning. We find the most compatible pair of teacher and student architectures.
7.3 Methodology
7.3.1 Dataset
• The Kvasir-SEG dataset [19] is a publicly available dataset containing 1000 polyp images with ground-truth masks. Image sizes range from 332 × 487 pixels to 1920 × 1072 pixels. For the current study, 800 of the 1000 images were randomly selected as the training dataset, the remaining 200 images were assigned to testing, and 20 images were selected from the testing dataset for model construction.
• CVC-ClinicDB [20] is an open-access dataset of 612 images from 31 colonoscopy sequences with a resolution of 384 × 288. It is used in medical image segmentation, specifically polyp detection in colonoscopy videos.
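The random 800/200 split described for Kvasir-SEG can be sketched as follows. This is only an illustration under stated assumptions: the image identifiers, the fixed seed, and the reading of the 20 held-out images as a validation subset drawn from the test pool are ours, not details given in the chapter.

```python
import random


def split_dataset(image_ids, n_train=800, n_val=20, seed=0):
    """Shuffle ids, take n_train for training, keep the rest for testing,
    and draw n_val ids from the test pool (assumed validation subset)."""
    rng = random.Random(seed)          # fixed seed (assumption) for reproducibility
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:]
    val = rng.sample(test, n_val)      # subset of the test images
    return train, test, val


# Hypothetical identifiers standing in for the 1000 Kvasir-SEG images.
ids = [f"img_{i:04d}" for i in range(1000)]
train_ids, test_ids, val_ids = split_dataset(ids)
print(len(train_ids), len(test_ids), len(val_ids))  # 800 200 20
```

Fixing the seed keeps the split stable across runs, which matters when teacher and student are trained separately but must be compared on the same test images.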
7.3.2 Image Preprocessing
To make better use of the data, a few augmentation techniques are applied to the dataset. Horizontal and vertical flips are applied only to the training dataset, while resizing is applied to both the training and validation datasets. Test-Time Augmentation (TTA) [21] applies suitable transformations to the test dataset as well in order to boost overall prediction performance. In TTA, each test image is augmented, producing several augmented images. The average prediction of every augmented
image is then used as the final output prediction, after the model has made a prediction on each of these augmented images. Both horizontal and vertical flip augmentations were used for TTA in this study.
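The TTA procedure described above can be sketched in plain Python (no external libraries). The function names and the toy model are hypothetical stand-ins, and including the unflipped image as one of the averaged views is our assumption; the chapter only states that horizontal and vertical flips were used.

```python
def hflip(img):
    """Mirror a 2D list left-to-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror a 2D list top-to-bottom."""
    return img[::-1]


def tta_predict(model, image):
    """Average predictions over the identity, horizontal-flip and vertical-flip views."""
    views = [lambda x: x, hflip, vflip]
    preds = []
    for transform in views:
        pred = model(transform(image))   # predict on the augmented view
        preds.append(transform(pred))    # flips are self-inverse: undo to realign
    height, width = len(image), len(image[0])
    return [[sum(p[r][c] for p in preds) / len(preds) for c in range(width)]
            for r in range(height)]


def toy_model(img):
    """Hypothetical stand-in for a segmentation network (deliberately not flip-equivariant)."""
    return [[min(1.0, 0.5 * v + 0.1 * c) for c, v in enumerate(row)] for row in img]


image = [[0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0]]
averaged = tta_predict(toy_model, image)
print(averaged)
```

Each prediction is flipped back before averaging so that all maps are aligned with the original pixel grid; averaging unaligned maps would blur the mask instead of stabilizing it.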
7.3.3 Base Neural Network Architectures
• UNet: State-of-the-art image segmentation models such as UNet [22] have an encoder-decoder architecture. These encoder-decoder segmentation networks have one thing in common: skip connections. The skip connections combine the deep, coarse-grained feature maps from UNet's decoder network with the low-level, fine-grained feature maps from the encoder network, preserving the fine details of the target object. This produces segmentation masks that can distinguish small elements on intricate backgrounds.
• UNet++: UNet is the tried-and-true architecture for medical segmentation [23], and UNet++ extends it. It is made up of an encoder sub-network, or backbone, and a decoder sub-network. The redesigned skip connections between the two sub-networks and the use of deep supervision distinguish UNet++ from UNet. Through the newly constructed skip pathways, the encoder and decoder sub-networks communicate with one another. In UNet, the encoder's feature maps go straight to the decoder. In UNet++, however, they pass through a dense convolution block whose number of layers is determined by the pyramid level.
7.3.4 Knowledge Distillation with Pairs of Different Student and Teacher Networks
Knowledge distillation transfers knowledge from a robust but heavy network to a lightweight model in order to improve the performance of the latter without reducing its efficiency. To improve the student network, it extracts knowledge from a teacher network that has been trained extensively. The original distillation techniques transfer only the logits of the final convolution layer. In this paper, however, we use different architectures for the student and teacher networks, unlike most papers, where the architecture remains the same (Fig. 7.1). The diagram depicts the two primary components of the holistic distillation framework. The Prediction Map Distillation unit helps the student network imitate the output of the teacher's final layer so that it rapidly develops segmentation capability. Finally, the segmentation task loss is included to ensure a baseline performance consistent with the input domain. The student network can thus perform its own segmentation task while simultaneously acquiring knowledge from the teacher.
Fig. 7.1 Knowledge distillation architecture of the proposed model
Figure 7.1 depicts the distillation architecture we developed, including the pipeline. The technique receives a Width × Height grayscale polyp image as input and returns a segmentation result of the same dimensions. The importance-map module conveys intermediate signals by building importance maps and region affinity maps, respectively, from left to right. Then, so that the student rapidly acquires segmentation competence, the Prediction Map Distillation module pushes the student network to replicate the output of the teacher network's last layer. Finally, the segmentation task loss is added to guarantee a baseline performance in line with the input domain. With this structure, the student network can perform the required segmentation and distill knowledge by itself. The importance maps encode the teacher network's feature maps into a transferable form. In particular, an arbitrary feature map of the student network, $m_s$, of dimensions $\mathrm{channel}_s$, $\mathrm{width}_s$, and $\mathrm{height}_s$, is paired with the teacher feature map $m_t$, of dimensions $\mathrm{channel}_t$, $\mathrm{width}_t$, and $\mathrm{height}_t$, taken from the same general area of the teacher network. The student feature map is then rescaled so that both feature maps have the same spatial scale. Mathematically, this step can be formulated as
\[ \hat{m}_s = f(m_s), \qquad \hat{m}_s \in \mathbb{R}^{\mathrm{channel}_s \times \mathrm{width}_t \times \mathrm{height}_t} \]
The choice of the scaling function $f(\cdot)$ depends on the relationship between the spatial sizes of $m_s$ and $m_t$. The core of knowledge distillation is to encourage the student network to learn from the teacher network by measuring the difference between the final-layer outputs (the output logits) of the two networks, using a function such as cross-entropy or Kullback–Leibler divergence. The Prediction Map Distillation unit is constructed to explicitly teach the student network to predict using the teacher network's output segmentation map. Here, we treat the segmentation map as a set of pixel-level classification tasks. Specifically, we compute a loss value for every pair of pixels at the same spatial location in the two networks, then aggregate these values to form the distillation loss for this unit. The loss function is given as
\[ \mathcal{L}_{\mathrm{PredMap}} = \frac{1}{N} \sum_{k \in N} \mathrm{KL}\left(p_k^{\mathrm{student}} \,\middle\|\, p_k^{\mathrm{teacher}}\right) \]
where $N = X \times Y$ denotes the number of pixels in the segmentation map, $\mathrm{KL}(\cdot)$ denotes the Kullback–Leibler divergence, and $p_k^{\mathrm{student}}$ and $p_k^{\mathrm{teacher}}$ are the probabilities of the $k$th pixel in the segmentation maps produced by the student and teacher models, respectively.
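The prediction-map loss above can be illustrated with a small, dependency-free sketch. It assumes binary segmentation, so each pixel carries a single foreground probability, and it follows the argument order of the formula as printed (student first). Many distillation implementations use the opposite order, KL(teacher ‖ student), so treat the direction as an assumption. The function names and the clipping constant `eps` are ours.

```python
import math


def kl_bernoulli(p, q, eps=1e-7):
    """KL(p || q) between two Bernoulli distributions given foreground probabilities."""
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def pred_map_loss(student_map, teacher_map):
    """Mean per-pixel KL(student || teacher) over two same-sized probability maps."""
    total, n_pixels = 0.0, 0
    for s_row, t_row in zip(student_map, teacher_map):
        for s, t in zip(s_row, t_row):
            total += kl_bernoulli(s, t)
            n_pixels += 1
    return total / n_pixels


# Toy 2 x 2 probability maps (illustrative values only).
student = [[0.9, 0.2], [0.1, 0.8]]
teacher = [[0.8, 0.3], [0.1, 0.9]]
print(pred_map_loss(student, teacher))
```

The loss is zero when the two maps agree at every pixel and grows as the student's per-pixel probabilities drift away from the teacher's.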
7.3.5 Experimental Setup
7.3.5.1 Encoder-Decoder Models
Inspired by Inception [24], Xception [25] replaces the Inception modules with depthwise separable convolutions, which enables it to outperform Inception. Inception has been used for image segmentation before with decent results. With 22 M parameters, Xception is a heavy network, which makes it a sensible choice of teacher model for knowledge distillation. We paired this teacher with several different students, of which EfficientNet-b0 [26] performed best. The base EfficientNet-b0 network is built from squeeze-and-excitation blocks and the inverted bottleneck residual blocks of MobileNetV2. The Xception model was initialized with ImageNet weights and trained for 8 epochs, at which point early stopping ended training. Its final validation loss was 0.0308, and its average IoU score was 0.9459. On unseen data (the testing set), the score for accuracy was 0.0309 and the IoU score was 0.9474. We then trained the student, EfficientNet-b0, using this Xception model for 150 epochs. The student's final validation loss (KD loss) was 0.5719, and its average IoU score was 0.9551. On unseen data (the testing set), the accuracy was 0.3518 and the IoU score was 0.9562. In terms of accuracy, the student model is better than the teacher.
While training the architectures, we performed pruning and hyperparameter tuning on the networks. All models were trained with the Adam optimizer at a learning rate of 0.0001; early stopping was used for the teacher but not for the student. The loss function was a combination of a distillation loss, namely the KL divergence between the teacher's soft outputs and the student's soft outputs, and the student's Dice loss. The temperature and alpha were set to 5 and 0.9, respectively. In addition, augmentation techniques such as horizontal and vertical flips, each with probability p = 0.5, were applied to enhance the data. Input images were resized to 512 × 608.
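The combined objective described above can be sketched as a weighted sum of a temperature-softened KL term and a Dice term. The chapter does not spell out the exact weighting, so the form `alpha * distillation + (1 - alpha) * dice` is an assumption, as are the use of a sigmoid for binary outputs, the KL argument order (student first, matching the formula in Sect. 7.3.4), the Dice smoothing constant, and all function names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def kl_bernoulli(p, q, eps=1e-7):
    """KL(p || q) between two Bernoulli distributions, with clipping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice coefficient over flattened probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)


def kd_loss(student_logits, teacher_logits, targets, temperature=5.0, alpha=0.9):
    """alpha * softened KL(student || teacher) + (1 - alpha) * student Dice loss (assumed form)."""
    soft_s = [sigmoid(z / temperature) for z in student_logits]   # student soft outputs
    soft_t = [sigmoid(z / temperature) for z in teacher_logits]   # teacher soft outputs
    distill = sum(kl_bernoulli(s, t) for s, t in zip(soft_s, soft_t)) / len(soft_s)
    hard = dice_loss([sigmoid(z) for z in student_logits], targets)
    return alpha * distill + (1.0 - alpha) * hard


# Flattened toy example: four pixels, two of them polyp (illustrative values only).
teacher_logits = [4.0, -4.0, 4.0, -4.0]
targets = [1, 0, 1, 0]
good = kd_loss([4.0, -4.0, 4.0, -4.0], teacher_logits, targets)
bad = kd_loss([-4.0, 4.0, -4.0, 4.0], teacher_logits, targets)
print(good, bad)
```

Dividing the logits by the temperature flattens the probabilities, so the student also learns from how confident the teacher is at each pixel rather than only from its hard decision.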
7.3.5.2 Evaluation Metrics
The Dice coefficient is frequently used for assessment in medical image segmentation problems. The Dice score reported in our experiments is the Dice coefficient computed for each example, which is important for the applicability of the segmentation task. The Dice coefficient is defined as
\[ \mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \]
where A and B stand for the predicted mask and the ground truth, respectively. We also use mean IoU, a common evaluation metric for image segmentation. IoU, which ranges from 0 to 1, is the overlap between the predicted and ground-truth regions:
\[ \mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} \]
where A and B are the ground-truth region and the predicted region. The mean IoU score is obtained by computing the IoU of each class and averaging the results. The pixel loss measures how far the pixels of the predicted image diverge from the pixels of the target image. It is useful for understanding pixel-level interpolation.
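As a worked illustration of the two formulas, the sketch below computes Dice and IoU for a pair of small binary masks in plain Python. The masks are made-up values, and returning 1.0 when both masks are empty is a convention we assume rather than something stated in the chapter.

```python
def _counts(pred, truth):
    """Return (|A ∩ B|, |A|, |B|) for two same-sized binary masks given as 2D lists."""
    inter = size_a = size_b = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            size_a += 1 if p else 0
            size_b += 1 if t else 0
    return inter, size_a, size_b


def dice_coefficient(pred, truth):
    inter, size_a, size_b = _counts(pred, truth)
    if size_a + size_b == 0:
        return 1.0                      # both masks empty: treated as perfect agreement
    return 2.0 * inter / (size_a + size_b)


def iou_score(pred, truth):
    inter, size_a, size_b = _counts(pred, truth)
    union = size_a + size_b - inter     # |A ∪ B| = |A| + |B| - |A ∩ B|
    if union == 0:
        return 1.0
    return inter / union


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(round(dice_coefficient(pred, truth), 3), iou_score(pred, truth))  # 0.667 0.5
```

Here the masks share 2 pixels, each has 3 foreground pixels, and their union has 4, giving Dice = 4/6 and IoU = 2/4. The two metrics are related by Dice = 2·IoU / (1 + IoU), so they always rank predictions in the same order.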
7.4 Results and Discussion
We first applied our distillation architecture to a variety of teacher and student pairs. The best results were obtained with Xception (teacher) and EfficientNetB4 (student), with UNet and UNet++ as backbone networks. We observed that every student network improved on its own baseline by learning from a teacher network.
The performance metrics obtained from the teacher and student networks are shown in Figs. 7.2 and 7.3. Popular benchmark datasets for polyp segmentation, such as Kvasir-SEG and CVC-ClinicDB, have been the subject of extensive testing. In the pixel-level segmentation maps, the background appears as the purple region and the target as the yellow region.
7.5 Conclusion
In this study, we provide a unique distillation technique adapted to the problem of polyp segmentation. Our student and teacher pair performs better than a regular UNet and UNet++ pair. We mitigate overfitting by reducing the student network depth-wise and by keeping the student architecture different from the teacher architecture. This opens new avenues in knowledge distillation. Finally, we provide a benchmark for knowledge distillation on the Kvasir-SEG and CVC-ClinicDB datasets.
7.6 Future Work
Because trials with real data are typically quite time-consuming, sometimes taking days to complete a single run, many modifications, tests, and investigations have been deferred. We plan to continue our research into the mechanisms behind knowledge distillation, with an eye toward developing methods for using a predicted teacher mask as the ground-truth mask for a student model via co-distillation.
Fig. 7.2 Comparative study of performance metrics from our proposed network
Fig. 7.3 Results from (a) the Kvasir-SEG dataset and (b) the CVC-ClinicDB dataset; (c) comparison of performance metrics of the best model for the two datasets
References
1. Wang, R., et al.: Medical image segmentation using deep learning: a survey. IET Image Process. 16(5), 1243–1267 (2022)
2. Roth, H.R., Oda, H., Zhou, X., Shimizu, N., Yang, Y., Hayashi, Y., et al.: An application of cascaded 3D fully convolutional networks for medical image segmentation. Comput. Med. Imag. Graph. 66, 90–99 (2018)
3. Masud, M., et al.: A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors 21(3), 748 (2021)
4. Brenner, H., Stock, C., Hoffmeister, M.: Effect of screening sigmoidoscopy and screening colonoscopy on colorectal cancer incidence and mortality: systematic review and meta-analysis of randomised controlled trials and observational studies. BMJ 128, 348 (2014)
5. Asano, T.K., McLeod, R.S.: Dietary fibre for the prevention of colorectal adenomas and carcinomas. Cochrane Database Syst. Rev. 1, CD003430 (2002)
6. Sivananthan, A., Glover, B., Ayaru, L., Patel, K., Darzi, A., Patel, N.: The evolution of lower gastrointestinal endoscopy: where are we now? Therap. Adv. Gastrointest. Endosc. 13, 2631774520979591 (2020)
7. Eu, C.Y., Tang, T.B., Lin, C.H., Lee, L.H., Lu, C.K.: Automatic polyp segmentation in colonoscopy images using a modified deep convolutional encoder-decoder architecture. Sensors 21(16), 5630 (2021)
8. Brandao, P.: Enhancing endoscopic navigation and polyp detection using artificial intelligence (Doctoral dissertation, UCL (University College London)) (2021)
9. Zhang, J., Tao, D.: Empowering things with intelligence: a survey of the progress, challenges, and opportunities in artificial intelligence of things. IEEE Internet Things J. 8(10), 7789–7817 (2020)
10. Feng, J., Li, S., Li, X., Wu, F., Tian, Q., Yang, M.H., Ling, H.: Taplab: a fast framework for semantic video segmentation tapping into compressed-domain knowledge. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1591–1603 (2020)
11. Wang, W., Zhou, T., Porikli, F., Crandall, D., Van Gool, L.: A survey on deep learning techniques for video segmentation. arXiv preprint arXiv:2107.01153 (2021)
12. Xie, J., Shuai, B., Hu, J.F., Lin, J., Zheng, W.S.: Improving fast segmentation with teacher-student learning. arXiv preprint arXiv:1810.08476 (2018)
13. Li, G., Yun, I., Kim, J., Kim, J.: Dabnet: depth-wise asymmetric bottleneck for real-time semantic segmentation. arXiv preprint arXiv:1907.11357 (2019)
14. Sanchez-Peralta, L.F., Bote-Curiel, L., Picon, A., Sanchez-Margallo, F.M., Pagador, J.B.: Deep learning to find colorectal polyps in colonoscopy: a systematic literature review. Artif. Intell. Med. 108, 101923 (2020)
15. Kayes, M.I.: A lightweight and robust convolutional neural network for carcinogenic polyp identification (Doctoral dissertation, University of Science and Technology) (2021)
16. Chavarrias-Solano, P.E., Teevno, M.A., Ochoa-Ruiz, G., Ali, S.: Knowledge distillation with a class-aware loss for endoscopic disease detection. In: MICCAI Workshop on Cancer Prevention Through Early Detection, pp. 67–76. Springer, Cham (2022)
17. Sivaprakasam, M.: XP-Net: An Attention Segmentation Network by Dual Teacher Hierarchical Knowledge Distillation for Polyp Generalization (2022)
18. Kang, J., Gwak, J.: KD-ResUNet++: automatic polyp segmentation via self-knowledge distillation. In: MediaEval (2020)
19. Jha, D., Smedsrud, P.H., Riegler, M.A., Halvorsen, P., Lange, T.D., Johansen, D., Johansen, H.D.: Kvasir-seg: a segmented polyp dataset. In: International Conference on Multimedia Modeling, pp. 451–462. Springer, Cham (2020)
20. Chlap, P., et al.: A review of medical image data augmentation techniques for deep learning applications. J. Med. Imag. Radiat. Oncol. 65(5), 545–563 (2021)
21. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019)
22. Patel, K., Bur, A.M., Wang, G.: Enhanced U-Net: a feature enhancement network for polyp segmentation. In: Proceedings of the 2021 18th Conference on Robots and Vision (CRV), pp. 181–188. IEEE (2021)
23. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: Unet++: a nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11. Springer, Cham (2018)
24. Cahall, D.E., Rasool, G., Bouaynaya, N.C., Fathallah-Shaykh, H.M.: Inception modules enhance brain tumor segmentation. Front. Comput. Neurosci. 13, 44 (2019)
25. Chahal, E.S., Patel, A., Gupta, A., Purwar, A.: Unet based exception model for prostate cancer segmentation from MRI images. Multimedia Tools Appl. 81(26), 37333–37349 (2022)
26. Tan, M., Le, Q.: Efficient net: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
Chapter 8
AI-Enabled Smart Monitoring Device for Image Capturing and Recognition Valluri Padmapriya, M. Prasanna, Kuruba Manasa, Rallabandi Shivani, and Sai Bhargav
Abstract This paper presents an AI-powered Smart Monitoring Device for real-time applications with multiple capabilities, including, as case 1, face recognition for age and gender and recognition of vehicle registration plates (commonly known as number plates). The proposed system is integrated into the AI-based Smart Identification Engine and uses an IP-enabled camera to capture real-time information in public places. We use a convolutional neural network (CNN) in a Python-based deep learning module that extracts streams, identifies and processes images, and collects snapshots from videos before storing the data in the cloud. Case 2: the AI-enabled Smart Digital Monitoring Engine (SDME) is an implementation of the CNN deep learning model that identifies and processes images and videos, and it can be used in various sectors, such as targeted advertising in malls and automated identification in gated communities through face and vehicle number plate recognition.
8.1 Introduction
Artificial intelligence (AI) is the fastest-developing computer science technology used to build intelligent devices that reduce human effort [1]. Artificial intelligence is everywhere and has great potential for the future. AI is created when a machine can perform human-like functions such as learning, thinking, and problem-solving.

V. Padmapriya (B) · M. Prasanna · K. Manasa · R. Shivani · S. Bhargav
Bhavan’s Vivekananda College, Sainikpuri, Secunderabad, Telangana, India
e-mail: [email protected]; [email protected]
M. Prasanna e-mail: [email protected]
V. Padmapriya
Department of Computer Science-GITAM School of Science, GITAM Deemed to be University, Visakhapatnam, Andhra Pradesh, India
M. Prasanna
Department of Physics & Electronics-GITAM School of Science, Gitam Deemed to be University, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_8
AI helps in the development of software devices that can handle real-world problems with great accuracy, for example in traffic management and marketing. AI contributes to the creation of personal virtual assistants such as Cortana, Google Assistant, and Siri. Robots are also built to reduce hazards to humans. AI opens doors to many other technologies and new opportunities in the future. The primary goal of artificial intelligence is to imitate human intelligence in order to solve complex problems and to develop systems that exhibit intelligent behavior, learn new things on their own, and explain and advise users. Artificial intelligence is becoming important, with several uses in today’s world [2], as it can tackle complex problems in various fields, including healthcare, entertainment, banking, and education. AI is making our daily lives more convenient and efficient. It is used to recognize faces, create targeted ads, test commercials, and optimize performance. AI can read, write, interpret words, sense its surroundings, and perceive and recognize objects. It can identify patterns in data and use them to make predictions, continually improving its accuracy with more data. Once trained by humans, AI can continue to learn and develop on its own. Some of the industries that use artificial intelligence are shown in Fig. 8.1.
Fig. 8.1 AI usage in industries
Deep learning is a form of machine learning that uses neural networks modeled on the human brain [3]. It employs many layers of nonlinear processing units to extract and transform features. Each succeeding layer takes the output of the preceding layer as input, and deep learning is used when there is a large number of inputs and outputs. Deep learning is accomplished with neural networks [4], which are inspired by biological neurons (brain cells). Convolutional neural networks (CNNs) are mostly used for image classification, image clustering, and object identification [4]. One of their key applications is face recognition. Deep convolutional neural networks are preferred over other neural networks for achieving the highest accuracy. CNNs are a form of feed-forward artificial neural network whose connection pattern is inspired by the visual cortex. Consider an image represented as a cuboid with dimensions of length, width, and height. Here the third dimension is given by the red, green, and blue channels, as illustrated in Fig. 8.2. To create a convolutional neural network, an extra layer known as the convolutional layer is added, which gives an “eye” to the artificial intelligence or deep learning model. It allows the network to take a 3D frame or image as input, whereas an earlier artificial neural network could only take an input vector of features. In this case, a convolutional layer is added at the front, which is capable of “seeing” pictures in a way similar to humans.
TensorFlow is a popular deep learning framework created by the Google team. It is an open-source software library with a Python API. TensorFlow can train and run deep neural networks for image recognition, handwritten digit classification, recurrent neural networks, word embedding, natural language processing, video detection, and many other tasks. TensorFlow runs on a variety of CPUs and GPUs, as well as on mobile operating systems, which makes it widely deployable. The TensorFlow library combines many APIs to build large-scale deep learning architectures, such as CNNs or recurrent neural networks (RNNs). As a graph-based computing platform, TensorFlow enables developers to build neural networks and inspect them with TensorBoard, a tool that assists in debugging the program; both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) can be used to execute it [5]. Use cases of TensorFlow are shown in Fig. 8.3.
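To make the description of the convolutional layer concrete, the sketch below treats an RGB image as a height × width × 3 cuboid and slides a single filter over it. It is a plain-Python illustration, not TensorFlow code, and the image values, the filter, and the function name are our own. As in most deep learning frameworks, the operation shown is technically cross-correlation, with no padding and a stride of 1.

```python
def conv2d_single_filter(image, kernel):
    """Slide a kh x kw x C filter over an H x W x C image (no padding, stride 1).

    Returns one (H - kh + 1) x (W - kw + 1) feature map.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    k_height, k_width = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(height - k_height + 1):
        row = []
        for j in range(width - k_width + 1):
            acc = 0.0
            for di in range(k_height):
                for dj in range(k_width):
                    for ch in range(channels):      # sum over the R, G, B channels
                        acc += image[i + di][j + dj][ch] * kernel[di][dj][ch]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# A 4 x 5 RGB image: dark on the left three columns, bright on the right two.
columns = [0, 0, 0, 1, 1]
image = [[[v, v, v] for v in columns] for _ in range(4)]

# A 3 x 3 x 3 filter that responds to dark-to-bright transitions from left to right.
kernel = [[[-1, -1, -1], [0, 0, 0], [1, 1, 1]] for _ in range(3)]

print(conv2d_single_filter(image, kernel))  # [[0.0, 9.0, 9.0], [0.0, 9.0, 9.0]]
```

The filter outputs 0 over the uniformly dark region and 9 wherever its window straddles the dark-to-bright edge, which is the sense in which a convolutional layer "sees" local structure that a flat input vector would hide.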
Fig. 8.2 Cuboid
V. Padmapriya et al.
Fig. 8.3 Use cases of TensorFlow
TensorFlow compares favorably with other prominent deep learning frameworks in terms of functionality and features, and it is used to construct multi-layered, large-scale neural networks [5]. In this paper, we work on image processing through two frameworks. In the first framework, video is recorded using an IP webcam, and a person's age and gender are detected using OpenCV and TensorFlow along with other modules. After blurry images are separated out, the captured video is used to play relevant advertisements based on the estimated gender and age. The second framework identifies the license plate of a vehicle and compares it with the license plates stored in a database (MongoDB). If a match is found, the vehicle is identified as belonging to a specific community. Face recognition and license plate identification rely heavily on AI and deep learning. As an illustration, consider a shopping mall with thousands of daily visitors. To make the mall a better location for advertising, advertisements can be displayed on large LCD screens for greater visibility, so that people nearby can view them once the framework is set up. For example, if a mall visitor is female and between the ages of 25 and 50, an advertisement for lipstick may be presented to her. In this way, the relevance of the advertisements can be improved.
8.2 Literature Review Existing advertising systems were examined and their functions compared. The Yahoo smart billboard uses the globalization idea, which prioritizes the majority in order to attract greater attention from passersby [1]. Based on this, we studied an AI-based advertising system that can identify faces in public settings and categorize them by age and gender [1]. It can also identify objects with the help of a camera and display advertisements relevant to the individual. Face recognition
8 AI-Enabled Smart Monitoring Device for Image Capturing …
Fig. 8.4 Classification of deep learning [4]
software that uses deep learning [1], convolutional neural networks, and various other modules depends critically on artificial intelligence [2]. The billboard advertisements studied in that work draw the attention of nearby people. This personalized advertising method is used throughout the nation to take advantage of people's interests and target them [1]. Playing many targeted advertisements in this way is regarded as an effective strategy for advertising and boosting sales [1]. Deep learning and artificial intelligence can also be used together to perform vehicle number plate recognition, making an area safer (Fig. 8.4). The integration of AI and IoT has transformed conventional retail stores into modern, intelligent retail stores. This transformation has improved the shopping experience for customers, making it easier and more convenient, while also optimizing the supply chain. AI has allowed the creation of machines that can monitor human senses such as sight, hearing, taste, smell, and touch. These machines have produced a stronger connection between consumers and brands, as well as an improved product-brand association in the e-commerce industry [3]. The Sparse Representation Classifier (SRC) is a well-regarded method in face and pattern recognition because of its strong performance and its resilience against occlusion and noise. SRC finds a sparse representation of a test sample as a combination of training samples through L1-minimization. It then selects the most efficient subset of training samples to represent the test sample and disregards the rest. Unlike other methods, SRC does not require training for classification and can handle new face data in the training set without retraining the model [6]. The facial recognition model combines a two-layer deep CNN and SRC for feature extraction and classification, respectively.
The use of SRC improves classification even when a basic feature extraction method is used. The system demonstrates that selecting an appropriate feature space can enhance the performance of SRC. The performance of SRC, which aims to build a training dictionary that sparsely represents the test image, is also affected by the size of the dataset [7].
Work on low-resolution face recognition (LRFR) covers (1) super-resolution methods, (2) comparison of virtual and real low-resolution (LR) faces, (3) face identification, and (4) face re-identification. These approaches address the mismatch in face image quality between probe and gallery images. A new deep convolutional generative adversarial network (DCGAN) pre-training strategy is introduced for better network visualization and performance on larger datasets [8].
8.3 AI-Based Smart Monitoring Engine The system design and training architecture of the Smart Monitoring Engine we have designed are as follows:
8.3.1 System Design The system architecture flows through image capture, face detection, age and gender classification based on the detected faces, age- and gender-based ad playback, and vehicle number plate detection.
a. An IP webcam is used to access the camera and record video or photos for content monitoring. The images are then processed by the engine, stored in a database, and retrieved for further use. In the image processing step, several types of objects, including faces, are identified in a single image or frame. The engine then estimates gender and age from the faces. The output is sent to the database holding the advertisements, where appropriate advertisements are selected based on a person's age and gender and displayed in shopping malls [9].
b. In a separate framework [2], the vehicle's license plate number is identified, deciphered, and stored in a database. Matching the number plate against stored plates determines whether the vehicle belongs to the community. Recognition is performed using pre-trained models.
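The advertisement selection step in (a) can be sketched as a simple rule lookup. This is a minimal illustration under stated assumptions, not the authors' implementation: the `AdRule` structure, the gender labels, and every rule except the lipstick example (female, ages 25 to 50, taken from the introduction) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdRule:
    gender: str    # predicted gender label, e.g. "female" (assumed labels)
    min_age: int   # inclusive lower bound of the predicted age
    max_age: int   # inclusive upper bound of the predicted age
    ad: str        # advertisement to play


# Hypothetical catalogue; only the first rule mirrors the chapter's example.
AD_RULES = [
    AdRule("female", 25, 50, "lipstick"),
    AdRule("male", 25, 50, "shaving kit"),
]
DEFAULT_AD = "general mall promotion"  # assumed fallback


def select_ad(gender, age, rules=AD_RULES):
    """Return the first advertisement whose rule matches the prediction."""
    for rule in rules:
        if rule.gender == gender and rule.min_age <= age <= rule.max_age:
            return rule.ad
    return DEFAULT_AD
```

For example, `select_ad("female", 30)` returns `"lipstick"`, while a prediction that matches no rule falls back to the default advertisement. In the described system the rules would live in the advertisement database rather than in code.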
8.3.2 Training Architectures The age and gender recognition models are trained separately using Keras and TensorFlow. The task of these models is to detect faces in camera photos and determine each person's age and gender. The architecture combines face detection with face alignment because, according to preliminary research, aligned faces provide better features for face classification. Age estimation is treated as a multi-class problem in which ages are grouped into classes; because facial features differ across age groups, assigning photographs to the correct group is challenging.
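Treating age estimation as a multi-class problem requires mapping each age in years to a class index before training. The sketch below shows one way to do this; the bin edges and labels are assumptions made for illustration, since the chapter does not list the age groups it uses.

```python
from bisect import bisect_right

# Hypothetical age groups (not taken from the chapter).
# AGE_BIN_EDGES holds the first age of every group after the first one.
AGE_BIN_EDGES = [13, 20, 30, 40, 50, 60]
AGE_LABELS = ["0-12", "13-19", "20-29", "30-39", "40-49", "50-59", "60+"]


def age_to_class(age):
    """Map an age in years to the index of its age group."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many group boundaries are <= age.
    return bisect_right(AGE_BIN_EDGES, age)


def age_to_label(age):
    """Human-readable name of the age group."""
    return AGE_LABELS[age_to_class(age)]
```

The class index can then serve as the target of a softmax output layer, while the label is convenient for display next to the detected face.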
Different methodologies are used to determine the age and gender of multiple faces. The convolutional network extracts features from the face image, and the image is assigned to one of the age groups based on the provided models. The UTK (UTKFace) dataset, in .csv format, includes age, gender, and pictures. There has been extensive research on determining gender and age from photographs, and various approaches have been used over the years. Keras is the high-level interface of the TensorFlow library and is used for deep learning because its ease of use, abstraction, and extensibility allow easy and rapid prototyping [6]. It supports convolutional and recurrent networks and runs on both CPUs and GPUs. For vehicle number plate recognition, we use pre-trained models and detection techniques to locate the plates and recognize them, which involves extracting text from an image. Recognizing a vehicle's license plate is a crucial task for any security system that relies on video monitoring. Using computer vision methods, we extract the number plate from a picture and then apply optical character recognition to read the license number. For this, we use the TensorFlow and OpenCV modules. The process is complete once the number extracted from the license plate has been compared with the existing database to identify the vehicle owner's name.
8.4 Practical Implementation 8.4.1 Age and Gender Detection During gender and age recognition, the system captures facial images as input data. The photos are pre-processed before being sent to the face detection algorithms [9]. To optimize recognition, the precise face location is calculated and the face is cropped from the superfluous background. The cropped facial picture is then passed to the feature extraction method. The workflow is shown in Fig. 8.5, and the flow chart for age and gender detection is shown in Fig. 8.6. UML Diagrams (Play ads)
• UML stands for Unified Modeling Language. UML is a standardized, general-purpose modeling language in the field of object-oriented software engineering, created and managed by the Object Management Group.
• The goal is for UML to become a common language for creating models of object-oriented computer software. Currently, UML consists of two major components: a meta-model and a notation. In the future, some form of method or process may also be added to or associated with UML.
Use Case Diagram (Fig. 8.7)
Fig. 8.5 The workflow
Fig. 8.6 The flow chart for age and gender detection
Activity Diagram (Fig. 8.8) Sample results of gender and age recognition are shown in Fig. 8.9.
8.4.2 Vehicle Number Plate Recognition Three fundamental events must occur for software to identify and recognize a vehicle number:
• Using a car picture as input: The application uses a car image as input to detect the vehicle number plate.
Fig. 8.7 Use case diagram of predicting age and gender
• Processing the input: The photograph obtained as input is processed to detect the region containing the vehicle number plate.
• Recognizing the number plate: The characters on the detected number plate are extracted from the number plate picture.
The flow chart for this framework is shown in Fig. 8.10, the use case diagram in Fig. 8.11, and the activity diagram in Fig. 8.12. After the vehicle number plate is cropped, the plate number is compared with the vehicle numbers stored in the database. If there is a match, the vehicle is allowed to enter the gated community, since it belongs to an owner or tenant of the neighborhood. The output is shown in Fig. 8.13.
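The final comparison between the recognized plate text and the stored vehicle numbers can be sketched as follows. This is an illustrative assumption rather than the authors' code: the chapter keeps the registered plates in MongoDB, whereas the sketch uses an in-memory set, and the plate strings and function names are hypothetical. Normalizing the text first makes the match robust to the spaces, hyphens, and letter case that OCR output often varies in.

```python
import re


def normalize_plate(text):
    """Upper-case the text and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


# Hypothetical registered plates standing in for the database collection.
REGISTERED_PLATES = {
    normalize_plate(plate) for plate in ["TN 09 AB 1234", "KA-01-MX-4321"]
}


def is_community_vehicle(ocr_text, registered=REGISTERED_PLATES):
    """True if the OCR result matches a registered plate after normalization."""
    return normalize_plate(ocr_text) in registered
```

With these assumed entries, `is_community_vehicle("tn09 ab-1234")` is true and an unregistered plate is rejected. Exact matching is the simplest policy; a deployed system might additionally tolerate single-character OCR errors.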
Fig. 8.8 Activity diagram for age and gender classification
Fig. 8.9 The sample results of gender and age recognition
Fig. 8.10 Flow chart to identify and recognize a vehicle number
Fig. 8.11 Use case diagram for vehicle number plate recognition
Fig. 8.12 Activity diagram for vehicle number plate recognition
Fig. 8.13 The sample output of cropped number plates
8.5 Conclusion The intelligent agent for the monitoring engine introduced in this paper offers a more personalized advertising experience for both advertisers and consumers. Its ability to target advertising based on an individual's age and gender, together with its suitability for public areas such as malls, makes it a promising advertising solution. The vehicle number plate recognition system adds further value in gated communities by supporting security and helping to prevent unauthorized entry. In the future, the intelligent agent has the potential to become even more sophisticated. By incorporating AI technologies such as machine learning and natural language processing, the system can further personalize the advertising experience and improve its relevance to the individual. Additionally, integration with other IoT devices, such as smart homes and wearable devices, could provide more data points for a more comprehensive profile of the individual. The vehicle number plate recognition system could also be integrated with facial recognition technology to improve the accuracy of identifying individuals. With these enhancements, the system could offer even greater benefits to advertisers and consumers alike. The entire setup can be built as a compact stand-alone device using a Raspberry Pi and a Pi camera together with the smart monitoring agents we have developed.
References
1. Chin Poo Lee: AI based targeted advertising system. https://www.researchgate.net/publication/330997856_AI-based_targeted_advertising_system (2019). Accepted 30 Dec 2018
2. Verma, S.: Artificial intelligence in marketing. Int. J. Inform. Manag. Data Insights 1, 100002 (2021)
3. Schofield, D., Nagrani, A., Zisserman, A., Hayashi, M., Matsuzawa, T., Biro, D., Carvalho, S.: Chimpanzee face recognition from videos in the wild using deep learning. Sci. Adv. 5(9), eaaw0736 (2019)
4. Li, L., Mu, X., Li, S., Peng, H.: A review of face recognition technology. IEEE Access 14(99), 1 (2020)
5. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: a system for large-scale machine learning (2016)
6. Wright, A., Yang, Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
7. Cheng, E.J., Chou, K.P., Rajora, S., Jin, B.H., Tanveer, M., Lin, C.T., Young, K.Y., Lin, W.C., Prasad, M.: Deep sparse representation classifier for facial recognition and detection system. Pattern Recognit. Lett. 125, 71–77 (2019)
8. Li, P., Prieto, L., Mery, D., Flynn, P.J.: On low resolution face recognition in the wild: comparisons and new techniques. IEEE Trans. Inform. For. Sec. 14(8), 2000–2012 (2019)
9. Zhao, K., Xu, J., Cheng, M.: Regularface: deep face recognition via exclusive regularization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1136–1144 (2019)
Chapter 9
A Compact Formulation for the mDmSOP: Theoretical and Computational Time Analysis Ravi Kant and Abhishek Mishra
Abstract The multi-Depot multiple Set Orienteering Problem (mDmSOP) is a recently proposed variant of the Set Orienteering Problem (SOP) with real-life applications such as product delivery and mobile crowd-sensing. The objective of the problem is to collect the maximum profit from clusters within a given budget. In this paper, we propose an improved integer linear programming (ILP) formulation of the mDmSOP and conduct a time analysis of the results. We solved it using GAMS 39.2.0 and found that a large number of constraints can be eliminated by changing only the sub-tour elimination constraints. For small instances of up to 76 vertices, the improved formulation gives better results in all test cases except one instance of 16eil76 when $w < 0.5$. When $w = 0.5$, it gives better results in 93.33% of cases for small instances and in 88.23% of cases for mid-size instances of up to 198 nodes. Keywords Traveling Salesman Problem (TSP) · Set Orienteering Problem (SOP) · Sub-tour elimination constraints (SEC)
R. Kant (B) · A. Mishra Department of CSIS, Birla Institute of Technology and Science, Pilani 333031, India e-mail: [email protected] A. Mishra e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_9

9.1 Introduction Sub-tour elimination constraints (SECs) play a vital role in solving NP-hard routing problems. The Traveling Salesman Problem (TSP) is one of the best-known NP-hard routing problems and was first formulated mathematically by Hamilton [9]. SECs were first introduced for the TSP by Dantzig et al. [6] in 1954; their formulation has $O(2^n)$ constraints for an input matrix of size $n \times n$ and uses subsets of vertices to remove sub-tours from the path. Oncan et al. [13] compared different proposed TSP formulations based on the number of variables and constraints. In 1960, Miller–Tucker–Zemlin (MTZ) [11] proposed a new SEC for the TSP that needs only $O(n^2)$ constraints, so that problems can be solved in less time in cases where the formulation of Dantzig et al. [6] is not feasible to solve. Velandnitsky [17] showed that the MTZ formulation also contains the polytope of the Dantzig–Fulkerson–Johnson (DFJ) TSP formulation. The use of extra variables is a key component of the MTZ formulation. These variables, called node potentials, are denoted $u_p$ for a node $p$ and are defined for all nodes except the depot. If the traveler traverses the edge from node $p$ to node $q$, the node potential of $q$ must be greater than the node potential of $p$. This condition ensures that travelers do not get trapped in a cycle. The depots are excluded from this condition and are assigned no node potential, so that travelers can always end their trip at the depot. If the depots were included, the condition could not be satisfied, because the same depot would need two different potential values. Desrochers et al. [7] improved the MTZ formulation so that it can be used to solve various vehicle routing problems. Many researchers have suggested different SECs for the TSP, the Orienteering Problem (OP), and the SOP, but MTZ-based formulations remain the most commonly used to date. Campuzano et al. [5], Sherali et al. [15], and Sawik et al. [14] used the MTZ formulation for the multiple Traveling Salesman Problem. Vansteenwegen et al. [16] and Yuan et al. [18] used MTZ-based SECs to define the OP and a routing problem with time windows, respectively. Bektas [3] and Bektas et al. [4] formulated the multiple Traveling Salesman Problem with MTZ-based SECs. Bazrafshan et al. [2] studied SECs and concluded that it is not known in advance which formulation will work better for a given problem unless a relative comparison is made to check its suitability. The ILP formulation proposed in this paper is inspired by the formulation of Miller et al. [11] for the TSP, with adjustments so that it gives correct results for the mDmSOP: we associated a node potential $u_{jp}$ with every traveler $j$ (Eq. (9.11)), since every traveler has a different starting depot. This paper is organized as follows. A formal description and the improved ILP of the problem are presented in Sect. 9.2. Comparative results are shown in Sect. 9.3, and the conclusion is presented in Sect. 9.4.
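The node-potential argument behind the MTZ constraints can be checked on a toy example. The sketch below is an illustration under stated assumptions, not part of the proposed formulation: a single-depot solution is encoded as a hypothetical `successor` map (`successor[p] = q` means arc $(p, q)$ is used), and the inequality tested is the classical form $u_p - u_q + 1 \le n(1 - x_{pq})$ over non-depot nodes.

```python
def mtz_potentials(successor, depot=0):
    """Try to assign node potentials by walking the route from the depot.

    Returns a dict {node: potential} for all non-depot nodes, or None when
    some nodes are never reached from the depot. In a solution where every
    node has one successor, such nodes lie on a depot-free sub-tour, for
    which no valid potentials exist.
    """
    potentials = {}
    node, step = successor[depot], 1
    while node != depot and node not in potentials:
        potentials[node] = step
        step += 1
        node = successor[node]
    if len(potentials) != len(successor) - 1:
        return None
    return potentials


def satisfies_mtz(successor, potentials):
    """Check u_p - u_q + 1 <= n * (1 - x_pq) for all non-depot p != q."""
    n = len(successor)
    for p, u_p in potentials.items():
        for q, u_q in potentials.items():
            if p == q:
                continue
            x_pq = 1 if successor[p] == q else 0
            if u_p - u_q + 1 > n * (1 - x_pq):
                return False
    return True


# One tour through all five nodes: 0 -> 1 -> 2 -> 3 -> 4 -> 0.
tour = {0: 1, 1: 2, 2: 3, 3: 4, 4: 0}
# Two sub-tours: 0 -> 1 -> 0 and 2 -> 3 -> 4 -> 2 (no depot in the second).
subtours = {0: 1, 1: 0, 2: 3, 3: 4, 4: 2}
```

For `tour`, the walk yields potentials 1 to 4 that satisfy every inequality. For `subtours`, no assignment can work: going around the depot-free cycle would require each potential to be strictly larger than the previous one, which is exactly how the constraints exclude sub-tours.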
9.2 Problem Definition In this section, we discuss the mDmSOP and its ILP formulation in detail. First, we give a brief overview of the mDmSOP. In this problem, we have an undirected complete graph $G(V, E)$, where $V = \{v_1, v_2, \ldots, v_n\} \cup \{v_{n+1}, v_{n+2}, \ldots, v_{n+m}\}$ is the set of vertices and $E = \{e_{pq}\}$ is the set of edges, with $e_{pq}$ defined as the edge between vertices $v_p$ and $v_q$. The last $m$ vertices $M = \{v_{n+1}, v_{n+2}, \ldots, v_{n+m}\}$ represent the depots from which the $m$ salesmen start. All the edges are weighted, and the weights represent
the travel cost for a salesman to reach $q$ from $p$. The cost matrix satisfies the Euclidean conditions (i.e., the triangle inequality): for all practical purposes, a salesman takes the shortest path from $p$ to $q$ even if it is not direct, so the cost matrix always stores the shortest-path cost between each pair of vertices and therefore satisfies the triangle inequality for any three vertices. The cost of a tour for a salesman is the sum of the costs of all edges traversed by that salesman, and the total cost is the sum of the tour costs over all salesmen. An upper bound $B$ is placed on the total cost, and each salesman must visit at least one vertex so that the load is distributed and no salesman is idle. Moreover, each vertex belongs to a unique set $s$ in $S = \{s_1, \ldots, s_r\}$, and each set has a profit $P$ associated with it. The last $m$ sets, denoted $S_\mu$, belong to the depots of the $m$ salesmen and have profit $0$. $x_{jpq}$, $y_{jp}$, and $z_{ji}$ are binary decision variables: $x_{jpq}$ is 1 if edge $(p, q)$ is traversed by traveler $j$, $y_{jp}$ is 1 if vertex $p$ is visited by $j$, and $z_{ji}$ is 1 if any vertex within set $i$ is visited by traveler $j$. The primary objective is to maximize the profit while ensuring that the constraints of the problem are not violated. SECs are an integral part of the formulation, and these conditions are often the most computationally expensive.
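The claim that a cost matrix of shortest-path costs satisfies the triangle inequality can be illustrated with the Floyd–Warshall algorithm. The chapter does not say how its cost matrices were built, so the sketch below is only an assumed preprocessing step, and the 3 × 3 matrix is a made-up example.

```python
def metric_closure(cost):
    """Replace each entry by the shortest-path cost (Floyd-Warshall)."""
    n = len(cost)
    dist = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def satisfies_triangle_inequality(dist):
    """Check dist[i][j] <= dist[i][k] + dist[k][j] for every triple."""
    n = len(dist)
    return all(
        dist[i][j] <= dist[i][k] + dist[k][j]
        for i in range(n) for j in range(n) for k in range(n)
    )


# Hypothetical direct costs: going 0 -> 2 directly (10) is more expensive
# than going through vertex 1 (1 + 2 = 3).
direct = [
    [0, 1, 10],
    [1, 0, 2],
    [10, 2, 0],
]
closed = metric_closure(direct)
```

After the closure, the entry for the pair (0, 2) becomes 3, and every triple of vertices satisfies the triangle inequality, which is the property the formulation relies on.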
9.2.1 An Improved ILP Formulation The ILP for the mDmSOP can be formulated as follows:

$$\text{maximize} \quad \sum_{j \in M} \sum_{i \in S} P_i z_{ji}, \tag{9.1}$$

subject to:

$$x_{jpq},\; y_{jp},\; z_{ji} \in \{0, 1\} \quad \forall j \in M;\ \forall p, q \in V, \tag{9.2}$$

$$\sum_{j \in M} \sum_{p \in V} \sum_{q \in V} x_{jpq} C_{pq} \le B, \tag{9.3}$$

$$y_{j(j+n)} = 1 \quad \forall j \in M, \tag{9.4}$$

$$\sum_{v_p \in V - \{v_q\}} x_{jpq} = y_{jq} \quad \forall j \in M;\ \forall q \in V, \tag{9.5}$$

$$\sum_{v_p \in V - \{v_q\}} x_{jqp} = y_{jq} \quad \forall j \in M;\ \forall q \in V, \tag{9.6}$$

$$\sum_{v_p \in S_i} y_{jp} = z_{ji} \quad \forall j \in M;\ \forall i \in S, \tag{9.7}$$

$$\sum_{j \in M} z_{ji} \le 1 \quad \forall i \in S, \tag{9.8}$$

$$1 \le u_s \le n \quad \forall s \in S, \tag{9.9}$$

$$u_{s_1} + 1 \le n \Bigl(1 - \sum_{j \in M} x_{jpq}\Bigr) + u_{s_2} \quad \forall p, \forall q \in V;\ \forall s_1, \forall s_2 \in \{S - M \mid s_1 \ne s_2\}. \tag{9.10}$$

The problem considered here is the same as the one in Ravi et al. [10], referred to as ILP 1. In this case, however, we have improved the SECs (ILP 2) by reducing the number of conditions; here, $u_{s_1}$ and $u_{s_2}$ are set potentials rather than node potentials. In Ravi et al. [10], the SECs are based on the MTZ formulation with some improvements for sub-tours: a node potential $u_{jp}$ is assigned for every traveler $j$, since every traveler has a different depot as its starting point. That SEC is given as follows:

$$u_{jp} - u_{jq} + 1 \le n(1 - x_{jpq}), \quad \forall j \in M,\ \forall p, \forall q \in \{V - M \mid p \ne q\}. \tag{9.11}$$

The number of SECs in Eq. (9.11) is $|SEC_1| = m(n - m)(n - m - 1)$. The formulation proposed here reduces the number of constraints by a factor of $m$ by eliminating the index $j$ from Eq. (9.11). It also reduces the number of node potential variables by the same factor, because the improved formulation uses set potentials. From Eq. (9.5) and the fact that $y_{jq} \le 1$, it is evident that only a single salesman can enter a vertex, since $\sum_{v_p \in V - \{v_q\}} x_{jpq} \le 1$. Similarly, only a single salesman can leave the vertex (from Eq. (9.6) and $y_{jq} \le 1$). This means that if salesman $j$ enters a non-depot vertex, it is the only salesman that leaves that vertex. Therefore, if a sub-tour is formed, it is formed by a single salesman, i.e., if $T = \{v_1, v_2, \ldots, v_\alpha\}$ is the sub-tour, then $y_{jq} = 1\ \forall q \in T$. Since a cycle can only be formed by one salesman, separate node potentials for each salesman are not needed, and $u_{jp}$ can be replaced by $u_{s_1}$. The variable $x_{jpq}$ must be replaced as well: what matters is whether edge $\{p, q\}$ is traversed by any salesman, which is obtained by summing over $M$ for the given edge. This leads to Eq. (9.10). The new SECs reduce the number of conditions and the overall complexity of generating the ILP. The values $\sum_{j \in M} x_{jpq}\ \forall p, q \in V$ can easily be pre-cached in $O(n^2)$ time and $O(Mn^2)$ space, so there is no additional computation time over the original SECs. The number of conditions in this case is $|SEC_2| = (S - m)(S - m - 1)$, where $S$ here denotes the number of sets, so that

$$\frac{|SEC_2|}{|SEC_1|} = \frac{(S - m)(S - m - 1)}{m(n - m)(n - m - 1)}. \tag{9.12}$$

Therefore, the formulation reduces the number of SECs by a factor of $(S/n)^2$ if we assume that $m$ is a small number. Moreover, the number of node potential variables is also reduced by the same factor, i.e., $(S/n)^2$.
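To give a feel for the size of this reduction, the two counts $|SEC_1| = m(n-m)(n-m-1)$ and $|SEC_2| = (S-m)(S-m-1)$ can be evaluated for a concrete size. The numbers below are assumptions for illustration only: $n = 52$ and $m = 2$ mirror the 11berlin52 rows of the tables, while the total of 13 sets (11 clusters plus 2 depot sets) is a guess about how the instance is partitioned, not a figure stated in the chapter.

```python
def sec_counts(n, m, num_sets):
    """Return (|SEC_1|, |SEC_2|) using the counting formulas of the text.

    n        -- number of vertices as used in the text's formula
    m        -- number of salesmen (depots)
    num_sets -- total number of sets S, including the m depot sets
    """
    sec1 = m * (n - m) * (n - m - 1)            # node-potential SECs (ILP 1)
    sec2 = (num_sets - m) * (num_sets - m - 1)  # set-potential SECs (ILP 2)
    return sec1, sec2


# Assumed sizes (see the lead-in): n = 52, m = 2, S = 13.
sec1, sec2 = sec_counts(n=52, m=2, num_sets=13)
ratio = sec2 / sec1
```

Under these assumptions, ILP 1 needs 4900 sub-tour elimination constraints and ILP 2 needs 110, a ratio of roughly 0.022.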
9.3 Comparative Results 9.3.1 Test Instances To assess and compare the performance of the two formulations, we used the Generalized Traveling Salesman Problem (GTSP) instances recommended by Noon in 1988 [12]. Additionally, we used a branch-and-cut method originally proposed by Fischetti et al. [8] for solving the Symmetric Generalized Traveling Salesman Problem. We modified the GTSP instances to accommodate multiple depots, in line with the requirements of our formulation, as follows:
1. The depot vertices are relocated from the non-depot sets to the depot sets.
2. The non-depot sets are arranged in ascending order of their number of vertices.
3. The list is then iterated over step by step. Whenever an empty set is encountered, the first vertex of a non-empty set with more than one vertex is identified and transferred into the empty set.
This procedure generates sets that adhere to the constraints of our problem. Profits are calculated under two schemes. In scheme g1, inspired by Angelelli et al. [1], the profit of a cluster is the number of nodes it contains. Scheme g2 uses the formula $(1 + 7141q) \bmod 100$ to generate a pseudo-random profit for each node $q$; the cluster profit is then the sum of the profits of all nodes in that cluster. In both schemes, the depot sets are assigned a profit of 0.
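Steps 2 and 3 of the instance modification and the two profit schemes can be sketched as follows. This is an interpretation under stated assumptions rather than the authors' generator: the function names are hypothetical, the donor set is taken to be the first set in the sorted list with more than one vertex, and the g2 formula is implemented exactly as printed, $(1 + 7141q) \bmod 100$.

```python
def fill_empty_sets(sets):
    """Sort sets by size (step 2) and give every empty set one vertex taken
    from the first set that holds more than one vertex (step 3)."""
    ordered = sorted((list(s) for s in sets), key=len)
    for target in ordered:
        if not target:
            donor = next((s for s in ordered if len(s) > 1), None)
            if donor is None:
                raise ValueError("no set with more than one vertex to donate")
            target.append(donor.pop(0))
    return ordered


def node_profit_g2(q):
    """Pseudo-random profit of node q under scheme g2, as printed."""
    return (1 + 7141 * q) % 100


def cluster_profit(cluster, scheme):
    """Profit of a non-depot cluster; depot sets get profit 0 separately."""
    if scheme == "g1":
        return len(cluster)
    if scheme == "g2":
        return sum(node_profit_g2(q) for q in cluster)
    raise ValueError("scheme must be 'g1' or 'g2'")


# Hypothetical non-depot sets after the depots were moved out (step 1).
repaired = fill_empty_sets([[5, 6, 7], [], [8]])
```

With this made-up input, the empty set receives vertex 5 from the three-vertex set. For a cluster containing nodes 1, 2, and 3, scheme g1 gives a profit of 3 and scheme g2 gives 42 + 83 + 24 = 149.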
9.3.2 Computational Results Simulations were run on an Intel(R) Xeon(R) Silver 4316 CPU @ 2.30 GHz × 80 with 256 GB of RAM. The results are given in Tables 9.1 and 9.2 under the following settings:
1. threads = 0 (using all available threads) in Table 9.1.
2. threads = 0 (using all available threads) with a 5% relative gap for set 1 and a 20% relative gap for set 2 in Table 9.2.
Table 9.1 is organized as follows. The first five columns give the GTSP instance name, the number of vertices ($n$), the number of travelers ($m$), the rule used to generate the profit ($P_g$), and the value of $w$. The budget is calculated as $\lceil w \times m \times T_{max} \rceil$, where $T_{max}$ is the solution of the GTSP instance, $w$ is a parameter that adjusts the budget as needed, and $m$ is the number of travelers used to obtain the maximum profit. Opt. shows the optimal solution, and the last six columns give the solution, time, and relative gap for the ILP formulation (ILP 1) and the improved ILP formulation (ILP 2), respectively. Table 9.2 is organized as follows: Set 1 contains the small instances of up to 76 nodes, while Set 2 contains the large instances of up to 198 nodes.

Table 9.1 ILP comparison with optimal solutions on small instances with $w < 0.5$
                                   ILP 1                      ILP 2
Instance    n    m  Pg  w    Opt.  Sol.  Time        Gap (%)  Sol.  Time      Gap (%)
11berlin52  52   2  g1  0.2  27    27    8.685       0.00     27    2.471     0.00
11berlin52  52   2  g1  0.3  34    34    1392.508    0.00     34    71.98     0.00
11berlin52  52   2  g1  0.4  45    45    15,457.762  0.00     45    375.209   0.00
11berlin52  52   2  g2  0.2  1276  1276  6.986       0.00     1276  1.896     0.00
11berlin52  52   2  g2  0.3  1571  1571  2360.777    0.00     1571  68.686    0.00
11berlin52  52   2  g2  0.4  2106  2106  18,263.182  0.00     2106  262.432   0.00
11berlin52  52   3  g1  0.2  43    43    42.674      0.00     43    4.11      0.00
11berlin52  52   3  g1  0.3  47    47    31.171      0.00     47    2.889     0.00
11berlin52  52   3  g1  0.4  49    49    5.261       0.00     49    2.604     0.00
11berlin52  52   3  g2  0.2  2083  2083  388.393     0.00     2083  8.653     0.00
11berlin52  52   3  g2  0.3  2341  2341  200.726     0.00     2341  9.843     0.00
11berlin52  52   3  g2  0.4  2365  2365  14.041      0.00     2365  1.898     0.00
11eil51     51   2  g1  0.2  29    29    5.915       0.00     29    1.589     0.00
11eil51     51   2  g1  0.3  38    38    69.969      0.00     38    8.285     0.00
11eil51     51   2  g1  0.4  45    45    76.094      0.00     45    50.135    0.00
11eil51     51   2  g2  0.2  1552  1552  4.783       0.00     1552  1.718     0.00
11eil51     51   2  g2  0.3  1931  1931  121.879     0.00     1931  9.817     0.00
11eil51     51   2  g2  0.4  2226  2226  179.397     0.00     2226  99.224    0.00
11eil51     51   3  g1  0.2  37    37    39.525      0.00     37    3.866     0.00
11eil51     51   3  g1  0.3  44    44    768.960     0.00     44    29.771    0.00
11eil51     51   3  g1  0.4  48    48    2.779       0.00     48    1.523     0.00
11eil51     51   3  g2  0.2  1862  1862  54.488      0.00     1862  3.586     0.00
11eil51     51   3  g2  0.3  2157  2157  1540.950    0.00     2157  246.089   0.00
11eil51     51   3  g2  0.4  2296  2296  3.547       0.00     2296  1.354     0.00
16eil76     76   2  g1  0.2  39    39    591.146     0.00     39    16.394    0.00
16eil76     76   2  g1  0.3  54    54    9222.964    0.00     54    508.812   0.00
16eil76     76   2  g1  0.4  66    66    44,504.173  0.00     66    2016.839  0.00
16eil76     76   2  g2  0.2  1939  1939  433.914     0.00     1939  21.626    0.00
16eil76     76   2  g2  0.3  2621  2621  13,025.470  0.00     2621  774.811   0.00
16eil76     76   2  g2  0.4  3170  3170  30,455.980  0.00     3170  4700.094  0.00
16eil76     76   3  g1  0.2  57    57    1949.760    0.00     57    533.493   0.00
16eil76     76   3  g1  0.3  68    68    34,823.119  0.00     68    7285.434  0.00
16eil76     76   3  g1  0.4  73    73    165.025     0.00     73    3.349     0.00
16eil76     76   3  g2  0.2  2696  2696  3645.931    0.00     2696  359.353   0.00
16eil76     76   3  g2  0.4  3521  3521  13.208      0.00     3521  61.879*   0.00
Note The entry where ILP 2 took more time than ILP 1 is marked with an asterisk
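The budget rule $\lceil w \times m \times T_{max} \rceil$ and the Gap columns can be reproduced with a few lines. Two points are assumptions: the $T_{max}$ value used below is hypothetical, because the chapter does not list the GTSP solution values, and the gap formula $100 \times (\text{Opt} - \text{Sol}) / \text{Opt}$ is inferred from the reported numbers (it matches them) rather than stated in the text.

```python
import math


def budget(w, m, t_max):
    """Budget B = ceil(w * m * T_max) for m travelers."""
    return math.ceil(w * m * t_max)


def relative_gap_percent(opt, sol):
    """Relative gap between the optimal profit and the profit found, in %."""
    return 100.0 * (opt - sol) / opt


# Hypothetical GTSP solution value T_max = 1001 with w = 0.5 and m = 3.
example_budget = budget(0.5, 3, 1001)

# Values taken from the reported results for w = 0.5:
# 11berlin52, m = 2, g1 (Opt. 50, Sol. 48) and g2 (Opt. 2375, Sol. 2264).
gap_g1 = round(relative_gap_percent(50, 48), 2)
gap_g2 = round(relative_gap_percent(2375, 2264), 2)
```

The example budget is 1502 (1501.5 rounded up), and the two gaps evaluate to 4.0 and 4.67, in agreement with the reported 4.00 and 4.67.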
Table 9.2 ILP comparison on small and large instances with .w = 0.5 Instance
.n
.m
Pg
Opt.
ILP 1 Sol.
Set 1
Set 2
Time
ILP 2 Gap
Sol.
Time
Gap
11berlin52
52
2
g1
50
48
1.808
4.00
48
5.234
4.00
11eil51
51
2
g1
49
47
5.91
4.08
47
1.731
4.08
14st70
70
2
g1
68
67
34.552
1.47
67
3.922
1.47
16eil76
76
2
g1
74
71
125.561
4.05
71
22.199
4.05
11berlin52
52
2
g2
2375
2264
5.357
4.67
2277
1.489
4.13
11eil51
51
2
g2
2365
2293
7.96
3.04
2298
1.683
2.83
14st70
70
2
g2
3266
3182
12.811
2.57
3182
3.465
2.57
11berlin52 | 52 | 3 | g1 | 49 | 48 | 1.283 | 2.04 | 47 | 0.824 | 4.08
11eil51 | 51 | 3 | g1 | 48 | 46 | 1.151 | 4.17 | 48 | 0.982 | 0.00
14st70 | 70 | 3 | g1 | 67 | 66 | 2.929 | 1.49 | 67 | 1.520 | 0.00
16eil76 | 76 | 3 | g1 | 73 | 70 | 5.753 | 4.11 | 71 | 2.136 | 2.74
11berlin52 | 52 | 3 | g2 | 2365 | 2354 | 1.338 | 0.47 | 2354 | 0.811 | 0.47
11eil51 | 51 | 3 | g2 | 2296 | 2229 | 1.268 | 2.92 | 2216 | 0.934 | 3.48
14st70 | 70 | 3 | g2 | 3218 | 3134 | 2.944 | 2.61 | 3218 | 1.500 | 0.00
16eil76 | 76 | 3 | g2 | 3521 | 3421 | 4.878 | 2.84 | 3421 | 2.094 | 2.84
20rat99 | 99 | 2 | g1 | 97 | 85 |  | 12.37 | 82 | 29.199 | 15.46
20rd100 | 100 | 2 | g1 | 98 | 84 | 9.907 | 14.29 | 87 | 3.719 | 11.22
21eil101 | 101 | 2 | g1 | 99 | 85 | 10.004 | 14.14 | 90 | 4.073 | 9.09
21lin105 | 105 | 2 | g1 | 103 | 89 | 11.981 | 13.59 | 89 | 6.736 | 13.59
22pr107 | 107 | 2 | g1 | 105 | 90 | 174.985 | 14.29 | 90 | 235.425 | 14.29
25pr124 | 124 | 2 | g1 | 122 | 103 | 289.628 | 15.57 | 108 | 86.958 | 11.48
26bier127 | 127 | 2 | g1 | 125 | 106 | 31.687 | 15.20 | 121 | 11.994 | 3.20
26ch130 | 130 | 2 | g1 | 128 | 109 | 37.357 | 14.84 | 108 | 12.683 | 15.63
28pr136 | 136 | 2 | g1 | 134 | 114 | 64.898 | 14.93 | 116 | 103.840 | 13.43
29pr144 | 144 | 2 | g1 | 142 | 123 | 382.59 | 13.38 | 132 | 125.440 | 7.04
30ch150 | 150 | 2 | g1 | 148 | 130 | 376.932 | 12.16 | 124 | 134.266 | 16.22
30kroA150 | 150 | 2 | g1 | 148 | 127 | 505.315 | 14.19 | 124 | 155.251 | 16.22
30kroB150 | 150 | 2 | g1 | 148 | 130 | 343.76 | 12.16 | 125 | 121.882 | 15.54
31pr152 | 152 | 2 | g1 | 150 | 128 | 600.804 | 14.67 | 126 | 896.027 | 16.00
32u159 | 159 | 2 | g1 | 157 | 131 | 493.955 | 16.56 | 145 | 176.251 | 7.64
39rat195 | 195 | 2 | g1 | 193 | 193 | 49.435 | 0.00 | 174 | 9.376 | 9.84
40d198 | 198 | 2 | g1 | 196 | 183 | 42.909 | 6.63 | 196 | 8.483 | 0.00
20rat99 | 99 | 2 | g2 | 4793 | 4228 | 114.809 | 11.79 | 4059 | 46.023 | 15.31
20rd100 | 100 | 2 | g2 | 4871 | 4304 | 8.577 | 11.64 | 4463 | 3.721 | 8.38
21eil101 | 101 | 2 | g2 | 4890 | 4200 | 10.988 | 14.11 | 4089 | 3.734 | 16.38
21lin105 | 105 | 2 | g2 | 5076 | 4287 | 10.483 | 15.54 | 4412 | 4.310 | 13.08
22pr107 | 107 | 2 | g2 | 5165 | 4340 | 1812.764 | 15.97 | 4561 | 71.940 | 11.69
25pr124 | 124 | 2 | g2 | 6043 | 5172 | 230.562 | 14.41 | 5180 | 61.464 | 14.28
26bier127 | 127 | 2 | g2 | 6175 | 5888 | 15.674 | 4.65 | 5586 | 9.881 | 9.54
26ch130 | 130 | 2 | g2 | 6276 | 5256 | 43.64 | 16.25 | 5646 | 12.832 | 10.04
28pr136 | 136 | 2 | g2 | 6585 | 5692 | 386.288 | 13.56 | 6046 | 55.590 | 8.19
29pr144 | 144 | 2 | g2 | 6993 | 6021 | 322.393 | 13.90 | 6189 | 115.307 | 11.50
30ch150 | 150 | 2 | g2 | 7246 | 6052 | 494.603 | 16.48 | 6165 | 117.746 | 14.92
30kroA150 | 150 | 2 | g2 | 7246 | 6089 | 708.481 | 15.97 | 6048 | 169.120 | 16.53
30kroB150 | 150 | 2 | g2 | 7246 | 6459 | 34.7 | 10.86 | 6699 | 81.024 | 7.55
31pr152 | 152 | 2 | g2 | 7325 | 6237 | 646.7 | 14.85 | 6361 | 143.933 | 13.16
32u159 | 159 | 2 | g2 | 7743 | 6519 | 446.088 | 15.81 | 7163 | 136.661 | 7.49
39rat195 | 195 | 2 | g2 | 9541 | 9175 | 23.617 | 3.84 | 8434 | 8.929 | 11.60
40d198 | 198 | 2 | g2 | 9706 | 8330 | 17.371 | 14.18 | 8920 | 7.993 | 8.10
102.28
Note: The entry where ILP 2 took more time than ILP 1 is marked in bold.
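The gap values in the table appear to follow directly from the profit values. The following is a minimal sketch, assuming that the first numeric value after the g1/g2 label in each row is the reference profit and that the gap is the relative shortfall from it in percent; the function name `gap_percent` is illustrative and is not taken from the authors' implementation.

```python
def gap_percent(reference: float, profit: float) -> float:
    """Relative shortfall of `profit` from `reference`, in percent."""
    return 100.0 * (reference - profit) / reference


# A few rows copied from the table:
# (instance, profit type, reference profit, ILP 1 profit, ILP 2 profit)
rows = [
    ("11berlin52", "g1", 49, 48, 47),
    ("11berlin52", "g2", 2365, 2354, 2354),
    ("40d198", "g1", 196, 183, 196),
]

for name, ptype, ref, p1, p2 in rows:
    print(
        f"{name} {ptype}: "
        f"ILP 1 gap = {gap_percent(ref, p1):.2f}%, "
        f"ILP 2 gap = {gap_percent(ref, p2):.2f}%"
    )
```

Under this assumption, the three sample rows reproduce the gaps listed in the table (2.04 and 4.08, 0.47 and 0.47, 6.63 and 0.00).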
R. Kant and A. Mishra
Fig. 9.1 CPU performance of 11berlin52 for ILP Formulation 1
Fig. 9.2 CPU performance of 16eil76 for ILP Formulation 1
Fig. 9.3 CPU performance of 32u159 for ILP Formulation 1
Fig. 9.4 CPU performance of 11berlin52 for ILP Formulation 2
Fig. 9.5 CPU performance of 16eil76 for ILP Formulation 2
Fig. 9.6 CPU performance of 32u159 for ILP Formulation 2
9.3.3 Inferential Results
To give a fuller picture of system performance, we recorded the utilization of the Silver 4316 CPU under both Formulation 1 [10] and Formulation 2 [Eqs. (9.1)–(9.10)]. Figures 9.1, 9.2 and 9.3 illustrate that CPU utilization under Formulation 1 increases with the number of nodes in the instance: Fig. 9.1 shows some under-utilized CPU threads, whereas thread utilization rises in Fig. 9.2 and again in Fig. 9.3. We therefore infer that increasing the number of threads for a small instance such as 11berlin52 (Fig. 9.1) is not worthwhile. Figures 9.4, 9.5 and 9.6 show a considerable improvement in system performance for our improved Formulation 2, which uses fewer threads and solves the instances much more quickly than Formulation 1. Consequently, Formulation 2 can run more tasks in a given time frame, since it needs fewer threads than ILP Formulation 1 to solve the same instance.
9 A Compact Formulation for the mDmSOP …
9.4 Conclusion
The simulation results for the two ILP formulations show that the improved ILP formulation gives better results for all the test instances in Table 9.1 except one, namely 16eil76 when w < 0.5. In Table 9.2, it performs better in 93.33% of the instances of set 1 and 88.23% of the instances of set 2. Moreover, our improved Formulation 2 utilizes multiple threads efficiently within a window of 60 seconds, which is a significant improvement in CPU performance over Formulation 1. In the improved formulation, we reduced the number of SECs by removing the dependence on m, the number of salesmen, and by using set potentials rather than node potentials. Because the constraints of our ILP formulation allow at most 1 vertex to be visited per set, and no two sets share a vertex, the potentials can be defined on the sets instead of on all the nodes in the instance. As a result, only the set potentials need to be checked, and the formulation generates the same maximum profit as the previous one.
References
1. Angelelli, E., Archetti, C., Vindigni, M.: The clustered orienteering problem. Eur. J. Oper. Res. 238(2), 404–414 (2014)
2. Bazrafshan, R., Hashemkhani Zolfani, S., Mirzapour Al-e Hashem, S.M.J.: Comparison of the sub-tour elimination methods for the asymmetric traveling salesman problem applying the SECA method. Axioms 10(1), 19 (2021)
3. Bektas, T.: The multiple traveling salesman problem: an overview of formulations and solution procedures. Omega 34(3), 209–219 (2006)
4. Bektaş, T., Gouveia, L.: Requiem for the Miller–Tucker–Zemlin subtour elimination constraints? Eur. J. Oper. Res. 236(3), 820–832 (2014)
5. Campuzano, G., Obreque, C., Aguayo, M.M.: Accelerating the Miller–Tucker–Zemlin model for the asymmetric traveling salesman problem. Expert Syst. Appl. 148, 113229 (2020)
6. Dantzig, G., Fulkerson, R., Johnson, S.: Solution of a large-scale traveling-salesman problem. J. Oper. Res. Soc. Am. 2(4), 393–410 (1954)
7. Desrochers, M., Laporte, G.: Improvements and extensions to the Miller–Tucker–Zemlin subtour elimination constraints. Oper. Res. Lett. 10(1), 27–36 (1991)
8. Fischetti, M., Salazar González, J.J., Toth, P.: A branch-and-cut algorithm for the symmetric generalized traveling salesman problem. Oper. Res. 45(3), 378–394 (1997)
9. Hamilton, W.R.: Problema del viajante
10. Kant, R., Mishra, A.: The multi Depot multiple set orienteering problem. Unpublished Manuscript (2022)
11. Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer programming formulation of traveling salesman problems. J. ACM (JACM) 7(4), 326–329 (1960)
12. Noon, C.E.: The generalized traveling salesman problem. Ph.D. thesis. University of Michigan (1988)
13. Öncan, T., Altınel, I.K., Laporte, G.: A comparative analysis of several asymmetric traveling salesman problem formulations. Comput. Oper. Res. 36(3), 637–654 (2009)
14. Sawik, T.: A note on the Miller–Tucker–Zemlin model for the asymmetric traveling salesman problem. Bull. Pol. Acad. Sci. Tech. Sci. 3 (2016)
15. Sherali, H.D., Driscoll, P.J.: On tightening the relaxations of Miller–Tucker–Zemlin formulations for asymmetric traveling salesman problems. Oper. Res. 50(4), 656–669 (2002)
16. Vansteenwegen, P., Souffriau, W., Van Oudheusden, D.: The orienteering problem: a survey. Eur. J. Oper. Res. 209(1), 1–10 (2011)
17. Velednitsky, M.: Short combinatorial proof that the DFJ polytope is contained in the MTZ polytope for the asymmetric traveling salesman problem. arXiv preprint arXiv:1805.06997 (2018)
18. Yuan, Y., Cattaruzza, D., Ogier, M., Semet, F.: A note on the lifted Miller–Tucker–Zemlin subtour elimination constraints for routing problems with time windows. Oper. Res. Lett. 48(2), 167–169 (2020)
Chapter 10
Keywords on COVID-19 Vaccination: An Application of NLP into Macau Netizens’ Social Media Comments
Xi Chen, Vincent Xian Wang, Lily Lim, and Chu-Ren Huang
Abstract This study investigates the general public’s concerns about COVID-19 vaccination through their comments on social media (YouTube), using NLP techniques and time series analysis. A set of keywords is traced in order to better understand the changes in public opinion and responses at different stages of the pandemic, as well as the influence of fake news. These keywords were extracted from Macau netizens’ online comments based on word frequency, TF-IDF, and TextRank. We observe that the misinformation dissipated abruptly after the initiation of mass vaccination in Macau, and we account for this change with Prospect Theory. This study shows that NLP techniques can assist in the discourse analysis of people’s perceptions of COVID-19 vaccination, and that people’s linguistic behaviours can be captured by the extracted keywords through text mining and time series analysis.
X. Chen · V. X. Wang (B)
Department of English, University of Macau, Macau, China
e-mail: [email protected]
X. Chen
e-mail: [email protected]
L. Lim
MPU-Bell Centre of English, Macao Polytechnic University, Macau, China
e-mail: [email protected]
C.-R. Huang
Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
The Hong Kong Polytechnic University-Peking University Research Centre On Chinese Linguistics, The Hong Kong Polytechnic University, Hong Kong, China
C.-R. Huang
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_10
10.1 Introduction
As a formidable public health crisis, COVID-19 has posed unprecedented challenges to human lives across the globe. Endeavours have been made to meet this challenge, in particular the worldwide campaigns of mass vaccination. It is imperative to gain knowledge of people’s attitudes towards and perceptions of COVID-19 vaccination, as this would facilitate health communication for preventing the disease. Meanwhile, social media have come to the foreground in previous COVID-19 related research [1–9], since they are important channels for people to communicate and therefore provide favourable language resources for studying public opinion. Social media, however, can also be detrimental to public health during the pandemic by disseminating misinformation [10, 11], including fake news on vaccination, and thereby aggravating vaccine hesitancy [12–14]. Against this backdrop, this study strengthens the foregoing stream of research by analysing Macau netizens’ comments on a YouTube channel to disclose people’s concerns about and attitudes towards COVID-19 vaccination through keywords extracted by natural language processing (NLP) techniques, and by exploring the relevant events along the timeline with time series analysis.
10.2 Methodology
Our entire workflow is illustrated in Fig. 10.1. First, the comment data were crawled from “日更频道PLUS”¹ (Daily Update Channel PLUS), one of the most popular YouTube channels based in Macau, with around 76 thousand followers (cf. the whole population of Macau residents, which is around 680 thousand). The channel has posted videos on COVID-19 vaccination since December 2020, and we scraped the comments under 41 relevant videos posted since then. The personal identification information associated with each comment was deleted, and the collected data are intended solely for research purposes.
Second, the crawled comments were combined in Pandas² as a DataFrame with two columns: the texts, i.e. the comments per se, and the times at which the comments were posted. The texts were converted to string type. The time column was converted into timestamps, and the DataFrame was reindexed by these timestamps. Since this study focuses on how the online comments changed over the first half-year, only the texts from December 2020 to May 2021 were kept for the time series analysis. Table 10.1 displays the count of comments and words for each month. The number of comments peaked at 1194 in March 2021 and decreased sharply afterwards, suggesting that people were losing their enthusiasm as time went by.
Third, most of the texts are in Chinese. The traditional Chinese characters were converted into simplified Chinese characters by zhconv³ to unify the characters for segmentation. The processed texts were then segmented by the Chinese text segmentation tool Jieba,⁴ since Chinese texts are character-based and there is no space between words [15].
Fourth, the segmented texts were used to generate the keyword lists by word frequency, TF-IDF, and TextRank. Word frequency was calculated with Pandas’ counting function. TF-IDF [16] and TextRank [17] are built into Jieba’s “analyse” module for extracting keywords. The TF-IDF score of a term is its term frequency (TF) in a given document multiplied by its inverse document frequency (IDF), so a term that is frequent in a specific document but occurs in few documents of the overall collection receives a high TF-IDF weight. TF-IDF therefore tends to screen out common words and bring out distinctive ones. The concept of TextRank originates from Google’s PageRank algorithm. It builds a network from the adjacency relationships between words, then uses the PageRank algorithm to iteratively calculate the rank value of each word and sorts the rank values to generate keywords. We used three methods to identify the keywords in order to avoid the bias of a single algorithm. Stop words (words that carry little meaning) were removed from both the texts and the keyword lists.
After identifying and inspecting the keywords, some idiosyncratic words drew our attention, and we investigated these keywords by time series analysis to explore their diachronic features. Using the “resample” and “plot” functions of Pandas, the occurrences of these keywords were counted by month and plotted along the timeline (cf. Sect. 10.3.2).
¹ https://www.youtube.com/@macau853/videos
² https://pandas.pydata.org/
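The TF-IDF weighting described in the methodology can be illustrated with a short, self-contained sketch. It assumes the common textbook definition (relative term frequency multiplied by log(N / df)) and a hypothetical toy corpus of already segmented comments. It is not the study's pipeline, and its scores are not expected to match those produced by Jieba's built-in extractor.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs: list of token lists (already segmented, stop words removed).
    TF is the relative frequency of a term within a document;
    IDF is log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, not the study's data.
docs = [
    ["疫苗", "接种", "疫苗", "副作用"],
    ["疫苗", "通关", "澳门"],
    ["疫苗", "政府", "澳门"],
]

scores = tf_idf(docs)
for term, score in sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True):
    print(term, round(score, 3))
```

In this toy corpus, 疫苗 occurs in every document, so its IDF is log(1) = 0 and its score is zero even though it is the most frequent term in the first document. This is the sense in which TF-IDF screens out common words.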
³ https://github.com/gumblex/zhconv
⁴ https://github.com/fxsjy/jieba
Fig. 10.1 Workflow of generating keywords and time series analysis
Table 10.1 The quantity of comments and words of each month
Time | Amount of comments | Amount of words
December, 2020 | 206 | 4445
January, 2021 | 77 | 1207
February, 2021 | 977 | 15,461
March, 2021 | 1194 | 20,568
April, 2021 | 504 | 8973
May, 2021 | 498 | 9435
10.3 Results
10.3.1 Keywords
The top 20 keywords (with their values) generated by the three methods are presented in Table 10.2. The three algorithms produce a large proportion of shared keywords: 疫苗 vaccine, 澳门 Macau, 香港 Hong Kong, 接种 inoculate, 中国 China, 美国 the USA, 政府 government, 病毒 virus, 大陆 Mainland, 科兴 Sinovac, 问题 problem, 国药 Sinopharm, 国产 domestic, 市民 citizen, 副作用 side effect, and 疫情 epidemic situation. These keywords reflect people’s general concerns surrounding COVID-19 vaccination. 疫苗 vaccine is, unsurprisingly, the top keyword in all three lists. Some keywords directly concern vaccine-related issues, such as inoculating (接种) vaccines, the vaccines’ efficacy against the virus (病毒), their problems (问题 problem and 副作用 side effect), and the various types of vaccines (科兴 Sinovac, 国药 Sinopharm, 国产 domestic). Other keywords concern comparisons between different places (澳门 Macau, 香港 Hong Kong, 大陆 Mainland, 中国 China, and 美国 the USA) and their governments (政府) with regard to their policies on COVID-19 vaccination and their types of vaccines. The remaining two keywords (市民 citizen and 疫情 epidemic situation) refer to people’s lives, since citizens care about the government’s (vaccination) policies towards its people and whether vaccines would contain the COVID-19 epidemic.
The other keywords in the frequency list also appear in either the TF-IDF or the TextRank list. 断子绝孙 (sonless, childless) marks the fake news about COVID-19 vaccines, which is discussed in the time series analysis (Sect. 10.3.2). Meanwhile, people compare the vaccines of different countries (国家) according to data (数据), and they hope (希望) that vaccines can help contain the epidemic. In addition, 辉瑞 (Pfizer), 新冠 (coronavirus), and 通关 (open the border) are unique to the TF-IDF list. 辉瑞 (Pfizer) refers to the vaccine produced by the USA, which relates to the aforementioned comparison of vaccines by their manufacturing countries. 新冠 (coronavirus) in the TF-IDF list and 肺炎 (pneumonia) in the TextRank list point to the disease of COVID-19. Furthermore, 通关 (open the border) is a key factor closely related to Macau people’s lives and to the trilateral relationship among the Chinese Mainland (大陆), Macau (澳门), and Hong Kong (香港), of which Macau and Hong Kong are two special administrative regions of China. The COVID-19 epidemic led to the tightening of cross-border policies, and people were hoping that, following mass vaccination, these policies would be eased so that travel between the three places could return to normal.
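The overlap between the three keyword lists can be checked mechanically with set operations. The sketch below copies the 20 keywords per method from Table 10.2; the variable names are illustrative only.

```python
# Top-20 keywords per method, copied from Table 10.2.
freq = ["疫苗", "澳门", "香港", "接种", "中国", "美国", "政府", "病毒", "大陆", "科兴",
        "问题", "国药", "断子绝孙", "国产", "市民", "希望", "副作用", "数据", "疫情", "国家"]
tfidf = ["疫苗", "澳门", "香港", "接种", "科兴", "断子绝孙", "病毒", "国药", "大陆", "政府",
         "美国", "国产", "辉瑞", "副作用", "中国", "疫情", "市民", "新冠", "通关", "问题"]
textrank = ["疫苗", "澳门", "香港", "接种", "中国", "政府", "病毒", "美国", "大陆", "问题",
            "科兴", "国药", "国产", "市民", "希望", "国家", "数据", "疫情", "副作用", "肺炎"]

# Keywords produced by all three methods.
shared = set(freq) & set(tfidf) & set(textrank)

# Keywords that appear in exactly one list.
only_freq = set(freq) - set(tfidf) - set(textrank)
only_tfidf = set(tfidf) - set(freq) - set(textrank)
only_textrank = set(textrank) - set(freq) - set(tfidf)

print(len(shared))     # 16
print(only_freq)       # set()
print(only_tfidf)      # the three keywords unique to TF-IDF
print(only_textrank)   # the single keyword unique to TextRank
```

The result agrees with the discussion above: 16 keywords are shared by all three lists, no keyword is exclusive to the frequency list, 辉瑞, 新冠 and 通关 occur only in the TF-IDF list, and 肺炎 occurs only in the TextRank list.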
Table 10.2 Keywords by frequency, TF-IDF, and TextRank
Word | Freq | Word | TF-IDF | Word | TextRank
疫苗 Vaccine | 915 | 疫苗 Vaccine | 0.232 | 疫苗 Vaccine | 1.000
澳门 Macau | 497 | 澳门 Macau | 0.146 | 澳门 Macau | 0.563
香港 Hong Kong | 397 | 香港 Hong Kong | 0.081 | 香港 Hong Kong | 0.460
接种 Inoculate | 199 | 接种 Inoculate | 0.055 | 接种 Inoculate | 0.260
中国 China | 192 | 科兴 Sinovac | 0.046 | 中国 China | 0.244
美国 The USA | 162 | 断子绝孙 Sonless, childless | 0.038 | 政府 Government | 0.200
政府 Government | 161 | 病毒 Virus | 0.037 | 病毒 Virus | 0.192
病毒 Virus | 142 | 国药 Sinopharm | 0.034 | 美国 The USA | 0.153
大陆 Mainland | 138 | 大陆 Mainland | 0.029 | 大陆 Mainland | 0.148
科兴 Sinovac | 122 | 政府 Government | 0.026 | 问题 Problem | 0.120
问题 Problem | 110 | 美国 The USA | 0.024 | 科兴 Sinovac | 0.113
国药 Sinopharm | 102 | 国产 Domestic | 0.023 | 国药 Sinopharm | 0.111
断子绝孙 Sonless, childless | 100 | 辉瑞 Pfizer | 0.023 | 国产 Domestic | 0.097
国产 Domestic | 86 | 副作用 Side effect | 0.023 | 市民 Citizen | 0.096
市民 Citizen | 82 | 中国 China | 0.020 | 希望 Hope | 0.080
希望 Hope | 76 | 疫情 Epidemic situation | 0.018 | 国家 Country | 0.078
副作用 Side effect | 76 | 市民 Citizen | 0.018 | 数据 Data | 0.077
数据 Data | 71 | 新冠 Coronavirus | 0.017 | 疫情 Epidemic situation | 0.075
疫情 Epidemic situation | 65 | 通关 (Re)open the border | 0.017 | 副作用 Side effect | 0.065
国家 Country | 64 | 问题 Problem | 0.015 | 肺炎 Pneumonia | 0.064
10.3.2 Time Series Analysis
From our close examination of the keywords, some words attracted our attention by their particularities, especially the expression 断子绝孙 (sonless, childless), which points to the seemingly grave side effect of COVID-19 vaccines that would lead to infertility. Therefore, we further explored the words for various types of vaccines (科兴 Sinovac, 国药 Sinopharm, 辉瑞 Pfizer, and 国产 domestic) and the words on the problems/issues of vaccination (问题 problem, 副作用 side effect, and 断子绝孙 sonless, childless) using time series analysis, calculating and plotting their counts in each month to see whether the occurrence of these words changed over time (cf. Sect. 10.2). The results are presented in Fig. 10.2. The word counts may fluctuate with the quantity of comments in each month (cf. Table 10.1). However, four words still stand out, given that the largest number of comments occurred in March 2021: 辉瑞 (Pfizer), 国产 (domestic), and 断子绝孙 (sonless, childless) peaked in February 2021, while 科兴 (Sinovac) peaked in April 2021. By inspecting the comments containing “断子绝孙” (sonless, childless) in February 2021, we observed that this cursing expression in Chinese, which here denotes the side effect of infertility attributed to COVID-19 vaccination, was found in rumours about both American (辉瑞 Pfizer) vaccines and domestic (国产) vaccines (cf. Example 1), which are definitely fake news. The most striking feature of Fig. 10.2 is the rapid decline in the count of “断子绝孙” (sonless, childless) after February 2021, when mass vaccination began in Macau. Another notable feature is the higher count of 科兴 (Sinovac) in April than in March, even though March has the largest number of comments. 科兴 (Sinovac) is not one of the two types of COVID-19 vaccines being administered in Macau. After examining the concordance of “科兴” (Sinovac) in the comments, we observed that 科兴 (Sinovac) was mentioned dozens of times because it was also used in Hong Kong and various countries, and people were comparing different types of vaccines (cf. Example 2). This corresponds to the preceding keyword analysis, in which people contrasted the vaccination policies and the vaccines used in different places.
Fig. 10.2 Time series analysis of keywords
Examples:
1. 美国牌断子绝孙针, 效果不容置疑! (10/02/2021)
‘The sonless/childless shot of the American brand has undeniable efficacy!’
2. 国药同科兴系两种疫苗, 唔同药厂制造, 科兴系民企, 国药系军方研究所研发, 至于选择边个, 那是个人问题, 没有人强迫你的。如果你认为复必泰好D, 你可以选择打复必泰。自己觉得好就是。 (11/05/2021)
‘These are two types of vaccines developed by Sinopharm and Sinovac. They are manufactured by different companies, with Sinovac being a private enterprise, and Sinopharm being developed by a military research institution. As for which one to choose, it is a personal decision and no one is forcing you. If you think that BioNTech is better, you can choose to get it. It’s all about personal preference.’
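The monthly counting behind the time series analysis can be sketched without any third-party library. The study itself used the “resample” function of Pandas; the standard-library version below is only an analogue under stated assumptions. Dates are assumed to be in the dd/mm/yyyy form shown in the examples, occurrences are counted as plain substring matches rather than segmented tokens, and the two comments marked as hypothetical are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(comments, keyword):
    """Count occurrences of `keyword` per calendar month.

    comments: iterable of (date string 'dd/mm/yyyy', text) pairs.
    Returns {(year, month): count}; months that have comments but no
    occurrence of the keyword are kept with a count of 0.
    """
    counts = Counter()
    for date_str, text in comments:
        posted = datetime.strptime(date_str, "%d/%m/%Y")
        counts[(posted.year, posted.month)] += text.count(keyword)
    return dict(counts)


comments = [
    ("10/02/2021", "美国牌断子绝孙针, 效果不容置疑!"),  # Example 1
    ("03/04/2021", "科兴定国药? 科兴喺香港都有得打"),    # hypothetical comment
    ("20/04/2021", "我打咗科兴"),                        # hypothetical comment
    ("11/05/2021", "国药同科兴系两种疫苗"),              # opening of Example 2
]

print(monthly_counts(comments, "科兴"))
print(monthly_counts(comments, "断子绝孙"))
```

Because raw counts depend on how many comments were posted in a month, a natural follow-up is to divide each monthly count by the corresponding total in Table 10.1 before comparing months.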
10.4 Discussion
Our study of the keywords from Macau netizens’ online comments has revealed people’s general concerns surrounding COVID-19 vaccination, that is, the vaccines’ efficacy against the virus, the problems and issues (side effects) that may be caused by vaccines, the government’s vaccination policies, and the hope that vaccines would contain the COVID-19 epidemic. We have also identified that the fake news on COVID-19 vaccines crested in February 2021 and vanished suddenly when people in Macau started to be vaccinated on a large scale. This phenomenon is consistent with the predicted persuasiveness of gain/loss-framed messages in health communication, which builds on the Prospect Theory proposed by Tversky and Kahneman [18]: Meyerowitz and Chaiken [19] argue that loss-framed messages are more effective in promoting risk-seeking health behaviours, and Rothman and Salovey [20] further articulate that gain-framed messages are more persuasive in advocating preventative health behaviours. Macau contained COVID-19 successfully from 2020 to 2021 and was accordingly a rather risk-free region at that time. Gain-framed messages (e.g. stressing the benefits of vaccination) suit this situation rather well; in particular, vaccination is considered a safe health behaviour (leading to gains) against the disease. By contrast, the loss-framed messages, which are largely expressed by the fake news (e.g. the side effect of infertility due to COVID-19 vaccination), would not prevail against the gain-framed messages, because they are less persuasive with respect to a preventative health behaviour such as vaccination. This was indeed demonstrated by the rapid subsidence of misinformation (断子绝孙 sonless, childless) detected by the time series analysis (cf. Fig. 10.2). This phenomenon and our explanation resonate with recent research on the gain/loss-framed messages of various social behaviours under COVID-19 based on Prospect Theory [21, 22].
10.5 Conclusion
Our study applied NLP techniques to assist in a discourse analysis of people’s perceptions of and reactions to COVID-19 vaccination in terms of language use. People’s linguistic behaviours were captured by tracing the extracted keywords through text mining and time series analysis. It is hoped that knowledge gained from these analyses can lead to more effective health communication and vaccination promotion in public health campaigns. This study contributes to the recent COVID-19 related research in connection with linguistics, language processing, discourse analysis, and social science [23–31].
Acknowledgements The first two authors acknowledge the conference grant of the University of Macau (Ref. No.: FAH/CG/2023/002).
References
1. Chen, X., Wang, V.X., Huang, C.-R.: Themes and sentiments of online comments under COVID-19: a case study of Macau. In: Dong, M., Gu, Y., Hong, J.-F. (eds.) Chinese Lexical Semantics. CLSW 2021. LNCS, vol. 13249, pp. 494–503. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-06703-7_39
2. Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C.M., Brugnoli, E., Schmidt, A.L., Zola, P., Zollo, F., Scala, A.: The COVID-19 social media infodemic. Sci. Rep. 10, 16598 (2020). https://doi.org/10.1038/s41598-020-73510-5
3. Cuello-Garcia, C., Pérez-Gaxiola, G., van Amelsvoort, L.: Social media can have an impact on how we manage and investigate the COVID-19 pandemic. J. Clin. Epidemiol. 127, 198–201 (2020). https://doi.org/10.1016/j.jclinepi.2020.06.028
4. Essam, B.A., Abdo, M.S.: How do Arab tweeters perceive the COVID-19 pandemic? J. Psycholinguist. Res. 50, 507–521 (2021). https://doi.org/10.1007/s10936-020-09715-6
5. Gao, J., Zheng, P., Jia, Y., Chen, H., Mao, Y., Chen, S., Wang, Y., Fu, H., Dai, J.: Mental health problems and social media exposure during COVID-19 outbreak. PLoS ONE 15, e0231924 (2020). https://doi.org/10.1371/journal.pone.0231924
6. Han, X., Wang, J., Zhang, M., Wang, X.: Using social media to mine and analyze public opinion related to COVID-19 in China. Int. J. Environ. Res. Public Health 17, 2788 (2020). https://doi.org/10.3390/ijerph17082788
7. Shi, W., Zeng, F., Zhang, A., Tong, C., Shen, X., Liu, Z., Shi, Z.: Online public opinion during the first epidemic wave of COVID-19 in China based on Weibo data. Hum. Soc. Sci. Commun. 9, 159 (2022). https://doi.org/10.1057/s41599-022-01181-w
8. Tsao, S.F., Chen, H., Tisseverasinghe, T., Yang, Y., Li, L., Butt, Z.A.: What social media told us in the time of COVID-19: a scoping review. Lancet Digit. Health. 3, e175–e194 (2021). https://doi.org/10.1016/S2589-7500(20)30315-0
9. Wicke, P., Bolognesi, M.M.: Framing COVID-19: how we conceptualize and discuss the pandemic on Twitter. PLoS ONE 15, e0240010 (2020). https://doi.org/10.1371/journal.pone.0240010
10. Ferrara, E., Cresci, S., Luceri, L.: Misinformation, manipulation, and abuse on social media in the era of COVID-19. J. Comput. Soc. Sci. 3, 271–277 (2020). https://doi.org/10.1007/s42001-020-00094-5
11. Rocha, Y.M., de Moura, G.A., Desiderio, G.A., de Oliveira, C.H., Lourenco, F.D., de Figueiredo Nicolete, L.D.: The impact of fake news on social media and its influence on health during the COVID-19 pandemic: a systematic review. J. Public Health (2021) 21, 1–10. https://doi.org/10.1007/s10389-021-01658-z
12. Faasse, K., Chatman, C.J., Martin, L.R.: A comparison of language use in pro- and antivaccination comments in response to a high profile Facebook post. Vaccine 34, 5808–5814 (2016). https://doi.org/10.1016/j.vaccine.2016.09.029
13. Puri, N., Coomes, E.A., Haghbayan, H., Gunaratne, K.: Social media and vaccine hesitancy: new updates for the era of COVID-19 and globalized infectious diseases. Hum. Vaccines Immunother. 16, 2586–2593 (2020). https://doi.org/10.1080/21645515.2020.1780846
14. Wilson, S.L., Wiysonge, C.: Social media and vaccine hesitancy. BMJ Glob. Health 5, e004206 (2020). https://doi.org/10.1136/bmjgh-2020-004206
15. Huang, C.-R., Chen, K.-J., Chen, F.-Y., Chang, L.-L.: Segmentation standard for Chinese natural language processing. Comput. Linguist. Chin. Lang. Process. 2, 47–62 (1997)
16. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24, 513–523 (1988). https://doi.org/10.1016/0306-4573(88)90021-0
17. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411. Association for Computational Linguistics (2004)
18. Tversky, A., Kahneman, D.: The framing of decisions and the psychology of choice. Science 211, 453–458 (1981). https://doi.org/10.1126/science.7455683
19. Meyerowitz, B.E., Chaiken, S.: The effect of message framing on breast self-examination attitudes, intentions, and behavior. J. Pers. Soc. Psychol. 52, 500–510 (1987). https://doi.org/10.1037/0022-3514.52.3.500
20. Rothman, A.J., Salovey, P.: Shaping perceptions to motivate healthy behavior: the role of message framing. Psychol. Bull. 121, 3–19 (1997). https://doi.org/10.1037/0033-2909.121.1.3
21. Gantiva, C., Jiménez-Leal, W., Urriago-Rayo, J.: Framing messages to deal with the COVID-19 crisis: the role of loss/gain frames and content. Front. Psychol. 12, 568212 (2021). https://doi.org/10.3389/fpsyg.2021.568212
22. Jiang, M., Dodoo, N.A.: Promoting mask-wearing in COVID-19 brand communications: effects of gain-loss frames, self- or other-interest appeals, and perceived risks. J. Advert. 50, 271–279 (2021). https://doi.org/10.1080/00913367.2021.1925605
23. Asif, M., Zhiyong, D., Iram, A., Nisar, M.: Linguistic analysis of neologism related to coronavirus (COVID-19). Soc. Sci. Humanit. Open. 4, 100201 (2021). https://doi.org/10.1016/j.ssaho.2021.100201
24. Atabekova, A., Lutskovskaia, L., Kalashnikova, E.: Axiology of Covid-19 as a linguistic phenomenon. J. Inf. Sci. 128, 1542 (2022). https://doi.org/10.1177/01655515221091542
25. Bavel, J.J.V., Baicker, K., Boggio, P.S., Capraro, V., Cichocka, A., Cikara, M., Crockett, M.J., Crum, A.J., Douglas, K.M., Druckman, J.N., Drury, J., Dube, O., Ellemers, N., Finkel, E.J., Fowler, J.H., Gelfand, M., Han, S., Haslam, S.A., Jetten, J., Kitayama, S., Mobbs, D., Napper, L.E., Packer, D.J., Pennycook, G., Peters, E., Petty, R.E., Rand, D.G., Reicher, S.D., Schnall, S., Shariff, A., Skitka, L.J., Smith, S.S., Sunstein, C.R., Tabri, N., Tucker, J.A., Linden, S.V., Lange, P.V., Weeden, K.A., Wohl, M.J.A., Zaki, J., Zion, S.R., Willer, R.: Using social and behavioural science to support COVID-19 pandemic response. Nat. Hum. Behav. 4, 460–471 (2020). https://doi.org/10.1038/s41562-020-0884-z
26. Chen, L.-C., Chang, K.-H., Chung, H.-Y.: A novel statistic-based corpus machine processing approach to refine a big textual data: an ESP case of COVID-19 news reports. Appl. Sci. 10, 5505 (2020). https://doi.org/10.3390/app10165505
27. Gu, J., Xiang, R., Wang, X., Li, J., Li, W., Qian, L., Zhou, G., Huang, C.R.: Multi-probe attention neural network for COVID-19 semantic indexing. BMC Bioinform. 23, 259 (2022). https://doi.org/10.1186/s12859-022-04803-x
28. Lei, S., Yang, R., Huang, C.-R.: Emergent neologism: a study of an emerging meaning with competing forms based on the first six months of COVID-19. Lingua 258, 103095 (2021). https://doi.org/10.1016/j.lingua.2021.103095
29. Wan, M., Su, Q., Xiang, R., Huang, C.R.: Data-driven analytics of COVID-19 ‘infodemic.’ Int. J. Data Sci. Anal. 15, 313–327 (2022). https://doi.org/10.1007/s41060-022-00339-8
30. Wang, X., Ahrens, K., Huang, C.-R.: The distance between illocution and perlocution: a tale of different pragmemes to call for social distancing in two cities. Intercult. Pragmat. 19, 1–33 (2022). https://doi.org/10.1515/ip-2022-0001
31. Wang, X., Huang, C.-R.: From contact prevention to social distancing: the co-evolution of bilingual neologisms and public health campaigns in two cities in the time of COVID-19. SAGE Open 11, 1–17 (2021). https://doi.org/10.1177/21582440211031556
Chapter 11
Ensemble Machine Learning-Based Network Intrusion Detection System
K. Indra Gandhi, Sudharsan Balaji, S. Srikanth, and V. Suba Varshini
Abstract With the rise of cyber-attacks in fields such as education, government, and banking, the need for better network intrusion detection systems is increasing. An intrusion detection system (IDS) monitors the activity of a network of connected computers to detect invasive or anomalous patterns, and it must respond accordingly when an attack is detected. The use of machine learning in IDS is becoming increasingly popular and has proven to be better than traditional methods. Different techniques have been explored and applied in the past. However, these techniques mostly rely on a single machine learning model or a hybrid model, with only a few exploring ensembling methods. High false positive rates, low detection accuracy, and the unavailability of a proper dataset are some of the prevailing problems. This work proposes an ensemble machine learning-based network intrusion detection system in which Random Forest, AdaBoost, and LGBM ensemble models are combined with a soft voting scheme, as an improvement on currently employed models. The proposed model achieves an accuracy of 99.9% and a low false positive rate of 0.14 on the NSL-KDD benchmark dataset, with similar results on the UNSW-NB15 dataset.
K. Indra Gandhi (B) · S. Balaji · S. Srikanth · V. S. Varshini
College of Engineering, Anna University, Guindy 12, Sardar Patel Rd, Guindy, Chennai, Tamil Nadu 600025, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_11
11.1 Introduction
11.1.1 Background
All information available online is subject to threats from both internal and external intruders. A vulnerability in a system can lead to unauthorized access, which in turn can lead to data manipulation. As a result, intrusion detection is critical for securing a network and other assets in cyberspace. Cyber-attacks are changing swiftly in tandem with technological improvements, posing a major threat that necessitates the use of an effective and dependable intrusion detection system (IDS). The deployed system should be effective in operation and should manage the number of alerts and the users who enter the system. For developing such a system, machine learning and deep learning algorithms have gained popularity because of their prediction accuracy. A network IDS is broadly classified into two types: signature-based IDS and anomaly-based IDS. A signature-based IDS detects intrusions by matching activity against a signature database. An anomaly-based IDS, on the other hand, detects intrusions by monitoring system activity and categorizing it accordingly, which makes it more effective in detecting unknown attacks. Many research papers focus on machine learning classifiers and optimized methods to prevent intrusive packets.
11.1.2 Objective

The primary objective is to build an effective ensemble machine learning model that can be employed in an anomaly-based network intrusion detection system. The model must be capable of analyzing patterns in network packet data and detecting anomalies in them as effectively as possible. It is built by combining the best models identified through research as well as trial and error. The model must also be extendable to attack-type classification on real-time data, thereby helping to prevent network-based threats.
11.1.3 Solution Overview

The proposed system combines several ensemble ML models through a voting classifier that predicts the nature of a packet, classifying it as either an attack or normal. Furthermore, we intend to classify the type of attack, a task known as multiclass classification. The outputs of the ensemble models are combined, and evaluation metrics are computed on the result. Additionally, live packets captured using Wireshark are treated as test cases and passed to the detection system, which reports whether each packet is an attack or normal and generates a report. First, a suitable dataset with relevant attributes and features must be chosen for analysis; ensemble machine learning classifiers with lower misclassification rates are then used to build a model that can determine clearly whether a given packet is an attack or normal.
11 Ensemble Machine Learning-Based Network Intrusion Detection System
11.2 Literature Survey

Musa et al. [1] identified NSL-KDD as the most commonly used dataset for this type of project. In addition, about seven datasets were widely used: KDD Cup'99, NSL-KDD, Kyoto2006+, AWID, CIC-IDS2017, UNSW-NB15, and UGR'16. The NSL-KDD training set consists of 125,973 records and the test set contains 22,544 records. This size allows experiments to be performed on the full set without having to randomly select a small portion, so the evaluation results of different studies are consistent and comparable. The UNSW-NB15 dataset includes nine types of attacks: fuzzers, analysis, backdoor, DoS, exploits, generic, reconnaissance, shellcode, and worms. Its creators used the Argus and Bro-IDS tools and developed 12 algorithms to generate a total of 49 features with the class label. To determine the best classifier for intrusion detection, some major evaluation metrics are used: accuracy, precision, and F1 score. Experimental results show that the Random Forest classifier outperforms other classifiers, with 87% accuracy, 98% accuracy, and 84% F-measure, and that it takes the highest weight, as in [2]. Several baselines of ML classifiers are available in this area of study, built with classification algorithms such as decision trees, k-nearest neighbors, Random Forests, and support vector machines, as reviewed in [3]. Statistical methods such as mean and standard deviation were chosen to handle large false-negative results [4]. A comparative analysis of SVM and Naive Bayes classifiers by Halimaa et al. [5] showed that the SVM method performed better than the Naive Bayes method, with higher accuracy (94%) and a lower misclassification rate. Furthermore, starting from the study of seven different datasets from 2015 to 2020 referenced by Musa et al. [1], we narrowed our study to ensemble-based classifiers and hybrid machine learning.
Combining multiple ML classifier algorithms using voting classifiers provides better accuracy because overall performance is not compromised by any single model, as shown by Sridevi et al. [6]. A combination of SVM, KNN, and Random Forest under a voting classifier achieved higher performance than any of the classifiers alone. Building NIDSs with different learning algorithms paved the way for combinations of deep learning algorithms, an interesting area of research that uses layered networks for processing and is widely used in image classification, object detection, and pattern recognition. As noted by Varanasi et al. [7], deep learning methods are difficult to use for attack detection. One such paper uses a 1D CNN on time series, showing that CNNs can be applied to tabular datasets such as NSL-KDD, as described in [8]. A general overview of the attacks that can be introduced to validate the model is given in [9]. According to that survey, logs carry detailed semantic information suitable for detecting SQL injection, U2R, and R2L attacks; the contents of communication packets are suitable for detecting U2L and R2L attacks; and data covering the entire network environment is suited to detecting DoS and Probe attacks. A NIDS based on ensemble machine learning was developed by Kiflay et al. [10]; it combines four ensemble ML classifiers (Random Forest, AdaBoost, XGBoost, and Gradient Boosting Decision Tree) to increase the efficiency of attack detection and reduce false positives. Therefore, we narrow the topic to ensemble machine learning classifier algorithms combined with a voting classifier [11] using a resampling method.
11.3 Problem Description

The importance of network security mechanisms has grown sharply with the rising rate of cyber threats and network attacks. Attackers are constantly looking for methods to exploit any vulnerability in the network, and IDSs are one solution to this problem. The classification function is critical in such systems for determining whether traffic is normal and, if not, classifying it. IDS models are also affected by problems such as high false positive rates and low detection accuracy, and identifying and extracting the significant features is laborious. Thus, the aim is to develop an ensemble machine learning-based network intrusion detection system that detects anomalies, classifies the attack type, and generates reports on captured packets.
11.3.1 Technical Architecture

The technical architecture, shown in Fig. 11.1, depicts the flow of data through the different preprocessing steps and finally to the anomaly classifier, whose results are analyzed and interpreted by the report generation unit.
Fig. 11.1 Proposed architecture
Fig. 11.2 Attack class distribution in NSL-KDD
11.4 Proposed Work The major components of this work include feature selection methods (ANOVA, Information gain, Recursive Feature Elimination) and anomaly classification (labels: Normal and attack) using soft voting.
11.4.1 Datasets

The benchmark data used to train and test our model come from the NSL-KDD and UNSW-NB15 public IDS datasets. KDD CUP 99 [12, 13] was created by processing the tcpdump data of the 1998 DARPA intrusion detection evaluation. NSL-KDD was derived from the KDD CUP 99 dataset. The UNSW-NB15 dataset [14, 15] was created with the IXIA PerfectStorm tool in the Cyber Range Lab of UNSW Canberra, Australia. The NSL-KDD benchmark dataset has five classes, Normal plus four attack categories (DoS, Probe, U2R, and R2L), as depicted in Fig. 11.2. UNSW-NB15, on the other hand, has nine attack categories: Fuzzers, Analysis, Backdoor, DoS, Exploit, Generic, Reconnaissance, Shellcode, and Worms.
11.4.2 Dataset Preprocessing

As shown in Fig. 11.1, this module processes the given raw dataset and performs the following steps for both datasets:

1. Feature mapping and attack class mapping: Before applying scaling, encoding, and feature selection, we provide proper headers for the features and map the types of attacks to either 0 or 1, where 0 represents normal and 1 represents anomaly.
2. Data scaling: Some of the features in the dataset are not on the same scale, which may bias the model toward features with higher values. Therefore, we perform data scaling using a min-max scaler.
3. Data encoding: Categorical values are converted to numerical n-dimensional vectors using one-hot encoding.

11.4.2.1 Feature Selection
The following feature selection methods have been explored in this work: ANOVA, Info Gain, Recursive Feature Elimination (RFE), and Random Forest Importances (RFI). The following were found to be the most important and common features: {‘service’, ‘dst_host_same_srv_rate’, ‘count’, ‘same_srv_rate’, ‘dst_host_srv_count’, ‘flag’, ‘logged_in’}
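The three preprocessing steps of Sect. 11.4.2 and an ANOVA-style feature score can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the toy records are hypothetical values rather than NSL-KDD rows, the helper names (`min_max_scale`, `one_hot`, `anova_f`) are introduced here for the example, and the score is the textbook one-way ANOVA F statistic, which may differ in detail from the implementation the authors used.

```python
from statistics import mean


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def one_hot(column):
    """Return the sorted categories and one indicator vector per value."""
    categories = sorted(set(column))
    position = {c: i for i, c in enumerate(categories)}
    vectors = [[1 if position[v] == i else 0 for i in range(len(categories))]
               for v in column]
    return categories, vectors


def anova_f(feature, labels):
    """One-way ANOVA F statistic of a numeric feature grouped by class label."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(y, []).append(x)
    k, n = len(groups), len(feature)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(feature)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy records (hypothetical values, not taken from NSL-KDD).
attack_type = ["normal", "normal", "neptune", "smurf"]
count = [2, 4, 250, 300]
flag = ["SF", "SF", "S0", "S0"]

labels = [0 if a == "normal" else 1 for a in attack_type]  # step 1: 0 = normal, 1 = anomaly
count_scaled = min_max_scale(count)                        # step 2: min-max scaling
flag_categories, flag_encoded = one_hot(flag)              # step 3: one-hot encoding
f_count = anova_f(count_scaled, labels)                    # higher F = more class-separating
```

Features would then be ranked by their F statistic and the top-scoring ones kept; how many to keep is a tuning choice that the text does not specify.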
11.4.2.2 The “Service” Feature

Using RFI, the following values of the ‘service’ feature were found to be the most important in the binary classification of packets: {‘http’, ‘domain_u’, ‘private’, ‘smtp’, ‘ftp_data’, ‘other’, ‘eco_i’, ‘ecr_i’, ‘urp_i’, ‘telnet’, ‘ftp’, ‘IRC’, ‘ntp_u’, ‘Z39_50’, ‘imap4’}
11.4.3 Anomaly Classifier Engine The processed data from both datasets is then passed to the anomaly classifier engine. The engine consists of 2 soft voting ensemble models for each dataset. Random Forest [16] along with AdaBoost and LGBM have been combined using soft voting for NSL-KDD and UNSW-NB15, respectively.
11.4.3.1 Soft Voting

After comparing the hard and soft voting methodologies, we found that soft voting produces better accuracy than hard voting. Figure 11.3 shows the algorithm for soft voting.
Fig. 11.3 Soft voting algorithm [10]
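The difference between the two voting schemes can be illustrated with a minimal sketch. Soft voting averages the class probabilities predicted by each base model and picks the class with the highest average, whereas hard voting takes the majority of the predicted labels. The probabilities below are made up for illustration, the function names are hypothetical, and equal weights are assumed; scikit-learn's `VotingClassifier`, cited as [11], offers the same choice through its `voting="soft"` option.

```python
def soft_vote(prob_rows, weights=None):
    """Average per-class probabilities across classifiers for one sample.

    prob_rows holds one [p_normal, p_attack] list per classifier.
    Returns the winning class index and the averaged probabilities.
    """
    weights = weights or [1.0] * len(prob_rows)
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_rows)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


def hard_vote(prob_rows):
    """Majority vote over each classifier's most probable class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    return max(set(votes), key=votes.count)


# Hypothetical outputs of three base models for a single packet.
random_forest = [0.10, 0.90]   # confident: attack
adaboost = [0.55, 0.45]        # weakly: normal
lgbm = [0.60, 0.40]            # weakly: normal

soft_label, soft_probs = soft_vote([random_forest, adaboost, lgbm])
hard_label = hard_vote([random_forest, adaboost, lgbm])
```

In this example the hard vote returns 0 (normal) because two of the three models lean that way, while the soft vote returns 1 (attack) because the confident model outweighs the two marginal ones. This sensitivity to confidence is one plausible reason soft voting can score higher.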
11.4.4 Evaluation Metrics

Models on both datasets have been evaluated using the following evaluation metrics:

• Accuracy
• Recall
• Precision
• F1-score
• False positive rate
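For reference, the five metrics can be computed from the confusion-matrix counts as in the sketch below. It assumes binary labels with 1 for attack and 0 for normal, as in the attack class mapping of Sect. 11.4.2; the function name and the sample labels are illustrative only.

```python
def binary_metrics(y_true, y_pred):
    """Compute the five metrics for labels where 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }


# Hypothetical ground truth and predictions for eight packets.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(y_true, y_pred)
```

The functions return fractions in [0, 1]; multiplying by 100 gives percentages.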
11.4.5 Report Generation

Once test data is provided, either in real time or labeled, the report generation unit processes the classification results to provide the necessary information to the end user (% of attacks captured, time of last detected attack and its details, etc.). The system integrates the various modules and, after preprocessing the dataset, combines the outputs of ML classifiers such as RFC, AdaBoost, and LightGBM through a soft voting classifier, using the features chosen by feature selection.
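A report of the kind described above could be assembled as in the following sketch. The record layout (an ISO-formatted timestamp paired with a predicted label) and the field names are assumptions made for illustration; the source does not specify the report format.

```python
def summarize(predictions):
    """Summarize classifier output.

    predictions is a list of (timestamp, label) pairs, where timestamp is an
    ISO 8601 string and label is 1 for attack or 0 for normal.
    """
    attack_times = [ts for ts, label in predictions if label == 1]
    total = len(predictions)
    return {
        "total_packets": total,
        "attack_percent": 100.0 * len(attack_times) / total if total else 0.0,
        # ISO 8601 strings in the same time zone sort chronologically.
        "last_attack_time": max(attack_times) if attack_times else None,
    }


# Hypothetical classified packets.
report = summarize([
    ("2023-01-05T10:00:01", 0),
    ("2023-01-05T10:00:02", 1),
    ("2023-01-05T10:00:03", 0),
    ("2023-01-05T10:00:04", 0),
])
```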
11.5 Results

Among the various feature selection methods and voting methods, ANOVA feature selection with soft voting gave the highest accuracy and the lowest false positive rate, as shown in Table 11.1. Performance and evaluation metrics of the model for NSL-KDD and UNSW-NB15 are shown in Tables 11.2 and 11.3, respectively. Performance of the proposed system is significantly higher than that of the individual classifiers, thus confirming our hypothesis. The proposed model, although developed with NSL-KDD in mind, shows similar results for the UNSW-NB15 benchmark dataset, as shown in Table 11.3.
Table 11.1 Performance metrics of various feature selection methods

| Feature selection | Voting | Accuracy | Recall | Precision | F1-score | FPR |
|---|---|---|---|---|---|---|
| ANOVA | Hard | 98.45 | 96.96 | 97.25, 99.8 | 98.5, 98.37 | 0.16 |
| Info Gain | Hard | 97 | 94.48 | 95.09, 99.41 | 97.24, 96.89 | 0.51 |
| RFI | Hard | 98.67 | 97.44 | 97.67, 99.79 | 98.73, 98.6 | 0.18 |
| RFE | Hard | 98.73 | 97.55 | 97.77, 99.81 | 98.79, 98.67 | 0.16 |
| ANOVA | Soft | 99.99 | 99.9 | 99.9, 99.8 | 99.9, 99.9 | 0.14 |
| Info Gain | Soft | 99.19 | 99.17 | 99.23, 99.15 | 99.22, 99.16 | 0.78 |
| RFI | Soft | 99.86 | 99.95 | 99.95, 99.77 | 99.87, 99.86 | 0.20 |
| RFE | Soft | 99.89 | 99.97 | 99.98, 99.79 | 99.89, 99.88 | 0.18 |
Table 11.2 NSL-KDD

| Metric | Score |
|---|---|
| Accuracy | 99.99 |
| Recall | 99.9 |
| Precision | 99.9, 99.8 |
| F1-score | 99.9, 99.9 |
| FPR | 0.14 |

Table 11.3 UNSW-NB15

| Metric | Score |
|---|---|
| Accuracy | 99.99 |
| Recall | 99.9 |
| Precision | 99.9, 99.8 |
| F1-score | 99.9, 99.9 |
| FPR | 0.14 |
11.6 Conclusion and Future Work

In this work, we propose an anomaly-based network intrusion detection system that combines ensemble ML models with a soft voting scheme. The individual ensemble models were chosen by comprehensively evaluating their performance against classical ML classifiers on benchmark IDS datasets (NSL-KDD, UNSW-NB15), along with various feature selection methodologies (ANOVA, RFE, RFI, Info Gain). The proposed model outperforms traditional single-model classifiers in every metric, achieving an accuracy of 99.9% and a false positive rate of 0.14, which is below 1%.
Even though this model is highly effective at predicting attack anomalies, it can still be extended to incorporate multiclass classification. The model can be deployed on a physical or simulated network to predict attacks and attack classes from real-time data. In such a real-time scenario, all the control nodes on a network can be alerted when an attack is detected. The effect of various types of attacks and of IDS alert flood messages can be analyzed further by applying queueing theory to the queueing nodes in a network.
References

1. Musa, U.S., Chhabra, M., Ali, A., Kaur, M.: Intrusion detection system using machine learning techniques: a review. In: Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 149–155. IEEE (2020)
2. Almomani, O., Almaiah, M.A., Alsaaidah, A., Smadi, S., Mohammad, A.H., Althunibat, A.: Machine learning classifiers for network intrusion detection system: comparative study. In: Proceedings of the 2021 International Conference on Information Technology (ICIT), pp. 440–445. IEEE (2021)
3. Shah, A., Clachar, S., Minimair, M., Cook, D.: Building multiclass classification baselines for anomaly-based network intrusion detection systems. In: Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp. 759–760. IEEE (2020)
4. Das, A., Sunitha, B.S., et al.: Anomaly-based network intrusion detection using ensemble machine learning approach. Int. J. Adv. Comput. Sci. Appl. 13(2), 275 (2022)
5. Halimaa, A., Sundarakantham, K.: Machine learning based intrusion detection system. In: Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 916–920. IEEE (2019)
6. Sridevi, S., Prabha, R., Narasimha Reddy, K., Monica, K.M., Senthil, G.A., Razmah, M.: Network intrusion detection system using supervised learning based voting classifier. In: Proceedings of the 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT), pp. 01–06. IEEE (2022)
7. Varanasi, V.R., Razia, S.: Network intrusion detection using machine learning, deep learning: a review. In: Proceedings of the 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 1618–1624. IEEE (2022)
8. Verma, A.K., Kaushik, P., Shrivastava, G.: A network intrusion detection approach using variant of convolution neural network. In: Proceedings of the 2019 International Conference on Communication and Electronics Systems (ICCES), pp. 409–416. IEEE (2019)
9. Liu, H., Lang, B.: Machine learning and deep learning methods for intrusion detection systems: a survey. Appl. Sci. 9(20), 4396 (2019)
10. Kiflay, A.Z., Tsokanos, A., Kirner, R.: A network intrusion detection system using ensemble machine learning. In: Proceedings of the 2021 International Carnahan Conference on Security Technology (ICCST), pp. 1–6. IEEE (2021)
11. Scikit-Learn: VotingClassifier (2021). https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html. Accessed 01 Feb 2021
12. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, pp. 1–6 (2009). https://doi.org/10.1109/CISDA.2009.5356528
13. Cup, K.D.D.: Dataset. http//kdd.edu/databases/kddcup99/kddcup99.html (1999)
14. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Proceedings of the 2015 Military Communications and Information Systems Conference, MilCIS 2015 - Proceedings, pp. 1–6 (2015). https://doi.org/10.1109/MilCIS.2015.7348942
15. Moustafa, N.: The UNSW-NB15 dataset. UNSW, Sydney (2019). https://doi.org/10.26190/5d7ac5b1e8485
16. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Chapter 12
Household Power Consumption Analysis and Prediction Using LSTM Varsha Singh, Abhishek Karmakar, Sharik Gazi, Shashwat Mishra, Shubhendra Kumar, and Uma Shanker Tiwary
Abstract Time-series forecasting predicts specific future values over a specific period. By exploiting the available historical data, forecasting helps estimate significant values ahead of time, and it has a variety of real-world applications. One application in electric power consumption is the detection of unusual future consumption by comparing it with the forecasted estimate, which helps identify unintended use of the power supply. The household electric power consumption dataset is a time-series dataset of residential power consumption by individual houses. In an earlier experiment on this dataset, the outputs of models trained with the linear regression algorithm and an artificial neural network were compared. This work predicts future power consumption on the same dataset using an LSTM network and demonstrates that the LSTM model gives better results than both the linear regression approach and the neural network.
V. Singh (B) · A. Karmakar · S. Gazi · S. Mishra · S. Kumar · U. S. Tiwary Indian Institute of Information Technology, Allahabad, Uttar Pradesh 211015, India e-mail: [email protected] A. Karmakar e-mail: [email protected] S. Gazi e-mail: [email protected] S. Mishra e-mail: [email protected] S. Kumar e-mail: [email protected] U. S. Tiwary e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_12
12.1 Introduction

The smart grid has brought remarkable change through its variety of operations, and its metering architecture becomes more advanced when integrated with energy forecasting technology. A well-founded model of energy consumption at the level of the individual household is in demand today, as it allows the power sector to understand future demand and refine the power supply accordingly. A technology that helps forecast demand in case of shortage, or helps understand consumer behavior, is fundamental to planning for the expected future. Energy consumption is measured from the power consumed over time. Forecasting and regression analysis both estimate data from given input values. Regression typically means estimating past, present, or future data, whereas forecasting refers to future value prediction. Since forecasting is also a kind of regression analysis, the earlier work on the individual household power dataset described in Sect. 12.3 used linear regression and a neural network. We propose an LSTM network for this task, as it can retain information about sequences in memory. The rest of this paper is organized as follows. Section 12.2 reviews the literature on forecasting with similar datasets. Section 12.3 describes the data. Section 12.4 presents the general framework, explaining the architecture and methodology of the research together with the output of each component. Section 12.5 concludes with a comparison table and the overall result.
12.2 Literature Review

An application of forecasting to detect atypical electricity consumption by comparison with predicted future consumption is demonstrated in [1]. A comparative study of linear regression and a neural network on the household residential power consumption dataset was performed in [2]. For efficient use of electricity, forecasting on the individual household electric power consumption dataset, which consists of minute-sampled time-series data, has come into focus in recent years [3–10]. In [3], a deep RNN approach was used for forecasting on an Irish household dataset. A hybrid CNN-LSTM model was used for many-to-many forecasting in [4]. A streaming classifier for electricity consumption was proposed in [5]. In [6], AI algorithms were used to predict future values in a campus microgrid. Another similar work uses ensemble learning [7]. The works in [8–10] address household consumption with deep learning approaches. Machine learning and deep learning technologies have proved advantageous for energy consumption prediction because of their forecasting capabilities on nonlinear datasets. Different forecasting models show different accuracies, and the model with the least error is the better model. In this work, we use a many-to-one LSTM-RNN model with a one-minute time step. In total, seven many-to-one LSTM-RNN models were designed to predict seven attributes of the individual household electric power consumption dataset. An earlier work used linear regression and a neural network [2] on the same data file; our work shows better R-squared values than those models.
12.3 Data Description

This dataset is taken from the UCI machine learning repository. It contains 2,075,259 samples recorded at a one-minute sampling interval over a period of about 4 years, from Dec 2006 to Nov 2010. The values were recorded in a house located in Sceaux, France (9.7 km from the center of Paris). The attribute information is as follows:

1. Date: the date in day-month-year format, "dd-mm-yyyy"
2. Time: the time in hour-minute-second format, "hh:mm:ss"
3. Global active power: minute-averaged active power (in kilowatt)
4. Global reactive power: minute-averaged reactive power (in kilowatt)
5. Voltage: minute-averaged voltage (in volt)
6. Global intensity: minute-averaged current intensity (in ampere)
7. Sub_metering_1: active energy consumed in the kitchen (in watt-hour)
8. Sub_metering_2: active energy consumed in the laundry (in watt-hour)
9. Sub_metering_3: active energy consumed by household appliances (in watt-hour).
12.4 General Framework

The purpose of implementing an LSTM model is to fit and predict household power consumption, because LSTMs are well suited to large datasets, time-series analysis, and sequential problems. The main aim of this experiment is to achieve more accurate predictions than the existing work. This section gives an overview of the structure behind the model evaluation; the overall architecture is shown in Fig. 12.1. The amount of input data must be considered in order to balance model accuracy against computation cost. We divided the dataset into 80% and 20% portions, following the same split used in [2]. We did not resample the data by hour, day, week, or quarter, as our main focus is to show that the LSTM works better than linear regression and a neural network. The framework consists of four basic steps, shown in Fig. 12.2, each of which is explained below. Figure 12.3 shows the correlation matrix of the working dataset.
Fig. 12.1 Overall architecture
Fig. 12.2 Four basic steps
Fig. 12.3 Correlation matrix of working dataset
12.4.1 Handling Insignificant Data: Data Cleaning

The dataset contains unwanted string values represented as ‘?’. During data cleaning, such insignificant values are replaced by NULL. Furthermore, all values in the dataset that lie outside the interquartile range are treated as outliers and are also replaced with NULL. Having identified these unimportant sample points, we replace all of them with the mean of the corresponding attribute. This process is followed by normalization so that the input data are well organized.

Fig. 12.4 Reframed table showing the current time as output with respect to the prior time as input
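The cleaning steps above can be sketched for a single column as follows. Two points are assumptions, since the text does not spell them out: outliers are taken to be values beyond the usual Tukey fences (Q1 − 1.5·IQR and Q3 + 1.5·IQR, controlled by the hypothetical parameter `k`), and the replacement mean is computed from the values that remain after outlier removal. Normalization is assumed to be min-max scaling.

```python
from statistics import mean, quantiles


def clean_column(raw, k=1.5):
    """Clean one attribute column given as strings; '?' marks a missing value."""
    # Step 1: turn the '?' placeholders into None (NULL).
    values = [None if v == "?" else float(v) for v in raw]
    present = [v for v in values if v is not None]

    # Step 2: mark values outside the IQR-based fences as None as well.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    values = [v if v is not None and low <= v <= high else None for v in values]

    # Step 3: replace every None with the mean of the remaining values.
    fill = mean([v for v in values if v is not None])
    cleaned = [fill if v is None else v for v in values]

    # Step 4: min-max normalization to [0, 1].
    lo, hi = min(cleaned), max(cleaned)
    span = hi - lo
    normalized = [(v - lo) / span if span else 0.0 for v in cleaned]
    return cleaned, normalized


# Hypothetical readings with one missing value and one extreme value.
raw = ["1", "2", "?", "3", "4", "5", "6", "7", "8", "100"]
cleaned, normalized = clean_column(raw)
```

Here both the missing entry and the extreme reading of 100 end up replaced by the column mean of the remaining values.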
12.4.2 Reframing the Dataset

This section describes the reframing of the dataset. We trained seven LSTM models, one each for ‘Global_active_power’, ‘Global_reactive_power’, ‘Voltage’, ‘Global_intensity’, ‘Sub_metering_1’, ‘Sub_metering_2’, and ‘Sub_metering_3’. To cast the learning task as a supervised machine learning problem that predicts any of these attributes at the present time (t), all seven attributes are considered at the prior time (t − 1). In Fig. 12.4, global active power at time t is the output attribute, whereas the attributes at time (t − 1) are the inputs. The time step considered is 1 min. The other attributes are treated likewise to obtain seven different many-to-one models.
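The reframing can be sketched as a one-step shift: each input row holds all seven attributes at time t − 1, and the target is the chosen attribute at time t. The function name and the toy rows below are illustrative, and the column order is assumed to follow the attribute list in Sect. 12.3.

```python
def reframe(rows, target_index):
    """Turn a time-ordered table into (inputs at t-1, target at t) pairs.

    rows is a list of equal-length numeric lists, one per minute.
    Returns X with len(rows) - 1 input rows and y with the matching targets.
    """
    X = [list(prev) for prev in rows[:-1]]
    y = [cur[target_index] for cur in rows[1:]]
    return X, y


# Three hypothetical minutes of the seven attributes (values are made up).
rows = [
    [4.2, 0.41, 234.8, 18.4, 0.0, 1.0, 17.0],
    [5.3, 0.43, 233.6, 23.0, 0.0, 1.0, 16.0],
    [5.4, 0.50, 233.3, 23.0, 0.0, 2.0, 17.0],
]
X, y = reframe(rows, target_index=0)  # index 0: Global_active_power
```

Because the first row has no predecessor, n rows yield n − 1 supervised samples, which is consistent with the 2,075,259 recorded rows and the 1,660,206 + 415,052 = 2,075,258 samples reported for the split in Sect. 12.4.3.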
12.4.3 Splitting the Dataset

The reframed data are split into training and test sets in a 4:1 proportion for the LSTM model. The training data consist of 1,660,206 samples, whereas the test data consist of 415,052 samples.
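The reported sample counts can be reproduced with a simple 80/20 cut, as in the sketch below. It assumes the split is made in time order without shuffling, which is common for time series but is not stated in the text; the function name is hypothetical.

```python
def split_sizes(n_samples, train_fraction=0.8):
    """Return (train size, test size) for a split that keeps time order."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


def chronological_split(X, y, train_fraction=0.8):
    """Use the earliest samples for training and the latest for testing."""
    n_train, _ = split_sizes(len(X), train_fraction)
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# 2,075,259 recorded rows give 2,075,258 reframed samples (one is lost to the shift).
n_train, n_test = split_sizes(2_075_259 - 1)
```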
12.4.4 Training and Validating the Dataset

The training to test data proportion used for the LSTM model is 4:1, collected from the reframed data. Figure 12.5 shows the LSTM-RNN architecture for the same. Given below are the settings used to train the LSTM model.

Fig. 12.5 LSTM-RNN architecture

Global Active Power

• 100 units in the first visible layer of LSTM
• 1000 units in first dense layer
• 30 units in second dense layer
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with the Adam optimizer. We have trained the model on 25 epochs with a batch size of 70. Figures 12.6 and 12.7 show the corresponding outputs.

Fig. 12.6 First 2000 initial values of the test data1 (Global active power)
Fig. 12.7 Prediction on the test data1

Global Reactive Power

• 100 units in the first visible layer of LSTM
• 1000 units in first dense layer
• 100 units in second dense layer
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with the Adam optimizer. In total, 12 epochs have been used with a batch size of 100 to train the model. Figures 12.8 and 12.9 show the corresponding outputs.

Fig. 12.8 First 2000 initial values of the test data2 (Global reactive power)
Fig. 12.9 Prediction on the test data2

Voltage

• 100 units in the first visible layer of LSTM
• 2000 units in first dense layer
• 100 units in second dense layer
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with Adam as optimizer. We have trained the model on 10 epochs with a batch size of 100. Figures 12.10 and 12.11 show the corresponding outputs.

Fig. 12.10 First 2000 initial values of the test data3 (Voltage)
Fig. 12.11 Prediction on the test data3

Global Intensity

• 100 units in the first visible layer of LSTM
• 1000 units in first dense layer
• 100 units in second dense layer
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with the Adam optimizer. The training was done on 10 epochs with a batch size of 100. Figures 12.12 and 12.13 show the corresponding outputs.

Fig. 12.12 First 2000 initial values of the test data4 (Global intensity)
Fig. 12.13 Prediction on the test data4

Sub_metering_1

• 200 units in the first visible layer of LSTM
• 1000 units in first dense layer
• 100 units in second dense layer
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with the Adam optimizer. We have trained the model on 10 epochs with a batch size of 100. Figures 12.14 and 12.15 show the corresponding outputs.

Fig. 12.14 First 2000 initial values of the test data5 (Sub_metering_1)
Fig. 12.15 Prediction on the test data5

Sub_metering_2

• 100 units in the first visible layer of LSTM
• 1000 units in first dense layer
• 100 units in second dense layer
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with the Adam optimizer on 15 epochs with a batch size of 100. Figures 12.16 and 12.17 show the corresponding outputs.

Fig. 12.16 First 2000 initial values of the test data6 (Sub_metering_2)
Fig. 12.17 Prediction on the test data6

Sub_metering_3

• 100 units in the first visible layer of LSTM
• 5000 units in first dense layer
• 500 units in second dense layer
• Dropout of 10%
• 1 unit for output in dense layer.

The loss function of mean squared error has been used with the Adam optimizer on 20 epochs with a batch size of 100. Figures 12.18 and 12.19 show the corresponding outputs.

Fig. 12.18 First 2000 initial values of the test data7 (Sub_metering_3)
Fig. 12.19 Prediction on test data7
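To compare the size of the seven networks, the settings listed in this section can be collected in one table and their trainable parameters counted, as in the sketch below. The count assumes a standard LSTM cell with biases (four gates, each with input weights, recurrent weights, and a bias), fully connected dense layers with biases, and an input of seven features at a single time step, as implied by the reframing in Sect. 12.4.2. The dictionary layout and function names are introduced here for illustration and are not taken from the authors' code.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of a standard LSTM layer with biases."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs, units):
    """Trainable parameters of a fully connected layer with biases."""
    return inputs * units + units


# Settings transcribed from the lists above; dropout adds no parameters.
CONFIGS = {
    "Global_active_power":   {"lstm": 100, "dense": [1000, 30, 1],  "epochs": 25, "batch": 70},
    "Global_reactive_power": {"lstm": 100, "dense": [1000, 100, 1], "epochs": 12, "batch": 100},
    "Voltage":               {"lstm": 100, "dense": [2000, 100, 1], "epochs": 10, "batch": 100},
    "Global_intensity":      {"lstm": 100, "dense": [1000, 100, 1], "epochs": 10, "batch": 100},
    "Sub_metering_1":        {"lstm": 200, "dense": [1000, 100, 1], "epochs": 10, "batch": 100},
    "Sub_metering_2":        {"lstm": 100, "dense": [1000, 100, 1], "epochs": 15, "batch": 100},
    "Sub_metering_3":        {"lstm": 100, "dense": [5000, 500, 1], "epochs": 20, "batch": 100,
                              "dropout": 0.10},
}


def total_params(cfg, input_dim=7):
    """Sum the parameters of the LSTM layer and the dense stack that follows it."""
    total = lstm_params(cfg["lstm"], input_dim)
    width = cfg["lstm"]
    for units in cfg["dense"]:
        total += dense_params(width, units)
        width = units
    return total


sizes = {name: total_params(cfg) for name, cfg in CONFIGS.items()}
```

Under these assumptions the Sub_metering_3 model is by far the largest, since its 5000-unit and 500-unit dense layers dominate the count, which may be why it is the only configuration with dropout.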
12.5 Conclusion

The experiments were carried out to gain a better general intuition in the field of energy forecasting. This work is based mainly on the LSTM network: for each column, a separate model was trained from scratch using the LSTM network. Our LSTM-RNN models were compared with the linear regression and neural network models in [2] using the R2 score metric. The comparison is given in Table 12.1, and further evaluation results are given in Table 12.2.

Table 12.1 Comparative analysis between linear regression, MLP regressor and LSTM using R2 score metric

| Features | Linear regression | MLP regressor | LSTM |
|---|---|---|---|
| Global active power | 0.409 | 0.6679 | 0.940 |
| Global reactive power | 0.008 | 0.034 | 0.844 |
| Voltage | 0.073 | 0.085 | 0.937 |
| Global intensity | 0.394 | 0.6647 | 0.937 |
| Sub_metering_1 | – | – | 0.872 |
| Sub_metering_2 | – | – | 0.863 |
| Sub_metering_3 | – | – | 0.956 |
In conclusion, the results demonstrate the behavior of each component trained with the LSTM-RNN model. The LSTM performs better than linear regression and the neural network when trained for a single-attribute output. The network has been modeled for many-to-one output.
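The error measures reported for these models (MAE, MSE, RMSE, and the R2 score) have standard definitions, which the following sketch implements. The function name and the sample values are illustrative only and are not taken from the experiments.

```python
from math import sqrt


def regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "mae": mae,
        "mse": mse,
        "rmse": sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot if ss_tot else 0.0,
    }


# Hypothetical actual and predicted values.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Since RMSE is the square root of MSE, the two can be cross-checked: an MSE of 0.045 corresponds to an RMSE of about 0.212, in line with the value reported for global active power.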
Table 12.2 Model evaluations on various parameters mentioned in Sect. 3.4

| Attributes | Error | Difference between prediction and actual value | Displot of test and predicted |
|---|---|---|---|
| Global active power | MAE: 0.088, MSE: 0.045, RMSE: 0.213, R2 = 0.940 | Difference of variance = 0.00049 | (plot) |
| Global reactive power | MAE: 0.024, MSE: 0.002, RMSE: 0.045, R2 = 0.844 | Difference of variance = 0.00121 | (plot) |
| Voltage | MAE: 0.468, MSE: 0.385, RMSE: 0.621, R2 = 0.945 | Difference of variance = 0.00095 | (plot) |
| Global intensity | MAE: 0.409, MSE: 0.835, RMSE: 0.914, R2 = 0.937 | Difference of variance = 0.00041 | (plot) |
| Sub metering 1 | MAE: 0.355, MSE: 3.753, RMSE: 1.937, R2 = 0.872 | Difference of variance = 0.00028 | (plot) |
| Sub metering 2 | MAE: 0.686, MSE: 3.169, RMSE: 1.780, R2 = 0.863 | Difference of variance = 0.0011 | (plot) |
| Sub metering 3 | MAE: 0.870, MSE: 3.006, RMSE: 1.751, R2 = 0.956 | Difference of variance = 0.00501 | (plot) |
References
1. Kamunda, C.: A study on efficient energy use for household appliances in Malawi. Malawi J. Sci. Technol. 10(1), 128 (2014)
2. Li, C., et al.: Climate change and dengue fever transmission in China: evidences and challenges. Sci. Total. Environ. 622, 493–501 (2018)
3. Shi, H., Xu, M., Li, R.: Deep learning for household load forecasting: a novel pooling deep RNN. IEEE Trans. Smart Grid 9(5), 5271–5280 (2018)
4. Kim, T.Y., Cho, S.B.: Predicting the household power consumption using CNN-LSTM hybrid networks. In: Intelligent Data Engineering and Automated Learning: IDEAL 2018. Lecture Notes in Computer Science, vol. 11314. Springer, Cham (2018)
5. Loginov, A., Heywood, M.I., Wilson, G.: Benchmarking a coevolutionary streaming classifier under the individual household electric power consumption dataset. In: Proceedings of the 2016 International Joint Conference on Neural Networks, pp. 2834–2841 (2016)
6. Hajjaji, I., et al.: Evaluation of artificial intelligence algorithms for predicting power consumption in university campus microgrid. In: Proceedings of the 2021 International Wireless Communications and Mobile Computing (IWCMC). IEEE (2021)
7. Pinto, T., et al.: Ensemble learning for electricity consumption forecasting in office buildings. Neurocomputing 423, 747–755 (2021)
8. Lim, C., Choi, H.: Deep learning-based analysis on monthly household consumption for different electricity contracts. In: Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing, pp. 545–547 (2020)
9. Hyeon, J., Lee, H., Ko, B., Choi, H.-J.: Building energy consumption forecasting: Enhanced deep learning approach. In: Proceedings of the 2nd International Workshop on Big Data Analysis for Smart Energy, pp. 22–25 (2020)
10. Liang, K., Liu, F., Zhang, Y.: Household power consumption prediction method based on selective ensemble learning. IEEE Access 8, 95657–95666 (2020)
Chapter 13
Indian Stock Price Prediction Using Long Short-Term Memory
Himanshu Rathi, Ishaan Joardar, Gaurav Dhanuka, Lakshya Gupta, and J. Angel Arul Jothi
Abstract In recent years, numerous researchers across the world have developed various methods for predicting stock prices. However, the accuracy of these models has been found to be inconsistent. This field, known as stock market prediction and analysis, offers potential for further improvement. This paper proposes a framework built on a long short-term memory (LSTM) deep learning model that is capable of accurately predicting the closing prices of companies listed on the National Stock Exchange (NSE) or Bombay Stock Exchange (BSE) of India. The LSTM model was trained using historical stock market data of Tata Motors and demonstrated a high degree of accuracy in predicting future price movements. The LSTM approach was found to be superior, in terms of accuracy and precision, to the previously published methods it was compared with.
H. Rathi (corresponding author) · I. Joardar · G. Dhanuka · J. Angel Arul Jothi
Department of Computer Science, Birla Institute of Technology and Science, Pilani, Dubai Campus, Dubai, UAE
L. Gupta
Delhi Technological University, Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_13

13.1 Introduction

Stock market prediction is, in essence, an attempt to forecast the future value of an equity stock or a similar financial instrument traded on the stock market. Successfully predicting stock prices is one of the most sought-after research fields because of its capacity to generate large profits for stakeholders and other investors.
Given the unpredictable nature of stock market dynamics, financial institutions and research scientists have been working to develop software that estimates the near-term valuation of stocks, so that judgment calls are better informed than those based on a limited set of customary data sources that provide only slight knowledge and insight. Such a prognostic system would provide a means of determining the likelihood that trends in the stock market will rise, remain steady, or fall, and based on this knowledge, traders may make wise judgments about when to trade stocks to maximize their chances of profiting.

Research has been done using the fundamental analysis technique, which argues that a stock's value is defined by its inherent worth and anticipated income, to develop trading strategies. This approach uses elements such as market experience, financial data, and the company's background and value. While this type of analysis is useful for examining long-term patterns, it is not considered useful for short- or medium-term patterns. Another approach under study uses existing financial market statistics, such as previous share prices and volume information, to predict a company's future equity worth. This technique is based on the idea that historical patterns can be used to forecast the direction and development of the market in the near term.

The subject of stock forecasting has advanced significantly over the past decade thanks to the application of a variety of machine learning techniques, including decision trees, SVM, regression, and others. Artificial neural networks (ANNs) are also used in this field to predict stock prices. The idea that data mining can be used for stock price forecasting is based on the inference that the data displays certain patterns that can be identified using mining techniques.
This technique allows different machine learning models to be employed to accurately predict future market developments. Once these trends in the data have been identified, the possible short- to medium-term stock value can be predicted.

Stock market prediction is a sought-after tool for individuals and organizations alike. A company may use stock market prediction to help it decide when to raise capital or when to invest in new projects. By understanding the likely direction of the market, businesses can make more informed decisions that help them grow and succeed. By analyzing market trends and other factors, investors can gain a better understanding of the direction the market is likely to take, which can help them make more strategic decisions about their investments, maximize their returns, and minimize their risks.

Our proposed model is targeted toward individual investors who are in the market for a short time. It can help them predict the direction of the market over the next few days or weeks. Our LSTM stock prediction model consists of three LSTM layers followed by three dense layers; the LSTM layers use the tanh activation function and the dense layers use ReLU (see Sect. 13.5). The model is trained on a dataset that has been pre-processed to normalize the values of all variables, so that no single variable has an outsized influence on the model's predictions. The LSTM layers are designed to capture long-term dependencies in the data, while the dense layers provide additional nonlinearity and allow the model to make more complex predictions. Overall, this architecture allows our model to accurately predict the direction of the stock market using historical data.

The remainder of this paper is organized as follows: Sect. 13.2 reviews the published literature, Sect. 13.3 describes the dataset, Sect. 13.4 presents the proposed methodology, Sect. 13.5 covers the implementation, Sect. 13.6 presents the results and discussion, and Sect. 13.7 concludes the paper.
13.2 Literature Survey

Sarkar [1] proposes a data model based on rough set theory that improves upon existing models. The model was tested in MATLAB on stocks traded on the Kuwait stock exchange over a five-year period and generated fewer decision rules than previous models. Ramezanian et al. [2] proposed an integrated framework using genetic network programming to predict the one-day return of stocks on the Tehran Stock Exchange. This model used technical indicators to extract rules and reduced the error rate by up to 16% compared to previous models. Akita et al. [3] proposed a deep learning LSTM model that combines numerical and text-based analysis to predict stock market trends. The model incorporates external factors that can affect a company's stock price and performed better than similar models that used numerical analysis only. New CNN-based frameworks for market prediction have also been proposed and showed an increase in performance compared to standard CNN models [4, 5]. These models were tested on various stock indices and outperformed other strategies over a longer period of time. In [6], the authors propose a deep stock-trend prediction neural network (DSPNN) model that uses knowledge graph and graph embedding techniques to select relevant trading information for individual stocks. The model achieves a trend prediction accuracy of more than 70%, which is significantly higher than the accuracy of other current models, and it also outperforms traditional LSTM models. Nair et al. [7] propose an automated decision tree-adaptive neuro-fuzzy hybrid system for stock market prediction. The model uses traditional technical analysis and decision trees for feature extraction and incorporates 18 distinct technical indicators. The hybrid model is effective at predicting trends in fast-moving markets.
Wang and Wang [8] propose a method called the High-Low Point (HLP) for stock price prediction. It uses data pre-processing to focus on the crucial high and low points and achieves its lowest prediction error, 7.16%, with 20 neurons. Pal and Kar [9] used the type-2 fuzzistics methodology to develop IT2 FS models for linguistic values. The model was tested on the Sensex, composite index, and capitalization-weighted stock index and performed better than the model it was compared with. Kimoto et al. [10] proposed a modular neural network model for predicting stocks on the Tokyo Stock Exchange that used a high-speed learning algorithm called supplementary learning. The model was able to predict buy and sell times for stocks over a period of 33 months.
The authors in [11] introduce a 2D deep convolutional neural network (CNN) for identifying stock trading spots. The model uses 15 technical indicators with different time intervals and settings and was shown to outperform "Buy and Hold" and other models in out-of-sample testing. The authors suggest that incorporating a long-short strategy could increase earnings. In [12], machine learning methods such as linear regression and support vector machine were used to predict the closing prices of Bajaj Finance. The model uses historical and current trading data and was able to provide highly accurate forecasts. The paper in [13] proposes a stock market prediction tool using machine learning models such as random forest and support vector machine. The model was tested on the Dow Jones Industrial Average Index and was able to provide accurate predictions. In [14], the authors use sentiment analysis to predict the stock prices of companies. They collect user comments from Twitter and StockTwits and apply sentiment analysis to obtain a sentiment score, which is combined with market data and used to make predictions. The paper suggests that a company's sentiment score should be considered when making investment decisions. The authors in [15] propose using kernel adaptive filtering (KAF) to improve the accuracy of stock price predictions by using both local and global models. The model outperforms baseline algorithms by 3–11% in terms of F-measure. In [16], the potential of LSTM for stock market forecasting is demonstrated using data from the Chinese stock market. The model's accuracy can be improved through normalization and appropriate selection of the SSE Index and stock set. However, the model does not significantly improve upon previous models. In [17], Selvamuthu et al. develop a framework for predicting share prices using deep learning and artificial neural networks (ANNs) and evaluate it using tick data for Reliance Private Limited.
The CG model achieves the highest average prediction accuracy of 99.9% for tick data. In [18], Raut et al. propose a machine learning algorithm using ANN for stock estimation. The CS-SVM-RBF model can achieve an accuracy of up to 87%. In [19], Nabipour et al. propose a framework for evaluating the performance of different methods for predicting stock prices using data from the Tehran Stock Exchange. The methods compared include multiple regression, deep convolutional network (DCN), and artificial neural networks (ANNs). The study finds that the approach combining LSTM with multiple linear regression has the highest overall accuracy, 96%. In [20], Kranthi develops a predictive analysis model that uses support vector machine (SVM) and Gaussian technique classification approaches. The model's overall accuracy for estimating the pattern in the share price of IBM is 94%.
13.3 Dataset

The dataset used for our research model is that of Tata Motors, which is listed on the Indian stock exchanges. The model was also trained and tested on other Indian companies listed on the Indian stock exchanges (NSE or BSE). The data was taken from Yahoo Finance [21].
Fig. 13.1 Stock data for Tata Motors from Yahoo finance
Yahoo Finance is a website that provides up-to-date information on a wide variety of listed stocks across the world. Yahoo Finance also stores the stock data of these companies in its database, which can be accessed via an API (Yahoo Finance API). This API was developed by Yahoo Finance to help researchers access the past stock information of any company they are studying, and it can be accessed easily from Python. The stock data provided by the API can be downloaded in CSV format or used directly in code. Yahoo Finance records various attributes of each stock, such as the opening price of the day, closing price, high, low, adjusted close, and volume traded, among others. This gives researchers the flexibility to work on a single attribute or to use multiple attributes at the same time (Fig. 13.1).
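As a hedged illustration of working with such a download, the sketch below reads one attribute from CSV text using only the Python standard library. The column names follow the attributes listed above; the rows are made up for illustration and are not real quotes, and the helper `read_column` is a hypothetical name rather than part of any Yahoo Finance tooling.

```python
import csv
import io

# Made-up rows in the column layout described above; these are NOT real quotes.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2022-01-03,100.0,104.0,99.0,103.0,103.0,1500000
2022-01-04,103.0,106.5,102.0,105.5,105.5,1720000
2022-01-05,105.5,107.0,101.5,102.0,102.0,1980000
"""


def read_column(csv_text, column="Close"):
    """Return (dates, values) for one numeric attribute of a price-history CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dates, values = [], []
    for row in reader:
        dates.append(row["Date"])
        values.append(float(row[column]))
    return dates, values


dates, closes = read_column(SAMPLE_CSV, "Close")
_, volumes = read_column(SAMPLE_CSV, "Volume")
print(dates[0], closes[0], volumes[0])
```

Selecting a single column in this way matches the single-attribute setting used in this chapter, where only the closing price is predicted.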
13.4 Methodology

Stock market prediction has emerged as one of the most sought-after research topics around the world, and recent advancements in Artificial Intelligence (AI) and deep learning have only enhanced the research. Researchers have tried to tackle this prediction problem in many different ways, including Support Vector Machines (SVM), Random Forest, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and deep learning techniques such as the long short-term memory (LSTM) model. LSTM is particularly helpful in this field because the memory cell in its architecture lets it remember information for a longer period of time [22]. A single LSTM cell contains four components: the input gate, the output gate, the forget gate, and the cell. The cell remembers information over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell (see Fig. 13.2 for the methodology steps).

Fig. 13.2 Stages of model development

Fig. 13.3 LSTM logic

The data for the Tata Motors stock is imported and displayed as a graph, as seen in Fig. 13.1. The period selected for the model is from 2010 to the present, so that the model is up to date with recent values. The data is then Min-Max normalized using the sklearn module.
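The Min-Max normalization step can be sketched without any library, to make the arithmetic explicit. The paper uses the sklearn module for this; the dependency-free functions below are an assumption-labeled stand-in that scales to the range [0, 1], and the prices are illustrative values rather than real quotes.

```python
def min_max_fit(values):
    """Return the (minimum, maximum) pair learned from the training values."""
    return min(values), max(values)


def min_max_transform(values, lo, hi):
    """Scale values to [0, 1] using x' = (x - lo) / (hi - lo)."""
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return [s * (hi - lo) + lo for s in scaled]


# Illustrative closing prices only (not real quotes).
train = [200.0, 250.0, 300.0, 400.0]
lo, hi = min_max_fit(train)
scaled = min_max_transform(train, lo, hi)
restored = min_max_inverse(scaled, lo, hi)
print(scaled)
```

Fitting the minimum and maximum on the training portion only, and reusing them for later data, avoids leaking information from the test period; the inverse mapping is what turns a scaled prediction back into a price.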
Fig. 13.4 Summary of proposed model
The proposed model implements LSTM (see Fig. 13.3 for the LSTM logic) because it can hold memory for a longer period and it addresses the vanishing gradient problem, which makes a plain RNN forget earlier inputs as they are overwritten. The proposed model has six layers (see Fig. 13.4):
• Three LSTM layers with 128, 64, and 32 neurons, respectively.
• Three dense layers with 64, 16, and 1 neuron, respectively. These dense layers are added because the output of the LSTM layers is not SoftMax, and the output of the last LSTM layer does not have the dimension of the desired output, i.e., the single closing price of the stock. Intermediate layers are therefore required.
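To give a feel for the size of the six-layer stack listed above, the following sketch counts trainable parameters with the standard formulas for LSTM and fully connected layers. It assumes one input feature per time step (the closing price), which the text does not state explicitly, so the totals are indicative and need not match the summary in Fig. 13.4.

```python
def lstm_params(units, input_dim):
    """Weights of a standard LSTM layer: three gates plus the cell candidate,
    each with an input kernel, a recurrent kernel, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Weights of a fully connected layer: kernel plus bias."""
    return input_dim * units + units


def count_parameters(n_features, lstm_units, dense_units):
    """Walk the stack layer by layer; return per-layer counts and the total."""
    counts = []
    width = n_features
    for units in lstm_units:
        counts.append(("lstm", units, lstm_params(units, width)))
        width = units
    for units in dense_units:
        counts.append(("dense", units, dense_params(units, width)))
        width = units
    return counts, sum(count for _, _, count in counts)


# Assumption: a single input feature per time step.
layers, total = count_parameters(1, [128, 64, 32], [64, 16, 1])
for kind, units, count in layers:
    print(f"{kind:5s} {units:4d} units  {count:6d} parameters")
print("total:", total)
```

The walk also shows why the dense layers are needed: the last LSTM layer emits 32 values, and the dense layers narrow this to 64, 16, and finally the single output.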
13.5 Implementation

13.5.1 Hardware and Software

The hardware used in the study was an Apple M1 Pro, chosen for its processing capabilities and its ability to handle large amounts of data. The software tools used were Jupyter Notebook with Python version 3.9.13. The study was conducted on a Unix operating system, and additional tools such as Pandas, NumPy, and Keras were used for data manipulation and analysis. These hardware and software choices were instrumental in conducting the study and achieving accurate results.
13.5.2 Evaluation Metrics

Root Mean Squared Error (RMSE)
RMSE, as the name suggests, is the root of the mean squared error. The mean squared error is the mean of the square of the difference between the actual value and the predicted value. RMSE is calculated using Eq. (13.1), where $n$ is the number of samples, $Y_i$ is the actual value, and $\hat{Y}_i$ is the predicted value.

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2} \tag{13.1} \]

Mean Absolute Error (MAE)
MAE is calculated using Eq. (13.2), where $Y_i$ is the actual value, $\hat{Y}_i$ is the predicted value, and $n$ is the number of samples. It is the mean of the absolute difference between $Y_i$ and $\hat{Y}_i$.

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i| \tag{13.2} \]

R2 Score
The $R^2$ score is the regression score function, which can be calculated using Eq. (13.3). It gives the coefficient of determination [23].

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \tag{13.3} \]
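The three metrics in Eqs. (13.1) to (13.3) translate directly into a few lines of code. The sketch below is a plain-Python rendering of those formulas, with the mean of the actual values playing the role of the barred term in Eq. (13.3); the function names and the sample values are illustrative assumptions, not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Eq. (13.1): square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Eq. (13.2): mean of the absolute differences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def r2_score(actual, predicted):
    """Eq. (13.3): one minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
print(rmse(actual, predicted), mae(actual, predicted), r2_score(actual, predicted))
```

Because RMSE squares the differences before averaging, it penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.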
The model was trained using the Adam optimizer with a mean squared error loss function. The activation function was tanh for the LSTM layers and ReLU for the dense layers. The model was trained for 20 epochs with a learning rate of 0.001 and a batch size of 15. These parameters were selected based on previous research and experimentation and were found to give good performance on the training and validation datasets. The use of the Adam optimizer helped to reduce the training time and improve the model's convergence. The ReLU activation function provided a nonlinear mapping of the inputs to the outputs, allowing the model to learn complex relationships in the data. The chosen epoch, learning rate, and batch size values allowed the model to train efficiently without overfitting to the training data.
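For readers unfamiliar with Adam, the update rule can be sketched on a toy one-parameter problem. This is not the training loop used in the paper: the learning rate of 0.001 is taken from the text, while the decay rates (0.9 and 0.999), the epsilon value, and the squared-error toy loss are common defaults and assumptions made for illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=20):
    """Minimize a scalar function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def squared_error_gradient(w):
    """Gradient of the toy loss (w - 3)^2 for a one-parameter model."""
    return 2.0 * (w - 3.0)


w_one_step = adam_minimize(squared_error_gradient, x0=0.0, steps=1)
w_many_steps = adam_minimize(squared_error_gradient, x0=0.0, steps=1000)
print(w_one_step, w_many_steps)
```

The first step moves the parameter by almost exactly the learning rate regardless of the gradient's magnitude, which illustrates why Adam is relatively insensitive to the scale of the loss.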
13.6 Results and Discussion

Table 13.1 shows the MAE values obtained for different numbers of epochs (5, 10, 15, 20, 30): 0.88300, 0.70212, 0.59565, 0.52373, and 0.54785, respectively. As the number of epochs increases up to 20, the MAE falls and the model moves toward an improved fit (see Fig. 13.5). However, if the number of epochs is raised drastically, overfitting becomes a concern. In our case, 20 epochs provided the best accuracy and a suitably low MAE (0.5237). We also tried 30 epochs, but the MAE rose again, indicating that the proposed model was overfitting at that point. In the plots, the green line depicts the forecast price, the orange line represents the final closing price, and the blue line represents the data from the training dataset. As the number of epochs rises, the validation line and the prediction line can be seen to converge.

Table 13.1 Performance measurements for different epochs

Epoch   MAE       Performance (%)
5       0.88300   97.7255
10      0.70212   98.7023
15      0.59565   99.1264
20      0.52373   99.2567
30      0.54785   99.2041

Fig. 13.5 Plots of different epochs

Table 13.2 compares the proposed model with previously published work; the proposed model reports the lowest MAE and the highest score among the listed models.

Table 13.2 Comparison of proposed model with previously published work

Related work     Model   MAE    R2 score (%)
[6]              DSPNN   –      70
[8]              HLP     0.71   –
[13]             SVM     –      86.2
[13]             RF      –      84.6
[18]             FFNN    –      87
[19]             LSTM    –      96
[20]             SVM     –      94
Proposed model   LSTM    0.52   99.25
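The epoch selection discussed in this section amounts to picking the epoch count with the lowest MAE and noting where the error starts to rise again. The sketch below applies that rule to the MAE values reported in Table 13.1; the helper names are hypothetical, and treating a rise in MAE as the sign of overfitting follows the reasoning in the text rather than a separate validation procedure.

```python
# MAE per number of training epochs, as reported in Table 13.1.
mae_by_epochs = {5: 0.88300, 10: 0.70212, 15: 0.59565, 20: 0.52373, 30: 0.54785}


def best_epoch_count(results):
    """Return the epoch count with the lowest MAE."""
    return min(results, key=results.get)


def first_increase(results):
    """Return the first epoch count whose MAE exceeds the previous one, or None."""
    ordered = sorted(results)
    for previous, current in zip(ordered, ordered[1:]):
        if results[current] > results[previous]:
            return current
    return None


print("lowest MAE at", best_epoch_count(mae_by_epochs), "epochs")
print("MAE first rises at", first_increase(mae_by_epochs), "epochs")
```

With only five tested settings, the minimum lies somewhere between 15 and 30 epochs; a finer grid would be needed to locate it more precisely.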
13.7 Conclusion

In this study, we examined numerous approaches for forecasting the share values of firms listed on a wide range of stock markets. We found that it is difficult to anticipate the future stock market trend of any company, because a variety of factors influence stock prices. Nevertheless, a number of methods have been devised worldwide to address this problem, primarily using ML and AI models such as SVM prediction models, feed-forward neural networks, CNNs, deep learning RNNs, and long short-term memory. Because of the market's wide range of influences, we consider it exceedingly challenging to create a forecasting model for stock prices that is both accurate and reliable. While a few of the models showed great promise, others were naive. We found the LSTM approach well suited to predicting stock prices and therefore used it in our proposed study to achieve high precision and performance. After predicting the values of several equities, including the tickers AAPL (Apple), TTM (Tata Motors), and HDFC, we found that our algorithm was able to forecast the potential valuation of an equity with minimal error.
References
1. Sarkar, S.: An improved rough set data model for stock market prediction. In: 2014 2nd International Conference on Business and Information Management (ICBIM), pp. 96–100 (2014). https://doi.org/10.1109/ICBIM.2014.6970963
2. Ramezanian, R., Peymanfar, A., Ebrahimi, S.B.: An integrated framework of genetic network programming and multi-layer perceptron neural network for prediction of daily stock return: an application in Tehran stock exchange market. Appl. Soft Comput. 82(C) (2019). https://doi.org/10.1016/j.asoc.2019.105551
3. Akita, R., Yoshihara, A., Matsubara, T., Uehara, K.: Deep learning for stock prediction using numerical and textual information. In: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), pp. 1–6 (2016). https://doi.org/10.1109/ICIS.2016.7550882
4. Hoseinzade, E., Haratizadeh, S.: CNNpred: CNN-based stock market prediction using a diverse set of variables. Exp. Syst. Appl. 129, 273–285 (2019). ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2019.03.029
5. Sezer, O.B., Ozbayoglu, A.M.: Algorithmic financial trading with deep convolutional neural networks: time series to image conversion approach. Appl. Soft Comput. 70, 525–538 (2018). ISSN 1568-4946. https://doi.org/10.1016/j.asoc.2018.04.024
6. Long, J., Chen, Z., He, W., Wu, T., Ren, J.: An integrated framework of deep learning and knowledge graph for prediction of stock price trend: an application in Chinese stock exchange market. Appl. Soft Comput. 91, 106205 (2020). ISSN 1568-4946. https://doi.org/10.1016/j.asoc.2020.106205
7. Nair, B.B., Dharini, N.M., Mohandas, V.P.: A stock market trend prediction system using a hybrid decision tree-neuro-fuzzy system. In: Proceedings of the 2010 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCOM'10), pp. 381–385. IEEE Computer Society, USA (2010). https://doi.org/10.1109/ARTCom.2010.75
8. Wang, L., Wang, Q.: Stock market prediction using artificial neural networks based on HLP. In: Third International Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 116–119 (2011). https://doi.org/10.1109/IHMSC.2011.34
9. Pal, S.S., Kar, S.: Time series forecasting for stock market prediction through data discretization by fuzzistics and rule generation by rough set theory. Math. Comput. Simul. 162, 18–30 (2019). ISSN 0378-4754. https://doi.org/10.1016/j.matcom.2019.01.001
10. Kimoto, T., Asakawa, K., Yoda, M., Takeoka, M.: Stock market prediction system with modular neural networks. In: 1990 IJCNN International Joint Conference on Neural Networks, vol. 1, pp. 1–6 (1990). https://doi.org/10.1109/IJCNN.1990.137535
11. Verma, R., Choure, P., Singh, U.: Neural networks through stock market data prediction. In: International Conference of Electronics, Communication and Aerospace Technology (ICECA), pp. 514–519 (2017). https://doi.org/10.1109/ICECA.2017.8212717
12. Kadam, S., Jain, S.: Stock market prediction using machine learning. IJIRT 8(2) (2022). ISSN: 2349-6002
13. Maini, S.S., Govinda, K.: Stock market prediction using data mining techniques. In: International Conference on Intelligent Sustainable Systems (ICISS), pp. 654–661 (2017). https://doi.org/10.1109/ISS1.2017.8389253
14. Khatri, S.K., Srivastava, A.: Using sentimental analysis in prediction of stock market investment. In: 2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 566–569. https://doi.org/10.1109/ICRITO.2016.7785019
15. Garcia-Vega, S., Zeng, X.-J., Keane, J.: Stock returns prediction using kernel adaptive filtering within a stock market interdependence approach. Exp. Syst. Appl. 160, 113668 (2020). ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2020.113668
16. Chen, K., Zhou, Y., Dai, F.: A LSTM-based method for stock returns prediction: a case study of China stock market. In: IEEE International Conference on Big Data (Big Data), pp. 2823–2824 (2015). https://doi.org/10.1109/BigData.2015.7364089
17. Selvamuthu, D., Kumar, V., Mishra, A.: Indian stock market prediction using artificial neural networks on tick data. Financ. Innov. 5, 16 (2019). https://doi.org/10.1186/s40854-019-0131-7
18. Raut, S., Shinde, I., Malathi, D.: Int. J. Pure Appl. Math. 115(8), 71–77 (2017). ISSN: 1311-8080 (printed version). ISSN: 1314-3395 (online version). Special issue. http://www.ijpam.eu
19. Nabipour, M., Nayyeri, P., Jabani, H., Mosavi, A., Salwana, E., Shahab, S.: Deep learning for stock market prediction. Entropy (Basel, Switz.) 22(8), 840 (2020). https://doi.org/10.3390/e22080840
20. Kranthi, R.: Stock market prediction using machine learning. Int. Res. J. Eng. Technol. (IRJET) 05(10) (2018). e-ISSN: 2395-0056, p-ISSN: 2395-0
21. Yahoo Finance. https://finance.yahoo.com/quote/TTM/. Last accessed 6 Dec 2022
22. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
23. Draper, N.R., Smith, H.: Applied Regression Analysis. Wiley-Interscience (1998). ISBN 978-0-471-17082-2
Chapter 14
On the Impact of Temperature for Precipitation Analysis
Madara Premawardhana, Menatallah Abdel Azeem, Sandeep Singh Sengar, and Soumyabrata Dev
Abstract Climate is the result of the constant interaction between different weather variables, among which temperature and precipitation are significant factors. Precipitation refers to condensed water vapor that falls from clouds as a result of gravitational pull. These variables act as governing factors for rainfall, snowfall, and air pressure, and have wide-ranging effects on ecosystems. Different calculation methods, such as the Standard Precipitation Index, can be employed for determining precipitation. Temperature is the measure used to identify the heat energy generated by solar radiation and by industrial factors. To understand the interplay between these two variables, data gathered from several regions of the world, including North America, Europe, Australia, and Central Asia, was analyzed, and the findings are presented in this paper. Prediction methods such as multiple linear regression and long short-term memory (LSTM) have been employed for predicting rainfall from temperature and precipitation data. The inter-dependency of other weather parameters in relation to rainfall prediction is also observed in this paper. The accuracy of the machine learning prediction models has also been evaluated within the study. The implementation of our work is available via https://github.com/MadaraPremawardhana/On-the-Impact-ofTemperature-for-Precipitation-Analysis.
M. Premawardhana
School of Computing, University of Buckingham, Buckingham, UK
M. A. Azeem
University College Dublin, Dublin, Ireland
S. S. Sengar (corresponding author)
Department of Computer Science, Cardiff Metropolitan University, Cardiff, UK
S. Dev
School of Computer Science, University College Dublin, Dublin, Ireland
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_14
14.1 Introduction

Climatic changes occur due to the interplay between different weather variables. Temperature and precipitation are two major weather factors that are strongly dependent on each other [1]. While all weather and meteorological variables affect rainfall classification, a few have greater influence, such as solar radiation [2], precipitable water vapor (PWV) [3], and seasonal and diurnal features [4]. Earth receives heat from solar radiation, and human activities such as rapid industrialization contribute to the global increase in heat. Precipitation is water that comes down from the sky as rain, snow, hail, or any other form [5]. Some places on Earth with very high precipitation experience very wet climates, while others have low precipitation, resulting in very dry climates. The interplay between temperature and precipitation also depends on factors such as Earth's natural orientation. Rainfall prediction and its accuracy are important for long-term decision-making in a variety of areas, including agriculture, irrigation, and scientific research [6]. When predicting rainfall occurrence by analyzing temperature and precipitation, the independent behavior of each factor is important. The way each variable is influenced by other weather factors must also be taken into consideration when analyzing the relationship between the selected factors. Furthermore, we have collected data sets from around the world to cover different climatic regions, such as North America, Europe, Central Asia, Canada, and Australia, which are further discussed in Sect. 14.2. The data sets were collected from the National Snow and Ice Data Center (NSIDC) and the National Oceanic and Atmospheric Administration (NOAA), which provide accurate weather data sets in a standard format [7]. These data sets were analyzed for anomalies and non-available values and were cleaned accordingly.
Then correlation analysis was done on the available data for identifying the relationship between different weather variables including temperature and precipitation. The selected weather variables were also predicted using the simple machine learning technique of multiple linear regression by using the available data. The potentiality of predictions of temperature and precipitation is also presented in the paper. Further, employing the weather data including temperature and precipitation, multiple linear regression, and long short-term memory (LSTM) models were developed to predict the rainfall in the Sydney-Australia region. The main contributions of the paper are as follows: • Analyzing and identifying the relationship between different environmental factors toward the prediction of proportionately collaborating environmental factors using simple linear regression (SLR) and long short-term memory (LSTM). Here, the focus is on comparing the results of the two aforementioned methodologies for identifying the most suitable approach. • Prediction of rainfall depended on using independent data sets which contain temperature and precipitation from different areas of the world and adopting base work for accurate rainfall prediction using artificial neural networks. This procedure could be easily re-modified and re-applied for finding relationships and predicting rainfall using a variety of other variables as well.
14 On the Impact of Temperature for Precipitation Analysis
• To support reproducible and extensible research, the code to reproduce the results of this manuscript is available online (see footnote 1).
14.2 Data Description
The data sets used for the prediction and correlation analysis in this paper were taken from the National Oceanic and Atmospheric Administration (NOAA) and the National Snow and Ice Data Center (NSIDC) [8]. The data taken from NOAA covers three locations over 5 years at a temporal resolution of 24 h. The data collection locations are Alpena-USA, Beauceville-Canada, and Dublin-Ireland. The data sets include weather data such as snow depth, average temperature, precipitation, snow, the direction of the peak wind gust, peak gust wind speed, minimum temperature, maximum temperature, average wind speed, the direction of the fastest 2-min wind, the fastest 2-min wind speed, the direction of the fastest 5-s wind, and the fastest 5-s wind speed, which are represented by SNWD, TAVG, PRCP, SNOW, WDFG, WSFG, TMIN, TMAX, AWND, WDF2, WSF2, WDF5, and WSF5, respectively [9]. The data taken from NSIDC covers Central Asia, including data from Almaty, Kazakhstan, from 1879 to 2003. This data has not been used for the predictions or the correlation analysis, since it has a temporal resolution of 1 month and includes only average temperature and average precipitation. Instead, it has been used to examine the variation of precipitation and temperature in Sect. 14.5, before carrying out correlation analysis for the data sets of Dublin, Alpena, and Beauceville. Furthermore, data sets of Lake Michigan Riverfront, Chicago-USA, and Sydney-Australia have been taken from NOAA and Kaggle, respectively. The Sydney, Australia, data set contains minimum and maximum temperatures, rainfall, wind gusts, wind gust direction, humidity, wind speed, pressure, cloud density, and rain status of the day and the day after. The Chicago, USA, data set contains the weather variables rain interval, precipitation type, wind speed, barometric pressure, and solar radiation.
Hence, these two data sets were allocated for identifying inter-relationships of weather variables that affect or are affected by temperature and precipitation uncovered through the correlation analysis in Sect. 14.7.
14.3 Precipitation Calculation
Several methods can be used to calculate precipitation, depending on the temporal resolution of the available data.
1 The source code related to this paper is available via https://github.com/MadaraPremawardhana/On-the-Impact-of-Temperature-for-Precipitation-Analysis.
M. Premawardhana et al.
14.3.1 Standard Precipitation Index (SPI)
SPI is a measure of how likely a certain amount of precipitation is to occur in a given period. It involves fitting the precipitation data to a curve that describes how often different amounts of precipitation occur and then finding the corresponding point on a normal curve that has the same probability. The gamma curve is usually the best choice for fitting the precipitation data. The density function expression for this distribution is as follows [10]:
$$g(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} \tag{14.1}$$
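Here, $\alpha$ is the shape parameter, $\beta$ is the scale parameter, $x$ is the amount of precipitation, and $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\,dy$ is the gamma function. The maximum likelihood estimates of the parameters $\alpha$ and $\beta$ are:
$$\hat{\alpha} = \frac{1}{4A}\left(1 + \sqrt{1 + \frac{4A}{3}}\right) \tag{14.2}$$
$$\hat{\beta} = \frac{\bar{x}}{\hat{\alpha}} \tag{14.3}$$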
Here, $A = \ln(\bar{x}) - \frac{\sum \ln(x)}{n}$ and $n$ is the sample size. Integrating the density in Formula (14.1) gives the distribution function of precipitation, $G(x)$:
$$G(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{x} x^{\alpha-1} e^{-x/\beta}\,dx, \quad (x > 0) \tag{14.4}$$
where $G(x)$ is the probability of a precipitation amount that is equal to or lower than $x$. The actual precipitation samples may sometimes contain the value 0, which means no precipitation, so the distribution function has to be adjusted. The rectified distribution function is as follows [10]:
$$H(x) = q + (1 - q)\,G(x) \tag{14.5}$$
where $q$ is the probability of no precipitation at all. The distribution function of the standard normal curve is given by [10]:
$$\phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\,dt \tag{14.6}$$
where $\phi(t)$ is the probability of a value that is equal to or lower than $t$. The actual precipitation amount gives the related probability $H(x)$. To find the SPI value, we solve for $t$ in the equation $H(x) = \phi(t)$, which gives the following formula:
$$\mathrm{SPI} = t = \phi^{-1}(H(x)) \tag{14.7}$$
Therefore, $H(x)$ is equivalent to a standard normal variable $t$ that has a mean of zero and a variance of one. This means that SPI follows the standard normal distribution. The formula above indicates that computing SPI requires a sufficient number of precipitation samples, usually more than 30 years of data [11], to estimate a reliable value of $q$ and the two parameters $\alpha$ and $\beta$ [10].
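The SPI procedure in Eqs. (14.1) to (14.7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_gamma`, `gamma_cdf`, `spi`) are hypothetical, the incomplete gamma integral is evaluated with a simple power series, and the non-zero samples are assumed not to be all identical (otherwise $A = 0$).

```python
import math
from statistics import NormalDist


def fit_gamma(samples):
    """Estimate shape (alpha) and scale (beta) from non-zero precipitation
    amounts, following Eqs. (14.2) and (14.3)."""
    n = len(samples)
    mean = sum(samples) / n
    a = math.log(mean) - sum(math.log(x) for x in samples) / n
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = mean / alpha
    return alpha, beta


def gamma_cdf(x, alpha, beta, terms=500):
    """G(x) from Eq. (14.4), computed with the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    for k in range(1, terms):
        term *= z / (alpha + k)
        total += term
        if term < 1e-15 * total:
            break
    scale = math.exp(-z + alpha * math.log(z) - math.lgamma(alpha))
    return min(1.0, total * scale)


def spi(x, samples):
    """SPI of amount x given a historical record (Eqs. 14.5 and 14.7)."""
    wet = [s for s in samples if s > 0]
    q = 1.0 - len(wet) / len(samples)      # probability of no precipitation
    alpha, beta = fit_gamma(wet)
    h = q + (1.0 - q) * gamma_cdf(x, alpha, beta)
    h = min(max(h, 1e-9), 1.0 - 1e-9)      # keep inv_cdf inside (0, 1)
    return NormalDist().inv_cdf(h)
```

With a record such as `[0.0, 1.2, 3.4, 2.2, 5.1, 0.0, 4.3, 2.8, 1.9, 3.1]` (made-up values), larger amounts map to larger SPI values, as expected from the monotonic transformations involved.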
14.3.2 Estimating Precipitation Distribution Parameters for Stations with Short Data Series
It is hard to compute SPI without long-term observation data. Research suggests that the parameters of precipitation distribution for stations with short data series can be derived from the parameters of precipitation distribution for nearby stations with long data series as follows [10]:
1. Nearest Neighbor Substitution Method: This method calculates the spatial distance between the stations based on their latitude and longitude data. Then, it assigns the parameters of precipitation distribution for the short-sequence station to be the same as the parameters of the closest long-sequence station.
2. Regional Average Method: This method selects $N$ long-sequence stations around the short-sequence station based on the spatial distance information and a given value of $N$. Then, it computes the average values of the parameters of precipitation distribution for the $N$ long-sequence stations and uses them to calculate the SPI for the short-sequence station.
3. Kriging Interpolation Method: This method is a spatial auto-covariance optimal interpolation method that is widely used in spatial interpolation problems in Geosciences. It applies the ordinary Kriging interpolation method (which assumes a linear relationship between the semi-variance function and the distance) to estimate the parameters of precipitation distribution using the same data as the regional averaging method.
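The first two methods above can be sketched as follows. This is an illustrative sketch only: the station records, the `params` tuple layout `(alpha, beta, q)`, and the use of the haversine great-circle distance as the spatial distance are all assumptions made for the example, not details taken from [10]. Kriging is omitted because it needs a fitted semi-variance model.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def _distance(target, station):
    return haversine_km(target[0], target[1], station["lat"], station["lon"])


def nearest_neighbor_params(target, stations):
    """Method 1: copy (alpha, beta, q) from the closest long-series station."""
    return min(stations, key=lambda s: _distance(target, s))["params"]


def regional_average_params(target, stations, n):
    """Method 2: average the parameters of the n closest long-series stations."""
    nearest = sorted(stations, key=lambda s: _distance(target, s))[:n]
    count = len(nearest)
    return tuple(sum(s["params"][i] for s in nearest) / count for i in range(3))


# Hypothetical long-series stations with made-up parameters (alpha, beta, q).
stations = [
    {"lat": 0.0, "lon": 0.0, "params": (2.0, 1.0, 0.1)},
    {"lat": 0.0, "lon": 1.0, "params": (4.0, 3.0, 0.3)},
    {"lat": 10.0, "lon": 10.0, "params": (9.0, 9.0, 0.9)},
]
short_station = (0.0, 0.2)  # (latitude, longitude) of the short-series station
```

The resulting parameter tuple would then be plugged into the SPI calculation in place of parameters fitted from the short record.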
14.4 Atmospheric Temperature in the Tropopause
The sun's radiation and altitude determine the atmospheric temperature. As the altitude increases, the atmospheric temperature decreases because the sunlight has less effect on warming the Earth. The boundary layer where the temperature is balanced
is called the tropopause. Above the tropopause is the stratosphere, which warms up from above. The troposphere extends from the ground level to about 16 km (53,000 ft) in altitude. The stratosphere reaches up to 50 km (164,000 ft), just above the ozone layer. To calculate the atmospheric temperature, we need to know the specific location on the Earth and the altitude. The average temperature in the troposphere drops by 6.5 °C per kilometer, or about 3.5 °F per 1000 ft of altitude. For example, taking a ground-level temperature of 15 °C, the temperature at 5 km above the ground would be $15 - (5 \times 6.5) = -17.5$ °C. This formula is fairly accurate up to the tropopause [12].
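The lapse-rate calculation above amounts to a one-line linear formula. The sketch below assumes a ground-level reference temperature of 15 °C (the value used in the worked example) and restricts the altitude to the roughly 16 km depth of the troposphere quoted in the text; the function and constant names are hypothetical.

```python
LAPSE_RATE_C_PER_KM = 6.5   # average tropospheric lapse rate from the text
GROUND_TEMP_C = 15.0        # assumed ground-level reference temperature
TROPOSPHERE_TOP_KM = 16.0   # approximate upper limit quoted in the text


def troposphere_temperature(altitude_km, ground_temp_c=GROUND_TEMP_C):
    """Approximate air temperature (deg C) at a given altitude in the troposphere."""
    if not 0.0 <= altitude_km <= TROPOSPHERE_TOP_KM:
        raise ValueError("the linear lapse-rate formula only applies in the troposphere")
    return ground_temp_c - LAPSE_RATE_C_PER_KM * altitude_km


print(troposphere_temperature(5.0))  # -17.5, matching the worked example
```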
14.5 Correlation Analysis Toward Identifying Interplay Between Temperature and Precipitation
Correlation analysis is important for identifying the inter-relationship between different variables. The off-diagonal elements of a correlation matrix reveal how pairs of variables are related [13–15]. The temperature and precipitation variations are shown in Fig. 14.1, which depicts data from 1990 to 2002 in Almaty, Kazakhstan-Central Asia, which experiences an extreme continental climate. Furthermore, we also use the Dublin, Beauceville, and Alpena data sets, which have a better temporal resolution of 24 h, for carrying out correlation analysis. The variations in Fig. 14.2 show the pattern of precipitation and temperature oscillation at a temporal resolution of 24 h. The temperature oscillates yearly, close to a $|\sin x|$ wave, depending on seasonal changes. The orange line graph shows the temperature variation, and the blue line graph shows the precipitation variation. Correlation analysis is important in figuring out the interplay of temperature and precipitation. In statistics,
$$\text{Correlation} = \sigma = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} \tag{14.8}$$
When one variable changes and the other variable changes in the same direction, the two variables are positively correlated [16]. When the variables change in opposite directions, they are negatively correlated. A value between 0 and $-1$ indicates that the two variables move in opposite directions, and the variables have a perfect negative correlation when $\sigma$ equals $-1$. Correlation analyses of different weather data sets from around the world are given below. Here, we have selected the areas Alpena-USA, Beauceville-Canada, Dublin-Ireland, and Sydney-Australia with a temporal resolution of 24 h, and Chicago-USA with a temporal resolution of one hour [9].
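Equation (14.8) can be computed directly from two equally long series. The following is a minimal sketch in plain Python; the function name `correlation` and the sample readings are illustrative assumptions, not values from the data sets used in the paper.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient of Eq. (14.8): cov(X, Y) / (sigma_x * sigma_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and standard deviations cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up daily readings: temperature (deg C) and precipitation (inches).
temperature = [21.0, 18.5, 15.0, 12.5, 9.0]
precipitation = [0.0, 0.1, 0.3, 0.2, 0.6]
sigma = correlation(temperature, precipitation)  # negative for these values
```

A value close to $-1$ for such a pair would correspond to the negative relationship between temperature and precipitation discussed in this section.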
Fig. 14.1 Average variation of a precipitation, b temperature from 1990 to 2003 Almaty, Kazakhstan
The observations of the correlation graphs (cf. Fig. 14.3) show a negative correlation between temperature and precipitation. We also observe that wind direction demonstrates a smaller positive correlation with precipitation, whereas wind speed shows a negative relationship. Snow is positively correlated to precipitation and negatively correlated to temperature, confirming the aforementioned discovery.
Fig. 14.2 Alpena-USA temperature and precipitation oscillation from 2015 to 2020
Fig. 14.3 Correlation analysis of weather parameters around three distinct cities around the world: (a) Alpena-USA, (b) Dublin-Ireland, (c) Beauceville-Canada
14.6 Rainfall Prediction
Using multiple linear regression and long short-term memory, this section shows how weather parameters can predict temperature and precipitation and how temperature and precipitation can predict rainfall. Since the Dublin, Alpena, and Beauceville data sets did not contain direct rainfall data, we considered precipitation as a major factor related to causing rainfall. For the rainfall prediction, we used the Chicago Lake Michigan lakefront data set, with the parameters precipitation type, wind speed, air temperature, solar radiance, rain intensity, wet bulb temperature, humidity, barometric pressure, total rain, interval rain, wind direction, and maximum wind speed, and the Sydney-Australia data set, which contains minimum and maximum temperatures, rainfall, evaporation, wind data, humidity, pressure, and cloud data. These data sets have a temporal resolution of two hours and twenty-four hours, respectively.
14.6.1 Rainfall Prediction Using Multiple Linear Regression
Regression analysis is a statistical method that estimates how variables are related when a change in one variable is associated with a change in another. Univariate regression aims to find a linear equation that relates one dependent variable and one independent variable. The dependent variable is the outcome, and the independent variable is the predictor [17]. A multiple linear regression analysis predicts the values of a dependent variable, $Y$, given a set of $p$ explanatory variables $(x_1, x_2, \ldots, x_p)$ [18]. In the first exercise, temperature is the dependent variable, and average wind speed, snow depth, and precipitation are the independent variables. In the second exercise, the dependent variable is precipitation, and temperature, wind speed, and snow depth are the independent variables. Multiple linear regression was the first method used in this study to predict temperature using other variables available in the data set. We observe that the predicted results closely match the actual values. This also confirms the theory of temperature's effect on the weather variables of precipitation, humidity, and pressure. The results of applying multiple linear regression to predict precipitation are shown in Fig. 14.4. Here, we can see that for most of the lower precipitation values, the model has predicted a slightly higher value. This could be a result of multicollinearity, that is, strong dependency among the predictors, which causes problems in estimating the regression coefficients. A rain gauge was used to measure this precipitation in inches as rainfall [9]. Therefore, we used PRCP as the target feature for predicting rainfall in Dublin, Alpena, and Beauceville in this study.
Rainfall data was used directly to predict rainfall for the Sydney, Australia data set. The model gave inaccurate predictions for the lower rainfall values and more accurate predictions for the higher rainfall values, as shown in Fig. 14.5.
Fig. 14.4 Precipitation prediction with multiple linear regression: (a) Dublin-Ireland, (b) Alpena-USA, (c) Beauceville-Canada
Fig. 14.5 Rainfall prediction for Sydney, Australia using multiple linear regression
Figure 14.6 depicts the prediction of temperature using weather parameters such as average wind speed, snow depth and precipitation. The prediction of precipitation was conducted using multiple linear regression using the same data set. The data sets were segmented into 30% test and 70% training parts. The accuracy of the temperature predictions is displayed in Table 14.1 which is generated from Fig. 14.6.
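The workflow described here (fit a multiple linear regression on 70% of the data, evaluate the coefficient of determination on the remaining 30%) can be sketched in plain Python. This is an illustrative sketch, not the code from the authors' repository: it solves the normal equations with Gauss-Jordan elimination, splits the rows in their given order without shuffling, and the helper names are hypothetical.

```python
def fit_mlr(rows, targets):
    """Least-squares fit via the normal equations (X^T X) b = X^T y.
    Returns [intercept, coef_1, ..., coef_p]."""
    xs = [[1.0] + list(r) for r in rows]            # prepend intercept column
    p = len(xs[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(x[i] * x[j] for x in xs) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(xs, targets))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    # (raises ZeroDivisionError if the predictors are perfectly collinear)
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(actual, predicted):
    """Coefficient of determination, as reported in Table 14.1."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def split_70_30(rows, targets):
    """Keep the first 70% of the samples for training and the rest for testing."""
    cut = len(rows) * 7 // 10
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]
```

In the first exercise of this section, each row would hold (average wind speed, snow depth, precipitation) and the target would be temperature; the predictors and target swap roles accordingly in the second exercise.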
Fig. 14.6 Temperature prediction using multiple linear regression: (a) Chicago-USA, (b) Alpena-USA, (c) Beauceville-Canada, (d) Dublin-Ireland
Table 14.1 Accuracy comparison between different data sets in predicting temperature
Location                       Alpena-USA   Chicago-USA   Beauceville-Canada   Dublin-Ireland
Coefficient of determination   0.98         0.99          0.99                 0.95
14.6.2 Rainfall Prediction Using LSTM
The long short-term memory (LSTM) model is a recurrent neural network that handles long-term dependencies better than a standard recurrent neural network (RNN) [19, 20]. All recurrent neural networks have the form of a chain of repeating modules [21]. LSTMs share this chain structure, but the repeating module contains four interacting network layers instead of a single layer. The model was trained for 6 epochs with a batch size of 32 and achieved an F1 score [22] of 0.998. The accuracy of rainfall prediction methods is usually measured by how well they detect true events and avoid false ones [4], so by this measure the model is highly accurate. The value and value loss achieved in each epoch are depicted in Fig. 14.7. When comparing the two approaches for the prediction of rainfall using temperature and
Fig. 14.7 Accuracy analysis of rainfall prediction for Sydney-Australia using LSTM
Table 14.2 Accuracy comparison for different machine learning models for predicting rainfall in Sydney-Australia
Prediction method            Correlation coefficient   Error %
Multiple linear regression   0.13                      87
LSTM                         0.998                     0.2
precipitation, the accuracy was measured. In the study, multiple linear regression showed a lower accuracy, whereas LSTM showed a higher accuracy. Hence, we can conclude that for the data set used, LSTM was a better fit. The LSTM model was able to deal with the vanishing gradient problem [23, 24] (Table 14.2).
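The F1 score [22] used to assess the LSTM combines how many true rain events are detected (recall) with how many predicted events are correct (precision). A minimal sketch of the computation for binary rain / no-rain labels follows; the function name and the sample labels are illustrative assumptions.

```python
def f1_score(actual, predicted):
    """F1 score for binary labels (1 = rain, 0 = no rain)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # rain correctly predicted
    fp = sum(1 for a, p in pairs if not a and p)      # false alarms
    fn = sum(1 for a, p in pairs if a and not p)      # missed rain events
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: two hits, one false alarm, one miss.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
score = f1_score(actual, predicted)  # precision = recall = 2/3
```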
14.7 Conclusion and Future Work
We have examined how temperature and precipitation affect the likelihood of rainfall in this paper. For this purpose, we have used data sets from Alpena, Chicago, Beauceville, Sydney, and Dublin. This data has been analyzed with a focus on detecting the correlation between temperature and precipitation and their interdependency with other weather parameters. These data sets have also been used for predicting the occurrence of rainfall with different machine learning prediction models. The accuracy of the developed techniques has been analyzed to find the most accurate model fit for the data sets used. In our future work, we expect to broaden this study by adopting different machine learning techniques to predict weather variables as well as rainfall and to identify the best-fitting models depending on the length of the data set and availability of
data. Consequently, it will be helpful for identifying the interplay between variables toward highly accurate rainfall prediction.
Acknowledgements This research was conducted with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre at University College Dublin. ADAPT, the SFI Research Centre for AI-Driven Digital Content Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme.
References
1. Ren, Y.Y., Ren, G.Y., Sun, X.B., Shrestha, A.B., You, Q.L.: Observed changes in surface air temperature and precipitation in the Hindu Kush Himalayan region over the last 100-plus years. Appl. Opt. 8, 148–156 (2017)
2. Fathima, T.A., Nedumpozhimana, V., Lee, Y.H., Winkler, S., Dev, S.: A chaotic approach on solar irradiance forecasting. In: Proceedings of Progress In Electromagnetics Research Symposium (PIERS) (2019)
3. Manandhar, S., Dev, S., Lee, Y.H., Meng, Y.S.: On the importance of PWV in detecting precipitation. In: Proceedings of IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting (2018)
4. Manandhar, S., Dev, S., Lee, Y.H., Meng, Y.S., Winkler, S.: A data-driven approach for accurate rainfall prediction. IEEE Trans. Geosci. Remote Sens. 57(11), 9323–9331 (2019). arXiv:1907.04816
5. Adler, R.F., Huffman, G.J., Chang, A., Ferraro, R., Xie, P.P., Janowiak, J., Rudolf, B., Schneider, U., Curtis, S., Bolvin, D., Gruber, A., Susskind, J., Arkin, P., Nelkin, E.: The version-2 global precipitation climatology project (GPCP) monthly precipitation analysis (1979–present). J. Hydrometeorol. 4(6), 1147 (2003)
6. Mills, F.L., Imama, E.A.: Rainfall Prediction for Agriculture and Water Resource Management in the United States Virgin Islands. Water Resources Research Center, University of the Virgin Islands (1990)
7. Jacobs, K.L., Street, R.B.: The next generation of climate services. Clim. Serv. 20, 100199 (2020). https://www.sciencedirect.com/science/article/pii/S2405880720300510
8. Williams, M.W., Konovalov, V.G.: Central Asia Temperature and Precipitation Data, 1879–2003. NSIDC: National Snow and Ice Data Center, Boulder, CO. https://doi.org/10.7265/N5NK3BZ8
9. Pathan, M.S., Wu, J., Nag, A., Dev, S.: A systematic analysis of meteorological parameters in predicting rainfall events, p. 22
10. Zuo, D., Hou, W., Wu, H., Yan, P., Zhang, Q.: Feasibility of calculating standardized precipitation index with short-term precipitation data in China. Atmosphere 12(5), 603 (2021). https://www.mdpi.com/2073-4433/12/5/603
11. McKee, T.B., Doesken, N.J., Kleist, J.: The relationship of drought frequency and duration to time scales (1993)
12. Nyveen, L.: Tutorial of how to calculate altitude & temperature (2019). https://doi.org/10.1093/acrefore/9780190228620.013.730
13. Manandhar, S., Dev, S., Lee, Y.H., Winkler, S., Meng, Y.S.: Systematic study of weather variables for rainfall detection. In: Proceedings of International Geoscience and Remote Sensing Symposium (IGARSS) (2018)
14. Dev, S., Lee, Y.H., Winkler, S.: Systematic study of color spaces and components for the segmentation of sky/cloud images. In: Proceedings of IEEE International Conference on Image Processing (ICIP) (2014)
15. AlSkaif, T., Dev, S., Visser, L., Hossari, M., van Sark, W.: A systematic analysis of meteorological variables for PV output power estimation. Renew. Energy (2020)
16. Nickolas, S.: What do correlation coefficients positive, negative, and zero mean (2021)
17. Kaya Uyanık, G., Güler, N.: A study on multiple linear regression analysis. Procedia Soc. Behav. Sci. 106, 234–240 (2013)
18. Tranmer, M., Elliot, M.: Multiple linear regression (2018)
19. Jain, M., Yadav, P., Lee, Y.H., Dev, S.: Improving training efficiency of LSTMs while forecasting precipitable water vapours. In: Proceedings of Progress In Electromagnetics Research Symposium (PIERS) (2021)
20. Pathan, M.S., Jain, M., Lee, Y.H., AlSkaif, T., Dev, S.: Efficient forecasting of precipitation using LSTM. In: Proceedings of Progress In Electromagnetics Research Symposium (PIERS) (2021)
21. Hochreiter, S., Schmidhuber, J.: Long short term memory (1997). http://www.bioinf.jku.at/publications/older/2604.pdf
22. Sasaki, Y.: The truth of the F-measure (2007)
23. Hu, Y., Huber, A.E.G., Anumula, J., Liu, S.: Overcoming the vanishing gradient problem in plain recurrent networks. CoRR (2018). arXiv:1801.06105
24. Jain, M., Manandhar, S., Lee, Y.H., Winkler, S., Dev, S.: Forecasting precipitable water vapor using LSTMs. In: Proceedings of IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting (2020)
Chapter 15
Vision-Based Facial Detection and Recognition for Attendance System Using Reinforcement Learning Siginamsetty Phani and Ashu Abdul
Abstract In this paper, we propose a reinforcement learning (RL)-based attendance system (RLAS) for marking the attendance of the students present during a class using the frames captured by a video camera. The RLAS comprises an agent module and an environment module. In the agent module, we fine-tune the multi-task cascaded convolution network (MTCNN) module and the ArcFace module to identify the students present in class. The MTCNN module consists of two neural networks, the P-Network (P-Net) and the R-Network (R-Net). In the P-Net, we add 2 convolutional layers for extracting the latent features from the facial images of the students. Similarly, we modify the R-Net by adding two dense layers for detecting the bounding boxes from the frames captured by the video camera. Based on the latent features obtained from the fine-tuned MTCNN, the ArcFace identifies the students present in the class. The environment module of the RLAS uses the reward function to evaluate the output generated from the agent module. If the agent module correctly identifies all the students present in the frames captured by the camera, then the reward function marks the attendance of those students. Otherwise, the environment module back-propagates the error obtained from the reward function to the agent module. To evaluate the RLAS, we created a dataset of 120,000 different images of 2400 students studying at our university. In our experimental evaluation, we observed that the fine-tuned MTCNN along with the ArcFace provides the transfer learning mechanism to the RLAS. Therefore, the RLAS obtains less time complexity than the different variants of the MTCNN and CNN models.
S. Phani · A. Abdul (B) SRM University, Mangalagiri-Mandal, Neerukonda, Amaravati 522502, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_15
S. Phani and A. Abdul
15.1 Introduction
Marking attendance is an essential activity of the faculty during a class to analyze the students' consistency. However, the details of each student must be listed in the faculty's records. If the student-to-teacher ratio is high, manually taking attendance is a hectic task. Some institutions have adopted [1] digital attendance systems such as card swiping systems and fingerprint authentication systems. However, if there are many students in a class, they must wait in a queue to mark their attendance using these systems, so these systems consume a lot of time. To overcome these limitations, researchers proposed the use of deep neural networks (DNNs) for automating the task of marking attendance for the students who are present in the class [2, 3]. An automated attendance system uses the DNN with the facial recognition (FR) technique to recognize the students who are present in the class [4–6]. To train the existing FR techniques, we need a huge corpus that contains the facial images of the students who are attending a class. In [6], the researchers fine-tuned the multi-task cascaded convolution network (MTCNN) module for detecting the students' faces present in the frames captured by the video camera. In the existing FR techniques, we need to train the DNN model from the beginning when there is a new student in the class. These techniques do not have a mechanism for transferring the knowledge from the previously trained DNN model to create a new DNN model (with the data from the new student). To overcome these limitations, we need a system that has a transfer learning (TL) mechanism. In this paper, we propose a reinforcement learning (RL)-based attendance system (RLAS), as shown in Fig. 15.1, for marking the attendance of the students present during a class using the frames captured by a video camera. The RLAS consists of a fine-tuned multi-task cascaded convolution network (MTCNN) module and an ArcFace [7] module.
In the RLAS, we use the ArcFace module to provide the TL mechanism for the automated attendance system. The MTCNN module identifies the latent features from the facial images of the students present in the class. In the RLAS approach, the agent, which is the combination of the fine-tuned MTCNN and ArcFace models, recognizes the students' faces from a camera. The environment in our RLAS approach checks whether the student is marked present in the class or not. The agent performs actions in the environment to record the attendance of the students, and we provide a reward $(r_d)$ to the agent if it correctly classifies the student's face. In our RLAS approach, we use data augmentation to get better results. The structure of this paper is as follows. In Sect. 15.2, we discuss the literature study of the existing mechanisms. In Sect. 15.3, we present the system architecture of the RLAS approach. In Sect. 15.4, we discuss the performance of the RLAS approach. Finally, we provide the conclusion in Sect. 15.5 (Fig. 15.1).
15 Vision-Based Facial Detection and Recognition for Attendance …
Fig. 15.1 Working of RLAS (stages shown: video stream, capturing frames, face detection, face recognition, training RLAS, prediction RLAS)
15.2 Literature Study
In [8], the authors generate face embeddings by combining the Haar cascade [9] with a convolution neural network (CNN). In [10], the author proposed SRNet, whose major goal is to improve poor streaming data quality. In [11], the authors used transfer learning (TL) for the FR task. They employ three TL methods, AlexNet, GoogleNet, and SqueezeNet, on a unique dataset for the task of marking the attendance. The researchers in [12] use R-CNN to solve the task of FR. However, the R-CNN consumes high computation time since each region is sent to the CNN independently, and it makes predictions using three different models. The authors of [13] utilize the MobileNetv2-SSD for the FR task, but the MobileNetv2-SSD does not consider the data augmentation task. In [14], the authors proposed a QR code-based attendance system. They implemented the FR system with logistic regression, LDA, and KNN classifiers. In another work, the researchers used a CNN model [15] to mark the attendance of the students present in the class. The limitation of this approach is that the network is not fine-tuned, and using a CNN approach to mark the attendance is a very time-consuming process. The researchers in [16] use MTCNN and FaceNet models to capture the students' faces in the frame and mark the attendance of the students present in the class. This method's drawback is that the researchers did not carry out any parameter tuning of the suggested models. In another work, the CenterFace model [17] was adjusted by the authors to recognize the students' faces when captured on camera. The loss for the suggested architecture is captured by the authors using the ArcFace model. The drawback of this strategy is that the authors did not test MTCNN and other models for face detection. Therefore, to overcome these limitations, we propose the RLAS approach, as discussed in Sect. 15.3.
15.3 Proposed Methodology
In this section, we discuss the working mechanism of our proposed RLAS-agent in Sect. 15.3.1 and RLAS-environment in Sect. 15.3.2. The agent performs actions in the environment to obtain the optimal reward $r_d$. The agent section consists of the P-Net and the R-Net. In the P-Net, we add 2 convolution layers to extract the latent features from the input frame, whereas in the R-Net, we add 2 dense layers and 1 dropout layer to extract the bounding boxes for the faces present in the frame.
Fig. 15.2 Proposed system (diagram labels: AGENT, fine-tuned MTCNN with ArcFace; ACTION; ENVIRONMENT; S(captured, students); decision S > Dt with yes/no branches; attendance stored in the database; REWARD)
15.3.1 RLAS-Agent
This section addresses the working flow of the RLAS-agent approach as shown in Fig. 15.2. We have divided the process of creating a fully automated attendance system into several phases in the agent module. In the first phase, a webcam is used to capture frames from a video feed. In the second phase, we send these frames to a face detection (FD) algorithm, here a fine-tuned MTCNN. In the third phase, we perform FR using ArcFace to create embeddings of the collected frames. After that, we train our model using the captured images of students in a classroom. Finally, we use our model to generate predictions. Our entire application is implemented using the Tkinter framework. Compared to the ArcFace model, our method is more time-efficient and has a higher accuracy rate. In the RLAS, we complete the student's FD task while taking frames from the stream. On top of the frames, we apply data augmentation techniques such as zooming in, flipping, scaling, and rotating. We optimize the MTCNN in our method with dropout layers and various convolution blocks. The primary goal of the MTCNN is to extract the landmarks (LM) of the face as follows:
$$LM_j^{D} = -\left(y_j^{D} \log(p_j) + (1 - y_j^{D})(1 - \log(p_j))\right) \tag{15.1}$$
where $p_j$ is the probability produced by the tuned MTCNN that the sample is a face, and $y_j^{D} \in \{0, 1\}$ is the ground-truth label. The bounding box loss is computed between the predicted and the ground-truth boxes:
$$LM_j^{B} = \lVert \hat{y}_j^{B} - y_j^{B} \rVert_2^2 \tag{15.2}$$
where $\hat{y}_j^{B}$ is obtained from the MTCNN and $y_j^{B}$ is the ground truth.
Fig. 15.3 Cosine similarity of two users
The localization of the facial landmarks (L), which is essential for the attendance system and is very similar to the bounding box task, is as follows:
$$LM_j^{L} = \lVert \hat{y}_j^{L} - y_j^{L} \rVert_2^2 \tag{15.3}$$
where $\hat{y}_j^{L}$ is obtained from the MTCNN and $y_j^{L}$ is the ground truth. The fine-tuned MTCNN is used to collect the cropped images. The frames are then provided to the ArcFace, which creates an embedding for each frame, which we subsequently store in MongoDB. With a combination of dense and dropout layers, we create a sequential API to begin the training phase. The training process is fine-tuned with several dense layers followed by dropout layers. The sequential API features two dense layers with 1024 neurons, used with a dropout ratio of 0.45. We experimented with a variety of optimizers, including Adam, gradient descent, SGD, and Adagrad. The Adam optimizer performed best on the FR task, with a learning rate of 0.001. For the FR task, categorical cross-entropy is used as the loss function. The prediction pipeline follows, where students' faces are recorded via the webcam. Once the faces have been detected in the stream, a fine-tuned MTCNN model is used to crop just the face region from each frame. Then, we create an embedding for each user in the stream and store the embeddings in MongoDB.
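The regression losses in Eqs. (15.2) and (15.3) are squared Euclidean distances between predicted and ground-truth coordinates, and the detection term is a face / non-face classification loss. The sketch below is illustrative only: the function names are hypothetical, and the classification term is written in the standard binary cross-entropy form with $\log(1 - p_j)$, which is an assumption about the intended meaning of Eq. (15.1) rather than a transcription of it.

```python
import math


def squared_l2(predicted, truth):
    """Squared Euclidean distance, the form used in Eqs. (15.2) and (15.3)."""
    if len(predicted) != len(truth):
        raise ValueError("coordinate vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, truth))


def detection_loss(p_face, y_face):
    """Binary cross-entropy for face (1) / non-face (0) classification.
    Assumption: standard cross-entropy form, see the note above."""
    eps = 1e-12
    p = min(max(p_face, eps), 1.0 - eps)   # avoid log(0)
    return -(y_face * math.log(p) + (1 - y_face) * math.log(1.0 - p))


# Made-up example: a predicted box (x1, y1, x2, y2) against its ground truth.
box_loss = squared_l2([10.0, 12.0, 52.0, 60.0], [10.0, 10.0, 50.0, 60.0])  # 0 + 4 + 4 + 0
```

The same `squared_l2` helper applies to the landmark coordinates, since Eq. (15.3) has the same form as Eq. (15.2).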
15.3.2 RLAS-Environment

In the environment module, the agent performs actions to obtain rewards, and the agent learns from these rewards. The environment module holds both the students' target embeddings ($E_t$) and their reference embeddings ($E_r$). We compare these embeddings using the cosine similarity ($S$) and measure the cosine distance ($D$) between them. The appropriate student
S. Phani and A. Abdul
Fig. 15.4 Accuracy and loss
data is logged as present in the class once we map $E_t$ to $E_r$ and determine which embeddings are closest. The accuracy of RLAS is 99.2% and the loss is 0.02, as shown in Fig. 15.4. The $S$ and $D$ between $E_t$ and $E_r$ are as follows:

$$S(E_t, E_r) = \frac{\sum_{j=1}^{N} E_{tj} E_{rj}}{\sqrt{\sum_{j=1}^{N} E_{tj}^{2}} \, \sqrt{\sum_{j=1}^{N} E_{rj}^{2}}} \tag{15.4}$$

$$D(E_t, E_r) = 1 - S(E_t, E_r) \tag{15.5}$$
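Equations (15.4) and (15.5) translate directly into code. The sketch below is a plain-Python illustration; the function names are chosen here for readability and are not taken from the authors' repository.

```python
import math


def cosine_similarity(target, reference):
    """Eq. (15.4): dot product divided by the product of the vector norms."""
    dot = sum(t * r for t, r in zip(target, reference))
    norm_t = math.sqrt(sum(t * t for t in target))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_t == 0 or norm_r == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_t * norm_r)


def cosine_distance(target, reference):
    """Eq. (15.5): one minus the cosine similarity."""
    return 1.0 - cosine_similarity(target, reference)
```

Embeddings that point in the same direction give a similarity of 1 and a distance of 0, regardless of their magnitudes.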
We use Eqs. (15.4) and (15.5) to calculate $S$ and $D$ between $E_t$ and $E_r$, as shown in Fig. 15.3. In our RLAS, we use ArcFace because its loss function allows the identity of faces to be generated for subjects both inside and outside the training data. The ArcFace loss is as follows:

$$\text{loss} = -\frac{1}{N} \sum_{j=1}^{N} \log \frac{e^{f_s (\cos(\theta_j) + a_m)}}{e^{f_s (\cos(\theta_j) + a_m)} + \sum_{j=1}^{n} e^{f_s \cos(\theta_j)}} \tag{15.6}$$

where $N$ is the total number of captured faces, $\theta_j$ is the angle between $E_t$ and $E_r$, $f_s$ is the feature scale, and $a_m$ is the angular margin penalty. For our proposed model, the loss is 0.02, which is negligible compared with other approaches such as RetinaFace [18] and FaceNet [19]. In MTCNN, detecting faces in a frame is the challenging task. In our tuned MTCNN, we experimented with
multiple threshold levels and finally settled on a detection confidence of 1.24. We incorporate the RL mechanism into our RLAS approach; as a result, our method performs significantly better in terms of accuracy. The agent performs actions on the environment to obtain rewards. We compute $S$ between each captured face and the students' faces in our database. If $S$ is greater than the detection threshold ($D_t$), the student's attendance is stored in the database and we give a positive reward ($+1$) to the agent. Otherwise, we recapture the student's face from the webcam to obtain the attendance of the student, and we give a negative reward ($-1$) to the agent, as follows:

$$r_d = \begin{cases} +1, & \text{if } S > D_t \\ -1, & \text{otherwise} \end{cases}$$
With this approach, the accuracy exceeds that of the conventional approach. The agent performs actions on the environment to attain the maximum $r_d$ for the given input data. The RL approach increases the performance of our model. By incorporating the RL mechanism into our attendance system, attendance is marked for the students who are present in the class. The RLAS-environment module tracks the actions performed by the agent module. Based on the action carried out on the environment, an $r_d$ is given to the RLAS-agent to track the performance of the model. The latency of our approach is minimal compared with other face recognition approaches.
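The reward rule can be sketched as follows. This is an illustrative reading of the rule rather than the authors' code: the student identifiers, the similarity values, and the threshold of 0.8 are placeholders chosen for the example.

```python
def reward(similarity, detection_threshold):
    """Return +1 when the similarity exceeds the threshold, otherwise -1."""
    return 1 if similarity > detection_threshold else -1


def mark_attendance(similarities, detection_threshold):
    """Mark students present and accumulate the agent's reward.

    `similarities` maps a (hypothetical) student id to the similarity S
    between the captured face and that student's reference embedding.
    """
    present, total_reward = [], 0
    for student_id, similarity in similarities.items():
        r = reward(similarity, detection_threshold)
        total_reward += r
        if r == 1:
            present.append(student_id)
        # A reward of -1 would trigger a recapture in the described system.
    return present, total_reward


present, total_reward = mark_attendance(
    {"s1": 0.93, "s2": 0.41, "s3": 0.88}, detection_threshold=0.8
)
```

Keeping the reward separate from the bookkeeping makes it easy to try other threshold values without touching the attendance logic.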
15.4 Results Discussion

In this section, we present a detailed experimental discussion of the RLAS with different combinations of the MTCNN and the CNN with FaceNet [19], RetinaFace [18], and VGGFace2 [20], as shown in Table 15.1. We observe that the RLAS performed better than the other combinations. We fine-tuned the P-Net of the MTCNN by adding two more convolution layers, whereas two dense layers were added to the R-Net to detect the bounding boxes more easily. To avoid overfitting, we also added a dropout layer to the R-Net. We track live attendance from the video cameras during a class in our dataset (available at https://github.com/funnyPhani/RLAS-Approach.git). We collect data from 40 classes at our university, with 65 students per class. We capture 50 frames from the webcam located in the classroom and count the total number of faces identified by our model. We process the faces of students and store them in MongoDB. Our dataset consists of 120,000 students' information; we additionally apply data augmentation to improve the performance of the model. The authors of the MTCNN tested it with three convolution blocks; we test it with several convolution blocks. We use accuracy, precision, and recall as the evaluation metrics to verify the correctness of our approach. The hyperparameters used in the RLAS approach are as follows: We experiment with many optimizers, and ultimately Adam
Table 15.1 Results comparison

Face detection      Face recognition   Accuracy (%)   Loss
MTCNN               ArcFace            92.42          0.54
MTCNN               RetinaFace         94.38          0.79
MTCNN               FaceNet            98.65          0.23
MTCNN               VGGFace2           90.3           0.98
Fine-tuned MTCNN    ArcFace            99.2           0.02
Fine-tuned MTCNN    RetinaFace         97.3           0.67
Fine-tuned MTCNN    FaceNet            98.72          0.26
Fine-tuned MTCNN    VGGFace2           94.12          0.31
CNN                 ArcFace            91.28          0.54
CNN                 RetinaFace         90.76          0.73
CNN                 FaceNet            91.73          0.76
CNN                 VGGFace2           90.53          0.58
Fig. 15.5 Time complexity
optimizer outperformed the others. We use a learning rate of 0.001 in our RLAS approach and train for 500 epochs to evaluate its performance. We employ early-stopping callbacks for check-pointing the model. Our approach outperforms the plain MTCNN approach, and it surpasses the other approaches in terms of precision and recall, with a precision of 94.0% and a recall of 96.0%. In our experimental evaluation, we observed that the fine-tuned MTCNN together with ArcFace and the RL mechanism provides the TL mechanism to the RLAS.
As a result, the efficiency of identifying students' faces to verify their attendance in class is improved. Table 15.1 shows the performance of the fine-tuned MTCNN with ArcFace, which achieves the highest accuracy. Notably, the loss incurred by this approach is remarkably low (0.02 in Table 15.1), distinguishing it from the other approaches. We fine-tuned MTCNN with the convolution blocks and a 0.45 dropout ratio to obtain better accuracy in the RLAS approach. The time complexity of our proposed approach compared with the other approaches is shown in Fig. 15.5. We also experimented with multiple values of the $D_t$ of the fine-tuned MTCNN. Finally, the threshold value of 1.24 yielded better accuracy than the other approaches, as shown in Table 15.1.
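Precision and recall, the metrics reported in this section, follow from counts of correct and incorrect identifications. The sketch below uses hypothetical counts for illustration only; they are not the experimental counts behind the reported scores.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from identification counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 8 students identified correctly, 2 identified wrongly,
# and no present student missed.
precision, recall = precision_recall(8, 2, 0)
```

Precision penalizes students marked present by mistake, whereas recall penalizes present students that the system missed.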
15.5 Conclusion

We proposed a real-time system to automate attendance marking for colleges and universities. Automatic attendance marking reduces the time consumed by manual attendance marking. With the RLAS approach, we introduced the RL mechanism together with the fine-tuned MTCNN and the ArcFace model to provide a TL mechanism for an automatic attendance marking system. As a result, the RLAS approach requires fewer training samples than other approaches while achieving an accuracy of 99.2%. As part of future work, we are creating a mobile application based on the RLAS approach to improve usability for faculty and students.
References

1. Rathod, H., Ware, Y., Sane, S., Raulo, S., Pakhare, V., Rizvi, I.A.: Automated attendance system using machine learning approach. In: 2017 International Conference on Nascent Technologies in Engineering (ICNTE), pp. 1–5. IEEE (2017)
2. Arsenovic, M., Sladojevic, S., Anderla, A., Stefanovic, D.: FaceTime-deep learning based face recognition attendance system. In: 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), pp. 000053–000058. IEEE (2017)
3. Kuang, W., Baul, A.: A real-time attendance system using deep-learning face recognition (2020)
4. Damale, C.R., Pathak, B.V.: Face recognition based attendance system using machine learning algorithms. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 414–419. IEEE (2018)
5. Patil, V., Narayan, A., Ausekar, V., Dinesh, A.: Automatic students attendance marking system using image processing and machine learning. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 542–546. IEEE (2020)
6. Gu, M., Liu, X., Feng, J.: Classroom face detection algorithm based on improved MTCNN. Signal, Image Video Proc. 16(5), 1355–1362 (2022)
7. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019)
8. Arsenovic, M., Sladojevic, S., Anderla, A., Stefanovic, D.: FaceTime-deep learning based face recognition attendance system. In: 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), pp. 000053–000058. IEEE (2017)
9. Cuimei, L., Zhiliang, Q., Nan, J., Jianhua, W.: Human face detection algorithm via Haar cascade classifier combined with three additional classifiers. In: 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), pp. 483–487. IEEE (2017)
10. Gupta, R.K., Lakhlani, S., Khedawala, Z., Chudasama, V., Upla, P.: A deep learning paradigm for automated face attendance. In: Workshop on Computer Vision Applications, pp. 39–50. Springer, Singapore (2018)
11. Alhanaee, K., Alhammadi, M., Almenhali, N., Shatnawi, M.: Face recognition smart attendance system using deep transfer learning. Proc. Comput. Sci. 192, 4093–4102 (2021)
12. Rusdi, J.F., Kodong, F.R., Indrajit, R., Sofyan, H., Marco, R.: Student attendance using face recognition technology. In: 2020 2nd International Conference on Cybernetics and Intelligent System (ICORIS), pp. 1–4. IEEE (2020)
13. Le, M.C., Le, M., Duong, M.: Vision-based people counting for attendance monitoring system. In: 2020 5th International Conference on Green Technology and Sustainable Development (GTSD), pp. 349–352. IEEE (2020)
14. Sunaryono, D., Siswantoro, J., Anggoro, R.: An android based course attendance system using face recognition. J. King Saud Univ.-Comput. Inf. Sci. 33(3), 304–312 (2021)
15. Alhanaee, K., Alhammadi, M., Almenhali, N., Shatnawi, M.: Face recognition smart attendance system using deep transfer learning. Proc. Comput. Sci. 192, 4093–4102 (2021)
16. Pham, T.N., Nguyen, N.P., Dinh, N.M.Q., Le, T.: Tracking student attendance in virtual classes based on MTCNN and FaceNet. In: Intelligent Information and Database Systems: 14th Asian Conference, ACIIDS 2022, pp. 382–394. Cham: Springer Nature Switzerland (December 2022)
17. Seelam, V., Kumar Penugonda, A., Kalyan, B.P., Priya, M.B., Prakash, M.D.: Smart attendance using deep learning and computer vision. Mater. Today: Proc. 46, 4091–4094 (2021)
18. Deng, J., Guo, J., Ververas, E., Kotsia, I., Zafeiriou, S.: Retinaface: single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5203–5212 (2020)
19. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
20. Cao, Q., Shen, L., Xie, W., Parkhi, O., Zisserman, A.: Vggface2: a dataset for recognizing faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition, pp. 67–74. IEEE (2018)
Chapter 16
Cross-Modal Knowledge Distillation for Audiovisual Speech Recognition

L. Ashok Kumar, D. Karthika Renuka, V. Dineshraja, and Fatima Abdul Jabbar
Abstract Speech recognition, also known as speech-to-text, is the capability of a machine or program to recognize spoken words and translate them into readable text. Speech recognition features are built into many contemporary devices and text-focused software to enable easier or hands-free use. This research aims to train reliable models for visual speech recognition without human-annotated data. For ASR to perform at human levels and for speech to become a truly ubiquitous user interface, an unconventional method with the potential to significantly advance ASR is required. Visual speech is one such method: it offers channel and task independence and can bring significant improvements in noisy conditions. Deploying large deep neural network models is particularly difficult on edge devices with constrained memory and processing power. To overcome this difficulty, model compression was first proposed as a way to train a smaller model using the knowledge from a larger model without appreciable performance loss. The process of learning a smaller model from a larger model is called knowledge distillation. Hence, a cross-modal distillation technique that combines a frame-wise cross-entropy loss with Connectionist Temporal Classification (CTC) is employed to show that distillation greatly accelerates learning. On the challenging LRS2 dataset, state-of-the-art results are reported for training on publicly available data only. The proposed system is evaluated using two metrics, word error rate (WER) and character error rate (CER). With a WER of 1.21% and a CER of 3.75% on the Librispeech corpus for ASR, versus a WER of 0.31% and a CER of 0.43% for AVSR, the experimental findings show that the enhanced cross-modal distillation module is effective in delivering a reduced error rate without much training.
L. Ashok Kumar · D. Karthika Renuka (B)
PSG College of Technology, Coimbatore, India
e-mail: [email protected]

V. Dineshraja · F. Abdul Jabbar
Department of IT, PSG College of Technology, Coimbatore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_16
16.1 Introduction

Lip reading is the skill of comprehending or interpreting speech without hearing it, and it is particularly valuable to people with hearing impairments. It allows a person with hearing loss to converse with others and participate in social activities that would otherwise be challenging. Recent developments in computer vision, pattern recognition, and signal processing have increased interest in automating this difficult task. Automating the human capacity for lip reading, commonly referred to as visual speech recognition (VSR) or speech reading, may pave the way for other cutting-edge applications. Its primary goal is to recognize spoken words using solely the visual cues generated during speech. Training on very large proprietary datasets was recently shown to significantly increase performance in a number of machine learning applications. LRS2 and LRS3 are the largest publicly available datasets for training and evaluating visual speech recognition, but they are smaller than the audio datasets used to train automatic speech recognition (ASR) models. To train a VSR model, the proposed technique follows a teacher–student approach in which an ASR model serves as the teacher. Training by distillation also eliminates the need for professionally transcribed subtitles and for the complex alignment between captions and audio that is otherwise required to produce VSR training data. The fundamental objective is to pretrain on sizable unlabeled datasets in order to improve lip reading performance. With this method, human-generated captions are not required to train a good model. A distillation loss is combined with the regular Connectionist Temporal Classification (CTC) loss. Using only CTC to train on the ASR transcriptions would have been another way to make use of the additional data. Distillation, however, significantly speeds up training.
16.2 Literature Review

Torfi et al. [1] suggested employing a coupled three-dimensional convolutional neural network (3D CNN) architecture to evaluate the relationship among audio files using learned multimodal features. Nefian et al. [2] proposed a new approach to combining audio and video that makes use of a coupled hidden Markov model (HMM). Baevski et al. [3] employed a Connectionist Temporal Classification (CTC) loss to fine-tune pre-trained BERT models on transcribed speech rather than feeding the representations into a task-specific model. Neti et al. [4] reported significant advancements in automatic speech recognition (ASR) for well-defined tasks, including dictation and medium-vocabulary transaction processing, in reasonably controlled conditions.
Xiao et al. [5] introduced an architecture for integrated audiovisual perception, known as audiovisual SlowFast Networks, which models vision and sound in a single representation. The experiments of Potamianos et al. [6] show that the visual modality enhances ASR for all conditions and types of data, although less so for visually demanding environments and large-vocabulary tasks. Guo et al. [7] suggested a conditional chain model for non-autoregressive (NAR) multi-speaker ASR, in which a NAR-based CTC encoder is employed in each step to carry out parallel processing. Huang et al. [8] fine-tuned BERT, a language model (LM) trained on vast amounts of unlabeled text and capable of producing rich context-specific representations. Xia et al. [9] concentrated on common definitions of audiovisual speech, since databases for both single-view and multi-view audiovisual speech recognition tasks should have the highest priority. Sheshadri et al. [10] suggested WER-BERT, an e-WER architecture based on BERT that includes speech features. The AVSR system proposed by Yu et al. [11] surpassed the audio-only baseline LF-MMI DNN system with a word error rate (WER) reduction of up to 29.98% absolute on overlapped speech simulated from the LRS2 dataset; its recognition performance was comparable to that of a more complex pipelined system. Petridis et al. [12] used the LRS2 database to demonstrate that the proposed audiovisual model achieves a word error rate of 7%, an absolute reduction of 1.3% over the audio-only model. A comprehensive audiovisual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture was presented by Makino et al. [13]. Afouras et al. [14] presented a viable method for learning robust VSR models by distilling from a trained ASR model. Yu et al. [15] suggested a complete BERT-based system employing a non-autoregressive Transformer ASR model.
16.3 Proposed Methodology

In a cross-modal distillation training strategy, the teacher is trained in one modality, and its knowledge is used to teach a student that operates in a different modality. This situation arises when data or labels are not available for certain modalities during training or testing, making it necessary to transfer knowledge between modalities. Cross-modal distillation is employed most often in the visual domain. A student model with an unlabeled input domain, such as videos or images, can, for instance, extract knowledge
from a teacher trained on labeled audio data. As the teacher, an end-to-end pretrained CRDNN acoustic model for ASR, a CTC/attention-based model, is used. As the student, a transformer model for lip reading is used. This network accepts as input visual features extracted by a multimodal residual CNN.
16.3.1 Teacher Model

The Convolutional Recurrent Deep Neural Network (CRDNN), a Connectionist Temporal Classification (CTC) and attention-based acoustic model, is employed as the teacher model in this study. This model combines convolutional neural networks (CNNs), recurrent neural networks (RNNs), and deep neural networks (DNNs). The input signals are frame-level low-level descriptors (LLDs) of the raw speech recordings. The CNN treats the LLDs as 2D images with frequency and time axes in order to extract relevant features while minimizing feature engineering. The CNN output is then processed by an RNN to obtain spatial and temporal information and to analyze the sequential structure of the audio signal. The spectral and temporal modeling is handled by the CNN and RNN layers, while an attention layer identifies the phonetic segments that are most important for the task at hand. To support continuous speech-to-text, the output of the attention layer is fed into a fully connected (FC) layer. The CNN consists of a single one-dimensional convolutional layer with filters between 1 and 8. It is followed by an element-wise rectified linear unit (ReLU) and then by an overlapping max-pooling layer with pooling size (1, 3) along the frequency axis. To expedite learning, a batch normalization layer [7] is then applied to the CNN output. To model the raw signal as a continuous sequence, the higher-level representations from the CNN are fed into two bidirectional gated recurrent unit (GRU) layers. GRU is preferred over long short-term memory (LSTM) networks [11] because it is more effective in temporal classification and has lower training and testing requirements. The RNN output is transformed using an attention layer. Rather than learning to weight the frames based on the temporal dynamics of the RNN output, a weighted pooling layer is inserted that lets the model weight the frames according to speaker traits, which are mostly contained in the feature dimension of the network output. The attention component is inserted after the RNN layers and before the FC layer. Its output is fed into the FC layer to generate the final acoustic representations. The basic architecture of the CRDNN model is shown in Fig. 16.1.
Fig. 16.1 Basic architecture of CRDNN model
16.3.2 Student Model

The proposed student model is an attention-based transformer model composed of RNNs, LSTMs, and gated recurrent neural networks. It consists of encoder–decoder stacks, attention, and feed-forward layers. The transformer follows the overall architecture of stacked self-attention and point-wise, fully connected layers for both the encoder and the decoder, shown in the upper and lower halves of Fig. 16.2, respectively.
Fig. 16.2 Transformer model architecture
16.3.2.1 Encoder and Decoder Stacks
The machine translation task is presented using stacked encoder and decoder blocks from the transformer. As with the multiple layers of a conventional neural network, the number of stacked blocks may change depending on the task. Word embeddings enriched with positional information serve as the input to the stack of encoders. The hidden representation from each encoder stage is passed forward to the next layer, and typically, each encoder/decoder consists of recurrent connections and convolution.

Encoder The encoder consists of $N = 6$ identical layers, each divided into two sub-layers. The first sub-layer is a multi-head self-attention mechanism, while the second sub-layer is a position-wise, fully connected feed-forward network. Both sub-layers and the embedding layers produce outputs whose dimension equals the input dimension, denoted $d_{model}$. This consistent dimensionality ensures compatibility and enables residual connections followed by layer normalization. Specifically, the output of each sub-layer is obtained by applying layer normalization to the sum of the input and the output of the sub-layer, with an output dimension of $d_{model} = 512$ for all sub-layers and embedding layers in the encoder.
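The residual connection followed by layer normalization described for each sub-layer can be sketched in a few lines. This is a simplified illustration on a single vector: it omits the learnable gain and bias that layer normalization usually carries, and the sub-layer passed in the example is an arbitrary stand-in.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]


def residual_sublayer(x, sublayer):
    """Apply LayerNorm(x + Sublayer(x)); the output keeps the input dimension."""
    out = sublayer(x)
    if len(out) != len(x):
        raise ValueError("sub-layer must preserve the input dimension")
    return layer_norm([a + b for a, b in zip(x, out)])


# Stand-in sub-layer for illustration: scale every component by 0.5.
y = residual_sublayer([1.0, 2.0, 3.0, 4.0], lambda v: [0.5 * t for t in v])
```

The dimension check reflects why every sub-layer must output vectors of the same size as its input: otherwise the residual sum is not defined.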
Decoder The decoder comprises $N = 6$ layers, with each layer having three sub-layers. Residual connections with layer normalization are applied, and the self-attention sub-layer is masked to restrict attention to previous positions. The output embeddings are also shifted by one position to ensure proper prediction dependencies during decoding.
16.3.2.2 Attention
In the attention mechanism, a query vector and a set of key-value pairs are used to generate an output vector. The query, keys, values, and output are all vector representations. The output is calculated by taking a weighted sum of the values, where the weight assigned to each value is determined by the compatibility function between the query and the corresponding key. Figure 16.3 depicts how the dot products are scaled down, and Fig. 16.4 depicts multi-head attention consisting of several attention layers running in parallel.
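The weighted sum described above can be written out for scaled dot-product attention, in which the compatibility function is the dot product of the query and key divided by the square root of the key dimension, followed by a softmax. The sketch below uses plain Python lists and is meant only to make the computation concrete.

```python
import math


def softmax(scores):
    """Turn compatibility scores into weights that sum to one."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the softmax-weighted sum of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Two identical keys receive equal weight, so the output is the mean value.
result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Multi-head attention runs several such computations in parallel on different learned projections of the queries, keys, and values and concatenates the results.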
16.4 Results

16.4.1 The Dataset

Table 16.1 provides a summary of AVSR datasets found in the literature. LRS2 and LRS3 are small public audiovisual datasets that contain transcriptions. The column Tran. indicates whether the dataset is labeled, that is, whether it contains aligned transcriptions, and the column Mod. indicates the modalities included (AV = audio + video, A = audio only). The YT31k, LSVSR, and MV-LRS datasets have aligned ground-truth transcriptions and have been used to train state-of-the-art lip reading models. Only publicly available datasets are used in this work. The proposed methodology pretrains the teacher ASR model on Librispeech and then uses distillation to train the student VSR model, which is fine-tuned and evaluated on LRS2. The implementation starts by running the trained teacher ASR model (details in Sect. 16.3.1) on all of the unlabeled recordings to obtain transcriptions. Following that, a simple filter is applied to select good samples: for every utterance, the fraction of words with four or more characters in the ASR output that are valid English words is computed, and only samples with a fraction of 90% or higher are kept.
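The transcript filter described above can be sketched as follows. The vocabulary set is a hypothetical stand-in for whatever English word list is used, and dropping utterances that contain no word of four or more characters is an assumption of this sketch.

```python
def valid_word_fraction(transcript, vocabulary, min_length=4):
    """Fraction of words with at least `min_length` characters found in `vocabulary`."""
    words = [w for w in transcript.lower().split() if len(w) >= min_length]
    if not words:
        # Assumption: an utterance with no sufficiently long word is rejected.
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


def keep_sample(transcript, vocabulary, threshold=0.9):
    """Keep an ASR transcript only if enough of its long words are valid."""
    return valid_word_fraction(transcript, vocabulary) >= threshold


# Hypothetical miniature word list used only for this example.
vocabulary = {"this", "sentence", "clear"}
```

For instance, `keep_sample("this is a clear sentence", vocabulary)` accepts the utterance, whereas a transcript full of misrecognized words falls below the threshold and is discarded.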
Fig. 16.3 Scaled dot product
16.4.2 Evaluation Metrics

Speech recognition software makes real-time or asynchronous recording of human-to-human conversations possible. An ASR system is assessed by comparing the text it produces with a reference transcript of a real-world conversation. Character error rate (CER) and word error rate (WER) are typically used to assess the performance of an automatic speech recognition system.
Fig. 16.4 Multi-head attention
16.4.2.1 CER

The character error rate represents the proportion of characters for which an inaccurate prediction is made. A lower character error rate indicates better performance of the ASR system. The formula for calculating CER is given in Eq. 16.1.

$$\text{Char Error Rate} = \frac{S + D + I}{N} = \frac{S + D + I}{S + D + C} \tag{16.1}$$

where $S$ is the number of substitutions (a character is replaced by another), $D$ is the number of deletions (a character is missed by the model), $I$ is the number of insertions (the system adds a character to the transcript that the speaker did not say), $C$ is the number of correctly recognized characters, and $N$ is the total number of characters in the reference.
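In practice, the combined count of substitutions, deletions, and insertions is obtained as the edit (Levenshtein) distance between the reference and the recognized text. The sketch below is a minimal pure-Python illustration of that computation; the function names are chosen here for clarity.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of substitutions, deletions, and insertions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = (S + D + I) / N, with N the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, recognizing "speach" for the reference "speech" involves one substitution in six characters, giving a CER of one sixth.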
Table 16.1 Audiovisual dataset statistics

Dataset             Hours   Mod.   Tran.
YT31k               31k     AV     √
LSVSR               3.8k    AV     √
MV-LRS              775     AV     √
Librispeech         1k      A      √
LRS2 (PRE-TRAIN)    195     AV     √
LRS2 (MAIN)         29      AV     √
LRS2 (TEST)         0.5     AV     √
LRS3 (PRE-TRAIN)    444     AV     √
LRS3 (MAIN)         30      AV     √
LRS3 (TEST)         1       AV     √

16.4.2.2 WER
WER is the ratio of transcription errors made by the ASR system to the number of words actually spoken. System accuracy increases as WER decreases. Three types of errors are counted when measuring WER:
(i) Substitutions ($N_{SUB}$): a word is substituted for another in the transcription.
(ii) Deletions ($N_{DEL}$): an entire word is missed by the system.
(iii) Insertions ($N_{INS}$): a word that the speaker did not say is added by the system to the transcript.
The WER calculation formula is shown in Eq. 16.2.

$$\mathrm{WER} = \frac{N_{SUB} + N_{INS} + N_{DEL}}{N_{WORDS\text{-}Transcript}} \tag{16.2}$$
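The three error counts can be obtained from a word-level alignment of the reference transcript and the system output. The sketch below uses dynamic programming and keeps the counts of one minimum-cost alignment; when several alignments have the same cost, the split between error types depends on the tie-breaking used here, which is an implementation choice of this example.

```python
def word_error_counts(reference_words, hypothesis_words):
    """Return (substitutions, deletions, insertions) of a minimal alignment."""
    n, m = len(reference_words), len(hypothesis_words)
    # Each cell holds (total_errors, substitutions, deletions, insertions).
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if reference_words[i - 1] == hypothesis_words[j - 1]:
                diagonal = (total, subs, dels, ins)
            else:
                diagonal = (total + 1, subs + 1, dels, ins)
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            cost[i][j] = min(diagonal, deletion, insertion)
    return cost[n][m][1:]


def word_error_rate(reference, hypothesis):
    """WER = (N_SUB + N_INS + N_DEL) / number of words in the reference."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = word_error_counts(reference_words, hypothesis.split())
    return (subs + dels + ins) / len(reference_words)
```

With the reference "the cat sat on the mat" and the output "the cat sit on mat", the alignment contains one substitution and one deletion, so the WER is 2/6.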
16.4.3 Transcriptional CTC Loss

CTC provides a loss function that allows models to be trained on sequence-to-sequence tasks without requiring an alignment between the training targets and the input frames. Equations 16.3 and 16.4 give the CTC objective for a single $(X, Y)$ pair.

$$p(Y \mid X) = \sum_{A \in \mathcal{A}_{X,Y}} \prod_{t=1}^{T} p_t(a_t \mid X) \tag{16.3}$$

$$L_{CTC}(x, y^{*}) = -\log p(y^{*} \mid x) \tag{16.4}$$
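For very short inputs, Eq. 16.3 can be evaluated literally by enumerating every frame-level alignment, collapsing it (merging repeated symbols and removing blanks), and summing the probabilities of those that yield the target. The sketch below does exactly that; it is exponential in the number of frames and is intended only to make the definition concrete, since practical systems use a dynamic-programming recursion instead. The blank symbol and the toy posteriors are assumptions of the example.

```python
import itertools
import math

BLANK = "-"


def collapse(alignment):
    """Merge repeated symbols, then drop blanks."""
    merged = [s for i, s in enumerate(alignment)
              if i == 0 or s != alignment[i - 1]]
    return "".join(s for s in merged if s != BLANK)


def ctc_probability(frame_posteriors, target):
    """Sum the probability of every alignment that collapses to `target`."""
    symbols = list(frame_posteriors[0].keys())
    total = 0.0
    for alignment in itertools.product(symbols, repeat=len(frame_posteriors)):
        if collapse(alignment) == target:
            probability = 1.0
            for t, symbol in enumerate(alignment):
                probability *= frame_posteriors[t][symbol]
            total += probability
    return total


def ctc_loss(frame_posteriors, target):
    """Negative log-likelihood of the target, as in Eq. 16.4."""
    return -math.log(ctc_probability(frame_posteriors, target))


# Toy posteriors over two frames and the symbols "a" and blank.
posteriors = [{"a": 0.6, BLANK: 0.4}, {"a": 0.6, BLANK: 0.4}]
```

In this toy case the alignments "aa", "a-", and "-a" all collapse to "a", so the target probability is 0.36 + 0.24 + 0.24 = 0.84.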
Table 16.2 Result analysis of existing and proposed work

Hyperparameters   Existing work                                         Proposed work
Methodology       Nvidia seq2seq framework and Jasper acoustic model    CRDNN and transformer model
WER               LRS2: 58.5%                                           LRS2: 31.8%
CER               LRS2: 68.3%                                           LRS2: 43.9%
16.4.4 Distillation Loss

To distill the acoustic model into the target lip reading model, the KL divergence between the student and teacher CTC posteriors is minimized or, equivalently, the frame-level cross-entropy loss. Equation 16.5 gives the KD loss, where $p_t^{a}$ and $p_t^{v}$ denote the CTC posteriors for frame $t$ obtained from the teacher and student models, respectively.

$$L_{KD}(x_a, x_v) = -\sum_{t \in T} \sum_{c \in C'} \log p_t^{a}(c \mid x_a) \, p_t^{v}(c \mid x_v) \tag{16.5}$$
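A frame-level cross-entropy between teacher and student posteriors can be sketched as below. The sketch assumes the usual distillation convention, in which the teacher posteriors act as soft targets and the logarithm is taken of the student posteriors; this pairing of the terms is an assumption of the example and is not confirmed by the text. The toy posteriors are hypothetical.

```python
import math


def distillation_loss(teacher_posteriors, student_posteriors, eps=1e-12):
    """Sum over frames and symbols of -p_teacher * log(p_student)."""
    loss = 0.0
    for teacher_frame, student_frame in zip(teacher_posteriors,
                                            student_posteriors):
        for symbol, p_teacher in teacher_frame.items():
            p_student = max(student_frame[symbol], eps)  # avoid log(0)
            loss -= p_teacher * math.log(p_student)
    return loss


# One frame: the teacher is certain of "a", the student is undecided.
teacher = [{"a": 1.0, "-": 0.0}]
student = [{"a": 0.5, "-": 0.5}]
kd_loss = distillation_loss(teacher, student)
```

Under this convention, the loss decreases as the student assigns more probability to the symbols favored by the teacher in each frame.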
16.4.5 Comparative Analysis

Two different frameworks are evaluated on the LRS2 dataset, namely the Nvidia seq2seq framework with the Jasper acoustic model and the CRDNN with the transformer model. Table 16.2 summarizes the evaluation results as well as the comparison with existing systems. The results suggest improvements in accuracy over previously published systems.
16.5 Conclusion

This work presents a training technique for pre-training on unlabeled datasets that does not require manually annotated data. With the exception of systems trained on proprietary data, it outperforms all other available lip reading algorithms and may optionally be tailored to a small proportion of samples. Given a pre-trained deep ASR model and unlabeled data for a different language, the suggested method can be extended to lip read that language, since it can be applied to almost any clip that involves a talking head. Future work may explore scaling up both the dataset and the model size.
Acknowledgements Our sincere thanks to the Department of Science and Technology, Government of India, for funding this project under the Department of Science and Technology Interdisciplinary Cyber-Physical Systems (DST-ICPS) scheme.
References

1. Torfi, A., Shirvani, R.A., Keneshloo, Y., Tavaf, N., Fox, E.A.: Natural language processing advancements by deep learning: a survey (2020). arXiv preprint arXiv:2003.01200
2. Nefian, A.V., Liang, L., Pi, X., Xiaoxiang, L., Mao, C., Murphy, K.: A coupled HMM for audiovisual speech recognition. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II-2013. IEEE (2002)
3. Baevski, A., Mohamed, A.: Effectiveness of self-supervised pre-training for asr. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7694–7698. IEEE (2020)
4. Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Mashari, A.: Audio visual speech recognition (2000)
5. Xiao, F., Lee, Y.J., Grauman, K., Malik, J., Feichtenhofer, C.: Audiovisual slowfast networks for video recognition (2020). arXiv preprint arXiv:2001.08740
6. Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.: Recent advances in the automatic recognition of audiovisual speech. Proc. IEEE 91(9), 1306–1326 (2003)
7. Guo, P., Chang, X., Watanabe, S., Xie, L.: Multi-speaker ASR combining non-autoregressive conformer CTC and conditional speaker chain (2021). arXiv preprint arXiv:2106.08595
8. Huang, W.C., Wu, C.H., Luo, S.B., Chen, K.Y., Wang, H.M., Toda, T.: Speech recognition by simply fine-tuning BERT. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7343–7347. IEEE (2021)
9. Xia, L., Chen, G., Xu, X., Cui, J., Gao, Y.: Audiovisual speech recognition: a review and forecast. Int. J. Adv. Robot. Syst. 17(6), 1729881420976082 (2020)
10. Sheshadri, A.K., Vijjini, A.R., Kharbanda, S.: WER-BERT: automatic WER estimation with BERT in a balanced ordinal classification paradigm (2021). arXiv preprint arXiv:2101.05478
11. Yu, J., Zhang, S.X., Wu, J., Ghorbani, S., Wu, B., Kang, S., Yu, D.: Audio-visual recognition of overlapped speech for the lrs2 dataset. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6984–6988. IEEE (2020)
12. Petridis, S., Stafylakis, T., Ma, P., Tzimiropoulos, G., Pantic, M.: Audio-visual speech recognition with a hybrid CTC/attention architecture. In: 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 513–520. IEEE
13. Makino, T., Liao, H., Assael, Y., Shillingford, B., Garcia, B., Braga, O., Siohan, O.: Recurrent neural network transducer for audio-visual speech recognition. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 905–912. IEEE (2019)
14. Afouras, T., Chung, J.S., Zisserman, A.: ASR is all you need: cross-modal distillation for lip reading. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2143–2147. IEEE (2020)
15. Yu, F.H., Chen, K.Y.: Non-autoregressive transformer-based end-to-end ASR using BERT (2021). arXiv preprint arXiv:2104.04805
Chapter 17
Classification of Autism Spectrum Disorder Based on Brain Image Data Using Deep Neural Networks
Polavarapu Bhagya Lakshmi, V. Dinesh Reddy, Shantanu Ghosh, and Sandeep Singh Sengar
Abstract Autism spectrum disorder (ASD) is a neuro-developmental disorder that affects 1% of children and has a lifelong effect on communication and interaction. Early prediction can address this problem by decreasing the severity. This paper presents a deep learning-based transfer learning approach applied to resting-state fMRI images for predicting the features of autism. We worked with a CNN and different transfer learning models such as Inception-V3, Resnet, Densenet, VGG16, and Mobilenet. We performed extensive experiments and provide a comparative study of the different transfer learning models for the classification of ASD. Results demonstrate that VGG16 achieves the highest classification accuracy, 95.8%, and outperforms the other models evaluated in this paper by an average of 4.96% in terms of accuracy.
P. B. Lakshmi · V. D. Reddy (B)
School of Engineering and Sciences, SRM University, Amaravati, AP, India
S. Ghosh
School of Liberal Arts and Social Sciences, SRM University, Amaravati, AP, India
S. S. Sengar
Department of Computer Science, Cardiff Metropolitan University, Llandaff Campus, Cardiff, UK
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_17
17.1 Introduction
ASD is a long-term developmental disorder that impairs an individual's ability to interact and communicate with family and friends. Autistic disorder can be identified between three months and four years of age. Autism has a lifelong effect on a person's social behavior and communication [1]. There is no known cause of autism and no proper treatment available. Hence, prediction of this illness is helpful for identifying the problem in the early stages. The diagnosis of autism involves many disciplines, such as psychiatry, neurology, cognitive psychology, therapy, and pediatrics. Lifelong support is typically required for most patients with ASD because their social abilities are generally compromised to some degree. The etiology of autism may be genetic, or it may be related to parental drug use when the parents experience psychological issues. As per the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders [2], there are three levels of autism (level 1, level 2, and level 3), based on the severity of the patient's symptoms. Severity is highest in level 3. Individuals diagnosed with a level 3 condition require very substantial support, whereas level 2 individuals require substantial support. In 2021, the CDC reported that about 2.273% of children in the USA were identified with ASD. Of these children, 44% have IQ scores in the average to above average range (i.e., IQ > 85), 25% are in the borderline range (IQ 71–85), and 31% have a comorbid intellectual disability (intelligence quotient [IQ] < 70) [3]. ASD does not have any specific remedy, but it can be managed with early treatment and behavioral therapies. Thus, accurate diagnosis in early childhood is crucial. Data-driven diagnosis of ASD is being implemented in an increasing number of neuroscience research investigations, which is expected to result in more successful treatment outcomes [4]. In ASD prediction, MRI image analysis is a preferred method. MRI has become the method of choice for examining brain morphology. Additionally, because MRI involves no exposure to ionizing radiation, it has become extremely useful for imaging the brains of children and adolescents [1].
Functional magnetic resonance imaging (fMRI) is a method for analyzing functional brain activity. It uses blood oxygenation level-dependent (BOLD) T2* imaging to provide information about brain function over time and has been used to perform parameter-based neuromarker mapping of intrinsic connectivity networks. Transfer learning is a relatively new technique in which existing knowledge is reused and the model is fine-tuned for the target application. From the literature, it is observed that transfer learning strategies are based on relational knowledge, feature representation, parameters, or instances. In this paper, we use the resting-state functional MRI (rs-fMRI) data from the publicly available Autism Brain Imaging Data Exchange (ABIDE) project [5]. We perform a preprocessing step and data augmentation before applying transfer learning models to the data. In this work, we designed various deep learning models to classify the resting-state fMRI data in order to identify individuals with autism. Key highlights of the paper are:
• We identify persons with ASD by analyzing fMRI images, a technique that is less explored in the literature.
• We analyze different deep learning models such as Inception-V3, Resnet, Densenet, VGG16, and Mobilenet.
The remainder of the paper is organized into four sections. Section 17.2 reviews related work, Sect. 17.3 describes the proposed work and methodology, Sect. 17.4 presents the results, and Sect. 17.5 gives the conclusion and future work.
17.2 Related Work
Lai et al. [6] introduced a method to classify autism patients based on retinal images. A Resnet50 transfer learning model is applied to extract pixel-based features, ARIA is applied to extract the other features, and the two are then combined to identify the significant features. The problem with this model is its high-level abstract feature representations. Tang et al. [7] introduced a system that classifies ASD patients from a video-based streaming system using an SVM classifier. Piana et al. [8] designed a system to help autistic people using SVM-based emotion recognition. Rusli et al. [9] developed a wavelet transform-based system to classify ASD based on emotions. In [10, 11], the authors used the Q-CHAT data and applied different machine learning algorithms to classify ASD. Mujeeb et al. [12] propose a transfer learning-based model to classify people with ASD by extracting statistical features from face images. In [13, 14], the authors used speech signals to classify ASD. The authors took video, acoustic, and handwriting data for analysis. The authors proposed an audio-based emotion estimation system integrated with robots to estimate the emotional state of people with ASD, which may improve the robot's interaction with children during therapy. Sharif et al. [15] used the ABIDE dataset to extract features from fMRI images and applied VGG16 to classify the ASD and control groups. Yang et al. [4] harmonized data from many imaging sites using the ComBat method. Their proposed method extracts functional connectivity features from rs-fMRI data. They applied ridge regression and SVC-rbf to the selected features to classify ASD and TD people. In [14], a deep learning RNN model is applied to MRI images to identify the sequence of brain regions. In [16–18], EEG signals are used to devise a potential clinical technique for tracking abnormal brain development in order to analyze the behavior of individuals with ASD.
From the literature, we can see that there are only a few works on resting-state fMRI-based identification of ASD, indicating scope to improve the classification of ASD using rs-fMRI images. Thus, in this work, we performed a comparative study of transfer learning approaches for the classification of ASD using rs-fMRI images.
17.3 Methodology
17.3.1 Convolution Neural Network (CNN)
CNN is a deep learning technique mainly used for image processing. There are different stages of data processing in a CNN, such as convolution, padding, and pooling. Convolution is the process in which a small matrix (the filter, or kernel) slides over the image, and the output changes according to the values present in the filter:
$$M[a, b] = (i * h)[a, b] = \sum_{j} \sum_{k} h[j, k]\, i[a - j, b - k] \tag{17.1}$$
Fig. 17.1 Steps involved in proposed convolution neural networks
where a and b are the indices of the resultant matrix M, h denotes the kernel, and i denotes the input image over which the kernel is applied. Padding enlarges the image with border values so that the output size and the pixels at the image border are maintained through the convolution. In pooling, a single value is selected from each region; the two common variants are max pooling and average pooling. Figure 17.1 shows the steps involved in the proposed CNN model.
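As a minimal sketch of the convolution in Eq. (17.1) and of max pooling, the following pure-Python code may help make the indexing concrete. It is an illustration only, not the implementation used in this work: the function names `conv2d` and `max_pool2d`, the zero-padding choice, and the toy image and kernel are all assumptions.

```python
def conv2d(image, kernel):
    """Discrete 2D convolution as in Eq. (17.1), with zero padding.

    Pixels outside the image are treated as 0, so the output has size
    (H + kh - 1) x (W + kw - 1) (a "full" convolution).
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (W + kw - 1) for _ in range(H + kh - 1)]
    for a in range(H + kh - 1):
        for b in range(W + kw - 1):
            total = 0.0
            for j in range(kh):
                for k in range(kw):
                    r, c = a - j, b - k  # indices i[a - j, b - k]
                    if 0 <= r < H and 0 <= c < W:
                        total += kernel[j][k] * image[r][c]
            out[a][b] = total
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value per region."""
    H, W = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + dr][c + dc]
                for dr in range(size) for dc in range(size))
            for c in range(0, W - size + 1, size)
        ]
        for r in range(0, H - size + 1, size)
    ]


# Toy inputs (hypothetical values, chosen only for illustration).
image = [[1, 2], [3, 4]]
kernel = [[1, 0], [0, -1]]
convolved = conv2d(image, kernel)

grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool2d(grid, size=2)
```

Deep learning libraries usually implement cross-correlation (no kernel flip) and call it convolution; the sketch above follows the flipped form written in Eq. (17.1).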
17.3.2 Transfer Learning
Transfer learning is a machine learning technique in which a model created for one task is used as the foundation for a model for another task. This section explains transfer learning in terms of domain and task, which are defined below.
Definition 17.1 (Domain): A domain D is made up of two parts: an attribute space A and a marginal distribution P(A). That is,
$$D = \{A, P(A)\} \tag{17.2}$$
where A denotes the feature space, which is defined as
$$A = \{a \mid a_i \in X,\ i = 1, \ldots, n\} \tag{17.3}$$
Definition 17.2 (Task): A task T consists of a label space Y and a decision function f, and is expressed as T = {Y, f(·)}. The decision function f is implicit and is expected to be learned from the sample data. Some machine learning methods output the predicted conditional distributions of instances:
$$f(x_j) = \{P(y_k \mid x_j) \mid y_k \in Y,\ k = 1, \ldots, |Y|\} \tag{17.4}$$
In practice, a domain is usually observed through a number of instances, with or without label information. For instance, a source domain $D_S$ corresponding to a source task $T_S$ is commonly observed through instance-label pairs:
$$D_S = \{(x, y) \mid x_i \in X_S,\ y_i \in Y_S,\ i = 1, \ldots, n_S\} \tag{17.5}$$
Most instances observed in the target domain are either unlabeled or have only a small number of labels.
Definition 17.3 (Transfer Learning): Given a learning task $L_T$ based on a target domain $D_T$, we can obtain help from a learning task $L_S$ based on a source domain $D_S$. The focus of transfer learning is to discover and transfer latent knowledge from $D_S$ and $T_S$ in order to enhance the prediction function of the target task [19]. Figure 17.2 details the CNN and transfer learning model architectures used in this work.
17.3.2.1 Inception
When transfer learning models are developed, adding more convolution layers to the model can result in overfitting. To overcome this problem, Inception introduced multiple filters of different sizes at the same level, so the model becomes wider rather than deeper. Inception-V3 consists of 42 layers, with an architecture built from smaller convolutions, asymmetric convolutions, factorized convolutions, auxiliary classifiers, and grid size reduction [20].
17.3.2.2 Resnet
In deep learning architectures, training becomes more challenging as the number of layers grows, and performance decreases as well. Resnet was introduced to resolve the issue of vanishing gradients by adding residual (skip) connections between layers. Resnet has been compared against VGG networks such as VGG-16 and VGG-19; its convolutional network consists of 3 × 3 filters, batch normalization, max pooling, and ReLU as the activation function [21].
17.3.2.3 Densenet
Densenet is a convolutional architecture, used here as a transfer learning model, in which each layer is connected to every other layer in a feed-forward fashion. A network with L layers therefore has L(L+1)/2 direct connections. After the basic convolution block, Densenet has two main components, i.e., transition layers and dense blocks. Each dense layer consists of batch normalization, a ReLU activation function, and a 3 × 3 convolution layer [22].
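The dense connectivity pattern can be checked with a short count. The sketch below assumes the convention of the Densenet paper [22], in which the block input is counted as a source, so that layer l receives l incoming connections and L layers have L(L+1)/2 direct connections in total. The function names and the example values (an initial width of 64 channels and a growth rate of 32) are illustrative assumptions.

```python
def dense_connections(num_layers):
    """Count direct connections in a dense block with num_layers layers.

    Layer l (1-based) receives the block input plus the outputs of the
    l - 1 layers before it, i.e. l incoming connections.
    """
    return sum(range(1, num_layers + 1))


def input_channels(layer, initial_channels, growth_rate):
    """Channels seen by a layer when earlier outputs are concatenated.

    Each earlier layer contributes growth_rate feature maps on top of
    the initial_channels of the block input.
    """
    return initial_channels + growth_rate * (layer - 1)


connections_in_5_layers = dense_connections(5)
channels_at_layer_3 = input_channels(3, initial_channels=64, growth_rate=32)
```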
17.3.2.4 VGG16
VGG16 is a transfer learning model that takes a 224 × 224 RGB image as input. It uses 3 × 3 or 1 × 1 filters. VGG16 consists of 16 weight layers: 13 convolution layers and 3 fully connected layers. In this VGG16 model, not every convolution layer is followed by a pooling layer; max pooling is applied at only five points in the network [23].
Fig. 17.2 Architectures of deep learning models: (a) Convolution Neural Network; (b) Transfer Learning. Block labels in the figure: Input 224 × 224; Transfer Learning Models; Fully Connected (Units = 1024, ReLU, Dropout = 0.5); Fully Connected (Units = 512, ReLU, Dropout = 0.5); Fully Connected (Units = 256, ReLU, Dropout = 0.2); Fully Connected (Units = 128, ReLU, Dropout = 0.5); Fully Connected (Units = 64, ReLU, Dropout = 0.2); Flatten; Fully Connected (Softmax)
17.3.2.5 MobileNet
The Mobilenet architecture is made up of depth-wise and point-wise convolutions. It takes an input image of size 224 × 224 × 3. It has 28 layers when the depth-wise and point-wise convolutions are counted as separate layers, and 4.2 million parameters, which can be further adjusted through hyper-parameter tuning [13]. In the proposed methodology, CNN and transfer learning models are used. The CNN architecture includes convolution, pooling, and dense layers, with activation functions such as ReLU and softmax, whereas the transfer learning architecture adds fully connected layers, dropout, and activation functions on top of the pretrained base, as shown in Fig. 17.2.
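To give a rough sense of the size of the fully connected head in Fig. 17.2, the sketch below counts its trainable parameters. Several values are assumptions rather than facts from this work: the layer order is read off the figure labels, the flattened input size of 7 × 7 × 512 = 25088 corresponds to the VGG16 convolutional base at a 224 × 224 input, and the softmax layer is assumed to have two outputs (ASD and control).

```python
# (units, dropout rate) per fully connected layer, as labeled in Fig. 17.2.
# The ordering is an assumption; dropout layers add no parameters.
HEAD = [(1024, 0.5), (512, 0.5), (256, 0.2), (128, 0.5), (64, 0.2)]


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_param_counts(n_features, head=HEAD, n_classes=2):
    """Parameter count of each dense layer, ending with the softmax layer."""
    counts = []
    n_in = n_features
    for units, _dropout in head:
        counts.append(dense_params(n_in, units))
        n_in = units
    counts.append(dense_params(n_in, n_classes))
    return counts


# Assumed flattened feature size of a VGG16 base for a 224 x 224 input.
counts = head_param_counts(7 * 7 * 512)
total_head_params = sum(counts)
```

Under these assumptions the first dense layer dominates the count, which is one reason dropout is commonly placed after the widest layers.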
17.4 Results
In our study, we trained (a) a CNN and (b) five transfer learning models (Densenet, Inception-V3, Resnet, VGG16, Mobilenet), and used a data generator to feed data to each model. The transfer learning models were pre-trained on the ImageNet data set, which covers 1000 classes. The advantage of such transfer learning models is that they are trained on diverse types of data. We applied the CNN and the five transfer learning models to the rs-fMRI data set. In this experiment, the rs-fMRI data set from the ABIDE-1 repository was used as input. The initial weights are taken from the pretrained models, and further layers are then added to them. We took the base layers from the transfer learning models and added layers such as convolution layers, dropout to reduce overfitting, and max pooling to select the maximum features. ReLU is used as the activation function in each layer, and softmax is used as the activation function in the final fully connected layer. The rs-fMRI data set is divided into training, testing, and validation sets in the ratio 70:20:10, on which the CNN and transfer learning models are applied. All models are trained and tested on the same data set. Every model is trained for 100 epochs; the accuracies and losses were recorded, and the maximum accuracy was taken as the result. Among all models, VGG16 was observed to have the highest accuracy. The accuracy and loss curves of CNN, Inception-V3, Resnet, Densenet, VGG16, and Mobilenet are presented in Figs. 17.3 and 17.4, respectively. Several performance measures are used to analyze the results of the classifiers: F1 score, precision, recall, and accuracy plots. The outcome reflects how effectively each model was trained, as shown in Table 17.1.
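The 70:20:10 division into training, testing, and validation sets can be sketched as follows. This is a minimal illustration, not the splitting procedure actually used here: the function name `split_dataset`, the fixed random seed, and the use of integer item identifiers in place of image files are assumptions.

```python
import random


def split_dataset(items, ratios=(70, 20, 10), seed=0):
    """Shuffle items and split them into train, test, and validation sets.

    ratios are integer percentages for (train, test, validation); the
    validation set receives whatever remains after the first two splits.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_train = n * ratios[0] // 100
    n_test = n * ratios[1] // 100
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


# Hypothetical example: 100 image identifiers.
train_set, test_set, val_set = split_dataset(range(100))
```

When several images come from the same subject, splitting by subject rather than by image is a common way to keep the test set independent of the training set.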
Fig. 17.3 Accuracy plots of proposed models: (a) CNN, (b) Inception, (c) Resnet, (d) Densenet, (e) VGG16, (f) Mobilenet
Fig. 17.4 Loss plots of proposed models: (a) CNN, (b) Inception, (c) Resnet, (d) Densenet, (e) VGG16, (f) Mobilenet
Table 17.1 Performance metrics of the proposed models
Model         F1 score   Recall   Precision
CNN           0.87       0.86     0.87
VGG16         0.98       0.98     0.98
Mobilenet     0.93       0.92     0.94
InceptionV3   0.84       0.84     0.88
Densenet      0.89       0.89     0.9
Resnet        0.82       0.82     0.86
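The measures reported in Table 17.1 can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the experiments in this chapter.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 score, and accuracy from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the positive (e.g. ASD) class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a test set of 100 scans.
metrics = classification_metrics(tp=45, fp=5, fn=5, tn=45)
```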
17.5 Conclusion and Future Directions
In recent years, applications of deep learning techniques in health informatics have grown significantly, pointing to the apparent usefulness of brain imaging-based classification and diagnosis of mental health disorders, including ASD. AI models have shown great promise in generating more objective and accurate diagnoses, as well as in the early prediction and reliable identification of autism. This article classifies ASD based on rs-fMRI images using deep learning architectures and examines the effectiveness of a CNN and five transfer learning approaches. The highest accuracy (95.8%) was achieved by the VGG16 model, which outperforms the other models by an average of 4.96% in terms of accuracy. In future, apart from the sample size, we must focus not only on accuracy but also on reproducibility and generality, opening up impartial studies on mental health that will not only aid clinical decision-making but also expand the opportunities provided by AI-based tools by encouraging clinicians to incorporate AI-based computational medicine approaches.
References
1. Ali, M.T., Elnakieb, Y.A., Shalaby, A., Mahmoud, A., Switala, A., Ghazal, M., Khelifi, A., Fraiwan, L., Barnes, G., El-Baz, A.: Autism classification using smri: a recursive features selection based on sampling from multi-level high dimensional spaces. In: Proceedings of the IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 267–270 (2021)
2. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders, 5th edn. https://doi.org/10.1176/appi.books.9780890425596 (2013)
3. Autism Speaks: Autism and health: A special report. https://www.autismspeaks.org/autismstatistics-asd (2021)
4. Yang, X., Islam, M.S., Khaled, A.A.: Functional connectivity magnetic resonance imaging classification of autism spectrum disorder using the multisite abide dataset. In: Proceedings of the 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 1–4 (2019)
5. Di Martino, A., Mostofsky, S.: Autism brain imaging data exchange. fcon_1000.projects.nitrc.org/indi/abide/abide_I.html (2016)
6. Lai, M., Lee, J., Chiu, S., Charm, J., So, W.Y., Yuen, F.P., Kwok, C., Tsoi, J., Lin, Y., Zee, B.: A machine learning approach for retinal images analysis as an objective screening method for children with autism spectrum disorder. EClinicalMedicine 28. Article 100588 (2020)
7. Tang, C., Zheng, W., Zong, Y., Qiu, N., Lu, C., Zhang, X., Ke, X., Guan, C.: Automatic identification of high-risk autism spectrum disorder: a feasibility study using video and audio data under the still-face paradigm. IEEE Trans. Neural Syst. Rehabil. Eng. 28(11), 2401–2410 (2020)
8. Piana, S., Malagoli, C., Usai, M.C., Camurri, A.: Effects of computerized emotional training on children with high functioning autism. IEEE Trans. Affect. Comput. 12(4), 1045–1054 (2019)
9. Rusli, N., Sidek, S.N., Yusof, H.M., Ishak, N.I., Khalid, M., Dzulkarnain, A.A.A.: Implementation of wavelet analysis on thermal images for affective states recognition of children with autism spectrum disorder. IEEE Access 8, 120818–120834 (2020)
10. Islam, S., Akter, T., Zakir, S., Sabreen, S., Hossain, M.I.: Autism spectrum disorder detection in toddlers for early diagnosis using machine learning. In: Proceedings of the IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), pp. 1–6 (2020)
11. Akter, T., Khan, M.I., Ali, M.H., Satu, M.S., Uddin, M.J., Moni, M.A.: Improved machine learning based classification model for early autism detection. In: Proceedings of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 742–747 (2021)
12. Mujeeb Rahman, K.K., Subashini, M.M.: Identification of autism in children using static facial features and deep neural networks. Brain Sci. MDPI 12(1). Article 94 (2022)
13. Talkar, T., Williamson, J.R., Hannon, D.J., Rao, H.M., Yuditskaya, S., Claypool, K.T., Sturim, D., Nowinski, L., Saro, H., Stamm, C., Mody, M.: Assessment of speech and fine motor coordination in children with autism spectrum disorder. IEEE Access 8, 127535–127545 (2020)
14. Ke, F., Choi, S., Kang, Y.H., Cheon, K.A., Lee, S.W.: Exploring the structural and strategic bases of autism spectrum disorders with deep learning. IEEE Access 8, 153341–153352 (2020)
15. Sharif, H., Khan, R.A.: A novel machine learning based framework for detection of autism spectrum disorder (ASD). Appl. Artif. Intell. 36(1). Article 2004655 (2022)
16. Bosl, W.J., Tager-Flusberg, H., Nelson, C.A.: EEG analytics for early detection of autism spectrum disorder: a data-driven approach. Sci. Rep. 8(1), 1–20 (2018)
17. Lavanga, M., De Ridder, J., Kotulska, K., Moavero, R., Curatolo, P., Weschke, B., Riney, K., Feucht, M., Krsek, P., Nabbout, R., Jansen, A.C.: Results of quantitative EEG analysis are associated with autism spectrum disorder and development abnormalities in infants with tuberous sclerosis complex. Biomed. Sig. Process. Control 68, 102658 (2021)
18. Oh, S.L., Jahmunah, V., Arunkumar, N., Abdulhay, E.W., Gururajan, R., Adib, N., Ciaccio, E.J., Cheong, K.H., Acharya, U.R.: A novel automated autism spectrum disorder detection system. Complex Intell. Syst. 7(5), 2399–2413 (2021)
19. Weiss, K., Khoshgoftaar, T.M., Wang, D.: A survey of transfer learning. J. Big Data 3(1), 1–40 (2016)
20. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
21. Mascarenhas, S., Agarwal, M.: A comparison between vgg16, vgg19 and resnet50 architecture frameworks for image classification. In: Proceedings of the International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON), vol. 1, pp. 96–99 (2021)
22. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Chapter 18
Transformer-Based Attention Model for Email Spam Classification
V. Sri Vinitha, D. Karthika Renuka, and L. Ashok Kumar
Abstract Over the past few decades, communication has become easier due to the rapid development of technology. Although several modes of communication exist, in this era of the Internet, electronic mail (email) has become very popular because it is effective, inexpensive, and easy to use for personal as well as business communication and for sharing important information in the form of text, images, documents, etc. This popularity exposes email to numerous attacks, including spamming. At present, spam email is a major source of concern for email users: unsolicited messages, used for business purposes, are sent in bulk to mailing lists, individuals, or newsgroups. Spam emails are used for advertising products, collecting personal information, and sending destructive content, either as executable files that attack user systems or as links to malicious websites that steal confidential data such as bank account details and passwords. This leads to reduced efficiency, security threats, consumption of server storage space, and unnecessary consumption of network bandwidth. Currently, 47.3% of all emails are spam, and hence it has become necessary to build competent spam filters to categorize and block spam email. Natural language processing is used to enhance the accuracy of the model. The proposed framework uses word embeddings to categorize spam emails: it fine-tunes the pretrained Bidirectional Encoder Representations from Transformers (BERT) model to classify legitimate and spam emails. The BERT model uses attention layers to incorporate the context of the text. The outcomes are compared with minimum Redundancy Maximum Relevance (mRMR), Artificial Neural Network, Recurrent Neural Network, and Long Short-Term Memory (LSTM) on the Lingspam, Enron, and Spamassassin datasets. The proposed method achieved the highest accuracy of 97% and an F1-score of 98%.
V. S. Vinitha
Department of IT, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India
D. K. Renuka (B)
Department of IT, PSG College of Technology, Coimbatore, Tamil Nadu, India
L. A. Kumar
Department of EEE, PSG College of Technology, Coimbatore, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_18
18.1 Introduction
At present, with the rapid development of the Internet, email has become an effective tool for exchanging information in both personal and professional life because it is easy to use, efficient, fast, inexpensive, and accessible from everywhere [1]. Users can send any form of data, such as text, images, audio, video, links, and documents, over email using the Internet. Because of its widespread use, many people have started advertising products by email at low cost. Although email communication has many advantages, it has weaknesses such as spam emails, viruses, and phishing emails. In recent years, with the growing popularity of email, unwelcome email, termed spam, may arrive from anywhere [2]. Spamming is the sending of undesirable information in huge amounts to numerous email accounts. Email is used not only for communication but also for advertising products, handling jobs, and resolving client queries, and spammers exploit it to spread malware, viruses, and phishing links [3]. Spam emails not only degrade performance and cause transmission delays; they also cause financial problems for users by collecting private information through phishing emails. Current trends indicate that 94% of Internet transfers appear to be spam. Spam classification is therefore needed to separate legitimate emails from spam emails [4]. Although several spam filter mechanisms exist, identifying spam emails has turned out to be a challenging task because of the new attack mechanisms developed by spammers. Therefore, it is necessary to build a better spam filter that classifies spam emails accurately, with the intention of reducing the time spent reading and deleting spam mails, the waste of network resources and bandwidth, and the theft of confidential information.
Text classification [5] is a special case of the classification problem in which the input email is in the form of text and the goal is to distinguish the spam emails [6]. Here, we take pretrained BERT, one of the most popular transformer models, and fine-tune it for spam detection. Traditional models are able to learn only from the word that immediately precedes or follows a given word, whereas the BERT model learns from the whole set of words in a sentence of the email. The main contributions of this research work are summarized below:
• In this research work, a pretrained BERT model is proposed for classifying spam and ham emails.
• The Ling spam, Enron, and Spamassassin datasets have been taken for email spam classification.
• The datasets are then preprocessed to reduce irrelevant data. First, the emails are tokenized into individual keywords. Then stop words are removed from the tokens. Next, stemming is applied to the tokens obtained from the previous step to reduce each word to its root word.
• After irrelevant information is eliminated, the preprocessed data are passed to term frequency-inverse document frequency (TF-IDF)-based feature extraction to extract the effective features.
• These extracted features are then given to the pretrained BERT model for classifying spam and ham emails. The hyperparameters (learning rate, batch size, epochs, and dropout rate) are tuned to obtain better accuracy.
• The performance is then analyzed using different evaluation metrics such as accuracy, precision, and recall. The proposed BERT approach is also compared with existing techniques.
The rest of the paper is structured as follows: the literature survey is presented first, followed by the proposed methodology and algorithm in detail, the empirical evaluation of the proposed model, the performance analysis, and finally a summary of the whole paper.
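The preprocessing and TF-IDF steps listed in the contributions above can be sketched in a few lines. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, the TF-IDF weighting (relative term frequency times the log of inverse document frequency) is one common variant that may differ from the one used in this work, and the three example emails are made up.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf(documents):
    """Return one {term: weight} dictionary per document."""
    docs = [preprocess(d) for d in documents]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Made-up example emails.
emails = [
    "Win a free prize now",
    "Meeting agenda for the project",
    "Free prize waiting, claim now",
]
vectors = tfidf(emails)
```

In this weighting, terms that occur in only one email (such as "win") receive a higher weight than terms shared by several emails (such as "free").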
18.2 Related Works
Detecting spam emails has become a very challenging task for email service providers. Spam causes traffic on the Internet, wastes network bandwidth, and contains malicious links that mostly lead to phishing webpages used for advertising, theft of personal information, or even spreading malware. In order to diminish the error caused by including all spam attributes when identifying spam emails, Pashiri et al. [7] propose a feature selection-based method using the Sine Cosine Algorithm (SCA). On the Spambase dataset, optimal features are selected using SCA for training an Artificial Neural Network (ANN). The Sine Cosine Algorithm showed higher accuracy, precision, and sensitivity compared to other machine learning algorithms.
Email has become a popular medium for communication in the Internet era. Spam is an undesirable message sent in bulk to email addresses in order to steal personal information or distribute malware. Sumathi et al. [8] introduced the Random Forests algorithm and Deep Neural Networks (DNN) for detecting spam emails. The Random Forests algorithm uses the Gini index measure to select the features for constructing the decision trees. Classification was done using three algorithms, namely Deep Neural Network, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). Deep Neural Networks provide better accuracy than SVM and KNN.
Email plays a vital role in communication in the Internet era. The increase in spam emails causes huge losses every year; to reduce these losses, spam mails should be properly classified. Initially, word filters were used for classifying spam emails; now, new methodologies are available for classifying entire sentences. Basyar et al. [9] used the LSTM and Gated Recurrent Unit (GRU) text classification methods for classifying spam emails. Experiments were conducted on the ENRON dataset with and without dropout. The analysis shows that, without dropout, LSTM and GRU obtain the same accuracy, which is superior to that of the base model XGBoost. With dropout, LSTM provides better accuracy than GRU and XGBoost.
With the popularity of the Internet, many people now use email for their communication. Internet users are exposed to several threats through techniques such as phishing, spamming, and spreading malware. Spam mails currently account for 60% of all mails, so spam filters are needed for safe and secure email communication. Initially, spam mails were in the form of text, and several methodologies exist that classify such mails with good performance. Nowadays, spammers use image spam to escape text-based spam filters. Sriram et al. [10] designed two deep convolutional neural networks, evaluated on three different datasets, as well as transfer learning with pretrained CNN models such as VGG19 and Xception. Finally, the features extracted by the Convolutional Neural Network (CNN) models are used to train various machine learning classifiers. Some of the proposed models provide better accuracy with a zero false positive rate.
Jitendra Kumar et al. [11] recommend an ANN with an adaptive learning rate algorithm for classifying spam emails. In existing heuristic approaches, experiments are run with different learning rates to identify the one that gives the best classification accuracy, which consumes more time. The proposed system assigns an initial learning rate and updates it using the back propagation technique.
In each recursive call, if there is a decrease in the average total error, then it uses the revised learning rate or else it uses the previous learning rate itself. This process continues until it provides high classification accuracy with optimal weight parameters. It has been observed that the rate of convergence is faster and performs better for Adaptive Neural Network, and also avoid the problem of over fitting. Email becomes the essential part in everyone’s life both professional along with personal communication. Some of the emails received are irrelevant because of spammers, leading to wastage of network bandwidth, increased email traffic, time consuming, and security threats. As a result, spam emails must be filtered. In the existing approaches, they mainly focus on the content present in the body of the email. Kulkarni et al. [12] uses email header for filtering the spam mails. Experiments were conducted on three datasets with five feature selection techniques and five classification algorithms. For obtaining better accuracy, minimum 5 and maximum 14 features are required. Minimum and maximum attributes are created by correlation and relief-based feature selection techniques. When compared to other algorithms, Random Forests technique offers improved accuracy.
18 Transformer-Based Attention Model for Email Spam Classification
18.3 Proposed Methodology
The process of the proposed email spam classification using the BERT technique is shown in Fig. 18.1. The Ling spam, Enron, and Spamassassin datasets are used for email spam classification. After preprocessing, a pretrained BERT model is fine-tuned on the data. Finally, the results of the proposed framework are compared with mRMR [13] and LSTM [14, 15].
18.3.1 Bidirectional Encoder Representations from Transformers
The proposed system classifies spam and ham emails from the three email datasets, as shown in Fig. 18.2. A pretrained BERT model is used for this task because BERT is a large neural network with roughly 100 to 300 million parameters, so training it from scratch on the email datasets would result in overfitting. To prevent overfitting, the BERT model [16–18] is pretrained on a huge dataset and then trained on the email dataset, with a softmax layer added on top of BERT to classify spam and ham emails. The model is effective because it uses a self-attention mechanism. The benefit of this transformer-based model is that it processes the entire sentence in both directions, and it does not require labeled data for pretraining; instead, it requires a massive amount of unlabeled text. Pretraining can therefore compensate for limited labeled data and provide better results. Depending on the total number of parameters, computational power, and cost, BERT comes in two variants: BERT-Base and BERT-Large. BERT-Large needs more memory than BERT-Base, and its accuracy was affected irrespective of the learning rate. Hence, considering the computational power available, the BERT-Base-uncased model is used in the proposed methodology to classify the spam emails. It has 12 layers with 768 hidden units, 12 self-attention heads, and 110 million parameters.

Fig. 18.1 Proposed framework for email spam classification (blocks shown: email dataset, preprocessing, encode, compiling, training, dropout, dense, evaluation, prediction, ham email, spam email)

Fig. 18.2 BERT architecture for email spam classification (from input to output: email dataset (Lingspam / Enron / Spamassassin), preprocessing, tokenization into [CLS], tok1 … tokN, [SEP], BERT embeddings E[CLS], E1 … EN, E[SEP] and outputs C, T1 … TN, T[SEP], softmax dense output layer, classification as ham or spam)

For this classification task, the model is trained using the [CLS] token, and before fine-tuning, all text in the email dataset is converted to lower case by the BERT tokenizer provided by Google. A softmax function is added on top of the BERT model to predict the likelihood of the labels in the email dataset. The BERT framework has two phases: pre-training and fine-tuning. To avoid overfitting, the messages in the email dataset are divided into 70% for training and 30% for testing. The training set is used to learn patterns in the messages and reduce the error rate, whereas the testing set, which is not seen during training, is used to assess how well the email classification performs. For classification, each message is aggregated into the final hidden state, which is the output of the transformer model, denoted as a vector $S \in \mathbb{R}^{H}$, where H is the hidden size.
For email spam classification, a new parameter matrix $W \in \mathbb{R}^{K \times H}$ is introduced at fine-tuning time, where K is the number of classes, i.e., spam and ham, and $W_k$ denotes the k-th row of W. The likelihood of these K labels is calculated as shown in (18.1):

$$P = \frac{e^{S \cdot W^{T}}}{\sum_{k=1}^{K} e^{S \cdot W_{k}^{T}}} \tag{18.1}$$
where $P \in \mathbb{R}^{K}$ is the vector of label likelihoods. To improve the prediction of the class label (spam or ham), the pretrained parameters of the BERT-Base-uncased model and the parameters of the classification layer W are fine-tuned together. The Adam optimizer is chosen to train the parameters because it computes an individual learning rate for each parameter, combining the properties of Root Mean Square Propagation and traditional stochastic gradient descent with momentum. With the Adam optimizer, the moment estimates are updated as shown in (18.2) and (18.3):

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{18.2}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^{2} \tag{18.3}$$

where $m_t$ and $v_t$ are the estimates of the mean and the variance of the gradients $g_t$ at the t-th time step, respectively, and $\beta_1$ and $\beta_2$ are the decay rates. The moments $m_t$ and $v_t$ are then corrected for bias as shown in (18.4) and (18.5):

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}} \tag{18.4}$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \tag{18.5}$$

where $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected $m_t$ and $v_t$, respectively. These are then used to update the parameter W as shown in (18.6):

$$W_{t+1} = W_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \tag{18.6}$$
where η is the learning rate and ε is a smoothing term. For classifying spam emails, the best performance was obtained with the BERT-Base-uncased model using the Adam optimizer, a softmax activation function, a batch size of 32, a learning rate of 1e−5, and 200 epochs. The dropout value is set to 0.2 to avoid the problem of overfitting.
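The update rules in (18.2) to (18.6) can be illustrated with a minimal pure-Python sketch. It is not the authors' training code: the function name `adam_step` and the toy objective are hypothetical, and the decay rates β1 = 0.9 and β2 = 0.999 and the smoothing term 1e−8 are assumed defaults that the chapter does not state.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a flat list of parameters.

    w, g, m, v are equal-length lists (parameters, gradients, first and
    second moment estimates); t is the 1-based time step.
    beta1, beta2 and eps are assumed defaults, not values from the chapter.
    """
    new_w, new_m, new_v = [], [], []
    for w_i, g_i, m_i, v_i in zip(w, g, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g_i        # Eq. (18.2)
        v_i = beta2 * v_i + (1 - beta2) * g_i ** 2   # Eq. (18.3)
        m_hat = m_i / (1 - beta1 ** t)               # Eq. (18.4)
        v_hat = v_i / (1 - beta2 ** t)               # Eq. (18.5)
        new_w.append(w_i - lr * m_hat / (math.sqrt(v_hat) + eps))  # Eq. (18.6)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v


if __name__ == "__main__":
    # Toy objective f(w) = w^2 with gradient 2w (hypothetical, for illustration).
    w, m, v = [1.0], [0.0], [0.0]
    for t in range(1, 201):
        g = [2 * w_i for w_i in w]
        w, m, v = adam_step(w, g, m, v, t, lr=0.05)
    print(w)
```

Because of the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, so a small learning rate such as 1e−5 produces correspondingly small initial updates.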
18.3.2 Hyper Parameters
The model is fine-tuned using the Adam optimizer, a learning rate of 1e−5, and a linear learning rate scheduler. A batch size of 32 is used, and the model is fine-tuned on each task for 200 epochs. The proposed model, which has two fully connected linear layers with batch normalization layers, dropout layers, and activation functions, was obtained by hyper parameter tuning; the tuned values are shown in Table 18.1. The output for the [CLS] token is used as the input to the proposed model to improve the detection of spam emails. The input is a vector of length 768, and the first linear layer has 512 neurons: it takes a 768-dimensional input vector and produces a 512-dimensional output, giving a weight matrix of shape (768, 512). A dropout layer with a factor of 0.2 is added after the linear layer; it reduces overfitting by ignoring 20% of the neuron outputs of the linear layer. A batch normalization layer is used to speed up training and reduce generalization error. The batch-normalized outputs are passed through the ReLU activation function, and a second dropout layer with a factor of 0.2 is added after the batch normalization. Adding dropout layers before and after the batch normalization layer helps avoid a discrepancy between the false positive and true negative values of the trained model. The dropout layers improved the F1-score, precision, and recall. The measures used to assess the performance of the model are accuracy and F1-score. Accuracy measures the fraction of input emails that the model categorizes correctly. The F1-score, the harmonic mean of precision and recall, is more informative when the class distribution of the data samples is imbalanced. Finally, a linear layer with the shape (512, 2) and log softmax activation is applied to determine whether the input is ham or spam.
The final result indicates whether the input sample is considered spam or ham based on a value of '0' or '1'. After hyper parameter tuning, the proposed model performed better than other model implementations; the layers and numbers of neurons were chosen on that basis. The Adam optimizer with a learning rate of 1e−5 is used to update the weights and help the model converge more quickly. The loss function used for this model is the cross-entropy loss.

Table 18.1 The optimal values of hyper parameters for BERT

| Hyper parameter | Optimal value |
|---|---|
| Learning rate | 0.00001 |
| Batch size | 32 |
| Epochs | 200 |
| Dropout rate | 0.2 |
| Optimization algorithm | Adam optimizer |
| Pretrained model | BERT-Base-uncased |
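As an illustration of the label likelihood in (18.1) and the cross-entropy loss used for training, the following is a minimal pure-Python sketch. The function names, the toy vectors, and the convention that index 0 means spam and index 1 means ham are assumptions made for the example; the actual model uses H = 768 and learned weights.

```python
import math


def label_likelihoods(s, W):
    """Softmax over the K class scores S . W_k^T, as in Eq. (18.1).

    s is the pooled hidden vector (length H); W is a K x H list of rows.
    """
    scores = [sum(s_i * w_i for s_i, w_i in zip(s, row)) for row in W]
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - top) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_label):
    """Negative log-likelihood of the true class for one email."""
    return -math.log(probs[true_label])


if __name__ == "__main__":
    s = [0.5, -1.0, 2.0]              # toy hidden vector, H = 3 (hypothetical)
    W = [[0.1, 0.2, -0.3],            # row 0: spam (assumed label order)
         [-0.2, 0.1, 0.4]]            # row 1: ham
    p = label_likelihoods(s, W)
    print(p, cross_entropy(p, true_label=1))
```

The loss is small when the likelihood assigned to the true label is close to 1 and grows without bound as that likelihood approaches 0, which is what drives the joint fine-tuning of W and the BERT parameters.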
18.4 Evaluation Metrics
The performance of the proposed email spam classification is analyzed using several measures [19, 20], given below. Ham is treated as the positive class.

Accuracy [21] is the percentage of all emails, spam and ham, that are classified correctly, as shown in (18.7).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{18.7}$$

Recall is the ratio of correctly classified ham emails to all actual ham emails, i.e., the correctly classified ham emails plus the ham emails falsely classified as spam, as shown in (18.8).

$$\text{Recall} = \frac{TP}{TP + FN} \tag{18.8}$$

Precision is the ratio of correctly classified ham emails to all emails classified as ham, i.e., the correctly classified ham emails plus the spam emails falsely classified as ham, as shown in (18.9).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{18.9}$$

where
• True Positive (TP) is the number of ham emails correctly classified as ham by the BERT model;
• True Negative (TN) is the number of spam emails correctly classified as spam by the BERT model;
• False Positive (FP) is the number of spam emails incorrectly classified as ham by the BERT model;
• False Negative (FN) is the number of ham emails incorrectly classified as spam by the BERT model.

The confusion matrix is built from the true and false positives and the true and false negatives. It is shown as a matrix in which the columns represent the emails in the actual class and the rows represent the emails in the predicted class, or vice versa, as in Fig. 18.3.
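The measures in (18.7) to (18.9), together with the F1-score reported in the results, can be computed from the four confusion-matrix counts as in the sketch below. The function name and the example counts are hypothetical and are not taken from the experiments; the F1-score is taken as the usual harmonic mean of precision and recall.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, recall, precision and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0      # Eq. (18.7)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # Eq. (18.8)
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # Eq. (18.9)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classification_metrics(tp=90, tn=80, fp=20, fn=10))
```

Swapping which class is treated as positive swaps the roles of TP and TN and of FP and FN, which is how separate ham and spam values of precision and recall are obtained from the same matrix.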
Fig. 18.3 Confusion matrix for ling spam dataset
Table 18.2 Summary of the dataset

| Corpus | Total email | Spam email | Ham email |
|---|---|---|---|
| Ling spam | 2893 | 481 | 2412 |
| Enron | 32,638 | 16,544 | 16,094 |
| Spamassassin | 6047 | 1897 | 4150 |
18.5 Experimental Results
18.5.1 Dataset
The Ling spam dataset consists of 2893 samples, with 481 spam messages and 2412 ham messages. It was derived from the Linguist List, which focuses mainly on job advertisements, software discussion, and research opportunities. The Enron dataset consists of 32,638 emails, with 16,544 spam emails and 16,094 ham emails. It is one of the standard benchmark datasets and includes a wide range of samples. The Spamassassin dataset consists of 6047 emails, with 1897 spam emails and 4150 ham emails. It is also considered one of the standard benchmark datasets. This dataset offers one classification level for spam messages and two classification levels for ham messages, namely easy and hard ham. However, the unified approach combines these two categories of ham messages into a single category. The summary of the datasets is shown in Table 18.2.
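Using the counts in Table 18.2, the class balance of each corpus and the 70/30 train/test division described in Sect. 18.3.1 can be sketched as follows. The helper names are hypothetical, and splitting class by class (stratification) with a fixed seed is an assumption made for reproducibility; the chapter only states the 70/30 ratio.

```python
import random

# Spam and ham counts from Table 18.2.
CORPORA = {
    "Ling spam": {"spam": 481, "ham": 2412},
    "Enron": {"spam": 16544, "ham": 16094},
    "Spamassassin": {"spam": 1897, "ham": 4150},
}


def spam_fraction(counts):
    """Fraction of a corpus that is spam."""
    return counts["spam"] / (counts["spam"] + counts["ham"])


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test, class by class (assumed)."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    for name, counts in CORPORA.items():
        print(name, round(spam_fraction(counts), 3))
    ling = CORPORA["Ling spam"]
    labels = ["spam"] * ling["spam"] + ["ham"] * ling["ham"]
    train, test = stratified_split(labels)
    print(len(train), len(test))
```

The counts show that Ling spam is imbalanced (about 17% spam) while Enron is close to balanced (about 51% spam), which is one reason to look at per-class precision and recall alongside accuracy.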
18.5.2 Results and Discussion
The three datasets, Ling spam, Enron, and Spamassassin, were used. A preprocessing technique is applied to remove unwanted data, and TF-IDF is applied to extract features from the datasets. Finally, classification is performed using the proposed methodology, with the hyper parameters fine-tuned to achieve better results. Precision, recall, and F-score are computed for the Ling spam, Enron, and Spamassassin datasets using the BERT technique. The training and validation loss of the proposed model is recorded for different numbers of epochs. Figure 18.4 displays the training and validation loss for the Ling spam dataset and shows how both change as the number of epochs increases. Figures 18.5, 18.6, 18.7, and 18.8 display the precision, recall, F1-score, and accuracy for the Ling spam dataset. The model reached an accuracy of about 97% after 200 epochs of training. The precision, recall, accuracy, and F1-score for the Enron and Spamassassin datasets are shown in Tables 18.3 and 18.4.

Figure 18.9 compares the BERT model with existing methods, namely the minimum Redundancy Maximum Relevance (mRMR) feature selection technique and the Artificial Neural Network (ANN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) deep learning techniques, in terms of evaluation metrics such as precision, recall, F1-score, and accuracy. The figure shows that the BERT model outperforms the other methods considered. The results indicate that transfer learning can yield desirable outcomes for identifying unsolicited spam emails. The fine-tuned BERT approach achieves an accuracy of 97% on email data and outperforms the other two models by around 11% and 4%, respectively.

Fig. 18.4 Training and validation loss for ling spam dataset
Fig. 18.5 Precision for ling spam dataset
Fig. 18.6 Recall for ling spam dataset
Fig. 18.7 F1-score for ling spam dataset
Fig. 18.8 Accuracy for ling spam dataset

Table 18.3 Performance analysis using different evaluation metrics for Enron dataset using BERT

| Number of epochs | Training loss | Validation loss | Precision (ham) | Precision (spam) | Recall (ham) | Recall (spam) | F1-score (ham) | F1-score (spam) | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.663 | 0.664 | 0.98 | 0.33 | 0.73 | 0.96 | 0.84 | 0.63 | 0.79 |
| 25 | 0.458 | 0.471 | 0.98 | 0.42 | 0.81 | 0.96 | 0.88 | 0.64 | 0.85 |
| 50 | 0.338 | 0.372 | 0.97 | 0.61 | 0.90 | 0.97 | 0.94 | 0.82 | 0.93 |
| 75 | 0.277 | 0.328 | 0.97 | 0.71 | 0.93 | 0.96 | 0.93 | 0.88 | 0.96 |
| 100 | 0.341 | 0.306 | 0.96 | 0.70 | 0.93 | 0.96 | 0.93 | 0.88 | 0.96 |
| 125 | 0.284 | 0.204 | 0.97 | 0.73 | 0.94 | 0.96 | 0.94 | 0.9 | 0.97 |
| 150 | 0.291 | 0.213 | 0.97 | 0.71 | 0.94 | 0.96 | 0.94 | 0.89 | 0.96 |
| 175 | 0.249 | 0.305 | 0.97 | 0.73 | 0.94 | 0.96 | 0.94 | 0.9 | 0.97 |
| 200 | 0.237 | 0.332 | 0.98 | 0.78 | 0.96 | 0.93 | 0.94 | 0.92 | 0.97 |

Table 18.4 Performance analysis using different evaluation metrics for Spamassassin dataset using BERT

| Number of epochs | Training loss | Validation loss | Precision (ham) | Precision (spam) | Recall (ham) | Recall (spam) | F1-score (ham) | F1-score (spam) | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.572 | 0.627 | 0.97 | 0.29 | 0.70 | 0.93 | 0.82 | 0.61 | 0.74 |
| 25 | 0.467 | 0.426 | 0.97 | 0.38 | 0.78 | 0.94 | 0.86 | 0.62 | 0.81 |
| 50 | 0.247 | 0.328 | 0.96 | 0.56 | 0.89 | 0.93 | 0.92 | 0.80 | 0.89 |
| 75 | 0.283 | 0.328 | 0.97 | 0.69 | 0.91 | 0.92 | 0.91 | 0.86 | 0.94 |
| 100 | 0.382 | 0.269 | 0.96 | 0.68 | 0.91 | 0.91 | 0.91 | 0.86 | 0.93 |
| 125 | 0.315 | 0.178 | 0.96 | 0.71 | 0.92 | 0.92 | 0.92 | 0.88 | 0.94 |
| 150 | 0.347 | 0.184 | 0.96 | 0.67 | 0.92 | 0.92 | 0.92 | 0.87 | 0.93 |
| 175 | 0.296 | 0.265 | 0.96 | 0.70 | 0.91 | 0.92 | 0.92 | 0.87 | 0.96 |
| 200 | 0.273 | 0.324 | 0.97 | 0.74 | 0.94 | 0.90 | 0.93 | 0.90 | 0.96 |
Fig. 18.9 Performance analysis for proposed methodology (chart of accuracy in % for mRMR, ANN, RNN, LSTM, and BERT on the Ling spam, Enron, and Spamassassin datasets)
18.6 Conclusion
As email usage increases, unwanted bulk emails, called spam emails, are also increasing. Millions of spam emails are sent over the Internet every day to large numbers of recipients for advertising products, distributing malware, information theft, phishing, etc. These unwanted emails consume large amounts of network bandwidth, storage space, and user time, and they cost millions of dollars every year. Numerous techniques exist to distinguish spam emails from non-spam emails, but none has achieved 100% accuracy. To improve accuracy, a powerful and time-efficient pretrained BERT model is used in the proposed model to categorize emails into two classes: spam and ham. The proposed model is compared on the Ling spam, Enron, and Spamassassin datasets with minimum Redundancy Maximum Relevance (mRMR), Artificial Neural Network (ANN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) using the evaluation metrics precision, recall, and accuracy. The pretrained BERT model classifies the emails with an accuracy of 97%, which is better than that of the existing techniques.
References
1. Hadeel, M.S.: An efficient feature selection algorithm for the spam email classification. Period. Eng. Nat. Sci. 9(3), 520–531 (2021)
2. Thirumagal Dhivya, S., Nithya, S., Sangavi Priya, G., Pugazhendi, E.: Email spam detection and data optimization using NLP techniques. Int. J. Eng. Res. Technol. 10(8), 38–49 (2021)
3. Unnikrishnan, V., Kamath, P.: Analysis of email spam detection using machine learning. Int. Res. J. Modern. Eng. Technol. Sci. 3(9), 409–416 (2021)
4. Bansal, L., Tiwari, N.: Feature selection based classification of spams using fuzzy support vector machine. In: Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 258–263 (2020)
5. Lovelyn Rose, S., Ashok Kumar, L., Karthika Renuka, D.: Deep Learning using Python. Wiley, Hoboken (2019)
6. Sethi, M., Chandra, S., Chaudhary, V., Yash, S.: Email spam detection using machine learning and neural networks. Int. Res. J. Eng. Technol. 8(4), 349–355 (2021)
7. Pashiri, R.T., Rostami, Y., Mahrami, M.: Spam detection through feature selection using artificial neural network and sine–cosine algorithm. J. Math. Sci. 14(3), 193–199 (2020)
8. Sumathi, S., Pugalendhi, G.K.: Cognition based spam mail text analysis using combined approach of deep neural network classifier and random forest. J. Ambient Intell. Hum. Comput. 12(6), 5721–5731 (2020)
9. Basyar, I., Adiwijaya, S., Murdiansyah, D.T.: Email spam classification using gated recurrent unit and long short-term memory. J. Comput. Sci. 16(4), 559–567 (2020)
10. Sriram, S., Vinayakumar, R., Sowmya, V., Krichen, M., Noureddine, D.B., Anivilla, S., Soman, K.P.: Deep convolutional neural network based image spam classification. In: Proceedings of the 6th Conference on Data Science and Machine Learning Applications (CDMA), pp. 112–117, Riyadh, Saudi Arabia (2020)
11. Jitendra Kumar, A., Santhanavijayan, S., Rajendran, B., Bindhumadhava, B.S.: An adaptive neural network for email spam classification. In: Proceedings of the 2019 Fifteenth International Conference on Information Processing (ICINPRO), pp. 1–7, Bengaluru, India (2019)
12. Kulkarni, P., Jatinderkumar, S., Saini, R., Acharya, H.: Effect of header-based features on accuracy of classifiers for spam email classification. Int. J. Adv. Comput. Sci. Appl. 11(3), 396–401 (2020)
13. Sri Vinitha, V., Karthika Renuka, D.: MapReduce mRMR: random forests-based email spam classification in distributed environment. In: Data Management, Analytics and Innovation, pp. 241–253. Springer, Singapore (2020)
14. AbdulNabi, I., Yaseen, Q.: Spam email detection using deep learning techniques. Proced. Comput. Sci. 184, 853–858 (2021)
15. Isik, S., Kurt, Z., Anagun, Y., Ozkan, K.: Recurrent neural networks for spam e-mail classification on an agglutinative language. Int. J. Intell. Syst. Appl. Eng. 8(4), 221–227 (2020)
16. Liu, S., Tao, H., Feng, S.: Text classification research based on Bert model and Bayesian network. In: Proceedings of the 2019 Chinese Automation Congress (CAC), pp. 5842–5846, Hangzhou, China (2019)
17. Lagrari, F., Elkettani, Y.: Customized BERT with convolution model: a new heuristic enabled encoder for Twitter sentiment analysis. Int. J. Adv. Comput. Sci. Appl. 11(10), 423–431 (2020)
18. Si, S., Wang, R., Wosik, J., Zhang, H., Dov, D., Wang, G., Carin, L.: Students need more attention: BERT-based attention model for small data with application to automatic patient message triage. In: Machine Learning for Healthcare Conference; Virtual, pp. 436–456 (2020)
19. Ali, N., Fatima, A., Shahzadi, H., Ullah, A., Polat, K.: Feature extraction aligned email classification based on imperative sentence selection through deep learning. J. Artif. Intell. Syst. 3(1), 93–114 (2021)
20. Ablel-Rheem, D.M., Ibrahim, A.O., Kasim, S., Almazroi, A.A., Ismail, M.A.: Hybrid feature selection and ensemble learning method for spam email classification. Int. J. Adv. Trends Comput. Sci. Eng. 9(1), 217–223 (2020)
21. Bhattacharya, P., Singh, A.: E-mail spam filtering using genetic algorithm based on probabilistic weights and words count. Int. J. Integr. Eng. 12(1), 40–49 (2020)
Chapter 19
Agriculture Land Image Classification Using Machine Learning Algorithms and Deep Learning Techniques
Yarlagadda Mohana Bharghavi, C. S. Pavan Kumar, Yenduri Harshitha Lakshmi, and Kuncham Pushpa Sri Vyshnavi
Abstract Research into agriculture has been gaining momentum and showing signs of significant expansion over the last several years. Recent computer technologies in deep learning and remote sensing are simplifying agricultural tasks. Manual classification of agricultural land cover requires a large team of experts and is time-consuming for vast areas. In this project, we use machine learning and deep learning algorithms to classify land cover using the RGB version of the EuroSAT dataset. This helps differentiate agricultural land from other landscapes, aiding farmers in identifying land suitable for cultivating crops. We compare four classifiers: K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), and the pretrained ResNet50 model, a convolutional neural network (CNN). Few studies have examined the performance of these classifiers with different training sample sizes on the same remote sensing images, particularly Sentinel-2 Multispectral Imager (MSI) images. The accuracy of the traditional machine learning methods was only fair, while the ResNet50 model produced the best results with an accuracy of 97.46%.
19.1 Introduction
A wide range of environmental research and development now uses remotely sensed data as a significant component, especially the classification of spectral imagery, which has emerged as an efficient way to generate maps of land vegetation cover and to discern land use according to its purpose for human usage (Table 19.1).

Y. M. Bharghavi · C. S. Pavan Kumar (B) · Y. H. Lakshmi · K. P. Sri Vyshnavi
Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijaywada, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_19
Y. M. Bharghavi et al.
Table 19.1 Literature review

| Author(s) | Methods used | Dataset used | Advantages |
|---|---|---|---|
| 1. Helber et al. [1] | CNN; classification | Sentinel-2 satellite image dataset | Introduction of a Sentinel-2 satellite-based image dataset called EuroSAT; provided a benchmark for the EuroSAT dataset using deep CNN, which can be used for land cover classification |
| 2. Yassine et al. [4] | CNN | EuroSAT dataset | Used two different approaches for enhancement of LULC classification with the EuroSAT dataset |
| 3. Li et al. [2] | CAM encoded CNN model; GRAD-CAM method | UC Merced dataset, aerial image dataset, NWPU-RESISC45 dataset and EuroSAT dataset | DDRL-AM proposed for classification of remote sensing data with two approaches |
In this research, we assess the efficiency of deep learning approaches in conjunction with high-resolution Sentinel-2 remote sensing data for agricultural land cover mapping applications. The EuroSAT dataset was used to test our suggested deep neural network methods for taking into account the temporal correlation of Sentinel-2 data. We show that good classification performance can also be obtained with the traditional algorithms (KNN, SVM, and RF), and our experiments show that convolutional neural networks are superior to traditional machine learning methods for classifying agricultural land based on Sentinel-2 satellite data. Differentiating between agricultural land cover types is difficult because they are typically characterized by similar, complex temporal characteristics. Earlier trials demonstrate the suitability of a class of deep learning models (CNN) that specifically take the temporal correlation of the data into account [3–6].

One approach serves as an example of how data fusion techniques are applied to the analysis of the built environment and is a first step toward an accurate global-scale map of urban land cover across time for India, Mexico, and the United States [7]. Two classifications of agricultural land cover are provided using mono-temporal and multi-temporal Landsat scenes [8]. A multispectral and panchromatic image fusion approach based on the flower pollination algorithm is proposed in [9]. Urban areas are automatically extracted from Landsat imagery, with image validation and high-resolution mapping performed on a GEE image dataset [10]. RFs, RNN, and TempCNN are compared in [11], which concludes that RFs and TempCNN are better than RNN for classifying the Sentinel-2 land image dataset. The method suggested in [12] improves the quality of image fusion while maintaining image resolution for remotely sensed images. According to the findings in [13], the KTH-Pavia Urban Extractor can accurately extract urban areas at a resolution of 30 m from a single date of single-polarization ENVISAT ASAR data. To explicitly take into account the temporal correlation of Sentinel-1 data, applied to the Camargue region, the study in [14] suggested two deep RNN techniques, LSTM and GRU. A method for categorizing specific urban land use, based on geometrical, textural, and contextual data of land parcels and using the decision tree algorithm, is proposed in [15]. A review of deep supervised learning, unsupervised learning, reinforcement learning, evolutionary computation, and indirect search for short programs encoding deep and large networks is given in [16]. A thorough analysis of current developments in deep learning-based visual object recognition is provided in [17]. The performance of four nonparametric classification algorithms, SVM, RF, XGBoost, and DL, was examined on a complicated mixed-use landscape with eight LCLU classes in south-central Sweden [18]. The study in [19] is presented as the first evaluation of conventional and emerging machine learning techniques for classifying land cover and land use over a complicated boreal terrain using multi-temporal Sentinel-2 data. Joint deep learning (JDL) was first applied to land cover and land use classification in [20]: a patch-based CNN and a pixel-based MLP were combined with mutual reinforcement and complementarity, and iterative updating was used to create a Markov process from the joint distributions of LCLU. A residual squeeze VGG16 compressed convolutional neural network model was proposed to address speed and size issues by condensing the highly successful VGG16 network [21]. Geometric predictions have been used to predict the area of agricultural land in Kendari City in 2027, which is projected to decrease significantly [22]. A system for classifying aquarium family fish species is put forth in [23]; it recognizes eight family fish species and 191 subspecies, and its performance is compared with that of other CNN architectures such as AlexNet and VGGNet. Finally, the study in [24] illustrates the advantages and drawbacks of data augmentation for improving the performance of classification systems and finds that, if label-preserving transforms are available, it is preferable to perform data augmentation in data space.
19.2 Literature Review
This section focuses on the resources that improved our understanding of the principles underlying image classification techniques. The studies discuss in detail the various classification frameworks and algorithms that can be applied. From these publications, we found that deep learning algorithms outperform traditional machine learning methods.
19.3 Proposed Method
In this paper, we compare how well the KNN, SVM, and RF classifiers perform against the pretrained ResNet50 model after fine-tuning it. Our main goal is to demonstrate the competence of the algorithms on the EuroSAT dataset, along with the necessity of classifying agricultural land cover in order to facilitate identification in dense forest areas, vegetative land cover, and other similar landscapes. Splitting the EuroSAT dataset in an 80:20 ratio has been shown to produce the best results, so we use the same split in our implementation. The machine learning techniques were applied in the traditional way without any alteration. Our ResNet50 model has over 16 million trainable parameters and consists of 50 layers: 48 convolution layers, 1 max pool layer, and 1 average pool layer. The model consists of four convolution stages, and each convolution block and identity block contains three convolution layers. In the outermost layer, a dropout layer is added to prevent overfitting, followed by a softmax activation function. The pretrained ResNet50 model was fine-tuned for image classification by removing the fully connected layers at the end of the network, where the class label predictions are made, and replacing them with a new set of fully connected layers with random initialization. All convolution layers below this head are frozen, so that backpropagation does not update them and the features already learned are preserved. The system architecture (see Fig. 19.1) shows the flow of the process and all the modules. We used the EuroSAT image dataset from the Sentinel-2 satellite and split the data in an 80:20 ratio. The algorithm we mainly focus on is ResNet50.
Fig. 19.1 Design methodology
Fig. 19.2 ResNet50 architecture
19.3.1 Dataset Description
The EuroSAT dataset is based on multispectral image data from the Sentinel-2 satellite. It comprises 27,000 labeled, georeferenced images in 13 spectral bands, divided into 10 distinct scene classes with 2000–3000 images per class. The images measure 64 × 64 pixels. In this work, we use only the RGB version (bands 2, 3, and 4) of the EuroSAT dataset to implement the models. The dataset is split in an 80:20 ratio for training and testing. The figures (see Figs. 19.2 and 19.3) show the class distribution of labels in the dataset and sample images from the dataset (Table 19.2).
19.4 Experimental Results
The performance of the classifiers is evaluated and compared using precision, recall, F1 score, and accuracy values derived from the confusion matrix. The loss and accuracy of the models are visualized as graphs. The graphs in Figs. 19.4, 19.5, 19.6 and 19.7 show the accuracy of the SVM, RF, KNN, and ResNet50 models during training. Among the four classifiers, the fine-tuned ResNet50 model showed the highest accuracy, 97.46%, followed by SVM with 86.38%, KNN with 79.76%, and random forest with 74.38%. The confusion matrix of ResNet50 is shown in Fig. 19.8. The matrix displays the scores for each type of land, as indicated by the labels in the dataset: annual crop, forest, herbaceous vegetation, highway, industrial, pasture, permanent crop, residential, river, and sea lake (Figs. 19.9 and 19.10). The accuracies of the models are compared in Table 19.3; the highest accuracy is attained by ResNet50. It is observed that the ResNet50 model suits the dataset efficiently with less sensitivity toward the data (Table 19.4).
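For a ten-class problem such as EuroSAT, the per-class precision, recall, and F1 score are read off the confusion matrix one class at a time. The sketch below shows that computation in plain Python; the function name, the row/column convention (rows are true classes, columns are predicted classes), the macro-averaging, and the small 3 × 3 matrix are assumptions for illustration and are not the chapter's results.

```python
def per_class_metrics(cm):
    """Compute accuracy plus per-class and macro precision/recall/F1.

    cm[i][j] counts samples whose true class is i and predicted class is j
    (an assumed convention).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true other
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    macro = tuple(sum(m[i] for m in per_class) / n for i in range(3))
    return accuracy, per_class, macro


if __name__ == "__main__":
    toy = [[8, 2, 0],   # hypothetical counts, three classes only
           [1, 7, 2],
           [0, 1, 9]]
    print(per_class_metrics(toy))
```

With a real 10 × 10 EuroSAT confusion matrix, the same function would return one (precision, recall, F1) triple per land cover class, which makes it easy to see which classes a given classifier confuses most.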
Y. M. Bharghavi et al.
Fig. 19.3 Accuracy graph of SVM at epoch-5
Table 19.2 Description of classes in our dataset
| Class name | Description |
|---|---|
| Annual crop | Growing crops or trees |
| Forest | Densely grown trees |
| Herbaceous vegetation | Non-woody plants |
| Highway | Roadways and vehicles |
| Industrial | Massive plots with warehouses |
| Pasture | Grassland |
| Permanent crop | Arable crops, cultivation land |
| Residential | Buildings and houses |
| River | Waterbody with land |
| SeaLake | Only water bodies |
Fig. 19.4 Accuracy graph of SVM at epoch-10
Fig. 19.5 Accuracy graph of SVM at epoch-15
Fig. 19.6 Accuracy graph of SVM at epoch-20
Fig. 19.7 Accuracy and loss graph of ResNet50 at epoch-5
Fig. 19.8 Accuracy and loss graph of ResNet50 at epoch-10
Fig. 19.9 Accuracy and loss graph of ResNet50 at epoch-15
Fig. 19.10 Accuracy and loss graph of ResNet50 at epoch 20
Table 19.3 Classification results
| Algorithm | Accuracy (%) | F1-score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| ResNet50 | 97.46 | 97.46 | 97.24 | 97.65 |
| SVM | 86.38 | 86.39 | 86.39 | 86.38 |
| KNN | 79.76 | 78.46 | 78.89 | 78.73 |
| RF | 74.38 | 73.33 | 73.27 | 73.76 |
Table 19.4 Accuracy comparison table of models
| Class\Algorithm | ResNet50 | SVM | KNN | RF |
|---|---|---|---|---|
| Annual crop | 0.964883 | 0.862612 | 0.737944 | 0.717944 |
| Forest | 0.992543 | 0.833093 | 0.760152 | 0.730152 |
| Herbaceous vegetation | 0.974444 | 0.851208 | 0.745971 | 0.715971 |
| Highway | 0.970677 | 0.853159 | 0.754963 | 0.724963 |
| Industrial | 0.959585 | 0.814575 | 0.767522 | 0.737522 |
| Pasture | 0.970996 | 0.833974 | 0.771769 | 0.731769 |
| Permanent crop | 0.962000 | 0.857289 | 0.717071 | 0.727071 |
| Residential | 0.974776 | 0.810213 | 0.717658 | 0.717658 |
| River | 0.978218 | 0.837168 | 0.748171 | 0.728171 |
| SeaLake | 0.991653 | 0.827213 | 0.772637 | 0.732637 |
19.5 Conclusion
In this study, we compared the performance of the traditional machine learning algorithms K-nearest neighbors (KNN), random forest (RF), and support vector machine (SVM) with that of a deep neural network approach (ResNet50) on the EuroSat image dataset for the classification of agricultural land. ResNet50, the best-fitting of the four classifiers, achieved the maximum accuracy of 97.46%.
Chapter 20
A Comprehensive Machine Learning Approach in Detecting Coronary Heart Disease
Elaprolu Sai Prasanna, T. Anuradha, Vara Swetha, and Jangam Pragathi
Abstract Heart disease is on the rise; in addition to affecting the elderly, it is becoming more prevalent in young people. The disease affects not only India but the entire world. A recent study found that at least 500,000 open heart surgeries are carried out annually and that the main causes of heart disease include high blood pressure, smoking, and inactivity. Diagnosing heart disease early can help avoid serious complications. Many researchers are working on machine learning techniques for early diagnosis of the disease. Non-invasive diagnostic methods diagnose heart disease without cutting the skin. This paper proposes a machine learning method to accurately predict whether a person has heart disease. Supervised machine learning techniques were used for prediction, and the Django framework was used to design the web page. An easy-to-use user interface built with Django serves as the front end, where users can check their coronary heart disease status. At the back end, machine learning algorithms make the prediction from the data entered by the user. Our approach attained a maximum accuracy of 98%.
E. Prasanna · T. Anuradha (B) · V. Swetha · J. Pragathi
Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_20

20.1 Introduction
The heart is one of the most important human organs. Its main function is to move blood throughout the body and to maintain blood pressure. If the heart does not work correctly, a person can suffer from many diseases and may even have a heart attack, which can lead to death. Nowadays, heart-related diseases are increasing rapidly: not only people aged 50 and above but also younger adults and children suffer from heart disease. According to a recent survey, at least 500,000 open heart surgeries are performed per year. Not only India but the entire world suffers from this disease. According to a recent WHO study, heart diseases are increasing, and 17.9 million people die from them every year [1]. Diagnostic methods are of two types: invasive and non-invasive. Methods in which instruments cut the skin, membranes, or connective tissue are invasive. Machine learning methods such as logistic regression, Support Vector Machine, and K-means are non-invasive. If coronary heart disease can be predicted in its early stages, major problems may be avoided. Machine learning is one of the most popular technologies and is used in many real-world applications such as health care and agriculture. Using machine learning algorithms, we can predict whether a person may suffer from this disease. We apply machine learning techniques to a dataset with 14 attributes to predict coronary heart disease accurately.
20.2 Literature Review
Authors in [2] proposed a system for heart disease prediction in which they applied KNN, DT, LR, and SVM to a UCI repository dataset. They used an Anaconda notebook and achieved the highest accuracy with KNN. Authors in [3] used a dataset with 14 attributes that includes irrelevant features; using deep learning techniques, they achieved an accuracy of 94.2%. Authors in [4] proposed a prediction system in which they applied MLP, SVM, RF, and NB, of which SVM gave the best results. Authors in [5] developed a Python-based application for healthcare research; they used categorical data and achieved an accuracy of 83%. Authors in [6] developed a prediction system, applying six different ML algorithms and achieving the highest accuracy of 85% with logistic regression. Authors in [7] proposed an application that predicts heart disease from the patient's information; the neural network algorithm gave the most accurate results. Authors in [8] proposed a prediction system for heart disease using logistic regression and KNN, which showed better accuracy than the Naive Bayes algorithm. Authors in [9] proposed a prediction system using different supervised ML algorithms, of which Random Forest produced the highest accuracy. Authors in [10] proposed a system for the prediction of cardiac disease using NB and DT, both of which are commonly used for prediction; the decision tree provided better accuracy than the Bayesian classifier. Authors in [11] developed a heart disease prediction system using ML algorithms, which achieved a classification accuracy of more than 85%. Authors in [12] developed a model with the Cleveland data repository and different supervised algorithms, achieving the highest accuracy with Random Forest.
Authors in [13] developed a web-based automatic prediction system. They applied eight algorithms and achieved the highest accuracy with both Random Forest and decision tree. Authors in [14] developed a prediction system using NB, KNN, DT, and RF and achieved an accuracy of 85%. An interactive front-end system was developed, with Python used as the back end.
20.3 Design Methodology
20.3.1 Description of Dataset
The dataset contains a total of 1024 instances and 14 attributes. The 14 attributes include age, sex (1, 0), cp, trestbps, chol, and fbs. The 14th attribute is the target attribute, which takes only two values, 0 or 1: zero indicates that the patient may not have heart disease, and one indicates that the patient may have heart disease. A detailed description of each attribute is given in Table 20.1. The correlation matrix of the features is shown in Fig. 20.1.

Table 20.1 Dataset description
| S. No | Column | Datatype | Description |
|---|---|---|---|
| 0 | age | int64 | Age in years |
| 1 | sex | int64 | 1 = male, 0 = female |
| 2 | cp | int64 | Chest pain type (0, 1, 2, 3) |
| 3 | trestbps | int64 | Resting blood pressure in mm Hg |
| 4 | chol | int64 | Serum cholesterol in mg/dl |
| 5 | fbs | int64 | Fasting blood sugar > 120 mg/dl (1, 0) |
| 6 | restecg | int64 | Resting electrocardiographic results |
| 7 | thalach | int64 | Maximum heart rate achieved |
| 8 | exang | int64 | Exercise induced angina (1 = yes, 0 = no) |
| 9 | oldpeak | float64 | ST depression induced by exercise relative to rest |
| 10 | slope | int64 | The slope of the peak exercise ST segment |
| 11 | ca | int64 | Number of major vessels |
| 12 | thal | int64 | Thallium stress result |
| 13 | target | int64 | Have disease or not |

Fig. 20.1 Exploratory data analysis of attributes

Figure 20.2 describes the flow of the project. First, the dataset, which contains roughly 1020 instances, is obtained from Kaggle, and attributes are selected using attribute selection methods to reduce the dimensionality of the data. After the dataset is imported, we check whether it contains noisy data, such as missing or unknown values. Missing values were replaced with the mean of the respective attribute column. We used data cleaning and data integration techniques for pre-processing; in data integration, data from different sources is integrated into a single unit. After pre-processing, we split the data into training and testing sets in a 70:30 ratio. We then applied different machine learning techniques to build models on the training data. Once trained, a model can be used to predict unknown samples, i.e., the testing data. After the evaluation phase, we chose the algorithm with the highest accuracy and used it to build the web application with the Django framework, so that users can easily check whether they may have heart disease. In this application, patients enter the details of the attributes to check the result. After submitting the data, the status of coronary heart disease is displayed so that the user or patient can easily identify it.
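The mean-imputation step mentioned in the pre-processing description can be sketched as below. This is only an illustration under stated assumptions: missing values are represented as `None`, the helper name `impute_column_means` is hypothetical, and the four rows are toy (age, chol) values rather than records from the dataset.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        # Assumes every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [[means[c] if value is None else value for c, value in enumerate(row)]
            for row in rows]


# Toy (age, chol) rows with two missing entries; values are illustrative only.
toy_rows = [[63, 233], [37, None], [41, 204], [None, 236]]
filled = impute_column_means(toy_rows)
print(filled)
```

One design point worth noting: if the column means are computed before the 70:30 split, information from the test rows influences the training data, so computing the means on the training portion only is a common alternative.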
20.3.2 Experimental Work
[Figure: flowchart of the proposed process. Dataset → Attribute Selection → Data Pre-processing → Classification (Logistic Regression, SVM, Decision Tree, Random Forest) → Prediction: 1 (may have heart disease) or 0 (does not have heart disease)]
Fig. 20.2 Proposed process model
20.3.3 Algorithms Used
20.3.3.1 Logistic Regression
Logistic regression is a classification technique that builds on the idea of linear regression. It estimates the probability of an outcome from the input variables. The output of logistic regression is constrained to lie between 0 and 1, which is the key difference between logistic and linear regression. The upper and lower bounds are 1 and 0, respectively, because the sigmoid function only returns values between 0 and 1.
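To make the role of the sigmoid concrete, the following sketch maps a linear score to a probability. The weights, bias, and input values are hypothetical numbers chosen for illustration; they are not parameters learned in the chapter.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function; the result always lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def predict_probability(weights, bias, x):
    """Linear score w . x + b passed through the sigmoid."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


# Hypothetical weights and input, chosen so that the linear score is exactly 0.
probability = predict_probability(weights=[0.5, -0.25], bias=0.0, x=[2.0, 4.0])
print(probability)
```

A class label is then obtained by thresholding the probability, commonly at 0.5.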
20.3.4 Support Vector Machine
Two groups of data points can be separated by many different hyperplanes. The main objective of SVM is to find the hyperplane with the maximal margin of separation between the data points of the two classes. The equation of the hyperplane is $w \cdot x + b = 0$, where $w$ is the vector normal to the hyperplane and $b$ is the bias term. A point with $w \cdot x + b > 0$ is classified as positive, and a point with $w \cdot x + b < 0$ is classified as negative.
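The decision rule of a linear SVM can be sketched as follows. The vector `w` and offset `b` below are hypothetical values, not a trained model, and the margin formula 2 / ||w|| assumes the usual canonical scaling in which the support vectors satisfy |w · x + b| = 1.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 for the positive side and -1 otherwise (ties go to -1 in this sketch)."""
    return 1 if decision_value(w, b, x) > 0 else -1


def margin_width(w):
    """Width of the margin, 2 / ||w||, under the canonical scaling assumption."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = [1.0, 1.0], -3.0  # hypothetical hyperplane x1 + x2 - 3 = 0
print(classify(w, b, [2.0, 2.0]), classify(w, b, [1.0, 1.0]), margin_width([3.0, 4.0]))
```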
20.3.5 Decision Tree
A decision tree is a tree-based classification algorithm that depends mainly on selecting the best attribute for each node. Information Gain, Gain Ratio, and Gini Index are three attribute selection measures that are examined to determine the optimum attribute. Finally, the decision tree rules extracted from the training data are subjected to reduced-error pruning. The steps of the decision tree algorithm are as follows.
Step-1: Begin the tree with the root node S.
Step-2: Find the best attribute A using an attribute selection measure, for example information gain, which is computed from the entropy $\text{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$, where $p_i$ is the proportion of samples in S that belong to class $i$ and $c$ is the number of classes.
Step-3: Split S into subsets that contain the possible values of A.
Step-4: Generate a tree node that contains the best attribute A.
Step-5: Repeat: generate new decision trees recursively from the subsets of S, until the nodes cannot be classified further.
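The attribute selection in Step-2 can be illustrated with a short sketch of entropy and information gain, where the gain of an attribute is the entropy of S minus the size-weighted entropy of the subsets produced by splitting on that attribute. The labels and attribute values are toy data invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(subset) / n * entropy(subset) for subset in subsets.values())
    return entropy(labels) - remainder


# Toy data: the first attribute separates the classes perfectly, the second not at all.
toy_labels = [1, 1, 0, 0]
gain_informative = information_gain(toy_labels, ["a", "a", "b", "b"])
gain_uninformative = information_gain(toy_labels, ["a", "b", "a", "b"])
print(gain_informative, gain_uninformative)
```

The attribute with the largest gain would be chosen as the split for the current node.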
20.3.6 Random Forest
Random Forest is an ensemble learning technique for classification and regression that constructs several decision trees during training and combines their outputs to predict or categorize the outcome. The goal of Random Forest is to combine weak learning models into a powerful and robust learning model. The algorithm steps are shown below.
Input: D: dataset, t: number of trees to build
for i = 1 to t:
    Select a bootstrap sample B from D.
    Grow a tree from the bootstrapped data by repeating the following steps:
    1. Select a random subset of variables r = {V_1, V_2, …, V_n} from those in B.
    2. Find the variable V_i in r that splits the data into groups in the best manner.
end for
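The sketch below illustrates the two sources of randomness in the steps above (bootstrap sampling and random variable subsets) together with majority voting over the trees' predictions. It deliberately omits tree growing, so it is not a full Random Forest; the helper names and the toy rows are assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, subset_size, rng):
    """Pick the indices of the candidate variables considered for one split."""
    return sorted(rng.sample(range(n_features), subset_size))


def majority_vote(predictions):
    """Combine the per-tree class predictions into the forest's prediction."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)
# Toy (age, chol, target) rows; values are illustrative only.
rows = [(63, 233, 1), (37, 250, 0), (41, 204, 0), (56, 236, 1)]
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=13, subset_size=3, rng=rng)
print(len(sample), features, majority_vote([1, 0, 1]))
```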
20.4 Result Analysis
Random Forest performed the best, outperforming the other three algorithms, and was therefore used for prediction in the web application. The evaluation metrics for Random Forest show that the false negative rate is zero, which means there is little chance of predicting no disease for a person who really has heart disease. Figure 20.3 shows the evaluation metrics of the algorithms that we used to build the machine learning model, and Fig. 20.4 shows the confusion matrix of the best prediction algorithm for the problem.
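For reference, the rates named in this section can be derived from the four cells of a binary confusion matrix as sketched below. The counts used here are made-up illustrative numbers and are not the results reported in the chapter.

```python
def binary_rates(tp, fn, fp, tn):
    """Standard rates computed from the four cells of a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),          # true positive rate (recall)
        "specificity": tn / (tn + fp),          # true negative rate
        "precision": tp / (tp + fp),
        "false_negative_rate": fn / (fn + tp),  # equals 1 - sensitivity
    }


# Illustrative counts only.
rates = binary_rates(tp=45, fn=5, fp=10, tn=40)
print(rates)
```

A false negative rate of zero corresponds to fn = 0, that is, no patient with the disease being predicted as healthy.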
[Figure: bar chart comparing Random Forest, Decision Tree, Support Vector Machine, and Logistic Regression on accuracy, sensitivity, specificity, precision, FP, and FN; vertical axis from 0 to 120]
Fig. 20.3 Comparison of algorithms with evaluation metrics
Fig. 20.4 Random Forest confusion matrix
20.5 Conclusion
Machine learning algorithms, namely logistic regression, SVM, decision tree, and Random Forest classifier, were used for prediction. To provide a user interface for users who want to check for heart disease, a web application was built using the Python web development framework Django. Among all the algorithms used for prediction, the Random Forest classifier achieved the highest accuracy of 98%. The machine learning model was therefore built using the Random Forest algorithm.
References
1. Sharma, V., Yadav, S., Gupta, M.: Heart disease prediction using machine learning techniques. In: 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), pp. 177–181 (2020). https://doi.org/10.1109/ICACCCN51052.2020.9362842
2. Singh, A., Kumar, R.: Heart disease prediction using machine learning algorithms. In: 2020 International Conference on Electrical and Electronics Engineering (ICE3), pp. 452–457 (2020). https://doi.org/10.1109/ICE348803.2020.9122958
3. Bharti, R., Khamparia, A., Shabaz, M., Dhiman, G., Pande, S., Singh, P.: Prediction of heart disease using a combination of machine learning and deep learning. In: Computational Intelligence and Neuroscience 2021 (2021)
4. Boukhatem, C., Youssef, H.Y., Nassif, A.B.: Heart disease prediction using machine learning. In: 2022 Advances in Science and Engineering Technology International Conferences (ASET), pp. 1–6 (2022). https://doi.org/10.1109/ASET53988.2022.9734880
5. Chang, V., Bhavani, V.R., Xu, A.Q., Hossain, M.A.: An artificial intelligence model for heart disease detection using machine learning algorithms. In: Healthcare Analytics, vol. 2, p. 100016 (2022)
6. Dwivedi, A.K.: Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput. Appl. 29(10), 685–693 (2018)
7. Gavhane, A., Kokkula, A., Pandya, I., Devadkar, K.: Prediction of heart disease using machine learning. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1275–1278. IEEE (2018)
8. Jindal, H., Agrawal, A., Khera, R., Jain, R., Nagrath, P.: Heart disease prediction using machine learning algorithms. In: IOP Conference Series: Materials Science and Engineering, vol. 1022, no. 1, p. 012072. IOP Publishing (2021)
9. Likitha, K.N., Nethravathi, R., Nithyashree, K., RitikaKumari, Sridhar, N., Venkateswaran, K.: Heart disease detection using machine learning technique. In: 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 1738–1743. IEEE (2021)
10. Lutimath, N.M., Chethan, C., Pol, B.S.: Prediction of heart disease using machine learning. Int. J. Recent Technol. Eng. 8(2), 474–477 (2019)
11. Nikhar, S., Karandikar, A.M.: Prediction of heart disease using machine learning algorithms. Int. J. Adv. Eng. Manage. Sci. 2(6), 239484.
12. Otoom, A.F., Abdallah, E.E., Kilani, Y., Kefaye, A., Ashour, M.: Effective diagnosis and monitoring of heart disease. Int. J. Softw. Eng. Appl. 9(1), 143–156 (2015)
13. Padmaja, B., Srinidhi, C., Sindhu, K., Vanaja, K., Deepika, N.M., Krishna Rao Patro, E.: Early and accurate prediction of heart disease using machine learning model. Turkish J. Comput. Math. Educ. (TURCOMAT) 12(6), 4516–4528 (2021)
14. Rahman, M.M., Rana, M.R., Alam, M.N.A., Khan, M.S.I., Uddin, K.M.M.: A web-based heart disease prediction system using machine learning algorithms. Netw. Biol. 12(2), 64–80 (2022)
Chapter 21
Evaluation of Crime Against Women Through Modern Data Visualization Techniques for Better Understanding of Alarming Circumstances Across India
P. Chandana, Kotha Sita Kumari, Ch. Devi Likhitha, and Sk. Shahin
Abstract Crime against women has become a major human rights concern that stands in the way of freedom and equality. It has a strong influence on women's lives and affects women of all ages, races, locations, and religions. Studies show that women are most vulnerable to men they know. In this study, data from NCRB reports are supplemented with material from other published sources, such as books, papers, and reports from governmental and non-governmental organizations. The study examines several types of crimes against women. It also discusses, from a theoretical standpoint, concepts and links connected to the law, crime, and women. Crime or violence against women is a main problem in any nation, state, or district. To control the problem, pertinent and timely information must be extracted. Crime analysis is defined as investigating and identifying crime as well as the connections between various offenders and crime leaders.
21.1 Introduction
21.1.1 Preface
The current rate of crime is a threat to our nation. Every day, horrific kidnappings, rapes, and murders make the news [1]. Most of the laws that the government enacted to lower the rate of criminal activity and improve the stability of the nation did not produce the expected results [2]. Therefore, crime data analysis can be utilized to reduce wrongdoing.
P. Chandana · K. Sita Kumari (B) · Ch. Devi Likhitha · Sk. Shahin Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_21
21.1.2 Origin of the Problem
Domestic violence is a serious social problem not just in India but throughout the world [2]. Women still have to put up with violence from their husbands and other male members of society, even when they are educated and work in respectable positions [3]. Despite being a worldwide epidemic, violence against women is still a taboo subject. Domestic violence and violence against women remain the most frequent violations of human rights. Data from the National Crime Records Bureau paint a terrifying picture of India, a nation where women are revered as goddesses [4]. The majority of women between the ages of 18 and 30 are vulnerable and are not safe even in their own homes, because they are preyed upon by husbands, family members, or friends [3]. Evil societal customs like dowry and genital mutilation persist despite several laws and regulations. More than a third of the reported incidents involve abuse by husbands or family members. According to NCRB data, the rate has increased by more than 15% [1]. Sexual assault is a threat to safety and security. Women are excluded from democracy, peacemaking, post-conflict reconstruction, and racial healing processes [5]. Because of its social acceptance, sexual violence can be used as a weapon of war long after the final shot has been fired [6]. Rape causes numerous women to lose their health, livelihoods, husbands, and support systems [7]. This can prevent the transmission of shared beliefs to future generations by uprooting the pillars that hold them in place [8]. Children who witness rape may grow up perceiving it as normal. This loop must be broken, because a selective zero-tolerance approach cannot be sustained.
21.1.3 Problem Statement with Objectives
21.1.3.1 Problem Statement
Our country is in danger due to the current crime rate. Brutal murders, rapes, and kidnappings are covered in the major publications on a daily basis [7]. The government enacted numerous laws in an effort to lower crime and boost stability, but most of them had little to no impact on either. Consequently, crime data analysis can be used to reduce such offenses.
21.1.3.2 Objectives
To make the justice system more effective and to assist law enforcement in providing security by identifying the locations with a higher incidence of crime.
21.1.3.3 Societal Impact
People associate with one another for the purpose of transgressions and crimes. In this context, crime data analysis aids risk evaluation. Because it facilitates criminal investigation, crime rates decrease. It discourages such association and urges people to adopt security precautions.
21.1.4 Literature Review
| S. No | Study | Method used | Tasks | Description |
|---|---|---|---|---|
| 1 | Ozal (2012) | Association rule mining | Analysis of crime data using patterns of crime | Processing time and visualization of the model weren't considered |
| 2 | Ester (2014) | Based on statistics | Analysis of the data on related crime | Data collection is not possible and processing speed is not an issue |
| 3 | Bagula (2015) | The process of clustering | Data clustering and similarity searches | Produces erroneous outcomes when data is noisy and there are few clusters |
| 4 | Chang (2006) | The technique of supervised learning | Analysis and classification of crime data | Prediction accuracy and model effectiveness weren't taken into consideration |
| 5 | Ranathunga (2017) | Hybrid methods | Exploiting and using relationships between crime scenes and crime data to create crime prediction models | The system was unable to forecast the types of crimes or the times when they would occur. Additionally, predicting crime is virtually impossible when there is not enough data to feed the algorithms |
| 6 | Tayal (2015) | The process of clustering | Data clustering | Produces unreliable findings when using noisy data |
| 7 | Paek (2012) | The technique of supervised learning | Data on criminal analysis and classification | The effectiveness of the model and prediction accuracy were not taken into account |
| 8 | Lee (2009) | Association rule mining | Using crime patterns to analyze crime data | The model's visualization and processing time were not taken into account |
| 9 | Byun (2014) | Hybrid methods | Leveraging and utilizing the connections between crime data and crime sites to produce crime prediction models | The algorithm was unable to predict the kinds of crimes that would occur or the times that they would take place |
| 10 | Huang (2016) | Based on statistics | Analysis of the related criminal data | Both data collection and processing speed are not feasible |
21.2 Preliminaries
21.2.1 Dataset Description
This analysis used the caw.csv dataset, which comprises ten attributes. The response variable is the dataset's final attribute, while the first eight variables are used for analysis. The dataset contains numerical attributes as well as attributes of other types. The information used to compile the dataset was provided by the National Crime Records Bureau (NCRB), the Department of States, and the Ministry of Home Affairs. The data is broken down into several crime heads relating to crime against women in each State and UT. The crime heads covered are:
1. Kidnapping and abduction of women
2. Rape
3. Deaths caused due to dowry
4. Assault on women with the intention of outraging their modesty
5. Insulting women's modesty
6. Cruelty by husband or his relatives
7. Importation of girls
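A state-wise aggregation over such crime heads could be sketched as follows. The column names (`state`, `year`, `rape`, `kidnapping_abduction`), the placeholder state names, and all numbers are hypothetical: the actual layout of caw.csv is not given in the chapter, so an in-memory sample stands in for the file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in place of caw.csv; names and values are illustrative only.
SAMPLE_CSV = """state,year,rape,kidnapping_abduction
State A,2001,10,5
State A,2002,12,7
State B,2001,3,4
"""


def totals_by_state(csv_text, crime_columns):
    """Sum the selected crime-head columns for every state across all years."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["state"]] += sum(int(row[column]) for column in crime_columns)
    return dict(totals)


state_totals = totals_by_state(SAMPLE_CSV, ["rape", "kidnapping_abduction"])
print(state_totals)
```

Totals of this kind are the sort of per-state figures that a bar chart or map visualization would display.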
21.3 Architecture Diagram See Fig. 21.1.
21.3.1 Description of Architecture Diagram
The information used to create the dataset was gathered from a variety of data sources, including the National Crime Records Bureau (NCRB), the Department of States, and the Ministry of Home Affairs. The data is analyzed using regression analysis. Following analysis, visualization is carried out with the Tableau visualization tool in the form of maps, graphs, and charts as live analysis.
Fig. 21.1 Architecture diagram
21.4 Proposed Methodology
Four major steps are performed:
• Collection of data
• Manipulation of data
• Analysis of data
• Data visualization
Step 1: Data Collection We gather data during the data gathering stage from a variety of sources, including blogs, news websites, social media, etc. [9]. Name, address, and other information are included in the data collected. And a database is used to store the collected data for further processing. Because crime data is structured, the number of fields, content, and size of the document will all be the same, which will significantly increase the high level of effectiveness [10]. Step2: Data Manipulation Following data collection, we execute data modification, adding the criminal’s information as a new entry. This information is then automatically kept in the database and can be deleted as well as used for future purposes. The criminal data can be sorted and searched first and foremost by various criteria or by alphabetical order. Step 3: Data Analysis Data about crime can be analyzed using regression analysis. A statistical method known as regression analysis is used to find correlations between various variables, such as the frequency of crime and demographic parameters [11]. It can assist in finding data trends that can be used to guide policy choices or pinpoint areas that require more research. Regression analysis can be used to investigate how various variables affect crime rates. Regression analysis can be used, for instance, to find out whether poverty and crime rates are related, or whether particular demographic traits are linked to greater crime rates [12]. The analysis’ findings can be used to create plans for lowering crime in certain places or for focusing prevention efforts on particular groups of people. The general formula for regression analysis is given by
\[ Y_i = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \]
where Y_i = dependent variable; X_1, X_2, …, X_n = independent variables; B_0 = intercept; B_1, B_2, …, B_n = coefficients; i = observation number.
Step 4: Data Visualization
Visualization-1: See Fig. 21.2. This chart shows crime incidents broken down by state. The states are plotted on the x-axis and the number of crime incidents on the y-axis.
Visualization-2: See Fig. 21.3. This is a state-wise packed bubble chart for the attribute assault on women. The size of each bubble denotes the severity of the offense. The state name, the year, and the number of cases are displayed when the cursor is placed over a bubble.
Visualization-3: See Fig. 21.4.
Fig. 21.2 Vertical bar chart visualization
Fig. 21.3 Bubble chart visualization
Fig. 21.4 Map visualization
This is a map visualization for the attribute rape. It compares the number of rape cases across different states.
Visualization-4: See Fig. 21.5.
Fig. 21.5 Horizontal bar chart visualization
This is a horizontal bar chart of the attribute kidnapping and abduction over time.
Visualization-5: See Fig. 21.6. This is a line graph of the attribute insult to modesty of women, plotted year-wise.
Fig. 21.6 Line graph visualization
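As a rough illustration of the regression step (Step 3), the sketch below fits Y = B0 + B1·X1 + … + Bn·Xn by ordinary least squares using the normal equations. It is a minimal sketch, not the chapter's implementation: the function names, the two example predictors (poverty rate and population), and the toy figures are all hypothetical, and the targets are generated from known coefficients so that the fit can be checked.

```python
def fit_linear_regression(rows, targets):
    """Least-squares fit of Y = B0 + B1*X1 + ... + Bn*Xn via the normal equations."""
    # Prepend a constant 1 to each row so that B0 acts as the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Build the augmented system (X^T X | X^T y).
    system = []
    for a in range(k):
        lhs = [sum(r[a] * r[b] for r in design) for b in range(k)]
        rhs = sum(r[a] * y for r, y in zip(design, targets))
        system.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        if abs(system[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        system[col], system[pivot] = system[pivot], system[col]
        pivot_value = system[col][col]
        system[col] = [v / pivot_value for v in system[col]]
        for r in range(k):
            if r != col:
                factor = system[r][col]
                system[r] = [v - factor * p for v, p in zip(system[r], system[col])]
    return [system[r][k] for r in range(k)]


def predict(coefficients, row):
    """Evaluate B0 + B1*X1 + ... + Bn*Xn for one observation."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Hypothetical toy data: (poverty rate in %, population in millions) per state.
# Targets are generated exactly from Y = 50 + 10*X1 + 2*X2 (assumed values).
features = [(10, 5), (20, 5), (10, 15), (30, 10), (20, 20)]
cases = [50 + 10 * x1 + 2 * x2 for x1, x2 in features]

coefficients = fit_linear_regression(features, cases)
print([round(b, 6) for b in coefficients])
```

On real crime data the targets would not follow the model exactly, so the recovered coefficients would only approximate any underlying relationship.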
21.5 Conclusion and Conflicts of Interests
21.5.1 Conclusion
Government agencies, including the police and other security personnel, will benefit from the proposed work [13]. It would help officers analyze criminals and identify those on the list who are most likely to have committed a crime. This will save time and may hasten the resolution of cases.
21.5.2 Conflicts of Interests
The authors have no apparent conflicts of interest. Each co-author has examined and approved the manuscript's content, and none of the authors has any competing financial interests to declare. We certify that the contribution is original and that it is not under active consideration by another publisher.
References
1. Bruce, A., Bruce, P.C., Gedeck, P.: Practical Statistics for Data Scientists, 2nd edn, 29 June 2020
2. Granville, V.: Developing Analytical Talent, 1st edn. Wiley, 9 May 2014
3. Knaflic, C.N.: Storytelling with Data, vol. I. Wiley, 7 Oct 2015
4. Jayaweera, I., Sajeewa, C., Liyanage, S., Wijewardane, T., Perera, I.: Crime Analytics: Analysis of Crimes Through Newspaper Articles. Moratuwa Engineering Research Conference (MERCon) (2015)
5. Kureja, G., Mav, S., Vaswani, A., Pathak, P.: Crime data analysis. IEEE Transl. J. Magn. Jpn. 2, 740–741, August 1987 [Digests 9th Annual Conf. Magnetics Japan, p. 301, 1982]
6. Bachner, J.: Predictive Policing: Preventing Crime with Data and Analytics. IBM Center for the Business of Government, Washington, DC (2013)
7. KS: Author, Definition and Types of Crime Analysis. International Association of Criminalists. Definition and Types of Crime Analysis [White Paper 2014-02] (2014)
8. Santos, R.: The effectiveness of crime analysis for crime reduction: cure or diagnosis? J. Contemp. Criminal Justice 30, 147–169 (2014)
9. Silver, M., Watters, J.C.: Criminal Analytics: Understanding and Preventing Crime Using Data Analysis (2017)
10. Jayaweera, I., Sajeewa, C., Liyanage, S., Wijewardane, T., Perera, I.: Crime Analytics: Analysis of Crimes Through Newspaper Articles. Moratuwa Engineering Research Conference (MERCon) (2015)
11. KS: Author, Definition and Types of Crime Analysis. International Association of Criminalists. Definition and Types of Crime Analysis [White Paper 2014-02] (2014)
12. Al-Ahmadi, A.A., Abdelaziz, Y.A.: Data Analytics for Intelligent Crime Detection and Prevention: New Technologies for Advancing Crime Analysis (2021)
13. Barton-Bellessa, S.M., Miller, K.S.: The Handbook of Crime and Criminality (2018)
Chapter 22
Artificial Neural Network-Based Malware Detection Model Among Shopping Apps to Increase the App Security
N. Manasa, Kotha Sita Kumari, P. Bhavya Sri, and D. Sadhrusya
Abstract Even though there are many reliable online merchants these days, scammers can still prey on unsuspecting customers by exploiting the Internet's anonymity. Using the latest technologies, scammers produce fraudulent retail apps that mimic genuine online retail sites, sometimes with sophisticated layouts and designs. Many of these applications offer exceptionally low prices on premium goods, including jewelry, clothing, and technology from renowned brands. Sometimes you receive what you paid for, but it may be a fake, or you may receive nothing at all. Con artists frequently ask for payment by money order, pre-loaded card, or wire transfer, and it is unlikely that you will receive your order or see your money again if you pay this way. In a more recent variant of online retail scams, social networking sites are used to set up fake online storefronts. These stores open briefly, often selling counterfeit branded jewelry or clothing, and disappear after a handful of sales. Because scammers also use social media to promote their fake apps, we should not trust something just because we have seen it advertised or shared there. There is a need to flag these apps. The experiment rests on three stages: feature extraction, feature selection, and malware prediction.
N. Manasa · K. Sita Kumari (B) · P. Bhavya Sri · D. Sadhrusya
Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_22
22.1 Introduction
The Android application (app) industry has recently seen the release of a significant amount of Android malware. According to Tencent Mobile Security Lab, 1,898,731 new Android malware applications were discovered in 2016.
This work is distributed and used without restriction thanks to a Creative Commons Attribution 4.0 International License. Attackers use these malicious applications to harm user systems, breach user privacy, fraudulently deduct money, and so on, posing a major risk to the security of user data and property. Methods for detecting Android malware fall into two categories: dynamic detection and static detection. Dynamic methods rely on identifying harmful activity at run time, which requires providing a suitable environment in which the malicious software can execute. Static detection finds malware by analyzing application features, so it can identify malicious apps before the software is even installed. To extract static features, the associated files are obtained from the Android application package (APK) through reverse engineering. Compared with acquiring dynamic features, extracting static features is relatively quick and efficient. Static feature extraction typically draws on a range of application information, such as intents, opcodes, permissions, and API calls. These data can be used to further extract the feature data relevant to the application's functions.
22.2 Related Works
Researchers have proposed a variety of methodologies for identifying malware and fraud in apps. The earlier works and sources considered here are:
• Smartphone market share [1].
• Total number of Android applications on the Google Play Store [2].
• Android malware detection using machine learning based on the identification of significant permissions [3].
• Android security: a survey, issues, and taxonomy [4].
• A software journal article on the progress of Android security research [5].
• An overview of machine learning approaches for Android malware detection [6].
• Hybrid analysis for Android malware detection (HAAMD) [7].
• An overview of software flaws, malware, and attacks that affect cellphone security [8].
• A state-of-the-art analysis of malware detection strategies that use data mining techniques [9].
One survey paper covers permission-based malware detection in Android applications; however, it focuses on only five approaches and provides very limited insight. A similar survey exists, but it explains 13 papers only briefly and does not perform an in-depth analysis.
22.3 Malware in Apps
Researchers have warned that over two million Android users have downloaded a number of harmful apps that slipped past security measures into the Google Play app store. After installation, the apps serve malicious adverts that can lead directly to malware, while using cunning techniques to hide themselves from the user and avoid being deleted. Malware-filled apps frequently appear clean enough to pass app store security checks because they connect to the servers that deliver the malicious download only after they have been installed on the user's device.
22.4 Machine Learning
Machine learning enables computers to learn on their own from past data. It employs a range of methods to build mathematical models and generate predictions based on previously collected data. Machine learning is often described as the field of artificial intelligence that focuses on developing algorithms that let computers learn independently from data and past experience. It uses data to identify patterns in a dataset and can improve automatically as more historical data becomes available; data is the engine of this technology. Data mining and machine learning are closely related because both work with enormous volumes of data. Machine learning starts with observations or data, such as examples, firsthand experience, or instructions, and looks for patterns in that data in order to draw conclusions from the supplied examples later. The primary objective of ML is to enable computers to learn independently, without human assistance, and to adjust their behavior accordingly.
22.5 Proposed Architecture
The dataset, gathered from various online sources, is first pre-processed and then partitioned into training and validation sets. A sequential model (an artificial neural network) is constructed with TensorFlow and Keras and trained on the training set. The model is then validated on the test data, also referred to as the validation dataset, and its accuracy is calculated. Finally, the model is applied to new data to predict malware. The input is an uploaded APK file; the output is a determination of whether the APK file is malicious or safe.
Step 1: Open the CSV dataset containing the various features.
Step 2: Extract the APK file's characteristics, including its permissions.
Step 3: Assess the features before selecting an optimal subset.
Step 4: Use a genetic algorithm to produce optimal feature attributes from these features.
Step 5: Using the optimized features, train the neural network and the support vector classifier.
Step 6: Create a model.
Step 7: Apply the normalized feature vector to the neural network in accordance with the user's chosen option.
Step 8: Use the neural network's output to determine whether an APK file is safe or malicious.
Feature Extraction: Feature extraction is part of dimensionality reduction, in which an initial set of raw data is divided and condensed into smaller, easier-to-manage groups. As a result, processing becomes simpler. Malware, or malicious software, is any application or file that purposefully harms a computer, a network, or a server.
Fig. 22.1 System architecture
22.6 Architecture Description
The input is the APK file of the app under test; the Android Package Kit (APK) file contains all of the app's features and permissions. These features are extracted and represented as vectors, and the neural network algorithm is applied to the resulting data. Finally, the result is compared against the trained model, which was built from a dataset whose class variable labels each app as malware or benign.
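To make the idea of treating extracted permissions as a vector concrete, the sketch below encodes a list of requested permissions as a 0/1 feature vector. It assumes the permission names have already been read from the APK manifest by some other tool; the four-entry vocabulary and the helper name `permissions_to_vector` are illustrative assumptions, not the chapter's actual feature set.

```python
# Hypothetical permission vocabulary; the real dataset has several hundred columns.
PERMISSIONS = [
    "android.permission.CAMERA",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_SMS",
    "android.permission.INTERNET",
]


def permissions_to_vector(requested, vocabulary=PERMISSIONS):
    """Return a 0/1 list with one slot per known permission."""
    requested = set(requested)
    return [1 if name in requested else 0 for name in vocabulary]


# Permissions assumed to have been extracted from one APK manifest.
app_permissions = [
    "android.permission.INTERNET",
    "android.permission.CAMERA",
    "com.example.UNKNOWN",  # not in the vocabulary, so it is ignored
]
vector = permissions_to_vector(app_permissions)
print(vector)  # [1, 0, 0, 1]
```

A fixed vocabulary keeps every app's vector the same length, which is what a neural network input layer expects.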
22.7 Methodology
22.7.1 Dataset
The dataset was obtained from the Kaggle website and is described below. It contains 436 attributes, the last of which is the class variable that indicates whether the record is malware or benign. Training and testing datasets are created from the original dataset. After data preprocessing, deep learning techniques are applied to the data. The main attributes of the dataset include permissions that an app requests after installation, some of which are listed below:
1. Camera access
2. Cache file system
3. Access all downloads
4. Access location
Fig. 22.2 Dataset description
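The creation of training and testing datasets can be sketched as a stratified split, which keeps the malware-to-benign ratio the same in both parts. This is one common way to do it, shown under stated assumptions: the source does not say how the split was made, and the 80/20 ratio, the seed, and the toy records (three flags followed by the class label) are hypothetical.

```python
import random


def stratified_split(records, test_fraction=0.2, seed=42):
    """Split records (features..., label) into train/test, preserving class ratios."""
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(record[-1], []).append(record)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy records: three permission flags, then the class label (1 = malware, 0 = benign).
records = ([[i % 2, (i // 2) % 2, 1, 1] for i in range(10)]
           + [[i % 2, 0, 0, 0] for i in range(20)])
train, test = stratified_split(records)
print(len(train), len(test))  # 24 6
```

With 10 malware and 20 benign records, the test set receives 2 and 4 of them respectively, so both sets keep the 1:2 class ratio.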
22.7.2 Classification Models
22.7.2.1 Artificial Neural Network (ANN)
Neural networks process records one at a time and learn by comparing their classification of each record (which is initially largely arbitrary) with the record's known actual classification. The independent variables are combined linearly with the weights and the bias term (or intercept) associated with each neuron to form the neural network equation:
\[ Z = \mathrm{Bias} + W_1 X_1 + W_2 X_2 + \dots + W_n X_n \]
where Z denotes the output of the ANN representation above, W = weights (beta coefficients), X = independent variables, and Bias = intercept, W_0.
A neural network performs the following steps:
Step 1: Using the input variables and the linear combination above, Z = W_0 + W_1 X_1 + W_2 X_2 + … + W_n X_n, compute the output, i.e., the predicted Y values (Y_pred).
Step 2: Calculate the error term, or loss, which is the difference between the actual and predicted values.
Step 3: Minimize the error term, or loss function.
Step 4: Obtain the weight matrices that represent each layer's outputs in the neural network.
22.7.2.2 Genetic Algorithm (GA)
Feature selection using a genetic algorithm can be outlined in the following steps:
Step 1: Initialize the algorithm with binary-encoded feature subsets. A 1 in the chromosome means the corresponding feature is present; a 0 means it is absent.
Step 2: Define a randomly generated starting population and launch the algorithm.
Step 3: Assign each chromosome a fitness score using the fitness function defined for the genetic algorithm.
Step 4: Select the parents. Chromosomes with high fitness scores are prioritized over those with lower scores when creating the next generation of offspring.
Step 5: Apply crossover and mutation to the selected parents, with the specified crossover and mutation probabilities, to generate offspring.
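The five steps above can be sketched in a few lines of Python. This is a minimal sketch under several assumptions that the source does not specify: tournament selection is used as the way of prioritizing fitter parents, single-point crossover and bit-flip mutation are used as the operators, the best chromosome is carried forward unchanged (elitism), and the fitness function is a toy stand-in. In practice the fitness would more likely be a classifier's validation accuracy on the selected features.

```python
import random


def run_ga(fitness, n_features, pop_size=20, generations=30,
           crossover_prob=0.8, mutation_prob=0.05, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: random initial population of binary-encoded feature subsets.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Step 3: score every chromosome.
        scores = [fitness(c) for c in population]

        # Step 4: tournament selection favours chromosomes with higher fitness.
        def pick():
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        # Step 5: crossover and mutation produce the next generation.
        offspring = [best[:]]  # elitism: keep the best chromosome unchanged
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_prob:
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < mutation_prob else g for g in child]
            offspring.append(child)
        population = offspring
        best = max(population, key=fitness)
    return best


USEFUL = {0, 3, 5}  # hypothetical indices of informative features


def toy_fitness(chromosome):
    """Toy score: +1 net for each useful feature kept, -1 for each useless one."""
    hits = sum(1 for i in USEFUL if chromosome[i])
    return 2 * hits - sum(chromosome)


best_subset = run_ga(toy_fitness, n_features=8)
print(best_subset, toy_fitness(best_subset))
```

Because of elitism, the best fitness never decreases from one generation to the next, and fixing the seed makes a run repeatable.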
22.8 Performance Evaluation Outcomes:
Fig. 22.3 Safe app. predicted using neural network algorithm
Fig. 22.4 Fraud app. predicted using neural network algorithm
22.9 Result Analysis
The accuracy score in machine learning is the proportion of a model's predictions that are correct. It is computed by dividing the number of correct predictions by the total number of predictions. The number of neurons in the hidden layer affects how well the ANN predicts outcomes; although this is not always the case, studies have shown that having more neurons brings model outputs closer to the training data. The ANN model is trained so that it learns to represent the dataset.
22.9.1 Accuracy
The model achieved an accuracy of 97.3%.
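The training steps of Sect. 22.7.2.1 and the accuracy score can be illustrated together with a single neuron. The sketch below is not the chapter's Keras model: it is a plain-Python stand-in that computes Z = W0 + W1·X1 + … + Wn·Xn, passes it through a sigmoid, and reduces the loss by batch gradient descent. The toy samples, learning rate, and epoch count are assumptions chosen only so the example is easy to follow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, x):
    """Step 1: Z = W0 + W1*X1 + ... + Wn*Xn, squashed to a probability."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)


def train(samples, labels, epochs=200, learning_rate=0.5):
    weights = [0.0] * (len(samples[0]) + 1)
    n = len(samples)
    for _ in range(epochs):
        # Step 2: error term = predicted minus actual, per sample.
        errors = [forward(weights, x) - y for x, y in zip(samples, labels)]
        # Step 3: one gradient-descent step on the mean cross-entropy loss.
        weights[0] -= learning_rate * sum(errors) / n
        for j in range(len(samples[0])):
            grad = sum(e * x[j] for e, x in zip(errors, samples)) / n
            weights[j + 1] -= learning_rate * grad
    return weights


def accuracy(predicted, actual):
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Toy data: two permission flags per app; the label follows the first flag.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]

weights = train(samples, labels)
predictions = [1 if forward(weights, x) >= 0.5 else 0 for x in samples]
score = accuracy(predictions, labels)
print(score)
```

A multi-layer network repeats the same forward, loss, and update cycle, with one weight matrix per layer.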
22.10 Conclusion
Threats to Android shopping platforms are growing every day, and most of them spread through malicious applications, so it is crucial to develop a framework that can accurately detect malware in shopping apps. Machine learning-based approaches are used where signature-based approaches fail to detect new malware variants that pose zero-day threats. The proposed methodology uses an evolutionary genetic algorithm to obtain an optimized feature subset on which machine learning algorithms can be trained effectively. With artificial neural network (ANN) classifiers, a respectable classification accuracy can be retained while working with lower-dimensional feature sets, which reduces the classifiers' training complexity. The resulting model may be scaled to other malware applications and adjusted as needed with fresh samples, and it can help Android security specialists analyze harmful APKs more quickly. Future work can examine the effect of the genetic algorithm on other machine learning methods and use larger datasets for better results.
References
1. Market Share of Smartphones [Online]. [Fetched: 30 April 2020]
2. Aggregate of Android applications on Google Play [Online]. [Fetched: 30 April 2020]
3. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Machine-learning-based Android malware detection based on significant permission identification. IEEE Trans. Ind. Inform. 14(7), 3216–3225 (2018)
4. Sufatrio, Tan, D.J.J., Chua, T.W., Thing, V.L.L.: Securing android: a survey, problems, and taxonomy. ACM Comput. Surv. 47(4), 58(45pp) (2015)
5. Qing, S.H.: Android security research progress. Softw. J. 27(1), 45–71 (2016)
6. Lopes, J., Serrao, C., Nunes, L., Almeida, A., Oliveira, J.: An outline of machine learning algorithms for malware identification in android. In: 7th International Symposium on Digital Forensics and Security (ISDFS), Barcelos, Portugal, pp. 1–6 (2019)
7. Choudhary, M., Kishore, B.: HAAMD: hybrid analysis for android malware detection. In: Activities of the 2018 International Conference on Computer Communications and Informatics (ICCCI), Coimbatore, India, pp. 1–4 (2018)
8. Ahvanooey, M.T., Li, Q., Rabbani, M., Rajput, A.R.: An overview on cellphones security: software vulnerabilities, malware, and strikes. Int. J. Adv. Comput. Sci. Appl. 8(10), 30–45 (2017)
9. Souri, A., Hosseini, R.: A state-of-the-art study of the methods of malware detection utilizing the techniques of data mining. Human Cent. Comput. Inf. Sci. 8
Chapter 23
A Satellite-Based Rainfall Prediction Model Using Convolution Neural Networks
T. Lakshmi Sujitha, T. Anuradha, and G. Akshitha
Abstract Rainfall is a climatic factor that affects many human activities, including agriculture, transportation, forestry, and tourism. Governments have voiced concern about how difficult rainfall is to predict. Rainfall prediction is essential because rainfall is most frequently associated with adverse natural events such as landslides, flooding, mass movements, and avalanches, which have had lasting effects on society. A reliable method for predicting rainfall therefore makes it feasible to take precautionary measures against these events. Predicting rainfall is difficult because meteorological systems do not behave consistently over time. This study forecasts rainfall using deep learning techniques: we propose a model for forecasting the rate of rainfall using a convolutional neural network (CNN). The resulting rainfall model has an average root mean squared error of 2.82% and a standardized mean absolute error of 2.43%. Our method is novel in that the neural network-based model uses cyclone datasets to forecast the rate of precipitation.
T. Lakshmi Sujitha (B) · T. Anuradha · G. Akshitha
Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_23
23.1 Introduction
Rainfall prediction is a crucial component of applications that involve the management and planning of water resources. Over the years, there have been several attempts to collect rainwater. Many earlier papers have used radar datasets [1, 2] and satellite datasets [3]. One of the most challenging areas of meteorology is short-term weather prediction, that is, prediction over a brief period of time. This is because severe weather events are occurring more often in many regions of the world. Operational meteorologists rely on weather radar and satellites as essential nowcasting instruments [4]. To forecast precipitation in an area precisely and take precautionary action against natural events such as heavy rainfall, meteorologists employ various reliable rainfall forecasting approaches and techniques. In one study, a deep learning model for predicting rainfall is developed from meteorological data captured by a nation's monitoring stations [5]. Being able to forecast the local climate and plan ahead is a big advantage for managing agricultural fields and for taking early safeguards that lessen risks to life and property. The success of agriculture depends on rainfall, and forecasting also helps to conserve water resources. Past rainfall data has helped farmers with crop management, leading to improved economic growth in the nation. In forecasting, specific criteria that evaluate a set of input data are used to estimate future instances. It is a fundamental element of predictive modelling, a subcategory of big data that evaluates effort, usefulness, and changes using both present and past data [6]. Rain rate forecasting from satellite images is a challenging task. Precipitation is an environmental factor that affects a wide variety of human activities, such as forestry, construction, agriculture, and tourism. Satellite image analysis provides early warning and forecasting for the prevention and mitigation of disasters. The main goal of the prediction models is to produce multi-output forecasts of rainfall data from satellite images.
For rainfall nowcasting, a convolutional neural network design that incorporates several fundamental rainfall factors, such as maximum and minimum temperature, humidity, radiation, precipitation, and wind speed, may be helpful.
23.2 Review of Literature
In [4], the authors applied a deep learning approach to satellite data analysis and developed DeePSat and a basic convolutional neural network (CNN) for rapid forecasting of auxiliary data that may be valuable for weather forecasting. The results were analysed and compared with those of similar approaches. The DeePSat model achieved a mean normalized deviation of 3.84% across all geospatial outputs. In [5], the authors used an artificial neural network with forward and backward propagation, along with AdaBoost, Gradient Boosting, and XGBoost, to forecast rainfall using machine learning and deep learning algorithms. The artificial neural network with forward and backward propagation performed better than the boosting techniques and was found more suitable for forecasting the next day's rain.
In [6], the authors propose a long short-term memory (LSTM)-based model that uses deep learning to predict daily rainfall in Jimma. The dataset was gathered in the Oromia region of southwest Ethiopia. Data from 1985 to 2012 were used to train the model, and data from 2013 to the first half of 2015 were used to validate it. LSTM-based rainfall prediction models were found to be useful for a range of applications that require rainfall prediction, including smart agriculture. In [7], rain was predicted from satellite images. The work forecasts precipitation for regions where ground data is scarce and satellite images must be relied upon, using training and ground-truth data obtained through the accessible API provided by Meteomatics. Lightly's score set technique is used to diversify the dataset, increasing validation accuracy while cutting training time by around half. In [8], a modified back-propagation neural network (BP-NN) algorithm forms the basis of a short-term rainfall forecast model that uses a variety of atmospheric factors. The suggested model performed well, with a TFR greater than 96% and an FFR of nearly 40%. The approach raised TFR by around 10%, while FFR remained in line with prior research. In [9], a multi-task convolutional method was used to predict short-term precipitation. The learning problem is formulated as an end-to-end multi-site neural network model, which allows information captured at one site to be transferred to other related sites and the connections between sites to be modelled. The authors reported that the composite model offers a useful modelling framework for capturing the 119 nonlinear situations of the monthly rainfall sequence, with more accurate forecasting results.
In [10], the authors applied a machine learning approach and employed a Convolutional 3D GRU (Conv3D-GRU) model to predict short-term rainfall intensity. The model was evaluated by forecasting sonar echo patterns for the next few hours in order to estimate short-term precipitation. It extracts both spatial and temporal features from sonar echo maps while reducing the discrepancy between predicted and actual precipitation values. In [11], a comparison of modern machine learning techniques for time-series rainfall forecasting, bidirectional LSTM networks were proposed as a promising tool for predicting precipitation. The researchers defined assessment metrics for the trained rainfall prediction models and evaluated the models using the best hyperparameter search results. In [12], a study titled 'Rainfall Estimate from Remote Data Using Artificial Neural Networks and Long Short-Term Memory (LSTM)' (abbreviated as PERSIANN), rainfall estimates are projected. The study covers three districts in the USA. The storm forecasts are compared and integrated with the PERSIANN format, along with the numerical model produced from the initial version of Rapid Refresh (RAPv1.0). In [3], the authors proposed the use of the Inception-v3 model on a dataset of satellite images from October 2017. Two approaches were employed for training: in one, only the final fully connected layer was trained on top of a pre-trained model, and in the other, the model was built from scratch, with all layers of Inception-v3 being trained.
Fig. 23.1 System architecture
23.3 Proposed Work
23.3.1 Architecture Diagram
Convolutional neural networks (CNNs) are a type of deep neural network in which neurons have weights and biases. Figure 23.1 shows the proposed architecture. Each neuron receives input and performs a dot product followed by a nonlinear function. The architecture differs from other neural networks in that it also works with multi-dimensional input. An image is passed to the CNN as input, where it is processed and classified into a certain category. A computer interprets an image as an array of pixels whose size depends on the image resolution: x * y * z, where x = height, y = width, and z = dimension.
23.3.2 Description of Datasets and Tools
23.3.2.1 Dataset
The dataset that we collected from Kaggle (https://www.kaggle.com/sshubam/datasets) consists of images from various classes and image categories. The input images (Fig. 23.2) used for training and validation are separated out from the dataset, and the model is trained on the training data before being applied. The dataset contains all infrared and raw cyclone imagery captured by INSAT3D over the Indian Ocean from 2012 to 2021, together with the intensity of each cyclone image in knots. Each image was labelled by matching its timestamp with the corresponding position in the frequency graph of its cyclone directory. There are 512 images in JPEG format.
Fig. 23.2 Input image
23.3.3 Design Methodology
23.3.3.1 Convolution Neural Network
The CNN, whose architecture is shown in Fig. 23.3, takes an input of shape 512 * 512 * 3, where 512 * 512 is the image size and 3 is the number of colour channels. The model consists of five 2D convolution layers and four max pooling layers. The max pooling operation extracts the maximum value from each region of a convolutional layer's output; the region is typically defined by a small kernel. A pooling layer is applied to each feature map to reduce its dimensions. These layers are followed by three fully connected (dense) layers. All hidden layers use ReLU activations, which give the network its nonlinearity. ReLU, Softmax, tanh, and sigmoid are commonly used activation functions, and each has its own use.
Fig. 23.3 CNN architecture
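To show what one convolution and pooling stage of such a network does, the sketch below runs a tiny single-channel image through a convolution, a ReLU, a 2 × 2 max pooling step, and a flatten. It is an illustration only: the 5 × 5 image and the 2 × 2 filter are hypothetical, no padding is used, and the chapter's actual kernel and pooling sizes are not stated in the text.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


def flatten(feature_map):
    """Turn a 2D feature map into one long vector for the dense layers."""
    return [v for row in feature_map for v in row]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [1, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter

features = flatten(max_pool(relu(conv2d(image, kernel))))
print(features)  # [0, 1, 0, 2]
```

The 5 × 5 input becomes a 4 × 4 feature map after the convolution and a 2 × 2 map after pooling. By the same arithmetic, if the convolutions preserved the spatial size and each of the four pooling layers halved it, a 512 × 512 input would shrink to 32 × 32; that is an assumption about the configuration, not something the text confirms.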
23.3.3.2 Description of Algorithms
The CNN algorithm works as follows:
(1) Apply filters to the input to create feature maps.
(2) Add a pooling layer to each feature map to reduce the dimensions of the input image.
(3) Use batch normalization to speed up training and improve accuracy.
(4) Feed the vector into a fully connected neural network.
(5) The goal of this layer is to predict the input test data using the features learned from the training data.
(6) Apply a ReLU function f(x) to increase nonlinearity.
(7) Flatten the pooled images into one long vector.
(8) End.
(9) Return the trained CNN.
23.3.3.3 Evaluation Metrics
ReLU: The most popular activation function in deep learning models is the Rectified Linear Unit (Eq. 23.1). It returns 0 for any negative input and returns x for any positive input. Hence, it can be written as
$$f(x) = \max(0, x). \tag{23.1}$$
Softmax: The Softmax function (Eq. 23.2) converts a vector of K real-valued scores into a probability distribution over K classes. It is employed as the activation function in classification problems with more than two class labels, where class membership must be expressed across all labels; it can be written as shown below.
$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}. \tag{23.2}$$
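The two activation functions in Eqs. 23.1 and 23.2 can be written in a few lines. This is a minimal sketch in plain Python for illustration, not the code used in the chapter. The max-subtraction in `softmax` is a standard numerical-stability step that is not mentioned in the text.

```python
import math


def relu(x):
    """Rectified Linear Unit, Eq. 23.1: max(0, x)."""
    return max(0.0, x)


def softmax(scores):
    """Softmax, Eq. 23.2, over a list of real-valued scores.

    Subtracting the largest score first leaves the result unchanged
    but avoids overflow in exp() for large inputs.
    """
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print([relu(v) for v in (-2.0, 0.0, 3.5)])    # [0.0, 0.0, 3.5]
probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probabilities])   # largest score gets the largest share
```

The outputs of `softmax` are positive and sum to 1, which is why it suits the output layer of a multi-class classifier, whereas `relu` is applied element-wise in the hidden layers.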
Mean Absolute Error (MAE): The MAE (Eq. 23.3) is the average of the absolute deviations between predictions and actual observations.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \tag{23.3}$$
where $y_i$ is the value actually observed for instance $i$ in the dataset, $\hat{y}_i$ is the value predicted for that instance, and $n$ is the number of instances in the dataset. The forecasting model can therefore be expressed as (Eq. 23.4)
$$\mathrm{rain} = \varphi(\delta), \tag{23.4}$$
where $\varphi$ is the trained convolutional neural network, $\delta = \{\delta_1, \delta_2, \ldots, \delta_n\}$, $\delta_j$ is an input feature with $1 \le j \le n$, and $n$ is the number of input features.
Root Mean Squared Error: The root mean squared error (Eq. 23.5) quantifies the difference between the values that a model predicts and the corresponding observed values. It provides an evaluation of how accurately the method is able to forecast the output values.
$$\mathrm{RMSE} = \sqrt{\frac{1}{W} \sum_{i=1}^{N} w_i u_i^2}. \tag{23.5}$$
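The two error metrics can be sketched as follows. The text does not define the symbols in Eq. 23.5, so this example assumes that u_i is the error of the i-th prediction, w_i is a per-sample weight, and W is the sum of the weights. The sample values are made up for illustration and are not the chapter's data.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE, Eq. 23.3: mean of |y_i - y_hat_i| over n samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted))
    return total / len(actual)


def root_mean_squared_error(actual, predicted, weights=None):
    """Weighted RMSE in the form of Eq. 23.5.

    Assumption: u_i = y_i - y_hat_i, w_i is a per-sample weight and W is
    the sum of the weights. With all weights equal to 1 this reduces to
    the ordinary RMSE.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if weights is None:
        weights = [1.0] * len(actual)
    total_weight = sum(weights)
    weighted_squares = sum(
        w * (y - y_hat) ** 2 for w, y, y_hat in zip(weights, actual, predicted)
    )
    return math.sqrt(weighted_squares / total_weight)


observed = [3.0, 5.0, 2.0, 7.0]   # made-up rain rates
forecast = [2.0, 5.0, 4.0, 8.0]   # made-up predictions
print(mean_absolute_error(observed, forecast))       # 1.0
print(root_mean_squared_error(observed, forecast))   # sqrt(1.5)
```

Because the errors are squared before averaging, RMSE penalises large deviations more heavily than MAE and is never smaller than MAE on the same data.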
23.4 Result Analysis
The collection has a total of more than 512 images. The dataset was mounted from a drive and loaded into Google Colab. We used a CNN model for image identification and rain rate estimation, and the model was evaluated with the Mean Absolute Error (MAE) metric. The findings were examined and compared with those from similar methodologies. Across all satellite products, a mean absolute error of 2.43% and a root mean squared error of 2.82% were found (Table 23.1), demonstrating the power of the CNN model in this method. The model was created and tested in the Google Colab environment, a free notebook environment that can connect to Google Drive. Colab comes with a number of pre-installed deep learning libraries, which can be added to a notebook using the import keyword and the library name. Table 23.1 lists the errors obtained.

Table 23.1 Errors versus metrics
| Metrics | Error % |
| --- | --- |
| Mean absolute error (MAE) | 2.439 |
| Root mean squared error (RMSE) | 2.825 |

Additionally, as the number of epochs increases, so does the training accuracy. When fewer images are placed in the validation dataset, the validation accuracy and loss change less steadily than the training accuracy and loss. As the model distinguishes the images, reduces dimensionality for improved performance, and trains the CNN after preprocessing the images, changes in loss and accuracy can be observed with respect to epochs. Convolution with nonlinearity (ReLU), pooling, and fully connected layers are the three operations implemented in this architecture in the form of several layers. Figure 23.4 shows the result of the proposed model: sample satellite images and the rain rate predicted for each image. Here, o denotes the image's label, and p is the predicted rain rate for that specific image.
Fig. 23.4 Predicted rain rate on selected images
23.5 Conclusion
As part of this research, a rainfall prediction system based on geostationary satellite images was developed. We used a CNN model to analyze images taken during cyclones and estimated the expected rain rate in each image. The model was evaluated with the MAE metric, and the findings were examined and contrasted with those of comparable meteorological data. The mean absolute error and root mean squared error for all satellite products were 2.43% and 2.82%, respectively.
References
1. Han, L., Sun, J., Zhang, W.: Convolutional neural network for convective storm nowcasting using 3D doppler weather radar data. IEEE Trans. Geosci. Remote Sens. 58, 1487–1495 (2020)
2. Dixon, M., Weiner, G.: TITAN: thunderstorm identification, tracking analysis and nowcasting—a radar-based methodology. J. Atmos. Ocean. Technol. 10, 785–797 (1993)
3. Boonyuen, K., Kaewprapha, P., Srivihok, P.: Daily rainfall forecast model from satellite image using Convolution neural network. In: 2018 International Conference on Information Technology (InCIT), pp. 1–7 (2018)
4. Ionescu, V.-S., Czibula, G., Mihuleţ, E.: A deep learning model for prediction of satellite images for nowcasting purposes. In: 25th Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2021), pp. 622–631 (2021)
5. Meena, B., Preethi, Gowtham, R., Aishvarya, S., Karthick, S., Sabareesh, D.G.: Rainfall prediction using machine learning and deep learning algorithms. In: Kariniotakis, G. (ed.) IWA Publishing, pp. 3448–3461 (2021)
6. Endalie, D., Haile, G., Taye, W.: Deep learning model for daily rainfall prediction: case study of Jimma, Ethiopia. Water Supply 22(3), 3448–3461 (2022)
7. Susmelj, I., Heller, M., Wirth, P., Prescott, J., Ebner, M., et al.: Predicting Rain from Satellite Images, on articlesusmelj2020lightly in lightyai (2020)
8. Liu, Y., Zhao, Q., Yao, W., et al.: Short-term rainfall forecast model based on the improved BP–NN algorithm. Sci. Rep. 9, 19751 (2019)
9. Qiu, M., et al.: A short-term rainfall prediction model using multi-task convolutional neural networks. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 395–404 (2017). https://doi.org/10.1109/ICDM.2017.49
10. Sun, D., Wu, J., Huang, H., Wang, R., Liang, F., Xinhua, H.: Prediction of short-time rainfall based on deep learning. In: Mathematical Problems in Engineering, vol. 2021, Article ID 6664413 (2021)
11. Barrera-Animas, A.Y., Oyedele, L.O., Bilal, M., Akinosho, T.D., Delgado, J.M.D., Akanbi, L.A.: Rainfall prediction: a comparative analysis of modern machine learning algorithms for time-series forecasting. In: Machine Learning with Applications, vol. 7, p. 100204 (2022). ISSN 2666-8270
12. Akbari Asanjan, A., Yang, T., Hsu, K., Sorooshian, S., Lin, J., Peng, Q.: Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res.: Atmospheres 123, 12543–12563 (2018)
Chapter 24
A Smart Enhanced Plant Health Monitoring System Suitable for Hydroponics
I. N. Venkat Kiran, G. Kalyani, B. Mahesh, and K. Rohith
Abstract Plants require adequate water and sunlight for efficient growth. Because of our daily activities, we often forget to water the plants, which may lead to the death of plants in our garden. Farmers may be unable to look after their crops for various reasons, such as health issues or being out of station. For crops, this may reduce production or yield, and the farmers have to bear the loss. To overcome such losses, we introduce an automated irrigation system. With this system, there is no need to worry about supplying water to the plants in a garden or the crops in a field. To develop the system, we consider several factors of the soil and the atmosphere and use sensors to detect them. Overall, the system is based on the Internet of Things (IoT), which uses sensors and their connections. Based on the requirements of the plant, the system releases the required amount of water. In this way the plant gets water whenever it is needed, so it does not matter if we forget to water it for some days. Farmers also no longer have to keep constant watch over the water supply to their fields.
I. N. Venkat Kiran · G. Kalyani (B) · B. Mahesh · K. Rohith
Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
e-mail: [email protected]
I. N. Venkat Kiran e-mail: [email protected]
B. Mahesh e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_24
24.1 Introduction
Proper monitoring is required to ensure a plant's optimal development. Plants play an important role in the environmental cycle and serve as the foundation for food. Many people are preoccupied with their jobs and neglect to water their plants, so many plants perish or yield less. To reduce such incidents, plant monitoring needs to be automated, and this can be achieved using IoT. IoT makes it possible to take correct values from the sensors and to choose the appropriate action, which makes it suitable for plant monitoring. Using numerous sensors, IoT enables us to monitor moisture content and temperature data; the sensors detect changes in moisture and temperature. Knowing the moisture content helps us provide water to the plant if it is dehydrated. When the necessary volume of water is attained, the watering is stopped automatically.
Saha et al. [1] presented a "Smart Irrigation System Using Arduino and GSM Module". The moisture sensor measures the amount of moisture and transmits the result to the Arduino. These values are compared with the values already stored in the program. If the value is below the required value, a message alert is sent to the crop owner, and water is pumped to the plant until the moisture level returns to the desired level.
According to Jariyayothin et al. [2], in "IoT Backyard: Smart Watering Control System", crop development is mostly influenced by moisture and temperature. Therefore, those two features are the primary emphasis of our irrigation system. Using computers or mobile devices, people can also keep an eye on their crops, and from the control panel on their devices they can even operate water pumps. Farmers benefit by saving both time and money.
Prathamesh Pawar (2022) [3] presented an "IoT-Based Smart Plant Monitoring System". In that paper, the moisture level is detected by the moisture sensor and displayed in the Blynk app. If the water level is below the required level, a notification alert is sent to the user through the NodeMCU WiFi module. The user can then turn on the motor, and the plant is watered. When the plants have received enough water, another message alert is sent to turn off the motor.
The work in [4], titled "IoT-Based Plant Monitoring System", primarily uses three sensors: DHT11, moisture, and ultrasonic. The moisture sensor detects the moisture level, and the DHT11 sensor detects the temperature and humidity. When the moisture sensor detects a low moisture level, the motor turns on. When the plant has received a sufficient amount of water, the motor is switched off automatically. An ultrasonic sensor is used to check the water level in the water tank. All the information is displayed in an Android app.
24.2 Review of Literature
The study in [5] designed an automated plant watering device that uses a copper plate sensor, which works as an electrode that passes signals to the Arduino board. The authors concluded that the device can reduce the usage of electrical energy. The Arduino board commands the solenoid valve to open when the soil is dry; when the soil is wet, the water flow is blocked by closing the solenoid valve. The moisture values required by the Arduino board are obtained from the soil moisture sensor.
In [6], the main aim is to make water available to plants without manual involvement. Accurate measurement and watering improve the yield because nutrients are supplied to the field accurately, as a balanced diet for plants at the various stages of their growth. It is a user-friendly system. After water is supplied to the plant, the sensor waits for some time before measuring again and then sends the values to the main NodeMCU board. If the measurements are acceptable, the NodeMCU goes to sleep.
In [7], it is argued that humans may not be able to maintain the dampness of the soil, whereas robotized control offers more accuracy and precision. Pipe sizes are dictated by other specifications, such as the size of the drip irrigation pipe and the size of the valve outlet. Drones can also be used in agricultural activities; with their help, farmers gain oversight of nutrient requirements, which saves them time and money. Robots are more accurate and can work without fatigue.
In [8], it is noted that relying on humans to maintain soil moisture levels leads to a decrease in the efficiency of the soil. For the proper health and growth of the plant, moisture must be maintained precisely. An automatic system can therefore avoid the problems of over-irrigation or no irrigation in a cost-efficient way. Nowadays, plants can be watered in two ways: surface drip and sprinklers. The Arduino is programmed with estimated threshold values so that it can decide on its own whether to water the plant.
In [9], real data was collected and used in the implementation of the system. Soil moisture is predicted by analyzing weather data, crop coefficients, and the amount of irrigation water. The authors took up the problem of soil moisture prediction and designed a methodology that uses trained models. A model can be built with the help of programming and simulation. The information can be useful for developing the irrigation system and for determining the distance between crop rows.
In [10], the Internet of Things (IoT) combines real-time data with virtual system networks. With the help of IoT, the agriculture field has improved considerably in recent times. It stores real-time data, processes it, and analyzes it. Thus, much can be developed based on IoT, such as automated irrigation systems. The use of semiconductors as Integrated Circuit (IC) material has made microcontroller technology more advanced. The Arduino microcontroller is designed in a similar way to make it easier for designers.
24.3 Materials and Methods
24.3.1 Proposed Architecture
Figure 24.1 shows the design of the proposed work. The system has sensors for measuring moisture, temperature, and humidity. The Arduino receives the measured values and, following a conditional check, sends a signal to the relay module attached to it. Batteries and the Arduino are linked to the motor. A moisture sensor detects the moisture level in the soil, and its readings are passed to the Arduino board. A DHT11 sensor is used to gauge the temperature and humidity of the environment. Temperature and moisture levels are crucial for plant growth. A relay module is used to give a signal to the water pump; it works as a switch for the pump. When the relay module sends the signal, the solenoid valve starts pumping the water.
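The conditional check at the centre of this architecture can be illustrated with a small simulation. The sketch below is written in Python purely to show the decision logic; the actual firmware would run on the Arduino, and the `Reading` type, the function name, and the 40% threshold are hypothetical choices rather than values from the chapter.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One set of sensor values passed to the controller (illustrative)."""
    moisture_percent: float    # from the soil moisture sensor
    temperature_c: float       # from the DHT11
    humidity_percent: float    # from the DHT11


def relay_should_close(reading, moisture_threshold=40.0):
    """Conditional check performed by the controller.

    Returns True when the relay should energise the solenoid valve,
    i.e. when the soil is drier than the (hypothetical) threshold.
    """
    return reading.moisture_percent < moisture_threshold


dry = Reading(moisture_percent=22.0, temperature_c=31.0, humidity_percent=48.0)
wet = Reading(moisture_percent=65.0, temperature_c=27.0, humidity_percent=60.0)
print(relay_should_close(dry), relay_should_close(wet))   # True False
```

Keeping the decision in one small function mirrors the data flow in Fig. 24.1: the sensors only supply values, and the relay only acts on the controller's result.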
Fig. 24.1 Architecture for the plant monitoring
24.3.2 Materials
24.3.2.1 Arduino UNO
The Arduino UNO is a microcontroller board with both analogue and digital input and output pins: 14 digital pins and six analogue pins. The Arduino IDE is used for programming, and a USB Type-B connector is used to upload the code. For this project, we use the 5 V pin, two ground pins that are connected to the sensors, and the Vin pin.
24.3.2.2 Moisture Sensor
A moisture sensor is used to check the moisture level in the soil. It takes readings from the soil and converts them into a format that can be compared with the required level of water. The readings are passed to the Arduino board. The moisture sensor has four pins: VCC, ground, a digital output pin, and an analogue output pin.
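The conversion of a raw sensor reading into a comparable value can be sketched as a linear mapping. The Arduino UNO's analogue inputs return 10-bit values (0 to 1023). The calibration readings `dry_raw` and `wet_raw` below are hypothetical, and the sketch assumes a probe whose reading is higher when the soil is drier. It is a Python illustration, not the chapter's firmware.

```python
def moisture_percent(raw, dry_raw=1023, wet_raw=300):
    """Map a raw analogue reading to 0-100 % soil moisture.

    dry_raw and wet_raw are hypothetical calibration readings taken with
    the probe in air and in water. Readings outside the calibrated range
    are clamped to 0 or 100.
    """
    if dry_raw == wet_raw:
        raise ValueError("calibration readings must differ")
    fraction = (dry_raw - raw) / (dry_raw - wet_raw)
    return max(0.0, min(100.0, 100.0 * fraction))


for raw in (1023, 660, 300, 150):
    print(raw, round(moisture_percent(raw), 1))
```

Expressing moisture as a percentage makes the threshold independent of the particular probe, so only the two calibration readings need to change when the sensor is replaced.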
24.3.2.3 Solenoid Valve
A solenoid valve is used for pumping the water. We use a 24 V pump. When it receives the signal from the relay module, it starts pumping the water. It has two wires, one connected to the relay module and the other to the battery.
24.3.2.4 Relay Module
A 5 V relay module is used in this project. It has six pins: VCC, ground, signal, normally open, common, and normally closed. The module is housed in blue plastic and has an LED indicator. The relay module is used to give a signal to the water pump and works as a switch for it. When the relay module sends the signal, the solenoid valve starts pumping the water. The relay module is connected to the moisture sensor as well.
24.3.2.5 Jumper Wires
Jumper wires are employed for circuit connections. Three varieties are available: male to male, male to female, and female to female. Using jumper wires avoids soldering.
24.3.2.6 Temperature and Humidity Sensor
For temperature sensing, a DHT11 sensor is used to gauge the temperature and humidity of the environment. Temperature and moisture levels are crucial for plant growth.
24.4 Experimentation
A soil moisture sensor is employed in this system to measure the soil's moisture content. When the soil's water content drops, the sensor detects it, the moisture level is shown on an LCD, and the water pump is activated to start supplying the plant with water. Once the plant has received enough water and the soil has become moist, the sensor detects the appropriate level of moisture and the water supply is shut off. The temperature and humidity sensor measures the temperature and humidity of the environment, and the data is displayed on the LCD. The LCD may also be used to display an ON/OFF signal for the motor. A breadboard can be used to construct the circuits. The DHT11, the LCD, and the relay module are connected directly to the Arduino UNO microcontroller. The moisture sensor is connected to the relay module. Values are displayed on the LCD using the resistor connected to the breadboard; the resistor can be used to adjust the LCD's brightness. The Arduino has a USB connection for an external power source. The relay module is connected to the solenoid valve, which controls the water supply. The LCD values change with the values detected by the sensors. When the plant is dry, the DHT11 displays the room temperature value by default (Figs. 24.2 and 24.3). This prototype is intended for the common plants usually grown in a garden. The prototype started working after an external supply was connected to the Arduino. If the plant's moisture level falls, the indicator light on the relay module turns on and the LCD displays the decreasing trend in moisture values; at the same time, the Arduino turns on the motor. The motor is not turned off until the moisture value increases. The test scenarios are shown in Table 24.1 (Fig. 24.4).
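The on/off behaviour described above can be simulated over time. The following Python sketch is an illustration only: the thresholds, drying rate, and watering rate are hypothetical, and the use of two separate thresholds (hysteresis) is an assumption added here so that small fluctuations around a single threshold do not toggle the relay repeatedly.

```python
def simulate(moisture, steps, dry_below=35.0, wet_above=60.0,
             drying_rate=3.0, watering_rate=8.0):
    """Simulate the pump state over time with two thresholds.

    All numeric values are hypothetical. The pump switches on when the
    moisture falls below dry_below and stays on until the moisture rises
    above wet_above.
    """
    pump_on = False
    log = []
    for _ in range(steps):
        if moisture < dry_below:
            pump_on = True
        elif moisture > wet_above:
            pump_on = False
        log.append((round(moisture, 1), "ON" if pump_on else "OFF"))
        # Soil gains moisture while watering and loses it otherwise.
        moisture += watering_rate if pump_on else -drying_rate
    return log


# Each line mimics what the LCD might show: moisture value and motor state.
for moisture, motor in simulate(moisture=40.0, steps=10):
    print(f"Moisture: {moisture:5.1f} %  Motor: {motor}")
```

In this run the motor switches on once the moisture drops to 34% and stays on until the reading exceeds 60%, which matches the dry and wet scenarios listed in Table 24.1.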
Fig. 24.2 Components used in the model development
Fig. 24.3 Circuit diagram of the proposed model

Table 24.1 Test results of the developed prototype
| S. No | Testing scenarios | Expected results | Test result |
| --- | --- | --- | --- |
| 1 | Moisture sensor | Detects the moisture level and displays it on the LCD | Successful |
| 2 | Moisture sensor is placed in the dry condition | Displays the moisture level and turns on the motor | Successful |
| 3 | Moisture sensor is placed in the wet condition | Displays the moisture level and turns off the motor | Successful |
| 4 | Temperature sensor placed at wet soil | Displays a value below the room temperature | Successful |
| 5 | Temperature sensor placed at dry soil | Displays the room temperature | Successful |

Fig. 24.4 Prototype of the developed model

24.5 Conclusion
A smart plant monitoring system built on the Internet of Things has been developed successfully. It was created by integrating the functionality of every piece of hardware used. The placement of every module has been thoughtfully considered and arranged, which helps the unit function as well as it can. The system's ability to run autonomously has been evaluated. The moisture sensors gauge the moisture content (or moisture level) of the various plants. The moisture sensor alerts the microcontroller if the moisture level is below the acceptable level, which causes the water pump to switch on and feed the appropriate plant with water. The mechanism automatically stops when the target moisture level is attained, and the water pump is shut off.
References
1. Saha, H.N., Banerjee, T., Saha, S.K., Das, A., Dutta, A., Roy, A., Kund, S., Patra, A., Neogi, A., Bandyopadhyay, S., Das, S.: Smart irrigation system using Arduino and GSM module. In: 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 532–538. IEEE, 2018
2. Jariyayothin, P., Jeravong-Aram, K., Ratanachaijaroen, N., Tantidham, T., Intakot, P.: IoT backyard: smart watering control system. In: 2018 Seventh ICT International Student Project Conference (ICT-ISPC), pp. 1–6. IEEE, 2018
3. Nehra, V., Sharma, M., Sharma, V.: IoT based smart plant monitoring system. In: 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 60–65. IEEE, 2023
4. Athawale, S.V., Solanki, M., Sapkal, A., Gawande, A., Chaudhari, S.: An IoT-based smart plant monitoring system. In: Smart Computing Paradigms: New Progresses and Challenges: Proceedings of ICACNI 2018, Volume 2, pp. 303–310. Springer Singapore, Singapore (2019)
5. Toai, T.K., Huan, V.M.: Implementing the Markov decision process for efficient water utilization with Arduino board in agriculture. In: 2019 International Conference on System Science and Engineering (ICSSE), pp. 335–340. IEEE, 2019
6. Devika, C.M., Bose, K., Vijayalekshmy, S.: Automatic plant irrigation system using Arduino. In: 2017 IEEE International Conference on Circuits and Systems (ICCS), pp. 384–387. IEEE, 2017
7. Siva, K.N., Bagubali, A., Krishnan, K.V.: Smart watering of plants. In: 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), pp. 1–4. IEEE, 2019
8. Divani, D., Patil, P., Punjabi, S.K.: Automated plant watering system. In: 2016 International Conference on Computation of Power, Energy Information and Communication (ICCPEIC), pp. 180–182. IEEE, 2016
9. Thilagavathi, G.: Online farming based on embedded systems and wireless sensor networks. In: 2013 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC), pp. 71–74. IEEE, 2013
10. Aygün, S., Güneş, E.O., Subaşı, M.A., Alkan, S.: Sensor fusion for IoT-based intelligent agriculture system. In: 2019 8th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), pp. 1–5. IEEE, 2019
Chapter 25
A Survey: Classifying and Predicting Features Based on Facial Analysis
J. Tejaashwini Goud, Nuthanakanti Bhaskar, Voruganti Naresh Kumar, Suraya Mubeen, Jonnadula Narasimharao, and Raheem Unnisa
Abstract The facial features of a human being convey their opinions and intentions. The human face consists of unique features (such as the eyes, nose, cheeks, and eyebrows) that distinguish one person from another for identification. This paper correlates and analyzes various facial attributes such as gender, age, and emotion. The human emotions considered for estimation so far are fear, happiness, surprise, sadness, neutrality, anger, and contempt. Gender detection, age estimation, and facial expression recognition can be performed on an image, a video clip, or a real-time source such as a webcam. Facial analysis has numerous real-world applications, such as human–computer interaction, biometrics, electronics, surveillance, personality development, and cosmetology. This paper surveys several procedures and trends in the facial analysis process.
J. Tejaashwini Goud · N. Bhaskar (B) · V. N. Kumar · J. Narasimharao · R. Unnisa
Department of Computer Science and Engineering, CMR Technical Campus, Hyderabad, India
e-mail: [email protected]
S. Mubeen
Department of Electronics and Communication Engineering, CMR Technical Campus, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_25

25.1 Introduction
The human face is a valuable and remarkable source of information about a person and describes their behavior. Over the last few decades, human face recognition has emerged as a significant research area. Facial expressions are perhaps the most powerful, natural, and immediate means by which humans communicate their feelings and intentions. Age detection from face images plays a significant role in human–computer interaction and has numerous applications, for instance in forensics or social media. It can support the prediction of other human biometrics and facial attribute tasks such as gender, age, and emotion recognition. Face processing relies on the fact that information about a user's identity can be extracted from images by computers, which can then act accordingly. Face detection has numerous applications, including biometrics, entertainment, and information security.
A facial expression consists of one or more motions or positions of the muscles beneath the skin of the face. According to a popular idea, these movements convey an individual's inner feelings to observers. Facial movements are a form of non-verbal communication. Humans can make a facial expression consciously or unconsciously, and the neural mechanisms that regulate the expression differ in each case. Feelings can be conveyed in distinct ways through facial expressions, voice, tone, and other physiological means. Even though there are debates on how best to interpret these physiological measurements, there is a strong connection between measurable physiological signals and a person's feelings. The visible recognition agent to guide the report highlights the expected range of obstructed faces. The detection delay interpreted into high-stage connotative specific detection trouble through the forward analytical identity then the place and scale of the face expected by the renewing plan remain away from further parameter distances [33].
Facial expression is perhaps the most significant cue for affect recognition. Researchers have created various frameworks to describe and measure facial behavior [17]. A facial expression depends on specific regions of the face that allow expressions to be distinguished. Regions such as the eyes, mouth, nose, and eyebrows are the primary elements in facial expression. For example, the analysis may consider the relative size, position, and shape of the cheekbones, nose, eyes, and jaw [25]. Although people can easily recognize faces across many age ranges, it is still challenging for a computer or machine to do so. Many factors add to the difficulty of automatic gender and age prediction. Gender and age prediction is sensitive to intrinsic factors such as personality and identity, as well as to extrinsic factors such as pose, illumination, and emotion [16]. This paper gives an outline of the framework for facial analysis. The rest of the paper is organized as follows: Section 2 describes the facial analysis of age, gender, and emotion, and Section 3 provides the conclusion.
25.2 Facial Analysis Methodology
In the past few years, several papers have effectively tackled the problem of facial analysis and recognition. So far, several facial analysis tasks have been developed for real-time expression, gender, and age analysis. This section discusses some papers that are relevant to the earlier techniques.
25.2.1 Face Expression
A nonlinear and non-stationary signal can be decomposed into several intrinsic mode functions (IMFs) using the empirical mode decomposition (EMD) approach for facial expression identification. The first IMF (IMF1) was extracted as the feature and considered capable of representing facial movements. Principal component analysis with linear discriminant analysis (PCA + LDA), PCA with local Fisher discriminant analysis (LFDA), and kernel LFDA (KLFDA) are three dimensionality reduction methods that were applied separately to the EMD-based features. The reduced features were supplied to k-NN, support vector machine, and extreme learning machine with radial basis function (ELM-RBF) classifiers to recognize seven well-known facial emotions. On two benchmark databases, the reported recognition accuracy was 100% on the JAFFE database, and the best recognition rate on the CK database was 99.75% [4].
In [5], detection and classification algorithms are trained offline to distinguish human facial emotions. AdaBoost cascade classifiers are used to locate faces in the images. Region difference features, extracted from localized expression information, describe the characteristics of the human face. A random forest classifier is trained in a way that takes mistaken or incorrect detections into account. The approach reported 57.7% and 59.0% on the static facial expressions and genuine expressive faces datasets, respectively, in a comparison involving two baseline methods and five models [5].
The framework in [7] applies a hybrid of feature extraction and optimization using PCA and particle swarm optimization (PSO) to achieve high accuracy. For feature extraction and optimization of the facial image samples, PCA is combined with gradient filtering to improve accuracy. MLP, random forest, and decision tree classifiers are used. Emotion recognition reaches an accuracy of 94.87% with this framework [7].
In [8], the TFEID database is used to classify three facial expressions: happy, angry, and sad. Before feature extraction, image pre-processing steps such as resizing and color conversion are applied. Three types of features (binary, histogram, and texture) are extracted and passed to tree-based and meta-based classifiers. A comparison of the results shows 95.277% for the tree (Random Forest) classifier against 93.1944% for the meta (Random Committee) classifier [8].
The FER model in [9] combines PCA for dimensionality reduction, local binary patterns (LBP) and Gabor wavelets for feature extraction, and an SVM for classification. The study used the MMI, Cohn-Kanade, and JAFFE expression datasets as well as real facial expressions recorded by a camera. With Gabor wavelets, the accuracy was 84.17% on JAFFE, 93.00% on MMI, and 85.83% on Cohn-Kanade; with LBP feature extraction, it was 88.00% on JAFFE, 88.16% on MMI, and 96.83% on Cohn-Kanade [9].
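Since local binary patterns recur as a feature extractor in the surveyed work, a minimal sketch of the basic 3 × 3 operator may help. It is a plain-Python illustration and not the implementation used in [9]. The clockwise bit order starting at the top-left neighbour is one common convention, chosen here as an assumption.

```python
def lbp_code(patch):
    """Local binary pattern code of the centre pixel of a 3x3 patch.

    Each of the 8 neighbours contributes a 1 bit when its value is
    greater than or equal to the centre value. Bit order (an assumed
    convention): clockwise, starting at the top-left neighbour.
    """
    centre = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a 2D list."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


patch = [[6, 5, 2], [7, 6, 1], [9, 8, 7]]
print(lbp_code(patch))   # 241
```

The histogram of codes over a face region, rather than the raw codes, is what is typically passed on to a classifier such as an SVM, because it is insensitive to small shifts of the face within the region.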
J. Tejaashwini Goud et al.
One method combines principal component analysis with linear discriminant analysis to address the pattern recognition problem and uses a support vector machine to extract emotions from images. Six facial emotions were recognized, and Fisherface and ASM were then used to change photographs in real time to match the expressions. The facial emotion is extracted by an SVM model from previously supplied Fisherface features. On a database of self-collected face photographs, the accuracy is 86.4% [12]. Leave-one-out and k-fold cross-validation are the two evaluation procedures used for a three-stage SVM. The experimental data come from the Japanese Female Facial Expression (JAFFE) database, the Extended Cohn-Kanade (CK+) dataset, and the Radboud Faces Database (RaFD). For the RaFD, JAFFE, and CK+ datasets, the accuracies obtained with the leave-one-out test were 99.72, 96.71, and 93.29%, whereas the 10-fold test gave 95.14, 98.10, and 94.42% [13]. Neurophysiological devices can capture how humans experience emotion and identify the parts of the brain that carry information related to different emotions. This is done by applying conventional machine learning techniques (logistic regression, naive Bayes, decision tree, SVM, LDA, and KNN), each yielding an accuracy between 55 and 75% and an F1 score between 70 and 86% [14]. In another approach, region-specific appearance features are gathered by dividing the whole face region into domain-specific local regions. Geometric features are extracted from the same local regions. FER then uses a combination of geometric and appearance features from these local face regions. Selecting the local regions that hold the most discriminative facial information improves performance while reducing dimensionality.
For the 6- and 7-class problems, extensive experiments are conducted on the CK+ dataset to verify the effectiveness of the suggested FER approach [17]. To categorize the emotional expressions of autistic children and people with disabilities (deaf, mute, and bedridden), facial landmarks and electroencephalogram (EEG) signals are used with an algorithm built on CNN and LSTM classifiers. Real-time emotion recognition relies on virtual markers tracked by an optical flow algorithm that works effectively under uneven lighting, head rotation (up to 25 degrees), different backgrounds, and various skin tones. The findings show that the system recognizes emotion with 87.25% accuracy from EEG signals and 99.81% from facial landmarks [19]. Other work addresses the representation and classification of static or dynamic characteristics of facial deformations or pigmentation. Results were obtained with CVIPtools: RST-invariant features and texture features were analyzed and then classified with the K-NN algorithm, with an accuracy of 90% [25].
25 A Survey: Classifying and Predicting Features Based on Facial Analysis
In one study, roughly 35 people in 8 teams took part through Zoom in a course on Creative Innovation Systems. The course held video meetings in which each team presented its progress. Facial emotion recognition applied to the participants' faces in the Zoom video tracked the prevalence of six emotions. The response variable is the grade given after each presentation by all participants except the presenter [26]. Automatically recognizing everyday emotions in a real-world context means capturing the emotions people actually experience in their natural conditions. An exploratory study examines the feasibility of estimating emotion from contextual information using several machine learning techniques, namely multilayer perceptron, random forest, naive Bayes, neural networks, and logistic regression [27]. SQI filtering can compensate for poor color and insufficient lighting. Several features are used together: Gabor features, the angular radial transform (ART), and the discrete cosine transform (DCT). A trained SVM serves as the classifier that predicts facial expressions. The findings show that the overall classification performance of the FERS approach is superior to that of some other recent techniques [29]. A hybrid-feature approach to recognizing facial expressions from an image uses SIFT features together with deep learning features from different levels of a CNN model, and classifies the combined features with an SVM. The performance of the proposed technique is demonstrated on the public CK+ database [31] (Table 25.1).
25.2.2 Age and Gender Estimation
One approach uses a CNN to determine gender and age accurately from real images of a face. Face detection relies on Haar cascades and classification on CaffeNet, giving automated gender and age classification from real-time facial images. The pre-trained Haar cascade model was adapted for face detection, and the detected face was then fed into the CaffeNet CNN model for age and gender prediction. The output layer of the age-prediction CNN has 8 values for eight pre-defined age ranges, while the gender-prediction network is trained for two classes, male and female. The overall accuracy achieved by the system was 68.89% [1]. A novel CNN approach is described for estimating age and gender from unfiltered real-world face images. The two-level CNN model consists of a feature extraction stage and a classification stage. The network is pre-trained on IMDb-WIKI with noisy labels and fine-tuned on MORPH-II before being trained on the original dataset. According to the experimental results on the OIU-Adience benchmark, the model achieves a classification accuracy of 93.8% for age group and 96.2% for gender [2]. A robust method uses global geometric features to classify gender, age, and human identity from video streams. Skin color segmentation and the
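Table 25.1 Overview of face expression analysis

| Article references | Features | Classifiers | Datasets | Performance accuracy |
|---|---|---|---|---|
| [4] | IMF1 + PCA + LDA, IMF1 + PCA + LFDA, IMF1 + KLFDA | KNN, SVM, ELM-RBF | JAFFE [53] | 100% |
| [4] | IMF1 + KLFDA | ELM-RBF | CK [51] | 99.75% |
| [5] | HAAR-like, NDF | AdaBoost, random forest | SFEW [47], RAF [35] | 57.7% (SFEW), 59.0% (RAF) |
| [7] | Gradient filter and PCA + PSO | Decision tree, MLP, random forest | JAFFE [53] | 94.97% |
| [8] | Binary, texture, histogram | Tree random forest | TFEID [37] | 95.277% |
| [9] | Gabor wavelets, LBP | SVM | JAFFE [53] | 84.17% (Gabor), 88.0% (LBP) |
| [9] | Gabor wavelets, LBP | SVM | CK [51] | 85.83% (Gabor), 96.83% (LBP) |
| [9] | Gabor wavelets, LBP | SVM | MMI [42] | 93.0% (Gabor), 88.16% (LBP) |
| [12] | FisherFace | SVM | Self | 86.4% |
| [13] | Leave-one-out, 10 folds | SVM | JAFFE [53] | 96.71% (leave-one-out), 98.1% (10 folds) |
| [13] | Leave-one-out, 10 folds | SVM | CK+ [51] | 93.29% (leave-one-out), 94.42% (10 folds) |
| [13] | Leave-one-out, 10 folds | SVM | RafD [46] | 99.72% (leave-one-out), 95.14% (10 folds) |
| [17] | LBP, NCM | SVM (local representation) | CK+ [51] | Frame 6: 97.25%, Frame 7: 91.95% |
| [19] | Histogram | LSTM, CNN | Self | 99.8% |
| [28] | Traditional CNN architecture | | FER2013 [52] | 69.2% |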
geometric features derived from the face ellipse are used for feature extraction. These geometric features form the facial feature vectors, which are then used to train a time-delay neural network (TDNN) with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) function. The developed method, based on global geometric facial features, achieves a classification rate of 100% for all tasks on the training set, compared with 91.2% for gender recognition, 88% for age classification, and 83% for identity recognition on the test set [3]. Different target age encodings, CNN architectures of various depths, and transfer strategies such as pretraining and multi-task training have been studied for CNN-based gender recognition (GR) and age estimation (AE). Training GR and AE jointly is best suited to the case where a CNN is trained from scratch. The approach is evaluated on the well-known LFW, MORPH-II, and FG-NET benchmarks [6]. A supervised appearance model (sAM) improves on the active appearance model (AAM) by replacing PCA with partial least-squares regression. This feature extraction method is
advantageous for age- and gender-related tasks. On the benchmarks used, the reported accuracies are 75.80% on Dartmouth, 92.50% on HQFaces, and 76.65% on FG-NET-AD [11]. To handle age and gender classification, a hybrid design combines a CNN and an ELM, integrating the strengths of the two classifiers. The CNN extracts features from the input images, and the ELM classifies the intermediate results. The hybrid CNN-ELM is evaluated on the Adience Benchmark and MORPH-II [15]. A technique based on multi-stage learning has also been proposed. The first stage marks the object regions using an encoder-decoder-based segmentation network; specifically, the segmentation network classifies each pixel into two classes. The second stage is a prediction network that encodes global information, local region information, and the interactions among different local regions into the final representation before making the prediction. To improve flexibility and generalization, the authors' preliminary idea is to incorporate some existing image recognition and object detection techniques into the second stage [16]. Another work predicts age from face images alone, using a recent cascade ensemble classification framework known as deep random forest (DRF). The framework consists of two DRFs. The first extends and enhances the feature representation of a given facial descriptor. The second operates on the fused form of the enhanced representations to produce an age prediction while taking the fuzziness of age estimation into account. MORPH-Caucasian, FG-NET, LFW+, FACES, APPA-REAL, and PAL are the publicly available databases used in the study. The experiments show that the proposed architecture outperforms several existing techniques [18].
A line of brain-computer interface work predicts gender and age using EEG analysis. A hybrid learning framework based on a deep BLSTM-LSTM network is built for the prediction [24]. Accuracy rates of 93.7% and 97.5% are achieved for age and gender, respectively [20]. Another model offers a unified framework for face analysis tasks such as head pose estimation, age classification, and gender recognition. A segmentation model based on Conditional Random Fields (CRFs) is trained on manually labeled face images. The multi-class CRF-based face segmentation framework divides a facial image into six parts. For each task, a random forest classifier is built from a combination of the class probability maps. Numerous experiments were conducted to determine which facial parts help in recognizing age, gender, and head pose [21]. In a related gender classification method, segmentation again uses CRFs trained on manually labeled facial images. A facial image is divided into six distinct parts using the CRF-based model: the pores, eyes, mouth, nose, skin, and hair. A probabilistic approach produces probability maps for these six classes. A trained random decision forest (RDF) then classifies gender, using the probability maps as features. Publicly available datasets, including FEI, LFW,
Adience, and FERET, are used to evaluate the method against state-of-the-art (SOA) results [22]. Another study compared several sentiment analysis techniques on a dataset that was split into four groups to investigate how age and gender affect the way users express their opinions. Machine learning and dictionary-based approaches were applied to analyze the sentiment of the opinions. The results improved further when the data were grouped by gender and age [23]. Two models were used in another work: one for estimating age and gender based on a wide ResNet architecture, and another for predicting emotions using a conventional CNN architecture. Differences in how these tasks performed were observed when the CNNs were deployed. The accuracy of the age and gender model is 96.26%, while that of the emotion recognition model is 69.2% [28]. The raSVM+ method treats relative attributes as privileged information, which improves the accuracy of age estimation while limiting the effect of outliers in the training data. Its superiority was established by comparing it with other algorithms on the FG-NET and MORPH face aging databases. raSVM+ is a promising development for age estimation, achieving an MAE of 4.07 on FG-NET [30]. A CNN is used to predict age and gender from an input image. Two combined architectures are trained, and predictions are then made on various test images. The CNN [10] essentially learns features from raw image data that would otherwise have to be handcrafted in conventional machine learning models. Each layer performs a set of operations on its input until the network outputs a label and a probability for the class [32]. An autoencoder is an artificial neural network used for unsupervised learning of efficient codings.
It aims to learn a robust and stable representation of the data. The idea is to exploit the autoencoder's learning ability in a supervised manner to estimate age. Experimental results on the MORPH and FG-NET datasets show robustness and effectiveness, with MAE values of 3.34 and 3.75 [34] (Table 25.2).
25.3 Conclusion
This paper has examined a range of papers to capture recent progress in facial analysis. In short, it surveys methods for facial expression, age, and gender prediction that use different methodologies. The survey indicates that face analysis benefits from hybrid methods that combine several algorithms to improve the performance accuracy of age, gender, and expression analysis individually. We intend to combine gender, age, iris color, and emotion analysis using deep learning of facial appearance to obtain more reliable performance accuracy.
Table 25.2 Overview of age and gender analysis Article reference
Features
Classifiers
Datasets
Age accuracy
Gender accuracy
[1]
HAAR cascades and caffenet
CNN
–
63.07%
74.75%
[2]
CNN 2-Level architecture
OIU adience [38] 93.8%
96.2%
[3]
Geometry-based
–
88%
91.2%
[6]
CNN architecture
LFW [49] FG-NET [39] MORPH-II [48]
99.3% 99.4%
2.84 (MAE) 2.99 (MAE)
[11]
sAM features
sAM SVM
HQFaces [36] Dartmouth [50] FG-NET-AD [44]
5.49 (MAE)
92.50% 75.70% 76.65%
[15]
CNN-ELM+Dropout 0.5 CNN-ELM+Dropout 0.7
ELM
Adience [43] MORPH-II [48]
5.14%, 52.3% 3.44 (MAE)
87.3%, 88.2% –
[20]
Deep BLSTM, LSTM
EEG signals –
93.7%
97.5%
[21]
HAG-MSF-CRFs
Random forest
Adience [43] LFW [49] FERET
89.7% 93.9% 100%
[22]
GC-MSF-CRFs
Random forest
Adience [43] LFW [49] FERET FEI [40]
–
[28]
Wide resnet architecture
[30]
DSIFT
raSVM+
FG-NET [39] MORPH [45]
4.07 (MAE) 5.05 (MAE)
-
[34]
DSSAE
Softmax
MORPH [45] FG-NET [39]
3.34 (MAE) 3.37 (MAE)
-
TDNN
91.4% 93.9% 100% 93.7%
IMDb-WIKI [41] 96.26%
References
1. Abirami, B., Subashini, T.S., Mahavaishnavi, V.: Gender and age prediction from real time facial images using CNN. Mater. Today Proc. 33, 4708–4712 (2020)
2. Agbo-Ajala, O., Viriri, S.: Deeply learned classifiers for age and gender predictions of unfiltered faces. Sci. World J. 2020 (2020)
3. Al Mashagba, E.F.: Real-time gender classification by face. Int. J. Adv. Comput. Sci. Appl. 7(3) (2016)
4. Ali, H., et al.: Facial emotion recognition using empirical mode decomposition. Expert Syst. Appl. 42(3), 1261–1277 (2015)
5. Alreshidi, A., Ullah, M.: Facial emotion recognition using hybrid features. In: Informatics, vol. 7, no. 1. Multidisciplinary Digital Publishing Institute (2020)
6. Antipov, G., et al.: Effective training of convolutional neural networks for face-based gender and age prediction. Pattern Recogn. 72, 15–26 (2017)
7. Arora, M., Kumar, M.: AutoFER: PCA and PSO based automatic facial emotion recognition. Multimedia Tools Appl. 80(2), 3039–3049 (2021)
8. Aslam, T., et al.: Emotion based facial expression detection using machine learning. Life Sci. J. 17(8), 35–43 (2020)
9. Bellamkonda, S., Gopalan, N.P.: A facial expression recognition model using support vector machines. IJ Math. Sci. Comput. 4, 56–65 (2018)
10. Bhaskar, N., Ganashree, T.S., Patra, T.S.: Pulmonary lung nodule detection and classification through image enhancement and deep learning. Int. J. Biom. 1(1), 1 (2023). https://doi.org/10.1504/IJBM.2023.10044525
11. Bukar, A.M., Ugail, H., Connah, D.: Automatic age and gender classification using supervised appearance model. J. Electron. Imaging 25(6), 061605 (2016)
12. Chang, J.K., Ryoo, S.T.: Implementation of an improved facial emotion retrieval method in multimedia system. Multimedia Tools Appl. 77(4), 5059–5065 (2018)
13. Dagher, I., Dahdah, E., Al Shakik, M.: Facial expression recognition using three-stage support vector machines. Visual Comput. Indus. Biomed. Art 2(1), 1–9 (2019)
14. Doma, V., Pirouz, M.: A comparative analysis of machine learning methods for emotion recognition using EEG and peripheral physiological signals. J. Big Data 7(1), 1–21 (2020)
15. Duan, M., et al.: A hybrid deep learning CNN–ELM for age and gender classification. Neurocomputing 275, 448–461 (2018)
16. Fang, J., et al.: Muti-stage learning for gender and age prediction. Neurocomputing 334, 114–124 (2019)
17. Ghimire, D., et al.: Facial expression recognition based on local region specific features and support vector machines. Multimedia Tools Appl. 76(6), 7803–7821 (2017)
18. Guehairia, O., et al.: Feature fusion via deep random forest for facial age estimation. Neural Netw. 130, 238–252 (2020)
19. Hassouneh, A., Mutawa, A.M., Murugappan, M.: Development of a real-time emotion recognition system using facial expressions and EEG based on machine learning and deep neural network methods. Inform. Med. Unlocked 20, 100372 (2020)
20. Kaushik, P., et al.: EEG-based age and gender prediction using deep BLSTM-LSTM network model. IEEE Sens. J. 19(7), 2634–2641 (2018)
21. Khan, K., et al.: A unified framework for head pose, age and gender classification through end-to-end face segmentation. Entropy 21(7), 647 (2019)
22. Khan, K., et al.: Automatic gender classification through face segmentation. Symmetry 11(6), 770 (2019)
23. Kumar, S., et al.: Exploring impact of age and gender on sentiment analysis using machine learning. Electronics 9(2), 374 (2020)
24. Morampudi, M.K., Gonthina, N., Bhaskar, N., Dinesh Reddy, V.: Image description generator using residual neural network and long short-term memory. Comput. Sci. J. Moldova 31(1(91)), 3–21 (2023). https://doi.org/10.56415/csjm.v31.01
25. Perveen, N., et al.: Facial expression recognition through machine learning. Int. J. Sci. Technol. Res. 5(03) (2016)
26. Rößler, J., Sun, J., Gloor, P.: Reducing videoconferencing fatigue through facial emotion recognition. Future Internet 13(5), 126 (2021)
27. Salido Ortega, M.G., Rodríguez, L.F., Gutierrez-Garcia, J.O.: Towards emotion recognition from contextual information using machine learning. J. Ambient Intell. Hum. Comput. 11(8), 3187–3207 (2020)
28. Singh, A., et al.: Age, gender prediction and emotion recognition using convolutional neural network. Available at SSRN 3833759 (2021)
29. Tsai, H.-H., Chang, Y.-C.: Facial expression recognition using a combination of multiple facial features and support vector machine. Soft Comput. 22(13), 4389–4405 (2018)
30. Wang, S., Tao, D., Yang, J.: Relative attribute SVM+ learning for age estimation. IEEE Trans. Cybern. 46(3), 827–839 (2015)
31. Wang, F., et al.: Facial expression recognition from image based on hybrid features understanding. J. Vis. Commun. Image Representation 59, 84–88 (2019)
32. Wanga, G., Davies, S.R.: Deep machine learning for age and gender prediction. ICTACT J. Soft Comput. (2019)
33. Yuan, Z.: Face detection and recognition based on visual attention mechanism guidance model in unrestricted posture. Sci. Programm. 2020 (2020)
34. Zaghbani, S., Noureddine B., Bouhlel, M.S.: Age estimation using deep learning. Comput. Electr. Eng. 68, 337–347 (2018)
35. http://whdeng.cn/RAF/model1.html
36. https://areeweb.polito.it/ricerca/cgvg/siblingsDB.html
37. https://bml.ym.edu.tw/tfeid/
38. https://computervisiononline.com/dataset/1105138612
39. https://cvhci.anthropomatik.kit.edu/433_451.php
40. https://data.fei.org/Default.aspx
41. https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
42. https://mmifacedb.eu/
43. https://paperswithcode.com/dataset/adience
44. https://paperswithcode.com/dataset/fg-net
45. https://paperswithcode.com/dataset/morph
46. https://rafd.socsci.ru.nl/RaFD2/RaFD?p=main
47. https://researchdata.edu.au/static-facial-expressions-wild-sfew/2729
48. https://uncw.edu/oic/tech/morph_academic.html
49. https://vis-www.cs.umass.edu/lfw/
50. https://www.dartmouth.edu/oir/data-reporting/cds/index.html
51. https://www.jeffcohn.net/Resources/
52. https://www.kaggle.com/datasets/msambare/fer2013
53. https://zenodo.org/record/3451524#.Y_xDD3ZBzIU
Chapter 26
Satellite Ortho Image Mosaic Process Quality Verification Jonnadula Narasimharao, P. Priyanka Chowdary, Avala Raji Reddy, G. Swathi, B. P. Deepak Kumar, and Sree Saranya Batchu
Abstract In remote sensing applications, image fusion plays an indispensable role. Satellite sensors provide two types of images: panchromatic images and multispectral images. The main aim of this paper is to verify the quality of an orthoimage obtained from a satellite, whose production involves mosaicing. A survey found that there are 27 measures specifically designed for assessing the quality of a fused image. This paper presents the index value for 8 of these 27 quality assessment measures. The index value computed from a fused image characterizes its quality through a thorough mathematical analysis. The work presented here develops a system for satellite orthoimage mosaic process quality verification in Python. The Bhuvan Content Geoportal frequently uploads scalable images to the Bhuvan portal. Orthoimage generation, image fusion, and mosaicing are requested for a large number of images, and the quality verification of this process is manual. This project attempts to deliver automatic quality verification for large datasets using the Fusion Quality Index and a Blur Detection Algorithm. A comparative experiment is also carried out on fused images in different image formats, including the satellite raster data format, i.e., TIFF. Finally, the most representative measures among the 8 are selected.
J. Narasimharao (B) Department of Computer Science and Engineering, CMR Technical Campus, Hyderabad, India e-mail: [email protected] P. Priyanka Chowdary Department of Computer Science and Engineering, BITS, Pilani, Hyderabad, India A. R. Reddy Department of Mechanical Engineering, CMR Technical Campus, Hyderabad, Telangana, India G. Swathi · B. P. Deepak Kumar Department of CSE, CMR Technical Campus, Hyderabad, India S. S. Batchu Department of CSE (AI&ML), CMR Engineering College, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_26
26.1 Introduction
Image fusion is a process in which two or more images are combined to create a new image. In remote sensing, image fusion combines a panchromatic (pan) image and a multispectral image: spatial information such as edges and textures, together with spectral information such as colors, is used to produce a multispectral image with high resolution. Image fusion techniques are used in many applications, such as remote sensing and medical imaging. Fusing two different images reduces the uncertainty associated with a single image. The fused output is also used in various security applications. Images taken from a satellite are not perfect, and the acquired image may differ in quality from the original scene. The approaches described here are used to evaluate the quality of images acquired from satellites, and the assessment is done with different measures. The input images are fused images and multispectral images [1], along with panchromatic images. All images have three bands, RGB, i.e., Red, Green, and Blue.
26.2 Fusion Quality Index
26.2.1 Average Gradient (AG)
The average gradient [2] represents the clarity of an image and is used to assess the spatial resolution of a fused image. The image gradient is the directional derivative of the intensity or color of an image. The average gradient is given by Eq. (26.1),

$$\mathrm{AG} = \frac{1}{(M-1)(N-1)} \sum_{x=1}^{M-1} \sum_{y=1}^{N-1} \sqrt{\frac{1}{2}\left[\left(\frac{\partial f(x,y)}{\partial x}\right)^{2} + \left(\frac{\partial f(x,y)}{\partial y}\right)^{2}\right]} \tag{26.1}$$

where
$\partial f(x,y)/\partial x$ = slope or gradient of the fused image in the x direction
$\partial f(x,y)/\partial y$ = slope or gradient of the fused image in the y direction
$M, N$ = rows and columns of image pixels, respectively.
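As a rough illustration of Eq. (26.1), the sketch below approximates the partial derivatives with forward differences on a single-band image. The function name `average_gradient` and the choice of forward differences are assumptions made for this example; the chapter does not state how the derivatives are discretized.

```python
import numpy as np

def average_gradient(img):
    """Average gradient of a 2-D single-band image, following Eq. (26.1).

    Assumption: partial derivatives are approximated by forward differences,
    which yields an (M-1) x (N-1) grid of gradient values.
    """
    f = np.asarray(img, dtype=np.float64)
    dx = f[:-1, 1:] - f[:-1, :-1]   # difference along columns (x direction)
    dy = f[1:, :-1] - f[:-1, :-1]   # difference along rows (y direction)
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))
```

For a multi-band image, one option is to call the function per band and average the results; that convention is also an assumption.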
26.2.2 Correlation Coefficient (CC)
The correlation coefficient (CC) [3] measures the degree of correlation between the fused and reference images. It is a statistical measure that compares two images of the same object. The ideal value of the correlation coefficient is 1. The correlation coefficient is given by Eq. (26.2),

$$\mathrm{CC} = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left(f(x,y) - \mu_f\right)\left(r(x,y) - \mu_r\right)}{\sqrt{\sum_{x=1}^{M} \sum_{y=1}^{N} \left(f(x,y) - \mu_f\right)^{2} \, \sum_{x=1}^{M} \sum_{y=1}^{N} \left(r(x,y) - \mu_r\right)^{2}}} \tag{26.2}$$

where
$f(x,y)$ = pixel values of the fused image
$r(x,y)$ = pixel values of the reference image
$\mu_f$ = mean value of the fused image
$\mu_r$ = mean value of the reference image.
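Eq. (26.2) translates almost directly into NumPy. The following is a minimal sketch, assuming both images have the same shape; the function name `correlation_coefficient` is illustrative and not taken from the chapter.

```python
import numpy as np

def correlation_coefficient(fused, reference):
    """Correlation coefficient between two same-shape images, Eq. (26.2)."""
    f = np.asarray(fused, dtype=np.float64)
    r = np.asarray(reference, dtype=np.float64)
    df = f - f.mean()               # f(x, y) - mu_f
    dr = r - r.mean()               # r(x, y) - mu_r
    denom = np.sqrt(np.sum(df ** 2) * np.sum(dr ** 2))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant image")
    return float(np.sum(df * dr) / denom)
```

A value close to 1 indicates that the fused image varies linearly with the reference, which matches the ideal value stated above.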
26.2.3 Entropy (E)
The richness of information in a fused image is measured using entropy [4]. The larger the entropy of the fused image, the more information it retains and the higher the fusion quality. The entropy of an image is generally obtained from its histogram, a graph showing the number of pixels in the image at each intensity value. Entropy is given by Eq. (26.3),

$$\mathrm{Entropy} = -\sum_{i=0}^{L} h(i) \log_2 h(i) \tag{26.3}$$

where $h(i)$ = probability between two adjacent pixels equal to $i$.
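The sketch below shows one way to evaluate Eq. (26.3). It assumes that h(i) is the normalized intensity histogram of an integer-valued image, which follows the histogram-based description in the text; the function name `entropy` and the `levels` parameter are hypothetical.

```python
import numpy as np

def entropy(img, levels=256):
    """Shannon entropy (in bits) of an integer-valued image, Eq. (26.3).

    Assumption: h(i) is the fraction of pixels with intensity i, and pixel
    values lie in the range [0, levels - 1].
    """
    a = np.asarray(img)
    counts = np.bincount(a.ravel().astype(np.int64), minlength=levels)
    h = counts / counts.sum()
    h = h[h > 0]                    # skip empty bins, since 0 * log2(0) is taken as 0
    return float(-np.sum(h * np.log2(h)))
```

With this definition, a constant image has entropy 0 and an 8-bit image with a uniform histogram has entropy 8.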
26.2.4 Relative Average Spectral Error (RASE)
The relative average spectral error [5] characterizes the average performance of the fused image over the tested spectral bands. It is an error index that gives a general idea of the quality of the fused image relative to the reference image, and it is expressed as a percentage. RASE builds on the root mean square error (RMSE), which measures the standard error of fused images and gives an idea of the amount of distortion induced by each method. RMSE is given by Eq. (26.4),

$$\mathrm{RMSE} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(I_r(i,j) - I_f(i,j)\right)^{2}} \tag{26.4}$$

where
$I_r(i,j)$ = pixel values of the reference image
$I_f(i,j)$ = pixel values of the fused image.

The relative average spectral error is given by Eq. (26.5),

$$\mathrm{RASE} = \frac{1}{\mu} \sqrt{\frac{1}{K} \sum_{i=1}^{K} \mathrm{RMSE}^{2}(B_i)} \tag{26.5}$$

where
$\mu$ = mean radiance of the multispectral image
$K$ = number of bands
$B_i$ = difference between the multispectral and panchromatic images.
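Eqs. (26.4) and (26.5) can be sketched as follows. Two assumptions are made here and should be checked against the intended definition: RMSE(B_i) is read as the RMSE between the reference and fused images in band i, and images are stored as arrays of shape (K, M, N). The function names and the `as_percent` flag are hypothetical.

```python
import numpy as np

def rmse(reference_band, fused_band):
    """Root mean square error between two same-shape bands, Eq. (26.4)."""
    r = np.asarray(reference_band, dtype=np.float64)
    f = np.asarray(fused_band, dtype=np.float64)
    return float(np.sqrt(np.mean((r - f) ** 2)))

def rase(reference, fused, as_percent=False):
    """Relative average spectral error, Eq. (26.5).

    Assumptions: inputs have shape (K, M, N); RMSE(B_i) is the per-band RMSE;
    mu is the mean of the reference (multispectral) image over all bands.
    """
    ref = np.asarray(reference, dtype=np.float64)
    fus = np.asarray(fused, dtype=np.float64)
    k = ref.shape[0]
    mean_sq = sum(rmse(ref[b], fus[b]) ** 2 for b in range(k)) / k
    value = np.sqrt(mean_sq) / ref.mean()
    return float(100.0 * value if as_percent else value)
```

Since the text states that RASE is expressed as a percentage, `as_percent=True` multiplies the ratio from Eq. (26.5) by 100; lower values indicate a smaller spectral error.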
26.2.5 Spectral Angle Mapper (SAM)
The Spectral Angle Mapper [6] can be used to measure global spectral distortion. Distortion is defined as divergence from a rectilinear projection, that is, a change in the spatial relationship between parts of the image. The Spectral Angle Mapper is given by Eq. (26.6),

$$\alpha = \cos^{-1}\left(\frac{\sum_{i=1}^{nb} t_i r_i}{\sqrt{\sum_{i=1}^{nb} t_i^{2}} \, \sqrt{\sum_{i=1}^{nb} r_i^{2}}}\right) \tag{26.6}$$

where
$nb$ = number of bands in the image
$t$ = pixel spectrum
$r$ = reference spectrum
$\alpha$ = spectral angle.
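For a single pixel, Eq. (26.6) is the angle between two spectra treated as vectors of length nb. The sketch below computes that angle in radians; the function name `spectral_angle` is illustrative, and clipping the cosine is a numerical safeguard added for this example.

```python
import numpy as np

def spectral_angle(t, r):
    """Spectral angle (radians) between a pixel spectrum t and a reference r, Eq. (26.6)."""
    t = np.asarray(t, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    denom = np.linalg.norm(t) * np.linalg.norm(r)
    if denom == 0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clip to [-1, 1] so rounding error cannot push arccos out of its domain.
    cos_a = np.clip(np.dot(t, r) / denom, -1.0, 1.0)
    return float(np.arccos(cos_a))
```

An image-level value could then be obtained by averaging the angle over all pixels; that aggregation is an assumption, as the chapter does not specify it.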
26.2.6 Standard Deviation (SD)
The standard deviation [1] represents visual contrast. A high-contrast image has a large standard deviation, whereas a low-contrast image has a small one. At the pixel level, it represents how close the fused image is to the real multispectral image; in that sense, the desired standard deviation value is zero. The standard deviation is given by Eq. (26.7),

$$\mathrm{SD} = \sqrt{\frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} \left(f(x,y) - \mu\right)^{2}} \tag{26.7}$$

where
$f(x,y)$ = pixel value of the fused image
$\mu$ = mean value of the fused image
$M, N$ = rows and columns of image pixels.
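Eq. (26.7) is the population standard deviation of the pixel values. A minimal sketch, with the hypothetical name `standard_deviation`, is shown below; it should agree with `numpy.std` at its default settings.

```python
import numpy as np

def standard_deviation(img):
    """Population standard deviation of pixel values, Eq. (26.7)."""
    f = np.asarray(img, dtype=np.float64)
    mu = f.mean()                   # mean value of the image
    return float(np.sqrt(np.mean((f - mu) ** 2)))
```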
26.2.7 Signal-to-Noise Ratio (SNR)
The signal-to-noise ratio [7] gives the ratio of information to noise for the fused and reference images. In remote sensing, it is also defined as the ratio between the variance of an image and the variance of the noise, as given in Eq. (26.8),

$$\mathrm{SNR}(\mathrm{dB}) = 10 \log_{10}\left(\frac{\sum_{x,y} f(x,y)^{2}}{\sum_{x,y} \left(r(x,y) - f(x,y)\right)^{2}}\right) \tag{26.8}$$

where
$f(x,y)$ = pixel value of the fused image
$r(x,y)$ = pixel value of the reference image.
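The following sketch evaluates Eq. (26.8) for two same-shape images. The function name `snr_db` is an assumption, as is the convention of returning infinity when the two images are identical and the noise term is zero.

```python
import numpy as np

def snr_db(fused, reference):
    """Signal-to-noise ratio in decibels, Eq. (26.8)."""
    f = np.asarray(fused, dtype=np.float64)
    r = np.asarray(reference, dtype=np.float64)
    noise = np.sum((r - f) ** 2)    # energy of the difference image
    if noise == 0:
        return float("inf")         # identical images: no noise (assumed convention)
    return float(10.0 * np.log10(np.sum(f ** 2) / noise))
```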
26.2.8 Universal Image Quality Index (UIQI)
The Universal Image Quality Index [8] measures the linear correlation, mean luminance, and contrast between the fused and reference images. It models the difference between the two images as a combination of loss of linear correlation, luminance distortion, and contrast distortion. Mean luminance is a photometric measure of the luminous intensity per unit area of light travelling in a given direction. The Universal Image Quality Index is given by Eq. (26.9),

$$Q = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2\bar{x}\bar{y}}{\bar{x}^{2} + \bar{y}^{2}} \cdot \frac{2\sigma_x \sigma_y}{\sigma_x^{2} + \sigma_y^{2}} \tag{26.9}$$

where
$\sigma_{xy}/(\sigma_x \sigma_y)$ defines the degree of correlation between x and y, with dynamic range [−1, 1]
$2\bar{x}\bar{y}/(\bar{x}^{2} + \bar{y}^{2})$ measures how close the mean luminance of x and y is, with range [0, 1]
$2\sigma_x \sigma_y/(\sigma_x^{2} + \sigma_y^{2})$ measures how similar the contrasts of the images x and y are.
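The three factors of Eq. (26.9) multiply out to 4·σxy·x̄·ȳ / ((σx² + σy²)(x̄² + ȳ²)), which is what the sketch below computes. It evaluates the index once over the whole image; applying it in sliding windows and averaging is a common variant not covered here. The function name `uiqi` is illustrative.

```python
import numpy as np

def uiqi(x, y):
    """Universal Image Quality Index over the whole image, Eq. (26.9).

    Assumption: statistics are global (no sliding window).
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx = np.mean((x - mx) ** 2)             # sigma_x squared
    vy = np.mean((y - my) ** 2)             # sigma_y squared
    cov = np.mean((x - mx) * (y - my))      # sigma_xy
    denom = (vx + vy) * (mx ** 2 + my ** 2)
    if denom == 0:
        raise ValueError("index is undefined when both images are constant zero or flat")
    return float(4.0 * cov * mx * my / denom)
```

Identical images give Q = 1, and any difference in correlation, mean luminance, or contrast lowers the value.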
26.3 Environment Setup
The eight measures described above are evaluated in Python. Along with Python, we use the Python Imaging Library, NumPy, the Geospatial Data Abstraction Library (GDAL), and Matplotlib. These libraries are used to implement the eight measures and to compute the index values for the images.
26.4 Results
The results of analyzing the quality of fused images using the eight metrics are summarized in Tables 26.1 and 26.2 and Figs. 26.1, 26.2, 26.3, 26.4, 26.5 and 26.6.

Table 26.1 A performance comparison using quality assessment metrics for portable network graphics images (PNG)

| Parameter | Reference images | Index value |
|---|---|---|
| AG | Fused or MS | 5.315 |
| CC | Fused, MS | 0.894 |
| E | Fused or MS | 4.453 |
| RASE | Fused, MS | 44.536 |
| SAM | Fused, MS | 0.317 |
| SD | Fused or MS | 15.671 |
| SNR | Fused, MS | 54.769 |
| UIQI | Fused, MS | 0.675 |

Table 26.2 A performance comparison using quality assessment metrics for tagged image file format images (TIFF)

| Parameter | Reference images | Index value |
|---|---|---|
| AG | Fused or MS | 8.785 |
| CC | Fused, MS | 0.987 |
| E | Fused or MS | 6.881 |
| RASE | Fused, MS | 19.058 |
| SAM | Fused, MS | 0.037 |
| SD | Fused or MS | 39.013 |
| SNR | Fused, MS | 25.497 |
| UIQI | Fused, MS | 0.778 |
26 Satellite Ortho Image Mosaic Process Quality Verification Fig. 26.1 FUSED.TIFF
Fig. 26.2 MS.TIFF
Fig. 26.3 PAN.TIFF
Fig. 26.4 PAN.PNG
Fig. 26.5 MS.PNG
Fig. 26.6 FUSED.PNG
26.5 Conclusion This work examined eight current measures for judging the quality of fused images. For each category, representative measures are identified as follows: (1) Difference-based: Spectral Angle Mapper (SAM) and Relative Average Spectral Error (RASE). (2) Noise-based: SNR. (3) Similarity-based: UIQI and Correlation Coefficient. (4) Information- and clarity-based: Standard Deviation, Average Gradient, and Entropy. Users are therefore recommended to apply the difference-, noise-, similarity-, information-, and clarity-based measures when comparing the panchromatic, multispectral, and fused images.
References
1. Li, S., Li, Z., Gong, J.: Multivariate statistical analysis of measures for assessing the quality of image fusion. Int. J. Image Data Fusion 1(1), 47–66 (2010)
2. Han, D.S., Choi, N.W., Cho, S.L., Yang, J.S., Kim, K.S., Yoo, W.S., Jeon, C.H.: Characterization of driving patterns and development of a driving cycle in a military area. Transp. Res. Part D Transp. Environ. 17(7), 519–524 (2012)
3. Yang, X., Li, F., Fan, W., Liu, G., Yu, Y.: Evaluating the efficiency of wind protection by windbreaks based on remote sensing and geographic information systems. Agrofor. Syst. 95, 353–365 (2021)
4. Samadzadegan, F., DadrasJavan, F.: Evaluating the sensitivity of image fusion quality metrics to image degradation in satellite imagery. J. Indian Soc. Remote Sens. 39, 431–441 (2011)
5. Ranchin, T., Wald, L.: Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogramm. Eng. Remote. Sens. 66(1), 49–61 (2000)
6. Zhang, Y.: Methods for image fusion quality assessment-a review, comparison and analysis. Int. Arch. Photogrammetry Remote Sens. Spat. Inf. Sci. 37(PART B7), 1101–1109 (2008)
7. Damera-Venkata, N., Kite, T.D., Geisler, W.S., Evans, B.L., Bovik, A.C.: Image quality assessment based on a degradation model. IEEE Trans. Image Process. 9(4), 636–650 (2000)
8. Wang, Z., Bovik, A.C.: A universal image quality index. IEEE Sig. Process. Lett. 9(3), 81–84 (2002)
Chapter 27
Early Prediction of Sepsis Utilizing Machine Learning Models J. Sasi Kiran, J. Avanija, Avala Raji Reddy, G. Naga Rama Devi, N. S. Charan, and Tabeen Fatima
Abstract Sepsis is a life-threatening condition caused by the body’s response to an infection, and it can result in tissue damage, organ failure, or death. Inflammation spreads throughout the body as a result of the immune system’s response to the infection, which can lead to organ damage and failure. Fever, increased heart rate, low blood pressure, and confusion or disorientation are all symptoms of sepsis. Early recognition and treatment with antibiotics and supportive care can improve outcomes, but sepsis can progress rapidly and become life-threatening, so early detection and prompt medical attention are crucial. Traditional methods for sepsis detection can be subjective and may not always accurately identify the condition in its early stages. To address this issue, machine learning models have been proposed as a tool for early sepsis detection. In the proposed work, several machine learning models were applied to patient data, including vital signs, lab results, and electronic health records, to identify patterns and trends that might not be apparent to human clinicians. The performance of these models was evaluated using a dataset of real-world patient data (https://my.clevelandclinic.org/health/diseases/23255-septic-shock) and compared to traditional sepsis detection methods. The results showed that the machine learning models were able to predict sepsis with high sensitivity and specificity, providing a promising solution for early sepsis detection in clinical settings. This work highlights the potential of machine learning models for improving sepsis detection and management and provides a basis for further research in this area.

J. Sasi Kiran
Department of CSE, Lords Institute of Engineering and Technology (Autonomous), Hyderabad, Affiliated to Osmania University, Hyderabad, Telangana, India
J. Avanija (corresponding author)
Mohan Babu University, Tirupati, AP, India
A. R. Reddy
Department of Mechanical Engineering, CMR Technical Campus, Hyderabad, Telangana, India
G. Naga Rama Devi
Department of Computer Science and Engineering, CMR College of Engineering and Technology, Hyderabad, Telangana, India
N. S. Charan
Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, AP, India
T. Fatima
Department of CSE, CMR Technical Campus, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_27
27.1 Introduction Sepsis is a potentially fatal illness that occurs when the body’s immune response to an infection causes tissue and organ damage. If not detected and treated promptly, it can result in multiple organ failure and death. Sepsis is caused by the release of chemicals into the circulation to combat an infection, which can cause widespread inflammation. This can reduce blood supply to essential organs including the heart, brain, and kidneys, causing organ damage and malfunction. Because the illness can worsen quickly and become life-threatening, early detection and treatment of sepsis are critical for improving patient outcomes. There is no single test that can diagnose sepsis. Instead, doctors establish a diagnosis based on a battery of tests and a comprehensive examination of the patient’s medical history and symptoms. The following tests may be used to detect sepsis:
• Blood tests: These can be used to identify the presence of an infection and to check the levels of certain substances in the blood that may indicate sepsis.
• Imaging tests: These may include X-rays, CT scans, or ultrasound to help doctors identify the source of the infection.
• Culture tests: These may include tests of urine, blood, or other body fluids to identify the specific type of bacteria or other organisms causing the infection.
Sepsis, severe sepsis, and septic shock are the three phases of sepsis.
Sepsis: This is a life-threatening condition that occurs when the immune system overreacts to an infection.
Severe sepsis: Organs start to fail at this stage. Low blood pressure, which results from inflammation throughout the body, is often the cause.
Septic shock: This is the last stage of sepsis, characterized by extremely low blood pressure despite large volumes of IV (intravenous) fluids (Fig. 27.1).
Any infection can cause sepsis, which can progress to septic shock if left untreated. Not every infection will result in sepsis or septic shock. However, if an infection creates enough inflammation, it can lead to sepsis. Most infections are caused by bacteria, although viruses and fungi can also cause infections and sepsis. Infections can originate anywhere, but they most commonly develop in the lungs, bladder, or stomach.
Fig. 27.1 Sepsis infection [1]
27.2 Background Work Traditional methods for detecting sepsis can be subjective and may not always identify the condition accurately in its early stages. Several studies have investigated the use of machine learning models such as ANNs [2] for sepsis detection. To compare different machine learning models for sepsis detection, several benchmarking studies have evaluated the accuracy and performance of multiple algorithms on the same dataset. These studies found that some machine learning models, such as random forests and ANNs [2], perform better than others, such as decision trees and SVMs, in terms of accuracy and speed. Despite these advances, there is still much work to be done to optimize the performance of machine learning models [3–5] for sepsis detection. This includes developing more robust models that can handle missing or imbalanced data, as well as improving the interpretability of these models to facilitate their adoption in clinical settings [6–10]. To improve the accuracy of prediction, deep learning models [11–14] have been used that combine architectures such as CNNs and long short-term memory (LSTM) networks.
The study’s goal was to identify potential subphenotypes and compare clinical outcomes in a large sepsis cohort. The K-means clustering analysis was used to quickly identify sepsis subphenotypes with varying clinical outcomes.
27.3 Methodology The proposed method starts with analyzing the data. The raw data is acquired from the PhysioNet Challenge 2019 [4] and consists of time series of patient health data. Each record holds the patient’s measurements at that time, and the length of the series may differ from patient to patient. To further understand the dataset, a data analysis was performed to determine the overall number of patients; 20,000 distinct patients were found. A heatmap was drawn to analyze the relationships between attributes. It showed that many attributes had a large percentage of null values and that some features were highly correlated but inconsistent with each other. The proposed methodology (Fig. 27.2) for sepsis detection using multiple machine learning models is a systematic approach to diagnosing sepsis using data-driven techniques. The methodology consists of the following steps:
Data Collection and Preprocessing: The first step involves collecting relevant patient data, such as demographic information, vital signs, laboratory results, and previous medical history. Because the dataset contains many missing values, imputation was used to fill them. Imputation should be done on a per-patient basis; otherwise, data from one patient leaks into the data of another patient. In addition, a global mean, median, or mode cannot be used directly for imputation, because it would distort the distribution of the parameters with respect to time.
Feature Selection: The next step is to select the features that are likely to have a significant impact on sepsis diagnosis. Feature selection techniques such as correlation analysis, mutual information, or the chi-squared test are used to determine the most important features.
Splitting the Data: The preprocessed data is then divided into training and testing datasets. The training dataset is used to build the machine learning models, while the testing dataset is used to evaluate the models’ performance.
Model Training: Multiple machine learning models, including random forest, KNN, and logistic regression, are trained using the training dataset. The hyperparameters for each model are selected using techniques such as grid search or cross-validation to obtain the best performance.
Model Evaluation: The models are evaluated using measures such as accuracy, precision, recall, and F1-score. The best-performing model is selected by comparing the performance of each model.
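The per-patient imputation requirement in the preprocessing step can be sketched as follows. The chapter does not state which fill strategy was used, so forward fill followed by backward fill within each patient's own time series is an assumption made for illustration; the field names `patient`, `hour`, and `HR` are hypothetical as well.

```python
from collections import defaultdict

def impute_per_patient(rows, patient_key, time_key, feature):
    """Fill missing (None) values of `feature` using only the same patient's data."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row[patient_key]].append(row)

    filled = []
    for patient_rows in by_patient.values():
        ordered = sorted(patient_rows, key=lambda r: r[time_key])
        values = [r[feature] for r in ordered]
        # Forward fill: carry the last observed value forward in time.
        last_seen = None
        for i, value in enumerate(values):
            if value is None:
                values[i] = last_seen
            else:
                last_seen = value
        # Backward fill: leading gaps take the first later observation.
        next_seen = None
        for i in range(len(values) - 1, -1, -1):
            if values[i] is None:
                values[i] = next_seen
            else:
                next_seen = values[i]
        for row, value in zip(ordered, values):
            filled.append({**row, feature: value})
    return filled

rows = [
    {"patient": "A", "hour": 0, "HR": None},
    {"patient": "A", "hour": 1, "HR": 80},
    {"patient": "A", "hour": 2, "HR": None},
    {"patient": "B", "hour": 0, "HR": 60},
    {"patient": "B", "hour": 1, "HR": None},
    {"patient": "C", "hour": 0, "HR": None},
]
print([r["HR"] for r in impute_per_patient(rows, "patient", "hour", "HR")])
```

Patient C has no observation at all, so its value stays missing instead of borrowing from patients A or B; that is the leakage the text warns about.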
Fig. 27.2 Machine learning model-based architecture for sepsis detection
The dataset used for training is from Kaggle. 80% of the dataset is used for training the model, and the remaining 20% is used for testing and evaluating the model. Prediction accuracy remains good even when the infection is minimal.
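An 80/20 split of this kind might be sketched as below. Splitting by patient identifier rather than by individual record is an assumption, chosen so that all records of one patient end up on the same side of the split; the chapter does not say how its split was drawn. The function name `split_patients` and the seed are hypothetical.

```python
import random

def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Return (train_ids, test_ids) so that each patient appears in exactly one set."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    cut = int(round(train_fraction * len(unique_ids)))
    return set(unique_ids[:cut]), set(unique_ids[cut:])

# Ten hypothetical patients with three records each.
record_patient_ids = [pid for pid in range(10) for _ in range(3)]
train_ids, test_ids = split_patients(record_patient_ids)
print(len(train_ids), len(test_ids))   # 8 2
```

Records are then assigned to the training or testing set according to the set that contains their patient identifier.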
27.4 Results and Discussions Several models were developed to predict the onset of sepsis six hours in advance. After extensive study, the attributes ‘TroponinI’, ‘Bilirubin_direct’, ‘AST’, ‘Bilirubin_total’, ‘Lactate’, ‘SaO2’, ‘FiO2’, ‘Unit’, and ‘Patient_ID’ were deleted, and the missing values of the remaining attributes were imputed on a per-patient basis, as described in Sect. 27.3. Because models generally give better results for normally distributed inputs, we plotted histograms and QQ plots of all the features and applied different transformations to see which brought each feature closest to a Gaussian distribution. The transformations giving the best results were adopted in the dataframe, and the features were normalized to prevent a single feature from dominating the result. After this investigation and feature development, several classifier models were run, including random forest, XGBoost, logistic regression, KNN, and Naïve Bayes; the random forest classifier gave the best result (Fig. 27.3).
Fig. 27.3 Dataset description
As can be seen in the correlation heat map in Fig. 27.4, almost none of the remaining features are highly correlated. The evaluation metrics for the random forest algorithm are given in Table 27.1. The accuracy of the model is around 95.5%.
Fig. 27.4 Correlation matrix to check if there is high correlation between the remaining features
Table 27.1 Evaluation metrics
Metric                     Value
Accuracy                   0.9558390578999019
Precision                  0.9178960872354073
Recall                     0.9505147791431419
F1 score                   0.9339207048458149
AUC-ROC                    0.9544781687923501
Mean absolute error        0.04416094210009813
Root mean squared error    0.21014505014417573
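The relationships between these metrics can be made explicit with a short sketch. It assumes binary labels coded as 0 and 1; the function name `binary_metrics` and the toy labels are hypothetical. With such labels, the mean absolute error equals the error rate (1 − accuracy) and the root mean squared error is its square root, and the values reported in Table 27.1 are consistent with both identities and with the F1 formula.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, MAE and RMSE for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mae = sum(abs(t - p) for t, p in pairs) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    return {"accuracy": (tp + tn) / n, "precision": precision, "recall": recall,
            "f1": f1, "mae": mae, "rmse": rmse}

# Toy labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(metrics)

# Consistency checks on the values reported in Table 27.1.
accuracy, precision, recall = 0.9558390578999019, 0.9178960872354073, 0.9505147791431419
assert math.isclose(1 - accuracy, 0.04416094210009813, abs_tol=1e-9)
assert math.isclose(math.sqrt(1 - accuracy), 0.21014505014417573, abs_tol=1e-9)
assert math.isclose(2 * precision * recall / (precision + recall), 0.9339207048458149, abs_tol=1e-9)
```

For the toy labels, accuracy, precision, recall, and F1 are all 0.75, the mean absolute error is 0.25, and the root mean squared error is 0.5.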
27.5 Conclusion The suggested methodology for detecting sepsis using various machine learning models is an ensemble approach to diagnosing sepsis using data-driven methodologies. Data collection and preprocessing, feature selection, data partitioning, model training, model assessment, and model deployment are all significant steps in the technique. Using this technology, healthcare practitioners may rapidly and reliably identify sepsis and take the necessary actions to prevent it from progressing and having a negative impact on patients. Multiple machine learning models, such as random forest, KNN, and logistic regression, are used to give a robust and accurate solution for sepsis diagnosis, and feature selection approaches ensure that the most significant features are taken into account. The ensemble technique provides a viable option for improving sepsis identification and treatment, with the potential to enhance patient outcomes and lessen the healthcare system’s burden of sepsis. During validation, the suggested method’s average accuracy is about 95.5%. Future work will consider several alternative hybrid deep learning models to do liver segmentation while improving prediction accuracy.
References
1. https://my.clevelandclinic.org/health/diseases/23255-septic-shock
2. Nakhashi, M., Toffy, A., Achuth, P.V., Palanichamy, L., Vikas, C.M.: Early prediction of sepsis: using state-of-the-art machine learning techniques on vital sign inputs. In: Proceedings of IEEE Computer Society, p. 1 (2019)
3. Li, X., Ng, G.A., Schlindwein, F.S.: Convolutional and recurrent neural networks for early detection of sepsis using hourly physiological data from patients in intensive care unit. In: Proceedings of Computing in Cardiology Conference (CinC), pp. 1–4 (2019)
4. https://physionet.org/content/challenge-2019/1.0.0/
5. Lauritsen, S.M., Kalør, M.E., Kongsgaard, E.L., Lauritsen, K.M., Jørgensen, M.J., Lange, J., Thiesson, B.: Early detection of sepsis utilizing deep learning on electronic health record event sequences. Artif. Intell. Med. 104, Art. no. 101820 (2020)
6. Apalak, M., Kiasaleh, K.: Improving sepsis prediction performance using conditional recurrent adversarial networks. IEEE Access 10, 134466–134476 (2022). https://doi.org/10.1109/ACCESS.2022.3230324
7. Early prediction of sepsis based on machine learning algorithm. PubMed (2021). https://doi.org/10.1155/2021/6522633
8. Kijpaisalratana, N., Sanglertsinlapachai, D., Techaratsami, S., Musikatavorn, K., Saoraya, J.: Machine learning algorithms for early sepsis detection in the emergency department: a retrospective study. Int. J. Med. Inf. 160, Art. no. 104689 (2022)
9. Lyu, R.: Improving treatment decisions for sepsis patients by reinforcement learning. M.S. thesis, Univ. Pittsburgh, Pittsburgh, PA, USA (2020)
10. Reyna, M.A., Josef, C., Seyedi, S., Jeter, R., Shashikumar, S.P., Westover, M.B., Sharma, A., Nemati, S., Clifford, G.D.: Early prediction of sepsis from clinical data: the PhysioNet/computing in cardiology challenge 2019 (2019). (Online). Available: https://physionet.org/content/challenge-2019/1.0.0/#challenge-data
11. Bedoya, A.D., Futoma, J., Clement, M.E., Corey, K., Brajer, N., Lin, A., Simons, M.G., Gao, M., Nichols, M., Balu, S., Heller, K., O’Brien, C.: Machine learning for early detection of sepsis: an internal and temporal validation study. JAMIA Open 3(2), 252–260 (2020)
12. Zhang, D., Yin, C., Hunold, K.M., Jiang, X., Caterino, J.M., Zhang, P.: An interpretable deep-learning model for early prediction of sepsis in the emergency department. Patterns 2(2), Art. no. 100196 (2021)
13. He, Z., Du, L., Zhang, P., Zhao, R., Chen, X., Fang, Z.: Early sepsis prediction using ensemble learning with deep features and artificial features extracted from clinical electronic health records. Crit. Care Med. 48(12), E1337–E1342 (2020)
14. Camacho-Cogollo, J.E., et al.: Machine learning models for early prediction of sepsis on large healthcare datasets. MDPI, 7 May 2022, www.mdpi.com/2079-9292/11/9/1507
Chapter 28
Detection of Malicious URLs Using Gradient Boosting Classifier Saba Sultana, K. Reddy Madhavi, G. Lavanya, J. Swarna Latha, Sandhyarani, and Balijapalli Prathyusha
Abstract The Internet has become seamlessly integrated into our daily lives, and its use depends on individual requirements and goals. It has also created opportunities for malicious activities such as spamming, spyware, and phishing. Phishers try to deceive users through social engineering or by creating mock-up or compromised websites whose links are sent to users by e-mail or SMS. When these URLs are clicked, information such as account IDs, usernames, passwords, and many other details is stolen from individuals and organizations, which can lead to reputation damage, infrastructure damage, and other harm. Although many methods have been proposed to detect malicious websites, cyberattackers have evolved their methods to appear trustworthy. Machine learning is used to detect these malicious attacks. This paper shows that the gradient boosting classifier and the XGBoost classifier outperform the other machine learning models considered. The gradient boosting classifier model is stored and used to classify URLs as benign or malicious, with an accuracy of 97.4%.
S. Sultana · G. Lavanya · J. Swarna Latha
Department of CSE, CMR Technical Campus, Hyderabad, Telangana, India
K. Reddy Madhavi (corresponding author)
School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, AP, India
Sandhyarani
Department of CSE (Data Science), CMR Technical Campus, Hyderabad, Telangana, India
B. Prathyusha
Department of CSE, Sree Vidyanikethan Engineering College, Tirupati, AP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_28

28.1 Introduction With the advancement of the Internet, a growing number of activities, such as e-commerce, business, social networking, and banking, are carried out online. This has led to a rise in cyberattacks, so protecting the Internet is crucial. A system can be hacked or sensitive data can be accessed when users are lured into clicking malicious Uniform Resource Locators (URLs). Consequently, detecting fraudulent URLs at an early stage is becoming increasingly important. The channel over which clients and servers communicate is protected by protocols and regulations, but cyberattackers with malicious intent can still exploit it. Malicious URLs are used to extract sensitive information and to deceive customers and organizations, resulting in annual losses of billions of dollars. The online security community has developed blacklisting services to help identify the threat posed by rogue websites. A blacklist is a database that holds a list of all URLs that have already been identified as dangerous. Blacklisting of URLs has occasionally been shown to be successful. However, an attacker can still evade a blacklist by quickly altering one or more components of the URL string. Many harmful websites are never blacklisted because they are too new, were never assessed, or were assessed incorrectly. In this research, we provide a comparative performance analysis of nine different machine learning models: “gradient boosting classifier”, “XGBoost Classifier”, “multi-layer perceptron”, “random forest”, “support vector machine”, “decision tree”, “K-nearest neighbor”, “logistic regression”, and “Naïve Bayes classifier”. The best model among them is the gradient boosting classifier, which achieved an accuracy of 97% and is used to classify URLs as benign or malicious. Features extracted from URLs allow ML algorithms to classify a URL as dangerous or benign more quickly. To decrease the number of dangerous attachments, we investigate the suitability of various machine learning models for automatically classifying URLs as malicious or benign.
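As an illustration of what extracting features from a URL string can look like, the sketch below derives a few lexical indicators using only the Python standard library. The feature names and rules are assumptions loosely modeled on the column names that appear in Table 28.1 (IP address use, URL length, the "@" symbol, "//" redirection, hyphenated hosts, subdomains, HTTPS); they are not the dataset's actual definitions, and the example URLs are made up.

```python
import re
from urllib.parse import urlparse

IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def lexical_features(url):
    """Return a dictionary of simple, hypothetical lexical features of a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    after_scheme = url.split("://", 1)[-1]   # URL without the leading scheme
    uses_ip = IPV4_PATTERN.fullmatch(host) is not None
    return {
        "uses_ip": uses_ip,                                  # raw IP instead of a domain name
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "has_double_slash_redirect": "//" in after_scheme,   # "//" beyond the scheme separator
        "has_hyphen_in_host": "-" in host,
        "host_dot_count": 0 if uses_ip else host.count("."),
        "uses_https": parsed.scheme == "https",
    }

print(lexical_features("http://192.168.0.1/login@secure//update"))
print(lexical_features("https://www.example-shop.com/index.html"))
```

Each dictionary can then be turned into a numeric feature vector for a classifier; how the dataset used in this chapter encodes its features as 1, 0, and −1 is not reproduced here.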
28.2 Related Work In [1], the authors surveyed a range of machine learning (ML) and deep learning (DL) methods for identifying malicious websites, using a phishing dataset containing more than 67,000 URLs. They applied several feature selection procedures, including chi-square, correlation analysis, and ANOVA. Their results showed that “Naive Bayes”, with an accuracy of 96%, was the best model for identifying dangerous URLs. The authors of [2] compared the efficiency of deep learning frameworks such as “Fast.ai” and “Keras TensorFlow” with other machine learning algorithms such as “random forest”, “CART”, and “kNN”. In [3], a parallel neural joint model is proposed for the analysis and detection of dangerous URLs; character embedding vectors and lexical embedding vectors are created from the extracted features. To overcome the drawbacks of earlier approaches, [4] proposes an innovative heuristic method that uses “TWSVM” to identify phishing websites. In the experiments of [5], deep learning mechanisms outperformed the hand-crafted feature technique. Deep learning methods are capable of learning long-range dependencies and hierarchical feature representations in arbitrary-length sequences. To address the limitations of feature engineering, [6] proposes a rapid detection method that uses deep learning for multidimensional feature phishing identification. The approach used in [6] can speed up the process of setting a threshold. The study in [7] aims to create a novel hybrid rule-based solution that uses six different algorithm models to effectively identify and manage phishing problems. The highest accuracy obtained was 96.4% for the “CNN” model and 93% for the “MLP” model. A data mining method called “Classification Based on Association (CBA)” is investigated in [8] to identify malicious URLs using features of the URL and the webpage content. According to the experimental findings in [8], CBA performs similarly to standard classification algorithms, achieving 95.8% accuracy with low false-positive and false-negative rates. The authors of [9] conducted experiments with a fivefold cross-validation approach on the “WEBSPAM-UK2007” and “UK2011” datasets. They designed three sets of tests to assess the effectiveness of the suggested method. Compared with previous techniques, the suggested method with its unique feature sets performs better, with an accuracy and recall of 0.929 and 0.930, respectively. In [10], the authors suggested the Extreme Learning Machine (ELM) as a machine learning-based way to identify malware domain names. Modern neural networks like ELM give high accuracy and quick learning rates. They applied ELM to categorize domain names based on characteristics taken from several sources. Their experiment [10] demonstrates the excellent detection rate and accuracy (> 95%) of the suggested detection approach. The authors of [11] introduced and evaluated a binary classification technique intended to identify fraudulent URLs based on details about the URL string syntax and its domain attributes. The key feature of their approach [11] is the use of an algorithm based on spherical separations, as opposed to SVM-type techniques, which use hyperplanes as separation surfaces in the sample space. Nowadays, signature-based Intrusion Detection Systems (IDS) are the most common, but they are ineffective against unidentified attacks. One way to combat this is to use machine learning detection algorithms. In [12], the authors present a novel lexical approach that classifies URLs using ML techniques that model the patterns of URLs.
28.3 Proposed Methodology In this paper, a comparative performance analysis of different machine learning models is carried out, in which the “gradient boosting classifier” performed best. The gradient boosting classifier, represented in Fig. 28.1, iteratively trains decision trees and uses boosting to combine them. Boosting proceeds in a stepwise manner in which each subsequent model is trained to minimize the errors made by the previous model. The architecture of gradient boosting also uses gradient descent to adjust the weights of the trees and regularization to prevent overfitting.
Fig. 28.1 Architecture of gradient boosting classifier
28.3.1 Gradient Boosting Classifier The working of the algorithm is represented in Fig. 28.2. It has remarkable predictive accuracy, and it can deal with missing data, so imputation is not necessary.
Input: the training set $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, and the number of iterations $M$.
Algorithm:
1. Initialize the gradient boosting model with a constant value:
$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma).$$
2. For m = 1 to M:
a. Calculate the pseudo-residuals:
$$r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)} \quad \text{for } i = 1, \ldots, n.$$
b. Fit a weak learner $h_m(x)$, closed under scaling, to the pseudo-residuals, that is, train it on the set $\{(x_i, r_{im})\}_{i=1}^{n}$.
c. Determine the multiplier $\gamma_m$ by solving the following one-dimensional optimization problem:
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\right).$$
d. Update the model:
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x).$$
3. Output $F_M(x)$.
Fig. 28.2 Working of gradient boosting algorithm
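The steps above can be sketched in plain Python. The sketch assumes the squared-error loss L(y, F) = (y − F)²/2, for which the pseudo-residual is simply y − F, the initial constant is the mean of the targets, and the multiplier has a closed form; a classifier would normally use a log-loss instead, so this is an illustration of the algorithm's structure and not the model trained in this chapter. The weak learner is a one-split regression stump on a single numeric feature, and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner h_m: a one-split regression stump fitted to the pseudo-residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:                       # no valid split: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=20):
    f0 = sum(ys) / len(ys)                 # step 1: the mean minimizes the squared loss
    learners = []
    predictions = [f0] * len(xs)
    for _ in range(n_rounds):              # step 2
        residuals = [y - p for y, p in zip(ys, predictions)]        # 2a: y - F_{m-1}(x)
        h = fit_stump(xs, residuals)                                # 2b
        hx = [h(x) for x in xs]
        denominator = sum(v * v for v in hx)
        if denominator == 0:               # nothing left to fit
            break
        gamma = sum(r * v for r, v in zip(residuals, hx)) / denominator  # 2c, closed form
        learners.append((gamma, h))
        predictions = [p + gamma * v for p, v in zip(predictions, hx)]   # 2d
    return lambda x: f0 + sum(gamma * h(x) for gamma, h in learners)     # step 3: F_M

# Toy data with labels coded as -1 / 1, as in the phishing dataset.
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
model = gradient_boost(xs, ys)
print([1 if model(x) >= 0 else -1 for x in xs])
```

The sign of the boosted score is used as the predicted class. On this separable toy data a single stump already fits the targets, so the loop stops after the first round.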
28.4 Experimentation and Results For experimentation, we use the “phishing” dataset [13, 14], which was downloaded from Kaggle. The dataset consists of more than 11,000 website URLs. Each sample has 30 URL features and a class label identifying it as a malicious website or not (1 or −1). An overview of the dataset is given in Table 28.1. Exploratory data analysis (EDA) was performed on the dataset, and the phishing data is visualized as a pie chart in Fig. 28.3. Because each sample carries a binary class label, the task is a supervised classification problem, and supervised machine learning models were considered when training on the dataset. The performance of K-nearest neighbor is shown in Fig. 28.4 and that of the decision tree classifier in Fig. 28.5; the random forest gives good training and testing accuracy, as shown in Fig. 28.6; and the gradient boosting classifier gives the highest training and testing accuracy, as shown in Fig. 28.7.

Table 28.1 Overview of the dataset; it has 11,054 samples with 32 features
Index   Using Ip   Long URL   Symbol   Redirecting   Prefix sum   Sub domains   HTTPS   Stats Report   –   Class
0       0          1          1        1             −1           0             1       1              –   −1
1       1          0          1        1             −1           −1            −1      −1             –   −1
2       1          0          1        1             −1           −1            −1      1              –   −1
3       1          0          1        1             −1           1             1       1              –   1
4       −1         0          1        −1            −1           1             1       −1             –   1

Fig. 28.3 Pie chart for visualizing the dataset
Fig. 28.4 Performance of K-nearest neighbor
Fig. 28.5 Performance of decision tree classifier
Fig. 28.6 Performance of random forest
Fig. 28.7 Performance of gradient boosting classifier
Table 28.2 compares the ML models. The best model is the “gradient boost classifier”, with accuracy, f1-score, recall, and precision of 0.974, 0.977, 0.994, and 0.986, respectively. The performance of these models in terms of accuracy, precision, f1-score, and recall is shown graphically in Fig. 28.8. In addition to showing the best accuracy among the ML models included in this paper, the gradient boosting classifier also
• can handle different types of data such as categorical, numerical, and binary data
• iteratively improves the model’s performance by focusing on the most difficult-to-predict data points
• can capture nonlinear relationships between variables, making it suitable for complex problems.

Table 28.2 Comparison of models
     ML model                       Accuracy   f1 score   Recall   Precision
0    Logistic regression            0.934      0.941      0.943    0.927
1    K-nearest neighbors            0.956      0.961      0.991    0.989
2    Support vector machine         0.964      0.968      0.980    0.965
3    Naive Bayes classifier         0.605      0.454      0.292    0.997
4    Decision tree                  0.961      0.965      0.991    0.993
5    Random forest                  0.967      0.970      0.992    0.991
6    Gradient boosting classifier   0.974      0.977      0.994    0.986
7    Cat boost classifier           0.972      0.975      0.994    0.989
8    XGBoost classifier             0.969      0.973      0.993    0.984
9    Multi-layer perceptron         0.971      0.974      0.992    0.985

Fig. 28.8 Graphical representation of ML models

By identifying the most important features in the dataset, we can reduce the dimensionality of the data and improve the model’s performance by reducing overfitting. The feature importances are shown in Fig. 28.9.

Fig. 28.9 Feature importance in the model

Storing best model: With the phishing dataset, we investigated multiple machine learning models and conducted exploratory data analysis. We learned a lot about the factors that affect a model’s ability to determine whether a URL is secure, how to fine-tune the model, and how these factors affect model performance. Certain dataset attributes, such as “HTTPS”, “AnchorURL,” and “WebsiteTraffic,” are more significant when determining whether a URL is a phishing URL. The gradient boosting classifier decreases the likelihood of malicious content by correctly classifying URLs into their respective classes with up to 97.4% accuracy. The gradient boosting classifier model, which has the best performance among the ML models shown in Table 28.2, is therefore selected and stored for classifying URLs as malicious or benign.
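Storing a selected model for later use might look like the sketch below, which uses Python's standard `pickle` module. The chapter does not say how its model was stored, so the serialization format is an assumption; the `model_params` dictionary, the `predict` helper, and the file name are hypothetical stand-ins for a trained classifier.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: an initial score and a list of
# (multiplier, threshold, left_value, right_value) stumps.
model_params = {"f0": 0.0, "stumps": [(1.0, 3, -1.0, 1.0)]}

def predict(params, x):
    """Return 1 or -1 from the summed stump scores."""
    score = params["f0"]
    for gamma, threshold, left_value, right_value in params["stumps"]:
        score += gamma * (left_value if x <= threshold else right_value)
    return 1 if score >= 0 else -1

# Store the model once after training ...
path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
with open(path, "wb") as handle:
    pickle.dump(model_params, handle)

# ... and load it again when URLs need to be classified.
with open(path, "rb") as handle:
    restored_params = pickle.load(handle)

print(predict(restored_params, 2), predict(restored_params, 5))  # -1 1
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.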
28.5 Conclusions
Machine learning approaches are more effective and productive for detecting malicious URLs than blacklisting techniques. A machine learning approach with lexical features is used in this project to detect malicious shortened URLs. With the proposed method, malicious websites can be detected using only the URL string. The chosen algorithm, gradient boosting, uses a diverse set of features that help the model achieve an accuracy of 97.4%. Additional research could take into account the site’s popularity and host-based features. The model can be investigated further to categorize malicious URLs as phishing, spamming, or malware. The best machine learning algorithms could also be combined to achieve higher accuracy with less training, testing, and prediction time. A better approach to collecting live data should be considered, one that keeps the model up to date by training on recent malicious URL data that may use unfamiliar techniques. The system should be able to store on the server all of the URLs encountered by the browser and update the model in real time to provide accurate results. Updating the trained model may complicate the entire process, but the complexity can be reduced by performing weekly or monthly updates.
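To illustrate what "lexical features" taken from the URL string alone can look like, here is a minimal sketch using only the Python standard library. The particular features and the function name `lexical_features` are illustrative assumptions; they are not the feature set of the dataset used in this chapter.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL string.

    These features are hypothetical examples, not the authors' feature set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
        # A raw IPv4 address in place of a domain name is a common warning sign.
        "host_is_ipv4": re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,
    }


features = lexical_features("http://192.168.0.1/login-update?user=a@b")
print(features)
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to any of the classifiers compared in Table 28.2.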
References
1. Aljabri, M., Alhaidari, F., Mohammad, R.M.A., Mirza, S., Alhamed, D.H., Altamimi, H.S., Chrouf, S.M.B.: An assessment of lexical, network, and content-based features for detecting malicious URLs using machine learning and deep learning models. Comput. Intell. Neurosci. 2022, 1–14 (2022). https://doi.org/10.1155/2022/3241216
2. Johnson, C., Khadka, B., Basnet, R.B., Doleck, T.: Towards detecting and classifying malicious URLS using deep learning. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. 11(4), 31–48 (2020). https://doi.org/10.22667/JOWUA.2020.12.31.031
3. Yuan, J., Chen, G., Tian, S., Pei, X.: Malicious URL detection based on a parallel neural joint model. IEEE Access 9, 9464–9472 (2021). https://doi.org/10.1109/ACCESS.2021.3049625
4. Rao, R.S., Pais, A.R., Anand, P.: A heuristic technique to detect phishing websites using TWSVM classifier. Neural Comput. Appl. 33(11), 5733–5752 (2021). https://doi.org/10.1007/s00521-020-05354-z
5. Vinayakumar, R., Soman, K.P., Poornachandran, P.: Evaluating deep learning approaches to characterize and classify malicious URL’s. J. Intell. Fuzzy Syst. 34(3), 1333–1343 (2018). https://doi.org/10.3233/JIFS-169429
6. Yang, P., Zhao, G., Zeng, P.: Phishing website detection based on multidimensional features driven by deep learning. IEEE Access 7, 15196–15209 (2019). https://doi.org/10.1109/ACCESS.2019.2892066
7. Mourtaji, Y., Bouhorma, M., Alghazzawi, D., Aldabbagh, G., Alghamdi, A.: Hybrid rule-based solution for phishing URL detection using convolutional neural network. Wirel. Commun. Mob. Comput. 2021, 1–24 (2021). https://doi.org/10.1155/2021/8241104
8. Kumi, S., Lim, C., Lee, S.-G.: Malicious URL detection based on associative classification. Entropy 23(2), 182 (2021). https://doi.org/10.3390/e23020182
9. Liu, J., Su, Y., Lv, S., Huang, C.: Detecting web spam based on novel features from web page source code. Secur. Commun. Netw. 2020, 1–14 (2020). https://doi.org/10.1155/2020/6662166
10. Shi, Y., Chen, G., Li, J.: Malicious domain name detection based on extreme machine learning. Neural. Process. Lett. 48(3), 1347–1357 (2018). https://doi.org/10.1007/s11063-017-9666-7
11. Astorino, A., Chiarello, A., Gaudioso, M., Piccolo, A.: Malicious URL detection via spherical classification. Neural Comput. Appl. 28(S1), 699–705 (2017). https://doi.org/10.1007/s00521-016-2374-9
12. Hai, Q., Hwang, S.: Detection of malicious URLs based on word vector representation and ngram. J. Intell. Fuzzy Syst. 35(6), 5889–5900 (2018). https://doi.org/10.3233/JIFS-169831
13. Eswar Chand: Phishing Website Dataset. Kaggle (2019). (Online). Available: https://www.kaggle.com/eswarchandt/phishing-website-detector
14. GitHub: Machine learning approach to detect malicious. Accessed: https://github.com/VaibhavBichave/Phishing-URL-Detection
Chapter 29
License Plate Recognition Using Neural Networks D. Satti Babu, T. V. Prasad, B. Revanth Sai, G. D. N. S. Sudheshna, N. Venkata Kishore, and P. Chandra Vamsi
Abstract This paper presents an automatic number plate recognition (ANPR) method. After studying real traffic recordings, we test the program and save the number plate text and image in a directory. The results show that the ANPR method is effective in several circumstances. Performance evaluation takes robustness, complexity, utility, cost, and ease of deployment into account. ANPR can read vehicle number plates without human intervention. Detecting and recognizing number plates is difficult because of both artificial and natural noise as well as varying lighting and weather conditions. Hence, the proposed methodology first applies image pre-processing techniques and then performs the recognition process; this approach increased the accuracy. When validating the results, the entire plate is regarded as incorrect if even one letter or digit is wrong. The proposed CNN model has 100% accuracy for vehicle and plate detection. The overall accuracy of the model is 94.2%, with a very small average execution time.
D. Satti Babu (B) · T. V. Prasad · B. Revanth Sai · G. D. N. S. Sudheshna · N. Venkata Kishore · P. Chandra Vamsi Department of Computer Science and Engineering, Godavari Institute of Engineering and Technology (A), Rajamahendravaram, India T. V. Prasad e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_29
29.1 Introduction
ANPR has been one of the most useful tools for vehicle surveillance over the past several years. It may be used in a variety of public places to perform a number of different tasks, including toll fee collection, parking management, and autonomous vehicle monitoring. ANPR algorithms typically include four steps: taking a photograph of a car, locating the license plate, segmenting the characters, and identifying the characters. In this project, the system uses CNN, Darknet, contour detection, and contour segmentation to automatically identify and read license plates from moving vehicles. We use the YOLO method in our system to produce regions of interest (ROI), a term that is frequently employed in several different application sectors. In our case, the number plate is the first ROI, and the second ROI is made up of the letters (A–Z or a–z) and digits (0–9) on the number plate. Segmentation is a technique used in image processing to separate or split data from the desired target area of the picture. Finding contours makes it straightforward to separate the characters from the image’s background (Fig. 29.1).
Fig. 29.1 Typical registration plate training samples
29.2 Methods
(a) Pattern Recognition: The ability to spot combinations of characteristics or data that reveal details about a particular system or data collection is known as “pattern recognition.”
(b) Image Filtering: Image filters enable the removal of spurious characters that would otherwise be identified incorrectly.
(c) Contour Segmentation: Image processing uses segmentation to separate or split information from the required target region of the image.
(d) Convolutional Neural Networks: A convolutional neural network (ConvNet) is a deep learning method that can take a picture as input, assign importance (learnable weights and biases) to various objects and components in the image, and distinguish between them. A ConvNet requires much less pre-processing than other classification techniques.
29.3 Literature Survey
In [1], the authors used a clustering approach (K-means) to create a segmentation algorithm. The process is as follows: (1) recognition of license plates; (2) segmentation using the K-means clustering approach; and (3) recognition of numbers using the CNN model. The authors noted that these operations are unfortunately time-consuming owing to the particular plate foreground and to varying conditions at the time the photo is taken, such as outdoor illumination, vehicle speed, background brightness, and the distance between the car and the camera. Under constraints such as consistent illumination, reduced vehicle speeds, specified routes, and static scenery, a variety of strategies may be employed. Plate localization is considered the more challenging step overall, since the localization technique directly affects the accuracy and efficiency of the succeeding operations. The accuracy of character identification using CNN was reasonable.
In [2], the authors employed artificial intelligence and computer vision. The computer first selects a few candidate regions based on the characteristics of the RGB and Hue, Saturation, Value (HSV) color models, and then filters these regions according to the features of number plates. The most likely candidate region of the picture is cropped and rotated to help the neural networks identify it. In the next step, a neural network system with multiple additional networks is introduced. The goal is to teach the machine to recognize number plates by providing it with examples. One reason why ANPR in low-light conditions is difficult for standard vision processing is that the system cannot locate registration plates correctly. SqueezeNet, GoogLeNet, VGG, and other multi-layer networks have all been built on CNNs and have proven to be successful and accurate. In the field of ANPR, CNNs have taken the lead and are gradually displacing traditional visual processing approaches.
In [3], the authors discussed how challenging it was to collect character and picture datasets for training the model, as well as its implementation for Bangla license plates. CNN performed better than conventional image processing algorithms, and the models compared well in a range of scenarios.
In [4], the authors discussed research findings that changed our understanding of how the brain processes vision and showed the presence of receptive fields, which are clusters of locally coupled neurons that are sensitive to specific regions of the visual field. Instead of processing all of the image data from the retinal sensors at once, the visual cortex uses a three-layer hierarchical structure, with locally sensitive simple cells at the bottom layer, more complex cells further up the hierarchy, and hyper-complex cells at the top layer. Unlike a typical NN classifier, which depends on a fully connected input layer with a sizable number of weights, a CNN uses a locally responsive input layer in which neurons respond to particular regions of the input picture, producing localized responses. These localized responses are processed through a number of stages and then passed to a layer with substantially smaller dimensions. Most of CNN’s early uses, such as visual data segmentation and identification, were based on pattern recognition. The authors also note CNN’s rising popularity, citing its accomplishments in facial recognition, gesture recognition, natural language processing, autonomous cars, and object identification.
In [5], the authors focused mainly on images captured under different lighting conditions, which are particularly difficult to handle. Principal component analysis (PCA) is quite effective at isolating noise. The method performs object detection using regions with convolutional neural networks (R-CNN). The convolutional feature map is used to find the proposed regions and package them into squares; the ROI pooling layer then reshapes the squares to a fixed size before they are fed into a fully connected (FC) layer. A similar approach is used in [6], but with YOLO instead of R-CNN (Fig. 29.2).
Fig. 29.2 Loss function equation
The network uses the equation in Fig. 29.2 as the YOLO loss function. It increases the loss contribution of bounding boxes that contain objects and lessens the loss contribution of boxes that do not. Here, the parameters are p(c), w, x, h, y, and C. The input picture is resized to 448 × 448 before being analyzed by the network. A 1 × 1 convolution, which reduces the number of channels, is followed by a 3 × 3 convolution, producing a cuboidal output. The rectified linear unit (ReLU) is the activation function employed throughout the network, except for the last layer, which uses a linear activation function. Overfitting is avoided by using regularization techniques such as batch normalization and dropout (Fig. 29.3).
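Because the equation in Fig. 29.2 is not reproduced in the text, the following is the loss as it is commonly written for the original YOLO detector, given here on the assumption that the figure shows this standard form. The symbols S (grid size), B (boxes per cell), the indicator functions, and the weights λ_coord and λ_noobj are not defined in this chapter and are taken from that standard formulation; hatted quantities denote ground-truth values.

```latex
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 & + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 & + \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
   + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 & + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

In this form, a value of λ_coord greater than 1 raises the weight of localization errors for boxes that contain objects, while a value of λ_noobj less than 1 lowers the confidence penalty for boxes that do not, which matches the description above.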
Fig. 29.3 YOLO architecture
29.4 Proposed Methodology
The success of the proposed model depends heavily on the precision of the license plate extraction, which is a vital stage in the model. In this phase, the ROI, in this case the plate, is extracted from the acquired image. The proposed approach locates the plate in the given picture using histogram-based analysis. As already mentioned, the initial ROI is the license plate itself. To separate the plate from the background of the image, we employ the YOLO algorithm. The predicted bounding boxes are weighted by their predicted probabilities. The Darknet YOLO v3 algorithm was trained on a collected and labeled custom dataset of “number plates” in order to learn the model’s weights. Darknet is a neural network framework written in C and Compute Unified Device Architecture (CUDA). It is fast, easy to install, and supports both CPU and GPU computation. The Darknet YOLO v3 weights are converted into a Keras “.h5” weights file. An “.h5” file is a data file stored in the Hierarchical Data Format (HDF). The converted weights are loaded into the Keras model, which produces the predicted bounding boxes and class probabilities. Once the license plate has been found, it is cropped from the picture and processed further to obtain the second region of interest, namely the individual characters (Fig. 29.4).
Fig. 29.4 CNN character recognition architecture
29.4.1 Recognition Process Based on CNN
A well-known deep learning CNN model is used to recognize the individual characters of segmented plates. The CNN model is composed of convolutional, pooling, and fully connected layers, which can be combined in different numbers of blocks, with blocks added or removed as necessary. CNN is used because it has proven to be highly accurate and can read vehicle license plates without any human intervention. Speed of execution plays a major role in any ANPR model, and CNN models have proven to be faster than other traditional models.
29.4.2 Conv Layer
This layer differs from a conventional NN layer in that not every pixel is connected to the next layer with its own weights and biases. Instead, the entire picture is split into small regions, and each region is given weights and biases. Filters, or kernels, are applied to each small region of the input picture and generate feature maps as a result. The convolution layer treats the filters as basic “features” to look for in the input picture. Fewer parameters are needed to run the convolution function because the same filter scans the entire picture rather than just a specific region. Padding, local region size, filter count, and stride are the main parameters of the convolution layer. These parameters are set according to the size and type of the input picture so that the layer works efficiently (Fig. 29.5).
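The roles of the filter, stride, and padding can be made concrete with a minimal pure-Python sketch of a single-channel convolution. This is an illustration under simplifying assumptions (square kernel, one channel, no bias), not the layer implementation used in the chapter; the function name `conv2d` is hypothetical.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D image (list of lists).

    As in most CNN frameworks, the kernel is not flipped, so strictly
    speaking this computes a cross-correlation.
    """
    k = len(kernel)
    if padding:
        # Surround the image with `padding` rows and columns of zeros.
        width = len(image[0]) + 2 * padding
        border = [[0] * width for _ in range(padding)]
        middle = [[0] * padding + list(row) + [0] * padding for row in image]
        image = border + middle + [[0] * width for _ in range(padding)]
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4 x 4 image with a vertical edge, and a 2 x 2 vertical-edge filter.
image = [[0, 0, 1, 1] for _ in range(4)]
edge_filter = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, edge_filter)
print(feature_map)  # the edge shows up in the middle column
```

The same four filter weights are reused at every position, which is why a convolution layer needs far fewer parameters than a fully connected one. The output size per dimension is (input + 2 × padding − kernel) // stride + 1.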
Fig. 29.5 Execution flow
29.4.3 Pooling Layer
The pooling layer reduces the spatial dimensions of the image and the number of parameters, which also lowers processing costs. It applies a pre-defined function to its input without adding any learnable parameters. There are several pooling layer types, such as max, average, and stochastic pooling. In max pooling, an n × n window moves across the input with a stride of s. The input size is reduced by choosing the highest value in the n × n region at each location. Because this layer provides translation invariance, features can still be recognized despite small changes in position.
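Max pooling with an n × n window and stride s can be sketched as follows. This is a minimal pure-Python illustration; the function name `max_pool` and the sample feature map are assumptions made for the example.

```python
def max_pool(feature_map, n=2, s=2):
    """Keep the largest value in each n x n window, moving with stride s."""
    out_h = (len(feature_map) - n) // s + 1
    out_w = (len(feature_map[0]) - n) // s + 1
    return [
        [
            max(
                feature_map[r * s + i][c * s + j]
                for i in range(n)
                for j in range(n)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(fmap)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4 × 4 input shrinks to 2 × 2, and the function has no weights to learn, in line with the description above.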
29.4.4 Fully Connected Layer
The flattened output of the last pooling layer is fed into the fully connected layer as input. This layer operates like a conventional neural network (NN): it receives connections from every neuron in the previous layer. Because of this, it has more parameters than the convolutional layer. The output layer, sometimes referred to as the classifier, is connected to this layer.
29.4.5 Activation Function
Different activation functions are employed in the various CNN architectures. Swish, Rectified Linear Unit (ReLU), Leaky Rectified Linear Unit (LReLU), and Parametric Rectified Linear Unit (PReLU) are a few nonlinear activation functions. Nonlinear activation functions help to accelerate the training process. In this study, the ReLU function outperformed the alternatives.
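For reference, the four activation functions named above can be written in a few lines using their standard textbook definitions. The default slope of 0.01 for LReLU and β = 1 for Swish are common conventions assumed here, not values stated in the chapter.

```python
import math


def relu(x):
    """ReLU: pass positive inputs through, clamp negatives to zero."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """LReLU: like ReLU, but negatives keep a small fixed slope."""
    return x if x > 0 else slope * x


def prelu(x, alpha):
    """PReLU: like LReLU, but the negative slope alpha is learned."""
    return x if x > 0 else alpha * x


def swish(x, beta=1.0):
    """Swish: x multiplied by the logistic sigmoid of beta * x."""
    return x / (1.0 + math.exp(-beta * x))


for fn in (relu, leaky_relu, swish):
    print(fn.__name__, fn(-2.0), fn(3.0))
```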
29.5 Conclusion
We conducted a number of tests in this study and collected useful data for further investigation. Recognition of straight-on frontal images is quite accurate. Compared with other conventional methodologies, the execution time is quite short. However, a few environmental factors, such as heavy rain, dense fog, dim illumination, and smoke, may somewhat reduce the accuracy. This degradation can be up to 37% greater when the input picture has not been properly pre-processed. We assessed the accuracy of license plate localization by examining whether the plate was localized in such a way that each of its digits and letters was readable in the localized zone. The accuracy of the segmentation and recognition component was measured on a binary scale: we consider a license plate correct only when all of its letters and numerals are classified properly and appear in the same sequence as on the original plate. If even one character on the license plate is wrong, we consider the entire plate misclassified. Although the structure of our approach ensures rapid identification, additional work remains to be done. More complex neural networks can be used to improve accuracy, and further ANPR alternatives are available. The study’s initial findings are encouraging, and recent data suggest that CNN has a bright future.
29.6 Results and Discussion
We first ran the model without any pre-processing of the picture dataset: we created a directory of 500 photographs from Kaggle datasets [7, 8] and recorded all of the output values in an .xlsx file for validation. Only 374 of the 500 photos were correct, meaning that the letters and digits on the plate were classified properly and appeared in the same sequence as on the original plate. If even one letter or digit was wrong, the entire plate was regarded as incorrect. Without pre-processing, the model’s accuracy is therefore 74.8%. The model performed better and was more accurate when appropriate pre-processing was applied to each input image. It was tested on the same set of 500 photos, which were first pre-processed with a number of image processing methods, such as contour detection, contour segmentation, and BGR-to-grayscale conversion. Of the 500 photos, 471 were correct, so the model’s accuracy with proper pre-processing is 94.2% (Fig. 29.6; Table 29.1).
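The all-or-nothing scoring rule described above can be sketched as follows. The plate strings are made-up examples and the function name `plate_accuracy` is hypothetical; only the counts 374, 471, and 500 come from the text.

```python
def plate_accuracy(predicted, actual):
    """Count a plate as correct only if every character matches in order."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct, 100.0 * correct / len(actual)


# Made-up example: the second prediction has one wrong digit,
# so the whole plate counts as incorrect.
actual = ["AP05BK1234", "TS09AB0007", "KA01MM0777"]
predicted = ["AP05BK1234", "TS09AB0001", "KA01MM0777"]
correct, accuracy = plate_accuracy(predicted, actual)
print(correct, round(accuracy, 1))

# The reported figures follow from the same ratio.
without_preprocessing = 100.0 * 374 / 500  # 74.8
with_preprocessing = 100.0 * 471 / 500     # 94.2
```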
Fig. 29.6 Sample execution screenshot
Table 29.1 Sample datasets chosen and test results acquired
| | Images without pre-processing | Pre-processed images |
|---|---|---|
| Total | 500 | 500 |
| Correctly classified | 374 | 471 |
| Accuracy | 74.8% | 94.2% |
Fig. 29.7 Comparison of test results without and with pre-processing (bar chart titled “Model Correctness” showing the total and correctly classified counts for images without pre-processing and for pre-processed images)
Compared with other conventional ANPR approaches, our model’s execution time is consistently short. Running the model on a GPU through Compute Unified Device Architecture (CUDA) is faster than running it on the CPU. However, CUDA installation is not required to run the project. In the absence of CUDA, the model simply runs on the CPU and its execution time may differ slightly. The accuracy of the model is 94.2%, and the average execution time is very short (Fig. 29.7).
References
1. Pustokhina, I.V., et al.: Automatic vehicle license plate recognition using optimal K-means with convolutional neural network for intelligent transportation systems. IEEE Access 8, 92907–92917 (2020)
2. Zhang, Y., Qiu, M., Ni, Y., Wang, Q.: A novel deep learning based number plate detect algorithm under dark lighting conditions. In: 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, pp. 1412–1417 (2020)
3. Saif, N., et al.: Automatic license plate recognition system for Bangla license plates using convolutional neural network. In: TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), Kochi, India, pp. 925–930 (2019)
4. Mondal, M., Mondal, P., Saha, N., Chattopadhyay, P.: Automatic number plate recognition using CNN based self synthesized feature learning. In: 2017 IEEE Calcutta Conference (CALCON), Kolkata, India, pp. 378–381 (2017)
5. Palanivel Ap, N., Vigneshwaran, T., Arappradhan, M.S., Madhanraj, R.: Automatic number plate detection in vehicles using faster R-CNN. In: 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), Pondicherry, India, pp. 1–6 (2020)
6. Xie, L., Ahmad, T., Jin, L., Liu, Y., Zhang, S.: A new CNN-based method for multi-directional car license plate detection. IEEE Trans. Intell. Transp. Syst. 19(2), 507–517 (2018)
7. Datacluster Labs. Indian Number Plates Dataset: Detection and Classification for Indian Licence Plates, Version 2. Retrieved 20 Dec 2022 from https://www.kaggle.com/datasets/dataclusterlabs/indian-number-plates-dataset
8. Larxel. Car License Plate Detection: 433 images of license plates, Version 1. Retrieved 13 Dec 2022 from https://www.kaggle.com/datasets/andrewmvd/car-plate-detection
Chapter 30
Dehazing of Satellite Images with Low Wavelength and High Distortion: A Comparative Analysis Amrutha Sajeevan and B. A. Sabarish
Abstract Dehazing is a procedure to improve the visibility of remote sensing (RS) satellite images whose visual quality is degraded by meteorological conditions when the horizontal visibility at ground level is greater than 1 km. Remote sensing is the method of acquiring the material characteristics of an area by measuring the reflected and emitted radiation at a distance. The presence of low wavelengths, such as gamma waves, makes it difficult for the light to penetrate the atmosphere and thus causes the image to be scattered more by airborne particles. Therefore, it is necessary to use gamma-corrected remote sensing images to improve the quality of dehazing. The proposed study compares various strategies used in dehazing remote sensing satellite images, and specific quality metrics are then employed to judge their performance.
A. Sajeevan (B) · B. A. Sabarish Department of Computer Science and Engineering, Amrita School of Computing, Coimbatore, Amrita Vishwa Vidyapeetham, India e-mail: [email protected] B. A. Sabarish e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_30
30.1 Introduction
Remote sensing (RS) has gained popularity owing to technological advancements in emerging technologies for various applications and infrastructural development. Remote sensing images are captured using sensors on drones, aircraft, or satellites, without physical contact with the earth’s surface [1]. Among the different kinds of RS data, this study focuses on satellite images. Commercial satellites such as WorldView-3, with a resolution of 0.31 m, are currently available [2]. RS photos taken by satellites under certain circumstances are vulnerable to being obscured by fog or haze, which results in low contrast or pale colour. This negative impact not only lowers the visual quality of RS photos but also makes it more difficult to use their beneficial remote sensing features. To restore the RS data, a reliable and real-time haze reduction technique is essential [3]. Decision-making applications such as object detection [4] and night light emissions [5] traditionally use remote sensing data in the visible spectrum range, while images from other spectra are not considered. Non-visible spectrum data also have useful features for remote sensing decision-making, which makes it essential to include these data. The issue with such data is that, despite supplying vital information, it cannot always be represented as conventional RGB [6]. There are other difficulties as well: objects at wavelengths between 700 and 1000 nm would not be visible, and added illumination would be required at night owing to the lack of visible light. Non-visible spectrum data also include distortion due to environmental factors and haziness, which makes it essential to pre-process the image using methods such as gamma correction. To counter the distortion caused by the low wavelengths of the non-visible spectrum, this study employs the RS-Haze dataset [7], a large-scale non-homogeneous remote sensing image dehazing dataset with gamma-corrected images.
The paper is organised as follows: a literature survey of the various image dehazing techniques; the metrics for comparative analysis, where Mean Squared Error (MSE) [3], Mean Absolute Error (MAE) [3], Peak Signal-to-Noise Ratio (PSNR) [3], and Structural Similarity Index Metric (SSIM) [3] are chosen as metrics for Image Quality Assessment (IQA) [8]; results and comparative analysis, where a few selected dehazing methods are compared on the basis of the chosen metrics; and the conclusion.
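Gamma correction, mentioned above as a pre-processing step, is a simple per-pixel power law. The sketch below assumes 8-bit intensities and the convention out = 255 · (in / 255)^(1/γ); the chapter does not state which convention or which γ value was used for the RS-Haze data, so both are assumptions for illustration.

```python
def gamma_correct(pixels, gamma):
    """Apply out = 255 * (p / 255) ** (1 / gamma) to 8-bit intensities.

    With gamma > 1 this brightens mid-tones; the endpoints 0 and 255
    are left unchanged. The convention is an assumption, see above.
    """
    return [round(255 * (p / 255) ** (1.0 / gamma)) for p in pixels]


row = [0, 64, 128, 255]
corrected = gamma_correct(row, gamma=2.2)
print(corrected)
```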
30.2 Literature Survey
Haze is a significant obstacle to satellite image visual quality. It can be brought on by pollution, cloud cover, or other atmospheric particles that deflect light away from its intended line of propagation, so that remote sensing images often have low contrast and limited visibility. Dehazing, the procedure that removes haze from the photos, is a critical step in the processing of remote sensing photographs since it can improve the visual appeal of the images and enhance image details. The methods of image dehazing can be classified into three classes: image enhancement, physical dehazing, and data driven. Each class of methodologies is described below.
30.2.1 Image Enhancement
Dehazing based on image enhancement does not take the physical model of image degradation into account; instead, it improves the image quality by boosting contrast. The most representative of these algorithms are the Retinex algorithm and histogram equalisation.
Khan et al. [9] propose a histogram equalisation technique that selects a starting point for histogram partitioning through the level-snip technique. The approach seeks to improve low-contrast pictures by identifying an optimum threshold that favours all performance measures equally rather than concentrating on only a few of them. The algorithm converts images into the HSV (Hue, Saturation, Value) model and uses only the luminance (V) component. Histogram equalisation has very few serious drawbacks, is simple to use, and has little computational cost. As a result, it may be able to handle high-resolution remote sensing data. However, because of its strong global contrast enhancement and lack of naturalness, it only functions well on pictures with significant haze.
Wang et al. [10] suggest a method for enhancing low-light colour photographs using the MSRCR (multi-scale Retinex with colour restoration) image enhancement technique in the HSI (Hue, Saturation, Intensity) colour space. The MSRCR enhancement with colour recovery is applied to the intensity (I) component to produce a picture with enhanced luminance and restored colour. The image is then enhanced using single-scale Retinex (SSR) [11] in the RGB colour space. The algorithm’s main benefits are that it suffers from less colour distortion and keeps most details. Its drawbacks include greying out and subpar results on artificial photos.
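To make the idea of contrast boosting concrete, the sketch below applies plain global histogram equalisation to a list of 8-bit V-channel values. It is the textbook method, not the level-snip partitioning of Khan et al. [9], and the function name `equalise` is an illustrative assumption.

```python
def equalise(values, levels=256):
    """Global histogram equalisation of integer intensities in [0, levels)."""
    # Histogram of intensity values.
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    # Cumulative distribution function.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(values)
    if total == cdf_min:
        # A constant image has no contrast to stretch.
        return list(values)
    # Map each intensity so that the CDF becomes approximately linear.
    return [
        round((cdf[v] - cdf_min) / (total - cdf_min) * (levels - 1))
        for v in values
    ]


# A low-contrast strip: all values squeezed between 100 and 120.
hazy_v = [100, 100, 110, 120, 120]
print(equalise(hazy_v))  # [0, 0, 85, 255, 255]
```

The narrow input range 100 to 120 is stretched over the full 0 to 255 range, which illustrates both the strong contrast gain and the potential loss of naturalness noted above.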
30.2.2 Physical Dehazing
To reduce the uncertainty of haze removal, physical dehazing techniques apply additional prior knowledge or assumptions to the well-known atmospheric scattering model. Golts et al. [12] present a completely unsupervised training method based on minimising the well-known dark channel prior (DCP) energy function. Rather than providing the network with synthetic data, the parameters are updated by directly minimising the DCP energy. The deep DCP approach is a fast approximation of DCP [13]. It also includes regularisation that arises from the network and the learning process. The main merits of this method are that the output image is similar to the input image and has good colour restoration. Its drawbacks are that it overestimates transmission, has a high computational cost, and does not restore images with large white or sky areas or with inhomogeneous fog and haze.
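For readers unfamiliar with the model referred to above, the atmospheric scattering model and the dark channel are usually written as follows. The symbols are the standard ones from the dehazing literature and are not defined in this chapter: I is the observed hazy image, J the haze-free scene radiance, A the global atmospheric light, t the transmission, β the scattering coefficient, d the scene depth, and Ω(x) a local patch around pixel x.

```latex
\begin{aligned}
I(x) &= J(x)\, t(x) + A \left( 1 - t(x) \right), \qquad t(x) = e^{-\beta d(x)} \\
J^{\text{dark}}(x) &= \min_{y \in \Omega(x)} \; \min_{c \in \{r, g, b\}} J^{c}(y)
\end{aligned}
```

The dark channel prior assumes that the dark channel of a haze-free outdoor image is close to zero in most non-sky regions, which allows t(x) to be estimated from I(x). This assumption fails for large white or sky regions, consistent with the limitation noted above.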
Zhang et al. [14] advocate a dehazing system based on the non-local dehazing network (NLDN). A fully point-wise convolutional segment extracts non-local statistical regularities as an intermediate representation. A feature combination segment then determines how the statistical regularities relate to one another spatially. Finally, a reconstruction segment recovers the haze-free picture using the features from the second segment. This approach handles most hazy scenes with ease, but it may falter in scenes where the air light is noticeably brighter than the scene, and it performs well only at specific haze levels. It also performs erratically on the small number of hazy photos that do not satisfy the initial priors.
30.2.3 Data Driven
With the continual evolution of deep learning, convolutional neural networks (CNN) [15] have produced satisfactory results in feature identification, image dissolution, and other domains. Compared with conventional haze removal approaches, many data-driven image dehazing methods have been highly successful.
Ullah et al. [16] present a convolutional neural network called Dehaze-Net, which is computationally efficient. In contrast to existing learning-based systems that estimate the transmission map and the atmospheric light separately, Dehaze-Net uses a modified atmospheric scattering model to estimate both jointly. A colour visibility restoration technique is used to prevent colour distortion in the dehazed picture. The main advantage of Dehaze-Net is that it restores the sky and white areas in images better. Its drawbacks are that a small error in the air light estimate degrades the performance, that it exaggerates dark colours in single images, and that its dehazing results in distant regions need improvement.
Li et al. [17] present FD-GAN (Generative Adversarial Network with Fusion Discriminator), which addresses the difficulties associated with non-homogeneous fog. It employs a two-branch Generative Adversarial Network (GAN) followed by a learnable fusion tail consisting of several convolutional layers [18]. The features of the transfer learning subnetwork and the current data fitting subnetwork are combined and then mapped to a single image by the fusion tail. A fusion discriminator that employs frequency information as an additional prior is also included. The main advantages of this method are that the fusion discriminator uses frequency information as an additional prior, that it can produce dehazed images that are more realistic and natural-looking, and that it can be trained on a sizable training set. Its primary flaw is that, as with all learning-based dehazing algorithms, its success depends entirely on the training data.
30 Dehazing of Satellite Images with Low Wavelength and High Distortion …
Most contemporary dehazing algorithms are still unable to support computer vision systems that require high processing efficiency for routine operation. A good haze removal algorithm must combine low computational complexity with reliable recovery. Methods currently in use reduce complexity by improving the algorithm itself, and some require a Graphics Processing Unit (GPU) to speed up processing. The choice of training dataset also affects the quality of data-driven restoration: training on artificially created datasets, rather than on real-world data, degrades the dehazing outcome on real RS data.
30.3 Metrics for Comparative Analysis
After haze has been removed from RS images, IQA measurements are needed to assess how well the images are perceived [8]. IQA shows how far an image is from the ground truth or reference. An exact assessment of image quality requires ground truth, which is available for the dataset used here, the RS-Haze dataset [7]. Full-reference measures such as MSE, MAE, and PSNR are employed to capture the statistical error of an image, as they are easy to compute, have a clear physical meaning, and are straightforward to apply in the context of optimization. A normalised reference metric, SSIM, which compares the structural and feature similarity between the restored and original images, is also used. All four of these image quality metrics are full-reference measures [8].
30.3.1 Mean Squared Error
MSE is the average of the squared differences between the original and restored images and is calculated as [3]
\[ \mathrm{MSE} = \frac{1}{P \cdot Q} \sum_{x=1}^{Q} \sum_{y=1}^{P} \bigl( f(x, y) - h(x, y) \bigr)^{2}, \tag{30.1} \]
where the original image is represented by f(x, y) and the restored image by h(x, y). P and Q stand for the image's length and breadth, while x and y are the coordinates of each pixel in the image. The lower the MSE, the better the model.
A. Sajeevan and B. A. Sabarish
30.3.2 Mean Absolute Error
MAE is the average absolute error between the restored and original pixel values. For an 8-bit image it yields a non-negative value between 0 and 255, and it avoids the problem of positive and negative errors cancelling each other out. Formally, it is computed as [3]
\[ \mathrm{MAE} = \frac{1}{P \cdot Q} \sum_{x=1}^{Q} \sum_{y=1}^{P} \bigl| f(x, y) - h(x, y) \bigr|. \tag{30.2} \]
The lower the MAE, the better the model.
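30.3.3 Peak Signal-to-Noise Ratio
PSNR uses the MSE to express the ratio of the peak signal power to the error power, in decibels. It can be computed by [3]
\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{M^{2} \cdot P \cdot Q}{\sum_{x=1}^{Q} \sum_{y=1}^{P} \bigl( f(x, y) - h(x, y) \bigr)^{2}} \right), \tag{30.3} \]
where M is the maximum possible pixel value (255 for an 8-bit image). The higher the PSNR, the better the model.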
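The three error-based metrics defined so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' evaluation code: images are plain nested lists of grey levels, the function names are hypothetical, and the peak value defaults to 255 (8-bit images). In practice NumPy arrays would be the usual choice.

```python
import math


def _pixel_pairs(f, h):
    """Yield matching pixel pairs from two equally sized images (lists of rows)."""
    if len(f) != len(h) or any(len(rf) != len(rh) for rf, rh in zip(f, h)):
        raise ValueError("images must have the same shape")
    for row_f, row_h in zip(f, h):
        yield from zip(row_f, row_h)


def mse(f, h):
    """Mean of squared pixel differences, as in Eq. (30.1)."""
    squared = [(a - b) ** 2 for a, b in _pixel_pairs(f, h)]
    return sum(squared) / len(squared)


def mae(f, h):
    """Mean of absolute pixel differences, as in Eq. (30.2)."""
    absolute = [abs(a - b) for a, b in _pixel_pairs(f, h)]
    return sum(absolute) / len(absolute)


def psnr(f, h, max_value=255):
    """10 * log10(max_value^2 / MSE); infinite when the images are identical."""
    err = mse(f, h)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical 2 x 2 example: the restored image differs in two pixels.
original = [[0, 255], [128, 64]]
restored = [[0, 250], [130, 64]]
print(mse(original, restored), mae(original, restored), psnr(original, restored))
```

Because PSNR is a monotone function of MSE, the two metrics always rank a set of restored images in the same order; MAE can rank them differently because it penalises large errors less strongly.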
30.3.4 Structural Similarity Index Metric
SSIM is a statistic that assesses the degree of similarity between two images and can also be used to evaluate the quality of compressed images. It captures crucial details about the structure of the objects in the visible scene and is computed as [3]
\[ \mathrm{SSIM} = \frac{\left( 2 \mu_x \mu_y + c_1 \right) \cdot \left( 2 \sigma_{xy} + c_2 \right)}{\left( \mu_x^{2} + \mu_y^{2} + c_1 \right) \cdot \left( \sigma_x^{2} + \sigma_y^{2} + c_2 \right)}, \tag{30.4} \]
where \mu_x, \mu_y and \sigma_x^2, \sigma_y^2 are the means and variances of x and y, respectively, \sigma_{xy} is the covariance of x and y, c_1 = (r_1 T)^2 and c_2 = (r_2 T)^2 are constants used to support stability, with r_1 = 0.01 and r_2 = 0.03, and T is the dynamic range of the pixel values, T = 255. The higher the SSIM, the better the model.
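A minimal sketch of the SSIM formula follows, assuming a single window that covers the whole image and population (divide-by-N) statistics. Practical SSIM implementations usually compute the index over local windows and average the results, so this is a simplification for illustration only. The function name and the default dynamic range of 255 (8-bit images) are assumptions.

```python
def ssim_global(x, y, dynamic_range=255, r1=0.01, r2=0.03):
    """Single-window SSIM over two equally sized images (lists of rows)."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and have the same size")
    n = len(xs)

    # Means, variances and covariance over the whole image.
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((a - mu_x) ** 2 for a in xs) / n
    var_y = sum((b - mu_y) ** 2 for b in ys) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n

    # Stabilising constants c1 = (r1 * T)^2 and c2 = (r2 * T)^2.
    c1 = (r1 * dynamic_range) ** 2
    c2 = (r2 * dynamic_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 2 x 2 images: the second is a rearranged copy of the first.
image_a = [[0, 255], [128, 64]]
image_b = [[255, 0], [64, 128]]
print(ssim_global(image_a, image_a), ssim_global(image_a, image_b))
```

An image compared with itself gives a score of 1, and any difference in mean, contrast, or structure lowers the score.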
30.4 Results and Comparative Analysis
Among the different dehazing methods, data-driven methods are currently the most proficient. The major data-driven methods Dehaze-Net, U-Net, AIDED-Net (Anti-Interference and Detail Enhancement Dehazing Network), and Style-GAN are compared with one another and with a common physical dehazing method, DCP. Each of these models is pre-trained on the REalistic Single Image DEhazing (RESIDE) dataset [19], re-trained on the 51,300 images of the RS-Haze training set [7], and tested on the 2700 images of the RS-Haze test set [7].
30.4.1 Basic Definitions
DCP: DCP is founded on the statistics of haze-free photographs. It rests on a crucial observation: most local patches in outdoor haze-free photos contain a small number of pixels with extremely low intensity in at least one colour channel [12].
Dehaze-Net: Dehaze-Net takes a hazy image as input and estimates a medium transmission map, which is then used with the Atmospheric Scattering Model to recover a haze-free output image. Convolutional neural networks (CNNs) [20], whose layers are designed to embody the established assumptions and priors of image dehazing, form the foundation of Dehaze-Net's deep architecture [16].
U-Net: U-Net is a fully convolutional network (FCN)-based architecture designed to operate with fewer training images and deliver more precise dehazing. The architecture has two paths. The first, known as the contracting path, captures the context of the image. The second, the symmetric expanding path, uses transposed convolutions to achieve accurate localisation. Because it has only convolutional layers and no dense layers, it can process images of any size [21].
AIDED-Net: AIDED-Net is set apart from existing dehazing networks by the way it is trained on haze-free photos. A new approach is proposed for creating hazy patches on the fly during network training: the patch is selected at random from the input photos, and the haze thickness is also chosen at random [20].
Style-GAN: Style-GAN was created with neural style transfer in mind. Its main architectural feature is a mechanism for progressive growing. Every generated image begins as a fixed 4 × 4 × 512 array that is repeatedly passed through style blocks. Each style block applies a "style latent vector". Gaussian noise is then added, and normalisation is performed via adaptive instance normalisation [22].
Figure 30.1 displays some of the photographs and the outcomes of the compared dehazing techniques. For each haze-free image, nine synthetic hazy photos are created across three distinct haze densities: light, moderate, and dense [7]. Here, three hazy images are selected for each density category of the synthetic non-homogeneous hazy images. Figure 30.1 shows that physical dehazing may produce darker-than-expected outputs. Data-driven dehazing, in contrast, may fall short in scenes with heavy haze, even though it supplies a high-quality haze-free scene for most of the samples shown. The comparison is carried out in Python using MS Visual Studio Code, version 1.74.2 (user setup), on Windows_NT x64 10.0.19044.
As given in Table 30.1, MSE, MAE, PSNR, and SSIM are used to assess the restoration quality of the dehazing techniques. Since the GAN-based [18] Style-GAN method receives the lowest MSE and MAE scores and the highest PSNR and SSIM scores across all the dehazed images, it may be concluded that it has the greatest potential for image dehazing. GANs [18] are efficient at producing high-quality, high-resolution images owing to the interplay of the discriminator and the generator. Style-GANs create large, high-quality images by gradually growing both the discriminator and generator models from small to large images during training. By adjusting the style vector and the noise, the Style-GAN model sharpens images and creates high-quality images with elaborate detail.
30.5 Conclusion
This work distinguishes the traits of several dehazing methods and gives a brief review of them. Image enhancement emphasises increasing contrast alone and ignores the imaging theory of hazy data. Physical dehazing isolates the relevant variables and uses the Atmospheric Scattering Model to generate a haze-free view. Data-driven dehazing exploits the robust learning capabilities of neural networks. The dehazing methods have been tested on the RS-Haze dataset with synthetic hazy photos at three different haze densities. Among the tested methods, Style-GAN provided the best performance, as shown quantitatively by the MSE, MAE, PSNR, and SSIM metrics. Thus, dehazing using Style-GAN can be used as a prerequisite step for any earth modelling application.
[Figure 30.1: grid of dehazing results. Columns: Hazy Image, DCP, Dehaze-Net, U-Net, AIDED-Net, Style-GAN, Ground Truth. Rows: images 1–3 (light haze), 4–6 (moderate haze), 7–9 (dense haze).]
Fig. 30.1 Comparison of dehazing methods on RS-Haze dataset. The first column is the hazy images, and the last column is the ground truth
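Table 30.1 Values of comparison of various dehazing methods on RS-Haze dataset
Density | Metric | Image | DCP | Dehaze-Net | U-Net | AIDED-Net | Style-GAN
Light | MSE | 1 | 100.9144 | 108.23814 | 110.37985 | 107.57184 | 90.47578
Light | MSE | 2 | 103.42801 | 103.83766 | 110.52074 | 99.293902 | 86.14014
Light | MSE | 3 | 108.63388 | 97.901736 | 103.06104 | 98.751855 | 93.16051
Light | MAE | 1 | 153.51664 | 185.52453 | 201.19931 | 209.46555 | 87.841947
Light | MAE | 2 | 162.17693 | 124.12638 | 204.49607 | 126.56892 | 72.511831
Light | MAE | 3 | 190.10358 | 144.25111 | 159.31655 | 168.03571 | 90.875659
Light | PSNR | 1 | 19.38982 | 19.817297 | 16.152341 | 19.403776 | 21.270569
Light | PSNR | 2 | 18.33311 | 14.767546 | 18.442297 | 19.595038 | 22.954251
Light | PSNR | 3 | 16.250509 | 11.871465 | 17.5078 | 17.816898 | 21.125989
Light | SSIM | 1 | 0.70028 | 0.78223 | 0.631901 | 0.653591 | 0.864209
Light | SSIM | 2 | 0.789111 | 0.701047 | 0.684981 | 0.799105 | 0.882876
Light | SSIM | 3 | 0.72979 | 0.771996 | 0.798567 | 0.772931 | 0.856349
Moderate | MSE | 4 | 106.13016 | 105.42439 | 105.3897 | 108.71722 | 96.35513
Moderate | MSE | 5 | 112.07619 | 112.41838 | 114.0293 | 116.91183 | 93.65034
Moderate | MSE | 6 | 103.31723 | 103.41641 | 108.50684 | 107.64659 | 93.23782
Moderate | MAE | 4 | 165.84059 | 131.7491 | 150.1697 | 140.68302 | 100.43453
Moderate | MAE | 5 | 177.95087 | 170.71472 | 178.80737 | 185.38344 | 96.217289
Moderate | MAE | 6 | 173.50898 | 135.56349 | 176.07989 | 121.21524 | 94.619925
Moderate | PSNR | 4 | 16.656269 | 15.004271 | 18.934025 | 15.45985 | 20.074021
Moderate | PSNR | 5 | 12.038762 | 12.359817 | 10.711515 | 18.224717 | 20.278619
Moderate | PSNR | 6 | 15.723637 | 15.730328 | 14.923209 | 13.696621 | 21.001808
Moderate | SSIM | 4 | 0.604762 | 0.626984 | 0.595434 | 0.629094 | 0.824327
Moderate | SSIM | 5 | 0.514623 | 0.549085 | 0.354613 | 0.611948 | 0.834958
Moderate | SSIM | 6 | 0.644407 | 0.635768 | 0.586882 | 0.680733 | 0.854921
Dense | MSE | 7 | 106.9327 | | | |
Dense | MSE | 8 | 100.33733 | | | |
Dense | MSE | 9 | 100.53074 | 104.03077 | 100.98964 | 110.0986 | 97.45059
Dense | MAE | 7 | 192.31758 | 152.90488 | 184.7725 | 141.97151 | 104.14474
Dense | MAE | 8 | 184.63328 | 144.48859 | 207.98543 | 173.84042 | 102.06905
Dense | MAE | 9 | 161.86977 | 161.79049 | 138.59254 | 134.50783 | 107.93674
Dense | PSNR | 7 | 16.321049 | 16.110803 | 11.570678 | 16.493215 | 19.144106
Dense | PSNR | 8 | 15.331003 | 15.081661 | 15.923875 | 15.06612 | 19.620858
Dense | PSNR | 9 | 16.836795 | 14.678877 | 15.433657 | 14.984782 | 19.258649
Dense | SSIM | 7 | 0.740906 | 0.758414 | 0.4443 | 0.722693 | 0.811496
Dense | SSIM | 8 | 0.769865 | 0.745966 | 0.697175 | 0.750168 | 0.822846
Dense | SSIM | 9 | 0.636606 | 0.743084 | 0.720824 | 0.485207 | 0.80557
For MSE on dense images 7 and 8, the remaining eight values are 101.62122, 113.57098, 99.385901, 110.24844, 104.22482, 97.11567, 108.5712, and 97.10052; their assignment to the Dehaze-Net, U-Net, AIDED-Net, and Style-GAN columns is not recoverable from the source layout.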
References
1. Alias, B., Karthika, R., Parameswaran, L.: Classification of high resolution remote sensing images using deep learning techniques. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2018)
2. Awwad, Z., et al.: Self-supervised deep learning for vehicle detection in high-resolution satellite imagery. In: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS (2021)
3. Liu, J., Wang, S., Wang, X., Ju, M., Zhang, D.: A review of remote sensing image dehazing. Sensors 21, 3926 (2021)
4. Takano, H., et al.: Visible light communication on LED-equipped drone and object-detecting camera for post-disaster monitoring. In: 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring) (2021)
5. Levin, N., et al.: Remote sensing of night lights: a review and an outlook for the future. Remote Sens. Environ. 237, 111443 (2020)
6. Illarionova, S., Shadrin, D., Trekin, A., Ignatiev, V., Oseledets, I.: Generation of the NIR spectral band for satellite images with convolutional neural networks. Sensors 21, 5646 (2021)
7. Song, Y., He, Z., Qian, H., Du, X.: Vision transformers for single image dehazing. IEEE Trans. Image Process. 1–1 (2023)
8. Min, X., et al.: Quality evaluation of image dehazing methods using synthetic hazy images. IEEE Trans. Multimedia 21(9), 2319–2333 (2019)
9. Khan, M.F., et al.: Fuzzy-based histogram partitioning for bi-histogram equalisation of low contrast images. IEEE Access 8, 11595–11614 (2020)
10. Wang, P., Wang, Z., Lv, D., Zhang, C., Wang, Y.: Low illumination color image enhancement based on Gabor filtering and Retinex theory. Multimedia Tools Appl. 80, 17705–17719 (2021)
11. Li, P., et al.: Deep Retinex network for single image dehazing. IEEE Trans. Image Process. 30, 1100–1115 (2021)
12. Golts, A., Freedman, D., Elad, M.: Unsupervised single image dehazing using dark channel prior loss. IEEE Trans. Image Process. 29, 2692–2701 (2020)
13. Likhitaa, P.S., Anand, R.: A comparative analysis of image dehazing using image processing and deep learning techniques. In: 2021 6th International Conference on Communication and Electronics Systems (ICCES) (2021)
14. Zhang, S., He, F., Ren, W.: NLDN: non-local dehazing network for dense haze removal. Neurocomputing 410, 363–373 (2020)
15. Ullah, H., et al.: Light-DehazeNet: a novel lightweight CNN architecture for single image dehazing. IEEE Trans. Image Process. 30, 8968–8982 (2021)
16. Rashid, H., et al.: Single image dehazing using CNN. Procedia Comput. Sci. 147, 124–130 (2019)
17. Li, F., Di, X., Zhao, C., Zheng, Y., Wu, S.: FA-GAN: a feature attention GAN with fusion discriminator for non-homogeneous dehazing. SIViP 16, 1243–1251 (2022)
18. Sanjay, A., Nair, J.J., Gopakumar, G.: Haze removal using generative adversarial network. In: Lecture Notes in Electrical Engineering, pp. 207–217 (2021)
19. Li, B., et al.: Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 28(1), 492–505 (2019)
20. Zhang, J., He, F., Duan, Y., Yang, S.: AIDEDNet: anti-interference and detail enhancement dehazing network for real-world scenes. Front. Comput. Sci. 17 (2022)
21. Ge, W., Lin, Y., Wang, Z., Wang, G., Tan, S.: An improved U-Net architecture for image dehazing. IEICE Trans. Inform. Syst. E104.D, 2218–2225 (2021)
22. Karras, T., et al.: Analyzing and improving the image quality of StyleGAN. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Chapter 31
Automated Segmentation of Tracking Healthy Organs from Gastrointestinal Tumor Images
Sanju Varghese John and Bibal Benifa
Abstract Oncologists can locate tumors precisely and provide a specific dosage based on their location using advanced technologies such as MR-Linacs. In this study, the position of healthy organs such as the stomach and intestines is automatically segmented in order to direct the X-ray beam toward the tumor without affecting healthy organs. Using deep learning, healthy organs are segmented from images of GI tumors to accelerate the process and improve the quality of care for patients. A model based on EfficientNetB7 is proposed for segmenting the stomach and intestines on GI tract MRI scans, which uses compound scaling to scale all dimensions uniformly, that is network width, depth, and resolution. MRI images of anonymous patients treated at the University of Wisconsin-Madison Carbone Cancer Center were provided for the GI tract image segmentation competition. The proposed methodology achieves 0.8693 IoU score and 0.8991 dice score.
31.1 Introduction
Cancer is the top cause of mortality worldwide and one of the most significant impediments to increasing life expectancy, accounting for nearly 10 million deaths in 2020, or nearly one in six deaths [1]. Gastrointestinal (GI) cancer covers a wide range of cancers of the digestive system, such as cancers of the esophagus, gallbladder, liver, pancreas, stomach, small intestine, large intestine, rectum, and anus [2]. A large number of people suffer from GI cancer, making it one of the most serious health problems. Approximately 5 million people were diagnosed with GI cancers and 3.5 million died of them globally in 2020. The severity of lesions can be roughly classified into benign conditions, precancerous lesions, early GI cancers, and advanced GI cancers. Precancerous GI lesions can progress to early GI cancer or even advanced GI cancer if the condition is not detected and treated in a timely manner. Detecting and diagnosing precancerous lesions and early cancer as quickly as possible is essential to preventing advanced GI cancer. Medical image segmentation helps to speed up the treatment process and provide patients with efficient care on time.

S. V. John (B) · B. Benifa
Indian Institute of Information Technology, Kottayam, Kerala 686635, India
e-mail: [email protected]
B. Benifa
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_31
31.1.1 Analysis of Gastrointestinal Cancers Worldwide
The graphs in Figs. 31.1 and 31.2 depict the estimated number of GI cancer cases and deaths worldwide in 2020.
Fig. 31.1 Estimated number of GI cancer cases worldwide in 2020, spanning all age groups and genders [1]
Fig. 31.2 Estimated number of GI cancer deaths worldwide in 2020, spanning all age groups and genders [1]
Fig. 31.3 Estimated number of GI cancer cases and fatalities from 2020 to 2040, globally, for all age groups and genders [1]
31.1.2 Projected GI Cancer New Cases and Deaths
Based on projected changes in age composition and population growth, GI cancer cases and deaths are projected to increase globally by 60.4% and 66.8%, respectively, by 2040, as shown in Fig. 31.3.
31.1.3 Objectives
Radiation treatment is given to cancer patients in 10-min daily doses for one to six weeks. During radiation treatment, radiation oncologists use X-rays to target tumors and deliver the maximum radiation dosage without harming the stomach or intestines. Using MRI and linear accelerator technologies, oncologists can monitor the position of tumors and intestines, which may change from day to day. During these scans, radiation oncologists must manually segment the stomach and intestines to focus radiation on the tumor. Manual segmentation is hard and time-consuming; it may lengthen therapy from 15 min to an hour each day, which patients may be unable to tolerate. The major objective of this study is to automate the segmentation of healthy organs, such as the stomach and intestines, from GI tumor images in order to increase treatment effectiveness.
Table 31.1 Analysis of existing research on medical image segmentation
Paper Ref. | Title | Methodology | Dataset | Result
[4] | U-Net: Convolutional Networks for Biomedical Image Segmentation | U-Net: Encoder–decoder architecture | ISBI Cell Tracking Challenge 2015 | PhC-U373: IoU 0.9203; DIC-HeLa: IoU 0.7756
[5] | UNETR: Transformers for 3D medical image segmentation | U-Net Transformers | BTCV (CT), MSD (MRI/CT) | Average: 0.891; Spleen segmentation: Dice 0.964; Brain tumor segmentation: Dice 0.711
[6] | Gastrointestinal tract polyp anomaly segmentation on colonoscopy images | Graft-U-Net | Polyp colonoscopy images: CVC-ClinicDB, Kvasir-SEG | CVC-ClinicDB: mDice 0.8995, mIoU 0.8138; Kvasir-SEG: mIoU 0.8245, mDice 0.9661
[7] | ResUNet++: An advanced architecture for medical image segmentation | ResUNet++ | Kvasir-SEG | Dice 0.8133; mIoU 0.7927; Recall 0.7064; Precision 0.8774
[8] | Multi-scale network for thoracic organs segmentation | EfficientNetB7 | SegTHor | IoU 0.93405; F1 score 0.95138
31.2 Literature Review
Several studies have explored the detection and classification of GI cancers using deep learning (DL) methods. A number of studies have demonstrated that convolutional neural networks can be highly useful for analyzing medical images [3]. The medical imaging literature shows steady progress in the design and implementation of deep convolutional methods for segmentation, as given in Table 31.1.
31.3 Materials and Methods
Deep learning has recently become the standard for almost all segmentation tasks in digital medical image processing.
31.3.1 Dataset
The dataset is provided by the UW-Madison Carbone Cancer Center as part of the UW-Madison GI tract image segmentation competition [9]. It consists of annotated, anonymized MRI scans of patients treated there, together with segmentation masks. The images are 16-bit grayscale PNGs, and the training annotation masks are RLE-encoded. With run-length encoding (RLE), consecutive data elements with the same value are stored as a single value and a count rather than as separate elements. Each case is represented by multiple sets of scan slices with a total of 115,488 images. Three types of annotation information are available for 38,496 patient cases, including large bowel, small bowel, and stomach.
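The run-length idea described above can be sketched as follows. This is a generic value-and-count encoder for illustration; it is not the exact mask format used by the competition files, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal consecutive values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original flat sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# One row of a binary mask: 1 marks organ pixels, 0 marks background.
mask_row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
encoded = rle_encode(mask_row)
print(encoded)  # [(0, 3), (1, 2), (0, 4), (1, 1)]
print(rle_decode(encoded) == mask_row)  # True
```

Segmentation masks compress well under this scheme because they consist of long runs of background interrupted by a few contiguous organ regions.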
31.3.2 Preprocessing
The image size varies across the dataset, which contains four distinct sizes: 266 × 234, 310 × 360, 276 × 276, and 234 × 234. Images are resized before being fed to the model to maximize GPU utilization and reduce training time. Multiple data augmentation strategies increase the quantity of training and testing samples, with 80% of the dataset used for training and 20% for validation.
31.3.3 Data Augmentation
A transformation can be classified into various categories based on how it is applied. Generalizability describes the difference between a model's performance on data it has previously seen and on data it has never seen before. Models with poor generalizability have overfitted the training set. Because augmented data represents a wider range of data, it reduces the distance between the training and validation sets, as well as any future test sets. We used rotation, scaling, horizontal flipping, grid distortion, and elastic transforms to enhance segmentation performance [10, 11].
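For segmentation, a geometric augmentation has to be applied identically to the image and to its mask, otherwise the labels no longer line up with the pixels. The sketch below illustrates this for horizontal flipping only, using plain nested lists. The function names and the probability parameter are hypothetical, and the other transforms listed above would in practice come from an augmentation library.

```python
import random


def horizontal_flip(image, mask):
    """Mirror the image and its mask left to right so they stay aligned."""
    flipped_image = [row[::-1] for row in image]
    flipped_mask = [row[::-1] for row in mask]
    return flipped_image, flipped_mask


def random_horizontal_flip(image, mask, p=0.5, rng=random):
    """Flip the (image, mask) pair with probability p, otherwise return it unchanged."""
    if rng.random() < p:
        return horizontal_flip(image, mask)
    return image, mask


# Hypothetical 2 x 3 slice with an organ in the left column.
scan = [[10, 20, 30], [40, 50, 60]]
organ_mask = [[1, 0, 0], [1, 0, 0]]
flipped_scan, flipped_organ_mask = horizontal_flip(scan, organ_mask)
print(flipped_scan, flipped_organ_mask)
```

After the flip the organ pixels sit in the right column of both the image and the mask, so the pair remains a valid training sample.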
31.4 Proposed Methodology
The proposed model makes use of an EfficientNetB7-based encoder–decoder architecture. EfficientNetB7 relies on a compound scaling technique that uniformly scales all dimensions of a ConvNet, i.e., network width, depth, and resolution, with a fixed ratio in order to improve accuracy and efficiency [12]. The encoder–decoder architecture consists of an encoder path and a decoder path. The encoder path is a stack of convolution and max pooling layers. Generally, these CNN designs gradually decrease the input resolution of the image to produce the final feature map. The decoder path up-samples the encoder feature map in order to recover spatial information. The compound scaling approach was used in the development of the EfficientNet architecture, which has eight variants named EfficientNetB0 to EfficientNetB7. The decoder consists of a sequence of up-convolution and concatenation layers that produce the segmentation map. Mobile inverted bottleneck convolution (MBConv) with squeeze-and-excitation optimization [14] is the fundamental component of the EfficientNet architecture, as shown in Fig. 31.4. EfficientNetB7 is composed of seven block networks based on filter sizes, strides, and channels, as shown in Fig. 31.5. In the decoder path, up-convolutions and concatenation are used to obtain precise localization. We used EfficientNetB7 as the backbone for the proposed segmentation model, as shown in Fig. 31.6.
Fig. 31.4 MBConv, the basic building block of EfficientNet [13]
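The compound scaling rule mentioned above can be made concrete with a short calculation. In the EfficientNet paper [12], depth, width, and resolution are multiplied by α^φ, β^φ, and γ^φ, with base coefficients α = 1.2, β = 1.1, γ = 1.15 chosen so that α · β² · γ² ≈ 2. The sketch below only evaluates these multipliers for a few values of the compound coefficient φ. The function name is hypothetical, and the φ values shown are illustrative inputs rather than the published settings of any particular variant.

```python
# Base coefficients reported for EfficientNet compound scaling [12].
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution, relative FLOPS) multipliers for a given phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPS grow roughly with depth * width^2 * resolution^2, about 2^phi.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```

Scaling all three dimensions together is what lets a single coefficient trade accuracy against cost, instead of tuning depth, width, and input size independently.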
31.5 Experiment and Results
The proposed model was implemented using PyTorch. The network is trained with the Adam optimizer, an initial learning rate of 0.0001, and an image size of 320 × 384 for five iterations. The EfficientNetB7 encoder is initialized with ImageNet pre-trained weights, which are then fine-tuned. StratifiedGroupKFold is used as the cross-validation technique; it returns stratified folds with non-overlapping groups. Folds are created by preserving the percentage of samples of each class while keeping every group within a single fold. The performance of the model is evaluated using Intersection over Union (IoU) and the dice coefficient.
Fig. 31.5 EfficientNetB7 architecture with MBConv as the building block [13]
Fig. 31.6 Architecture of the proposed system by using EfficientNetB7 architecture [13]
• Intersection over union metric (IoU): The IoU is used to analyze how organs overlap between the target and predicted regions based on their overall similarity.
\[ \mathrm{IoU} = \frac{|X \cap Y|}{|X \cup Y|}. \tag{31.1} \]
The IoU metric for an individual class is defined as follows:
\[ \mathrm{IoU} = \frac{TP}{TP + FN + FP}. \tag{31.2} \]
• Dice coefficient: The dice coefficient measures how closely the predicted segmentation mask corresponds to the ground truth at the pixel level.
\[ \text{Dice Coefficient} = \frac{2 \times |X \cap Y|}{|X| + |Y|}, \tag{31.3} \]
where X is the predicted set of pixels and Y is the ground truth. The dice score for an individual class is defined as follows:
\[ \text{Dice Coefficient} = \frac{2 \times TP}{2 \times TP + FN + FP}, \tag{31.4} \]
where TP, FP, and FN represent the number of true positives, false positives, and false negatives, respectively.
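The two per-class formulas can be checked with a small sketch that counts TP, FP, and FN on flattened binary masks. The function names are hypothetical, and returning 1.0 when both masks are empty is an assumption made here to avoid division by zero, not something the text specifies.

```python
def confusion_counts(predicted, truth):
    """Count true positives, false positives and false negatives on flat binary masks."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    return tp, fp, fn


def iou(predicted, truth):
    """IoU = TP / (TP + FN + FP), as in Eq. (31.2)."""
    tp, fp, fn = confusion_counts(predicted, truth)
    denominator = tp + fp + fn
    return tp / denominator if denominator else 1.0  # assumption: empty vs empty scores 1


def dice(predicted, truth):
    """Dice = 2 TP / (2 TP + FN + FP), as in Eq. (31.4)."""
    tp, fp, fn = confusion_counts(predicted, truth)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0  # same assumption as above


# Hypothetical six-pixel masks: two pixels overlap, one extra on each side.
predicted_mask = [1, 1, 1, 0, 0, 0]
truth_mask = [0, 1, 1, 1, 0, 0]
print(iou(predicted_mask, truth_mask), dice(predicted_mask, truth_mask))
```

For any single mask pair the two scores are related by Dice = 2 · IoU / (1 + IoU), so Dice is never lower than IoU. Averaged scores such as 0.8693 IoU and 0.8991 dice need not satisfy this identity exactly, because the average of a ratio is not the ratio of averages.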
31.5.1 Results
The proposed methodology achieves an IoU score of 0.8693 and a dice score of 0.8991, as shown in Fig. 31.7. The generalization gap can be quantified using the losses computed on the training and validation splits of the data, as shown in Fig. 31.8. The class-wise dice and IoU scores for the large bowel, small bowel, and stomach are shown in Fig. 31.9. In our experiments with different CNN architectures, summarized in Table 31.2, the EfficientNetB7 architecture performs better than the other CNN architectures for segmenting healthy organs on GI tract images. Automated segmentation of the large bowel, small bowel, and stomach on GI tract MRI scan images is shown in Fig. 31.10.
Fig. 31.7 Training versus validation IoU and dice score
Fig. 31.8 Training versus validation loss
Fig. 31.9 Class-wise dice and IoU score
Table 31.2 Results on GI tract multi-class image segmentation
Model | Jaccard index (IoU) | Dice coefficient
U-Net | 0.8370 | 0.8564
UNETR | 0.7352 | 0.8088
ResNet152 | 0.8299 | 0.8625
EfficientNetB7 | 0.8693 | 0.8991
Fig. 31.10 Segmentation of large bowel, small bowel, and stomach
31.6 Conclusions
The proposed solution automatically segments the stomach and intestines on gastrointestinal tumor scans. The model will help radiation oncologists to safely apply higher doses of radiation to malignancies without damaging healthy organs such as the stomach and intestines. This will speed up the daily treatments for cancer patients, enabling them to receive more efficient care with fewer side effects.
31.7 Future Work
To increase the accuracy for all classes, we will evaluate further methods and incorporate them into the proposed system. Furthermore, we will train the model on more datasets to improve segmentation and generalization.
References
1. GLOBOCAN 2020 Graph production: Global Cancer Observatory (http://gco.iarc.fr) © International Agency for Research on Cancer 2023
2. Tyagi, A.K., Prasad, S.: Gastrointestinal Cancers: Prevention, Detection, and Treatment. Nova Science Publishers, Inc. (2016)
3. Du, W., et al.: Review on the applications of deep learning in the analysis of gastrointestinal endoscopy images. IEEE Access 7, 142053–142069 (2019). https://doi.org/10.1109/ACCESS.2019.2944676
4. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds.) Medical Image Computing and Computer-Assisted Intervention, MICCAI 2015. Lecture Notes in Computer Science, vol. 9351. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
5. Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H., Xu, D.: UNETR: transformers for 3D medical image segmentation (2021). https://arxiv.org/abs/2103.10504
6. Ramzan, M., et al.: Gastrointestinal tract polyp anomaly segmentation on colonoscopy images using graft-U-Net. J. Personal. Med. 12(9), 1459. https://doi.org/10.3390/jpm12091459
7. Jha, D., et al.: ResUNet++: an advanced architecture for medical image segmentation. In: 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, pp. 225–2255 (2019). https://doi.org/10.1109/ISM46123.2019.00049
8. Khalil, M., Tehsin, S., Humayun, M., Zaman, N., AlZain, M.: Multi-scale network for thoracic organs segmentation. Comput. Mater. Continua 70, 3251–3265 (2022). https://doi.org/10.32604/cmc.2022.020561
9. https://www.kaggle.com/competitions/uw-madison-gi-tract-image-segmentation/overview
10. Chaitanya, K., Karani, N., Baumgartner, C.F., Erdil, E., Becker, A., Donati, O., Konukoglu, E.: Semi-supervised task-driven data augmentation for medical image segmentation. In: Medical Image Analysis, vol. 68 (2021). ISSN 1361-8415. https://doi.org/10.1016/j.media.2020.101934
11. Buslaev, A., Iglovikov, V., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.: Albumentations: fast and flexible image augmentations. Information 11, 125 (2020). https://doi.org/10.3390/info11020125
12. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of PMLR, California, USA, pp. 6105–6114 (2019)
13. Baheti, B., Innani, S., Gajre, S., Talbar, S.: Eff-UNet: a novel architecture for semantic segmentation in unstructured environment. In: Proceedings of CVPR, Seattle, WA, USA, pp. 358–359 (2020)
14. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Chapter 32
Collision Free Energy Efficient Multipath UAV-2-GV Communication in VANET Routing Protocol
Mohamed Ayad Alkhafaji, Nejood Faisal Abdulsattar, Mohammed Hasan Mutar, Ahmed H. Alkhayyat, Waleed Khalid Al-Azzawi, Fatima Hashim Abbas, and Muhammet Tahir Guneser
Abstract Vehicular Ad-Hoc Networks (VANETs) are extensively utilized in Intelligent Transportation Systems (ITS) for their various communication modes, including vehicle-to-vehicle and vehicle-to-infrastructure. VANETs have certain special characteristics, such as dynamic mobility and high-speed vehicles. Both transmission models are ineffective when applied to high-speed vehicles. During transmission, vehicles transfer information to the road side unit (RSU), but RSU deployment is the most challenging task. Improper RSU deployment increases the deployment cost of the network and the end-to-end delay, and it directly reduces the packet delivery ratio and throughput of the network. To overcome these drawbacks, Unmanned Aerial Vehicles (UAVs) are introduced in VANETs. In this paper, Collision Free Energy Efficient Multipath UAV-2-GV communication (CEM-UAVs) is proposed to improve the stability of VANETs. The CEM-UAVs protocol is divided into two parts: collision-free path selection and energy-efficient multipath UAV communication. UAV-based data transmission greatly reduces the delay and loss caused by ground-level traffic and congestion in the network. NS2 software is used to analyse the performance of the proposed CEM-UAVs protocol. Four parameters are calculated to investigate its performance: end-to-end delay, packet delivery ratio, number of UAVs, and throughput. The results are compared with the earlier research works NC-UAVs and DP-UAVs. They show that the proposed CEM-UAVs protocol achieves 160–260 ms lower end-to-end delay, 8–10% higher packet delivery ratio, a UAV count lower by 100 to 280 UAVs, and 1200–2000 Kbps higher throughput than NC-UAVs and DP-UAVs.

M. A. Alkhafaji (B)
Department of Medical Device Industry Engineering, College of Engineering Technology, National University of Science and Technology, Dhi Qar, Iraq
e-mail: [email protected]
N. F. Abdulsattar · M. H. Mutar
Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, 66002 Al-Muthanna, Iraq
A. H. Alkhayyat
College of Technical Engineering, The Islamic University, Najaf, Iraq
e-mail: [email protected]
W. K. Al-Azzawi
Department of Medical Instruments Engineering Techniques, Al-Farahidi University, Baghdad, Iraq
e-mail: [email protected]
F. H. Abbas
Medical Laboratories Techniques Department, Al-Mustaqbal University College, 51001 Hillah, Babil, Iraq
e-mail: [email protected]
M. T. Guneser
Department of Electrical Electronic Engineering, College of Engineering, Karabuk University, Karabuk, Turkey
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_32
32.1 Introduction
Vehicular communication is a trending technology in Intelligent Transportation Systems (ITS), where information is transmitted between vehicles in an intelligent manner. Due to the high mobility of the vehicles, efficient communication is a challenging issue [1]. Initially, VANETs were equipped with transmission models for both vehicle-to-vehicle and vehicle-to-infrastructure communication (using road side units, or RSUs), and their performance is significantly influenced by ground-level traffic conditions [2–4]. The RSUs in the network are static, so they cannot cover densely populated regions, and resource management there is very limited. To overcome this drawback, earlier research increased the number of RSUs, but this raised the deployment cost and is not cost-effective. In many cases, the supplementary RSUs remain unused, which directly affects the efficiency of the network [5–7]. An innovative model is therefore essential to improve the effectiveness of VANETs. RSU-based VANETs require further improvement to achieve effective communication during data transmission, so this topic has attracted many researchers and remains an open research area. For this purpose, Unmanned Aerial Vehicles (UAVs) are introduced in VANETs in this paper. Through UAVs, most of the transmission is carried out in the air medium, so ground-level traffic is greatly reduced. The contributions of the research are listed below.
• To improve vehicular network performance, Unmanned Aerial Vehicles (UAVs) are introduced in VANETs; they eliminate ground-level traffic by transmitting the data in the air medium.
• The Collision Free Energy Efficient Multipath UAV-2-GV communication (CEM-UAVs) protocol is proposed to achieve highly effective performance during communication.
• The protocol is divided into two segments: collision free path selection and energy efficient multipath communication in UAVs.
• Through this method, the packet delivery ratio and throughput of the network are greatly improved, and the end-to-end delay of the network is reduced.
The rest of the paper is organized as follows. Section 32.2 reviews related work on UAV-based effective communication in VANETs. Section 32.3 describes the UAV-based VANET environment. Section 32.4 details the proposed CEM-UAVs protocol. Section 32.5 evaluates the proposed CEM-UAVs protocol and compares it with the recent methods NC-UAVs and DP-UAVs. Section 32.6 gives the conclusion and future work.
32.2 Related Works
References [8, 9] developed a Non-Orthogonal Multiple Access scheme for UAV communication to handle the bandwidth efficiently. The main advantage of this framework is that it significantly reduces the communication delay, but it fails to reduce the energy consumption and computational overhead. References [10–13] presented a robust routing scheme for UAV communication to ensure a high level of network stability. This framework reduces packet loss in VANETs, but under high-traffic scenarios it faces link failures that degrade system performance. Reference [14] presented a social spider optimization protocol for effective UAV communication. Its key benefit is improved performance in terms of packet delivery ratio (PDR), but its main flaw is high energy consumption during handover operations. Reference [15] investigated a dual-UAV-enabled secure communication system using 3D trajectory and time switching optimization. This framework delivers fair communication by reducing transmission delays, but it is not suitable for large coverage areas due to its higher energy consumption. Reference [16] formulated a robust energy efficiency maximization problem with channel uncertainties and coordinate uncertainties for energy harvesting UAV communication. This structure provides reliable resource allocation. However, it is unsuitable for high-density situations and insufficient for transferring massive amounts of data. Reference [17] developed an energy efficient Alternating Optimization (AO) for effective UAV communication. This framework shows good PDR and transmission rate performance, but its primary drawback is added computational complexity for long-distance communication. Reference [18] structured a UAV-assisted backscatter communication network to maximize the energy efficiency
of VANET. This architecture decreases both energy consumption and communication delay. However, when the network structure changes, node mobility also affects the link connection. Reference [19] considered a non-convex problem and presented a locally optimal solution based on successive convex approximation (SCA) and the alternate optimization method. This approach successfully improved the system’s security performance, but it raised the network’s energy consumption and delay when the network was extremely congested. Reference [20] presented an efficient iterative algorithm based on block coordinate descent and successive convex approximation techniques. This architecture decreased the network’s energy usage but increased the network collision delay. Reference [21] developed a secure and energy efficient communication multi-objective optimization problem (SECMOP) to reduce the propulsion energy consumption of UAVs and avoid the effects of known and unknown eavesdroppers. This framework’s main advantages are its lower energy consumption and higher path connectivity. However, it cannot deliver a high PDR under network collision situations, which further increases the transmission delay. References [22–25] developed a multi-objective optimization process to improve path selection in VANETs, based on the principle of Gaussian Mutation Harmony Searching. This method achieves moderate results, but it is not suitable for high-speed, high-density VANETs. References [26, 27] developed a multi-objective optimization to perform effective data forwarding in VANETs, using the bio-inspired firefly algorithm. This algorithm achieves moderate results, but it is not applicable to dynamically varying, high-density VANETs.
Reference [28] developed a drone-based network, called the dynamic IoD collaborative communication approach, to improve the effectiveness of communication in VANETs. Effective deployment of drones improves transmission stability. Additionally, an improved bio-inspired optimization is performed to improve vehicle-to-vehicle communication. The results indicate improved coverage and quality; however, the drone count must still be reduced to minimize deployment costs, which requires a concentrated effort on achieving maximum coverage. Reference [29] proposed a collaborative network coverage enhancement scheme that combines UAVs with VANETs. This method performs coverage maximization and uses Particle Swarm Optimization (PSO) to find the best path among UAVs. It achieves moderate results in terms of delivery rate, delay, and throughput but fails to achieve lower overhead. A comprehensive review of prior UAV-based VANET research shows a prevailing focus on optimizing coverage and ensuring collision-free data transmission. In this paper, Collision Free Energy Efficient Multipath UAV-2-GV communication (CEM-UAVs) is proposed. The step-by-step process of this protocol is elaborated in the following sections.
32.3 UAVs-Based VANETs Environments
The UAV-based VANET network model consists of two types of transmission: Ground Vehicle to Ground Vehicle (GV–GV) transmission and UAV to Ground Vehicle (UAVs–GV) transmission. Both transmission models are briefly explained in this section.
32.3.1 GV–GV Communication
In VANETs, each ground vehicle can communicate with other vehicles that are within its coverage area and line of sight. Obstacles in the roadside environment make multi-hop communication ineffective, so the source ground vehicle transfers information to a GV outside its coverage area through the UAVs. UAVs communicate in the air medium, so the information can be transmitted without obstruction.
32.3.2 UAVs–GV Communication
Through UAVs in VANETs, air-to-ground data transmission is performed so that information is transmitted in a congestion-free manner, unaffected by obstacles in the network. UAVs maintain a maximum coverage area so that inactive regions are also covered; as a result, the accuracy of the UAVs is very high. To perform effective multipath routing through UAVs, Collision Free Energy Efficient Multipath UAV-2-GV communication (CEM-UAVs) is proposed. The proposed CEM-UAVs protocol is elaborated in Sect. 32.4.
32.4 Collision Free Energy Efficient Multipath UAVs (CEM-UAVs)
Collision Free Energy Efficient Multipath UAVs are proposed in VANETs to improve effectiveness and stability during communication. The protocol is divided into two segments: (i) collision free path selection and (ii) energy efficient multipath communication. The workflow of the proposed CEM-UAVs protocol is shown in Fig. 32.1.
Fig. 32.1 Work flow of the proposed CEM-UAVs protocol
32.4.1 Collision Free Path Selection
Collision free path selection is carried out between the interconnected vehicles, and the decision making is performed by the UAVs. Each UAV performs its stability calculation by considering three parameters: distance, speed, and successful transmission rate. The calculation of these parameters is shown below. In the initial stage, the distance between the vehicle and the UAV is measured as

$$\mathrm{Dis}_{(V,\mathrm{UAV})} = \sqrt{(V_1 - \mathrm{UAV}_1)^2 + (V_2 - \mathrm{UAV}_2)^2}. \tag{32.1}$$

Here $\mathrm{Dis}_{(V,\mathrm{UAV})}$ is the distance between the vehicle and the UAV, whose coordinates are $(V_1, V_2)$ and $(\mathrm{UAV}_1, \mathrm{UAV}_2)$, respectively. The speed of the vehicle is expressed as

$$S_{(V,\mathrm{UAV})} = \frac{\mathrm{PTR}_{(V,\mathrm{UAV})} \times V_{(V,\mathrm{UAV})}}{\mathrm{time}}. \tag{32.2}$$

In Eq. (32.2), $\mathrm{PTR}_{(V,\mathrm{UAV})}$ represents the previous transmission rate from the vehicle to the UAV, and $V_{(V,\mathrm{UAV})}$ is the velocity of the vehicle and the UAV. Finally, the successful transmission rate is measured as

$$\mathrm{STR}_{(V,\mathrm{UAV})} = \frac{\mathrm{RP}_{(V,\mathrm{UAV})}}{\mathrm{TP}_{(V,\mathrm{UAV})}}. \tag{32.3}$$

In Eq. (32.3), $\mathrm{RP}_{(V,\mathrm{UAV})}$ and $\mathrm{TP}_{(V,\mathrm{UAV})}$ denote the received packet rate and the transmitted packet rate from the vehicle to the UAV. Using the above parameters, the stability of the UAVs is measured. The stability factor is determined by

$$\mathrm{SF}_{(V,\mathrm{UAV})} = \frac{\left(C_1 * S_{(V,\mathrm{UAV})}\right) + \left(C_2 * \mathrm{STR}_{(V,\mathrm{UAV})}\right)}{\mathrm{Dis}_{(V,\mathrm{UAV})}}. \tag{32.4}$$

In Eq. (32.4), $C_1$ and $C_2$ are experimental constants that satisfy the condition $C_1 + C_2 = 1$. Using Eq. (32.4), the stability of the UAVs is measured, and the data transmission is carried out. A higher stability factor yields a collision free path for data transmission.
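The stability calculation of Eqs. (32.1) to (32.4) can be illustrated with a minimal Python sketch. The `LinkSample` container, its field names, the function names, and the default `c1 = 0.5` are assumptions made for illustration only; the chapter does not report the values of C1 and C2 or any implementation details.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LinkSample:
    """Measurements for one vehicle-UAV pair; field names are illustrative."""
    vehicle_xy: Tuple[float, float]  # (V1, V2)
    uav_xy: Tuple[float, float]      # (UAV1, UAV2)
    prev_tx_rate: float              # PTR(V,UAV)
    velocity: float                  # V(V,UAV)
    elapsed_time: float              # "time" in Eq. (32.2)
    received_packets: float          # RP(V,UAV)
    transmitted_packets: float       # TP(V,UAV)


def distance(s: LinkSample) -> float:
    """Eq. (32.1): Euclidean distance between vehicle and UAV."""
    return math.hypot(s.vehicle_xy[0] - s.uav_xy[0],
                      s.vehicle_xy[1] - s.uav_xy[1])


def speed_term(s: LinkSample) -> float:
    """Eq. (32.2): previous transmission rate times velocity over time."""
    return s.prev_tx_rate * s.velocity / s.elapsed_time


def success_rate(s: LinkSample) -> float:
    """Eq. (32.3): received packets over transmitted packets."""
    return s.received_packets / s.transmitted_packets


def stability_factor(s: LinkSample, c1: float = 0.5) -> float:
    """Eq. (32.4) with C2 = 1 - C1; c1 = 0.5 is an assumed default."""
    if not 0.0 <= c1 <= 1.0:
        raise ValueError("c1 must lie in [0, 1] so that C1 + C2 = 1")
    c2 = 1.0 - c1
    return (c1 * speed_term(s) + c2 * success_rate(s)) / distance(s)


def most_stable_uav(samples: Dict[str, LinkSample], c1: float = 0.5) -> str:
    """Return the UAV id whose link has the highest stability factor."""
    return max(samples, key=lambda uav_id: stability_factor(samples[uav_id], c1))


# Two hypothetical UAVs that differ only in their distance to the vehicle.
near = LinkSample((0.0, 0.0), (3.0, 4.0), 2.0, 10.0, 4.0, 90.0, 100.0)
far = LinkSample((0.0, 0.0), (6.0, 8.0), 2.0, 10.0, 4.0, 90.0, 100.0)
print(most_stable_uav({"uav-near": near, "uav-far": far}))
```

Because the distance appears in the denominator of Eq. (32.4), the closer of two otherwise identical UAVs receives the higher stability factor in this sketch. The sketch does not guard against a zero distance or zero transmitted packets, which a real implementation would need to handle.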
32.4.2 Energy Efficient Multipath Communication
Through energy efficient multipath communication, the GV-to-GV links and the UAV-to-GV links are maintained so that a UAV can frequently track any vehicle present in its coverage area. Through this method, the bandwidth utilization and the density of vehicles can be increased to the maximum level while remaining easy for the UAVs to monitor. Energy efficient multipath communication performs two actions: link establishment and link maintenance. In the initial stage of link establishment, the UAVs present in the region transmit FIND messages periodically. After receiving a FIND packet, each vehicle transmits an ACK packet. UAVs can efficiently and continuously gather a large volume of ACK (acknowledgment) packets, so vehicles can join and leave a UAV very easily. At this stage, no delay or energy consumption occurs. Then, link maintenance is carried out. After the link between a UAV and a GV is constructed, both units update their routing tables at once for future reference. Here, link stability and resource allocation are high, so re-optimization is not needed in this method. This process greatly helps to improve the overall performance of the network.
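The FIND/ACK exchange and the routing table update can be sketched as follows. This is a simplified illustration, not the authors' NS2 implementation: the class names, the circular coverage radius, the tuple used for the ACK, and the rule that an out-of-range vehicle drops the link immediately are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[float, float]


@dataclass
class GroundVehicle:
    vehicle_id: str
    position: Position
    routing_table: Dict[str, Position] = field(default_factory=dict)

    def on_find(self, uav_id: str, uav_position: Position) -> Tuple[str, str, Position]:
        """Record the UAV in the local routing table and answer with an ACK."""
        self.routing_table[uav_id] = uav_position
        return ("ACK", self.vehicle_id, self.position)


@dataclass
class Uav:
    uav_id: str
    position: Position
    coverage_radius: float  # assumed circular coverage area
    routing_table: Dict[str, Position] = field(default_factory=dict)

    def covers(self, vehicle: GroundVehicle) -> bool:
        dx = vehicle.position[0] - self.position[0]
        dy = vehicle.position[1] - self.position[1]
        return math.hypot(dx, dy) <= self.coverage_radius

    def find_round(self, vehicles: List[GroundVehicle]) -> List[str]:
        """One periodic FIND broadcast: collect ACKs and refresh both tables."""
        linked: Dict[str, Position] = {}
        for vehicle in vehicles:
            if self.covers(vehicle):
                kind, vehicle_id, position = vehicle.on_find(self.uav_id, self.position)
                if kind == "ACK":
                    linked[vehicle_id] = position
            else:
                # Simplification: an out-of-range vehicle drops the link at once.
                vehicle.routing_table.pop(self.uav_id, None)
        self.routing_table = linked  # vehicles that sent no ACK have left
        return sorted(linked)


uav = Uav("u1", (0.0, 0.0), 100.0)
v1 = GroundVehicle("v1", (30.0, 40.0))    # 50 m away: inside coverage
v2 = GroundVehicle("v2", (300.0, 400.0))  # 500 m away: outside coverage
first_round = uav.find_round([v1, v2])
v1.position = (300.0, 0.0)                # v1 drives out of coverage
second_round = uav.find_round([v1, v2])
print(first_round, second_round)
```

Rebuilding the UAV's table from the ACKs of each round is one simple way to make joining and leaving cheap, which is the property the text emphasizes; a real protocol would more likely age entries out with a timeout.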
32.5 Performance Analyses
The proposed CEM-UAVs protocol is simulated with the NS-2.35 network simulator, and the SUMO (Simulation of Urban Mobility) tool is employed to generate mobility patterns for the VANET. The parameters considered in the evaluation are end-to-end delay, packet delivery ratio, number of UAVs, and throughput. For comparative analysis, the recent works NC-UAVs [17] and DP-UAVs [18] are considered. The simulation parameter setup is given in Table 32.1.
Table 32.1 Simulation parameter settings

| Input parameters | Values |
|---|---|
| Operating system | Ubuntu 16.04 |
| NS version | NS-2.35 |
| Mobility generator | SUMO |
| Running time | 500 ms |
| Dimension | 5000 m × 5000 m |
| No. of vehicles | 500–5000 vehicles |
| No. of UAVs | 250 UAVs |
| Antenna type | Omni-directional antenna |
| Propagation model | Two ray ground model |
| Queue type | DropTail |
| Link bandwidth | 1 Kbps |
| Traffic flow | CBR |
| Speed of vehicles | 75 km/h |
| Speed of UAVs | 150 km/h |
| Transmission power | 0.500 J |
| Receiving power | 0.050 J |

32.5.1 End-to-End Delay
End-to-end delay is the time taken to send information from the sender to the receiver. The end-to-end delay of the proposed CEM-UAVs protocol and the other two methods as a function of the number of vehicles is presented in Fig. 32.2. The results show that the end-to-end delay increases with the number of vehicles. Because of the UAVs in the VANET, communication is more effective, which reduces the end-to-end delay in the network. The end-to-end delay values of the proposed CEM-UAVs protocol and the other two methods are given in Table 32.2. With 5000 vehicles, the end-to-end delays achieved by NC-UAVs, DP-UAVs, and the proposed CEM-UAVs protocol are 425 ms, 321 ms, and 165 ms, respectively. The proposed CEM-UAVs protocol thus produces a 160–260 ms lower end-to-end delay than the earlier works [30].

Fig. 32.2 End-to-end delay calculation

Table 32.2 End-to-end delay measured values (ms)

| No. of vehicles | NC-UAVs | DP-UAVs | CEM-UAVs |
|---|---|---|---|
| 500 | 134 | 88 | 25 |
| 1000 | 152 | 101 | 43 |
| 1500 | 196 | 124 | 59 |
| 2000 | 213 | 153 | 73 |
| 2500 | 247 | 201 | 82 |
| 3000 | 286 | 241 | 96 |
| 3500 | 324 | 251 | 104 |
| 4000 | 356 | 295 | 128 |
| 4500 | 396 | 302 | 154 |
| 5000 | 425 | 321 | 165 |
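The stated delay reduction can be checked directly against the values of Table 32.2. The short Python sketch below copies those values and computes the gap between CEM-UAVs and each baseline; the dictionary layout and the function name are illustrative choices, not part of the original study.

```python
# End-to-end delay in ms, copied from Table 32.2:
# number of vehicles -> (NC-UAVs, DP-UAVs, CEM-UAVs)
DELAY_MS = {
    500: (134, 88, 25),
    1000: (152, 101, 43),
    1500: (196, 124, 59),
    2000: (213, 153, 73),
    2500: (247, 201, 82),
    3000: (286, 241, 96),
    3500: (324, 251, 104),
    4000: (356, 295, 128),
    4500: (396, 302, 154),
    5000: (425, 321, 165),
}


def delay_reduction(vehicles):
    """Return (reduction vs NC-UAVs, reduction vs DP-UAVs) in ms."""
    nc, dp, cem = DELAY_MS[vehicles]
    return nc - cem, dp - cem


for n in sorted(DELAY_MS):
    vs_nc, vs_dp = delay_reduction(n)
    print(f"{n:>5} vehicles: {vs_nc} ms vs NC-UAVs, {vs_dp} ms vs DP-UAVs")
```

At 5000 vehicles this gives 260 ms against NC-UAVs and 156 ms against DP-UAVs, which corresponds to the quoted range of roughly 160–260 ms. At lower vehicle counts the gaps are smaller, for example 109 ms and 63 ms at 500 vehicles.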
32.5.2 Packet Delivery Ratio
Packet delivery ratio is the ratio of the number of packets received by the receiver to the number of packets transmitted by the sender. A higher packet delivery ratio improves the effectiveness of communication in VANETs. The packet delivery ratio of the proposed CEM-UAVs protocol and the other two methods as a function of the number of vehicles is presented in Fig. 32.3. Effective communication between UAVs and vehicles reduces delay, which increases the packet delivery ratio of the proposed CEM-UAVs protocol. The packet delivery ratio values of the proposed CEM-UAVs protocol and the other two methods are given in Table 32.3. With 5000 vehicles, the packet delivery ratios achieved by NC-UAVs, DP-UAVs, and the proposed CEM-UAVs protocol are 86%, 88%, and 96%, respectively. The proposed CEM-UAVs protocol achieves an 8–10% higher packet delivery ratio than the earlier works.
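As a small illustration of the definition above, the following Python sketch computes the packet delivery ratio from packet counts. The packet counts are hypothetical and were chosen only so that the resulting percentages match the 5000-vehicle values quoted in the text; the function name is likewise an assumption.

```python
def packet_delivery_ratio(received_packets, sent_packets):
    """Packet delivery ratio in percent: received / sent * 100."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    if not 0 <= received_packets <= sent_packets:
        raise ValueError("received_packets must lie between 0 and sent_packets")
    return 100.0 * received_packets / sent_packets


# Hypothetical counts chosen to reproduce the 5000-vehicle PDR values in the text.
pdr = {
    "NC-UAVs": packet_delivery_ratio(860, 1000),
    "DP-UAVs": packet_delivery_ratio(880, 1000),
    "CEM-UAVs": packet_delivery_ratio(960, 1000),
}
gains = {name: pdr["CEM-UAVs"] - value
         for name, value in pdr.items() if name != "CEM-UAVs"}
print(gains)
```

The resulting gains are 10 and 8 percentage points over NC-UAVs and DP-UAVs, which is the 8–10% range reported for 5000 vehicles.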
Fig. 32.3 Packet delivery ratio calculation
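Table 32.3 Packet delivery ratio measured values (%)

| No. of vehicles | NC-UAVs | DP-UAVs | CEM-UAVs |
|---|---|---|---|
| 500 | 84 | 92 | 97 |
| 1000 | 85 | 88 | 94 |
| 1500 | 81 | 90 | 95 |
| 2000 | 82 | 91 | 93 |
| 2500 | 86 | 89 | 96 |
| 3000 | 84 | 87 | 94 |
| 3500 | 81 | 88 | 95 |
| 4000 | 83 | 91 | 97 |
| 4500 | 86 | 89 | 95 |
| 5000 | 86 | 88 | 96 |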
Fig. 32.4 No. of UAVs calculation
32.5.3 UAVs Count
UAVs count is the number of UAVs allocated for a given number of vehicles. A high UAVs count reduces the effectiveness of communication in VANETs: it is not cost-effective, and the number of vehicles controlled by each UAV is low. A network with a lower UAVs count is therefore more effective in communication. The UAVs count of the proposed CEM-UAVs protocol and the other two methods as a function of the number of vehicles is presented in Fig. 32.4. The UAVs count values of the proposed CEM-UAVs protocol and the other two methods are given in Table 32.4. With 5000 vehicles, the UAVs counts for NC-UAVs, DP-UAVs, and the proposed CEM-UAVs protocol are 500 UAVs, 320 UAVs, and 220 UAVs, respectively. The proposed CEM-UAVs protocol works with 100–280 fewer UAVs than the earlier works.
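The argument that a lower UAVs count means more vehicles handled per UAV can be made concrete with the 5000-vehicle figures quoted above. The Python sketch below is illustrative only; the "vehicles per UAV" ratio is a derived quantity that the chapter does not report, and the names are assumptions.

```python
# UAV counts at 5000 vehicles, as quoted in the text.
UAVS_AT_5000 = {"NC-UAVs": 500, "DP-UAVs": 320, "CEM-UAVs": 220}


def vehicles_per_uav(vehicles, uavs):
    """Average number of vehicles served by each UAV."""
    if uavs <= 0:
        raise ValueError("uavs must be positive")
    return vehicles / uavs


load = {name: vehicles_per_uav(5000, count) for name, count in UAVS_AT_5000.items()}
for name, value in load.items():
    print(f"{name}: {value:.1f} vehicles per UAV")
```

Under these numbers each UAV serves on average about 10 vehicles with NC-UAVs, about 15.6 with DP-UAVs, and about 22.7 with CEM-UAVs, which is consistent with the claim that the proposed protocol covers the same traffic with fewer UAVs.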
32.5.4 Throughput
Throughput is defined as the total amount of packets transmitted during communication over the entire network. A higher throughput improves the stability of data transmission in VANETs. The throughput of the proposed CEM-UAVs protocol and the other two methods as a function of the number of vehicles is presented in Fig. 32.5. The proposed CEM-UAVs protocol achieves higher throughput by providing effective UAV-to-vehicle communication in VANETs.
Table 32.4 No. of UAVs measured values

| No. of vehicles | NC-UAVs | DP-UAVs | CEM-UAVs |
|---|---|---|---|
| 500 | 50 | 25 | 12 |
| 1000 | 100 | 52 | 45 |
| 1500 | 150 | 76 | 58 |
| 2000 | 200 | 105 | 75 |
| 2500 | 250 | 130 | 86 |
| 3000 | 300 | 180 | 100 |
| 3500 | 350 | 240 | 130 |
| 4000 | 400 | 265 | 160 |
| 4500 | 450 | 290 | 190 |
| 5000 | 500 | 320 | 220 |
Fig. 32.5 Throughput calculation
The throughput values of the proposed CEM-UAVs protocol and the other two methods are given in Table 32.5. With 5000 vehicles, the throughputs achieved by NC-UAVs, DP-UAVs, and the proposed CEM-UAVs protocol are 1025 Kbps, 1864 Kbps, and 3024 Kbps, respectively. The proposed CEM-UAVs protocol achieves 1200–2000 Kbps higher throughput than the earlier works.
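As a quick check of the quoted range, the gains at 5000 vehicles follow from the values stated in the text. The symbols $\mathrm{TP}_{\text{CEM}}$, $\mathrm{TP}_{\text{NC}}$, and $\mathrm{TP}_{\text{DP}}$ are shorthand introduced here for the three throughputs; Table 32.5 lists 1865 Kbps for DP-UAVs, which changes the second difference by only 1 Kbps.

$$
\begin{aligned}
\mathrm{TP}_{\text{CEM}} - \mathrm{TP}_{\text{NC}} &= 3024 - 1025 = 1999 \approx 2000\ \text{Kbps},\\
\mathrm{TP}_{\text{CEM}} - \mathrm{TP}_{\text{DP}} &= 3024 - 1864 = 1160 \approx 1200\ \text{Kbps}.
\end{aligned}
$$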
Table 32.5 Throughput measured values (Kbps)

| No. of vehicles | NC-UAVs | DP-UAVs | CEM-UAVs |
|---|---|---|---|
| 500 | 365 | 610 | 1024 |
| 1000 | 401 | 724 | 1246 |
| 1500 | 421 | 854 | 1524 |
| 2000 | 451 | 902 | 1754 |
| 2500 | 496 | 965 | 1964 |
| 3000 | 521 | 1025 | 2013 |
| 3500 | 645 | 1267 | 2475 |
| 4000 | 714 | 1414 | 2547 |
| 4500 | 865 | 1624 | 2854 |
| 5000 | 1025 | 1865 | 3024 |
32.6 Conclusion
In this paper, Unmanned Aerial Vehicles (UAVs) are introduced to overcome the drawbacks that improper RSU deployment causes in VANETs. UAVs transmit in the air medium, so they can easily overcome the traffic and congestion created by ground-level data transmission. To achieve efficient and steady communication in VANETs, Collision Free Energy Efficient Multipath UAV-2-GV communication (CEM-UAVs) is proposed. The collision free path selection process greatly reduces network congestion, and energy efficient multipath UAV communication maximizes the coverage area of the UAVs, which reduces the number of UAVs required in each region. The UAVs can thus cover the maximum region, and the network becomes highly cost-effective. NS2 is used for performance evaluation. The parameters considered are end-to-end delay, packet delivery ratio, UAVs count, and throughput, and the results are compared with the earlier works NC-UAVs and DP-UAVs. Based on the discussed results, the proposed CEM-UAVs protocol exhibits superior performance (relative to NC-UAVs at 5000 vehicles), including a 10% increase in packet delivery ratio, a 2000 Kbps improvement in throughput, a reduction of 260 ms in end-to-end delay, and a decrease of 280 UAVs in the UAV count. This method produces stable and effective performance during data transmission in VANETs. Future work will concentrate on UAV security.
References
1. Abbas, A.H., Mansour, H.S., Al-Fatlawi, A.H.: Self-adaptive efficient dynamic multi-hop clustering (SA-EDMC) approach for improving VANET’s performance. Int. J. Interact. Mob. Technol. 17(14) (2022)
2. Manzoor, A., Dang, T.N., Hong, C.S.: UAV trajectory design for UAV-2-GV communication in VANETs. In: 2021 International Conference on Information Networking (ICOIN), pp. 219–224 (2021). https://doi.org/10.1109/ICOIN50884.2021.9333983
3. Ghazzai, H., Khattab, A., Massoud, Y.: Mobility and energy aware data routing for UAV-assisted VANETs. In: 2019 IEEE International Conference on Vehicular Electronics and Safety (ICVES), pp. 1–6 (2019). https://doi.org/10.1109/ICVES.2019.8906323
4. Malik, R.Q., Ramli, K.N., Kareem, Z.H., Habelalmatee, M.I., Abbas, A.H., Alamoody, A.: An overview on V2P communication system: architecture and application. In: 2020 3rd International Conference on Engineering Technology and its Applications (IICETA), pp. 174–178. IEEE (2020)
5. Alsharoa, A., Yuksel, M.: Energy efficient D2D communications using multiple UAV relays. IEEE Trans. Commun. 69(8), 5337–5351 (2021). https://doi.org/10.1109/TCOMM.2021.3078786
6. Zhang, R., Zeng, F., Cheng, X., Yang, L.: UAV-aided data dissemination protocol with dynamic trajectory scheduling in VANETs. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pp. 1–6 (2019). https://doi.org/10.1109/ICC.2019.8761170
7. Mansour, H.S., Mutar, M.H., Aziz, I.A., Mostafa, S.A., Mahdin, H., Abbas, A.H., Jubair, M.A.: Cross-Layer and Energy-aware AODV routing protocol for flying Ad-Hoc networks. Sustainability 14(15), 8980 (2022)
8. Mokhtari, S., Nouri, N., Abouei, J., Avokh, A., Plataniotis, K.N.: Relaying data with joint optimization of energy and delay in cluster-based UAV-assisted VANETs. IEEE Internet Things J. (2022). https://doi.org/10.1109/JIOT.2022.3188563
9. Abbas, A.H., Habelalmateen, M.I., Audah, L., Alduais, N.A.M.: A novel intelligent cluster-head (ICH) to mitigate the handover problem of clustering in VANETs. Int. J. Adv. Computer Sci. Appl. 10(6)
10. Oubbati, O.S., Lakas, A., Lorenz, P., Atiquzzaman, M., Jamalipour, A.: Leveraging communicating UAVs for emergency vehicle guidance in urban areas. IEEE Trans. Emerg. Topics Comput. 9(2), 1070–1082 (2021). https://doi.org/10.1109/TETC.2019.2930124
11. Jubair, M.A., Hassan, M.H., Mostafa, S.A., Mahdin, H., Mustapha, A., Audah, L.H., Abbas, A.H.: Competitive analysis of single and multi-path routing protocols in mobile Ad-Hoc network. Ind. J. Electr. Eng. Computer Sci. 14(2) (2019)
12. Mostafa, S.A., Mustapha, A., Ramli, A.A., Jubair, M.A., Hassan, M.H., Abbas, A.H.: Comparative analysis to the performance of three mobile Ad-Hoc network routing protocols in time-critical events of search and rescue missions. In: International Conference on Applied Human Factors and Ergonomics, pp. 117–123. Springer, Cham (2020)
13. Abbas, A.H., Ahmed, A.J., Rashid, S.A.: A Cross-layer approach MAC/NET with updated-GA (MNUG-CLA)-based routing protocol for VANET network. World Electric Vehicle J. 13(5), 87 (2022)
14. Azzoug, Y., Boukra, A.: Enhanced UAV-aided vehicular delay tolerant network (VDTN) routing for urban environment using a bio-inspired approach. Ad Hoc Networks 133, 102902 (2022). https://doi.org/10.1016/j.adhoc.2022.102902
15. Wang, W., et al.: Robust 3D-trajectory and time switching optimization for dual-UAV-enabled secure communications. IEEE J. Sel. Areas Commun. 39(11), 3334–3347 (2021). https://doi.org/10.1109/JSAC.2021.3088628
16. Xu, Y., Liu, Z., Huang, C., Yuen, C.: Robust resource allocation algorithm for energy-harvesting-based D2D communication underlaying UAV-assisted networks. IEEE Internet Things J. 8(23), 17161–17171. https://doi.org/10.1109/JIOT.2021.3078264
17. Li, S., Duo, B., Renzo, M.D., Tao, M., Yuan, X.: Robust secure UAV communications with the aid of reconfigurable intelligent surfaces. IEEE Trans. Wireless Commun. 20(10), 6402–6417 (2021). https://doi.org/10.1109/TWC.2021.3073746
18. Yang, G., Dai, R., Liang, Y.-C.: Energy-efficient UAV backscatter communication with joint trajectory design and resource optimization. IEEE Trans. Wireless Commun. 20(2), 926–941 (2021). https://doi.org/10.1109/TWC.2020.3029225
19. Wang, Z., Guo, J., Chen, Z., Yu, L., Wang, Y., Rao, H.: Robust secure UAV relay-assisted cognitive communications with resource allocation and cooperative jamming. J. Commun. Networks 24(2), 139–153 (2022). https://doi.org/10.23919/JCN.2021.000044
20. Cao, D., Yang, W., Chen, H., Wu, Y., Tang, X.: Energy efficiency maximization for buffer-aided multi-UAV relaying communications. J. Syst. Eng. Electron. 33(2), 312–321 (2022). https://doi.org/10.23919/JSEE.2022.000032
21. Sun, G., Li, J., Wang, A., Wu, Q., Sun, Z., Liu, Y.: Secure and energy-efficient UAV relay communications exploiting collaborative beamforming. IEEE Trans. Commun. 70(8), 5401–5416 (2022). https://doi.org/10.1109/TCOMM.2022.3184160
22. Rashid, S.A., Alhartomi, M., et al.: Reliability-Aware multi-objective optimization-based routing protocol for VANETs using enhanced gaussian mutation harmony searching. IEEE Access 10, 26613–26627 (2022). https://doi.org/10.1109/ACCESS.2022.3155632
23. Habelalmateen, M.I., Abbas, A.H., Audah, L., Alduais, N.A.M.: Dynamic multiagent method to avoid duplicated information at intersections in VANETs. TELKOMNIKA (Telecommun. Comput. Electron. Control) 18(2), 613–621 (2020)
24. Abbas, A.H., Habelalmateen, M.I., Jurdi, S., Audah, L., Alduais, N.A.M.: GPS based location monitoring system with geo-fencing capabilities. In: AIP Conference Proceedings, vol. 2173, No. 1, p. 020014. AIP Publishing LLC (2019)
25. Hassan, M.H., Jubair, M.A., Mostafa, S.A., Kamaludin, H., Mustapha, A., Fudzee, M.F.M., Mahdin, H.: A general framework of genetic multi-agent routing protocol for improving the performance of MANET environment. IAES Int. J. Artif. Intell. 9(2), 310 (2020)
26. Abbas, A.H., Audah, L., Alduais, N.A.M.: An efficient load balance algorithm for vehicular Ad-Hoc network. In: 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), pp. 207–212. IEEE (2018)
27. Abdulbari, A.A., Abdul Rahim, S.K., Abedi, F., Soh, P.J., Hashim, A., Qays, R., Zeain, M.Y.: Single-layer planar monopole antenna-based artificial magnetic conductor (AMC). Int. J. Antennas Propag. (2022)
28. Ahmed, G.A., Sheltami, T.R., et al.: A novel collaborative IoD-assisted VANET approach for coverage area maximization. IEEE Access 9, 61211–61223 (2021). https://doi.org/10.1109/ACCESS.2021.3072431
29. Islam Muhammad, M., Khan Malik, T.R., et al.: Dynamic positioning of UAVs to improve network coverage in VANETs. Veh. Commun. 36 (2022). https://doi.org/10.1016/j.vehcom.2022.100498
30. Obaid, A.J., Abdulbaqi, A.S., Hadi, H.S.: Testing and integration method utilization for restricted software development. In: Kumar, A., Fister Jr., I., Gupta, P.K., Debayle, J., Zhang, Z.J., Usman, M. (eds.) Artificial Intelligence and Data Science. ICAIDS 2021. Communications in Computer and Information Science, vol. 1673. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-21385-4_28
Chapter 33
Intelligent Data Transmission Through Stability-Oriented Multi-agent Clustering in VANETs Ali Alsalamy, Mustafa Al-Tahai, Aryan Abdlwhab Qader, Sahar R. Abdul Kadeem, Sameer Alani, and Sarmad Nozad Mahmood
Abstract Vehicular Ad hoc Networks (VANETs) are an intelligent data transmission technology that has captured the attention of most Intelligent Transport Systems applications. Due to the high mobility of VANETs, energy consumption increases during communication between vehicles, which increases the end-to-end delay of the network. To overcome this drawback, Stability-Oriented Multi-agent Clustering (SOMAC)-based effective cluster head (CH) selection is performed in VANETs to improve the effectiveness of communication. The parameters considered for CH selection are distance, speed, connectivity, average acceleration and velocity, and residual energy. According to these parameters, the weight factor of each vehicle is measured, and the vehicle with the highest weight factor is chosen as the CH. Two types of vehicles are present in the network: smart vehicles and ordinary vehicles. Smart vehicles can communicate directly with the RSU, but they are few in number. Ordinary vehicles are numerous and transmit data to the RSU through the CH. Effective CH selection provides a better communication platform for ordinary vehicles, reducing the energy consumption and delay of the network. The proposed SOMAC approach is simulated in NS2 and evaluated on four performance metrics: end-to-end delay (E2E), packet delivery ratio (PDR), throughput (TP), and energy efficiency (EE). It is also compared with the earlier works DGCM and ECRDP. The simulation outcome shows that the proposed SOMAC approach produces better PDR, TP, and EE as well as lower end-to-end delay than the earlier works.

A. Alsalamy (B) Department of Computer Technical Engineering, College of Information Technology, Imam Ja‘afar Al-Sadiq University, 66002, Al-Muthanna Samawah, Iraq e-mail: [email protected]
M. Al-Tahai Department of Medical Instruments Engineering Techniques, Al-Farahidi University, Baghdad, Iraq e-mail: [email protected]
A. A. Qader Department of Computer Technical Engineering, Bilad Alrafidain University College, 32001, Diyala Baqubah, Iraq e-mail: [email protected]
S. R. A. Kadeem College of Engineering Technology, Department of Medical Device Industry Engineering, National University of Science and Technology, Dhi Qar Nasiriyah, Iraq e-mail: [email protected]
S. Alani University of Mashreq, Research Center, Baghdad, Iraq e-mail: [email protected]
S. N. Mahmood Computer Technology Engineering, College of Engineering Technology, Al-Kitab University, Kirkuk, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_33
33.1 Introduction
VANETs have become a trending technology in Intelligent Transportation Systems (ITSs) [1–4]. VANETs perform Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) data communication, the latter through roadside units (RSUs) [5–7]. The number of vehicles participating in VANETs is now very large and their movement is highly randomized, which reduces the stability and effectiveness of communication. Because of these characteristics, VANETs consume more energy during communication, which in turn increases the delay of data transmission between vehicles [8]. For this reason, improving the efficiency of VANETs is an open research area that has attracted many researchers. According to earlier studies, clustering is the best way to improve the EE of VANETs. During communication, vehicles with similar destinations are grouped to form a cluster. One of the vehicles becomes the cluster head (CH) and the others become cluster associates (CAs). Stable and effective CH selection is a challenging issue in VANETs [9–11]. In this paper, a stable and effective CH selection approach is introduced to transmit data intelligently. Multiple factors are considered to determine the weight factor of each candidate CH. Through the CH, the stability, efficiency, and reliability of the network are increased, and the energy consumption and end-to-end (E2E) delay of the network are reduced. The main contributions of this research are as follows:
• To achieve effective communication in VANETs, the major drawbacks caused by the nature of VANETs, namely increased energy consumption and delay, are addressed.
• Stability-Oriented Multi-agent Clustering (SOMAC)-based effective CH selection is performed in VANETs.
• Multi-agent-based CH selection is performed using agents such as distance, speed, connectivity, average acceleration and velocity, and residual energy.
• The main processes of the SOMAC approach are multi-agent-based CH selection, cluster formation, and cluster maintenance.
• Four performance metrics are used to evaluate the proposed method: E2E delay, packet delivery ratio (PDR), throughput (TP), and energy efficiency (EE).
The paper is organized as follows: Sect. 33.2 discusses the earlier research on CH selection in VANETs and identifies its drawbacks. Sect. 33.3 elaborates the proposed SOMAC approach. Sect. 33.4 presents the performance analysis and the results. Sect. 33.5 gives the conclusion and future work.
33.2 Related Works
In [12], the authors proposed a bio-inspired node clustering optimization known as the Whale Optimization Algorithm for Clustering in Vehicular Ad hoc Networks (WOACNET). By reducing the number of packets to a near-optimal level and extending the network lifetime, it lowers the communication cost, but it fails to achieve a high PDR and TP. In [13], link connectivity is used as the metric for cluster formation and cluster head (CH) identification in a connectivity-based clustering scheme. This method reduces energy consumption, but it creates more overhead during communication. In [14], to determine the optimum CH under urban VANET conditions and to optimize the overall packet delivery ratio (PDR) and data delivery delay, Bukuroshe Elira presented a destination-aware context-based routing scheme. Its performance is only moderate when it is applied to a network with high mobility. In [15], the Real-Time Traffic-Aware Data Gathering Protocol (TDG), introduced by M. Gillani, adopts dynamic segmentation switching to address communication limitations. Its main advantages are reduced communication and high data transmission rates, but it increases the computing overhead. In [16–18], an effective and dependable multi-hop broadcasting protocol, the Intelligent Forwarding Protocol (IFP), was proposed. With IFP, the PDR is greatly increased and the message propagation delay is significantly decreased. However, the shortcomings of this framework are its fixed route lifespan, fixed maintenance window, and significant E2E delay. In [19], a clustering algorithm based on Moth-Flame Optimization was presented to choose the CH efficiently. Its fundamental benefit is a much lower routing cost, although TP is not maintained in real urban contexts. In [20], a new dynamic Internet of Drones (IoD) collaborative communication technique was developed to select the CH in urban VANETs. However, this network cannot handle heavy traffic conditions, which leads to performance degradation. In [21], the Hybrid Genetic Firefly Algorithm-based Routing Protocol (HGFA) was suggested for faster communication in VANETs. Its main drawback is the extra processing time required because of temporal connectivity and link reliability. In [22], a multi-hop cluster-based algorithm (MhCA) for VANETs was presented. MhCA is a clustering technique that uses fuzzy logic to calculate a rank index and selects the node with the highest rank index as the cluster leader. This methodology lowered the computational overhead, but PDR performance suffered in real-time applications. In [23], to eliminate false messaging in VANETs, a new PSO-enabled multi-hop technique was suggested that assists in choosing the optimum route, locating the stable cluster head, and removing malicious nodes from the network. However, this framework is expensive and area-specific because it requires the installation of costly RSUs. In [24], a vehicular-hypergraph-based spectral clustering model was proposed to improve the trust score in CH selection. This method improves CH stability, PDR, and TP, but it fails to achieve a low delay. In [25], an effective CH selection method was proposed using density-based clustering and the PSO algorithm. This algorithm increases stability and TP, but its delay and overhead are high. The analysis of the earlier research shows that the major drawback of high-speed VANETs is that energy consumption and E2E delay increase during data communication. On the other hand, several works deal with multi-agent system methods in different domains such as knowledge management processes [26], air pollution assessment, mobile ad hoc networks, and online customers. To overcome this issue, SOMAC-based effective CH selection is developed in this paper.
33.3 Proposed SOMAC Approach
33.3.1 Network Environment
The network model is initiated with the deployment of several communication modules: vehicles (smart vehicles and ordinary vehicles), cluster heads (CHs), cluster associates (CAs), and roadside units (RSUs). The vehicles are classified into two types, smart and ordinary, and all vehicles travel in various directions to reach their destinations. A smart vehicle has additional capabilities: it can communicate with the RSU and the CH directly, whereas an ordinary vehicle can communicate with the RSU only through the CH. The speed of the smart vehicles is higher than that of the ordinary vehicles. Vehicles with similar destinations are grouped to form clusters, and a CH is selected that is accountable for the entire cluster. After the selection of the CH, the other vehicles inside the cluster become cluster associates (CAs); all ordinary vehicles become CAs and communicate with the RSU through the CH. Smart vehicles are equipped with built-in GPS, so they can communicate with the CH and the RSU according to the current situation, and they act as the decision-makers. The CH collects the required data from its associates and transfers the information to the RSU that is present inside its coverage area. If no RSU is present in the coverage area of the CH, inter-cluster communication takes place: the source CH transmits the data to other CHs and, through them, to the RSUs.
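The forwarding rules described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Node` class, the circular `coverage` radius, the function names, and the restriction to a single relay CH for inter-cluster communication are all hypothetical simplifications, and the check that a vehicle lies inside its CH's coverage is omitted.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    pos: Tuple[float, float]   # (x, y) position in metres
    coverage: float            # hypothetical radio coverage radius in metres


def covers(a: Node, b: Node) -> bool:
    """True if b lies inside a's coverage area."""
    return dist(a.pos, b.pos) <= a.coverage


def ch_route(ch: Node, rsus: List[Node], other_chs: List[Node]) -> Optional[List[str]]:
    """CH to RSU: direct if an RSU is covered, otherwise through a neighbouring CH."""
    for rsu in rsus:
        if covers(ch, rsu):
            return [ch.name, rsu.name]
    # Inter-cluster communication (assumed here to use one relay CH only).
    for relay in other_chs:
        if covers(ch, relay):
            for rsu in rsus:
                if covers(relay, rsu):
                    return [ch.name, relay.name, rsu.name]
    return None


def uplink_route(vehicle: Node, smart: bool, ch: Node,
                 rsus: List[Node], other_chs: List[Node]) -> Optional[List[str]]:
    """Smart vehicles try the RSU directly; every other case goes through the CH."""
    if smart:
        for rsu in rsus:
            if covers(vehicle, rsu):
                return [vehicle.name, rsu.name]
    tail = ch_route(ch, rsus, other_chs)
    return None if tail is None else [vehicle.name] + tail
```

For example, an ordinary vehicle whose CH has no RSU in range but has a neighbouring CH that does would obtain a four-hop route of the form vehicle, CH, relay CH, RSU.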
33.3.2 Stability-Oriented Multi-agent Clustering (SOMAC)
To perform intelligent data transmission in VANETs, Stability-Oriented Multi-agent Clustering (SOMAC) is introduced. Through the SOMAC model, a stable and optimized cluster head (CH) is chosen to perform effective communication in VANETs. The workflow of the SOMAC approach is shown in Fig. 33.1. The main aim of this method is to distribute the load systematically across the network. The parameters considered for CH selection are distance, speed, connectivity, average acceleration and velocity, and residual energy. The node with the highest weight factor is recommended to become the CH. Once the CH is chosen, it transmits HELLO packets to all vehicles within its transmission range. An ordinary vehicle that is available to communicate with the CH immediately transmits an accept reply. The smart vehicles transmit their information directly to the RSU without any intervention from the CH. Only in emergencies, when obstacles are present in the vehicle-to-vehicle transmission path, does the smart vehicle transmit its data to the CH and the RSU.

Fig. 33.1 Workflow of the SOMAC approach

The selection of a stable and optimized CH is then initiated. The parameters calculated for CH selection are distance, speed, connectivity, average acceleration and velocity, and residual energy. The distance between the vehicles is expressed as:

\[ D_{\text{cal}} = \frac{d_{\min}(x_1, y_1) + d_{\max}(x_1, y_2)}{2} \tag{33.1} \]

In Eq. (33.1), d_min and d_max are the minimum and maximum distances of the vehicles from the source vehicle, where x_1 denotes the position of the source vehicle and y_1 and y_2 are the vehicles at the minimum and maximum distances from the source, respectively. The RSU calculates the speed of the vehicle, which is expressed as:

\[ A_{\text{speed}} = \sum_{i=1}^{V} \frac{DT_i \cdot \rho}{T_0} \tag{33.2} \]

In Eq. (33.2), V denotes the total number of vehicles present in the region, DT_i denotes the distance traveled by vehicle i, ρ denotes the vehicle density, and T_0 denotes the average time taken for the vehicle movement. Vehicle connectivity is measured according to the speed and distance of the vehicle. The vehicle connectivity C_vehicle is expressed as:

\[ C_{\text{vehicle}} = \frac{A_{\text{speed}} \cdot \rho}{D_{\text{cal}} \cdot V_{\text{mob}}} \tag{33.3} \]

where V_mob denotes the vehicle speed. The average acceleration and velocity with respect to time T_0 are defined as:

\[ A_{\text{acc}} = \alpha \, \frac{A_{\text{vel}}}{T_0} \cdot \rho \quad \text{and} \quad A_{\text{vel}} = \beta \, \frac{D_{\text{cal}}}{T_0} \cdot \frac{1}{\rho} \tag{33.4} \]

where α and β denote experimental constants. The residual energy of a vehicle is defined as the sum of two terms, the energy consumed during communication (E_T) and the energy consumed during data aggregation (E_G):

\[ RE_v = (E_T + E_G) \tag{33.5} \]

The possibility of a vehicle becoming a cluster head (CH) is based on Eqs. (33.1)–(33.5). The weight factor for a vehicle to become a CH is defined as:

\[ \text{Weight}_{CH} = (\alpha_1 \cdot C_{\text{vehicle}}) + (\beta_1 \cdot A_{\text{acc}}) + (\gamma_1 \cdot RE_v) \tag{33.6} \]
In Eq. (33.6), the terms (α_1, β_1, γ_1) are experimental constants that satisfy the condition α_1 + β_1 + γ_1 = 1. The vehicle with the highest Weight_CH value has the maximum probability of becoming the CH. The CH selection process in the SOMAC approach is summarized in the pseudocode below:

Start
Parameters considered for CH selection
• Distance, speed, connectivity, average acceleration, average velocity, residual energy.
Distance between vehicles: $D_{\text{cal}} = \frac{d_{\min}(x_1, y_1) + d_{\max}(x_1, y_2)}{2}$
Speed of the vehicle: $A_{\text{speed}} = \sum_{i=1}^{V} \frac{DT_i \cdot \rho}{T_0}$
Vehicle connectivity: $C_{\text{vehicle}} = \frac{A_{\text{speed}} \cdot \rho}{D_{\text{cal}} \cdot V_{\text{mob}}}$
Average acceleration and average velocity: $A_{\text{acc}} = \alpha \frac{A_{\text{vel}}}{T_0} \cdot \rho$ and $A_{\text{vel}} = \beta \frac{D_{\text{cal}}}{T_0} \cdot \frac{1}{\rho}$
Residual energy: $RE_v = (E_T + E_G)$
Weight factor for CH selection: $\text{Weight}_{CH} = (\alpha_1 \cdot C_{\text{vehicle}}) + (\beta_1 \cdot A_{\text{acc}}) + (\gamma_1 \cdot RE_v)$
Stop

After the selection of a stable CH, cluster formation and maintenance are performed. At the initial stage of cluster formation, each ordinary vehicle transmits HELLO packets to the CH present in its coverage area; the packets carry information such as the vehicle ID, location, speed, and destination address. The chosen CH receives the packets and replies with a SELECTION packet to the source ordinary vehicle. The ordinary vehicle receives the SELECTION packet, stores the information in its routing table, joins the cluster, and becomes a cluster associate (CA). An ordinary vehicle may receive SELECTION packets from several CHs, in which case it selects the CH according to its line of sight toward the destination.

VANETs are networks with very high mobility, so cluster maintenance is essential because vehicles join and leave clusters frequently. To join a cluster, the ordinary vehicle collects the SELECTION packets and connects to the CH. When a vehicle decides to leave the cluster and join another CH, it transmits a RELIEVE packet to its current CH; after receiving acceptance, it detaches from the current CH and transmits HELLO packets to the others. The CH updates its routing table periodically to store the collected details properly. If the CH itself wants to detach from the cluster, it transmits a RELIEVE packet to the RSU; after confirmation is received, the CH selection process is executed again to select a fresh cluster head.
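As a hedged illustration of the weight-based CH selection in Eqs. (33.1)–(33.6), the following Python sketch evaluates the weight factor for each vehicle and picks the maximum. The `Vehicle` fields, the function names, the default constants α = β = 1, and the weights (0.4, 0.3, 0.3) are assumptions chosen for the example; the source only requires that the three weights sum to 1. The sketch also assumes at least two vehicles and non-zero speeds.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class Vehicle:
    vid: str
    pos: Tuple[float, float]   # (x, y) position in metres
    travelled: float           # DT_i: distance travelled during T0
    speed: float               # V_mob: vehicle speed (must be non-zero)
    energy: float              # RE_v: energy term of Eq. (33.5), E_T + E_G


def d_cal(v: Vehicle, others: List[Vehicle]) -> float:
    """Eq. (33.1): mean of the minimum and maximum distance to the other vehicles."""
    d = [dist(v.pos, o.pos) for o in others]
    return (min(d) + max(d)) / 2


def a_speed(vehicles: List[Vehicle], rho: float, t0: float) -> float:
    """Eq. (33.2): sum over all vehicles of DT_i * rho / T0."""
    return sum(v.travelled * rho / t0 for v in vehicles)


def ch_weight(v, vehicles, rho, t0, alpha=1.0, beta=1.0, w=(0.4, 0.3, 0.3)):
    """Eq. (33.6) for vehicle v; w = (alpha_1, beta_1, gamma_1) must sum to 1."""
    others = [o for o in vehicles if o is not v]
    dc = d_cal(v, others)
    c_vehicle = a_speed(vehicles, rho, t0) * rho / (dc * v.speed)  # Eq. (33.3)
    a_vel = beta * (dc / t0) / rho                                 # Eq. (33.4)
    a_acc = alpha * (a_vel / t0) * rho                             # Eq. (33.4)
    a1, b1, g1 = w
    return a1 * c_vehicle + b1 * a_acc + g1 * v.energy


def select_cluster_head(vehicles, rho, t0, **kw):
    """Return the vehicle with the highest weight factor."""
    return max(vehicles, key=lambda v: ch_weight(v, vehicles, rho, t0, **kw))
```

With three vehicles at (0, 0), (30, 40), and (60, 80) that differ only in their energy term, the middle vehicle with the larger energy value obtains the highest weight and would be returned by `select_cluster_head`.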
33.4 Performance Analysis
The performance of the proposed SOMAC approach is evaluated using the network simulator NS-2.35 under the Ubuntu 12.04 operating system. NS2 is a well-known academic research tool used to analyze various network models [26]. It is written in two languages: Object-oriented Tool Command Language (OTcl) for the front-end network construction and C++ for the back end. Simulation of Urban MObility (SUMO) is used to generate the mobility of the vehicles. The metrics used to analyze the performance of the proposed SOMAC approach are PDR, E2E delay, EE, and TP. The earlier works used for the comparative analysis are DGCM [15] and ECRDP [16]. The basic simulation parameters required for the implementation are shown in Table 33.1.
Table 33.1 Simulation parameters
Input parameter        Value
Operating system       Ubuntu 16.04
Software               NS-2.35
Mobility generator     SUMO-1.1.0
Running time           200 ms
Dimension              1000 m × 1000 m
Vehicle density        200 vehicles
Antenna type           Omni-directional antenna
Propagation model      Two ray ground model
Queue type             DropTail
Link bandwidth         1 Kbps
Idle time              Random [0, 1]
Available bandwidth    Idle time * link bandwidth
Topology               Urban
Speed                  50 km/h
Connection             Multiple
Packet size            512 bytes

33.4.1 Packet Delivery Ratio
Figure 33.2 shows the PDR of the proposed SOMAC approach compared with the earlier DGCM and ECRDP methods. The PDR is analyzed for varying vehicle densities. Figure 33.2 shows that the PDR achieved by the proposed SOMAC approach is higher than that of the earlier methods, which is attributed to the stable clustering model used in VANETs.

Fig. 33.2 Packet delivery ratio versus density of vehicles

Table 33.2 Packet delivery ratio measured values (%)
Density of vehicles    DGCM    ECRDP    SOMAC
40                     85      93       97
80                     80      94       98
120                    78      95       99
160                    79      94       97
200                    88      96       99

Table 33.2 lists the PDR values of DGCM, ECRDP, and the proposed SOMAC for the different vehicle densities. At a density of 200 vehicles, the PDR achieved by DGCM, ECRDP, and the proposed SOMAC is 88%, 96%, and 99%, respectively, so the proposed SOMAC approach achieves a PDR that is 3 to 11 percentage points higher than the earlier works.
33.4.2 End-to-End Delay
Figure 33.3 shows the E2E delay of the proposed SOMAC approach compared with the earlier DGCM and ECRDP methods. The E2E delay is analyzed for varying vehicle densities. Figure 33.3 shows that the E2E delay produced by the proposed SOMAC approach is lower than that of the earlier methods. Because an effective clustering model is provided, the delays that occur during communication between the vehicles are reduced.
Fig. 33.3 End-to-end delay versus density of vehicles
Table 33.3 End-to-end delay measured values (ms)
Density of vehicles    DGCM    ECRDP    SOMAC
40                     19      15       8
80                     35      28       12
120                    43      31       17
160                    56      34       21
200                    62      41       24
Table 33.3 lists the E2E delay values of DGCM, ECRDP, and the proposed SOMAC for the different vehicle densities. At a density of 200 vehicles, the E2E delay produced by DGCM, ECRDP, and the proposed SOMAC is 62 ms, 41 ms, and 24 ms, respectively, so the delay of the proposed SOMAC approach is 38 ms lower than that of DGCM and 17 ms lower than that of ECRDP.
33.4.3 Energy Efficiency Calculation
Figure 33.4 shows the EE of the proposed SOMAC approach compared with DGCM and ECRDP. The EE is analyzed for varying vehicle densities. Figure 33.4 shows that the EE achieved by the proposed SOMAC approach is higher than that of the earlier methods, because an effective clustering model is introduced in VANETs. Table 33.4 lists the EE values of DGCM, ECRDP, and the proposed SOMAC for the different vehicle densities. At a density of 200 vehicles, the EE of DGCM, ECRDP, and the proposed SOMAC is 61%, 78%, and 85%, respectively, so the proposed SOMAC approach achieves an EE that is 7 to 24 percentage points higher than the previous works.
Fig. 33.4 Energy efficiency versus density of vehicles
Table 33.4 Energy efficiency measured values (%)
Density of vehicles    DGCM    ECRDP    SOMAC
40                     61      73       88
80                     65      75       91
120                    62      74       87
160                    60      76       86
200                    61      78       85
33.4.4 Throughput Calculation
Figure 33.5 shows the TP of the proposed SOMAC approach compared with DGCM and ECRDP. The TP is analyzed for varying vehicle densities. Figure 33.5 shows that the TP achieved by the proposed SOMAC approach is higher than that of the earlier methods. The stable clustering method in VANETs greatly helps to increase the TP during communication.

Fig. 33.5 Throughput versus density of vehicles

Table 33.5 TP measured values (Kbps)
Density of vehicles    DGCM    ECRDP    SOMAC
40                     384     489      554
80                     421     507      601
120                    458     534      658
160                    512     598      725
200                    523     668      754

Table 33.5 lists the TP values of DGCM, ECRDP, and the proposed SOMAC for the different vehicle densities. At a density of 200 vehicles, the TP achieved by DGCM, ECRDP, and the proposed SOMAC is 523 Kbps, 668 Kbps, and 754 Kbps, respectively, so the TP of the proposed SOMAC approach is 231 Kbps higher than that of DGCM and 86 Kbps higher than that of ECRDP.
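The comparison at the highest density can be reproduced with a short calculation. The sketch below is only a convenience check: the figures are the 200-vehicle rows of Tables 33.2–33.5, while the dictionary layout and the helper name `somac_gain` are hypothetical choices made for this example.

```python
# Values at a density of 200 vehicles, copied from Tables 33.2-33.5.
results = {
    "PDR (%)":        {"DGCM": 88,  "ECRDP": 96,  "SOMAC": 99},
    "E2E delay (ms)": {"DGCM": 62,  "ECRDP": 41,  "SOMAC": 24},
    "EE (%)":         {"DGCM": 61,  "ECRDP": 78,  "SOMAC": 85},
    "TP (Kbps)":      {"DGCM": 523, "ECRDP": 668, "SOMAC": 754},
}

# For delay a smaller value is better, so the sign of the difference is flipped.
LOWER_IS_BETTER = {"E2E delay (ms)"}


def somac_gain(metric: str, baseline: str) -> int:
    """Improvement of SOMAC over a baseline, always reported as a positive gain."""
    row = results[metric]
    diff = row["SOMAC"] - row[baseline]
    return -diff if metric in LOWER_IS_BETTER else diff


for metric in results:
    print(metric, {b: somac_gain(metric, b) for b in ("DGCM", "ECRDP")})
```

At 200 vehicles this gives gaps over DGCM and ECRDP of 11 and 3 percentage points for PDR, 38 and 17 ms for E2E delay, 24 and 7 percentage points for EE, and 231 and 86 Kbps for TP.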
33.5 Conclusion
This research proposed a Stability-Oriented Multi-agent Clustering (SOMAC) approach to reduce the energy consumption and delay in VANETs. In the SOMAC approach, a stable CH is chosen using a multi-agent model. Multiple agents are considered when calculating the CH weight factor: distance, speed, connectivity, average acceleration and velocity, and residual energy. The CH selected in this way greatly reduces the energy consumption of the network during vehicle communication. The simulation is performed with NS2 and the SUMO mobility generator. The metrics considered for the performance analysis are PDR, E2E delay, EE, and TP, and the earlier works used for the comparative analysis are DGCM and ECRDP. The results show that the proposed SOMAC approach achieves up to 11% higher PDR, about 40 ms lower E2E delay, up to 24% higher EE, and more than 200 Kbps higher TP. In future work, improving the approach for denser networks with an optimization model could be considered.
References
1. Khatri, S., Vachhani, H., et al.: Machine learning models and techniques for VANET based traffic management: implementation issues and challenges. Peer Peer Netw. Appl. 14(3), 1778–1805 (2020)
2. Abbas, A.H., Mansour, H.S., Al-Fatlawi, A.H.: Self-adaptive efficient dynamic multi-hop clustering (SA-EDMC) approach for improving VANET’s performance. Int. J. Interact. Mob. Technol. 17(14) (2022)
3. Abbas, A.H., Ahmed, A.J., Rashid, S.A.: A cross-layer approach MAC/NET with updated-GA (MNUG-CLA)-based routing protocol for VANET network. World Electr. Veh. J. 13(5), 87 (2022)
4. Habelalmateen, M.I., Abbas, A.H., Audah, L., Alduais, N.A.M.: Dynamic multiagent method to avoid duplicated information at intersections in VANETs. TELKOMNIKA Telecommun. Comput. Electronics and Control 18(2), 613–621 (2020)
5. Abbas, A.H., Audah, L., Alduais, N.A.M.: An efficient load balance algorithm for vehicular ad-hoc network. In: 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), pp. 207–212. IEEE (2018, October)
6. Malik, R.Q., Ramli, K.N., Kareem, Z.H., Habelalmatee, M.I., Abbas, A.H., Alamoody, A.: An overview on V2P communication system: architecture and application. In: 2020 3rd International Conference on Engineering Technology and its Applications (IICETA), pp. 174–178. IEEE (2020, September)
7. Kandali, K., Bennis, L., Bennis, H.: A new hybrid routing protocol using a modified K-means clustering algorithm and continuous Hopfield network for VANET. IEEE Access 9, 47169–47183 (2021)
8. Katiyar, A., Singh, D., Yadav, R.S.: State-of-the-art approach to clustering protocols in VANET: a survey. Wireless Netw. 26(7), 5307–5336 (2020)
9. Abbas, A.H., Habelalmateen, M.I., Audah, L., Alduais, N.A.M.: A novel intelligent clusterhead (ICH) to mitigate the handover problem of clustering in VANETs. Int. J. Adv. Comput. Sci. Appl. 10(6) (2019)
10. Mostafa, S.A., Mustapha, A., Ramli, A.A., Jubair, M.A., Hassan, M.H., Abbas, A.H.: Comparative analysis to the performance of three mobile ad-hoc network routing protocols in time-critical events of search and rescue missions. In: International Conference on Applied Human Factors and Ergonomics, pp. 117–123. Springer, Cham (2020, July)
11. Abbas, A.H., Habelalmateen, M.I., Jurdi, S., Audah, L., Alduais, N.A.M.: GPS based location monitoring system with geo-fencing capabilities. In: AIP Conference Proceedings, vol. 2173, no. 1, p. 020014. AIP Publishing LLC (2019, November)
12. Husnain, G., Anwar, S.: An intelligent cluster optimization algorithm based on whale optimization algorithm for VANETs (WOACNET). PLoS One 16(4), e0250271 (2021). https://doi.org/10.1371/journal.pone.0250271
13. Khan, Z., Koubaa, A., Fang, S., Lee, M.Y., Muhammad, K.: A connectivity-based clustering scheme for intelligent vehicles. Appl. Sci. 11(2413), 1–15 (2021). https://doi.org/10.3390/app11052413
14. Elira, B., Keerthana, K.P., Balaji, K.: Clustering scheme and destination aware context based routing protocol for VANET. Int. J. Intell. Netw. 2, 148–155 (2021). https://doi.org/10.1016/j.ijin.2021.09.006
15. Gillani, M., Niaz, H.A., Ullah, A., Farooq, M.U., Rehman, S.: Traffic aware data gathering protocol for VANETs. IEEE Access 10, 23438–23449 (2022). https://doi.org/10.1109/ACCESS.2022.3154780
16. Abbasi, H.I., Voicu, R.C., Copeland, J.A., Chang, Y.: Towards fast and reliable multihop routing in VANETs. IEEE Trans. Mob. Comput. 19(10), 2461–2474 (2020). https://doi.org/10.1109/TMC.2019.2923230
17. Jubair, M.A., Hassan, M.H., Mostafa, S.A., Mahdin, H., Mustapha, A., Audah, L.H., Abbas, A.H.: Competitive analysis of single and multi-path routing protocols in mobile Ad-Hoc network. Indonesian J. Electr. Eng. Comput. Sci. 14(2) (2019)
18. Hassan, M.H., Jubair, M.A., Mostafa, S.A., Kamaludin, H., Mustapha, A., Fudzee, M.F.M., Mahdin, H.: A general framework of genetic multi-agent routing protocol for improving the performance of MANET environment. IAES Int. J. Artif. Intell. 9(2), 310 (2020)
19. Shah, Y.A., et al.: An evolutionary algorithm-based vehicular clustering technique for VANETs. IEEE Access 10, 14368–14385 (2022). https://doi.org/10.1109/ACCESS.2022.3145905
20. Ahmed, G.A., Sheltami, T.R., Mahmoud, A.S., Imran, M., Shoaib, M.: A novel collaborative IoD-assisted VANET approach for coverage area maximization. IEEE Access 9, 61211–61223 (2021). https://doi.org/10.1109/ACCESS.2021.3072431
21. Singh, G.D., Prateek, M., Kumar, S., Verma, M., Singh, D., Lee, H.-N.: Hybrid genetic firefly algorithm-based routing protocol for VANETs. IEEE Access 10, 9142–9151 (2022). https://doi.org/10.1109/ACCESS.2022.3142811
22. Thakur, P., Ganpati, A.: MhCA: a novel multi-hop clustering algorithm for VANET. Int. J. Next-Gener. Comput. 12(4) (2021). https://doi.org/10.47164/ijngc.v12i4.309
23. Temurnikar, A., Verma, P., Dhiman, G.: A PSO enable multi-hop clustering algorithm for VANET. Int. J. Swarm Intell. Res. 13(2), 1–14 (2022). https://doi.org/10.4018/IJSIR.20220401.oa7
24. Jabbar, M.K., Trabelsi, H.: A novelty of hypergraph clustering model (HGCM) for urban scenario in VANET. IEEE Access 10, 66672–66693 (2022). https://doi.org/10.1109/ACCESS.2022.3185075
25. Kandali, K., Bennis, L., et al.: An intelligent machine learning based routing scheme for VANET. IEEE Access 10, 74318–74333 (2022). https://doi.org/10.1109/ACCESS.2022.3190964
26. Ali, R.R., Mostafa, S.A., Mahdin, H., Mustapha, A., Gunasekaran, S.S.: Incorporating the Markov Chain model in WBSN for improving patients’ remote monitoring systems. In: International Conference on Soft Computing and Data Mining, pp. 35–46. Springer, Cham (2020, January)
Chapter 34
Improved Chicken Swarm Optimization with Zone-Based Epidemic Routing for Vehicular Networks
Nejood Faisal Abdulsattar, Ahmed H. Alkhayyat, Fatima Hashim Abbas, Ali S. Abosinnee, Raed Khalid Ibrahim, and Rabei Raad Ali
Abstract Vehicular Ad hoc Network (VANET) has recently grown in popularity as a key component of intelligent transportation systems (ITS). To optimize data transfer, an effective routing protocol must be developed. Traffic and congestion in VANETs are currently increasing, which has a direct impact on network efficiency. In VANETs, various reactive and proactive routing protocols are used to transport data from source to destination. To improve network efficiency, a hybrid routing protocol based on the Zone-based Epidemic Routing Protocol (ZER) is employed in this study, and the Improved Chicken Swarm Optimization (ICSO) algorithm is used to find the optimal solution. For performance evaluation, energy efficiency, packet delivery ratio, routing overhead, and end-to-end delay are considered. The proposed ICSO-ZER protocol is compared with the ACO-AODV and EEMP-ZRP protocols. Compared with these previous works, it achieves 15% higher efficiency, 14% higher packet delivery ratio, 1500 packets lower routing overhead, and 200 ms shorter end-to-end delay.

N. F. Abdulsattar (B), Department of Computer Technical Engineering, College of Information Technology, Imam Ja‘afar Al-Sadiq University, Al-Muthanna, 66002 Samawah, Iraq
A. H. Alkhayyat, College of Technical Engineering, The Islamic University, Najaf, Iraq
F. H. Abbas, Medical Laboratories Techniques Department, Al-Mustaqbal University College, 51001 Hillah, Babil, Iraq
A. S. Abosinnee, Altoosi University College, Najaf, Iraq
R. K. Ibrahim, Department of Medical Instruments Engineering Techniques, Al-Farahidi University, Baghdad, Iraq
R. R. Ali, National University of Science and Technology, DhiQar, Nasiriyah, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_34
34.1 Introduction
Vehicular Ad hoc Network (VANET) is a trending technology used for several applications in intelligent transportation systems (ITS). It employs moving vehicles to construct a wider network and to transmit information from one place to another. Because of the highly dynamic mobility of VANETs, the topology changes frequently and connections between vehicles are often lost [1–4]. A standard VANET contains two kinds of communication, (i) Vehicle-to-Vehicle (V2V) communication and (ii) Vehicle-to-Infrastructure communication, and these are well-known infrastructure-less models. Because of these inherent characteristics, communication in VANETs becomes complicated. Additional challenges in VANETs are the environmental conditions of the vehicles, data dissemination, and defects caused by radio communication and interference [5, 6]. To overcome these challenges, it is essential to create an efficient routing protocol [7–9]. In general, routing protocols are used to obtain the optimal path for transferring data from the source node to the destination node. They try to transfer the packets using the minimum hop count and to track the nodes inside the coverage area. The zone routing protocol (ZRP) is a hybrid routing protocol that unites the advantages of both proactive and reactive routing protocols. In ZRP, routing performance is based on the node coverage zone [10]. It helps to transfer data with less delay and packet loss during communication while using minimum bandwidth. VANETs are currently used for high-speed technologies, so it becomes vital to improve the path selection of the routing protocol to achieve better performance. Thus, meta-heuristic optimization algorithms are introduced to improve the performance of routing in VANETs.
The CSO algorithm is a bio-inspired optimization technique that uses the principle of the chicken swarm, dividing the swarm into several groups to discover the optimal solution. It helps the routing protocol to achieve better performance [11]. The main objectives of the study are: (i) to discuss a stable and optimal path routing protocol that handles uncertain VANET conditions; (ii) to recommend the zone routing protocol, a hybrid routing protocol, for finding the optimal path for vehicle communication; (iii) to combine the zone routing protocol with the improved CSO (ICSO) algorithm to find the optimal solution; and (iv) to analyze the performance of the network in terms of energy efficiency, packet delivery ratio, routing overhead, and end-to-end delay. The rest of the paper consists of four sections. Section 34.2 discusses the earlier research on the zone routing protocol (ZRP) and CSO. Section 34.3 presents the ZRP routing protocol. Section 34.4 shows the simulation analysis and the results. Section 34.5 includes the conclusion and the future direction of the research.
34 Improved Chicken Swarm Optimization with Zone-Based Epidemic …
34.2 Related Works
[12] presented the Secure Tilted-Rectangular-Shaped Request Zone Location-Aided Routing Protocol (STRS-RZLAR) for VANETs, which helps to protect the network from the man-in-the-middle attack (MITMA). This method achieves a high packet delivery ratio but fails to achieve high throughput and low overhead. [13] proposed the Zone-based Routing with Parallel Collision-Guided Broadcasting Protocol (ZCG) to improve route maintenance between nodes. This method achieves high reliability and low energy consumption, but it creates more routing overhead and delay. [14] presented an extended zone-based energy-efficient routing protocol. This method provides effective clustering with a genetic algorithm to find optimal paths in distributed networks. It increases the network lifetime and reduces packet drop, but it is not suitable for networks that require high throughput. [15] introduced the Fuzzy Bacterial Foraging Optimization Zone-Based Routing (FBFOZBR) protocol. This method improves network stability, but it fails to improve the packet delivery ratio and throughput of the network. [16] proposed CSO-based Clustering with a Genetic Algorithm (CSOC-GA) to improve the lifetime and reduce energy consumption. When this technique is applied to VANETs, it creates more routing overhead. [17] introduced a method that combines CSO with an Adaptive Neuro-Fuzzy System. This method achieves high scalability but fails to achieve high throughput and packet delivery ratio. [18] proposed a CSO algorithm to improve the traffic efficiency of the urban expressway. This method creates more overhead and delay. [19] introduced an extended CSO algorithm to solve the multi-objective optimization problem. It requires further improvement to achieve faster convergence.
[20] presented a method that combines the Ant Colony Optimization (ACO) technique with the Ad hoc On-Demand Distance Vector (AODV) routing protocol to find the shortest path for data transfer in VANETs. This method achieves high throughput, low loss, and low delay but fails to achieve high efficiency and packet delivery ratio during communication. [21–23] presented a zone-based energy-efficient multi-path routing protocol that reduces routing overhead and energy consumption during data transmission, but it fails to achieve a higher packet delivery ratio. The analysis of the earlier works shows that an effective routing protocol for data transmission in VANETs is still an open research area. In this research, we developed a Zone-based Epidemic Routing Protocol (ZER) combined with ICSO to improve the performance of VANETs.
N. F. Abdulsattar et al.
34.3 ICSO-ZER Routing Protocol
The proposed ICSO-ZER protocol is produced by combining the ZRP protocol with the ICSO algorithm to find the optimal solution for data transmission in the VANET environment. The process of the proposed ICSO-ZER routing protocol is shown in Fig. 34.1.
34.3.1 Zone-Based Epidemic Routing Protocol
The Zone-based Epidemic Routing (ZER) protocol is a hybrid topology-based routing protocol in which each node uses reactive or proactive routing depending on the requirement. In most cases, proactive routing is used for communication within a zone (intra-zone routing) and reactive routing is used for communication between zones (inter-zone routing). The ZER protocol employs a zonal model in which the nodes find their neighbors within a certain radius. The nodes inside a zone are divided into two categories: peripheral and interior nodes. Peripheral nodes are the nodes whose distance from the source node equals the zone radius; all other nodes in the zone are interior nodes.
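The split between interior and peripheral nodes can be illustrated with a breadth-first search. This is only a sketch under stated assumptions: the radius is taken to be a hop count (as in ZRP), and the function name `classify_zone_nodes`, the adjacency map, and the five-vehicle chain are hypothetical, not part of the proposed protocol.

```python
from collections import deque

def classify_zone_nodes(adjacency, source, radius):
    """Split the routing zone of `source` into interior and peripheral nodes.

    Assumption: `radius` is measured in hops. Nodes closer than `radius`
    hops are interior; nodes exactly `radius` hops away are peripheral.
    """
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == radius:
            continue  # do not expand beyond the zone boundary
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    interior = sorted(n for n, h in hops.items() if 0 < h < radius)
    peripheral = sorted(n for n, h in hops.items() if h == radius)
    return interior, peripheral

# Hypothetical five-vehicle chain A-B-C-D-E with a zone radius of 2 hops.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
interior, peripheral = classify_zone_nodes(links, "A", 2)
```

In this example, vehicle B falls inside the zone of A, C sits on its boundary, and D and E lie outside the zone.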
34.3.1.1 Zone Formation
Initially, the region is divided into non-overlapping zones that are dynamic in nature. Each zone maintains standard information about its nodes, such as mobility pattern, energy level, and transmission path toward the destination. Because the network is dynamic, the zones continually admit new nodes. The zones are created based on the geography of the region, and each node inside a zone keeps its ID and routing information in the routing table. After entering a zone, a node transmits HELLO packets to the nodes in its coverage area to learn the positions of its neighbors and then stores the details in the routing table. A zone provides two types of data transmission: node-based (intra-zone) communication and zone-based (inter-zone) communication. Link state packets (LSPs) corresponding to these types of data transmission are transmitted after the HELLO packets. To control each zone, a Zone-Leader (ZL) node is elected randomly [24]; it monitors the functions of the zone systematically. If the ZL fails, a new ZL is selected immediately. The only criterion for ZL selection is centrality, since a central node can cover the maximum number of nodes inside the zone. The ZL controls all transmission within the zone, including data forwarding and the routing table. ZL selection and ZL departure events are recorded in the routing table for observation.
Fig. 34.1 Process of the proposed ICSO-ZER routing protocol
34.3.1.2 Communication Method
Zones are created according to the coverage area of the ZL. The nodes in the coverage area of the ZL enter the respective zone and transmit HELLO packets to learn their neighbors. The HELLO message carries two-hop neighbor information, which supports node-based communication inside the zone. The simulator code for ZL election and path selection is given below.
Section 1: Zone-Leader Selection
    ZLList::AddZL(Nsaddr_taddr, Head_Addr, Ctable_ent IntcoveredFlag) { ZL *NewZL = rand(Void).
Section 2: ZL Routing Table Creation
    ZoneTable::AddZoneTable(Head_Addr, Ctable_ent)
Section 3: Path Selection
    IERPAGENT::PathSelection(Head_addr, Ctable_ent)
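The election rule described in the text (a centrally placed node that covers the most zone members, with a random choice involved) can be sketched in a few lines. This is an illustrative reading, not the authors' NS2 code: the function `elect_zone_leader`, the Euclidean coverage test, the tie-breaking by random choice, and the vehicle positions are all assumptions.

```python
import math
import random

def elect_zone_leader(positions, coverage_radius, rng=random):
    """Return the node that covers the most other nodes; ties are broken at random."""
    def covered(candidate):
        cx, cy = positions[candidate]
        return sum(
            1
            for other, (ox, oy) in positions.items()
            if other != candidate and math.hypot(ox - cx, oy - cy) <= coverage_radius
        )

    coverage = {node: covered(node) for node in positions}
    best = max(coverage.values())
    candidates = [node for node, count in coverage.items() if count == best]
    return rng.choice(candidates), coverage

# Hypothetical zone: four vehicles with (x, y) positions in metres.
zone = {"v1": (0, 0), "v2": (100, 0), "v3": (200, 0), "v4": (300, 0)}
leader, coverage = elect_zone_leader(zone, coverage_radius=150, rng=random.Random(1))
```

Here the two middle vehicles each cover two neighbors, so one of them becomes the ZL. If the elected ZL leaves or fails, calling the function again on the remaining nodes gives the replacement.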
34.3.2 Background of CSO Algorithm
CSO optimization is based on the movement and behavior of a chicken swarm. The swarm is divided into several groups, and each group has a leader rooster, hens, and chicks; the roles are assigned according to fitness values. The chicken with the maximum fitness value is elected as the leader rooster. The remaining chickens are hens and chicks, which maintain a mother–child relationship. The systematic process of CSO optimization is shown in Fig. 34.2.
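The role assignment can be sketched as a simple ranking. The sketch assumes, as in the usual formulation of CSO, that the fittest individuals become roosters, the least fit become chicks, and the rest are hens; the function name `assign_roles`, the group sizes, and the fitness values are hypothetical.

```python
def assign_roles(fitness, n_roosters, n_chicks):
    """Rank chickens by fitness (higher is better) and split them into roles."""
    ranked = sorted(fitness, key=fitness.get, reverse=True)
    roosters = ranked[:n_roosters]
    chicks = ranked[len(ranked) - n_chicks:]
    hens = ranked[n_roosters:len(ranked) - n_chicks]
    return roosters, hens, chicks

# Hypothetical swarm of six candidate solutions with made-up fitness values.
swarm = {"c1": 0.91, "c2": 0.35, "c3": 0.77, "c4": 0.12, "c5": 0.58, "c6": 0.64}
roosters, hens, chicks = assign_roles(swarm, n_roosters=1, n_chicks=2)
```

With these values, c1 leads the group, c3, c6, and c5 are hens, and c2 and c4 are chicks.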
Fig. 34.2 Flowchart of CSO
34.3.3 ICSO Algorithm
To achieve effective node-based and zone-based communication, the ICSO algorithm is used. In node-based communication, data are transmitted between the nodes and the ZL. In zone-based communication, data are transmitted between the ZLs. It is essential to find the optimal path for both kinds of transmission. The CSO algorithm alone is not effective for dynamically varying networks; hence, it is improved by combining it with the Ant Colony Optimization (ACO) algorithm. The combination of CSO and ACO forms the ICSO, which suits networks with high, randomly varying mobility. The procedure of the ICSO algorithm is as follows. Each group in the algorithm consists of a leader (L), mothers (M), and chicks (C), and every chicken maintains its own fitness value. The fitness value is measured from the initial energy (IE), remaining energy (RE), average distance (AD), and connection strength (CS) [25]. The connection strength is evaluated as
$$CS(x, y) = \frac{CA_x}{AD(x, y)} \times \frac{CA_x}{IE(x, y)}. \tag{34.1}$$
In Eq. (34.1), $CA_x$ denotes the coverage area of the source, $AD(x, y)$ denotes the distance between the source and the destination, and $IE(x, y)$ denotes the initial energy of the source and the destination. The fitness value is then expressed as
$$F_{value} = \frac{RE(x, y) + CS(x, y)}{AD(x, y)}. \tag{34.2}$$
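Eqs. (34.1) and (34.2) translate directly into two small functions. The sketch below is a worked example only: the function names and the input values are made up for illustration, and no units are implied.

```python
def connection_strength(coverage_area, avg_distance, initial_energy):
    """Eq. (34.1): CS = (CA_x / AD) * (CA_x / IE)."""
    return (coverage_area / avg_distance) * (coverage_area / initial_energy)

def fitness_value(remaining_energy, avg_distance, coverage_area, initial_energy):
    """Eq. (34.2): F_value = (RE + CS) / AD."""
    cs = connection_strength(coverage_area, avg_distance, initial_energy)
    return (remaining_energy + cs) / avg_distance

# Made-up inputs: CA_x = 100, AD = 50, IE = 10, RE = 6.
cs = connection_strength(100.0, 50.0, 10.0)      # (100/50) * (100/10) = 20
f_value = fitness_value(6.0, 50.0, 100.0, 10.0)  # (6 + 20) / 50 = 0.52
```

Because AD appears in the denominator of both equations, a candidate that is closer to the destination scores higher for the same energy budget.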
According to Eq. (34.2), the chicken with the best F_value is elected as the leader. The leader guides the other chickens during the food search. The leaders of all groups initiate the food search as follows:
$$L_{(x,y)}(t+1) = L_{(x,y)}(t) \times FL \times \left(L_{(z,y)}(t) - L_{(x,y)}(t)\right). \tag{34.3}$$
In Eq. (34.3), $L_{(x,y)}(t)$ is the leader position, $z$ is the hen position, and $FL$ is the probability that a chicken follows the leader. Eq. (34.3) updates the optimal path. To find the best possible solution from Eq. (34.3), Ant Colony Optimization (ACO) is then applied. The worth of the obtained path is determined by comparing the fitness values of the leaders, which are already formed. The optimal best solution (OBS) is obtained from the $L_{(x,y)}(t+1)$ value using ant optimization:
$$OBS = W_1 \, L_{(x,y)}(t+1) + W_2 \, L_{(x,y)}(t+1). \tag{34.4}$$
In Eq. (34.4), the weight factors $W_1$ and $W_2$ take random values between 0 and 1 according to the fitness values of the leader. The OBS is then found according to the pheromone update:
$$OBS_{ph}(t+1) = (1 - \rho)\, L_{(x,y)}(t+1) + \frac{1 - OBS_{ph}(t)}{1 - OBS_{ph}(t)}. \tag{34.5}$$
In Eq. (34.5), $\rho$ denotes the evaporation rate and $OBS_{ph}(t+1)$ denotes the optimal best solution obtained from the ICSO algorithm.
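The leader update of Eq. (34.3) and the weighted combination of Eq. (34.4) can be traced numerically. The sketch below follows the two equations exactly as printed and, as a simplifying assumption, treats each position as a single scalar; the function names and the sample values are hypothetical, and the pheromone update of Eq. (34.5) is not covered.

```python
def leader_update(leader_pos, hen_pos, follow_prob):
    """Eq. (34.3) as printed: L(t+1) = L(t) * FL * (L_z(t) - L(t))."""
    return leader_pos * follow_prob * (hen_pos - leader_pos)

def optimal_best_solution(next_leader_pos, w1, w2):
    """Eq. (34.4) as printed: OBS = W1 * L(t+1) + W2 * L(t+1)."""
    return w1 * next_leader_pos + w2 * next_leader_pos

# Made-up scalar positions and weights.
next_pos = leader_update(leader_pos=2.0, hen_pos=5.0, follow_prob=0.5)  # 2 * 0.5 * 3 = 3.0
obs = optimal_best_solution(next_pos, w1=0.25, w2=0.5)                  # 0.75 + 1.5 = 2.25
```

Since both terms of Eq. (34.4) use the same leader position, the OBS as printed is the updated position scaled by the sum of the two weights.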
34.4 Performance Analysis
The proposal is evaluated with the NS-2.35 simulator, which combines two languages: Tool Command Language (TCL) as the front end and C++ as the back end. A trace file is generated to measure the parameter values, and Network Animator (NAM) is used to view the structure and operation of the network. The simulation covers an area of 1500 m × 1500 m with a total of 200 vehicles. Table 34.1 lists the input parameters of the simulation. The proposed method is evaluated in terms of energy efficiency, packet delivery ratio, routing overhead, and end-to-end delay, and is compared with the earlier ACO-AODV [20] and EEMP-ZRP [21] protocols.

Table 34.1 Simulation parameters
Parameters            Values
NS version            NS-2.35
Running time          100 ms
Coverage region       1500 m × 1500 m
No. of vehicles       200 vehicles
Antenna type          Omni-directional antenna
Propagation model     Two-ray ground model
Transmission range    5 km
Queue type            DropTail
Speed                 100 km/h
Agent trace           ON
Movement trace        ON
Transmission power    0.500 J
Receiving power       0.050 J
34.4.1 Energy Efficiency Calculation
Energy efficiency refers to the reduction in consumed energy relative to the initial residual energy of the network. Figure 34.3 compares the proposed ICSO-ZEP protocol with ACO-AODV and EEMP-ZRP in terms of energy efficiency. Table 34.2 lists the corresponding values. The energy efficiency of the proposed ICSO-ZEP protocol is higher than that of the earlier works, which is achieved with the help of the improved CSO in zone routing. This algorithm also greatly improves network stability. At 200 vehicles, the energy efficiency is 82% for ACO-AODV, 89% for EEMP-ZRP, and 97% for the proposed ICSO-ZEP. The energy efficiency of the proposed ICSO-ZEP protocol is therefore 8–15 percentage points higher than that of the other protocols.
34.4.2 Packet Delivery Ratio Calculation
The packet delivery ratio is the ratio of the number of packets received to the number of packets transmitted in the network. Figure 34.4 compares the proposed ICSO-ZEP protocol with ACO-AODV and EEMP-ZRP in terms of packet delivery ratio.
Fig. 34.3 Energy efficiency calculation

Table 34.2 Energy efficiency value table (%)
No. of vehicles    ACO-AODV    EEMP-ZRP    ICSO-ZRP
25                 8.15        13.25       49.76
50                 15.31       25.31       55.23
75                 21.72       29.71       61.71
100                35.48       43.46       69.17
125                49.77       55.11       75.42
150                62.76       69.33       78.41
175                75.46       81.74       89.13
200                82.16       89.46       97.46

Fig. 34.4 Packet delivery ratio calculation
Table 34.3 Packet delivery ratio calculation (%)
No. of vehicles    ACO-AODV    EEMP-ZRP    ICSO-ZRP
25                 6.28        25.13       55.34
50                 12.31       29.74       61.37
75                 22.76       45.29       65.47
100                31.24       51.79       73.96
125                48.46       65.47       79.46
150                64.28       71.69       85.16
175                73.47       76.49       92.75
200                85.13       88.17       99.23
Table 34.3 lists the corresponding values. The packet delivery ratio achieved by the proposed ICSO-ZEP is higher than that of the earlier methods because it works with the improved CSO algorithm. This algorithm considerably reduces the routing overhead, which is reflected in the improved packet delivery ratio. At 200 vehicles, the packet delivery ratio is 85% for ACO-AODV, 88% for EEMP-ZRP, and 99% for the proposed ICSO-ZEP. The packet delivery ratio of the proposed ICSO-ZEP is therefore 11–14 percentage points higher than that of the earlier research works.
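The definition of the packet delivery ratio and the reported 11–14 point margin can be checked with a short calculation. The function name `packet_delivery_ratio` and the dictionary keys are illustrative; the three percentages are the 200-vehicle row of Table 34.3.

```python
def packet_delivery_ratio(received, transmitted):
    """Packets received divided by packets transmitted, as a percentage."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    return 100.0 * received / transmitted

# Table 34.3, 200-vehicle row (values in percent).
pdr_200 = {"ACO-AODV": 85.13, "EEMP-ZRP": 88.17, "proposed": 99.23}
gains = {
    name: round(pdr_200["proposed"] - value, 2)
    for name, value in pdr_200.items()
    if name != "proposed"
}
```

The resulting gaps are about 14.1 points over ACO-AODV and about 11.1 points over EEMP-ZRP, which matches the range stated in the text.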
34.4.3 Routing Overhead
Routing overhead is the number of control packets forwarded during communication. An increase in forwarded control packets reduces the efficiency of the network. Figure 34.5 shows the routing overhead of ACO-AODV, EEMP-ZRP, and the proposed ICSO-ZEP protocol. Table 34.4 lists the corresponding values. The improved CSO in zone-based routing reduces the routing overhead during transmission, so the results are better than those of the earlier works. At 200 vehicles, the overhead is 2458 packets for ACO-AODV, 1867 packets for EEMP-ZRP, and 928 packets for the proposed ICSO-ZEP. The overhead of the proposed ICSO-ZEP is therefore 900–1500 packets lower than that of the earlier research works.
34.4.4 End-to-End Delay
End-to-end delay is the delay that occurs during communication in the network. Figure 34.6 shows the end-to-end delay of ACO-AODV, EEMP-ZRP, and the proposed ICSO-ZEP protocol.
Fig. 34.5 Routing overhead calculation

Table 34.4 Routing overhead value table (control packets)
No. of vehicles    ACO-AODV    EEMP-ZRP    ICSO-ZRP
25                 247         196         96
50                 476         347         168
75                 625         528         256
100                964         798         396
125                1279        1138        528
150                1564        1276        684
175                1867        1534        865
200                2458        1867        928
Fig. 34.6 End-to-end delay calculation
Table 34.5 End-to-end delay value table (ms)
No. of vehicles    ACO-AODV    EEMP-ZRP    ICSO-ZRP
25                 52.49       26.17       15.96
50                 82.47       76.27       25.11
75                 124.13      96.47       39.74
100                157.28      124.82      51.49
125                186.47      153.13      62.17
150                224.38      186.17      72.48
175                265.47      215.17      86.28
200                305.45      253.76      102.47
Table 34.5 lists the corresponding values. The improved CSO increases the stability of the vehicles, which reduces the delay during communication. At 200 vehicles, the end-to-end delay is 305 ms for ACO-AODV, 253 ms for EEMP-ZRP, and 102 ms for the proposed ICSO-ZEP. The end-to-end delay of the proposed ICSO-ZEP is therefore 150–200 ms lower than that of the earlier research works.
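The absolute margins quoted for routing overhead and end-to-end delay can be reproduced from the 200-vehicle rows of Tables 34.4 and 34.5. The helper `reduction` and the dictionary keys are illustrative; the relative reductions it returns are derived here for comparison and are not figures reported in the chapter.

```python
# 200-vehicle rows of Table 34.4 (control packets) and Table 34.5 (delay in ms).
overhead = {"ACO-AODV": 2458, "EEMP-ZRP": 1867, "proposed": 928}
delay_ms = {"ACO-AODV": 305.45, "EEMP-ZRP": 253.76, "proposed": 102.47}

def reduction(table, baseline):
    """Absolute and relative (%) reduction of the proposed protocol versus a baseline."""
    saved = table[baseline] - table["proposed"]
    return round(saved, 2), round(100.0 * saved / table[baseline], 1)

overhead_vs_aco = reduction(overhead, "ACO-AODV")
overhead_vs_eemp = reduction(overhead, "EEMP-ZRP")
delay_vs_aco = reduction(delay_ms, "ACO-AODV")
delay_vs_eemp = reduction(delay_ms, "EEMP-ZRP")
```

The absolute savings come to 1530 and 939 control packets and to roughly 203 ms and 151 ms, consistent with the 900–1500 packet and 150–200 ms ranges given in the text.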
34.5 Conclusion
This research proposed a hybrid zone-based routing protocol for uncertain VANET communication. To provide the optimal solution during communication in VANETs, ICSO is combined with the zone routing protocol. The proposed ICSO-ZEP protocol improves the network’s reliability in terms of energy efficiency, packet delivery ratio, routing overhead, and end-to-end delay. The results are compared with earlier research work, namely ACO-AODV and EEMP-ZRP. Compared with these earlier works, the proposed ICSO-ZEP protocol achieves 8–15% higher energy efficiency, 11–14% higher packet delivery ratio, 900–1500 packets lower routing overhead, and 150–200 ms lower end-to-end delay. Future work can increase the number of vehicles to deal with dense traffic areas and high-speed networks.
References
1. Setiabudi, A., Pratiwi, A.A., et al.: Performance comparison of GPSR and ZRP routing protocols in VANET environment. IEEE Region 10 Symp. (TENSYMP) (2016)
2. Abbas, A.H., Audah, L., Alduais, N.A.M.: An efficient load balance algorithm for vehicular ad-hoc network. In: 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), pp. 207–212. IEEE (2018, October)
3. Habelalmateen, M.I., Abbas, A.H., Audah, L., Alduais, N.A.M.: Dynamic multiagent method to avoid duplicated information at intersections in VANETs. TELKOMNIKA (Telecommun. Comput. Electr. Control) 18(2), 613–621 (2020)
4. Malik, R.Q., Ramli, K.N., Kareem, Z.H., Habelalmatee, M.I., Abbas, A.H., Alamoody, A.: An overview on V2P communication system: architecture and application. In: 2020 3rd International Conference on Engineering Technology and its Applications (IICETA), pp. 174–178. IEEE (2020, September)
5. Maranur, J.R., Mathpati, B.: VANET: vehicle to vehicle communication using moving zone based routing protocol. In: International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT) (2018)
6. Abbas, A.H., Habelalmateen, M.I., Audah, L., Alduais, N.A.M.: A novel intelligent clusterhead (ICH) to mitigate the handover problem of clustering in VANETs. Int. J. Adv. Comput. Sci. Appl. 10(6) (2019)
7. Jubair, M.A., Hassan, M.H., Mostafa, S.A., Mahdin, H., Mustapha, A., Audah, L.H., Abbas, A.H.: Competitive analysis of single and multi-path routing protocols in mobile Ad-Hoc network. Indonesian J. Electr. Eng. Comput. Sci. 14(2) (2019)
8. Mansour, H.S., Mutar, M.H., Aziz, I.A., Mostafa, S.A., Mahdin, H., Abbas, A.H., Jubair, M.A.: Cross-layer and energy-aware AODV routing protocol for flying Ad-Hoc networks. Sustainability 14(15), 8980 (2022)
9. Abbas, A.H., Ahmed, A.J., Rashid, S.A.: A cross-layer approach MAC/NET with updated-GA (MNUG-CLA)-based routing protocol for VANET network. World Electr. Veh. J. 13(5), 87 (2022)
10. Nurwarsito, H., Umam, M.Y.: Performance analysis of temporally ordered routing algorithm protocol and zone routing protocol on vehicular Ad-Hoc network in urban environment. Int. Sem. Res. Inf. Technol. Intell. Syst. (ISRITI) (2020)
11. Deb, S., Gao, X.-Z., et al.: Recent studies on chicken swarm optimization algorithm: a review (2014–2018). Artif. Intell. Rev. 53(3), 1737–1765 (2020)
12. Saleh, M.: Secure tilted-rectangular-shaped request zone location-aided routing protocol (STRS-RZLAR) for vehicular ad hoc networks. Proc. Comput. Sci. 160, 248–253 (2019)
13. Basurra, S.S., De Vos, M., et al.: Energy efficient zone based routing protocol for MANETs. Ad Hoc Netw. 25, 16–37 (2015)
14. Srivastava, J.R., Sudarshan, T.S.B.: A genetic fuzzy system based optimized zone based energy efficient routing protocol for mobile sensor networks (OZEEP). Appl. Soft Comput. 37, 863–886 (2015)
15. Mehta, K., Bajaj, P.R., Malik, L.G.: Fuzzy bacterial foraging optimization zone based routing (FBFOZBR) protocol for VANET. In: International Conference on ICT in Business Industry & Government (ICTBIG) (2017)
16. Osamy, W., El-Sawy, A.A., et al.: CSOCA: chicken swarm optimization based clustering algorithm for wireless sensor networks. IEEE Access 8, 60676–60688 (2020)
17. Tamtalini, M.A., El Alaoui, A.E.B., et al.: ESLC-WSN: a novel energy efficient security aware localization and clustering in wireless sensor networks. In: International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET) (2020)
18. Ci, Y., Wu, H., et al.: A prediction model with wavelet neural network optimized by the chicken swarm optimization for on-ramps metering of the urban expressway. J. Intell. Trans. Syst. (2022)
19. Zouache, D., Ould Arby, Y., et al.: Multi-objective chicken swarm optimization: a novel algorithm for solving multi-objective optimization problems. Comput. Ind. Eng. 129, 377–391 (2019)
20. Sindhwani, M., Singh, R., et al.: Improvisation of optimization technique and AODV routing protocol in VANET. Mater. Today Proc. 49(8), 3457–3461 (2022)
21. Sahu, R., Rizvi, M.A., et al.: Routing overhead performance study and evaluation of zone based energy efficient multi-path routing protocols in MANETs by using NS2.35. In: International Conference on Advances in Technology, Management and Education (ICATME) (2021)
22. Abbas, A.H., Mansour, H.S., Al-Fatlawi, A.H.: Self-adaptive efficient dynamic multi-hop clustering (SA-EDMC) approach for improving VANET’s performance. Int. J. Inter. Mob. Technol. 17(14)
23. Mostafa, S.A., Mustapha, A., Ramli, A.A., Jubair, M.A., Hassan, M.H., Abbas, A.H.: Comparative analysis to the performance of three Mobile ad-hoc network routing protocols in time-critical events of search and rescue missions. In: International Conference on Applied Human Factors and Ergonomics, pp. 117–123. Springer, Cham (2020, July)
24. Mehta, D., Kashyap, I., Zafar, S.: Random cluster head selection based routing approach for energy enrichment in MANET. In: International Conference on Recent Innovations in Signal processing and Embedded Systems (RISE) (2017)
25. Ali, R.R., Mostafa, S.A., Mahdin, H., Mustapha, A., Gunasekaran, S.S.: Incorporating the Markov Chain model in WBSN for improving patients’ remote monitoring systems. In: International Conference on Soft Computing and Data Mining, pp. 35–46. Springer, Cham
Chapter 35
Trust Management Scheme-Based Intelligent Communication for UAV-Assisted VANETs
Sameer Alani, Aryan Abdlwhab Qader, Mustafa Al-Tahai, Hassnen Shakir Mansour, Mazin R. AL-Hameed, and Sarmad Nozad Mahmood
Abstract Vehicular Ad hoc Network (VANET) serves various applications of the Intelligent Transportation System (ITS), in which vehicles can communicate independently with other vehicles and with the infrastructure. Several problems arise in VANETs because of their highly dynamic mobility patterns: attackers can easily capture the network, large obstacles are present at ground level, connectivity is lost, and so on. To shield the vehicles from ground-level obstacles, Unmanned Aerial Vehicles (UAVs) are introduced into vehicular communication; they collect the data from the vehicles through the air so that the transmission escapes the obstacles at ground level. In this paper, to secure the network from attackers, a Trust Management Scheme (TMS) is proposed for UAV-Assisted VANETs (UAVs). With this method, trust evaluation is performed in both communication modules of the VANET, namely vehicle-to-vehicle data transmission and vehicle-to-UAV data transmission, which greatly helps to secure the network from external threats. The proposed TMS-UAVs protocol has been validated using NS2 and SUMO. The simulation results are compared with those of earlier works such as ES-UAVs and PRO-UAVs. They confirm the encouraging performance of the proposed TMS-UAVs protocol in terms of higher packet delivery ratio, lower end-to-end delay, lower average number of hops, and lower overhead.

S. Alani (B) University of Mashreq, Research Center, Baghdad, Iraq
A. A. Qader Department of Computer Technical Engineering, Bilad Alrafidain University College, 32001, Diyala Baqubah, Iraq
M. Al-Tahai Department of Medical Instruments Engineering Techniques, Al-Farahidi University, Baghdad, Iraq
H. S. Mansour Department of Computer Technical Engineering, College of Information Technology, Imam Ja‘afar Al-Sadiq University, Al-Muthanna, 66002 Samawah, Iraq
M. R. AL-Hameed College of Engineering Technology, Department of Medical Device Industry Engineering, Dhi Qar, National University of Science and Technology, Nasiriya, Iraq
S. N. Mahmood Computer Technology Engineering, College of Engineering Technology, Al-Kitab University, Kirkuk, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_35
35.1 Introduction
VANET technology is an emerging field that attracts a growing body of research. Various applications are incorporated with VANETs to perform intelligent communication [1–3]. VANETs have special characteristics: a highly volatile topology, unpredictable mobility patterns, and high vehicle speed. These make the system highly complex and lead to link failures on established routes [4]. In general, VANETs support two types of communication: vehicle-to-vehicle (V–V) communication and vehicle-to-infrastructure (V–I) communication with Roadside Units (RSUs). Several VANET protocols exist to improve network performance, but they help only to a certain extent and still suffer under high mobility. As a result, developing a new model for effective communication in VANETs remains an open research area in which many researchers are working on novel ideas [5–7]. A major issue that needs to be addressed is that the large number of obstacles in the network makes vehicle-to-vehicle communication ineffective. To overcome this drawback, UAV-Assisted VANETs (UAVs) are now used, in which vehicles transfer data packets to the UAVs through the air, where no ground-level obstacle blocks the link. UAVs act as an alternative to RSUs and are also called road-above-units (RAUs). UAVs are highly flexible and automatic. Furthermore, communication between vehicles and UAVs is much more effective than communication between vehicles and RSUs [8, 9].
Even though UAV-Assisted VANETs provide promising performance, other challenges remain: the lack of security causes packet loss and delay during communication. Because of the high mobility of VANETs, attackers can easily capture the network, so a trust-based approach is essential to overcome this drawback. In this paper, intelligent communication is proposed for UAV-Assisted VANETs (UAVs). The main contributions of the research are: (i) To protect ground-level vehicle-to-vehicle communication from road obstacles, UAV-Assisted VANETs (UAVs) are introduced. With this system, vehicles can transmit data through the air so that ground-level obstacles cause no packet loss or delay. (ii) To protect the vehicles and UAVs from external threats and attackers, a Trust Management Scheme (TMS) is introduced in UAV-Assisted VANETs (UAVs). (iii) Through the TMS, trust estimation is performed for vehicles and UAVs separately so that both are protected effectively from external threats and attackers. The rest of the paper is organized as follows. Section 35.2 analyzes the earlier trust-model-based research in VANETs. Section 35.3 elaborates the network model. Section 35.4 presents the proposed Trust Management Scheme (TMS) for UAV-Assisted VANETs (UAVs). Section 35.5 reports the performance analysis and the results. Section 35.6 discusses the conclusion and the future direction.
35.2 Related Works
In this section, various trust models in VANETs are analyzed. [10] proposed a trust-based algorithm to improve the effectiveness of VANETs in terms of network overhead, but other core parameters are not taken into consideration. [11, 12] proposed a trust model to protect the network from the man-in-the-middle attack. This method produces moderate results, which are not adequate for high-speed VANETs. [13] developed a trust-based framework to improve the security level of VANETs. This method yields a low packet delivery ratio and throughput. [14–16] proposed a dynamic entity-centric trust method to improve the traffic quality of VANETs. This method performs fast trust evaluation, but its overall performance is moderate and it produces high overhead during communication. [17] proposed a two-level detection approach to improve the trust scores of VANETs. This method suffers more packet loss during communication between vehicles because of ground-level obstacles. [18] reviewed trust- and cryptography-based approaches in VANETs and found that most works lack security and suffer packet loss and delay caused by obstacles. [19, 20] presented a trust-based adaptive privacy-preserving authentication scheme to improve VANET security; vehicles are assigned various trust levels, which increases the overhead during communication. [21] developed a hybrid optimization-based Deep Maxout Network (DMN) with a trust-based clustering approach to improve the energy and trust level of VANETs, but this method fails to achieve lower overhead. [22] proposed blockchain-based Optimized Link State Routing (OLSR) in VANETs to improve network security and reduce overhead, but this method is applicable only to networks with a low vehicle density.
After analyzing the earlier studies, it is concluded that the major drawbacks in VANETs are packet loss between ground-level vehicles caused by obstacles and reduced network effectiveness caused by the lack of security. To overcome these issues, UAV-Assisted VANETs are developed in this paper, and a Trust Management Scheme (TMS) is introduced to improve the security of the VANETs. The trust evaluation process is elaborated in the following sections.
35.3 Network Model
Data transmission in UAV-Assisted VANETs includes two modules: (i) vehicle-to-vehicle data transmission (V2V) and (ii) vehicle-to-UAV data transmission (V2U/U2V). The working principle of each module is described below. The network architecture of the proposed work is shown in Fig. 35.1.
35.3.1 Vehicle-to-Vehicle Data Transmission (V2V)
Each vehicle in the network has a certain coverage area and can communicate with any other vehicle inside that area. Because of obstacles, vehicle-to-vehicle data transmission is not highly effective and may lose packets between vehicles in different streets. If a vehicle needs to transfer data to a vehicle in another street, it can use the UAVs to carry out the transmission free of obstacles. At this stage, vehicle-to-vehicle transmission can therefore happen directly or through UAVs, depending on the current situation and the communication infrastructure. Attackers present in the area can flood the network with fake packets, which disrupts legitimate communication. It is therefore essential to protect both vehicle-to-vehicle and vehicle-to-UAV communication at this stage [23].
Fig. 35.1 UAV-Assisted VANETs
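The choice between a direct link and a UAV relay can be sketched as a simple rule. This is a hypothetical illustration of the behavior described above, not the protocol's actual decision logic: the function `choose_link`, the street labels, the Euclidean range test, and the fallback to a UAV for same-street vehicles that are out of range are all assumptions.

```python
import math

def choose_link(src, dst, coverage_radius):
    """Return "direct" for same-street vehicles in range, otherwise "via_uav"."""
    distance = math.hypot(dst["x"] - src["x"], dst["y"] - src["y"])
    same_street = src["street"] == dst["street"]
    if same_street and distance <= coverage_radius:
        return "direct"
    return "via_uav"  # assumed fallback: obstacles or range prevent a direct link

# Hypothetical vehicles: positions in metres and a street label.
a = {"x": 0.0, "y": 0.0, "street": "S1"}
b = {"x": 120.0, "y": 0.0, "street": "S1"}
c = {"x": 60.0, "y": 80.0, "street": "S2"}
link_ab = choose_link(a, b, coverage_radius=250.0)
link_ac = choose_link(a, c, coverage_radius=250.0)
```

Vehicles a and b share a street and are in range, so they talk directly, while a reaches c on the neighboring street through a UAV even though c is geometrically closer.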
35.3.2 Vehicle-to-UAV Data Transmission (V2U/U2V) Every vehicle in the network is able to exchange information with UAVs. UAVs are aerial vehicles that can provide obstacle-free data transmission between the ground and the air. In general, UAVs have a large coverage range, so they can cover a large number of vehicles within their line of sight. Attackers try to disrupt data transmission between vehicles and UAVs, where they can misuse large amounts of data. It is therefore indispensable to protect the network from such attacks during data transmission.
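The choice between the two modules can be illustrated with a minimal sketch. It is not the authors' routing logic: the function names, the `line_of_sight` flag, and the use of 2D positions (UAV altitude ignored) are assumptions made for illustration, and the ranges are the 500 m and 1000 m communication ranges quoted in the simulation setup of Sect. 35.5.

```python
import math

V2V_RANGE_M = 500.0    # vehicle communication range quoted in Sect. 35.5
UAV_RANGE_M = 1000.0   # UAV communication range quoted in Sect. 35.5


def distance(p, q):
    """Euclidean distance between two 2D points (altitude ignored: assumption)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def choose_link(src, dst, uavs, line_of_sight):
    """Return "V2V", ("V2U", uav_index) or None for one source/destination pair.

    line_of_sight is a hypothetical flag meaning no obstacle blocks the direct path.
    """
    if line_of_sight and distance(src, dst) <= V2V_RANGE_M:
        return "V2V"
    for index, uav in enumerate(uavs):
        # A relay UAV has to cover both ground vehicles.
        if distance(src, uav) <= UAV_RANGE_M and distance(dst, uav) <= UAV_RANGE_M:
            return ("V2U", index)
    return None
```

For example, `choose_link((0, 0), (300, 0), [(150, 0)], False)` falls back to the UAV at index 0 because the direct path is blocked.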
35.4 Trust Management Scheme (TMS) for UAVs The routing models used in vehicle-to-vehicle data transmission differ from those used in vehicle-to-UAV data transmission, so a separate trust management process is needed for each communication model. Consequently, trust management is carried out at two levels: (i) trust management during communication between vehicles and (ii) trust management during communication between vehicles and UAVs. These two methods are detailed below.
35.4.1 Trust Management During Communication Between Vehicles During this trust evaluation process, direct vehicle-to-vehicle communication is encouraged, without intermediate vehicles collecting the data and forwarding it to the destination. For this reason, vehicle-to-vehicle data transmission uses a direct trust calculation. The initial trust score of each vehicle, which varies at each time instant according to the vehicle's current situation, satisfies 0 ≤ ITscore ≤ 100. Vehicles with an ITscore below 30 are considered untrusted, and vehicles that maintain an ITscore above 80 are given high priority for data transmission. The direct trust score of a vehicle V is defined as
$$DT_{score}(V) = \sum_{i=1}^{k} \frac{IT_{score}(V, i) \ast S(V, i) \ast AW(V, i) \ast PI(V, i)}{TI(V, i)}. \tag{35.1}$$
In Eq. (35.1), S(V, i) denotes the satisfaction of the vehicle at time instant i; AW(V, i) denotes the average weight of the vehicle, which varies from 0 to 1; and PI(V, i) denotes the position information of the vehicle. PI is set to 1 (highly positive) at the initial stage and varies over time. TI(V, i) denotes the total information of the vehicle. The direct trust score of the vehicle is calculated with Eq. (35.1), and vehicle-to-vehicle communication is performed according to it.
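A minimal sketch of Eq. (35.1) and the two ITscore thresholds follows. The `Observation` container, its field names, and the "normal" label for scores between 30 and 80 are assumptions made for illustration; the chapter does not name that middle band.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Values for one time instant i of a vehicle V (field names are illustrative)."""
    it_score: float       # IT_score(V, i), in [0, 100]
    satisfaction: float   # S(V, i)
    avg_weight: float     # AW(V, i), in [0, 1]
    position_info: float  # PI(V, i), 1 at the initial stage
    total_info: float     # TI(V, i), must be non-zero


def direct_trust_score(observations):
    """Sum of IT * S * AW * PI / TI over all instants, following Eq. (35.1)."""
    return sum(
        o.it_score * o.satisfaction * o.avg_weight * o.position_info / o.total_info
        for o in observations
    )


def classify_initial_trust(it_score):
    """Thresholds from the text: below 30 is untrusted, above 80 is high priority."""
    if it_score < 30:
        return "untrusted"
    if it_score > 80:
        return "high-priority"
    return "normal"  # label assumed here; the text does not name this band
```

With two observations `(90, 1.0, 0.5, 1.0, 1.0)` and `(60, 0.5, 1.0, 1.0, 2.0)`, the two terms are 45 and 15, so the direct trust score is 60.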
35.4.2 Trust Management During Communication Between Vehicles to UAVs When vehicles communicate with UAVs, the trust score of the UAVs must be verified effectively. The trust score of a UAV is calculated from the following parameters: the coverage area of the UAV (CAUAV), the fitness score (Fscore), the satisfaction factor (Sfactor), the primary trust score (PTscore), and the total data transmission history (DThis) [24]. From these parameters, the final trust score FTscore(UAV) is defined as:
$$FT_{score}(UAV) = \sum_{i=1}^{k} \frac{CA_{UAV} \ast F_{score}(UAV, i) \ast S_{factor}(UAV, i) \ast PT_{score}(UAV, i)}{DT_{his}}. \tag{35.2}$$
The coverage area of a UAV (CAUAV) is calculated as
$$CA_{UAV} = \sqrt{(a_2 - a_1)^2 + (b_2 - b_1)^2}, \tag{35.3}$$
where a2, b2 are the coordinates of the destination and a1, b1 are the coordinates of the source. The fitness score of a UAV (Fscore) is calculated as
$$F_{score}(UAV, i) = \alpha \ast (P_r \times P_s) - \beta \ast (P_r + P_s), \tag{35.4}$$
where Pr and Ps are the averages of the transmitted and received packets of the UAVs, and α and β are experimental constants that satisfy α + β = 1.
The primary trust score of a UAV (PTscore) takes a value in the range 0–100, which varies randomly according to the current situation. The final trust score of the UAVs is computed with Eq. (35.2), which helps to increase the effectiveness of data transmission. The effect of the Trust Management Scheme (TMS) is that it can detect and discard illegitimate messages collected during communication. The trustworthiness of the UAVs is updated at each time instant.
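Equations (35.2) to (35.4) can be combined in a short sketch. The function names, argument layout, and input checks are assumptions made for illustration; only the arithmetic follows the equations.

```python
import math


def coverage_area(src, dst):
    """Eq. (35.3): distance between source (a1, b1) and destination (a2, b2)."""
    (a1, b1), (a2, b2) = src, dst
    return math.hypot(a2 - a1, b2 - b1)


def fitness_score(p_r, p_s, alpha, beta):
    """Eq. (35.4): alpha * (Pr * Ps) - beta * (Pr + Ps), with alpha + beta = 1."""
    if not math.isclose(alpha + beta, 1.0):
        raise ValueError("alpha + beta must equal 1")
    return alpha * (p_r * p_s) - beta * (p_r + p_s)


def final_trust_score(ca_uav, samples, dt_his):
    """Eq. (35.2): sum over i of CA * F * S * PT, divided by DT_his.

    samples is an iterable of (f_score, s_factor, pt_score) tuples, one per instant.
    """
    if dt_his == 0:
        raise ValueError("DT_his must be non-zero")
    return sum(ca_uav * f * s * pt for f, s, pt in samples) / dt_his
```

As a worked case, a source at (0, 0) and a destination at (3, 4) give CAUAV = 5; Pr = 10, Ps = 8, α = 0.6 and β = 0.4 give Fscore = 48 − 7.2 = 40.8; with Sfactor = 1, PTscore = 50 and DThis = 100, the final trust score is 5 × 40.8 × 1 × 50 / 100 = 102.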
35.5 Performance Analyses To evaluate the performance of the proposed TMS-UAVs, NS-2.35 on Ubuntu 12.04 is used. NS2 is a discrete-event simulator that combines two languages: Object-oriented Tool Command Language (OTcl) as the front end, used for assembly and configuration, and C++ as the back end, used to build the internal mechanisms. SUMO-1.1.0 is used to generate vehicle mobility, and the VanetMobiSim mobility generator is used to generate UAV mobility. The communication ranges of the vehicles and UAVs are 500 m and 1000 m, respectively. The evaluation criteria are packet delivery ratio, end-to-end delay, average number of hops, and routing overhead. The proposed TMS-UAVs protocol is compared with earlier works, namely ES-UAVs and PRO-UAVs [21, 22, 25–27]. The primary simulation parameters are shown in Table 35.1.
35.5.1 Packet Delivery Ratio The packet delivery ratio is the ratio of the number of packets that successfully reach the destination to the total number of transmitted packets. The packet delivery ratios of the proposed TMS-UAVs protocol and the two other methods are plotted against vehicle density in Fig. 35.2, and the values are listed in Table 35.2. According to Table 35.2, the packet delivery ratio of TMS-UAVs peaks at a density of 1500 vehicles (98.56%) and declines slightly at higher densities. The proposed TMS-UAVs protocol achieves the best packet delivery ratio of the three methods at every density tested. This is because TMS is performed separately for ordinary vehicles and UAVs, so vehicle communication is free of obstacles and protected from attacks. At a density of 3000 vehicles, ES-UAVs, PRO-UAVs, and the proposed TMS-UAVs protocol achieve packet delivery ratios of 91.82%, 93.12%, and 96.22%, respectively, so the proposed protocol improves the packet delivery ratio by roughly 3–5 percentage points over the earlier works.
Table 35.1 Simulation parameters

| Input parameters | Values |
| --- | --- |
| Operating system | Ubuntu 16.04 |
| Software | NS-2.35, SUMO-1.1.0 |
| Running time | 200 ms |
| Dimension | 4000 m × 4000 m |
| No. of vehicles | 3000 vehicles |
| No. of UAVs | 30 UAVs |
| Antenna type | Omni-directional antenna |
| Propagation model | Two-ray ground model |
| Queue type | DropTail |
| Link bandwidth | 1 Kbps |
| Topology | Urban |
| Speed of vehicle | 50 km/h |
| Speed of UAVs | 75 km/h |
| Transmission power | 0.500 J |
| Receiving power | 0.050 J |
| Connection | Multiple |
| Packet size | 510 bytes |
Fig. 35.2 Packet delivery ratio versus density of vehicles
Table 35.2 Values for packet delivery ratio versus density of vehicles

| Density of vehicles | ES-UAVs (%) | PRO-UAVs (%) | TMS-UAVs (%) |
| --- | --- | --- | --- |
| 300 | 91.24 | 93.45 | 97.23 |
| 600 | 90.84 | 93.89 | 98.03 |
| 900 | 91.45 | 92.12 | 97.81 |
| 1200 | 91.89 | 92.86 | 98.39 |
| 1500 | 92.12 | 93.11 | 98.56 |
| 1800 | 91.32 | 93.56 | 97.82 |
| 2100 | 90.64 | 93.86 | 97.12 |
| 2400 | 91.22 | 94.01 | 96.79 |
| 2700 | 91.49 | 93.51 | 96.43 |
| 3000 | 91.82 | 93.12 | 96.22 |
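The definition of packet delivery ratio in Sect. 35.5.1 and the 3000-vehicle row of Table 35.2 can be checked with a short sketch. The function and variable names are illustrative; the three percentages are copied from the table.

```python
def packet_delivery_ratio(delivered, transmitted):
    """Percentage of transmitted packets that reach the destination."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    return 100.0 * delivered / transmitted


# Values at a density of 3000 vehicles, copied from Table 35.2.
pdr_at_3000 = {"ES-UAVs": 91.82, "PRO-UAVs": 93.12, "TMS-UAVs": 96.22}

# Gain of TMS-UAVs over each baseline, in percentage points.
gains = {
    name: round(pdr_at_3000["TMS-UAVs"] - value, 2)
    for name, value in pdr_at_3000.items()
    if name != "TMS-UAVs"
}
```

The resulting gains are 4.4 points over ES-UAVs and 3.1 points over PRO-UAVs, which is in line with the 3–5% margin stated in the text.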
35.5.2 End-to-End Delay End-to-end delay is the time taken to transfer data from the source to the destination. The end-to-end delays of the proposed TMS-UAVs protocol and the two other methods are plotted against vehicle density in Fig. 35.3. The results show that the end-to-end delay increases with vehicle density. Because UAVs are present in the VANET, communication is carried out free of obstacles, which reduces the end-to-end delay in the network.
Fig. 35.3 End-to-end delay versus density of vehicles
Table 35.3 Values for end-to-end delay versus density of vehicles

| Density of vehicles | ES-UAVs (ms) | PRO-UAVs (ms) | TMS-UAVs (ms) |
| --- | --- | --- | --- |
| 300 | 102.45 | 86.14 | 56.13 |
| 600 | 153.47 | 124.53 | 78.12 |
| 900 | 186.47 | 154.29 | 102.47 |
| 1200 | 243.41 | 201.74 | 159.23 |
| 1500 | 286.94 | 249.76 | 196.52 |
| 1800 | 302.47 | 281.92 | 241.39 |
| 2100 | 369.12 | 331.46 | 289.64 |
| 2400 | 391.46 | 359.74 | 326.11 |
| 2700 | 423.16 | 391.22 | 339.88 |
| 3000 | 452.17 | 421.23 | 356.23 |
The end-to-end delays of the proposed TMS-UAVs protocol and the two other methods are listed in Table 35.3. At a density of 3000 vehicles, ES-UAVs, PRO-UAVs, and the proposed TMS-UAVs protocol have end-to-end delays of about 452 ms, 421 ms, and 356 ms, respectively. The proposed TMS-UAVs protocol thus achieves an end-to-end delay about 65–96 ms lower than the earlier works.
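As a check on the margin, the 3000-vehicle row of Table 35.3 gives the following differences between each earlier method and TMS-UAVs (the symbols Δ are introduced here only for this calculation):

$$\Delta_{ES} = 452.17 - 356.23 = 95.94\ \text{ms} \approx 21.2\%\ \text{of } 452.17\ \text{ms},$$
$$\Delta_{PRO} = 421.23 - 356.23 = 65.00\ \text{ms} \approx 15.4\%\ \text{of } 421.23\ \text{ms}.$$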
35.5.3 Average Number of Hops The average number of hops is calculated as the ratio between the total number of packets transmitted and the total number of hop nodes on the transmission path between the source and the destination. The average numbers of hops of the proposed TMS-UAVs protocol and the two other methods are plotted against vehicle density in Fig. 35.4. The results show that the average number of hops increases with vehicle density. UAVs are combined with vehicular communication to provide effective communication, and high efficiency is achieved by reducing the hop count. The values for the three methods are listed in Table 35.4. At a vehicle density of 3000, the average number of hops is 189 for ES-UAVs, 165 for PRO-UAVs, and 65 for the proposed TMS-UAVs protocol. The proposed protocol thus requires 100–124 fewer hops than the earlier works.
Fig. 35.4 Average number of hops versus density of vehicles
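Table 35.4 Values for average number of hops versus density of vehicles

| Density of vehicles | ES-UAVs | PRO-UAVs | TMS-UAVs |
| --- | --- | --- | --- |
| 300 | 55 | 49 | 25 |
| 600 | 72 | 63 | 29 |
| 900 | 86 | 78 | 31 |
| 1200 | 102 | 89 | 35 |
| 1500 | 121 | 109 | 42 |
| 1800 | 139 | 121 | 46 |
| 2100 | 143 | 129 | 52 |
| 2400 | 159 | 138 | 58 |
| 2700 | 171 | 152 | 62 |
| 3000 | 189 | 165 | 65 |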
35.5.4 Overhead Data travels along a certain path from the source to the destination, and overhead occurs when the link bandwidth on that path is exceeded. A network with lower overhead communicates more effectively. The overheads of the proposed TMS-UAVs protocol and the two other methods are plotted against vehicle density in Fig. 35.5. The results show that the overhead increases gradually with vehicle density. Compared with the earlier works, the proposed TMS-UAVs protocol lowers the overhead with the help of the TMS-based UAVs. The overhead values of the three methods are listed in Table 35.5. At a density of 3000 vehicles, the overheads of ES-UAVs, PRO-UAVs, and the proposed TMS-UAVs protocol are 121 packets, 105 packets, and 73 packets, respectively. The proposed protocol thus produces roughly 30–50 packets less overhead than the earlier works.
Fig. 35.5 Routing overhead versus density of vehicles
Table 35.5 Values for routing overhead versus density of vehicles
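
| Density of vehicles | ES-UAVs (packets) | PRO-UAVs (packets) | TMS-UAVs (packets) |
| --- | --- | --- | --- |
| 300 | 25 | 19 | 9 |
| 600 | 42 | 35 | 18 |
| 900 | 59 | 48 | 25 |
| 1200 | 67 | 54 | 32 |
| 1500 | 82 | 69 | 41 |
| 1800 | 91 | 78 | 52 |
| 2100 | 99 | 86 | 63 |
| 2400 | 105 | 92 | 69 |
| 2700 | 113 | 98 | 72 |
| 3000 | 121 | 105 | 73 |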
35.6 Conclusion This study presented a novel approach to improve the effectiveness of communication in VANETs, called Trust Management Scheme-based UAV-Assisted VANETs (TMS-UAVs). The approach is designed to protect vehicle-to-vehicle communication from ground-level obstacles and to protect both vehicle-to-vehicle and vehicle-to-UAV communication from the external environment and attackers. To achieve this, TMS performs trust evaluations separately for vehicles and UAVs, which greatly helps to achieve effective communication during data transmission. The proposed TMS-UAVs protocol is evaluated using SUMO and the NS2 simulator. The evaluation parameters are packet delivery ratio, end-to-end delay, average number of hops, and routing overhead. The proposed protocol is compared with earlier works, namely ES-UAVs and PRO-UAVs. It achieves up to 5% higher packet delivery ratio, 100 ms lower end-to-end delay, 124 fewer hops, and 50 packets lower overhead than the earlier works. In future work, the proposed protocol needs to be enhanced to support higher vehicle speeds and densities.
References
1. Mansour, H.S., Mutar, M.H., Aziz, I.A., Mostafa, S.A., Mahdin, H., Abbas, A.H., Jubair, M.A.: Cross-layer and energy-aware AODV routing protocol for flying Ad-Hoc networks. Sustainability 14(15), 8980 (2022)
2. Jubair, M.A., Hassan, M.H., Mostafa, S.A., Mahdin, H., Mustapha, A., Audah, L.H., Abbas, A.H.: Competitive analysis of single and multi-path routing protocols in mobile Ad-Hoc network. Indonesian J. Electr. Eng. Comput. Sci. 14(2) (2019)
3. Mostafa, S.A., Mustapha, A., Ramli, A.A., Jubair, M.A., Hassan, M.H., Abbas, A.H.: Comparative analysis to the performance of three Mobile ad-hoc network routing protocols in time-critical events of search and rescue missions. In: International Conference on Applied Human Factors and Ergonomics, pp. 117–123. Springer, Cham (2020, July)
4. Sharef, B., Alsaqour, R., et al.: Robust and trust dynamic mobile gateway selection in heterogeneous VANET-UMTS network. Vehic. Commun. 12, 75–87 (2018)
5. Oubbati, O.S., Lakas, A., et al.: UVAR: an intersection UAV-assisted VANET routing protocol. In: IEEE Wireless Communications and Networking Conference (2016)
6. Manzoor, A., Dang, T.N., et al.: UAV trajectory design for UAV-2-GV communication in VANETs. In: International Conference on Information Networking (ICOIN) (2021)
7. Abdulsattar, N.F., Hassan, M.H., Mostafa, S.A., Mansour, H.S., Alduais, N., Mustapha, A., Jubair, M.A.: Evaluating MANET technology in optimizing IoT-based multiple WBSN model in soccer players health study. In: International Conference on Applied Human Factors and Ergonomics. Springer, Cham (2022, January)
8. Ghazzai, H., Khattab, A., Massoud, Y.: Mobility and energy aware data routing for UAV-Assisted VANETs. In: IEEE International Conference on Vehicular Electronics and Safety (ICVES) (2019)
9. Tripathi, K.N., Sharma, S.C., Yadav, A.M.: Analysis of various trust based security algorithm for the vehicular AD-HOC network. In: International Conference on Recent Innovations in Electrical, Electronics & Communication Engineering (ICRIEECE) (2018)
10. Ahmad, F., Kurugollu, F., et al.: MARINE: man-in-the-middle attack resistant trust model in connected vehicles. IEEE Internet Things J. 7(4), 3310–3322 (2020)
11. Poongodi, M., Hamdi, M., et al.: DDoS detection mechanism using trust-based evaluation system in VANET. IEEE Access 7, 183532–183544 (2019)
12. Abbas, A.H., Mansour, H.S., Al-Fatlawi, A.H.: Self-adaptive efficient dynamic multi-hop clustering (SA-EDMC) approach for improving VANET's performance. Int. J. Interac. Mob. Technol. 17(14) (2022)
13. Yao, X., Zhang, X., et al.: Using trust model to ensure reliable data acquisition in VANETs. Ad Hoc Netw. 55, 107–118 (2017)
14. Kudva, S., Badsha, S., et al.: A scalable blockchain based trust management in VANET routing protocol. J. Parallel Distrib. Comput. 152, 144–156 (2021)
15. Abbas, A.H., Habelalmateen, M.I., Audah, L., Alduais, N.A.M.: A novel intelligent clusterhead (ICH) to mitigate the handover problem of clustering in VANETs. Int. J. Adv. Comput. Sci. Appl. 10(6) (2019)
16. Habelalmateen, M.I., Abbas, A.H., Audah, L., Alduais, N.A.M.: Dynamic multiagent method to avoid duplicated information at intersections in VANETs. TELKOMNIKA (Telecommun. Comput. Electr. Control) 18(2), 613–621 (2020)
17. Junejo, M.H., Ab Rahman, A.A.-H.: Lightweight trust model with machine learning scheme for secure privacy in VANET. Proc. Comput. Sci. 194, 45–59 (2021)
18. Zhang, S., Liu, Y.: A trust based adaptive privacy preserving authentication scheme for VANETs. Vehic. Commun. 37 (2022)
19. Kaur, G., Kakkar, D.: Hybrid optimization enabled trust-based secure routing with deep learning-based attack detection in VANET. Ad Hoc Netw. (2022)
20. Abbas, A.H., Ahmed, A.J., Rashid, S.A.: A cross-layer approach MAC/NET with updated-GA (MNUG-CLA)-based routing protocol for VANET network. World Electr. Vehic. J. 13(5), 87 (2022)
21. Inedjaren, Y., Maachaoui, M., et al.: Blockchain-based distributed management system for trust in VANET. Vehic. Commun. 30 (2021)
22. Fatemidokht, H., Rafsanjani, M.K., et al.: Efficient and secure routing protocol based on artificial intelligence algorithms with UAV-assisted for vehicular Ad Hoc networks in intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 22(7), 4757–4769 (2021)
23. Saeed, M.M., Hasan, M.K., Obaid, A.J., Saeed, R.A., Mokhtar, R.A., Ali, E.S., Akhtaruzzaman, M., Amanluo, S., Hossain, A.K.M.Z.: A comprehensive review on the users' identity privacy for 5G networks. IET Commun. 00, 1–16 (2022). https://doi.org/10.1049/cmu2.12327
24. Shahab, S., Agarwal, P., Mufti, T., Obaid, A.J.: SIoT (Social Internet of Things): a review. In: Fong, S., Dey, N., Joshi, A. (eds.) ICT Analysis and Applications. Lecture Notes in Networks and Systems, vol. 314. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-5655-2_28
25. Mostafa, S.A., Ramli, A.A., Jubair, M.A., Gunasekaran, S.S., Mustapha, A., Hassan, M.H.: Integrating human survival factor in optimizing the routing of flying ad-hoc networks in search and rescue tasks. In: International Conference on Applied Human Factors and Ergonomics. Springer, Cham (2022, January)
26. Malik, R.Q., Ramli, K.N., Kareem, Z.H., Habelalmatee, M.I., Abbas, A.H., Alamoody, A.: An overview on V2P communication system: Architecture and application. In: 2020 3rd International Conference on Engineering Technology and its Applications (IICETA), pp. 174–178. IEEE (2020, September)
27. Ali, R.R., Mostafa, S.A., Mahdin, H., Mustapha, A., Gunasekaran, S.S.: Incorporating the Markov Chain model in WBSN for improving patients' remote monitoring systems. In: International Conference on Soft Computing and Data Mining, pp. 35–46. Springer, Cham (2020, January)
Chapter 36
Wideband and High Gain Antenna of Flowers Patches for Wireless Application Qahtan Mutar Gatea, Mohammed Ayad Alkhafaji, Muhammet Tahir Guneser, and Ahmed J. Obaid
Abstract Because of its modest properties, the conventional slot antenna has become an unsuitable candidate for many applications. To achieve high gain, small size, and a wide frequency range in the proposed antenna, two layers of flower-shaped cells are positioned at different heights according to Fabry–Perot theory. The improvements also extend to the feed line and to the slot etched in the ground plane, which allow the current flow to be controlled. Each metasurface layer above the conventional antenna contains 4 × 4 cells. The antenna has a peak gain of 9.8 dB and a bandwidth of 2.6 GHz, from 4.33 to 6.93 GHz. The simulations were carried out in CST software. These characteristics make the antenna an excellent candidate for wireless applications.
Q. M. Gatea, Department of Communications Techniques Engineering, Engineering Technical College, Al-Furat Al-Awsat Technical University, Najaf, Iraq
M. A. Alkhafaji (corresponding author), Department of Medical Device Industry Engineering, College of Engineering Technology, National University of Science and Technology, Dhi Qar, Iraq
M. T. Guneser, Department of Electrical Electronic Engineering, College of Engineering, Karabuk University, Karabuk, Turkey
A. J. Obaid, Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_36
36.1 Introduction In the realm of antenna design, several key concepts and technologies play a pivotal role in shaping the performance and capabilities of these vital communication components. Among these are metamaterials, metasurfaces, traditional antennas, the Fabry–Perot concept, boundary conditions, linear and circular polarization, programmable metasurfaces, and active and passive elements [1–5]. Understanding these fundamental elements is crucial for engineers and researchers striving to innovate and optimize antenna designs for various applications. Metamaterials are artificially engineered materials with unique electromagnetic properties not typically found in natural substances [6]. These materials are designed at the subwavelength scale to exhibit extraordinary electromagnetic behavior, such as negative refractive index, which allows for unprecedented control over the propagation of electromagnetic waves. By incorporating metamaterial elements into antenna structures, engineers can achieve enhanced performance characteristics, including increased gain, improved directivity, and wider bandwidth [7]. Metasurfaces, closely related to metamaterials, are two-dimensional structures comprised of subwavelength elements arranged in a periodic or quasi-periodic pattern. They manipulate the wavefront of incident electromagnetic waves with extreme precision, enabling advanced control over the direction, phase, and polarization of the radiation. Metasurfaces have gained significant attention in antenna design due to their ability to shape and steer the radiation pattern, leading to improved beamforming and beam scanning capabilities [5–10]. Traditional antennas, on the other hand, have been the backbone of wireless communication systems for decades. These antennas employ well-established principles and geometries to radiate and receive electromagnetic waves efficiently [7]. 
They often operate based on fundamental concepts such as dipole antennas, loop antennas, patch antennas, or parabolic reflectors, each tailored to specific frequency ranges and applications [9]. Traditional antennas provide a baseline for performance and comparison against emerging technologies [8]. The Fabry–Perot concept is a fundamental principle in electromagnetic wave propagation and resonance [9]. Derived from optics, the Fabry–Perot interferometer describes the interaction of waves between two reflecting surfaces separated by a specific distance. In antenna design, this concept is utilized to achieve resonance and selectivity by adjusting the dimensions and spacing of conducting elements, such as patches or dipoles [10–33]. By tuning the resonance, antennas can operate efficiently at specific frequencies. Boundary conditions play a crucial role in antenna design, as they define the interaction between electromagnetic waves and the antenna structure [10–14]. These conditions dictate the reflection, transmission, and absorption of waves at the interface of different materials [13]. Properly understanding and implementing appropriate boundary conditions are essential for optimizing antenna performance and reducing undesirable effects such as signal loss or interference [13]. Polarization refers to the orientation of the electric field vector of an electromagnetic wave [14]. Linear polarization occurs when the electric field oscillates in a single plane, while circular polarization describes the rotation of the electric
field vector [14, 15]. Polarization is a vital aspect of antenna design, as it influences signal transmission, reception, and interference mitigation. Engineers often tailor antennas to support specific polarization requirements based on the application’s needs [16]. The emergence of programmable metasurfaces has revolutionized antenna design by providing dynamic control over electromagnetic wave properties. Programmable metasurfaces utilize tunable elements, such as varactors or Microelectromechanical Systems (MEMS), to actively adjust the metasurface’s response, including phase, amplitude, and polarization. This adaptability enables real-time reconfiguration of antenna characteristics, allowing for beam steering, beamforming, and adaptive signal processing. Within the realm of metamaterials and metasurfaces, elements can be categorized as either active or passive [15–17]. Active elements incorporate external power sources to actively manipulate electromagnetic waves, such as using electronic components or reconfigurable devices. Passive elements, on the other hand, do not require external power and rely on their inherent material properties for wave manipulation. Both active and passive elements play crucial roles in achieving desired antenna performance, with active elements enabling dynamic control and passive elements providing inherent characteristics and stability [18]. Understanding these key concepts and technologies in antenna design is crucial for unlocking new frontiers in wireless communication systems. By harnessing the unique properties of metamaterials, metasurfaces, and the principles of traditional antennas, engineers and researchers can develop innovative designs with enhanced performance, improved bandwidth, and adaptability [19]. The combination of these elements paves the way for advanced antenna systems capable of meeting the growing demands of modern wireless communications. 
In the current work, regular-shaped cells are miniaturized while maintaining a wide operating range using the turned impedance resonator technique employed in SAR applications [20, 21]. To obtain a small antenna size in this design, two metasurface layers with two distinct distributions are used. In addition, artificial magnetic conductors (AMCs) are an example of an active component made of synthetic materials [22]. Reconfigurable antennas, which have also been realized with metasurfaces, are composed of two parts: fixed or pre-designed elements for a certain application and programmable elements, which change the form of the response based on the simulation technique [15, 23, 24]. Antennas with enhanced gain and bandwidth are relevant here because they improve the same characteristics as the proposed antenna [24]. The proposed antenna employs hollow cells placed in a metasurface layer to form a square array of [4 × 4] cells of uniform size [26, 27]. In conjunction with a slot antenna design, a gain of about 9 dB and a bandwidth from 2.14 to 2.7 GHz were obtained [28]. The shape of the patch was changed to increase gain and bandwidth, giving a gain of about 7.43 dB and a bandwidth of 2.2 GHz [29]. The second component uses two metasurface layers that have two different air-gap cavity sizes and a new rose-like shape. The antenna's peak gain was 9.8 dB with a 2.6 GHz bandwidth [29–36].
Fig. 36.1 Design of slot antenna: (a) basic shape and (b) updated shape
36.2 Antenna Design Using the data proposed in this study, the design proceeds in four stages to reach its final form, as described below.
36.2.1 The Slot Antenna The first part of the proposed antenna is a slot antenna on a TLF-35A substrate beneath the ground plane layer. A slit with length C = 18 mm, width B = 1 mm, and thickness 0.762 mm was etched. The dielectric layer has a loss tangent of 0.0022 and a permittivity of 3.5. The antenna's feed line is a 40.5 mm straight microstrip transmission line located beneath the ground plane layer. Figure 36.1a shows the basic design, while Fig. 36.1b shows the updated slot antenna, in which the ends of the slot in the ground plane are reshaped into butterfly wings. The other change modifies the geometry of the feed line from a straight line to one with a triangular head.
36.2.2 Extraction of Metasurface Parameters One of the most significant steps is the simulation of a single cell of the metasurface layer. The main goal of this step is to obtain the characteristics of the metasurface layer, such as negative permeability, negative permittivity, and negative refractive index, as quickly and simply as possible. A single cell of the proposed metasurface layer, placed on a larger substrate, serves this purpose. The next step is to attach two waveguide ports to this basic design, one at the top and the other at the bottom of the unit cell, in order to excite the cell correctly. As seen in Fig. 36.2, perfect magnetic and perfect electric boundary conditions are applied. In this study, the S-parameters serve as the basis for extracting the essential curves using the following equations [30–35]:
$$Z = \pm\sqrt{\frac{(1 + S_{11})^2 - S_{21}^2}{(1 - S_{11})^2 - S_{21}^2}}, \tag{36.1}$$
$$e^{jnk_0 d} = X \pm j\sqrt{1 - X^2}, \tag{36.2}$$
$$X = \frac{\left(S_{11}^2 + S_{21}^2\right)}{2S_{11}}, \tag{36.3}$$
where n is the refractive index of the metasurface unit cell, Z is the overall equivalent impedance, k0 is the wavenumber, and d is the thickness of a single metasurface cell. From the first and third equations, the equivalent impedance and the refractive index are obtained in complex form. In addition, the real part of the equivalent impedance and the imaginary part of the refractive index should be zero or higher, which is an acceptable requirement for the metasurface permeability and permittivity. The permeability and permittivity are linked to the equivalent impedance Z by the following equations [30, 31]:
$$\mu_r = nZ, \tag{36.4}$$
$$\varepsilon_r = \frac{n}{Z}. \tag{36.5}$$
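The impedance step of Eq. (36.1) and the constitutive relations of Eqs. (36.4) and (36.5) can be sketched as follows. This is an illustration under stated assumptions, not the authors' extraction script: the function names are hypothetical, the refractive index n is taken as an input rather than solved from Eqs. (36.2) and (36.3), and the S-parameter values in the usage note are made up.

```python
import cmath


def equivalent_impedance(s11, s21):
    """Eq. (36.1), keeping the root whose real part is non-negative.

    cmath.sqrt already returns the principal root; the check only makes
    the sign choice required for a passive medium explicit.
    """
    z = cmath.sqrt(((1 + s11) ** 2 - s21 ** 2) / ((1 - s11) ** 2 - s21 ** 2))
    return z if z.real >= 0 else -z


def constitutive_parameters(n, z):
    """Eqs. (36.4) and (36.5): relative permeability and permittivity."""
    mu_r = n * z
    eps_r = n / z
    return mu_r, eps_r
```

For example, `equivalent_impedance(0.2, 0.6)` evaluates the square root of 1.08 / 0.28. For any n and non-zero Z, the product of the returned permeability and permittivity equals n squared, which is a quick consistency check on the two relations.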
36.2.3 Metasurface Layer with Uniform Unit Cell Distribution The metasurface is then built in its conventional form, using the same number of cells as in previous tests. The dielectric is TLF-35A, with the same properties as in the slot antenna, and the array is designed and simulated with [4 × 4] cells of identical size. An air gap of 4.113 mm separates the first metasurface layer from the aperture antenna, and a 0.75 mm gap separates the first and second layers. Figure 36.3 illustrates the deeper metasurface layer.
Fig. 36.2 Two waveguide ports placed by metasurface unit cell
36.2.4 New Unit Cell-Shaped Metasurface Layer The fourth stage, one of the most crucial in the design, changes the unit cells from a conventional shape to a novel one that considerably enhances antenna performance. Transforming the patches into a flower-like design changes the current paths in the metasurface layer. These updates combine with the changes made to the slot antenna, in which the head of the feed line became a large triangle and the slit ends became butterfly wings. Figure 36.4 gives the complete design specifications for each of the metasurface layers and cells. All antenna dimensions are listed in Table 36.1.
36.3 Results and Discussion
The following sections present the most important findings from each stage of the lengthy simulation process. The traditional slot antenna gave unfavorable results over the wide frequency range of 4–7 GHz, with a gain of 4.34 dB, as shown in Fig. 36.5. The second step in building the final design modifies the slit in the ground plate, which changes the current distribution inside the slot and strengthens the antenna excitation in terms of both matching and bandwidth. Another updated feature that helps increase antenna
36 Wideband and High Gain Antenna of Flowers Patches for Wireless …
Fig. 36.3 a Design of a homogeneous metasurface layer, b side section showing the designed antenna layers, and c the rear section of the antenna is in the form of a single cell
performance is the triangular head added to the feed line. The updated antenna achieved a 4.22 dB gain and a 0.75 GHz bandwidth. According to Fig. 36.6, the final two steps first apply a metasurface layer of standard form and then a second layer with a different air cavity. This second layer is the cornerstone of the final design and introduces a new floral shape for the patches. By keeping the same number of cells in the two layers while using two different spacings, the suggested antenna obtained satisfactory excitation of the metasurface layer. Figure 36.7 illustrates the bandwidth attained, which ranges from 4.33 to 6.93 GHz with a peak gain of 9.8 dB. Figure 36.8 combines the three curves for the antenna stages to show the marked difference in performance. Figure 36.9 shows the highest gain for each stage: the conventional antenna, the conventional antenna with butterfly wings, and the conventional antenna with the new-shape metasurface layer reach 4.34 dB, 4.22 dB, and 9.8 dB, respectively.
Fig. 36.4 a Designing a novel single-unit cell form and b insertion of a new form into the metasurface layer
Table 36.1 Set of antenna parameters

Parameter    Dimension (mm)
D            8
B            1
C            18
S            40.5
X            65
Y            65
KF           0.2
CD           0.8
M = N        7
C            0.5
Z            14
To calculate the air gap between the layer and the conventional microstrip slot antenna, the Fabry–Pérot resonance condition shown in Eq. (36.6) below is used [33]:

$$f = \frac{c}{4\pi h_{\mathrm{air}}}\left(\phi_L + \phi_H - 2N\pi\right), \quad N = 0, 1, 2, \ldots, n, \tag{36.6}$$
where ϕ_L is the reflection phase of the ground plane, ϕ_H is the reflection phase of the partially reflective surface, h_air is the spacing between the partially reflective surface and the ground plane, and c is the speed of light. The primary resonance (N = 0) is often used to keep a low profile, provided that the ground plane is perfectly electric
Fig. 36.5 S(1,1) of traditional antenna
Fig. 36.6 S(1,1) of antenna with metasurface layer
Fig. 36.7 S(1,1) of updating traditional antenna
Fig. 36.8 Traditional and metasurface layer antenna results compared
Fig. 36.9 S(1,1),BW value for three antenna stages
Table 36.2 Characteristics of the traditional antenna before and after adding the metasurface layer

Name                        Updated traditional antenna    MS layer + traditional antenna
Center frequency (GHz)      6.7                            5.68
S(1,1),BW value (dB)        −29                            −29
Range of bandwidth (GHz)    6.25–7                         4.32–6.93
Bandwidth value             750 MHz                        2.6 GHz
Impedance bandwidth         –                              –
Peak gain (dB)              4.22                           9.8
Air cavity height           –                              4.113 mm, 0.75 mm
Table 36.3 Comparison of the suggested antenna with other references

Name                        The suggested antenna    Ref. [27]    Ref. [28]    Ref. [29]
Center frequency (GHz)      5.68                     2.45         2.3          6.75
S(1,1),BW value (dB)        −29                      −32          −38          −33.4
Range of bandwidth (GHz)    4.32–6.93                2.14–2.70    2–2.63       4.72–6.9
Bandwidth value             2.6 GHz                  560 MHz      630 MHz      2.18 GHz
Peak gain (dB)              9.8                      9            1            7.43
conducting (ϕ_L = π). When the reflection phase of the partially reflective surface satisfies ϕ_H ≈ π, the height of the air cavity is roughly half the wavelength at the operating frequency. Table 36.2 gives the impedance bandwidth values calculated using the equation below [33], as well as the results from Figs. 36.8 and 36.9.

$$S_{(1,1),\mathrm{BW}} = \frac{2\left(f_{p2} - f_{p1}\right)}{f_{p2} + f_{p1}}, \tag{36.7}$$
where S_(1,1),BW is the impedance bandwidth, and f_p2 and f_p1 are the upper and lower frequencies, respectively. Table 36.3 compares the suggested antenna with several other references.
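As a rough numerical illustration of Eqs. (36.6) and (36.7), the sketch below evaluates the Fabry–Pérot resonance frequency for a given cavity height and the fractional bandwidth for a pair of band-edge frequencies. The function names and the 25 mm cavity height are assumptions chosen for illustration only; the band edges are those listed in the "Range of bandwidth" row of Table 36.2, and the resulting percentages are derived here rather than reported by the authors.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)


def fabry_perot_frequency(h_air, phi_l, phi_h, n=0):
    """Resonance frequency in Hz from Eq. (36.6); h_air is in metres."""
    if h_air <= 0:
        raise ValueError("h_air must be positive")
    return C / (4.0 * math.pi * h_air) * (phi_l + phi_h - 2.0 * n * math.pi)


def fractional_bandwidth(f_low, f_high):
    """Fractional bandwidth from Eq. (36.7); both inputs share one unit."""
    if not 0 < f_low < f_high:
        raise ValueError("expected 0 < f_low < f_high")
    return 2.0 * (f_high - f_low) / (f_high + f_low)


# Assumed example: phi_L = phi_H = pi and N = 0, so the cavity is half a
# wavelength high at resonance. The 25 mm height is hypothetical.
f_res = fabry_perot_frequency(h_air=0.025, phi_l=math.pi, phi_h=math.pi)
print(f"resonance: {f_res / 1e9:.3f} GHz")

# Band edges in GHz from Table 36.2.
bw_updated = fractional_bandwidth(6.25, 7.0)
bw_metasurface = fractional_bandwidth(4.32, 6.93)
print(f"updated traditional antenna: {bw_updated:.1%}")
print(f"MS layer + traditional antenna: {bw_metasurface:.1%}")
```

With these inputs, the band edges listed for the metasurface antenna correspond to a fractional bandwidth of roughly 46%, against roughly 11% for the updated traditional antenna.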
36.4 Conclusion
The antenna was developed in several stages, starting from the conventional slot antenna, continuing through its updated version and the addition of one metasurface layer, and concluding with two metasurface layers whose patches are shaped like flowers. This resulted in an antenna with a high gain, a wide frequency range, and a small size.
Each layer contains a [4 × 4] array of cells. A thorough simulation in the CST software gave a peak gain of 9.8 dB and a bandwidth of 2.6 GHz, turning the antenna into a powerful filter for use in wireless applications. This is accomplished by varying the distances between the two layers.
References
1. Gatea, Q.M., Alyasiri, A.J., Ali, F.M., Al-Kafahji, N.: Design low profile and wideband antenna based on metasurface
2. Wang, Z., et al.: Metasurface on integrated photonic platform: from mode converters to machine learning. Nanophotonics (2022)
3. Gatea, Q.M., et al.: Gradient distribution of metasurface based antenna performance enhancement. AIP Conf. Proc. 2290(1) (2020). AIP Publishing LLC
4. Li, A., Singh, S., Sievenpiper, D.: Metasurfaces and their applications. Nanophotonics 7(6), 989–1011 (2018)
5. Park, I.: Application of metasurfaces in the design of performance-enhanced low-profile antennas. EPJ Appl. Metamater. 5, 11 (2018)
6. Wang, J., et al.: On the use of metasurface for vortex-induced vibration suppression or energy harvesting. Energy Convers. Manage. 235, 113991 (2021)
7. Fang, B., et al.: Broadband cross-circular polarization carpet cloaking based on a phase change material metasurface in the mid-infrared region. Front. Phys. 17(5), 1–9 (2022)
8. He, S., et al.: Recent advances in MEMS metasurfaces and their applications on tunable lens. Micromachines 10(8), 505 (2019)
9. Sheersha, J.A., Nasimuddin, N., Alphones, A.: A high gain wideband circularly polarized antenna with asymmetric metasurface. Int. J. RF Microw. Comput.-Aided Eng. 29(7), e21740 (2019)
10. Chen, D., et al.: Miniaturized wideband planar antenna using interembedded metasurface structure. IEEE Trans. Antennas Propag. 69(5), 3021–3026 (2020)
11. Wu, J., et al.: Liquid crystal programmable metasurface for terahertz beam steering. Appl. Phys. Lett. 116(13), 131104 (2020)
12. Hosseininejad, S.E., et al.: Digital metasurface based on graphene: an application to beam steering in terahertz plasmonic antennas. IEEE Trans. Nanotechnol. 18, 734–746 (2019)
13. He, Q., Sun, S., Zhou, L.: Tunable/reconfigurable metasurfaces: physics and applications. Research 2019 (2019)
14. Liu, G.Y., et al.: Frequency-domain and spatial-domain reconfigurable metasurface. ACS Appl. Mater. Interfaces 12(20), 23554–23564 (2020)
15. Tang, W., et al.: Wireless communications with programmable metasurface: new paradigms, opportunities, and challenges on transceiver design. IEEE Wireless Commun. 27(2), 180–187 (2020)
16. Ma, Q., et al.: Information metasurfaces and intelligent metasurfaces. Photonics Insights 1(1), R01 (2022)
17. Nie, N.-S., et al.: A low-profile wideband hybrid metasurface antenna array for 5G and WiFi systems. IEEE Trans. Antennas Propag. 68(2), 665–671 (2019)
18. Paiva, J.L.D.S., et al.: Using metasurface structures as signal polarisers in microstrip antennas. IET Microw. Antennas Propag. 13(1), 23–27 (2019)
19. Li, J., et al.: High-efficiency terahertz full-space metasurface for the transmission linear and reflection circular polarization wavefront manipulation. Phys. Lett. A 428, 127932 (2022)
20. Ta, S.X., et al.: Single-feed, compact, GPS patch antenna using metasurface. In: 2017 International Conference on Advanced Technologies for Communications (ATC), pp. 60–63. IEEE (2017)
21. Liu, W., Chen, Z.N., Qing, X.: Miniaturized broadband metasurface antenna using stepped impedance resonators. In: 2016 IEEE 5th Asia-Pacific Conference on Antennas and Propagation (APCAP), pp. 365–366. IEEE (2016)
22. Liu, W.E.I., et al.: Miniaturized wideband metasurface antennas. IEEE Trans. Antennas Propag. 65(12), 7345–7349 (2017)
23. Martinez, I., Werner, D.H.: Reconfigurable beam steering metasurface absorbers. In: 2014 IEEE Antennas and Propagation Society International Symposium (APSURSI), pp. 1674–1675. IEEE (2014)
24. Xie, P., et al.: Novel Fabry–Pérot cavity antenna with enhanced beam steering property using reconfigurable meta-surface. Appl. Phys. A 123(7), 1–6 (2017)
25. Kumar, P.P., et al.: Metasurface based low profile reconfigurable antenna. In: 2017 International Conference on Communication and Signal Processing (ICCSP), pp. 2081–2085. IEEE (2017)
26. Cui, T.J., et al.: Information metamaterial systems. iScience 23(8), 101403 (2020)
27. Chaimool, S., Rakluea, C., Akkaraekthalin, P.: Low-profile unidirectional microstrip-fed slot antenna using metasurface. In: 2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), pp. 1–5. IEEE (2011)
28. Munir, A., et al.: Metasurface-backed monopole printed antenna with enhanced bandwidth. In: 2017 International Symposium on Antennas and Propagation (ISAP), pp. 1–2. IEEE (2017)
29. Gatea, Q.M., et al.: Hash unit cell shape used to enhancement gain and bandwidth of metasurface antenna. J. Phys. Conf. Ser. 012022 (2020). IOP Publishing
30. Rajak, N., Chattoraj, N.: A bandwidth enhanced metasurface antenna for wireless applications. Microw. Opt. Technol. Lett. 59(10), 2575–2580 (2017)
31. Numan, A.B., Sharawi, M.S.: Extraction of material parameters for metamaterials using a full-wave simulator education column. IEEE Antennas Propag. Mag. 55(5), 202–211 (2013)
32. Tamim, A.M., et al.: Split ring resonator loaded horizontally inverse double L-shaped metamaterial for C-, X- and Ku-band microwave applications. Results Phys. 12, 2112–2122 (2019)
33. Wang, N., et al.: Wideband Fabry–Perot resonator antenna with two complementary FSS layers. IEEE Trans. Antennas Propag. 62(5), 2463–2471 (2014)
34. Mansour, R.F., Alsuhibany, S.A., Abdel-Khalek, S., Alharbi, R., Vaiyapuri, T., Obaid, A.J., Gupta, D.: Energy aware fault tolerant clustering with routing protocol for improved survivability in wireless sensor networks. Comput. Netw. 109049 (2022). ISSN 1389-1286. https://doi.org/10.1016/j.comnet.2022.109049
35. Obaid, A.J.: Wireless sensor network (WSN) routing optimization via the implementation of fuzzy ant colony (FACO) algorithm: towards enhanced energy conservation. In: Kumar, R., Mishra, B.K., Pattnaik, P.K. (eds.) Next Generation of Internet of Things. Lecture Notes in Networks and Systems, vol. 201. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0666-3_33
36. Alkhafaji, M.A., Uzun, Y.: Modeling, analysis and simulation of a high-efficiency battery control system. CMES-Comput. Model. Eng. Sci. 136(1), 709–732 (2023). https://doi.org/10.32604/cmes.2023.024236
Chapter 37
Certain Investigations on Solar Energy Conversion System in Smart Grid
C. Ebbie Selva Kumar, R. Brindha, and R. Eveline Pregitha
Abstract Today the world faces a large gap between the generation of and demand for electrical energy, and since the oil crises Renewable Energy power generation has helped to close that gap. Of the Renewable Energy resources available in nature, solar energy plays a vital role in electrical power generation. In India, solar parks with capacities of many MW have been installed with good efficiency, but these plants face power quality issues when connected to the load for technical reasons such as equipment failure, overheating of the electrical distribution system, software corruption, circuit board failure, and fluctuations in voltage and frequency. Power quality is an important characteristic of a Renewable Energy system because the loads connected today are nonlinear and more sensitive, and they bring oscillations in frequency and voltage as well as harmonics. The smart grid connection has the same problem because various sources, such as solar, wind, thermal, and nuclear, are connected to the grid. This article proposes a solution based on a soft computing-based MPPT control system and a filter control system, which also applies to three-phase systems. Soft computing provides quick and cost-effective solutions to various complex problems.
C. Ebbie Selva Kumar (✉) · R. Brindha
Department of Electrical and Electronics Engineering, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Tamil Nadu, India
e-mail: [email protected]

R. Eveline Pregitha
Department of Electronics and Communication Engineering, Noorul Islam Centre for Higher Education, Kumarakoil, Tamil Nadu 629180, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_37

37.1 Introduction
Nowadays the role of Renewable Energy in electrical energy generation is increasing in order to close the gap between electrical energy supply and demand. Renewable sources are an alternative to fossil fuels and are also called non-conventional energy sources. Solar energy is a naturally available and inexhaustible resource, and it is green energy because it does not emit greenhouse gases [1]. The solar photovoltaic
technology, which can be fabricated by various methods, converts sunlight to electrical energy through the photovoltaic effect. It generates DC electrical power, so large MW-scale solar power plants are connected to the smart grid through an inverting stage [2]. A grid-connected solar PV system converts the DC from the PV modules to AC and distributes it to consumers. Once the PV system is connected to the grid, it can send the energy that remains after satisfying consumer demand to the grid. In a smart grid connection, voltage, frequency, and harmonics fluctuate because of the constant changes in solar irradiation and in the other sources [3]. A grid-connected PV system contains solar modules, an MPPT stage with converter and inverter, a filter, and a distribution line [4]. In this system, power quality is an important indicator of the efficiency and operating condition of the grid-connected PV system. Power quality refers to the ability of the electrical system to work effectively and efficiently. Its key aspects are frequency, voltage, voltage flicker, and current harmonics. Power quality is affected by the various sources connected to the grid, such as solar, wind, hydro, thermal, and nuclear. The main concern is fluctuation in frequency and voltage, which arises from the inconsistent behaviour of Renewable Energy under habitually varying weather [5]. The Maximum Power Point Tracker (MPPT) and grid synchronization share the same control algorithm. In this article, soft computing is used to control the MPPT and the filter [6]. Integrating a solar energy conversion system into the distribution network raises many problems that disturb the operation of the solar PV system. At the point of interconnection, the connected nonlinear load and the switching of the power converter inject current harmonics into the system [7].
Distribution losses can be reduced by eliminating harmonics, balancing the grid currents, and operating at unity power factor. The harmonics can be eliminated with a filter control technique. Both the MPPT and the filter control are performed using a soft computing methodology. Soft computing is a group of computing techniques based on human decision-making and natural selection that provide quick and cost-effective solutions to various complex problems. The MPPT ensures that the PV panel operates at maximum power under all circumstances. Because of constant meteorological change, the output of the solar photovoltaic panel varies and does not reach maximum power at all times, so the MPPT bucks and boosts the power to deliver constant maximum power under all circumstances [8]. A group of PV panels is integrated with the grid. Of the various MPPT algorithms available for power control, this article adopts a method based on the P&O algorithm, which offers the best balance between accuracy and complexity [3].
37.2 Literature Review
The work titled Multi-Objective Solar Power Conversion System with MGI Control for Grid Integration at Adverse Operating Conditions, presented by Mukundan, used the incremental conductance (IC) method for MPPT operation and implemented a multiple generalized integral (MGI) for power quality improvement. The method was implemented on a laboratory prototype with a nonlinear and unbalanced load, and different tests were done under various irradiation, load distribution, and grid operating conditions. The dynamic performance was tested under mode-to-mode transitions. The method gave better power quality and maintained it through every stage of the transitions [6].

The paper titled Unbiased Circular Leakage Centred Adaptive Filtering Control for Power Quality Improvement of Wind–Solar PV Energy Conversion System, presented by Chishti, worked on the Unbiased Circular Leakage Centred (UCLC) adaptive filter control technique, which was implemented under varying wind speed and solar irradiation levels. It diminishes the grid current harmonics, improves the power quality at the AC mains, and performs filtering under both dynamic and steady-state conditions. The grid current successfully met the IEEE-519 standard [9].

The research titled Residual Incremental Conductance-Based Nonparametric MPPT Control for Solar Photovoltaic Energy Conversion System, carried out by Alsumiri, followed an improved incremental conductance algorithm based on the mathematical residue theorem. The key idea of this method is to ensure that the MPP is reached by using the residual values of the IC. It improves the operation of the MPPT and reduces fluctuation in the system. In this work, the classical incremental conductance condition has to be zero at the operating instant, which defines the reference voltage for the controller.
Holding the residue at zero forces the system to satisfy this condition and eliminates the error of the classical method. The authors compared the classical and residual MPPT performance and concluded that the residual method gives better energy conversion [10].

The article titled A New Multilevel Inverter Topology with Reduced Power Components for Domestic Solar PV Applications, presented by Ponnusamy and team, worked on a Dual Source Multilevel Inverter (DSMLI) with few power switches, aimed at solar power conversion systems. The method targets rooftop solar power conversion and works as a nine-level and thirteen-level inverter topology in asymmetric and symmetric operation, respectively. A DSMLI-based grid-connected PV system provides 92% efficiency at 1 kW, which is 3% higher than other schemes, and it reduces the cost from $170 to $125 by using a minimum number of components [11].

The work titled Robust EnKF with Improved RCGA-based Control for Solar Energy Conversion Systems, by Shah and Singh, enhanced the real coded genetic algorithm (RCGA), a metaheuristic method, which is
preferred for tuning the Proportional Integral (PI) control of the DC link voltage. A robust ensemble Kalman filter (EnKF) control is suggested for a multi-functional three-phase (3Φ) grid-connected solar photovoltaic energy conversion system. The system is intended to improve the power quality of the grid-connected distribution system. The suggested algorithm was compared with conventional methods such as LMF and FZA-NLMF and tested under different weak grid conditions, including unbalanced grid voltage, over voltage, and under voltage. The results obtained satisfy the IEEE-519, 1159, and 1564 standards [12].

The article presented by Kumar, titled Implementation of Multilayer Fifth-Order Generalized Integrator-Based Adaptive Control for Grid-Tied Solar PV Energy Conversion System, implemented a Global Maximum Power Point (GMPP) tracking system using the Human Psychology Optimization (HPO) process. Using MATLAB modelling and simulation, various conditions were considered, such as under voltage, over voltage, temperature variation, and harmonic distortion. The method is proposed for a partially shaded solar PV module integrated with a single-phase grid. A single-input tuned fuzzy PI controller makes the error reduction process flexible, which leads to fast settling of the actual signal to the reference signal [13].
37.3 Proposed Method
37.3.1 Working of MPPT
An MPPT with a charge controller is proposed to obtain maximum power from the solar PV panel under all circumstances. MPPT techniques fall into two categories: direct and indirect. The indirect techniques comprise the open-circuit voltage, short-circuit current, and fixed voltage methods. This type of tracking relies on a simple periodic estimate of the MPP; for example, the fixed voltage method adjusts the operating voltage only with seasonal meteorological changes, to a higher MPP voltage in winter and a lower MPP voltage in summer. Because irradiation and temperature change with the seasons, it does not give accurate performance at all times. The open-circuit voltage method is the most commonly used indirect MPPT technique. It takes the MPP voltage as a constant fraction k of the open-circuit voltage, where k normally lies between 0.7 and 0.8 for crystalline silicon; the method is very simple to implement compared with the others [14]. However, whenever the radiation changes, the system needs to determine a new open-circuit voltage V_out. For this measurement the load must be disconnected from the PV module each time, which causes power losses in the system, so the direct MPPT technique is preferred. The direct technique works faster than the indirect one and measures current, voltage, or power more accurately. Various direct MPPT methods are available,
Fig. 37.1 Grid-connected solar PV
and of these the perturb and observe (P&O) process is adopted, with some modification, in this article. The grid-connected solar PV system is shown in Fig. 37.1.
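The indirect open-circuit voltage method described in this subsection estimates the MPP voltage as a constant fraction k of the open-circuit voltage. The snippet below is a minimal sketch of that estimate; the function name, the default k, and the 40 V sample reading are assumptions made for illustration, and the range check simply mirrors the 0.7 to 0.8 range quoted in the text.

```python
def estimate_mpp_voltage(v_oc, k=0.76):
    """Estimate the MPP voltage as k * V_oc (fractional open-circuit method)."""
    if v_oc <= 0:
        raise ValueError("open-circuit voltage must be positive")
    # Range quoted in the text for crystalline silicon (assumed hard limit here).
    if not 0.7 <= k <= 0.8:
        raise ValueError("k is expected to lie between 0.7 and 0.8")
    return k * v_oc


# Hypothetical reading: a module whose measured open-circuit voltage is 40 V.
v_ref = estimate_mpp_voltage(v_oc=40.0, k=0.75)
print(f"voltage reference for the converter: {v_ref:.1f} V")
```

The estimate is only as current as the last open-circuit measurement, which is why the text notes that the load has to be disconnected each time the irradiation changes.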
37.3.2 P&O Algorithm
The P&O algorithm is used in MPP tracking: a small perturbation is introduced to change the power of the solar PV module. The method periodically measures the output power of the PV system and compares it with the previous output power. If the output power is higher than before, the perturbation continues in the same direction; otherwise the perturbation is reversed. The perturbation is applied to the module voltage, and the algorithm checks whether the power increases or decreases as the module voltage increases or decreases. When an increase in voltage increases the power, the operating point of the solar module lies to the left of the MPP, so the perturbation must continue toward the right to reach the MPP [15]. Conversely, when an increase in voltage decreases the power, the operating point lies to the right of the MPP, so the perturbation must move toward the left to reach the MPP. The flow chart of the P&O algorithm is shown in Fig. 37.2. Normally the MPPT is connected between the battery and the PV module and measures the voltage of each. It determines whether the battery is fully charged: if it is, charging stops to protect the battery from overcharging and damage; if it is not, the DC/DC converter is activated to start charging the battery. From the measured output current and voltage, the microcontroller calculates the present power P_new and compares it with the previously measured power P_old. If P_new > P_old, the PWM duty cycle is increased to extract maximum power from the PV
Fig. 37.2 Flow chart of P&O algorithm
system. If P_new < P_old, the PWM duty cycle is reduced so that the system returns to the earlier maximum power. The proposed MPPT method is simple, low cost, and easy to implement with good accuracy [15–17].
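The perturb-and-observe loop described in this subsection can be sketched as follows. This is an illustrative simulation, not the authors' controller: the perturbed variable is the module voltage rather than the PWM duty cycle, and the quadratic P-V curve, the step size, and all names are assumptions made so that the example is self-contained.

```python
def pv_power(v, v_mpp=30.0, p_max=200.0):
    """Hypothetical concave P-V curve with a single maximum at v_mpp."""
    return max(0.0, p_max - 0.5 * (v - v_mpp) ** 2)


def perturb_and_observe(power_fn, v_start, step=0.5, iterations=200):
    """Track the maximum power point by perturbing the operating voltage."""
    v = v_start
    p_old = power_fn(v)
    direction = 1.0  # +1 raises the voltage, -1 lowers it
    for _ in range(iterations):
        v += direction * step      # perturb
        p_new = power_fn(v)        # observe
        if p_new < p_old:
            # Power dropped, so the last move went away from the MPP: reverse.
            direction = -direction
        p_old = p_new
    return v, p_old


v_final, p_final = perturb_and_observe(pv_power, v_start=20.0)
print(f"operating point: {v_final:.1f} V, {p_final:.1f} W")
```

Once the peak is reached, the operating point keeps oscillating within one step of the MPP, which is the usual trade-off of P&O: a smaller step reduces the steady-state ripple but slows tracking after a change in irradiation.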
37.3.3 VSC Control
The P&O algorithm described above forces the solar PV module to work at the maximum power point under all circumstances. In the next stage, the DC power is converted into AC power by the Voltage Source Converter (VSC). The VSC is controlled using the ANOVA Kernel Kalman Filter (AKKF), which reduces harmonics when the solar PV system is combined with the grid connection. The main aims of this technique are power quality enhancement, power management, DC-to-AC power conversion, and synchronization with the grid. The AKKF belongs to the affine projection family and quickly identifies the underlying function of the input signal. By hybridizing the Kalman filter with the kernel trick, it reduces logic complexity, algorithm delay, and computational burden. This control approach is verified on a two-stage single-phase (1Φ) grid-tied PV system in which the load and the batteries are connected at the Point of Common Coupling (PCC) and the DC link,
respectively [18]. Connecting the battery directly to the DC link improves transient performance and fixes the DC link voltage level. The DC link controls the battery charging current and regulates the voltage during voltage fluctuations in the system, so no extra sensor is needed for a battery charge controller. The objective of this process is to extract the Fundamental Component (FC) from the grid voltage as well as from the load current. The AKKF filters out the harmonic components and the DC offset present in the system. It attenuates the higher- and lower-order harmonics present in the signal and maintains the 50 Hz frequency. The amplitude and the unit vector are calculated, and the AKKF filters the grid voltage to yield the in-phase and quadrature components of the FC. The Fundamental Component of the load current must also be estimated; the load may be linear or nonlinear, and under nonlinear conditions large harmonic components are present in the system. Hence the AKKF extracts the active component of the load power and the fundamental frequency. The active load power is estimated with the moving average filter (MAF) shown in Fig. 37.3c. Here E_f is used to amplify A_i, and the amplified signal is passed through an integrator that evaluates it over a certain duration. With N_s the delay duration and f_s the sampling frequency, N_s = f_s/E_f. To improve the transient performance of the control technique, a dynamic reflection parameter of the PV power is used, which immediately reflects changes in the solar PV power in the grid current. The VSC switching pulses (S_1, S_2, S_3, and S_4) are generated by a hysteresis controller whose inputs are i_g and i_gref, as shown in Fig. 37.3d.
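The moving average filter mentioned above, with window length N_s = f_s/E_f, can be sketched as below. This is an illustrative discrete-time version only: it assumes that E_f is the frequency of the ripple to be averaged out, and the 10 kHz sampling rate, the 100 Hz ripple, and the test signal are hypothetical values not taken from the chapter.

```python
import math
from collections import deque


def moving_average_filter(samples, f_s, e_f):
    """Average each sample with its predecessors over N_s = f_s / E_f points."""
    n_s = round(f_s / e_f)
    if n_s < 1:
        raise ValueError("f_s / e_f must give a window of at least one sample")
    window = deque(maxlen=n_s)
    total = 0.0
    output = []
    for x in samples:
        if len(window) == n_s:
            total -= window[0]  # the oldest sample is about to be evicted
        window.append(x)
        total += x
        output.append(total / len(window))
    return output


# Hypothetical signal: 2.0 units of average power plus a 100 Hz ripple.
f_s, e_f = 10_000.0, 100.0
signal = [2.0 + math.sin(2.0 * math.pi * e_f * k / f_s) for k in range(300)]
filtered = moving_average_filter(signal, f_s, e_f)
print(f"steady-state output: {filtered[-1]:.6f}")
```

Because the window spans exactly one ripple period, the ripple averages out and the output settles to the mean value after N_s samples, which is the delay the text associates with N_s.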
37.4 Performance of Power Quality Under Different Conditions
Under a nonlinear load, the PV power is greater than the load power, so after the load demand is satisfied the remaining power is transferred to the grid; as per the IEEE-519 standard, the current THD must be less than 5%. With this filtering method the current and voltage harmonics are reduced, giving a voltage THD of 1.3% and a current THD of 1.5% under the normal grid voltage condition shown in Fig. 37.4. In the over-voltage condition, Fig. 37.6 shows a voltage THD of 2.2% and a current THD of 2.6%. In the under-voltage condition, Fig. 37.7 shows a current THD of 2.3% and a voltage THD of 2.0%. The waveforms of the over-voltage and under-voltage conditions are shown in Fig. 37.5. In the solar irradiation experiment, two situations involving sudden changes in irradiation are considered: the insolation changes suddenly from 700 to 1000 W/m², and from 1000 to 700 W/m². In both situations the MPPT and AKKF controllers achieve a tracking efficiency of nearly 100%. The MPPT algorithm takes 0.8 s to track the new maximum power
Fig. 37.3 Grid-connected PV system control, a DC-DC converter, b VSC control, c MAF, d VSC switching pulses generation
point (MPP) when the solar insolation rises and 0.75 s to track the new MPP when the solar insolation falls.
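For reference, the THD figures quoted in this section follow the usual definition: the root-sum-square of the harmonic amplitudes divided by the fundamental amplitude. The sketch below computes it and compares the result with the 5% limit cited in the text; the function names and the harmonic amplitudes are hypothetical and are not measurements from the chapter.

```python
import math


def total_harmonic_distortion(fundamental, harmonics):
    """THD as a fraction: sqrt(sum of squared harmonics) / fundamental."""
    if fundamental <= 0:
        raise ValueError("fundamental amplitude must be positive")
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental


def meets_limit(thd, limit=0.05):
    """True when the THD is below the limit (5% as cited in the text)."""
    return thd < limit


# Hypothetical amplitudes: a 100 A fundamental with two harmonic components.
harmonic_amplitudes = [1.2, 0.9]
thd = total_harmonic_distortion(100.0, harmonic_amplitudes)
print(f"THD = {thd:.2%}, within limit: {meets_limit(thd)}")
```

In practice the harmonic amplitudes would come from a spectral analysis of the measured grid current or voltage over an integer number of fundamental cycles.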
Fig. 37.4 Waveform of normal condition a voltage and current waveform, b power provide to grid, c voltage harmonics, d current harmonics
Fig. 37.5 Waveform of a over voltage condition and b under voltage condition
Fig. 37.6 Waveform in over voltage a voltage and current waveform, b current harmonic, c voltage harmonics
Fig. 37.7 Waveform in under voltage a voltage and current waveform, b current harmonic, c voltage harmonics
37.5 Conclusion
The power quality of a solar energy conversion system was investigated with different control topologies for MPPT control and filter control, with the aim of improving the power quality of the entire system under all circumstances. The P&O control algorithm is used in the MPPT for better tracking and to obtain maximum power under various meteorological changes. Compared with other algorithms, the P&O algorithm is simple and low cost to implement. The MPPT tracker takes 0.8 s under rising solar irradiation and 0.72 s under falling solar irradiation. Total harmonic distortion is an important consideration when solar PV is integrated with the grid; with the AKKF filtering technique, the voltage and current THD are kept under 5% as per IEEE-519 norms, and the tracking efficiency of the system is maintained at nearly 100%. This technique provides fast, oscillation-free performance under all circumstances, which improves the efficiency of the system.
Acknowledgements With a deep sense of gratitude, the author thanks the supervisor for his guidance and constant support during this research.
Research Funding No financial support.
Competing Interests This paper has no conflict of interest for publishing.
References
1. Kumar, N., Singh, B., Panigrahi, B.K.: ANOVA kernel Kalman filter for multi-objective grid integrated solar photovoltaic-distribution static compensator. IEEE Trans. Circuits Syst. I Regul. Pap. 66(11), 4256–4264 (2019)
2. Myneni, H., Ganjikunta, S.K.: Energy management and control of single-stage grid-connected solar PV and BES system. IEEE Trans. Sustain. Energy 11(3), 1739–1749 (2019)
3. Hu, J., Joebges, P., Pasupuleti, G.C., Averous, N.R., De Doncker, R.W.: A maximum-output-power-point-tracking-controlled dual-active bridge converter for photovoltaic energy integration into MVDC grids. IEEE Trans. Energy Convers. 34(1), 170–180 (2018)
4. Biswas, S., Huang, L., Vaidya, V., Ravichandran, K., Mohan, N., Dhople, S.V.: Universal current-mode control schemes to charge Li-ion batteries under DC/PV source. IEEE Trans. Circuits Syst. I Regul. Pap. 63(9), 1531–1542 (2016)
5. Bajaj, M., Singh, A.K.: Grid integrated renewable DG systems: a review of power quality challenges and state-of-the-art mitigation techniques. Int. J. Energy Res. 44(1), 26–69 (2020)
6. Mukundan, N., Singh, Y., Naqvi, S.B.Q., Singh, B., Pychadathil, J.: Multi-objective solar power conversion system with MGI control for grid integration at adverse operating conditions. IEEE Trans. Sustain. Energy 11(4), 2901–2910 (2020)
7. Patowary, M., Panda, G., Naidu, B.R., Deka, B.C.: ANN-based adaptive current controller for on-grid DG system to meet frequency deviation and transient load challenges with hardware implementation. IET Renew. Power Gener. 12(1), 61–71 (2018)
8. Yap, K.Y., Sarimuthu, C.R., Lim, J.M.Y.: Artificial intelligence based MPPT techniques for solar power system: a review. J. Mod. Power Syst. Clean Energy 8(6), 1043–1059 (2020)
9. Chishti, F., Murshid, S., Singh, B.: Unbiased circular leakage centered adaptive filtering control for power quality improvement of wind-solar PV energy conversion system. IEEE Trans. Sustain. Energy 11(3), 1347–1357 (2019)
10. Alsumiri, M.: Residual incremental conductance based nonparametric MPPT control for solar photovoltaic energy conversion system. IEEE Access 7, 87901–87906 (2019)
11. Ponnusamy, P., Sivaraman, P., Almakhles, D.J., Padmanaban, S., Leonowicz, Z., Alagu, M., Ali, J.S.M.: A new multilevel inverter topology with reduced power components for domestic solar PV applications. IEEE Access 8, 187483–187497 (2020)
12. Shah, P., Singh, B.: Robust EnKF with improved RCGA-based control for solar energy conversion systems. IEEE Trans. Ind. Electron. 66(10), 7728–7740 (2018)
13. Kumar, N., Hussain, I., Singh, B., Panigrahi, B.K.: Implementation of multilayer fifth-order generalized integrator-based adaptive control for grid-tied solar PV energy conversion system. IEEE Trans. Ind. Inf. 14(7), 2857–2868 (2017)
14. Salman, S., Xin, A.I., Zhouyang, W.U.: Design of a P-&-O algorithm based MPPT charge controller for a stand-alone 200 W PV system. Prot. Control Mod. Power Syst. 3(1), 1–8 (2018)
15. Dabra, V., Paliwal, K.K., Sharma, P., Kumar, N.: Optimization of photovoltaic power system: a comparative study. Prot. Control Mod. Power Syst. 2(1), 1–11 (2017)
16. Kumari, J.S., Babu, D.C.S., Babu, A.K.: Design and analysis of P&O and IP&O MPPT techniques for photovoltaic system. Int. J. Mod. Eng. Res. 2(4), 2174–2180 (2012)
17. Sera, D., Teodorescu, R., Hantschel, J., Knoll, M.: Optimized maximum power point tracker for fast changing environmental conditions. In: 2008 IEEE International Symposium on Industrial Electronics, June 2008, pp. 2401–2407. IEEE (2008) 18. Shukl, P., Singh, B.: Delta-bar-delta neural-network-based control approach for power quality improvement of solar-PV-interfaced distribution system. IEEE Trans. Ind. Inf. 16(2), 790–801 (2019)
Chapter 38
Stationary Wavelet-Oriented Luminance Enhancement Approach for Brain Tumor Detection with Multi-modality Images A. Ahilan, M. Anlin Sahaya Tinu, A. Jasmine Gnana Malar, and B. Muthu Kumar
Abstract A brain tumor is an abnormal growth of cells in the brain. Manually detecting brain tumors is difficult and time-consuming, because erratically shaped tumors are hard to find with only one imaging modality. In this work, a novel stationary wavelet-oriented luminance enhancement (SOLE) approach is proposed to denoise multi-modal images. Initially, medical images such as MRI, CT, and PET are gathered from publicly available datasets. These multi-modality images are divided into low- and high-frequency sub-images using the stationary wavelet transform (SWT), which preserves temporal features and thereby limits information loss. The low-frequency and high-frequency sub-images are then processed by the distribution and denoising modules, respectively, to remove the noise. The approximation coefficient is pre-processed using multi-scale retinex with gamma correction to retrieve the noise-free image effectively. The remaining coefficients are pre-processed using a multi-scale Gaussian bilateral filter, and the tracking wavelet denoising (TWD) algorithm in the denoising module dynamically enhances the color detail information without human intervention, so that image contrast and visibility are well preserved. Lastly, the noise-free image is reconstructed from the enhanced sub-images using Inverse-SWT to detect brain tumors. Experimental results show that the proposed algorithm has a mean error rate of 0.03, which is low compared to the other filters. The proposed SOLE technique achieves a running time of 0.97 s, whereas existing techniques such as K-SVD, DRAN, 2-stage CNN, and AMF-AWF require 6.94, 1.63, 1.52, and 8.75 s, respectively.

A. Ahilan (B)
Department of Electronics and Communication Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India

A. Jasmine Gnana Malar
Department of Electrical and Electronics Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India

M. Anlin Sahaya Tinu
Department of Electronics and Communication Engineering, Anna University, Chennai, Tamil Nadu 600025, India

B. Muthu Kumar
Department of Computer Science and Engineering, School of Computing and Information Technology, REVA University, Kattigenahalli, Karnataka 560064, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_38
38.1 Introduction

Brain cancer is an anomalous development of cells in the brain. Gliomas are the most common type of brain cancer. There are two distinct forms of gliomas: low-grade gliomas (LGG) and high-grade gliomas (HGG) [1]. Whereas LGG tumors are less invasive, HGG tumors are aggressive, grow quickly, and rapidly encroach on nearby tissues. For both men and women, this cancer is the tenth most common cause of death [2]. According to current figures, primary malignant brain tumors are predicted to cause the deaths of about 18,600 persons. Significant exposure to X-ray radiation can induce brain cancers, and some brain tumors are inherited [3]. MRI can reveal information that CT, ultrasound, or X-ray cannot. The test identifies tumors, assaults, and epilepsy in addition to tissue disorders such as inflammation or infection [4]. Brain tumor symptoms can include headache, nausea and vomiting, confusion, and unusual tiredness as the tumor grows and compresses nearby nerves or blood vessels. If a brain tumor is detected and treated at an early stage, its complications can be kept to a minimum [5].

Brain tumors must be accurately segmented in order to plan treatment and involve medical specialists. Even for an experienced professional, manually segmenting tumors takes a lot of time. Because brain tumor segmentation is essential for diagnosing cancer and planning a treatment strategy, automatic tumor segmentation is required. The size, structure, and location of tumors vary widely, which makes segmentation a more difficult problem [6, 7]. MRI, PET, and CT images show different intensity levels and can therefore provide complementary information when evaluating gliomas [8]. Machine learning algorithms have enabled the early diagnosis of brain cancers. The primary advantage of deep learning is its ability to identify the most prognostic features directly from the raw data in a provided set of labeled images [9]. That is a significant benefit of deep learning compared to shallow learning techniques. It takes a lot of time and effort to identify, separate, and categorize malignant regions in images of brain tumors [10, 11]. Principles of image processing aid in the visualization of the numerous human anatomical structures. It is quite difficult to gain insight into the aberrant structures of the human brain using simple imaging techniques [12]. The multi-modal imaging approach differentiates and clarifies the brain's neuronal architecture.

The key contributions of the proposed algorithm are summarized as follows:

• The objective of this research is to present a novel SOLE pre-processing algorithm to denoise the medical images for brain tumor detection.
• Initially, the gathered medical images (CT, MRI, and PET) are divided into low- and high-frequency sub-images using SWT.
• In the denoising module, the split images are pre-processed using different algorithms to dynamically enhance the color detail information without human intervention, so that image contrast and visibility are well preserved.
• Consequently, the noise-free image is reconstructed from the enhanced sub-images using Inverse-SWT.
• The efficacy of the proposed SOLE algorithm is assessed using specific metrics such as peak signal-to-noise ratio (PSNR) and mean square error (MSE).

The remainder of this study is organized as follows: Sect. 38.2 outlines the related works, Sect. 38.3 presents the proposed SOLE pre-processing algorithm, Sect. 38.4 covers the results and discussion, and finally, Sect. 38.5 closes with the conclusion and future work.
38.2 Literature Survey

In recent years, researchers have introduced several tools and techniques to detect brain tumors efficiently. Some of those pre-processing techniques are reviewed briefly in this section.

In 2021, Rai et al. [13] introduced an unsupervised method for denoising medical images that builds denoised images by inferring noise characteristics from the given images. The system has two data processing modules: residual learning (RL), which learns the noise directly, and patch-based dictionaries, which learn the noise indirectly. The sparse-representation K-singular value decomposition (K-SVD) technique was used to train the patch-based dictionaries.

In 2020, Sharif et al. [14] designed a deep dynamic residual attention network (DRAN) to reduce the noise in medical images. The deep network used in this method leverages feature correlation, known as the attention mechanism, and combines it with spatially refined residual features.

In 2020, Sang et al. [15] developed a multi-scale pooling DCNN for precise image recognition. The propagation of the layers in the DCNN allows for improved image efficiency, and the use of block information results in better image recognition. The primary application of this technology has been in medical image processing.

In 2019, Debnath and Talukdar [16] designed a method that combines memory-based learning on a specific database with twofold histogram stretching for faster and more accurate identification of tumor components in a 2D sliced image. By obviating the need for the traditional iterative computing technique, the method significantly decreases computational time.

In 2019, Chang et al. [17] devised a two-stage CNN model in which the first stage improves image separation, while the noise subnetwork acts as a noise estimator that provides the image subnetwork with adequate knowledge about the noise, so that various noise levels and distributions can be handled. The network was also designed with both short-term and long-term links to propagate information effectively between layers.
In 2019, Dey et al. [18] proposed a two-stage image evaluation tool to assess brain MR images obtained with the Flair and DW modalities. A multi-threshold approach backed by Shannon Entropy (SE) and Social-Group Optimization (SGO) was applied to pre-process the input images. The image pre-processing uses a variety of techniques to extract the tumor portion, including active contour (AC), watershed, and region-growing segmentation [19]. ANFIS was then used to classify benign and malignant tumors.

In 2018, Ali [19] put forward an approach with fundamental filters to eliminate the additive noise present in MRI. Salt-and-pepper noise and Gaussian noise were added to the MRI image, and the median filter algorithm was modified. The median filter (MF), adaptive Wiener filter (AWF), and adaptive median filter (AMF) were implemented for noise reduction, and the additive noise in the MRI images was eliminated using these filters.

This literature review shows that various pre-processing approaches have focused on denoising multi-modality brain images. The existing approaches frequently fail to achieve high quality because of suboptimal learning techniques or complex geometric frameworks, and they may be ineffective. To address these limitations, the SOLE approach is proposed to denoise the medical images for the early detection of brain tumors.
38.3 Proposed Method

In this section, a novel SOLE pre-processing approach is proposed to denoise the medical images for detecting brain tumors. The gathered medical images are divided into low- and high-frequency sub-images using SWT and pre-processed using different algorithms to dynamically enhance the detailed information. The noise-free images are then reconstructed from the enhanced sub-images using Inverse-SWT. The overall workflow of the proposed model is represented in Fig. 38.1.
38.3.1 2D-Stationary Wavelet Transform

The SWT addresses the lack of translation invariance in the discrete wavelet transform (DWT). At each level, the high-pass and low-pass filters produce sequences that are equal in length. Due to its time-invariant properties, the SWT preserves, at each level of decomposition, the same temporal features as the original signal. In SWT, instead of decimating the signal, zeros are inserted between the filter taps at each level, which eliminates repetitions and increases robustness. The input medical image I is indexed in 2D as I[x, y], which denotes the pixel in the xth column and yth row. Figure 38.2 shows the first-order 2DSWT decomposition using Haar partitioning.
Fig. 38.1 Overall workflow of the proposed SOLE approach
Fig. 38.2 Portrayal of first-order 2DSWT decomposition of sample MRI image
A first-level 2D-SWT is applied to the image to obtain the approximation coefficient (LL), horizontal coefficient (HL), vertical coefficient (LH), and diagonal coefficient (HH). The wavelet sub-bands extracted from the medical image using 2DSWT correspond to the individual sub-band coefficients of the wavelet transform. The detailed and approximation coefficients of the 2DSWT are represented as:
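$$\tilde{C}_{i+1,j,n} = \sum_{u=-\infty}^{\infty} \sum_{v=-\infty}^{\infty} h(u)\,h(v)\,\tilde{C}_{i,\,j+2^{i}u,\,n+2^{i}v}, \tag{38.1}$$

$$\tilde{d}_{1,i+1,j,n} = \sum_{u=-\infty}^{\infty} \sum_{v=-\infty}^{\infty} g(u)\,h(v)\,\tilde{C}_{i,\,j+2^{i}u,\,n+2^{i}v}, \tag{38.2}$$

$$\tilde{d}_{2,i+1,j,n} = \sum_{u=-\infty}^{\infty} \sum_{v=-\infty}^{\infty} h(u)\,g(v)\,\tilde{C}_{i,\,j+2^{i}u,\,n+2^{i}v}, \tag{38.3}$$

$$\tilde{d}_{3,i+1,j,n} = \sum_{u=-\infty}^{\infty} \sum_{v=-\infty}^{\infty} g(u)\,g(v)\,\tilde{C}_{i,\,j+2^{i}u,\,n+2^{i}v}, \tag{38.4}$$

where $\tilde{C}_{i,j,n}$ and $\tilde{d}_{k,i,j,n}$ ($k = 1, 2, 3$) represent the approximation and detailed coefficients at level $i$, respectively, and $h$ and $g$ denote the low-pass and high-pass filters. Because no decimation is performed, each of the four sub-bands obtained after 2DSWT decomposition has exactly the same size as the source input image. By reversing the procedure, the 2D Inverse-SWT (2DISWT) recovers the image from the 2DSWT sub-bands.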
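As an illustration of the decomposition and its inverse, the following is a minimal NumPy sketch of a single-level undecimated 2D Haar transform. It is not the authors' implementation: the function names, the periodic border handling, and the 1/2 filter normalization are assumptions chosen so that the reconstruction is exact and easy to verify.

```python
import numpy as np


def haar_swt2_level1(image):
    """One level of an undecimated 2D Haar transform (periodic borders).

    Returns the approximation band and three detail bands, each with the
    same shape as the input. The 1/2 normalization is an assumption made
    for this sketch.
    """
    x = np.asarray(image, dtype=float)

    def low(a, axis):
        # Low-pass Haar filter: mean of each sample and its next neighbour.
        return (a + np.roll(a, -1, axis=axis)) / 2.0

    def high(a, axis):
        # High-pass Haar filter: half the difference to the next neighbour.
        return (a - np.roll(a, -1, axis=axis)) / 2.0

    lo_rows = low(x, axis=1)    # low-pass along each row
    hi_rows = high(x, axis=1)   # high-pass along each row
    approx = low(lo_rows, axis=0)    # low/low
    d1 = high(lo_rows, axis=0)       # low along rows, high along columns
    d2 = low(hi_rows, axis=0)        # high along rows, low along columns
    d3 = high(hi_rows, axis=0)       # high/high (diagonal detail)
    return approx, d1, d2, d3


def haar_iswt2_level1(approx, d1, d2, d3):
    """Invert haar_swt2_level1; low(a) + high(a) == a at every step."""
    lo_rows = approx + d1
    hi_rows = d2 + d3
    return lo_rows + hi_rows
```

Because nothing is downsampled, all four outputs keep the input shape, and summing the bands in the reverse order of filtering restores the input. A production pipeline would more likely rely on a wavelet library than on this hand-written filter pair.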
38.3.2 Multi-scale Retinex

The multi-scale retinex technique is a form of image luminance adjustment that takes into account both dynamic range compression and color stability. This process is carried out for colored images and is given by:

$$R_{\mathrm{msr},j}(x, y) = \sum_{i=1}^{n} w_i\, R_{i,j}(x, y), \tag{38.5}$$

where $R_{\mathrm{msr},j}$ is the multi-scale retinex output of the jth channel; n is the number of scales; j denotes the jth channel; i represents the ith scale; $R_{i,j}(x, y)$ is the image luminance compensation result of the jth channel at the ith scale; and $w_i$ is the luminance compensation weight at the ith scale, which is usually taken as $w_i = 1/n$. Retinex with color reconstruction increases the luminance obtained by multi-scale retinex. During this image enhancement process, the edges of the images are made clearer.
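The weighted sum in Eq. (38.5) can be sketched as follows. The chapter does not define the single-scale term, so this sketch assumes the common retinex form, the log of the channel minus the log of its Gaussian-blurred version; the helper `gaussian_blur`, the scale values, and the offset `eps` are likewise illustrative assumptions.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (helper for this sketch)."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def multi_scale_retinex(channel, sigmas=(2.0, 4.0, 8.0), eps=1.0):
    """Equal-weight multi-scale retinex of one channel (w_i = 1/n)."""
    x = np.asarray(channel, dtype=float) + eps   # offset avoids log(0)
    weights = np.full(len(sigmas), 1.0 / len(sigmas))
    out = np.zeros_like(x)
    for w, sigma in zip(weights, sigmas):
        # Assumed single-scale retinex term R_i = log(I) - log(G_i * I).
        out += w * (np.log(x) - np.log(gaussian_blur(x, sigma)))
    return out
```

For a color image the function would be called once per channel. The output is in the log domain and generally needs rescaling before display.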
38.3.3 Gamma Correction

Algorithm 1: Pseudocode of the proposed SOLE algorithm
Input: Raw multi-modality medical images I_mri[x, y], I_pet[x, y] and I_ct[x, y]
Output: Pre-processed medical images
  Use SWT to decompose I_mri[x, y], I_pet[x, y] and I_ct[x, y] into approximate (C_{i,j}) and detailed (d_{i,j}) components
  Compute the brightness B(u, v) of I_mri[u, v], I_pet[u, v] and I_ct[u, v]
  Compute the reflection R(u, v) of I_mri[u, v], I_pet[u, v] and I_ct[u, v]
  if Gamma correction is satisfied then
    $I_l' = I_m \left( I / I_m \right)^{\gamma}$
    update R(u, v)
    Use I_l'(u, v) and I_r'(u, v) with fusion to get the noise-free approximated image
  end if
  Use the TWD algorithm to update the d_{i,j} components
  Use ISWT to concatenate the decomposed components to get the pre-processed images
  return pre-processed multi-modality images
The gamma correction is applied after the weighted distribution is mapped. The normalized cumulative density function (cdf) is used to apply gamma correction in this approach, and it is carried out as follows:

$$T(l) = l_{\max}\,(l/l_{\max})^{\gamma} = l_{\max}\,(l/l_{\max})^{1-\mathrm{cdf}_w(l)}, \tag{38.6}$$

where the gamma parameter is calculated as:

$$\gamma = 1 - \mathrm{cdf}_w(l), \tag{38.7}$$

where $\mathrm{cdf}_w$ is the weighted distribution function, $l_{\max}$ is the maximum intensity, and $l$ is the pixel intensity. The limited distribution function lowers the contrast of bright pixels while increasing the contrast of low-contrast pixels.
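A minimal sketch of this intensity mapping is given below. It assumes an integer-valued image and uses the plain normalized histogram cdf in place of the weighted cdf, since the weighting scheme is not specified in the text; the function name and the `levels` parameter are illustrative.

```python
import numpy as np


def adaptive_gamma(image, levels=256):
    """Map each intensity l to l_max * (l / l_max) ** (1 - cdf(l)).

    `image` must hold non-negative integers below `levels`. The plain
    normalized cdf is used here as a stand-in for the weighted cdf.
    """
    img = np.asarray(image)
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    cdf = np.cumsum(hist) / hist.sum()      # normalized cdf, ends at 1.0
    l = np.arange(levels, dtype=float)
    l_max = float(levels - 1)
    gamma = 1.0 - cdf                        # per-intensity gamma
    lut = l_max * (l / l_max) ** gamma       # lookup table T(l)
    return lut[img]                          # apply the table pixel-wise
```

Because gamma stays between 0 and 1 and l / l_max never exceeds 1, the mapping can only brighten a pixel or leave it unchanged, and the brightest level maps to itself.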
38.3.4 Multi-scale Gaussian Bilateral Filter

The high-frequency characteristics, such as boundary features and noise, are contained in the reflection component. Therefore, a multi-scale Gaussian bilateral filter is used to enrich the features and denoise the medical images to obtain the corrected reflection image ($I_r'$).

$$A_1 = G_1 * P, \quad A_2 = G_2 * P, \quad A_3 = G_3 * P, \tag{38.8}$$

$$P_{\mathrm{enhanced}} = h_1 (P - A_1) + h_2 (P - A_2) + h_3 (P - A_3) + P. \tag{38.9}$$

Here $P$ stands for the recovered dehazed image and $P_{\mathrm{enhanced}}$ for the enhanced image; $A_1$, $A_2$, and $A_3$ are the Gaussian-smoothed images; $h_1$, $h_2$, and $h_3$ are the weight parameters; $G_1$, $G_2$, and $G_3$ are the Gaussian kernels; and $*$ denotes convolution. The bilateral filter is given by:

$$\mathrm{BF}[I]_U = \frac{1}{N_U} \sum_{V \in \Omega} G_{\sigma_t}\!\left(\lVert U - V \rVert\right) G_{\sigma_s}\!\left(\lVert I_U - I_V \rVert\right) I_V, \tag{38.10}$$

where $U$ represents the center pixel, $V$ is a pixel in the neighborhood $\Omega$ of $U$, $N_U$ is the normalization factor, $G_{\sigma_t}$ is the Gaussian function of the spatial distance between the pixels, and $G_{\sigma_s}$ is the Gaussian function of the difference between the pixel values. Finally, the improved luminance component is merged with the feature-rich reflection component to produce a potential haze-free approximation image.
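The two ingredients of this module, multi-scale detail boosting and the bilateral filter, can be sketched as follows. The sketch assumes that each detail layer is the image minus its Gaussian-smoothed version; the function names, the window radius, and all sigma and weight values are illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (helper for this sketch)."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def detail_boost(p, sigmas=(1.0, 2.0, 4.0), weights=(0.5, 0.5, 0.25)):
    """Add weighted detail layers (image minus blurred image) back to the image."""
    p = np.asarray(p, dtype=float)
    out = p.copy()
    for h, sigma in zip(weights, sigmas):
        out += h * (p - gaussian_blur(p, sigma))
    return out


def bilateral_filter(img, radius=2, sigma_t=2.0, sigma_s=0.1):
    """Brute-force bilateral filter over a (2*radius+1)^2 window.

    sigma_t controls the spatial Gaussian and sigma_s the intensity
    Gaussian, following the symbol names used in the text.
    """
    img = np.asarray(img, dtype=float)
    height, width = img.shape
    padded = np.pad(img, radius, mode="reflect")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour image: every pixel shifted by (dy, dx).
            shifted = padded[radius + dy:radius + dy + height,
                             radius + dx:radius + dx + width]
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_t ** 2))
            intensity = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_s ** 2))
            weight = spatial * intensity
            num += weight * shifted
            den += weight
    return num / den   # den >= 1 because the centre pixel has weight 1
```

The bilateral output is a weighted average of neighbouring pixels, so it always stays within the intensity range of the input, whereas the detail boost can overshoot that range and may need clipping.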
38.3.5 Tracking Algorithm

The high-frequency input images undergo a series of pre-processing procedures to turn them into a format appropriate for further analysis. In this phase, the undesired data contained in the high-frequency images, as well as the genuine noise in MRI, CT, and PET, are removed sequentially. Artifacts such as date, name, age, gender, and other textual data are eliminated using a tracking algorithm. After the image is free of all textual content, imaging techniques are used to remove noise. The wavelet denoising algorithm is used to eliminate noise from the SWT detailed coefficients, after which the texture and edge features are boosted using the TWD algorithm. The universal threshold is employed in wavelet denoising, which preserves the coefficients whose magnitude is above the threshold and sets to zero those below it:

$$\mathrm{TWD}_v = \sigma \sqrt{2 \log N}, \tag{38.11}$$

where $\sigma$ represents the noise standard deviation and $N$ is the pixel count of the test image. Finally, the decomposed coefficients are integrated using ISWT to form the denoised multi-modality images.
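The universal threshold and the thresholding step can be sketched as below. The text does not say how the noise level is obtained, so the median-absolute-deviation estimate used when `sigma` is omitted is an assumption, as are the function names; hard thresholding is shown, although soft thresholding is an equally common choice.

```python
import numpy as np


def universal_threshold(coeffs, sigma=None):
    """Return sigma * sqrt(2 * log(N)) for a band of detail coefficients.

    If sigma is not given it is estimated from the coefficients with the
    median absolute deviation (an assumption made for this sketch).
    """
    c = np.asarray(coeffs, dtype=float)
    if sigma is None:
        sigma = np.median(np.abs(c)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(c.size))


def hard_threshold(coeffs, threshold):
    """Keep coefficients with magnitude above the threshold, zero the rest."""
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) > threshold, c, 0.0)
```

In a pipeline of this kind the threshold would be computed per detail sub-band and applied before the inverse transform.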
38.4 Results and Discussion

The experiments in this study were carried out using MATLAB 2019b with the deep learning toolbox. In this analysis, multi-modality images such as MRI, CT, and PET from a publicly available Kaggle dataset were used for pre-processing. The competence of the proposed SOLE algorithm in detecting brain tumors is assessed using specific parameters. Furthermore, a comparison of the proposed SOLE with classic pre-processing models is also provided. Figure 38.3 depicts the results of the proposed SOLE for three sample sets after pre-processing the multi-modality medical images. The medical images from the SWT are further processed using different techniques to eliminate the unwanted distortions in the images. The process is carried out in several steps by separating the images into low- and high-luminance images. The output images of the SOLE algorithm show a visible reduction of unwanted noise distortions compared with the images from the original database. The pre-processed output gives a clear view of the filtered multi-modality images.

Fig. 38.3 Results of SOLE pre-processing algorithm
38.4.1 Performance Analysis

The performance of the proposed denoising technique is evaluated through standard processing metrics such as mean square error (MSE) and peak signal-to-noise ratio (PSNR). The metrics are defined as:

$$\mathrm{mse} = \frac{1}{N} \sum_{s=0}^{N-1} \left( p(s) - \hat{p}(s) \right)^{2}, \tag{38.12}$$

$$\mathrm{psnr} = 10 \log_{10} \left[ \frac{\max^{2}}{\mathrm{mse}} \right], \tag{38.13}$$

where N is the total number of images, p(s) denotes the acquired value, and $\hat{p}(s)$ represents the predicted value. A high PSNR value indicates effective denoising, and a low MSE value indicates that the edge information of the multi-modality images is protected. Table 38.1 displays the performance of the proposed algorithm, which has an MSE value of 0.052 for a 1% noise ratio and 0.058 for a 3% noise ratio. The error rate of the proposed model with the SWT is low. From this examination, we conclude that the proposed SOLE approach gives a low error rate. The results in Table 38.1 show that the proposed SOLE algorithm has a lowest MSE value of about 0.05, which means that the proposed model attains a low error rate.

Table 38.1 Performance evaluation of proposed approach

Noise ratio   Parameters   Proposed
1%            PSNR         68.04
              MSE          0.052
3%            PSNR         69.35
              MSE          0.058
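The two metrics in Eqs. (38.12) and (38.13) can be computed as in the following sketch. It evaluates them over the pixels of a single pair of images and assumes 8-bit data, so the peak value of 255 is an assumption exposed through the hypothetical `max_value` parameter.

```python
import numpy as np


def mse(reference, estimate):
    """Mean squared error between two arrays of equal shape."""
    r = np.asarray(reference, dtype=float)
    e = np.asarray(estimate, dtype=float)
    return float(np.mean((r - e) ** 2))


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(reference, estimate)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))
```

Averaging these per-image values over a test set gives dataset-level figures of the kind reported in the tables.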
38.4.2 Comparative Analysis

This section compares the performance of existing denoising filters with that of the proposed SOLE algorithm for multi-modality images. Table 38.2 shows the performance of different image denoising filters, namely the median filter, notch filter, mean filter, and multi-scale Gaussian bilateral (MSGB) filter. The MSGB filter attains error rates of 0.052, 0.058, and 0.043 for noise ratios of 1%, 3%, and 5%, respectively. It obtains PSNR values of 68.04, 69.35, and 68.18 for the same noise ratios. From the analysis, we conclude that the MSGB filter gives the minimum MSE compared to the other denoising techniques.

Table 38.2 Performance comparison analysis of traditional denoising filters

Noise ratio   Performance metrics   Median filter   Notch filter   Mean filter   MSGB filter
1%            PSNR                  41.25           56.13          62.17         68.04
              MSE                   0.89            0.84           0.15          0.052
3%            PSNR                  53.22           58.25          64.33         69.35
              MSE                   0.53            0.28           0.08          0.058
5%            PSNR                  55.24           67.43          67.49         68.18
              MSE                   0.49            0.17           0.08          0.043

Figure 38.4 depicts the comparison between the traditional filters and the MSGB filter based on PSNR and MSE. This result shows that the MSGB filter has the lowest MSE value, so the proposed SOLE approach yields a low error rate for different noise ratios when compared to the other denoising filters.

Fig. 38.4 Performance analysis of traditional denoising filters based on (a) PSNR and (b) MSE

The processing time of the sample images in the testing phase was measured to determine the average running time of the different techniques, as given in Table 38.3. As Table 38.3 shows, the proposed technique has the lowest running time (0.97 s). Techniques such as DRAN [14] and 2-stage CNN [17] come close to the proposed model owing to their enhanced models. On the other hand, the running time of AMF with AWF [19] is very high compared to approaches such as K-SVD [13], DRAN [14], and 2-stage CNN [17]. As a result, this strategy outperforms the others in terms of overall efficiency, performance, and computing complexity. The proposed SOLE model provides better pre-processing results at a low error rate and running time without compromising on accuracy.

Table 38.3 Comparison of the computation time of different methods

Authors             Methods        Running time (s)
Rai et al. [13]     K-SVD          6.94
Sharif et al. [14]  DRAN           1.63
Chang et al. [17]   2-stage CNN    1.52
Ali [19]            AMF-AWF        8.75
Proposed            SOLE           0.97
38.5 Conclusion

In this work, a novel stationary wavelet-oriented luminance enhancement (SOLE) approach is proposed to denoise multi-modal images. The multi-modality images are divided into low- and high-frequency sub-images using SWT, which preserves temporal features and thereby limits information loss. The low-frequency and high-frequency sub-images are then processed by the distribution and denoising modules, respectively, to remove the noise. The approximation coefficient is pre-processed using multi-scale retinex with gamma correction to retrieve the noise-free image effectively. The remaining coefficients are pre-processed using a multi-scale Gaussian bilateral filter, and the TWD algorithm in the denoising module dynamically enhances the color detail information without human intervention, so that image contrast and visibility are well preserved. Lastly, the noise-free image is reconstructed from the enhanced sub-images using Inverse-SWT to detect brain tumors. The experimental results show that the mean error rate of the proposed algorithm is 0.03, which is significantly low compared to other filters. The proposed SOLE approach attains better pre-processing results at a low error rate and running time with satisfactory accuracy. In the future, the proposed approach can be extended to reconstruct noise-free images at run time for different medical domains, for the early diagnosis of brain diseases.
References 1. Amin, J., Sharif, M., Haldorai, A., Yasmin, M., Nayak, R.S.: Brain tumor detection and classification using machine learning: a comprehensive survey. Complex Intell. Syst. 1–23 (2021) 2. Nazir, M., Shakil, S., Khurshid, K.: Role of deep learning in brain tumor detection and classification (2015 to 2020): a review. Comput. Med. Imaging Graph. 91, 101940 (2021) 3. Hu, A., Razmjooy, N.: Brain tumor diagnosis based on metaheuristics and deep learning. Int. J. Imaging Syst. Technol. 31, 657–669 (2021) 4. Irmak, E.: Multi-classification of brain tumor MRI images using deep convolutional neural network with fully optimized framework. Iran. J. Sci. Technol. Trans. Electr. Eng. 45, 1015– 1036 (2021) 5. Khan, A.R., Khan, S., Harouni, M., Abbasi, R., Iqbal, S., Mehmood, Z.: Brain tumor segmentation using K-means clustering and deep learning with synthetic data augmentation for classification. Microsc. Res. Techn. 84, 1389–1399 (2021) 6. Ismael, S.A.A., Mohammed, A., Hefny, H.: An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artif. Intell. Med. 102, 101779 (2020)
7. Narmatha, C., Eljack, S.M., Tuka, A.A.R.M., Manimurugan, S., Mustafa, M.: A hybrid fuzzy brain-storm optimization algorithm for the classification of brain tumor MRI images. J. Ambient Intell. Hum. Comput. 1–9 (2020) 8. Badža, M.M., Barjaktarović, M.Č.: Classification of brain tumors from MRI images using a convolutional neural network. Appl. Sci. 10, 1999 (2020) 9. Ghassemi, N., Shoeibi, A., Rouhani, M.: Deep neural network with generative adversarial networks pre-training for brain tumor classification based on MR images. Biomed. Signal Process. Control 57, 101678 (2020) 10. Wadhwa, A., Bhardwaj, A., Verma, V.S.: A review on brain tumor segmentation of MRI images. Magn. Reson. Imaging 61, 247–259 (2019) 11. Acharya, U.R., Fernandes, S.L., WeiKoh, J.E., Ciaccio, E.J., Fabell, M.K.M., Tanik, U.J., Rajinikanth, V., Yeong, C.H.: Automated detection of Alzheimer's disease using brain MRI images—a study with various feature extraction techniques. J. Med. Syst. 43(9), 1–14 (2019) 12. Saba, T., Mohamed, A.S., El-Affendi, M., Amin, J., Sharif, M.: Brain tumor detection using fusion of hand crafted and deep learning features. Cognit. Syst. Res. 59, 221–230 (2020) 13. Rai, S., Bhatt, J.S., Patra, S.K.: An unsupervised deep learning framework for medical image denoising. arXiv preprint arXiv:2103.06575 (2021) 14. Sharif, S.M.A., Naqvi, R.A., Biswas, M.: Learning medical image denoising with deep dynamic residual attention network. Mathematics 8, 2192 (2020) 15. Sang, H., Xiang, L., Chen, S., Chen, B., Yan, L.: Image recognition based on multiscale pooling deep convolution neural networks. Complexity 2020 (2020) 16. Debnath, S., Talukdar, F.A.: Brain tumour segmentation using memory-based learning method. Multimed. Tools Appl. 78, 23689–23706 (2019) 17. Chang, Y., Yan, L., Chen, M., Fang, H., Zhong, S.: Two-stage convolutional neural network for medical noise removal via image decomposition. IEEE Trans. Instrum. Meas. 69, 2707–2721 (2019) 18.
Dey, N., Rajinikanth, V., Shi, F., Tavares, J.M.R., Moraru, L., Karthik, K.A., Lin, H., Kamalanand, K., Emmanuel, C.: Social-group-optimization based tumor evaluation tool for clinical brain MRI of Flair/diffusion-weighted modality. Biocybern. Biomed. Eng. 39, 843–856 (2019) 19. Ali, H.M.: MRI medical image denoising by fundamental filters. In: High-Resolution Neuroimaging—Basic Physical Principles and Clinical Applications, vol. 14, pp. 111–124 (2018)
Chapter 39
Multi Parameter Machine Learning-Based Maternal Healthiness Classification System Rajkumar Ettiyan and V. Geetha
Abstract Monitoring during pregnancy is needed to identify any problems early and to ensure maternal and foetal health and well-being. However, most people live in rural areas and therefore lack regular checkups in the early stages of pregnancy, so the number of unexpected deaths during delivery has increased. In this paper, a novel Maternal Monitoring System (MMS) is proposed for monitoring pregnant women at home. It helps monitor maternal and foetal health indicators such as body temperature, foetal heart rate, blood pressure, respiration rate, foetal movement, PPG, and abdominal ECG (AECG). The input data from the MIT-BIH arrhythmia dataset are pre-processed using the Stationary Wavelet Transform (SWT), which removes unwanted noise from the signal. The denoised signals are decomposed using Variational Mode Decomposition (VMD), which separates the AECG signal into its constituent maternal and foetal components. Both signals are then classified using a Support Vector Machine (SVM), which produces the best classification results as to whether the mother or the foetus is healthy or unhealthy. According to the experimental findings, the overall accuracy of the proposed MMS model was 98.9%. The proposed MMS model was compared with other traditional models such as KNN, MLE, and Naive Bayes, which obtain lower accuracy than SVM. SVM maintains a high accuracy of 99%, which is higher than that of the existing models.
R. Ettiyan (B)
Department of Computer Science and Engineering, Puducherry Technological University, Puducherry, India

V. Geetha
Department of Information Technology, Puducherry Technological University, Puducherry, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_39
39.1 Introduction Most of the pregnancy women become aware of their lifestyle and how it affects the foetus during pregnancy, which is a rare occurrence [1]. Pregnant women cannot receive affordable health monitoring services in typical situations. Recently, remote health monitoring technologies have been shown to enhance pregnancy-related health outcomes for both maternal and foetal [2]. An important aspect of women’s health examinations is the measurement of baby kick counts during pregnancy. Pregnant women are monitored at a number of medical facilities throughout this time [3]. The person must frequently measure certain medical parameters, including blood pressure, weight, foetal movement, pulse, and other standards. Maternal deaths among pregnant women have increased in recent years. Hospitals, expertise, and awareness are lacking in many rural areas [4]. They did not perform their routine checkup because there were no hospitals around. Additionally, avoiding routine checkups at this stage of pregnancy is hampered by time and distance. The rate of abnormal children being born and the rate of foetal mortality can both be decreased with routine checkups [5]. During this time, it is possible for a variety of issues, including weight gain, gestational diabetes, bleeding, and variations in blood caused by the mother’s movements [6]. At least twice throughout the process, ultrasound scans should be performed on women during pregnancy. They are frequently unable to receive immediate care [7]. Therefore, giving pregnant women prompt, effective medical care will increase the likelihood that a healthy baby will be born. We can identify some issues with the ultrasound scan approach since it exposes the developing child to heat energy, which can result in birth defects [8]. To overcome these above problems, a novel MMS innovative pregnancy monitoring in the healthcare systems has been proposed. 
The major contributions are:
• MMS helps monitor maternal and foetal health indicators such as body temperature, foetal heart rate, blood pressure, respiration rate, foetal movement, PPG, and abdominal ECG (AECG) signals.
• Variational Mode Decomposition (VMD) separates the AECG signal into its constituent maternal and foetal components. Both components are then classified using a Support Vector Machine (SVM), which produces the best classification results, indicating whether the mother and foetus are healthy or unhealthy.
• MMS helps to detect maternal and foetal problems at an early stage, so that the number of unexpected deaths during delivery decreases.

The remainder of this paper is structured as follows: Sect. 39.2 describes the literature survey in detail, Sect. 39.3 describes MMS, Sect. 39.4 gives the results, and Sect. 39.5 presents the conclusion.
39 Multi Parameter Machine Learning-Based Maternal Healthiness …
39.2 Literature Survey

Pregnant women often cannot attend routine checkups at the beginning of pregnancy, which results in a higher death rate for both mothers and new-born babies in urban and rural areas alike. As a result, women face serious medical problems. This section gives an overview of a few recent techniques for regular pregnancy monitoring.

In 2019, Yuan et al. [9] suggested an Android-based system for monitoring foetal ECGs. It uses a portable, low-power foetal ECG collector that continuously records the abdominal ECG signals of the mother. The research showed that the foetal ECG can be clearly extracted by the FastICA algorithm, and that the sample entropy can precisely identify the channel in which the foetal ECG is located. The suggested system may be helpful for non-invasive, real-time monitoring of foetal ECGs.

In 2020, Hasan and Ismaeel [10] suggested an ECG monitoring system made up of an Arduino Uno, an ESP8266 Wi-Fi module, and the IoT Blynk application. With this system, the doctor can monitor the patient remotely through the Blynk application, which is installed on the doctor’s smartphone and analyses and visualises the patient’s ECG signal. Monitoring can take place anywhere, at any time, without a hospital visit.

In 2020, Hema and Anil [11] suggested a tool for monitoring the foetal heart rate during pregnancy. The main sensor is a foetal digital stethoscope, which is applied to the pregnant woman’s abdomen. The detected foetal heart rate is sent as a text message to the appropriate mobile phone via a GSM module, and a simulation of uterine contractions, obtained via an EMG sensor, can also be displayed on a desktop. This adaptable and affordable equipment lets the patient monitor the foetal heart rate at home.

In 2021, Li et al. [12] proposed an SMHS built on essential technologies, including wearables and cloud computing. The work examines its uses as well as monitoring and management techniques in hospitals’ home obstetrics units. The smart maternal platform, which focuses on pregnant women, boosts productivity, makes it easier for pregnant women to visit doctors, and raises the standard of obstetrical care.

In 2021, Bjelica et al. [13] proposed an IT ecosystem for prenatal care based on the fusion of different services in a semantically enhanced e-health ecosystem. It was used to gather information from participants and to gauge how well the new system was received. The results indicate that the system is of high quality, usable, and useful, and that both expectant mothers and doctors are prepared for more extensive use of the system.

In 2022, Raza et al. [14] proposed DT-BiLTCN, an approach based on an artificial neural network, to forecast hazards to maternal health from medical records. Using a risk monitoring system, a total of 1218 samples were collected from maternal health care, hospitals, and community clinics for the study. With the feature set that DT-BiLTCN provides, the SVM achieved 98% accuracy.
In 2022, Preethi and Bhagyaveni [15] proposed an HPMD that consists of a smartphone application for pregnancy health monitoring and a wearable abdomen patch that connects to an analytical H-IoT platform. It non-invasively assesses the foetal heart rate (FHR), foetal presentation, and uterine contractions to detect genuine labour and ensure the foetus’s safety. The parameters are shown in the PHM mobile app, which also has live visualisation plots of the EMG and FECG signals for tracking the health of the mother and foetus in settings where they are free to move around.

These methods outperform those previously developed, but drawbacks remain, such as premature births and increasing surgery and death rates. To overcome these drawbacks, a novel MMS is suggested in this paper.
39.3 Proposed Method

In this section, a novel MMS is designed to monitor pregnant women. Figure 39.1 illustrates the overview of MMS for high-risk pregnant women at home. The device can monitor maternal and foetal health parameters such as body temperature, foetal heart rate, blood pressure, respiration rate, foetal movement, PPG, and abdominal ECG. The input data from the dataset are pre-processed using the Stationary Wavelet Transform (SWT), which removes unwanted noise from the signal. The denoised signals are decomposed using VMD, which separates the AECG signal into its constituent maternal and foetal components. Both components are then classified using an SVM, which produces the best classification results, indicating whether the mother and foetus are healthy or unhealthy. The major steps, namely pre-processing, decomposition, and classification, are described below.
39.3.1 MIT-BIH Arrhythmia Dataset

The MIT-BIH dataset is the most widely used dataset for arrhythmia detection. It has 48 records, each of which is a 30-min recording of heartbeat signals with a sampling frequency of 360 Hz. The individuals in the records vary in age and gender. Experts have labelled the heartbeats and R-peak locations and linked them to the dataset; the training and evaluation phases use these annotations and locations as the ground truth. The database has an annotation file that lists each record’s heartbeat type and its position in the “QRS” complex. These heartbeat class annotations were used as reference annotations for the evaluation of the proposed model.
Fig. 39.1 Overview of Maternal Monitoring System (MMS)
39.3.1.1 Data Acquisition
In this stage, all the input data, such as body temperature, respiration rate, foetal movement, and abdominal ECG, are taken from the MIT-BIH arrhythmia dataset. The FECG is extracted from the composite AECG signal acquired from the mother’s abdomen. This stage covers the acquisition of the various maternal and foetal physiological parameters.
39.3.2 Pre-processing

39.3.2.1 Denoising
Denoising is the removal of unwanted high-frequency components from a signal. Power Line Interference (PLI) is a typical high-frequency noise found in most physiological signals. Since noise is difficult to identify by eye, the suggested MMS denoises physiological signals with the SWT. Wavelets can reveal details including signal trends, discontinuities, and signal noise. The SWT is a modified variant of the wavelet transform: it does not downsample, so the decomposition is translation-invariant and none of the signal’s information is sacrificed. It divides a signal into sets of approximation and detail coefficients. Equations 39.1 and 39.2 give the approximation and detail coefficients, where i and j denote the decomposition level and the position, respectively, and cAP_{i,j} and cDC_{i,j} stand for the approximation and detail coefficients.
$$cAP_{i,j} = \sum_{k=1}^{n} cAP_{i-1,\,j+2^{i}(n)}\; h(n) \tag{39.1}$$

$$cDC_{i,j} = \sum_{k=1}^{n} cDC_{i-1,\,j+2^{i}(n)}\; f(n) \tag{39.2}$$
The signal was subsequently divided into various frequency resolutions and subjected to soft thresholding, which separates the noise from the signal. The inverse Stationary Wavelet Transform (ISWT) was then used to rebuild the signal from the thresholded coefficients. The output of the ISWT is the denoised ECG signal.
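The denoising chain described above (SWT, soft thresholding, ISWT) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the Haar filter pair, circular boundary handling, the three-level depth, the universal threshold, and the median-based noise estimate are all choices made here for the example, and the test signal is synthetic.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def swt_haar(signal, levels):
    """Undecimated Haar transform with circular boundaries (assumed filter choice).

    Returns the final approximation band and one detail band per level.
    No band is downsampled, so every band has the length of the input.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(1, levels + 1):
        shift = 2 ** (level - 1)  # filter taps move further apart at each level
        nxt = [(approx[j] + approx[(j + shift) % n]) / SQRT2 for j in range(n)]
        det = [(approx[j] - approx[(j + shift) % n]) / SQRT2 for j in range(n)]
        details.append(det)
        approx = nxt
    return approx, details

def iswt_haar(approx, details):
    """Inverse of swt_haar: averages the two redundant reconstructions per level."""
    n = len(approx)
    current = list(approx)
    for level in range(len(details), 0, -1):
        shift = 2 ** (level - 1)
        det = details[level - 1]
        same = [(current[j] + det[j]) / SQRT2 for j in range(n)]   # estimate of sample j
        ahead = [(current[j] - det[j]) / SQRT2 for j in range(n)]  # estimate of sample j + shift
        current = [0.5 * (same[j] + ahead[(j - shift) % n]) for j in range(n)]
    return current

def soft_threshold(values, threshold):
    """Shrink each coefficient towards zero by `threshold`."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]

def swt_denoise(signal, levels=3):
    """SWT -> soft thresholding of the detail bands -> inverse SWT."""
    approx, details = swt_haar(signal, levels)
    # Assumption: noise level estimated from the finest band (median / 0.6745),
    # combined with the universal threshold sigma * sqrt(2 ln N).
    finest = sorted(abs(d) for d in details[0])
    sigma = finest[len(finest) // 2] / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    shrunk = [soft_threshold(band, threshold) for band in details]
    return iswt_haar(approx, shrunk)

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Synthetic example: a 5 Hz tone sampled at 360 Hz plus Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 360) for t in range(720)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
denoised = swt_denoise(noisy)
```

Because nothing is discarded before thresholding, `iswt_haar(*swt_haar(x, 3))` returns `x` up to rounding error; the noise reduction comes only from shrinking the detail bands.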
39.3.3 Decomposition Techniques

39.3.3.1 VMD
VMD is a non-recursive signal processing technique that splits a multicomponent signal into a finite number of band-limited sub-signals, also referred to as “modes”. The VMD algorithm decomposes a real-valued input signal f(t) into K discrete modes u_k(t), where each mode must be as compact as possible around a centre frequency ω_k that is determined during the decomposition process. Figure 39.2 shows the VMD-based decomposition of an AECG signal into its constituent maternal and foetal components.
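For reference, VMD is commonly stated in the literature as the constrained variational problem below. It is given here as background only: the chapter does not report the number of modes K, the penalty weight, or the solver settings it used. The symbols δ (Dirac distribution), * (convolution), and the imaginary unit written as an upright j are notation introduced for this sketch, and ω_k denotes the centre frequency of mode u_k.

```latex
\min_{\{u_k\},\,\{\omega_k\}} \;
\sum_{k=1}^{K}
\left\lVert
  \partial_t \!\left[ \left( \delta(t) + \frac{\mathrm{j}}{\pi t} \right) * u_k(t) \right]
  e^{-\mathrm{j}\,\omega_k t}
\right\rVert_2^{2}
\quad \text{subject to} \quad
\sum_{k=1}^{K} u_k(t) = f(t)
```

Each term measures the bandwidth of one mode after it has been shifted to baseband, so minimising the sum yields modes that are compact around their centre frequencies while still adding up to the input signal.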
39.3.4 Classification

The standard deviations of the constituent maternal and foetal components are input as features to a Support Vector Machine (SVM). The SVM is founded on structural risk minimisation and is used here to categorise maternal and foetal health conditions. The SVM classifier was also used to predict the likelihood of new-borns developing metabolic acidosis, to identify maternal and foetal distress, and to differentiate between healthy and hypoxic foetuses. The SVM builds a model that categorises the data instances in the testing set, which contain only features. The categorisation is determined by the inner-product kernel; different kernels generate different learning machines and hence different decision boundaries. The radial basis function (RBF) kernel of Eq. 39.3 was applied to statistical features collected from FHR signals, where C and γ are user-specified parameters.

$$S(y_j, y_k) = \exp\left(-\gamma \,\lVert y_j - y_k \rVert^{2}\right) \tag{39.3}$$
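As a small illustration of the feature and kernel computation described above, the sketch below builds a two-element feature vector from the standard deviations of the two separated components and evaluates the standard RBF kernel on it. The sample values, the choice of the sample (n − 1) standard deviation, and the function names are assumptions made for this example; they are not taken from the chapter.

```python
import math
import statistics

def std_features(maternal, foetal):
    """Feature vector: standard deviation of each separated component.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    return [statistics.stdev(maternal), statistics.stdev(foetal)]

def rbf_kernel(y_j, y_k, gamma):
    """Standard RBF kernel: exp(-gamma * ||y_j - y_k||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(y_j, y_k))
    return math.exp(-gamma * sq_dist)

# Made-up component samples, purely for illustration.
maternal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
foetal = [0.0, 0.2, 0.0, -0.2, 0.0, 0.2]
features = std_features(maternal, foetal)

# Identical feature vectors give 1.0; the value decays with squared distance.
similarity = rbf_kernel(features, features, gamma=0.5)
```

A larger γ makes the kernel decay faster with distance, which is why γ (together with the penalty C) is normally tuned on validation data.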
Fig. 39.2 VMD based decomposition of an AECG signal into its constituent maternal and foetal components
39.4 Result and Discussion

In this section, the proposed MMS is used to monitor pregnant women. It helps monitor maternal and foetal health indicators such as body temperature, foetal heart rate, blood pressure, respiration rate, foetal movement, PPG, and abdominal ECG. The proposed MMS is tested with a sample of data collected from pregnant women, as shown in Fig. 39.3. The device can record a PPG signal for heart rate monitoring. It includes an inertial measurement unit (IMU) to monitor hand motion, heart rate, physical activity, and health decisions. Because the data collection rate was set to one sample every 15 min, a new data record was available every 15 min. The pregnant mothers were requested to provide the data to the doctor on a regular basis via a smartphone or a computer. The majority of the data processing was done on cloud servers, which combined the data to derive new knowledge such as health status and physical activity.
Fig. 39.3 A sample of data collected from pregnant women
39.4.1 Performance Analysis

In this section, the performance is analysed in terms of accuracy, recall, F1-score, specificity, and precision.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{39.4}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{39.5}$$
Table 39.1 Comparison of healthy and unhealthy in proposed model

Class      Accuracy  Specificity  Precision  Recall  F1-score
Unhealthy  0.987     0.965        0.931      0.965   0.934
Healthy    0.991     0.977        0.954      0.987   0.969
Fig. 39.4 Performance of proposed MMS
$$\text{Precision} = \frac{TP}{TP + FP} \tag{39.6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{39.7}$$
where FP, FN, TP, and TN denote false positives, false negatives, true positives, and true negatives, respectively, and the F1-score is the harmonic mean of precision and recall (Table 39.1). Figure 39.4 illustrates the performance of the suggested MMS for the two classes, healthy and unhealthy. For the healthy and unhealthy classes, respectively, the suggested MMS achieved an accuracy of 0.991 and 0.987, a specificity of 0.977 and 0.965, a precision of 0.954 and 0.931, a recall of 0.987 and 0.965, and an F1-score of 0.969 and 0.934.
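The metric definitions above translate directly into code. The sketch below is illustrative only: the confusion-matrix counts are hypothetical and are not the counts behind Table 39.1, which the chapter does not report.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

For a two-class problem such as healthy versus unhealthy, the function is called once per class, treating that class as the positive one.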
39.4.2 Comparative Analysis

Figure 39.5 illustrates the RMSE values for health score predictions using different techniques, where the x axis ranges from 15 min to 6 h. The RMSE values of the suggested SVM approach are the lowest, whereas those of the KNN and MLE methods are considerably higher.
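For readers unfamiliar with the error measure, RMSE is the square root of the mean squared difference between predicted and observed values. A minimal sketch follows; the function name and the example numbers are illustrative and are not data from the chapter.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up health scores, for illustration only.
example_error = rmse([0.8, 0.6, 0.9], [0.7, 0.6, 1.0])
```

A lower RMSE means the predicted health scores lie closer to the observed ones, which is the sense in which the comparison above ranks the techniques.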
Fig. 39.5 RMSE values for health scores predictions using various techniques
Fig. 39.6 Comparison between traditional machine learning techniques
Figure 39.6 shows that traditional classifiers such as KNN, MLE, and Naive Bayes obtain lower accuracy than the SVM, which maintains a high accuracy of 99%. The SVM is therefore more accurate than the existing models, and the MMS can identify whether the mother and foetus are healthy or unhealthy.
39.5 Conclusion

In this paper, a novel Maternal Monitoring System (MMS) has been proposed for monitoring pregnant women at home. It helps monitor maternal and foetal health indicators such as body temperature, foetal heart rate, blood pressure, respiration rate, foetal movement, PPG, and abdominal ECG. The input data from the MIT-BIH arrhythmia dataset are pre-processed using the Stationary Wavelet Transform (SWT), which removes unwanted noise from the signal. The denoised signals are decomposed using Variational Mode Decomposition (VMD), which separates the AECG signal into its constituent maternal and foetal components. Both components are then classified using a Support Vector Machine (SVM), which produces the best classification results, indicating whether the mother and foetus are healthy or unhealthy. The experimental results show that the proposed MMS model achieved an overall accuracy of 98.9%. The proposed MMS model was compared with traditional models such as KNN, MLE, and Naive Bayes, which obtain lower accuracy than the SVM; the SVM maintains a high accuracy of 99% and is therefore more accurate than the existing models. Future work is essential to make the designed system more advanced. The intended enhancement is to connect more sensors to the Internet to measure a variety of other health parameters of pregnant women, which would benefit pregnancy monitoring.
References

1. Morris, T., Strömmer, S., Vogel, C., Harvey, N.C., Cooper, C., Inskip, H., Woods-Townsend, K., Baird, J., Barker, M., Lawrence, W.: Improving pregnant women’s diet and physical activity behaviours: the emergent role of health identity. BMC Pregnancy Childbirth 20, 1–12 (2020)
2. Lori, J.R., Perosky, J., Munro-Kramer, M.L., Veliz, P., Musonda, G., Kaunda, J., Boyd, C.J., Bonawitz, R., Biemba, G., Ngoma, T., Scott, N.: Maternity waiting homes as part of a comprehensive approach to maternal and newborn care: a cross-sectional survey. BMC Pregnancy Childbirth 19(1), 1–10 (2019)
3. Priyanka, B., Kalaivanan, V.M., Pavish, R.A., Kanageshwaran, M.: IOT based pregnancy women health monitoring system for prenatal care. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Mar 2021, vol. 1, pp. 1264–1269. IEEE (2021)
4. Sk, M.I.K., Paswan, B., Anand, A., Mondal, N.A.: Praying until death: revisiting three delays model to contextualize the socio-cultural factors associated with maternal deaths in a region with high prevalence of eclampsia in India. BMC Pregnancy Childbirth 19, 1–11 (2019)
5. King, C., Mancao, H.J.: Special supplemental nutrition programme for women, infants and children participation and unmet health care needs among young children. Child Care Health Dev. 48(4), 552–557 (2022)
6. Main, E.K., Chang, S.C., Dhurjati, R., Cape, V., Profit, J., Gould, J.B.: Reduction in racial disparities in severe maternal morbidity from hemorrhage in a large-scale quality improvement collaborative. Am. J. Obstet. Gynecol. 223(1), 123-e1 (2020)
7. Bidmead, E., Lie, M., Marshall, A., Robson, S., Smith, V.J.: Service user and staff acceptance of fetal ultrasound telemedicine. Digit. Health 6, 2055207620925929 (2020)
8. Qiu, Q., Huang, Y., Zhang, B., Huang, D., Chen, X., Fan, Z., Lin, J., Yang, W., Wang, K., Qu, N., Li, J.: Noninvasive dual-modality photoacoustic-ultrasonic imaging to detect mammalian embryo abnormalities after prenatal exposure to methylmercury chloride (MMC): a mouse study. Environ. Health Perspect. 130(2), 027002 (2022)
9. Yuan, L., Yuan, Y., Zhou, Z., Bai, Y., Wu, S.: A fetal ECG monitoring system based on the android smartphone. Sensors 19(3), 446 (2019)
10. Hasan, D., Ismaeel, A.: Designing ECG monitoring healthcare system based on internet of things Blynk application. J. Appl. Sci. Technol. Trends 1(3), 106–111 (2020)
11. Hema, L.K., Anil, A.: Pregnant women health monitoring system using embedded system. IOP Conf. Ser. Mater. Sci. Eng. 993(1), 012078. IOP Publishing (2020)
12. Li, X., Lu, Y., Fu, X., Qi, Y.: Building the Internet of Things platform for smart maternal healthcare services with wearable devices and cloud computing. Future Gener. Comput. Syst. 118, 282–296 (2021)
13. Bjelica, D., Bjelica, A., Despotović-Zrakić, M., Radenković, B., Barać, D., Đogatović, M.: Designing an IT ecosystem for pregnancy care management based on pervasive technologies. Healthcare 9(1), 12. Multidisciplinary Digital Publishing Institute (2021)
14. Raza, A., Siddiqui, H.U.R., Munir, K., Almutairi, M., Rustam, F., Ashraf, I.: Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction. PLoS ONE 17(11), e0276525 (2022)
15. Preethi, K., Bhagyaveni, M.A.: Design of H-IoT based pregnancy monitoring device in free-living environment. In: Distributed Systems Integration. Technical report, Global Grid Forum (2022)
Chapter 40
Machine Learning-Based Brain Disease Classification Using EEG and MEG Signals

A. Ahilan, J. Angel Sajani, A. Jasmine Gnana Malar, and B. Muthu Kumar
Abstract Electroencephalography (EEG) and magnetoencephalography (MEG) are important, scientifically evolving tools for assessing brain activity. EEG enables better clinical and healthcare services that meet the rising need for early, low-cost diagnosis of brain disease. In this paper, EEG and MEG signals are used as input to detect brain cancer and strokes in their early stages. Projecting the EEG/MEG signal from the channel space to the source space requires the channel data to be organised as single-trial channel data (STD), averaged channel data (ACT), and time–frequency data (TFD). The signals are pre-processed using the discrete wavelet transform, which provides an adaptive time–frequency resolution for analysing nonstationary signals. The signals are then passed to the Fast Fourier Transform to obtain subsets of typical in-class “invariant” coefficients from the wavelet coefficients (time–frequency information). Finally, a multi-class SVM classifies normal, cancer, and stroke cases using the EEG and MEG signals. The proposed method is analysed quantitatively using accuracy, specificity, and precision. The proposed dual-signal classification model achieved an accuracy of 92.59%, improving the overall accuracy over Bootstrap models, SVM, XGBoost, and Fast Fourier Transform by 4.61%, 0.16%, 15.77%, and 9.63%, respectively.
A. Ahilan (B) Department of Electronics and Communication Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] J. Angel Sajani Department of Electronics and Communication Engineering, Anna University, Chennai, Tamil Nadu 600025, India A. Jasmine Gnana Malar Department of Electrical and Electronics Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India B. Muthu Kumar Department of Computer Science and Engineering, School of Computing and Information Technology, REVA University, Kattigenahalli, Karnataka 560064, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_40
40.1 Introduction

The brain is one of the most crucial organs in the body; it independently and comprehensively regulates and keeps track of metabolic processes. Abnormalities in the brain, such as ischaemic strokes, brain tumours, and epilepsy, may affect normal biological processes. If a brain anomaly is found during the screening phase, the doctor will suggest a suitable signal or imaging modality. Brain illnesses are often diagnosed using techniques such as Electroencephalography (EEG), Magnetoencephalography (MEG), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and Computed Tomography (CT) [1]. Brain tumours frequently cause seizures, migraines, motor weakness, nausea, and abnormalities in vision. The most dreadful form of brain cancer is brain metastasis, in which the tumour keeps growing. The World Health Organisation has classified over 130 different types of brain tumours, with glioblastoma, meningioma, CNS lymphoma, and nonspecific glioma among the most prevalent [2]. In EEG investigations, the concept of “brain connectivity”, which describes a number of interrelated aspects of brain organisation, is of tremendous interest. It is typically separated into three groups: functional, effective, and anatomical (structural) connectivity [3]. The human brain is the most important part of the central nervous system. It is made up of billions of cells, most of which are neurons. Each neuron consists of a cell body, an axon, and dendrites. In response to stimuli, neurons activate and transmit information through numerous channels to other neurons and organs, including muscles and gland cells [4]. Radiotherapy, surgery, and chemotherapy are a few of the available treatment options, and each form of tumour can receive its own course of treatment. Before beginning treatment, however, it is important to consider the type of brain tumour, its size, its rate of growth, and all of the previously mentioned contributing variables. To understand survival, therapy, and prognosis today, tumour histology and molecular diagnostics are unquestionably the most crucial elements to take into account [5]. Conditions such as Alzheimer’s disease, dementia, sleep stages, driver fatigue stages, strokes, brain tumours, and seizures can be detected by EEG signal classification. The main contributions of this study are as follows:
• In this research, EEG and MEG signals are used as input to detect brain cancer and strokes in their early stages.
• Projecting the EEG/MEG signal from the channel space to the source space requires the channel data to be organised as single-trial channel data (STD), averaged channel data (ACT), and time–frequency data (TFD).
• The signals are pre-processed using the discrete wavelet transform, which provides an adaptive time–frequency resolution for analysing nonstationary signals.
• The signals are then passed to the Fast Fourier Transform to obtain subsets of typical in-class “invariant” coefficients from the wavelet coefficients (time–frequency information).
• Finally, a multi-class SVM classifies normal, cancer, and stroke cases using the EEG and MEG signals.
• Accuracy, specificity, and precision are used as criteria in the quantitative study of the proposed approach.

The remainder of this work is organised as follows. Section 40.2 presents the literature review, Sect. 40.3 explains the proposed method, Sect. 40.4 presents the results and discussion, and Sect. 40.5 presents the conclusion and future work.
40.2 Literature Survey

In 2018, Qureshi et al. [6] suggested an approach for detecting ischaemic strokes using machine learning and multi-domain analysis of EEG signals from wearable EEG devices. Multi-Layer Perceptron and Bootstrap models (Extra-Tree and Decision-Tree) achieved a test accuracy of 95% with an area under the ROC curve of 0.85, using data from 40 patients and 40 healthy individuals.

In 2021, Bera [7] suggested that a number of brain disorders, including epilepsy, Alzheimer’s disease, Parkinson’s disease, stroke, brain cancer, and brain tumours, can be identified using EEG signals and the SVM technique, which demonstrated improved accuracy. With a classification accuracy of 99.44%, the suggested hybrid-feature approach distinguished between innocent and guilty people.

In 2022, Sawan et al. [8] suggested using the EEG outputs of the wearable device “MUSE 2” to examine the use of ML algorithms for stroke diagnosis. With an accuracy of 83.89%, the XGBoost classifier outperformed the other classifiers in an analysis of eight ML approaches.

In 2020, McDermott et al. [9] presented Multi-Frequency Symmetry Difference Electrical Impedance Tomography (MFSD-EIT), which, when used with symmetric scenes, can efficiently locate and identify unilateral perturbations. The study examined whether the approach can be combined with machine learning to identify the aetiology of stroke. With an average accuracy of 85%, MFSD-EIT with Support Vector Machine classification is able to recognise and differentiate between bleed and clot in human data.

In 2021, Choi et al. [10] created a health monitoring system that can identify senior people who are walking regularly and anticipate stroke in real time. The six channels of raw EEG data were pre-processed using the Fast Fourier Transform (FFT). According to the tests presented in that work, stroke precursors and occurrence in senior people can be identified with greater than 90% accuracy using only key characteristics of EEG biometric data obtained while walking.

In 2017, Adhi et al. [11] presented the outcomes of automated identification of ischaemic stroke patients and healthy individuals using the scaling exponent of the EEG obtained by detrended fluctuation analysis together with an extreme learning machine (ELM). In the 0–30 Hz band, 18 EEG channels were used for the signal processing. Using 120 hidden neurons and a sine activation function in the ELM, the suggested technique recognised ischaemic strokes with an accuracy of 84%.

In 2020, Hassan et al. [12] suggested a system that uses cutting-edge machine learning algorithms to recognise the level of human attention. Using a BITalino EEG sensor board, the EEG signals of 30 human volunteers paying different levels of attention were recorded. The accuracy of the suggested method for assessing a subject’s degree of attentiveness was almost 89%.

Existing methods use an EEG signal to detect brain disease but do not predict the disease in real time. To overcome this issue, we use both EEG and MEG signals to detect the disease immediately.
40.3 Proposed Method

In this research, brain cancer and stroke are detected quickly using EEG and MEG signals. An electroencephalogram (EEG) monitors brain waves, which represent cerebral activity. EEG signals help in understanding the physiological and functional characteristics of the brain and its activity. The bioelectrical activity of the brain is extensively documented in EEG and MEG (Fig. 40.1). The dataset used for this project is shared by Healthcare Innovation in Neuro Technology (HiNT). HiNT creates a wearable point-of-care monitoring system that can identify diseases in high-risk individuals. The dataset comprises 40 patients with a history of ischaemic stroke and 40 healthy individuals. The mean age of the patients is 72 with a standard deviation of 13.6, whereas the mean age of the healthy individuals is 73 with a standard deviation of 7.1. Each subject’s EEG and MEG data were captured for between 15 and 4 h while being sampled at 256 Hz.
Fig. 40.1 Proposed machine learning model for brain disease detection
40.3.1 CTC Conversion

Projecting the EEG/MEG signal from the channel space to the source space requires the channel data to be organised as single-trial channel data (STD), averaged channel data (ACT), and time–frequency data (TFD). Single trials must therefore be extracted from the continuous channel data (CTC), whether or not final averaging or a time–frequency transformation is applied. The second tab of the EMEG suite supports channel-level data preparation and pre-processing, which entails generating the ACT, STD, and TFD files from the CTC and PRT files. For epoching, averaging, and time–frequency transformation, the continuous EEG/MEG channel time-course (CTC) data are combined with the trigger data stored in protocol files (PRT).
40.3.2 Signal Pre-processing

The EEG and MEG signals are pre-processed using the discrete wavelet transform (DWT), which provides an adaptive time–frequency resolution for analysing nonstationary signals. Different types of noise in the EEG signal degrade its quality and prevent further processing, so these undesired components must be removed to increase the overall accuracy. The DWT computes wavelet transforms quickly by means of sub-band coding. It is easy to implement and requires less computation time and fewer resources. In the continuous wavelet transform (CWT), the signal is examined using a set of basis functions that are related by simple scaling and translation parameters. The DWT instead creates a time-scale representation of the digital signal using digital filtering: the signal to be analysed is passed through filters with different cutoff frequencies at different scales. A generalised equation for the DWT of a signal is

$$Z[c,d] = \sum_{m=-\alpha}^{\alpha} z(m)\,\phi_{c,d}(m) \tag{40.1}$$

where z(m) is the input signal to be analysed and

$$\phi_{c,d}(m) = \frac{1}{\sqrt{\alpha}}\,\phi\!\left(\frac{m-d}{\alpha}\right) \tag{40.2}$$

The discrete wavelet transform function performs a single-level reconstruction of the data from the supplied coefficients.
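The sub-band coding idea can be made concrete with a single-level Haar transform: a low-pass and a high-pass filter are applied and each output is downsampled by two, and the inverse step rebuilds the samples from the two coefficient sets. This is a sketch under assumptions, since the chapter does not name the wavelet or the number of levels it used; the Haar filters and the sample values are chosen here for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the Haar DWT: filter, then keep every second output."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]  # low-pass (smooth trend)
    detail = [(a - b) / SQRT2 for a, b in pairs]  # high-pass (local changes)
    return approx, detail

def haar_idwt(approx, detail):
    """Single-level reconstruction from approximation and detail coefficients."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # made-up values
approx, detail = haar_dwt(samples)
restored = haar_idwt(approx, detail)
```

Applying `haar_dwt` again to `approx` gives the next, coarser level; noise suppression is obtained by shrinking small values in `detail` before calling `haar_idwt`.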
40.3.3 Fast Fourier Transform
The EEG and MEG signals are given as input to the Fast Fourier Transform (FFT) to obtain subsets of typical in-class “invariant” coefficients from the wavelet coefficients (time–frequency information). In this study, the FFT was used to extract characteristics from the alpha, beta, theta, and delta components of the full wave. Gamma waves are disregarded since they are primarily noise. An input matrix of 14 × 40,500 was used in our research for a single subject expression. The input matrix consisted of 57 trials for each of the 19 subjects for a specific expression, such as a smile. This resulted in a size of 798 × 40,500 for the input matrix for a single expression. Considering all five expressions, the total input matrix was 3990 × 40,500. The FFT was used to transform the input matrix into the output matrix, whose elements are the frequency components of the discretised input signals. Statistical features were then applied to this matrix column by column, meaning that each column represented a piece of data or variable from which the statistical features were computed. In this research, the features of mean, standard deviation, and entropy were combined horizontally. Equations (40.3) through (40.5) give the formulae for these features. The final matrix was then given to the classification task as input.

Mean,

$$\bar{z} = \frac{\sum_{j=1}^{M} z_j}{M} \tag{40.3}$$

Standard deviation,

$$h = \sqrt{\frac{\sum_{j=1}^{M} \left(z_j - \bar{z}\right)^2}{M - 1}} \tag{40.4}$$

Entropy,

$$e = -\sum_{z} q_z \log q_z \tag{40.5}$$

Here, z_j are the values of the data, M is the total number of values, and q_z is the probability of each value z in the distribution used for the entropy computation.
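The feature extraction of Eqs. (40.3) to (40.5) can be sketched as follows. This is only an illustration under stated assumptions: a naive DFT stands in for the FFT (both give the same magnitudes), the probabilities q_z are estimated with a simple equal-width histogram (the chapter does not say how they were obtained), and all function names and the toy `trials` matrix are hypothetical.

```python
import cmath
import math
from collections import Counter

def dft_magnitudes(x):
    """Naive O(n^2) DFT magnitudes; an FFT yields the same values faster."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def mean(values):
    """Eq. (40.3)."""
    return sum(values) / len(values)

def sample_std(values):
    """Eq. (40.4), with the M - 1 denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def entropy(values, bins=4):
    """Eq. (40.5); q_z is estimated from an equal-width histogram (assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant column: everything falls in one bin
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def column_features(matrix):
    """Mean, std and entropy of every column, concatenated horizontally."""
    columns = list(zip(*matrix))
    return ([mean(c) for c in columns]
            + [sample_std(c) for c in columns]
            + [entropy(c) for c in columns])

# Toy input: 3 trials x 4 samples (hypothetical, far smaller than 3990 x 40,500).
trials = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
spectra = [dft_magnitudes(trial) for trial in trials]
features = column_features(spectra)
```

The resulting `features` list has three entries per frequency column, which mirrors the horizontal combination of the three statistics described in the text.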
40.3.4 Multi-class SVM
The multi-class SVM is employed to classify the EEG and MEG signals as normal, cancer, or stroke. The principles of statistical learning theory and structural risk minimisation led to the one-versus-one implementation of the multi-class SVM. The multi-class SVM typically consists of a number of binary SVMs. An alternative to creating multiple binary classifiers is to separate all classes in a single optimisation process and construct a decision function from M samples, generally with noise:

$$\Psi(v, \xi) = \frac{1}{2} \sum_{n=1}^{k} \left(v_n^{T} v_n\right) + C \sum_{j=1}^{M} \sum_{n \neq y_j} \xi_j^{n} \tag{40.6}$$

where v_n represents the normal vector to the hyperplane of the SVM being considered, subject to

$$v_{y_j}^{T} h_j + d_{y_j} \geq v_n^{T} h_j + d_n + 2 - \xi_j^{n} \tag{40.7}$$

$$\xi_j^{n} \geq 0, \quad j = 1, \ldots, M, \quad n \in \{1, \ldots, k\} \setminus y_j$$

The decision function is given by f(h):

$$f(h) = \arg\max_{n} \left[ v_n^{T} h + d_n \right], \quad n = 1, \ldots, k \tag{40.8}$$

The solution to this optimisation problem can be determined in dual variables from the saddle point of the Lagrangian, which gives the following expressions for the multi-class SVM:

$$f(h) = \sum_{j=1}^{M} \alpha_j\, y_j\, k\!\left(h_j^{T} h\right) + d_n \tag{40.9}$$

$$k\!\left(h_j^{T} h\right) = \exp\!\left(-\gamma \lVert h_j - h \rVert^{2}\right) \tag{40.10}$$

Here k(·,·) denotes the kernel function, and the expansion uses M coefficients regardless of the number of classes k. The α_j are weight constants obtained from the SVM procedure, and the h_j are the support vectors. Furthermore, the regularisation directly lowers the number of non-zero coefficients α_j.
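To make the decision rule of Eq. (40.8) and the Gaussian kernel of Eq. (40.10) concrete, a minimal sketch follows. It does not train an SVM; the weight vectors, offsets, and class ordering below are made-up values used only to show how the arg max is evaluated, and the function names are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2), as in Eq. (40.10)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def decide(h, normals, offsets):
    """Eq. (40.8): index n that maximises v_n^T h + d_n."""
    scores = [sum(v_i * h_i for v_i, h_i in zip(v, h)) + d
              for v, d in zip(normals, offsets)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical trained parameters for k = 3 classes and 2 features.
classes = ["normal", "cancer", "stroke"]
normals = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
offsets = [0.0, 0.0, 0.0]

predicted = classes[decide([0.2, 3.0], normals, offsets)]
```

In a kernelised model the inner product `v_n^T h` would be replaced by a weighted sum of `rbf_kernel(h_j, h, gamma)` over the support vectors, as in Eq. (40.9).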
40.4 Results and Discussion
In this analysis of the results, the EEG and MEG signals from the selected datasets are classified as normal, brain cancer, or stroke.
40.4.1 Performance Analysis
The accuracy, specificity, and precision criteria were used to construct the performance analysis in this work.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{40.11}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{40.12}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{40.13}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{40.14}$$

$$F_1 = 2 \left( \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \right) \tag{40.15}$$

False positives, false negatives, true positives, and true negatives are denoted by FP, FN, TP, and TN, respectively (Fig. 40.2). The proposed Dual signal classification method distinguishes three classes: normal, brain cancer, and stroke. It produced accuracies of 0.925, 0.947, and 0.974 for the normal, brain cancer, and stroke classes, respectively.
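Equations (40.11) to (40.15) translate directly into code. The sketch below is illustrative only: the function name `classification_metrics` and the confusion counts in the example are hypothetical and are not taken from the chapter's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (40.11)-(40.15) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical one-vs-rest counts for a single class.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For a three-class problem such as normal, brain cancer, and stroke, the counts would be taken per class in a one-vs-rest fashion, which is one common convention and an assumption here.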
Fig. 40.2 Performance metrics for three classes
Fig. 40.3 Training and testing accuracy of proposed method
According to Figs. 40.3 and 40.4, the proposed model achieves excellent accuracy in both training and testing. Accuracy, specificity, and precision are the three factors used to assess performance, and the proposed model’s accuracy is 92.59%.
40.4.2 Comparative Analysis
In this section, the proposed model is compared with conventional machine learning networks. The performance comparison with existing approaches indicates that our strategy is more effective. Performance is assessed using accuracy, precision, and specificity. In this comparative analysis, the proposed model is compared to four existing machine learning methods (Fig. 40.5). Table 40.1 lists the per-class performance of the proposed method. Table 40.2 shows the outcomes in terms of the total accuracy rate: the multi-class SVM outperforms standard networks such as Random Forest, Decision Tree, Naive Bayes, and KNN in terms of accuracy. The multi-class SVM maintains a high accuracy of 97.91%, which is higher than that of the other models. Therefore, it is evident that the Dual signal classification model outperforms other methods. Table 40.3 shows that, compared to Bootstrap models, SVM, XGBoost, and Fast Fourier Transform, the proposed Dual signal classification model improves overall accuracy by 4.61%, 0.16%, 15.77%, and 9.63%, respectively. According to the comparison above, the proposed Dual signal classification model is more accurate than the existing models.

Fig. 40.4 Training and testing loss of proposed method

Fig. 40.5 Comparison of traditional machine learning models

Table 40.1 Performance analysis of proposed method
Classes         Accuracy   Specificity   Precision
Normal          0.925      0.956         0.903
Brain cancer    0.947      0.893         0.913
Brain stroke    0.974      0.921         0.968

Table 40.2 Comparison between traditional machine learning networks
Network           Accuracy (%)   Specificity (%)   Precision (%)
Random forest     91.08          89.91             90.36
Decision tree     93.67          90.89             93.51
Naive Bayes       90.79          93.78             89.59
KNN               94.19          92.19             90.93
Multi-class SVM   97.91          95.16             97.26

Table 40.3 Comparison between proposed and the existing models
Author               Methods                      Accuracy (%)
Qureshi et al. [6]   Bootstrap models             95
Bera [7]             SVM                          99.44
Sawan et al. [8]     XGBoost                      83.89
Choi et al. [10]     Fast Fourier transform       90
Proposed             Dual signal classification   99.6
40.5 Conclusion
In this paper, a Dual signal classification model has been proposed to detect brain cancer and strokes in their early stages. Single-trial channel data (STD), averaged channel data (ACT), and time–frequency data (TFD) organisations of the channel data are required for the EEG/MEG signal projection from the channel space to the source space. The signals are pre-processed using the discrete wavelet transform to obtain an adaptive time–frequency resolution for the analysis of nonstationary signals. Then, the signals are given as input to the Fast Fourier Transform to obtain subsets of typical in-class “invariant” coefficients from the wavelet coefficients (time–frequency information). Finally, the multi-class SVM is employed to classify normal, cancer, and stroke cases using EEG and MEG signals. In comparison to Bootstrap models, SVM, XGBoost, and Fast Fourier Transform, the proposed Dual signal classification model improves overall accuracy by 4.61%, 0.16%, 15.77%, and 9.63%, respectively. In the future, we plan to expand our data collection, because the current dataset contains only a moderate amount of data. By increasing the number of features in our model, we hope to achieve greater accuracy.
References
1. Thanaraj, K.P., Parvathavarthini, B., Tanik, U.J., Rajinikanth, V., Kadry, S., Kamalanand, K.: Implementation of deep neural networks to classify EEG signals using gramian angular summation field for epilepsy diagnosis. arXiv preprint arXiv:2003.04534 (2020)
2. Hazra, D., Byun, Y.: Brain tumor detection using skull stripping and U-Net architecture. Int. J. Mach. Learn. Comput. 10(2), 400–405 (2020)
3. Wang, F., Tian, Y.C., Zhang, X., Hu, F.: Detecting disorders of consciousness in brain injuries from EEG connectivity through machine learning. IEEE Trans. Emerg. Top. Comput. Intell. (2020)
4. Savadkoohi, M., Oladunni, T., Thompson, L.: A machine learning approach to epileptic seizure prediction using electroencephalogram (EEG) signal. Biocybern. Biomed. Eng. 40(3), 1328–1341 (2020)
5. Shaari, H., Kevrić, J., Jukić, S., Bešić, L., Jokić, D., Ahmed, N., Rajs, V.: Deep learning-based studies on pediatric brain tumors imaging: narrative review of techniques and challenges. Brain Sci. 11(6), 716 (2021)
6. Qureshi, A.A., Zhang, C., Zheng, R., Elmeligi, A.: Ischemic stroke detection using EEG signals. In: CASCON, pp. 301–308 (2018)
7. Bera, T.K.: A review on the medical applications of electroencephalography (EEG). In: 2021 Seventh International Conference on Bio Signals, Images, and Instrumentation (ICBSII), pp. 1–6. IEEE (2021)
8. Sawan, A., Awad, M., Qasrawi, R.: Machine learning-based approach for stroke classification using electroencephalogram (EEG) signals (2022)
9. McDermott, B., Elahi, A., Santorelli, A., O’Halloran, M., Avery, J., Porter, E.: Multi-frequency symmetry difference electrical impedance tomography with machine learning for human stroke diagnosis. Physiol. Meas. 41(7), 075010 (2020)
10. Choi, Y.A., Park, S., Jun, J.A., Ho, C.M.B., Pyo, C.S., Lee, H., Yu, J.: Machine-learning-based elderly stroke monitoring system using electroencephalography vital signals. Appl. Sci. 11(4), 1761 (2021)
11. Adhi, H.A., Wijaya, S.K., Badri, C., Rezal, M.: Automatic detection of ischemic stroke based on scaling exponent electroencephalogram using extreme learning machine. J. Phys. Conf. Ser. 820(1), 012005. IOP Publishing (2017)
12. Hassan, R., Hasan, S., Hasan, M.J., Jamader, M.R., Eisenberg, D., Pias, T.: Human attention recognition with machine learning from brain-EEG signals. In: 2020 IEEE 2nd Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability (ECBIOS), pp. 16–19. IEEE (2020)
Chapter 41
Performance Comparison of On-Chain and Off-Chain Data Storage Model Using Blockchain Technology
E. Sweetline Priya and R. Priya
Abstract In recent years, blockchain technology has attracted a lot of attention. Blockchains’ rapid ascent to prominence is largely due to the openness and efficiency they provide, and businesses have begun adopting blockchain technology. As the need for blockchain grows day by day, it is necessary to evaluate the storage capacity of blockchains. There are two ways to store data with a blockchain, namely on-chain and off-chain storage. The choice of storage depends on the scenario or application. Hence, in this paper, data storage in the on-chain model and the off-chain model is implemented for a real-time scenario, a “blood donation and transfusion system”, using the Hyperledger Fabric tool, and the performance of the system is analyzed for both models. The results show that the off-chain data storage model is 5.39% faster than the on-chain data storage model at storing data, whereas it is 90.425% slower than the on-chain model at retrieving data from the blockchain. From these results, it is concluded that on-chain data storage is better than the off-chain data storage model for the “blood donation and transfusion system” scenario considered for the comparison. The on-chain data storage model is needed for applications that require high trust in the data, as the data is stored in the blockchain. Off-chain data storage is not needed for text-based data.
E. S. Priya · R. Priya
VELS Institute of Science, Technology & Advanced Studies (VISTAS), Chennai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_41

41.1 Introduction
With the advent of Bitcoin in 2008, the world was introduced to blockchain technology (BCT), which is currently anticipated to revolutionize society as a whole. Blockchain is a distributed, unchangeable ledger that makes it simpler to manage assets and record transactions in a corporate network. A blockchain network may be used to monitor orders, payments, accounts, production, and much more in a business. BCT is applicable to a wide range of industries besides business, including supply chain management, banking, government, retail, and law. Blockchain has been praised as a technology that will change the game and has more immediate applications. However, there are a number of issues with utilizing a blockchain to store and preserve business data. First, the volume of data that can be kept in a single block of a blockchain system may be constrained. Second, the cost of adding a transaction to a block can be too high. Third, while sharing data for calculation or verification may be required, the data itself may be private or should not be accessible to all parties. Therefore, depending on the application and its usage, the data can be stored on-chain or off-chain in blockchain technology. Hence, in this paper, the following are performed:
• Storage of data on-chain, i.e., storage on the ledger of the blockchain.
• Storage of data off-chain, i.e., storage on the InterPlanetary File System (IPFS).
• Performance comparison of on-chain and off-chain data storage size and retrieval time.
The rest of the paper is organized as follows:
• Section 41.2 surveys the literature, covering blockchain technology in detail, the architecture of on-chain storage of data in blockchain, and the architecture of off-chain storage in IPFS.
• Section 41.3 explains the implementation and comparison of the on-chain and off-chain data storage models.
• Section 41.4 explains the results and analysis.
• Section 41.5 concludes the research work.
• Section 41.6 highlights the future work.
41.2 Review of Literature
In [1], an IPFS-based storage model for blockchain is proposed and implemented. That paper concludes that IPFS-based storage offers better storage space and security. In [2], on-chain and off-chain storage for supply chains is analyzed, and it is concluded that on-chain storage is not applicable to the supply chain scenario considered there. In [3], a novel scheme is proposed with some improvements on top of the IPFS architecture; it is optimized for large data storage scenarios through the IPFS model. In [4], the distributed storage system IPFS is used to increase throughput and to bypass storage liabilities. In [5], an introduction to blockchain technology, the storage needs of blockchain, on-chain and off-chain storage requirements, TPS calculation, the performance of off-chain storage with three examples, and on-chain and off-chain storage management are briefly explained. In [6], an IPFS-based health care system is designed for data security and management. The processing time for various transactions is calculated and tested with different devices in a blockchain network. IPFS is adopted for decentralized storage, and a detailed comparison of the proposed system with existing systems is provided. The following sections give a detailed explanation of blockchain technology and an in-depth account of the on-chain and off-chain data storage models of blockchain.
41.2.1 Blockchain Technology
By eliminating the need for a third party to validate transactions, distributed ledger technologies such as blockchain give transactions more security. BCT combines distributed networks, encryption, and other technologies. Blockchain keeps track of all transactions or other digital events in chronologically ordered blocks. A hash function is used to securely link the blocks together in a chain, hence the name “blockchain”. Blockchain uses a peer-to-peer network to store data and transactions in the ledger in a decentralized manner. Based on the users/participants of the blockchain network and how they access the data on the blockchain, blockchains are categorized as follows:
• Permissioned Blockchain: A private blockchain, also known as a permissioned blockchain [7], is a system where private ledgers are deployed by many organizations under the control of a single entity.
• Permissionless Blockchain [8]: Anyone may participate in the verification process on a public, or permissionless, blockchain without restriction.
• Consortium Blockchain [9]: A consortium blockchain is a combination of permissioned and permissionless ledgers. For instance, highly trustworthy information is maintained in a public ledger while sensitive information is retained in a private ledger. Any stakeholder may use the two types of ledgers to control access to information without depending on centralized governance.
41.2.2 On-Chain Storage of Data
On-chain storage refers to the capacity of the blockchain itself to hold various data in the ledger [2]. This is crucial to take into account, since all participants of the system need access to the stored data. In on-chain storage, the entire transaction is completed on the blockchain network. The transaction is then confirmed and added to the public ledger of the blockchain network. Chaincode or smart contracts are used to verify the values that are added to the blockchain. It works as follows: when data is to be added to the blockchain, information such as the data, the timestamp of the transaction, a nonce, the block number, the hash of the data, and the hash of the previous data is packaged to form a block. This block is then transmitted to the connected blockchain network, where it waits to be verified by network nodes by means of a smart contract and is then added to the blockchain. Because transaction data is made available to the public and is continually updated and evaluated by the network of validators, on-chain data transactions offer a high level of security and transparency. However, due to the complexity of the procedure, processing each transaction and adding it to the blockchain takes some time.
Figure 41.1 shows how data is stored in the on-chain ledger. When participants want to submit data or a file through a transaction, the transaction must first pass the smart contract, which validates the data. Only when the transaction is validated is the data stored as blocks in the on-chain ledger. If the data fails validation, it is not stored in the blockchain [10]. Hence, in the on-chain model, only validated data can enter the blockchain ledger, which in turn achieves data integrity. After the data is stored, if the user wants to retrieve it, the key is queried directly against the blockchain ledger and the data is retrieved from on-chain storage. In on-chain storage, the data is stored either in CouchDB or LevelDB [11]. The most used state database for real-time applications is CouchDB.

Fig. 41.1 On-chain data storage architecture
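The packaging of data, timestamp, nonce, block number, and hashes into linked blocks can be sketched as follows. This is a simplified illustration, not Hyperledger Fabric's actual block format: the field names, the use of SHA-256 over a JSON encoding, and the helper names `make_block`, `block_hash`, and `chain_is_valid` are all assumptions.

```python
import hashlib
import json
import time

HASHED_FIELDS = ("number", "timestamp", "nonce", "data", "previous_hash")

def block_hash(block):
    """SHA-256 over the block's contents (field order fixed via sort_keys)."""
    payload = {name: block[name] for name in HASHED_FIELDS}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def make_block(number, data, previous_hash, nonce=0, timestamp=None):
    """Package the data with its metadata and link it to the previous block."""
    block = {
        "number": number,
        "timestamp": time.time() if timestamp is None else timestamp,
        "nonce": nonce,
        "data": data,
        "previous_hash": previous_hash,
    }
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False  # contents were altered after the block was formed
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True

# Hypothetical two-block chain.
genesis = make_block(0, {"note": "genesis"}, "0" * 64)
chain = [genesis, make_block(1, {"blood_id": "B001"}, genesis["hash"])]
```

Changing any stored value after the fact makes `chain_is_valid` return `False`, which is the tamper-evidence property that the hash linking provides.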
41.2.3 Off-Chain Storage of Data
Off-chain storage, also known as the InterPlanetary File System (IPFS)-based data storage model of blockchain, is a distributed data storage model [1]. Data stored off the blockchain can be in various formats. Off-chain storage is necessary when a party wants to use the blockchain to verify information but does not necessarily want to make it publicly available, and also when the amount of data to be recorded exceeds what the blockchain can handle. In off-chain storage, the data is stored directly in the IPFS file system; hence, the data cannot be validated through a smart contract. After the data is stored in IPFS, IPFS in turn returns a hash value, which is stored in the blockchain. Hence, with the off-chain data storage model, the data can be kept secure and is not transparent to the participants. Figure 41.2 illustrates the off-chain data storage model. As shown in Fig. 41.2, the data or file to be stored is sent to IPFS directly by the user/participant. The IPFS storage system stores the data and generates a hash for it. The hash value is sent to the blockchain (on-chain storage) and stored there. Similarly, to retrieve the data, the hash value is first retrieved from the blockchain ledger and then used to query the IPFS system, which returns the data in the desired format (such as a text/JSON/image/PDF file) to the end user. The numbers in the figure denote the sequential order of the transaction flow.

Fig. 41.2 Off-chain data storage architecture
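The off-chain flow described above (store the data off-chain, keep only its hash on-chain, and resolve the hash on retrieval) can be sketched with in-memory stand-ins. The `ContentStore` class only imitates content addressing with a plain SHA-256 hex digest; real IPFS uses content identifiers (CIDs) and a network of peers. The dictionary used as a ledger and all function names are likewise hypothetical.

```python
import hashlib
import json

class ContentStore:
    """In-memory stand-in for IPFS: objects are addressed by their hash."""

    def __init__(self):
        self._objects = {}

    def add(self, payload):
        digest = hashlib.sha256(payload).hexdigest()
        self._objects[digest] = payload
        return digest

    def get(self, digest):
        return self._objects[digest]

def store_off_chain(ledger, store, key, record):
    """Send the record to the store, then keep only its hash in the ledger."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    ledger[key] = store.add(payload)
    return ledger[key]

def retrieve_off_chain(ledger, store, key):
    """Look up the hash in the ledger, then fetch and verify the data."""
    digest = ledger[key]
    payload = store.get(digest)
    if hashlib.sha256(payload).hexdigest() != digest:
        raise ValueError("stored data does not match the on-chain hash")
    return json.loads(payload)

# Hypothetical record; the ledger is modelled as a plain dictionary.
ledger, store = {}, ContentStore()
store_off_chain(ledger, store, "B001", {"blood_group": "O+", "units": 1})
record = retrieve_off_chain(ledger, store, "B001")
```

Retrieval needs two lookups (ledger, then store), which matches the two-step query described in the text and hints at why off-chain reads can be slower.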
41.3 Implementation and Comparison of On-Chain and Off-Chain Data Storage Model
The on-chain and off-chain data storage implementations are built using the Hyperledger Fabric tool on a private blockchain network for the “Blood Donation and Transfusion System” use case. The system’s participants include patients, blood banks, hospitals, and donors. First the blood donor’s details are stored, then the blood donation details, then the blood testing details, and finally the blood transfusion details. Different transactions are triggered for each type of blood detail. Refer to Fig. 41.3 for sample data for the details specified above. The dataset used for the implementation is real-time data from an NGO blood bank in Virudhunagar District, Tamilnadu; hence, the sensitive data are hidden in red color. Refer also to Fig. 41.4 for the algorithms implemented for storage and retrieval of blood details in the on-chain and off-chain models.

Fig. 41.3 Sample data taken for on-chain and off-chain storage model
41.4 Results and Analysis
To store and retrieve data in the application, communication is carried out through software known as “Advanced Rest API”. Figure 41.5 shows sample on-chain data storage, where the entire blood details are stored in JSON format in the blockchain ledger.
Algorithm 1 Storing blood details in on-chain ledger
Input: blood details.
Output: storage of blood details in on-chain ledger
for blood details do
    validate blood details through smart contract
    if validation is true then
        store blood details in on-chain ledger
    else
        reject the transaction
    end if
end for

Algorithm 2 Retrieving blood details from on-chain
Input: blood data key.
Output: blood data json values from blockchain
for blood data key do
    get blood data json value from blockchain
    send to user
end for

Algorithm 3 Storing blood details in off-chain
Input: blood details.
Output: storage of blood in IPFS and hash value in blockchain
for blood details do
    send blood details to IPFS system & get hash data
    send hash data to blockchain
end for

Algorithm 4 Retrieving blood details from off-chain
Input: blood data key.
Output: blood data json values from IPFS
for blood data key do
    get hash value from blockchain
    get blood data json value from IPFS
    send to user
end for

Fig. 41.4 Algorithms for on-chain and off-chain data storage model
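Algorithms 1 and 2 can be sketched in a few lines. This is not the authors' chaincode: the validation rules, field names (`blood_id`, `donor_id`, `blood_group`), and the dictionary standing in for the on-chain ledger are assumptions made only to show the validate-then-store and query-by-key steps.

```python
import json

VALID_BLOOD_GROUPS = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}
REQUIRED_FIELDS = ("blood_id", "donor_id", "blood_group")

def validate_blood_details(record):
    """Hypothetical smart-contract check: required fields and a known group."""
    has_fields = all(record.get(name) for name in REQUIRED_FIELDS)
    return has_fields and record.get("blood_group") in VALID_BLOOD_GROUPS

def store_on_chain(ledger, record):
    """Algorithm 1: store the record only if validation succeeds."""
    if not validate_blood_details(record):
        return False  # reject the transaction
    ledger[record["blood_id"]] = json.dumps(record, sort_keys=True)
    return True

def retrieve_on_chain(ledger, blood_id):
    """Algorithm 2: query the ledger directly by key and return the JSON value."""
    return json.loads(ledger[blood_id])

# Hypothetical usage with a dictionary as the ledger.
ledger = {}
accepted = store_on_chain(
    ledger, {"blood_id": "B001", "donor_id": "D01", "blood_group": "O+"}
)
rejected = store_on_chain(
    ledger, {"blood_id": "B002", "donor_id": "D02", "blood_group": "Z"}
)
```

Here `accepted` is `True` and `rejected` is `False`, so only the first record reaches the ledger; Algorithms 3 and 4 differ in that the record goes to IPFS without this validation step and only its hash is stored.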
Figure 41.6 displays sample off-chain data storage, where only the hash value is stored in the blockchain. Figure 41.7 shows sample blood details stored in IPFS storage. An online cloud service called “Pinata Cloud” is used for IPFS storage in our implementation. The blood data is stored in IPFS storage and its hash value is stored in the blockchain, i.e., only the blood id and the hash value of the data are stored in the blockchain. Hence, the blood id is first queried in the blockchain ledger to get its corresponding hash value. A second query is then made to Pinata Cloud (IPFS storage) with the hash value to get the corresponding blood details. Figure 41.8 compares the data storage time, i.e., the time taken to store data in the on-chain (blockchain ledger) and off-chain (IPFS storage) models. From Fig. 41.8, it is clear that storing data in the on-chain model takes more time than storing the data in IPFS and the hash in the blockchain. This is because, for on-chain storage, the data is first validated by the smart contract and then stored, whereas in off-chain storage no data validation takes place. Figure 41.9 compares the time taken for data retrieval in the on-chain and off-chain data storage models. From Fig. 41.9, it is clear that querying the on-chain model takes less time than retrieving the hash from the blockchain and the data from IPFS storage together. As we used a public cloud (Pinata Cloud), there is a time delay in data retrieval. Hence, the results indicate that the speed of the cloud service plays a vital role in the speed of data retrieval. Tables 41.1 and 41.2 list the time taken by the data storage and data retrieval methods in the on-chain and off-chain data storage models.

Fig. 41.5 On-chain data storage sample screenshot
Fig. 41.6 Off-chain data storage sample screenshot
Fig. 41.7 IPFS data storage sample screenshot
Fig. 41.8 Time taken for data storage in on-chain and off-chain model
Fig. 41.9 Time taken for data retrieval in on-chain and off-chain model

Table 41.1 Time taken for data storage methods in on-chain and off-chain model
Data storage method            Time taken (milliseconds)
                               On-chain     Off-chain
saveDonorRegistrationDetails   198.521      186.705
saveBloodDonationDetails       2.768        5.446
saveBloodTestingDetails        4.386        2.187
saveBloodTransfusionDetails    2.472        2.587
Average time taken             52.03675     49.23125

Table 41.2 Time taken for data retrieval methods in on-chain and off-chain model
Data retrieval method    Time taken (milliseconds)
                         On-chain     Off-chain
getDonorDetails          97.881       1096.151
getdDonationDetails      93.656       1316.992
getTestingDetails        133.909      1009.396
getTransfusionDetails    97.571       995.524
Average time taken       105.75425    1104.51575
The average time taken by the data storage methods (saveDonorRegistrationDetails, saveBloodDonationDetails, saveBloodTestingDetails, saveBloodTransfusionDetails) is 52.04 ms in the on-chain data storage model and 49.23 ms in the off-chain data storage model. Applying formula (41.1) to these results shows that the off-chain data storage model is 5.39% faster at storing data than the on-chain data storage model. From the results, we can conclude that there is not much difference in storage time between the on-chain and off-chain storage models.

$$\%\ \text{increase/decrease} = \frac{(t_1 - t_2) \times 100}{t_1} \tag{41.1}$$

Here t1 is the larger and t2 the smaller of the two average times being compared; this reading reproduces both reported percentages.

The average time taken by the data retrieval methods (getDonorDetails, getBloodDonationDetails, getBloodTestingDetails, getBloodTransfusionDetails) is 105.75 ms in the on-chain data storage model and 1104.52 ms in the off-chain data storage model. Applying formula (41.1) to these results shows that the off-chain data storage model is slower (90.425% slower) than the on-chain model at retrieving data from the blockchain. From the results, it is very clear that the on-chain model performs better than the off-chain model, as the off-chain model depends on the speed of the cloud storage from which the data is retrieved. From Fig. 41.10, it is very clear that, for both on-chain and off-chain storage of text/JSON data, there is no difference in the size of the block. Both have the same size, i.e., only 6 KB of data is occupied in the blockchain. Hence, whether the storage model is on-chain or off-chain, the data storage size in the blockchain does not change for text data. From the implementation, it is also observed that, because smart contracts are implemented in on-chain storage, the data that comes into the system is verified and only valid data can enter the on-chain ledger. This achieves data integrity and trust in the data. Because the smart contract verifies the data, saving data on-chain takes slightly more time than saving it off-chain. In addition, data retrieval from off-chain storage takes more time, as the data is retrieved from a public cloud.
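Formula (41.1) can be checked against the timings in Tables 41.1 and 41.2. In the sketch below, the function name `percent_change` is hypothetical, and taking t1 as the larger of the two averages is an assumption that happens to reproduce the reported 5.39% and 90.425%.

```python
def percent_change(t1, t2):
    """Formula (41.1): relative difference between two times, in percent."""
    return (t1 - t2) * 100 / t1

def average(values):
    return sum(values) / len(values)

# Per-method timings in milliseconds, copied from Tables 41.1 and 41.2.
storage_on_chain = [198.521, 2.768, 4.386, 2.472]
storage_off_chain = [186.705, 5.446, 2.187, 2.587]
retrieval_on_chain = [97.881, 93.656, 133.909, 97.571]
retrieval_off_chain = [1096.151, 1316.992, 1009.396, 995.524]

# t1 is taken as the larger (slower) average in each comparison.
storage_gain = percent_change(average(storage_on_chain), average(storage_off_chain))
retrieval_gap = percent_change(average(retrieval_off_chain), average(retrieval_on_chain))

# Alternative view: how many times longer off-chain retrieval takes.
retrieval_ratio = average(retrieval_off_chain) / average(retrieval_on_chain)

print(round(storage_gain, 2), round(retrieval_gap, 3), round(retrieval_ratio, 1))
```

Because formula (41.1) normalises by t1, the 90.425% figure is expressed relative to the off-chain time; the ratio of the two averages (off-chain retrieval taking roughly ten times as long) is another way to read the same measurements.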
Fig. 41.10 Size of data in on-chain and off-chain model
41.5 Conclusion
This paper compares the performance of the on-chain and off-chain data storage models, implemented in practice for a real-time use case, a “blood donation and transfusion system”, with a real-time dataset. “Hyperledger Fabric” is the tool used for the implementation, and a public cloud service named “Pinata Cloud” is used for IPFS storage. From the implementation, it is concluded that on-chain storage is preferable for text-based storage. Moreover, to maintain data integrity, transactions should go through smart contracts; hence, the on-chain data storage model is preferred for achieving data integrity. It is also observed that there is no significant difference between the size of the data block occupied by the data itself in the on-chain model and that occupied by the hash value of the data in the off-chain model. Finally, on-chain data storage takes slightly longer to store data than the off-chain data storage model, as the data goes through the smart contracts, but the data retrieval time of the on-chain model is less than that of the off-chain model.
41.6 Future Work
In this work, the performance of the on-chain and off-chain data storage models has been compared using a real-time dataset from a blood bank system. However, the comparison was done with text data. In future work, we plan to repeat the comparison with image data and check the performance of both models.
References
1. Zheng, Q., Li, Y., Chen, P., Dong, X.: An innovative IPFS-based storage model for blockchain. In: Proceedings - 2018 IEEE/WIC/ACM International Conference Web Intelligence WI 2018, pp. 704–708 (2019). https://doi.org/10.1109/WI.2018.000-8
2. Hepp, T., Sharinghousen, M., Ehret, P., Schoenhals, A., Gipp, B.: On-chain vs. off-chain storage for supply-and blockchain integration. IT Inf. Technol. 60(5), 283–291 (2021). https://doi.org/10.1515/itit-2018-0019
3. Chen, Y., Li, H., Li, K., Zhang, J.: An improved P2P file system scheme based on IPFS and blockchain. In: Proceedings 2017 IEEE International Conference Big Data (Big Data), vol. 2018-Janua, pp. 2652–2657 (2017). https://doi.org/10.1109/BigData.2017.8258226
4. Sohan, S.H., Mahmud, M., Sikder, M.A.B., Hossain, F.S., Hasan, R.: Increasing throughput and reducing storage bloating problem using IPFS and dual-blockchain method. In: International Conference on Robotics, Electrical and Signal Processing Techniques, pp. 732–736 (2021). https://doi.org/10.1109/ICREST51555.2021.9331254
5. IBM.: Storage needs for blockchain technology - point of view, p. 22 (2018). [Online]. Available: https://www.ibm.com/downloads/cas/LA8XBQGR
6. Azbeg, K., Ouchetto, O., Jai Andaloussi, S.: BlockMedCare: a healthcare system based on IoT, Blockchain and IPFS for data management security. Egypt. Inf. J. 23(2), 329–343 (2022). https://doi.org/10.1016/j.eij.2022.02.004
7. Priya, E.S., Priya, R., Surendiran, R.: Implementation of trust-based blood donation and transfusion system using blockchain technology. Int. J. Eng. Trends Technol. 70(8), 104–117 (2022). https://doi.org/10.14445/22315381/IJETT-V70I8P210
8. Mu, Y., Rezaeibagha, F., Huang, K.: Policy-driven blockchain and its applications for transport systems. IEEE Trans. Serv. Comput. 13(2), 230–240 (2020). https://doi.org/10.1109/TSC.2019.2947892
9. Zahed Benisi, N., Aminian, M., Javadi, B.: Blockchain-based decentralized storage networks: a survey. J. Netw. Comput. Appl. 162, 102656 (2020). https://doi.org/10.1016/j.jnca.2020.102656
10. Zebpay.: On-chain vs off-chain: is one better than the other? https://zebpay.com/in/blog/onchain-vs-off-chain
11. Abbas, K., Afaq, M., Khan, T.A., Song, W.: A blockchain and machine learning-based drug supply chain management and recommendation system for smart pharmaceutical industry, pp. 1–31 (2020). https://doi.org/10.3390/electronics9050852
Chapter 42
Performance Analysis of Skin Cancer Diagnosis Model Using Deep Learning Algorithm with and Without Segmentation Techniques
A. Bindhu and K. K. Thanammal
Abstract Skin cancer is an increasingly common disease all over the world, and it is responsible for a number of deaths. The disease is caused by prolonged exposure to harmful radiation from the sun. Skin cancer is broadly split into two categories: melanocytic and non-melanocytic. Skin cancer begins in one organ and gradually spreads to other regions of the body, eventually killing the person. A visual examination performed by a specialist dermatologist using a set of specific clinical equipment is the most basic technique for detecting skin cancer. Early-stage detection is important for controlling the spread of a tumor throughout the body. However, existing algorithms for assessing skin cancer severity still have some drawbacks: the analysis of skin lesions is not trivial, their performance is slightly worse than that of dermatologists, and they are costly and time-consuming. Various machine learning algorithms have been used to detect many diseases, but the diagnosis becomes more complex when detecting this disease. This work examines the performance of more advanced approaches on dermoscopic images of skin cancer. Without the segmentation process, the time consumed is reduced and better performance is obtained. Techniques both with and without segmentation are analyzed, and the performance metrics are observed.
A. Bindhu (B) · K. K. Thanammal
Research Scholar, Department of Computer Science & Research, S.T. Hindu College, Nagercoil, Tamilnadu, India
e-mail: [email protected]
S.T. Hindu College, Affiliated to Manonmanium Sundaranar University, Nagercoil, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_42
42.1 Introduction
Skin cancer is one of the foremost diseases spreading all over the world; its frequency has risen dramatically in recent decades, to the point that it has reached epidemic proportions [1]. Skin cancers are broadly divided into two kinds, melanoma and non-melanoma. Melanoma is also termed malignant melanoma. Since the nineteenth century, melanoma has been known as a very dangerous disease that commonly affects both men and women. It is one of the lethal types of skin cancer [2]. It is critical to detect the various types of skin cancer correctly because this leads to the best treatment options [3]. Causes of skin cancer include ultraviolet light, genetic predisposition, chemical carcinogens (such as mineral oils, tar, chemical fertilizers, and lead), chronic inflammation, gender, age, immunosuppression, and cigarette smoking [4]. The most significant risk factor is exposure to sunlight, which accounts for 90% of all occurrences. Reducing exposure to sunlight is therefore the most suitable way to prevent its harmful cumulative effects. As a result, promoting protective behaviors is critical in the prevention of skin cancer [5]. Multiple non-invasive methods have been developed to examine skin cancer symptoms and to assess the severity of skin cancer, such as melanoma or non-melanoma [6]. Because of their generality, deep learning architectures may be used to solve a wide range of classification problems. They are becoming more popular as a strategy for multi-label classification, especially in the medical field [7]. Such algorithms can effectively identify cancer, recommend primary treatment options, provide multi-class classification for 134 disorders, and help doctors to perform better [8]. Recent advances in deep learning models are closely associated with the large amount and variety of available data. Large amounts of data are essential to improve the performance of machine learning models [9].
The first step in image analysis is image segmentation, which divides an image into sub-regions that do not overlap and are internally connected [2]. The outcomes of many computer vision applications, such as content-based retrieval, object recognition, and object detection, increasingly depend on the efficacy of image segmentation [10]. However, segmenting an object from a complex, coarse image is still difficult. In recent years, widespread segmentation approaches have included Markov random field-based, thresholding-based, histogram-based, contour detection-based, graph-based, fuzzy rule-based, pixel clustering-based, texture-based, and principal component analysis-based methods [11]. Among these approaches, image thresholding algorithms based on histograms are simple to use and have been applied successfully in a variety of applications.
42.2 Literature Review
Recently, many different methods have been used to diagnose skin cancer. Some of the existing skin cancer prediction methods based on deep learning techniques are reviewed below.
Mittal and Saraswat [12] developed a two-dimensional (2D) histogram-based segmentation algorithm to improve the efficiency of multilevel image threshold segmentation, in which the image is separated into several non-overlapping regions. Kadampur and Al Riyaee [9] proposed a model-driven architecture in the cloud for classifying dermal cell images in order to detect skin cancer. Models that predict skin cancer with high accuracy are constructed using this cloud architecture, which relies on deep learning techniques at its core. Yet, the non-parametric solution does not rely on the assumption that the data follow a common distribution. Huang et al. [13] noted that melanoma is one of the most dangerous types of skin cancer and that its frequency is increasing. To improve the patient survival rate, it is critical to receive a diagnosis as quickly as possible. Skin lesion segmentation is a complex problem in medical image analysis, and object scale-oriented fully convolutional networks (OSO–FCNs) were developed for skin lesion segmentation to overcome it. The scale of the lesions in the training dataset was found to have an important impact on the segmentation outcomes in the testing phase, which led to the development of an object scale-oriented (OSO) training method. Ashraf et al. [14] developed an efficient technique for skin cancer classification. Skin cancer is caused by the growth and spread of abnormal cells in the skin. The classification outcomes of their technique were compared with those of state-of-the-art methods, even though it used a smaller number of feature vectors, and the suggested method reached a highest accuracy rate of 93.29%. Maron et al. [15] carried out an experiment on the robustness of neural networks in skin cancer classification. The purpose of the study was to introduce a dermoscopic skin cancer benchmark that could be used to measure classifier robustness in the face of out-of-distribution (OOD) data. The performance of their classifier confirms the weaknesses of the CNN and serves as a baseline. More generally, this benchmark will aid in the creation of more effective skin cancer classifiers by facilitating a more comprehensive screening process. Based on the papers reviewed above, some of the key challenges of skin cancer severity prediction models are that non-parametric solutions are not limited by the assumption that the data follow a normal distribution [12], performance is slightly worse than that of dermatologists [9], correct analysis of skin lesions is not trivial [14], and test suites are demanding and still not available in dermatology [15].
42.3 Various Methods for Skin Cancer Prediction
Skin cancer is a major and difficult disease in most countries, and early detection has been found to significantly reduce the chances of death in the vast majority of cases. Dermatologist-level accuracy has already been obtained using deep learning algorithms for skin cancer diagnosis. However, there appears to be a gap between the performance of these models and their use in real-world clinical situations, due to difficulties in interpreting the decisions made by deep learning algorithms. Therefore, interpreting a model’s internal mechanisms for reaching the final diagnosis should be one of the next important directions in this field of research. Because human health is at stake in medical applications, there is a particular demand for models that can demonstrate their mechanism in a way that physicians can comprehend. It is no surprise that this specific branch of research has recently gained traction under the designation of explainable AI (XAI). The hidden activations of neural networks can be “looked into” to explain the prediction mechanism. This is comparable to neuroscience, in which researchers study the activity of neurons in the brain to better understand the system; given the brain’s utmost complexity, that is an even more difficult task than examining a deep learning model. Skin cancer prediction methods are broadly split into two categories, with segmentation and without segmentation, which are briefly explained in the sections below.
42.3.1 Skin Cancer Prediction with Segmentation
Image segmentation is a technique for breaking a digital image into subgroups, called image segments, which reduces the complexity of the image and makes image processing and analysis easier. In simple words, segmentation assigns labels to pixels. Local Binary Fitting Active Contour (LBF), thresholding, and a clustering method (region growing) are used for the segmentation process. These three techniques and their related diagrams are described below.
Local Binary Fitting Active Contour (LBF). An active contour is an image segmentation approach that uses energy forces and constraints to separate the pixels of interest from the rest of the image so that they may be processed and analyzed further. For the segmentation process, an active contour is referred to as an active model. Contours are boundaries that define the area of interest in an image. A contour is a collection of points that has been interpolated; the interpolation can be linear, spline-based, or polynomial, and it describes the curve in the image. Various active contour models are employed for segmentation in image processing. To segment images with intensity inhomogeneity correctly, a modified version called the local binary fitting (LBF) model was utilized. The contribution of the model is that it represents local intensity information using two spatially varying fitting functions. When segmenting images with intensity inhomogeneity, the varying fitting functions are better able to represent the local information, which is important. The local binary fitting energy is defined in Eq. (42.1) [16]:

$$\varepsilon_x^{LBF}(C, f_1(x), f_2(x)) = \lambda_1 \int_{in(C)} K(x-y)\,|I(y)-f_1(x)|^2\,dy + \lambda_2 \int_{out(C)} K(x-y)\,|I(y)-f_2(x)|^2\,dy \qquad (42.1)$$

where $x \in \Omega$ is a pixel in the image domain $\Omega$; $\lambda_1$ and $\lambda_2$ are positive constants; $K(x-y)$ is a Gaussian kernel function with a localization property, that is, as $|x-y|$ increases, the value of $K(x-y)$ approaches zero; and $f_1(x)$ and $f_2(x)$ are two separate fitting functions that approximate the intensity levels inside and outside the contour, respectively, in the region around the point $x$. The diagram of local binary fitting is shown in Fig. 42.1.
Fig. 42.1 Local binary fitting (blocks shown: input image; smoothing using anisotropic diffusion; smoothed image; region growing method (quadtree); initial contour; active contour method; final contour)
Thresholding. Thresholding is one of the most fundamental image segmentation techniques. It is quick, simple to use, and easy to understand. It works by converting a scalar image into a binary image by calculating a threshold value based on the intensity values of the image. The threshold value is used when comparing pixel intensity levels. A value of 1 is assigned to pixels with an intensity equal to or greater than the threshold value, while a value of 0 is assigned to pixels of lower intensity, distinguishing between foreground (white pixels) and background (black pixels). Consider the original scalar image $f(i, j)$. An initial threshold $T$ is chosen depending on the intensity levels. The image is split into two sets $H_1$ and $H_2$, which consist of the pixels that are lighter and darker than the threshold value, respectively. The mean intensities $h_1$ and $h_2$ of $H_1$ and $H_2$ are then computed, and the new threshold value is $T_1 = (h_1 + h_2)/2$. If $|T - T_1| \ge \Delta T$ (a predefined parameter), the above procedure is repeated with $T_1$ as the threshold; otherwise the binary image $h(i, j)$ is obtained using Eq. (42.2) [17]:

$$h(i,j) = \begin{cases} 1 & \text{if } f(i,j) \ge T_F \\ 0 & \text{if } f(i,j) < T_F \end{cases} \qquad (42.2)$$

where $T_F$ is the final threshold value. This method of determining a threshold is iterative (Fig. 42.2).
Fig. 42.2 Flow chart diagram of threshold (steps shown: convert gridded RGB to grayscale image; apply logarithmic transformation; apply global thresholding; get the local thresholding for each compartment; combine the two thresholded images; fill pinholes to solidify spots; end)
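The iterative threshold selection described above, together with the binarization of Eq. (42.2), can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the function name `iterative_threshold`, the parameter `delta` (standing in for the predefined stopping parameter), and the choice of the mean intensity as the initial threshold are all hypothetical.

```python
def iterative_threshold(image, delta=0.5):
    """Return (final_threshold, binary_image) for a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    # Assumption: the mean intensity is used as the initial threshold T.
    t = sum(pixels) / len(pixels)
    while True:
        lighter = [p for p in pixels if p >= t]   # set H1
        darker = [p for p in pixels if p < t]     # set H2
        if not lighter or not darker:
            break  # every pixel is on one side; keep the current threshold
        # New threshold T1 is the average of the two class means h1 and h2.
        t_new = (sum(lighter) / len(lighter) + sum(darker) / len(darker)) / 2
        converged = abs(t - t_new) < delta
        t = t_new
        if converged:
            break
    # Eq. (42.2): 1 for pixels at or above the final threshold, 0 otherwise.
    binary = [[1 if p >= t else 0 for p in row] for row in image]
    return t, binary


# Small made-up example: a bright band on a dark background.
sample = [[10, 12, 11], [200, 210, 205], [12, 198, 11]]
threshold, mask = iterative_threshold(sample)
```

The loop stops as soon as two successive thresholds differ by less than `delta`, which mirrors the stopping rule in the text; a smaller `delta` gives a more precise threshold at the cost of more iterations.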
Fig. 42.3 Flow chart diagram of region growing method (steps shown: select the next seed pixel; the neighbors are the 8 neighbors of the seed pixel; compare the neighbors to the seed pixel with the homogeneity criterion; check whether any compared neighbor pixels satisfy the homogeneity condition, with TRUE and FALSE branches; satisfying pixels are assigned to the region and become the new neighbors)
Region Growing Method. The region-based technique rests on the idea of homogeneity: pixels with similar attributes cluster together to produce a homogeneous region. Region-based techniques are categorized into three types according to the principle of region growing: split and merge, region merging, and region splitting. The region growing algorithm has three components: seed point selection, the growth principle, and the termination condition (Fig. 42.3). The selection of seed points is an important phase in the region growing process. Because the seed points may be selected by the user depending on a variety of parameters, it is a human–computer interaction step that determines the overall segmentation obtained with the region growing method. The growth principle states that the difference in pixel value between pixels in close proximity must be smaller than the threshold. Finally, growth continues until no more pixels are able to satisfy the growth principle, which is the termination condition. Thus, the region growing method begins with seeds and grows by absorbing the surrounding homogeneous pixels. The fundamental purpose of region growing is to map individual pixels (seeds) in an input image to regions [18]. The flow chart of the region growing method is shown in Fig. 42.3.
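The three components named above (seed selection, growth principle, termination) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than the method evaluated in this chapter: the function name `region_grow` is hypothetical, the seed is supplied by the caller, and the homogeneity criterion is assumed to be an absolute intensity difference from the seed pixel that is smaller than a threshold, checked over the 8 neighbors of each region pixel.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Grow a region from `seed` (row, col) and return the set of pixel coordinates in it."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    # Termination: the loop ends when no remaining pixel satisfies the criterion.
    while frontier:
        r, c = frontier.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue  # skip the pixel itself and positions outside the image
                if (nr, nc) in region:
                    continue
                # Assumed homogeneity criterion: intensity close to the seed intensity.
                if abs(image[nr][nc] - seed_value) < threshold:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region


# Made-up 3 x 4 image: dark pixels on the left and bottom, bright pixels on the right.
sample = [
    [10, 11, 90, 91],
    [12, 10, 92, 90],
    [11, 13, 12, 89],
]
dark_region = region_grow(sample, seed=(0, 0), threshold=5)
```

Comparing each candidate with the seed value keeps the region from drifting across gradual intensity ramps; comparing with the running region mean instead is a common alternative when lesions have smooth shading.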
42.3.2 Skin Cancer Prediction Without Segmentation
Skin cancers account for one-third of all cancers diagnosed worldwide. Over the last few decades, the incidence of skin cancer has increased. Dermoscopy has improved the capacity to diagnose skin cancer in recent years. Skin cancer prediction contains three major steps: pre-processing, segmentation, and classification. When the image is noisy or has intensity changes, segmentation is unable to differentiate the shading of real images, and it is both power- and time-consuming. The segmentation process was therefore avoided, and the pre-processed image is given directly as input to the classifiers. Three different classifiers are used in this work for comparative analysis.
MobileNet. The MobileNet network architecture is built on depthwise separable convolutions. This is the most extreme form of the inception module, in which each channel receives its own spatial convolution (depthwise convolution). Pointwise convolutions, which are 1 × 1 convolutions, are then applied. Separating the depthwise and pointwise convolutions enhances computing efficiency on the one hand; on the other hand, accuracy improves when the cross-channel and spatial correlation mappings are learned separately. The schematic diagram of MobileNet is shown in Fig. 42.4.
Fig. 42.4 Architecture of mobile net classifier (blocks shown: 64 MPF frames; 64 MPF copied to 3 input channels; MobileNet convolutional layers with α = 0.25; global average pooling; dense layer with 16 units and ReLU activation; dense layer with 3 units; softmax)
ResNeXt101
The concept of residual connections was introduced as a solution to the saturation and accuracy degradation problems, and it allowed ResNet to increase the depth of the network. ResNet comes in several different forms, including ResNet-50, ResNet-101, and ResNet-152. The residual learning framework of the ResNeXt101 architecture facilitates the formation of deeper networks by reformulating the layers to learn residual functions with reference to the layer input. Because of the increased depth, the ResNeXt101 model converges more easily and provides improved accuracy. To improve performance, ResNeXt101 is modified here by adding a dropout layer, a dense layer with ‘relu’ activation, and a Softmax layer with seven outputs. The enhanced ResNeXt101 is fine-tuned on 8912 images with a learning rate of 0.0001 and an SGD optimizer with a momentum of 0.9 (for 30 epochs). Figure 42.5 shows the architecture diagram of ResNeXt101.
Fig. 42.5 Architecture diagram of ResNeXt101 (Siamese network of two ResNeXt101 branches with the same model and weights; each branch shows a 7 × 7 convolution with 64 filters and stride 2, 3 × 3 max pooling with stride 2, Blocks 1 to 4 of grouped convolutions, global average pooling (64), and flatten, giving feature vectors u1, …, u64 and v1, …, v64 that are compared with a distance dE(u, v) based on the squared differences (ui − vi)²)
Multi-Model Ensemble Based on Deep Learning. In cancer prediction, various classification models are available, but none of them is completely reliable, and each method may make mistakes in different areas. Combining multiple classification approaches may increase performance over individual models. A multi-model ensemble is an approach that feeds the predictions of multiple models into a second-stage learning model. To provide a final set of predictions, the second-stage model is trained to optimally combine the predictions of the first-stage models. Here, deep learning is used as the ensemble model to stack the classifiers. Neural networks are used in a range of applications and are inspired by how the brain works. A neural network is trained to produce an output by combining the input variables. Given a set of features and a target, it can learn a nonlinear function approximator, with one or more nonlinear layers (called hidden layers) between the input and output layers. Deep learning is a type of machine learning that uses deep neural networks with many hierarchical hidden layers of nonlinear information processing to identify complex patterns in unstructured, high-dimensional raw data. In a deep neural network, the number of layers is denoted $n_l$ and layer $l$ is denoted $L_l$, so layer $L_1$ is the input layer and layer $L_{n_l}$ is the output layer. Also, let $s_l$ represent the number of neurons in layer $l$. The neural network has the parameters $W = (W^2, W^3, \ldots, W^{n_l})$ and $b = (b^2, b^3, \ldots, b^{n_l})$, where $W_{ij}^l$, $j = 1, 2, \ldots, s_{l-1}$, $i = 1, 2, \ldots, s_l$, $l = 2, 3, \ldots, n_l$, denotes the weight of the connection between unit $j$ in layer $l-1$ and unit $i$ in layer $l$, and $b_i^l$, $i = 1, 2, \ldots, s_l$, $l = 2, 3, \ldots, n_l$, represents the bias of unit $i$ in layer $l$. Suppose that we have a training set $\{(x^1, y^1), (x^2, y^2), \ldots, (x^m, y^m)\}$ of $m$ samples and use SGD to train the neural network. The cost function (the above-mentioned objective function) is defined in Eq. (42.3) and the hypothesis in Eq. (42.4); the multi-model ensemble is illustrated in Fig. 42.6.

$$J(W,b) = \frac{1}{m}\sum_{i=1}^{m} J(W,b;x^i,y^i) + \frac{\lambda}{2}\sum_{l=2}^{n_l}\sum_{j=1}^{s_{l-1}}\sum_{i=1}^{s_l}\left(W_{ij}^l\right)^2 = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|h_{W,b}(x^i)-y^i\right\|^2 + \frac{\lambda}{2}\sum_{l=2}^{n_l}\sum_{j=1}^{s_{l-1}}\sum_{i=1}^{s_l}\left(W_{ij}^l\right)^2 \qquad (42.3)$$

The first term is the mean squared error term, while the second term is a regularization term that prevents overfitting by limiting the magnitude of the weights; $\lambda$ is the weight decay parameter, which controls the relative importance of the two terms. The nonlinear hypothesis $h_{W,b}(x)$ of the neural network is

$$h_{W,b}(x) = f\left(W^{T}x + b\right) \qquad (42.4)$$

The activation function is represented by $f : \mathbb{R} \to \mathbb{R}$.
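As a numerical illustration of Eqs. (42.3) and (42.4), the following Python sketch evaluates the regularized cost for a small fully connected network. It is a sketch under stated assumptions, not the ensemble used in this chapter: the names `forward` and `cost` are hypothetical, the activation $f$ is assumed to be the sigmoid, and each weight matrix is stored as a list of rows so that `W[i][j]` links unit `j` of the previous layer to unit `i` of the current layer.

```python
import math


def sigmoid(z):
    # Assumed activation function f; the chapter does not name a specific one.
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, biases, x):
    """Hypothesis of Eq. (42.4), applied layer by layer to the input vector x."""
    activation = x
    for W, b in zip(weights, biases):
        activation = [
            sigmoid(sum(w_ij * a_j for w_ij, a_j in zip(row, activation)) + b_i)
            for row, b_i in zip(W, b)
        ]
    return activation


def cost(weights, biases, samples, lam):
    """Eq. (42.3): mean of the half squared errors plus (lam / 2) times the sum of squared weights."""
    m = len(samples)
    data_term = sum(
        0.5 * sum((h - t) ** 2 for h, t in zip(forward(weights, biases, x), y))
        for x, y in samples
    ) / m
    decay_term = 0.5 * lam * sum(w ** 2 for W in weights for row in W for w in row)
    return data_term + decay_term


# Made-up data: two samples with two inputs and one target each.
samples = [([1.0, 2.0], [1.0]), ([0.0, 1.0], [0.0])]
weights = [[[1.0, -1.0]]]   # one layer mapping two inputs to one output
biases = [[0.0]]
value = cost(weights, biases, samples, lam=0.1)
```

Only the weights enter the decay term, not the biases, which matches the triple sum in Eq. (42.3); increasing `lam` penalizes large weights more strongly relative to the data-fitting term.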
Fig. 42.6 Multimodal based on deep learning (diagram: input D feeds Runs 1 to 4, which produce test outcomes 1 to 4 that are combined into a single test outcome)
42.4 Result and Discussion
The image segmentation profile must give a correct response for evaluating the lesion’s shape and fidelity. Skin cancer image segmentation is commonly utilized for clinical analysis in radiation therapy. Any over-segmentation can expose healthy tissue to radiation, while any under-segmentation will leave part of the melanoma tumour area untreated. When a significant local deviation occurs during the image segmentation process, it does not always occupy a large volume, but it does result in larger structural differences. The results of the experiments in this section show the feasibility and effectiveness of the various strategies. A comparative performance analysis of the methods with and without segmentation, carried out using MATLAB, is presented below.
With segmentation. In this part, three techniques are used for the comparison study: Local Binary Fitting Active Contour (LBF), Thresholding, and Region Growing Method (RGM). Compared with currently used techniques, these three methods are more advanced image segmentation algorithms, and automatic segmentation has the potential to replace manual segmentation paradigms that are subjective, time-consuming, and expensive. The comparative analysis indicates that thresholding is the best of the three segmentation techniques. Figure 42.7 shows sample images segmented using the three techniques (LBF, threshold segmentation, and the region growing method) for three representative test images. These techniques are used to segment the required region of the input skin image. Figure 42.8a represents the accuracy with segmentation. The techniques are plotted on the X-axis and the obtained accuracy values on the Y-axis. The accuracy values for the LBF, Threshold, and RGM techniques are 94, 95.8, and 95.1%, respectively. This comparison shows that the threshold technique gives better accuracy than the other segmentation techniques. Figure 42.8b represents the specificity with segmentation. The techniques are plotted on the X-axis and the obtained specificity values on the Y-axis. The specificity values for the LBF, Threshold, and RGM techniques are 96.7, 96.4, and 95.3%, respectively. This comparison shows that the LBF technique gives better specificity than the other segmentation techniques. Figure 42.9 shows the sensitivity with segmentation. The techniques are plotted on the X-axis and the obtained sensitivity values on the Y-axis. The sensitivity values for the LBF, Thresholding, and RGM techniques are 85.2, 92.3, and 91.3%, respectively. This comparison shows that the Threshold technique gives better sensitivity than the other segmentation techniques.
Fig. 42.7 Segmented images
Fig. 42.8 a Specificity and b accuracy with segmentation
Fig. 42.9 Sensitivity with segmentation
Without Segmentation. In this section, three techniques are used for the comparison study: MobileNet, ResNeXt101, and Multi-Model Ensemble. The performance metrics used for the comparison are accuracy, precision, and recall. Figure 42.10a represents the accuracy without segmentation. The techniques are plotted on the X-axis and the obtained accuracy values on the Y-axis. The accuracy values for the MobileNet, ResNeXt101, and Multi-model ensemble techniques are 95.9, 83.5, and 91.5%, respectively. This comparison shows that the MobileNet technique gives better accuracy than the other classifiers. Figure 42.10b represents the precision without segmentation. The techniques are plotted on the X-axis and the obtained precision values on the Y-axis. The precision values for the MobileNet, ResNeXt101, and Multi-model ensemble techniques are 97.2, 84.2, and 86%, respectively. This comparison shows that the MobileNet technique gives better precision than the other classifiers. Figure 42.11 shows the recall without segmentation. The techniques are plotted on the X-axis and the obtained recall values on the Y-axis.
Fig. 42.10 a Accuracy and b precision without segmentation
Fig. 42.11 Recall without segmentation
The recall values for the MobileNet, ResNeXt101, and Multi-model ensemble techniques are 97.2, 84.5, and 86.3%, respectively. This comparison shows that the MobileNet technique gives better recall than the other classifiers.
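The chapter reports accuracy, specificity, sensitivity, precision, and recall without giving their formulas. The Python sketch below uses the standard confusion-matrix definitions of these metrics; the function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the experiments reported here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from true/false positive and negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # sensitivity is the same quantity as recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Made-up counts for illustration only: 100 lesion images and 100 non-lesion images.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

Because sensitivity and recall are the same quantity, the sensitivity values reported with segmentation and the recall values reported without segmentation can be compared directly.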
42.5 Conclusion
This manuscript presents a computer-aided approach for the detection of skin disease. In the comparative analysis, the Local Binary Fitting Active Contour (LBF), Thresholding, and Region Growing Method (RGM) techniques are used to segment skin lesion images for the detection of melanoma skin cancer. Without segmentation, three techniques, MobileNet, ResNeXt101, and Multi-Model Ensemble, are used for skin cancer analysis. The performance of the skin cancer prediction methods with and without segmentation is then validated and analyzed. With segmentation, the best accuracy, specificity, and sensitivity values obtained are 95.8% (Thresholding), 96.7% (LBF), and 92.3% (Thresholding), so among these methods thresholding provides the better overall performance. Without segmentation, MobileNet gives the better outcome, with an accuracy of 95.9% and a precision of 97.2%, whereas the recall of the Multi-Model Ensemble is 86.3%. Therefore, the comparative analysis can be a good basis for improving the existing skin cancer prediction methods. In future work, these techniques will be investigated further for other image processing applications.
Acknowledgements The authors would like to thank the reviewers for all of their careful, constructive and insightful comments in relation to this work.
References
1. Nahar, V.K., Ford, M.A., Hallam, J.S., Bass, M.A., Hutcheson, A., Vice, M.A.: Skin cancer knowledge, beliefs, self-efficacy and preventative behaviors among North Mississippi Landscapers. Dermatol Res. Pract. 2013, 1–7 (2013)
2. Nahata, H., Singh, S.P.: Deep learning solutions for skin cancer detection and diagnosis. In: Machine Learning with Health Care Perspective, pp. 159–182. Springer, Cham (2020)
3. Mijwil, M.M.: Skin cancer disease images classification using deep learning solutions. Multimedia Tools Appl. 1–17 (2021)
4. Jeihooni, A.K., Moradi, M.: The effect of educational intervention based on PRECEDE model on promoting skin cancer preventive behaviors in high school students. J. Cancer Educ. 34(4), 796–802 (2019)
5. Jeihooni, A.K., Rakhshani, T.: The effect of educational intervention based on health belief model and social support on promoting skin cancer preventive behaviors in a sample of Iranian farmers. J. Cancer Educ. 34(2), 392–401 (2019)
6. Mohapatra, S., Abhishek, N.V.S., Bardhan, D., Ghosh, A.A., Mohanty, S.: Skin cancer classification using convolution neural networks. In: Advances in Distributed Computing and Machine Learning, pp. 433–442. Springer, Singapore (2021)
7. Maxwell, A., Li, R., Yang, B., Weng, H., Aihua, O., Hong, H., Zhou, Z., Gong, P., Zhang, C.: Deep learning architectures for multi-label classification of intelligent health risk prediction. BMC Bioinf. 18(14), 121–131 (2017)
8. Han, S.S., Park, I., Chang, S.E., Lim, W., Kim, M.S., Park, G.H., Chae, J.B., Huh, C.H., Na, J.I.: Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J. Investig. Dermatol. 140(9), 1753–1761 (2020)
9. Kadampur, M.A., Al Riyaee, S.: Skin cancer detection: applying a deep learning-based model driven architecture in the cloud for classifying dermal cell images. Inf. Med. Unlocked 18, 100282 (2020)
10. Pacheco, A.G.C., Krohling, R.A.: The impact of patient clinical information on automated skin cancer detection. Comput. Biol. Med. 116, 103545 (2020)
11. Höhn, J., Hekler, A., Krieghoff-Henning, E., Kather, J.N., Utikal, J.S., Meier, F., Gellrich, F.F. et al.: Integrating patient data into skin cancer classification using convolutional neural networks: systematic review 23(7), e20708 (2021)
12. Mittal, H., Saraswat, M.: An optimum multi-level image thresholding segmentation using nonlocal means 2D histogram and exponential Kbest gravitational search algorithm. Eng. Appl. Artif. Intell. 71, 226–235 (2018)
13. Huang, L., Zhao, Y.G., Yang, T.J.: Skin lesion segmentation using object scale-oriented fully convolutional neural networks. SIViP 13(3), 431–438 (2019)
14. Ashraf, R., Kiran, I., Mahmood, T., Butt, A.U.R., Razzaq, N., Farooq, Z.: An efficient technique for skin cancer classification using deep learning. In: 2020 IEEE 23rd International Multitopic Conference (INMIC), pp. 1–5. IEEE (2020)
15. Maron, R.C., Schlager, J.G., Haggenmüller, S., von Kalle, C., Utikal, J.S., Meier, F., Gellrich, F.F., Hobelsberger, S., Hauschild, A., French, L., Heinzerling, L.: A benchmark for neural network robustness in skin cancer classification. Eur. J. Cancer 155, 191–199 (2021)
16. Yu, H., He, F., Pan, Y.: A novel region-based active contour model via local patch similarity measure for image segmentation. Multimedia Tools Appl. 77(18), 24097–24119 (2018)
17. Wadhwa, A., Bhardwaj, A., Verma, V.S.: A review on brain tumor segmentation of MRI images. Magn. Reson. Imaging 61, 247–259 (2019)
18. Punitha, S., Amuthan, A., Joseph, K.S.: Benign and malignant breast cancer segmentation using optimized region growing technique. Future Comput. Inf. J. 3(2), 348–358 (2018)
Chapter 43
Security for Software Defined Vehicular Networks
P. Golda Jeyasheeli, J. Deepika, and R. R. Sathya
Abstract A Vehicular Ad-hoc Network (VANET) comprises moving and stationary vehicles connected by a wireless network. VANETs play an important role in safe driving, emergency response, and navigation, and they are part of the Intelligent Transport System. There are two main communication modes in a VANET: Vehicle-to-Vehicle (V2V) communication and Vehicle-to-Infrastructure (V2I) communication. V2V is communication between the vehicles in the network, and V2I is communication between vehicles and the roadside infrastructure. The main components of the VANET are the Roadside Unit (RSU), the On-Board Unit (OBU), and a Trusted Authority (TA). The RSU is the fixed unit that sends and receives information from the TA and the OBU; it therefore acts as the communication interface between the vehicles and the Trusted Authority. A VANET includes nodes that are highly mobile, and it has a dynamic topology. The devices share sensitive information with one another, and VANETs are vulnerable to different types of security attacks. Hence, secure communication must be enabled. Security is provided by a message encryption mechanism and an Intrusion Detection System (IDS). Messages are encrypted and decrypted using ECC. The IDS proposed in this work uses a Support Vector Machine (SVM) classifier trained on the NSL-KDD dataset. This IDS can be placed in the controller connected to the RSUs, where necessary actions can be taken if attacks are detected. The performance of the classifier is evaluated for different splits of the dataset, and a comparative analysis is drawn from all the splits.
P. G. Jeyasheeli (B) · J. Deepika · R. R. Sathya Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu 626005, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_43
43.1 Introduction
The Internet of Things (IoT) is now prominent in a wide range of fields. Its application to transportation and machinery led to the development of the Industrial Internet of Things (IIoT). One of the promising areas of IoT is the VANET, which is used in the Intelligent Transport System (ITS) to help resolve traffic issues [11] such as accidents and congestion. An ITS requires efficient security features [4] that are also dynamic in nature [1]. To make the VANET more efficient and dynamic, Software Defined Networking (SDN) is used. SDN aims to make a network more dynamic and programmable; its key concepts are automation and flexibility, and it separates the data plane from the control plane. Using SDN in a VANET (1) provides flexibility and reduces the burden on the network, (2) helps in avoiding beacon messages and (3) helps in better routing. The VANET consists of Roadside Units (RSUs) and vehicles. The RSUs are linked to the SDN controller, which facilitates the sharing of global network data. Apart from making the network flexible, it is very important to secure it, because human lives are at risk [2]. It is also critical to ensure the integrity of the messages sent to the RSUs, since attackers can tamper with messages in transit. To ensure integrity, the message is encrypted and decrypted using ECC. The RSUs are capable of authenticating the messages they receive. Whatever authentication method is used, however, some attackers may still find a way to bring down the network, for example by disguising themselves as authenticated users, entering the VANET and disrupting it. To prevent attacks of this type, a reactive mechanism such as an Intrusion Detection System is used. Many works have been dedicated to showing the importance of security in VANETs. A VANET enhances safety while driving but, being a wireless network, it is prone to many security issues.
43.2 Related Works
Any unauthorized activity can compromise the integrity and confidentiality of the information shared within the network. Dua et al. [5] therefore worked on a secure message communication protocol among vehicles [10] in a smart city using the Elliptic Curve Cryptography (ECC) technique. They also introduced the concept of using cluster heads for communication. Because of the changing network topology, cluster formation is regarded as a difficult task in VANETs. In their scheme, vehicles with the same relative speed and direction are made cluster members, and the cluster head is chosen based on a stability measure of the average speed.
Shao et al. [12] introduced a new authentication protocol for VANETs in a decentralized group model, using a new group signature scheme that enables threshold authentication. Because messages are received from anonymous vehicles, it is difficult for a vehicle to trust them, so this scheme allows receivers to accept only messages that have been confirmed by a certain number of vehicles. In a decentralized model, the VANET is separated into several groups, each headed by a particular RSU instead of a central authority [6]. These two approaches cannot be combined directly because of the traceability and distinguishability requirements on messages; the new group signature scheme was introduced to integrate the decentralized model with threshold authentication.
Zhong et al. [14] proposed a conditional privacy-preserving authentication scheme for secure service provision in VANETs, which uses pseudonym-based signatures. VANETs can use multicast communication to support security, but doing so poses some challenges. The system consists of three physical components: the TA, RSUs and vehicles. The TA is in charge of the entire VANET system, and the vehicles are divided into different geographical areas, each with a TA [9]. The RSUs are placed along the roadside and serve as a connecting link between the TA and the vehicles. Wireless communication via V2V and vehicle-to-RSU (V2R) links is supported by the RSU. A bilinear pairing technique was used to provide message authenticity. The authors provided a comparative analysis of the existing and proposed systems.
Arul et al. [3] introduced a security protocol called quantum key GRID, an authentication and key agreement scheme that can be used in 5G scenarios for IoT devices. It was introduced because encryption algorithms such as RSA and ECC are vulnerable to attacks by quantum computers. To ensure a secure quantum key hierarchy, a simple key generation and key management scheme was proposed.
Li et al. [7] explored authentication with privacy preservation and non-repudiation in VANETs and suggested the ACPN framework, which is based on public key cryptography.
Maglaras [8] developed a distributed Intrusion Detection System for VANETs that can be installed on RSUs. The KOCSVM module serves as the foundation of the system. It is the primary detection module, combining a one-class SVM (OCSVM) classifier with an RBF kernel and a recursive k-means clustering module. The primary function of the IDS is to detect anomalies efficiently: it examines network traffic and reports suspicious activity.
43.3 Methodology
An SDN-based VANET is part of an Intelligent Transport System (ITS) and consists of a group of moving or stationary vehicles connected in a wireless network. VANETs were developed to improve safety, for example through collision avoidance. The vehicles can communicate with other vehicles and with the Roadside Units (RSUs) in the network. The vehicles are treated as nodes in the VANET architecture, and these mobile nodes communicate with each other. VANETs have a frequently changing topology because of the high node mobility. They are prone to many attacks, Denial of Service (DoS) being the most prominent, and these attacks can put lives at risk. Therefore, there is a need for security in VANETs (Fig. 43.1). The aim of this work is to provide message integrity and to detect attacks carried out by intruders. The proposed work consists of two modules: message encryption and an Intrusion Detection System. Attack detection is done in two different ways:
1. An IDS is implemented using machine learning with an SVM classifier [13].
2. Packets are analysed and attacks are detected.
The system diagram for the work is shown in Fig. 43.2.
Fig. 43.1 VANET architecture
Fig. 43.2 System diagram
43.3.1 Incorporating SDN in VANET
A software defined vehicular network (SDVN) decouples the data plane from the control plane. An entity known as the controller, which resides in the control plane, monitors the networking devices and controls the overall performance of the network. The data plane is responsible for data forwarding and is equipped with forwarding devices as well as wired and wireless communication links. A standardized interface, such as OpenFlow, is used for communication between the two planes. A Vehicular Ad-hoc Network is a subclass of the Mobile Ad-hoc Network (MANET). VANETs are deployed on roads, where vehicles act as mobile nodes that can communicate with Roadside Units and with other vehicles. Roadside Units are physical entities fixed at the roadside. To improve the overall efficiency of the network, SDN is applied to the VANET. In an SDVN, the vehicles are the data plane entities, and the local RSUs and global SDN controllers are the control plane entities. The controller is in charge of routing, gathering information and providing services to end users. It also gives the application plane an up-to-date picture of the network, which aids in the management of services such as security and access control.
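The separation of the two planes described above can be illustrated with a small sketch. It is a toy model only: the class names, the dictionary-based flow table and the `packet_in` method are illustrative assumptions, not the OpenFlow API and not the authors' implementation.

```python
class Controller:
    """Control plane: holds the global view and decides routes."""

    def __init__(self, routes):
        # Hypothetical global view: destination name -> output port.
        self.routes = routes

    def packet_in(self, device, destination):
        """Called on a table miss; installs a rule if a route is known."""
        port = self.routes.get(destination)
        if port is not None:
            device.flow_table[destination] = port
        return port


class ForwardingDevice:
    """Data plane: forwards by table lookup only."""

    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}

    def forward(self, destination):
        port = self.flow_table.get(destination)
        if port is None:
            # Table miss: defer the decision to the control plane.
            port = self.controller.packet_in(self, destination)
        return port  # None means the packet is dropped


controller = Controller({"rsu-1": 1, "rsu-2": 2})
vehicle = ForwardingDevice(controller)
first = vehicle.forward("rsu-2")    # miss: the controller installs a rule
second = vehicle.forward("rsu-2")   # hit: served from the local flow table
unknown = vehicle.forward("rsu-9")  # no route known anywhere
```

The forwarding device never computes a route itself; all routing knowledge lives in the controller, which is the property the SDVN design relies on.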
43.3.2 Message Encryption
The vehicle and the RSU communicate with each other. This communication is initiated when the vehicle enters the range of the RSU. The vehicle shares information about traffic jams, accidents, collision warnings and so on. Attackers may intrude into the network and fabricate or alter the messages that the vehicle sends, which can lead to accidents and put lives at risk. Every message sent to the RSU must therefore be delivered without modification, so that its integrity is ensured. To address this problem, ECC encryption is used.
Why ECC over other encryption schemes?
1. ECC is one of the strongest public-key cryptosystems; it is very hard for an attacker to break because the underlying mathematical problem is harder to solve for a given key size.
2. The ECC key size is small.
3. Key generation is faster.
4. Compared with other well-known cryptosystems such as RSA, ECC provides comparable security with a shorter key length, placing less load on the network and requiring less computing power.
5. A 256-bit ECC key offers security similar to a 3072-bit RSA key and is claimed to be about 10,000 times more secure than a 2048-bit RSA key.
The message is encrypted using ECC in the vehicle and later decrypted at the RSU. Because ECC does not provide encryption directly, a hybrid approach is used: the Elliptic Curve Diffie–Hellman (ECDH) key exchange scheme produces a shared secret key, which is then used for symmetric encryption and decryption of the data. For this algorithm, it is initially assumed that a key pair is generated from a random private key using the curve equation. In our network, the certificate authority generates the public and private keys and distributes them to the vehicles and the RSU. With the help of these keys, message encryption and decryption are performed. By employing encryption, message integrity is ensured.
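The ECDH step of the hybrid scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a textbook toy curve over a 17-element field rather than a 256-bit curve, fixed hypothetical private keys, SHA-256 as the key derivation step, and an HMAC tag as a stand-in for the integrity check (the chapter does not name the symmetric cipher it uses).

```python
import hashlib
import hmac

# Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1).
# Illustration only: a real deployment would use a standard 256-bit curve.
P_MOD, A = 17, 2
G = (5, 1)


def point_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) is the point at infinity
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def derive_key(shared_point):
    """Hash the shared x-coordinate into a 32-byte symmetric key."""
    return hashlib.sha256(shared_point[0].to_bytes(32, "big")).digest()


# Hypothetical private keys; in the chapter they come from the certificate authority.
vehicle_private, rsu_private = 3, 10
vehicle_public = scalar_mult(vehicle_private, G)
rsu_public = scalar_mult(rsu_private, G)

# Each side combines its own private key with the other side's public key.
vehicle_key = derive_key(scalar_mult(vehicle_private, rsu_public))
rsu_key = derive_key(scalar_mult(rsu_private, vehicle_public))

# Stand-in integrity check: the vehicle tags the message, the RSU verifies the tag.
message = b"A traffic has occurred"
tag = hmac.new(vehicle_key, message, hashlib.sha256).digest()
accepted = hmac.compare_digest(tag, hmac.new(rsu_key, message, hashlib.sha256).digest())
```

Both sides arrive at the same key without ever transmitting it, and the key can then feed whichever symmetric cipher is chosen for the message itself.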
43.3.3 Detecting Attacks
Attackers may sometimes pose as authentic users and perform attacks after joining the network. An Intrusion Detection System can be used to detect such attacks.
43.3.3.1 IDS Using SVM
An Intrusion Detection System is used to detect attackers. In our module, the IDS is implemented using a Support Vector Machine (SVM) classifier. SVM is a supervised machine learning algorithm that can be used to solve classification or regression problems.
Why is SVM preferred?
• Among classification algorithms such as random forest and KNN, SVM is preferred here for its accuracy and because it uses less memory than many other classifiers.
• Its generalization error is low compared with other classifiers, and its prediction is fast.
• SVM is less prone to overfitting.
• SVM generalizes well and can perform non-linear classification.
• SVM is fast in processing and has high detection performance.
Steps to Build a Support Vector Machine (SVM) Classification Model
1. Data Pre-processing
Pre-processing is the process of cleaning raw data and formatting it so that it can be used by machine learning models. Pre-processing also improves the accuracy of the model. After the data are collected, the following procedures are carried out.
Finding Missing Values
After loading the dataset, the next step is to find its missing values, that is, values that are not present for some variables. Our dataset contains no missing values. If there are missing values, either the corresponding record is removed or the missing value is replaced with the mean, median or most frequent value of the feature, depending on the nature of the feature.
Encoding Categorical Values
Some data in the dataset are non-numerical and need to be converted into numerical values before they are used. NSL-KDD has three non-numerical features: protocol_type, service and flag. Each feature has a list of categories, which are numbered starting from 0. For example, protocol_type has 3 categories, namely tcp, udp and icmp, which are numbered 0, 1 and 2, respectively.
Feature Scaling
The values in the dataset are not in a uniform range. Feature scaling standardizes the independent features. One such method is Min–Max normalization, which scales the values to the range 0–1. The formula used is
$Z_{\text{new}} = \dfrac{Z_i - \min(Z)}{\max(Z) - \min(Z)}$
2. Dataset Splitting
After pre-processing, cross-validation is done. Cross-validation is the process in which the dataset is divided into train and test sets. Splitting the dataset is one of the important steps in any modelling task. The model is trained on the training set and then evaluated on the test set. A train-test split helps to detect overfitting or underfitting.
3. Training
The next step is to train the model with the training dataset. Model training is the process of identifying and learning good values for the parameters of the model. In supervised learning, the machine learning algorithm builds a model that learns from the labelled training set.
SVM
Support Vector Machine (SVM) is a classification-based supervised learning algorithm. A simple linear SVM classifier works by drawing a line (decision boundary) between two classes, so that one side of the line represents one category and the other side represents the other. The hyperplane is the decision boundary that separates the two classes. Its equation is $w \cdot x + b = 0$, where $w$ is the vector normal to the hyperplane. Let $s$ denote the offset of the boundary from the origin measured along $w$, so that $s = -b$. The SVM takes the dot product between a data point $x$ and the vector $w$, and the points are classified according to the following rules:
$x \cdot w = s$ (situated on the boundary),
$x \cdot w > s$ (situated on the right side of the plane),
$x \cdot w < s$ (situated on the left side of the plane).
4. Testing
The final step is to test the model. Testing is the process in which the performance of a trained model is evaluated on a separate testing dataset. The hold-out method and the k-fold cross-validation method are used in this module.
• Hold-out method
In this method, the entire dataset is divided into train and test sets in different proportions, such as 90–10, 80–20 and 70–30. The training set must always be larger than the test set. The model is evaluated for all three splits.
• K-fold cross-validation
In this technique, the dataset is divided into k folds of nearly equal size, where k denotes the number of groups into which the dataset is divided. Here k is taken to be 10. For each fold, the model is trained on the remaining k−1 folds and then tested on the held-out fold. This procedure is repeated until each fold has served as the test set. The evaluation is carried out by taking the average of all the fold accuracies. This is referred to as the cross-validation accuracy, and it serves as a performance metric.
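The four steps above can be sketched end to end in plain Python. The sketch rests on several assumptions that are not in the chapter: a linear kernel, hinge-loss subgradient descent as the training procedure, hypothetical hyperparameters, and ten made-up toy records. The chapter does not state which SVM library, kernel or settings were used, and it takes k = 10 on the full dataset, whereas the toy example uses k = 5 on ten records.

```python
def encode_categories(values, categories):
    """Map each category to its position in `categories` (0, 1, 2, ...)."""
    index = {name: i for i, name in enumerate(categories)}
    return [index[v] for v in values]


def min_max_scale(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Hinge-loss subgradient descent for a linear SVM (labels are +1 / -1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # regularisation shrinks w
            if margin < 1:  # inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def cross_validate(X, y, k):
    """Average test accuracy over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        w, b = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(w, b, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Hypothetical toy records: (src_bytes, protocol_type, label); +1 = anomaly, -1 = normal.
records = [
    (900, "icmp", 1), (20, "tcp", -1), (950, "icmp", 1), (35, "tcp", -1),
    (870, "udp", 1), (10, "tcp", -1), (990, "icmp", 1), (50, "udp", -1),
    (910, "udp", 1), (25, "tcp", -1),
]
src_bytes = min_max_scale([r[0] for r in records])
protocol = min_max_scale(encode_categories([r[1] for r in records], ["tcp", "udp", "icmp"]))
X = [list(pair) for pair in zip(src_bytes, protocol)]
y = [r[2] for r in records]
cv_accuracy = cross_validate(X, y, k=5)
```

A hold-out evaluation follows the same pattern with a single split, for example training on the first 70% of the shuffled records and testing on the rest.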
43.3.3.2 IDS by Analysing Packets
This Intrusion Detection System works by analysing the incoming packets. A network topology consisting of 4 hosts is simulated and the packet transmission is captured. Attacks are intentionally performed from one host, and these attack packets are captured along with normal packets.
Attacks performed
UDP Flood
A UDP flood is a type of DoS attack. The attacker bombards the targeted server with a huge volume of UDP packets. The aim of the attacker is to overwhelm the target's ability to process and respond, so that the server eventually becomes unreachable by other clients.
Smurf Attack
A Smurf attack is a form of Distributed Denial of Service (DDoS) attack. The attacker overloads the targeted server with Internet Control Message Protocol (ICMP) packets that carry a spoofed IP address.
SYN Flood
A SYN flood is a form of DoS attack in which an attacker continuously initiates connections to the server without finalizing them. The attacker does not send the acknowledgement that would complete the connection, so the server waits too long and becomes unavailable to other users.
These attacks are performed and the packets are captured. The captured packets serve as a dataset, and the packets are then classified as attack or normal, again using the SVM method.
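Before the captured packets can be given to a classifier, each row has to be turned into numbers. One possible way to do this is sketched below. The choice of features (inter-arrival gap, protocol code, packet length) and the numeric protocol encoding are assumptions made for illustration; the chapter lists only the captured columns and does not say how they were encoded. The sample rows are taken from the UDP flood capture reported in Table 43.13.

```python
import csv
import io

PROTOCOL_CODES = {"TCP": 0, "UDP": 1, "ICMP": 2}  # hypothetical numeric encoding


def packet_features(csv_text):
    """Turn exported packet rows into [inter-arrival gap, protocol code, length]."""
    features = []
    previous_time = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        time = float(row["Time"])
        # Gap to the previous packet; floods show very small gaps.
        gap = 0.0 if previous_time is None else time - previous_time
        previous_time = time
        features.append([gap, PROTOCOL_CODES[row["Protocol"]], int(row["Length"])])
    return features


# First rows of the UDP flood capture (Table 43.13), in CSV form.
sample = """No,Time,Source,Destination,Protocol,Length,Info
1,0,10.0.0.9,10.0.0.2,UDP,242,55457 > 0 Len=200
2,3.85E-05,10.0.0.9,10.0.0.2,UDP,242,55458 > 0 Len=200
3,5.06E-05,10.0.0.9,10.0.0.2,UDP,242,55459 > 0 Len=200
"""
rows = packet_features(sample)
```

Each resulting row is a fixed-length numeric vector, which is the form an SVM classifier expects as input.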
43.4 Results and Discussion
43.4.1 Experimental Setup
Message encryption is carried out in Mininet-WiFi. For the Intrusion Detection System using SVM, the dataset used is NSL-KDD, which contains records of network traffic. Each record has 41 attributes, and there are 194,340 records in total. The last column of each record is a label indicating whether the record is an attack or not. The 41 features are listed in Table 43.1. In addition, a network is simulated and packets are captured; this information is saved and exported as csv files, which act as a second dataset. This dataset is given as input to the SVM classifier, which classifies the packets as attack packets or not. The features obtained from the captured packets are described in Table 43.2. For the IDS based on packet analysis, the network is simulated using Mininet-WiFi and contains 4 hosts. Attacks are launched from one host, and the attack packets are generated using hping3, a packet generator tool for sending customized packets. The packets are captured using Wireshark, a packet capturing tool that can save packet information, and are then analysed using filters.
43.4.2 Results
A network of 4 stations is created, and communication between two hosts is considered. The client encrypts the message and sends it to the server. Table 43.3 shows the original and the encrypted message. On receiving the encrypted message, the server decrypts it using its own key. Table 43.4 shows the received ciphertext, nonce, authentication tag and private key, together with the decrypted message.
Table 43.1 NSL-KDD dataset features

| No | Feature | No | Feature |
|----|---------|----|---------|
| 1 | duration | 22 | count |
| 2 | protocol_type | 23 | srv_count |
| 3 | service | 24 | serror_rate |
| 4 | flag | 25 | srv_serror_rate |
| 5 | src_bytes | 26 | rerror_rate |
| 6 | dst_bytes | 27 | srv_rerror_rate |
| 7 | land | 28 | same_srv_rate |
| 8 | wrong_fragment | 29 | diff_srv_rate |
| 9 | urgent | 30 | srv_diff_host_rate |
| 10 | hot | 31 | dst_host_count |
| 11 | num_failed_logins | 32 | dst_host_srv_count |
| 12 | logged_in | 33 | dst_host_same_srv_rate |
| 13 | num_compromised | 34 | dst_host_diff_srv_rate |
| 14 | root_shell | 35 | dst_host_same_src_port_rate |
| 15 | su_attempted | 36 | dst_host_srv_diff_host_rate |
| 16 | num_root | 37 | dst_host_serror_rate |
| 17 | num_file_creations | 38 | dst_host_srv_serror_rate |
| 18 | num_shells | 39 | dst_host_rerror_rate |
| 19 | num_access_files | 40 | dst_host_srv_rerror_rate |
| 20 | num_outbound_cmds | 41 | is_host_login |
| 21 | is_guest_login | | |
Table 43.2 Captured dataset features

| No | Feature |
|----|---------|
| 1 | Time |
| 2 | Source |
| 3 | Destination |
| 4 | Protocol |
| 5 | Length |
| 6 | Info |
Table 43.3 Message encryption
Original message: A traffic has occurred
Encrypted message: \xc7\x96\x95\xd1\x96E\xc7\xf0\xd4\x1f|\x85g\xbd\xe4\x85}|\xc0\x8b\x00

Table 43.4 Message decryption
Encrypted cypher text received: \xc7\x96\x95\xd1\x96E\xc7\xf0\xd4\x1f|\x85g\xbd\xe4\x85}|\xc0\x8b\x00
Nonce key received: z\xeb;<\x0f)\x07!\x1c\xf9\x17\xde9\x80\xa3\x9b
Authentication tag received: \xe0\xff6\x1f\xe8\x8b\xa2\xea%lM\x93\nQ\x08\xf3
Private key received: 10444868193361392142672108172911160578830299522654680798224102242411015852616
Cypher text private key received: 2417555726063437496723113225742559703433170870212075195441440366864182550563
Decrypted msg: A traffic has occurred
43.4.2.1 Intrusion Detection System
Using SVM Classifier
The performance is evaluated by calculating the evaluation metrics of the machine learning model: accuracy, recall, precision, confusion matrix and F1-score. Here tp, tn, fp and fn denote the numbers of true positives, true negatives, false positives and false negatives.
Accuracy
Accuracy is the sum of true positives and true negatives divided by the total number of samples.
Accuracy = (tp + tn)/(tp + tn + fp + fn)
Recall
Recall is the ratio of correctly predicted positive samples to all actual positive samples in the dataset.
Recall = tp/(tp + fn)
Precision
Precision is the number of true positives divided by the total number of positive predictions, i.e. the number of true positives plus the number of false positives.
Precision = tp/(tp + fp)
F1-Score
The F1-score is the harmonic mean of recall and precision. As a result, this score considers both false positives and false negatives.
F1-Score = 2 × (Precision × Recall)/(Precision + Recall)
NSL-KDD Dataset
The dataset is divided into training and testing sets in the ratio 70–30%. The model's evaluation metrics are given in Table 43.5.
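The four metrics used in this subsection can be computed directly from the confusion-matrix counts, as in the short sketch below. The function name and the example counts are hypothetical and are not taken from the experiments reported here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, tn=9, fp=2, fn=1)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is reported alongside both.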
Table 43.5 Evaluation metrics for 70–30 split

| | Precision | Recall | f1-score |
|---------|-----------|--------|----------|
| Normal | 0.91 | 0.94 | 0.93 |
| Anomaly | 0.94 | 0.91 | 0.92 |

The dataset is divided into training and testing sets in the ratio 80–20%. The model's evaluation metrics are given in Table 43.6.

Table 43.6 Evaluation metrics for 80–20 split

| | Precision | Recall | f1-score |
|---------|-----------|--------|----------|
| Normal | 0.90 | 0.94 | 0.92 |
| Anomaly | 0.94 | 0.90 | 0.92 |

The dataset is divided into training and testing sets in the ratio 90–10%. The model's evaluation metrics are given in Table 43.7.

Table 43.7 Evaluation metrics for 90–10 split

| | Precision | Recall | f1-score |
|---------|-----------|--------|----------|
| Normal | 0.92 | 0.96 | 0.94 |
| Anomaly | 0.95 | 0.91 | 0.93 |

The dataset is also tested using the k-fold cross-validation technique with k = 10. The results obtained are shown in Table 43.8.

Table 43.8 Evaluation metrics for k-fold (k = 10)

| | Precision | Recall | f1-score |
|---------|-----------|--------|----------|
| Normal | 0.95 | 0.97 | 0.96 |
| Anomaly | 0.97 | 0.94 | 0.95 |

Table 43.9 compares the training and testing accuracies obtained for the various dataset splits.
Table 43.9 Comparative analysis of training and testing accuracies

| | 70–30 split | 80–20 split | 90–10 split | K-fold (K = 10) |
|-------------------|-------------|-------------|-------------|-----------------|
| Training accuracy | 0.9343 | 0.934 | 0.935 | 0.95 |
| Testing accuracy | 0.9345 | 0.9352 | 0.9373 | 0.96 |

Captured Packet Dataset
The packets for the simulated attacks are captured using Wireshark. They are analysed and the result is shown in Table 43.10. The training and testing accuracies for the captured dataset are shown in Table 43.11.

Table 43.10 Evaluation metrics for captured packet dataset

| | Precision | Recall | f1-score |
|---------|-----------|--------|----------|
| Normal | 1.0 | 0.95 | 0.98 |
| Anomaly | 0.97 | 1.0 | 0.99 |

Table 43.11 Training and testing accuracy

| Training accuracy | 1.0 |
| Testing accuracy | 0.929 |
Analysis of Captured Dataset
Both normal and attack packets are captured using Wireshark, and the analysis is done using filters, which separate the attack packets from the normal packets. Table 43.12 lists the normal and attack packets captured using Wireshark, and Table 43.13 lists the packets captured for the UDP flood.

Table 43.12 Normal and attack packets

| No | Time | Source | Destination | Protocol | Length | Info |
|----|------|--------|-------------|----------|--------|------|
| 1 | 0 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55457 > 0 Len=200 |
| 2 | 3.85E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55458 > 0 Len=200 |
| 3 | 988.5332 | 10.0.0.1 | 10.0.0.2 | ICMP | 98 | Echo (ping) request id=0x06e8, seq=1/256, ttl=64 (reply in 26) |
| 4 | 988.5343 | 10.0.0.2 | 10.0.0.1 | ICMP | 98 | Echo (ping) reply id=0x06e8, seq=1/256, ttl=64 (request in 25) |
| 5 | 989.5334 | 10.0.0.1 | 10.0.0.2 | ICMP | 98 | Echo (ping) request id=0x06e8, seq=2/512, ttl=64 (reply in 28) |
| 6 | 0 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48167 > 0 [ ] Seq=1 Win=512 Len=0 |
| 7 | 4.92E-05 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48168 > 0 [ ] Seq=1 Win=512 Len=0 |

Table 43.14 shows the captured packets for the Smurf attack. The protocol used is ICMP; large ping requests are sent and the server is not able to respond, so 'no response found' is shown. The sequence numbers are also not in order. Filter used: icmp and data.len > 48.
Table 43.15 shows the captured packets for the SYN flood. The attack packets use the TCP protocol and the destination is the same for all packet transmissions. The flag is set to SYN and the length of the packet is larger than usual. Filter used: tcp.flags.syn == 1 and tcp.flags.ack == 1.
Table 43.13 UDP flood

| No | Time | Source | Destination | Protocol | Length | Info |
|----|------|--------|-------------|----------|--------|------|
| 1 | 0 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55457 > 0 Len=200 |
| 2 | 3.85E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55458 > 0 Len=200 |
| 3 | 5.06E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55459 > 0 Len=200 |
| 4 | 6.12E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55460 > 0 Len=200 |
| 5 | 7.44E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55461 > 0 Len=200 |
| 6 | 8.65E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55462 > 0 Len=200 |
| 7 | 9.63E-05 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55463 > 0 Len=200 |
| 8 | 0.000107 | 10.0.0.9 | 10.0.0.2 | UDP | 242 | 55464 > 0 Len=200 |

Table 43.14 SMURF attack

| No | Time | Source | Destination | Protocol | Length | Info |
|----|------|--------|-------------|----------|--------|------|
| 1 | 0 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48167 > 0 [ ] Seq=1 Win=512 Len=0 |
| 2 | 4.92E-05 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48168 > 0 [ ] Seq=1 Win=512 Len=0 |
| 3 | 7.64E-05 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48169 > 0 [ ] Seq=1 Win=512 Len=0 |
| 4 | 0.000103 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48170 > 0 [ ] Seq=1 Win=512 Len=0 |
| 5 | 0.000129 | 10.0.0.1 | 10.0.0.2 | TCP | 54 | 48171 > 0 [ ] Seq=1 Win=512 Len=0 |

Table 43.15 SYN flood

| No | Time | Source | Destination | Protocol | Length | Info |
|----|------|--------|-------------|----------|--------|------|
| 1 | 0 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5514 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 2 | 4.41E-05 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5515 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 3 | 0.000565 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5516 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 4 | 0.00061 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5517 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 5 | 0.000647 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5518 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 6 | 0.000682 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5519 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 7 | 0.000717 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5520 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
| 8 | 0.000753 | 10.0.0.2 | 10.0.0.1 | TCP | 54 | 0 > 5521 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
43.5 Conclusion
We proposed ways of making the software defined vehicular network more secure. Lack of security can put lives at risk. With message encryption and an Intrusion Detection System in place, attackers find it more difficult to enter the network. Message encryption using ECC is a powerful method, and it is applied on the communicating nodes to ensure integrity. An Intrusion Detection System is implemented using an SVM classifier to detect attacks, and the accuracy of the results is discussed. Denial of Service attacks are simulated in a network, and the packets are captured and analysed using Wireshark.
References
1. Archanaa, G.J., Venittaraj, R.: Reliable message delivery using digital signature in VANETs. Int. J. Eng. Sci. Comput. (2013)
2. Archanaa, G.J., Venittaraj, R.: A secure communication for clustered RSU in VANETs. Int. J. Innov. Res. Sci. Eng. Technol. 3 (2014)
3. Arul, R., et al.: A quantum-safe key hierarchy and dynamic security association for LTE/SAE in 5G scenario. IEEE Trans. Indust. Inf. 16, 681–690 (2020)
4. Catak, F.O.: Two-layer malicious network flow detection system with sparse linear model-based feature selection. J. Natl. Sci. Found. Sri Lanka 46, 601–612 (2018)
5. Dua, A., et al.: Secure message communication protocol among vehicles in smart city. IEEE Trans. Veh. Technol. 67, 4359–4373 (2018)
6. Lai, C., et al.: Secure group communications in vehicular networks: a software-defined network-enabled architecture and solution. IEEE Veh. Technol. Mag. 12, 4–49 (2017)
7. Li, J., Lu, H., Guizani, M.: ACPN: a novel authentication framework with conditional privacy-preservation and non-repudiation for VANETs. IEEE Trans. Parallel Distrib. Syst. 26, 938–948 (2015)
8. Maglaras, L.A.: A novel distributed intrusion detection system for vehicular ad hoc networks. Int. J. Adv. Comput. Sci. Appl. 6, 101–106 (2015)
9. Lakshmanan, M., Natarajan, S.K.: Security enhancement in in-vehicle controller area networks by electronic control unit authentication. Romanian J. Inf. Sci. Technol. 22, 228–243 (2019)
10. Raja Priya, V.K., Priyadharsini, S.: Smart vehicles for urban sensing based on content-centric approach. Int. J. Ad Hoc Ubiquit. Comput. 29 (2018)
11. Mishra, R., et al.: VANET: security, issues and challenges. IEE Trans. Electr. Electron. (2016)
12. Shao, J., et al.: A threshold anonymous authentication protocol for VANETs. IEEE Trans. Veh. Technol. 65, 1711–1720 (2016)
13. Siddiquee, M.S.A., Udagepola, K.P.: Use of artificial neural networks and support vector machines to predict lacking traffic survey data. J. Natl. Sci. Found. Sri Lanka 45, 239–246 (2017)
14. Zhong, H., et al.: Efficient conditional privacy preserving and authentication scheme for secure service provision in VANET. Tsinghua Sci. Technol. 21, 620–629 (2016)
Chapter 44
Design of Multiband Antenna with Staircase Effect in Ground for Multiband Application Sonali Kumari, Y. K. Awasthi, and Dipali Bansal
Abstract This paper presents a simple annular ring patch with a staircase-like ground plane for a multiband antenna. The antenna is suitable for wireless and C-band applications. The overall size of the antenna is (36 × 39 × 1.6) mm³. Three bands have been achieved, (2.33–3.03) GHz, (3.39–4.23) GHz and (5.74–9) GHz, with resonant frequencies of 2.68, 3.40 and 6.63 GHz. A peak gain of 5.51 dB has been achieved at 2.68 GHz. High Frequency Structure Simulator (HFSS) software has been used to design the antenna. An FR4 substrate has been used, which makes the antenna cost effective. A microstrip feedline is connected at the middle of the patch. The effect of changing the size of the ground plane, which makes the antenna more efficient, is shown in the figures and results. Simply cutting the edge corners of the ground plane converts the antenna from dual band to multiband.
S. Kumari (B) · Y. K. Awasthi Manav Rachna International Institute of Research and Studies, Faridabad, India e-mail: [email protected] Y. K. Awasthi e-mail: [email protected] D. Bansal DSEU Okhla-II Campus, New Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_44
44.1 Introduction
In communication systems, multiband devices (such as dual-band, tri-band, quad-band and penta-band devices) support multiple radio-frequency bands [1, 2]. In the last few years, wireless communication has developed rapidly. Various design goals are considered while designing a multiband antenna, such as bandwidth enhancement, selection of the feeding mechanism, proper selection of the substrate, overall gain improvement, size reduction (so that the antenna is suitable for wireless and portable applications) and cost effectiveness. To meet these criteria, many researchers have worked on multiband antennas using different technologies such as reconfigurable antennas [3–7], defected ground structures (DGS) [8–10], slot cutting [11–13] and fractal technology [14–19]. There is still a need for an antenna that performs well across as many of these parameters as possible. In this paper, a hybrid of slot and DGS techniques is used, which is not only cost effective but also gives good gain and bandwidth. First, a change in the size of the ground plane is proposed, and then a change in the shape of the ground plane (staircase-like) is developed. High Frequency Structure Simulator (HFSS) software has been used to design the antenna. Section 44.2 describes the development of the antenna design, Sect. 44.3 presents the results and discussion, and Sect. 44.4 gives the conclusion and future scope.
44.2 Proposed Antenna Design
44.2.1 Development of Antenna Design Geometry
The basic steps of the antenna design are summarized in the flow diagram of Fig. 44.1. Figure 44.2 shows the evolution of the proposed antenna design. First, a substrate of 36 × 39 × 1.6 mm³ is chosen. Initially, a ground plane of 9.3 mm and an annular ring patch of 2 mm are taken (Antenna 1). Inside the ring there is a small rectangular patch connected to the feed line. A microstrip feedline is used for the excitation. Simulation shows that this design provides maximum return loss at 2.88 GHz, and one band of 2.78–2.95 GHz is achieved. The size of the ground plane is then increased from 9.3 to 12.3 mm (Antenna 2), which increases the number of radiating bands. As a result, two bands of 2.75–2.91 GHz and 7.46–8.03 GHz are achieved. In the final design, two small rectangular slots of 2 × 5 mm² are removed from the ground plane. The ground plane now looks like a staircase, which changes the effect of resistance and capacitance. The antenna results improve considerably in terms of gain and bandwidth, as the S11 results show.
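The band edges quoted for each design iteration can be read off a simulated S11 curve. The following is a minimal sketch of that step, assuming the common −10 dB return-loss criterion for an operating band (the chapter does not state which threshold was used). The function name `find_bands` and the synthetic S11 curve are hypothetical and stand in for data exported from a simulator.

```python
import numpy as np

def find_bands(freq_ghz, s11_db, threshold_db=-10.0):
    """Return (f_low, f_high, f_res, s11_min) for every contiguous
    frequency region where S11 is at or below the threshold."""
    freq_ghz = np.asarray(freq_ghz, dtype=float)
    s11_db = np.asarray(s11_db, dtype=float)
    below = s11_db <= threshold_db
    bands = []
    start = None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i  # a band begins here
        if start is not None and (not flag or i == len(below) - 1):
            end = i if flag else i - 1  # last index still inside the band
            k = start + int(np.argmin(s11_db[start:end + 1]))
            bands.append((freq_ghz[start], freq_ghz[end], freq_ghz[k], s11_db[k]))
            start = None
    return bands

# Synthetic curve (assumption): a -3 dB floor with one dip reaching
# -13.62 dB at 2.88 GHz, loosely mimicking the single band of Antenna 1.
freq = np.linspace(2.0, 4.0, 2001)
s11 = -3.0 - 10.62 / (1.0 + ((freq - 2.88) / 0.05) ** 2)
bands = find_bands(freq, s11)

for f_low, f_high, f_res, s11_min in bands:
    print(f"band {f_low:.3f}-{f_high:.3f} GHz, resonance {f_res:.3f} GHz, S11 {s11_min:.2f} dB")
```

With real simulator output, `freq` and `s11` would be replaced by the exported sweep; the band edges then depend on the sweep resolution as well as on the chosen threshold.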
44.3 Results and Discussion
Figure 44.3 shows the simulated return loss of Antenna 1, Antenna 2, and the proposed design, and Table 44.1 compares the three designs. The simulated results clearly indicate the effect of slot cutting in the ground plane. The proposed antenna is the best of the three in terms of number of bands, bandwidth, and maximum return loss, and it has an appreciable peak gain. Further, the proposed antenna shows good and stable E-plane and H-plane radiation patterns at 2.68, 3.4, and 6.63 GHz, apart from a slight distortion at 6.63 GHz (as shown in Fig. 44.4). Figure 44.5 shows the gain of the proposed design. A maximum gain of 5.51 dB is achieved at 2.68 GHz.

Fig. 44.1 Basic design steps of antenna designing
Fig. 44.2 Evolution of proposed antenna design a Antenna 1, b Antenna 2, c final proposed design
Fig. 44.3 Simulated results (return loss) of Antenna 1, Antenna 2, and proposed design

Table 44.1 Comparative results of Antenna 1, Antenna 2, and proposed design
Design            Frequency band (GHz)   Resonant frequency (GHz)   Return loss (dB)   Gain (dB)
Antenna 1         2.78–2.95              2.88                       −13.62             –
Antenna 2         2.75–2.91              2.86                       −14.22             –
                  7.46–8.03              7.53                       −11.67
Proposed design   2.33–3.03              2.68                       −40.55             5.51
                  3.39–4.23              3.4                        −28                3.24
                  5.74–9.00              6.63                       −12.87             4.9
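To put the tabulated bands in perspective, the short sketch below computes the absolute and fractional bandwidth of each band of the proposed design and converts the return-loss values into reflected power and VSWR. It assumes the usual centre-frequency definition of fractional bandwidth, which the chapter does not state; the helper names are hypothetical.

```python
def band_metrics(f_low_ghz, f_high_ghz):
    """Absolute bandwidth (GHz) and fractional bandwidth (%) of one band,
    normalised to the arithmetic centre frequency (an assumed convention)."""
    bandwidth = f_high_ghz - f_low_ghz
    centre = 0.5 * (f_low_ghz + f_high_ghz)
    return bandwidth, 100.0 * bandwidth / centre

def reflected_power_fraction(s11_db):
    """Fraction of incident power reflected at the feed for a given S11 in dB."""
    return 10.0 ** (s11_db / 10.0)

def vswr(s11_db):
    """Voltage standing wave ratio corresponding to a given S11 in dB."""
    gamma = 10.0 ** (s11_db / 20.0)  # magnitude of the reflection coefficient
    return (1.0 + gamma) / (1.0 - gamma)

# Bands and S11 minima of the proposed design, taken from Table 44.1.
proposed = [(2.33, 3.03, -40.55), (3.39, 4.23, -28.0), (5.74, 9.00, -12.87)]

for f_low, f_high, s11_min in proposed:
    bw, fbw = band_metrics(f_low, f_high)
    print(f"{f_low}-{f_high} GHz: BW {bw:.2f} GHz, fractional BW {fbw:.1f} %, "
          f"reflected power {100 * reflected_power_fraction(s11_min):.3f} %, "
          f"VSWR {vswr(s11_min):.2f}")
```

These are derived quantities only; they add no measurement beyond what Table 44.1 already reports.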
44.4 Conclusion and Future Scope
The proposed design is well suited for wireless band (2.33–3.03 GHz and 3.39–4.23 GHz) and C-band (5.74–9.00 GHz) applications. A staircase-like ground plane is obtained by simply cutting the edges at the bottom of the antenna. The proposed antenna shows an appreciable gain of 5.51 dB at the 2.68 GHz resonance. Further, the proposed design shows a stable radiation pattern suitable for linearly polarized applications. In the future, the proposed design can be extended to an eight-port or multiport multiple-input multiple-output (MIMO) antenna.
Fig. 44.4 Radiation pattern of a E-plane at 2.68 GHz b H-plane at 2.68 GHz c E-plane at 3.4 GHz d H-plane at 3.4 GHz e E-plane at 6.63 GHz f H-plane at 6.63 GHz
Fig. 44.5 Simulated gain graph of proposed design
References
1. Liao, S.Y.: Microwave Devices and Circuits. Pearson Education India (1990)
2. Balanis, C.A.: Antenna Theory: Analysis and Design, 2nd edn. Wiley (2007)
3. Chawla, P., Anand, R.: Micro-switch design and its optimization using pattern search algorithm for application in reconfigurable antenna. Mod. Antenna Syst. (2017). https://doi.org/10.5772/66127
4. Tripathi, S., Pathak, N.P., Parida, M.: A compact reconfigurable aperture coupled fed antenna for intelligent transportation system application. Int. J. RF Microw. Comput. Aided Eng. 30(7) (2020). https://doi.org/10.1002/mmce.22210
5. JenathSathikbasha, M., Nagarajan, V.: Design of multiband frequency reconfigurable antenna with a defected ground structure for wireless applications. Wirel. Pers. Commun. 113(2), 867–892 (2020). https://doi.org/10.1007/s11277-020-07256-8
6. Aboufoul, T., Alomainy, A., Parini, C.: Reconfiguring UWB monopole antenna for cognitive radio applications using GaAs FET switches. IEEE Antennas Wirel. Propag. Lett. 11, 392–394 (2012). https://doi.org/10.1109/lawp.2012.2193551
7. Abutarboush, H.F., Nilavalan, R., Cheung, S.W., Nasr, K.M., Peter, T., Budimir, D., Al-Raweshidy, H.: A reconfigurable wideband and multiband antenna using dual-patch elements for compact wireless devices. IEEE Trans. Antennas Propag. 60(1), 36–43 (2012). https://doi.org/10.1109/tap.2011.2167925
8. Usha Devi, Y., Boddapati, M.T.P., Kumar, T.A., Sri Kavya, C.K., Pardhasaradhi, P.: Conformal printed MIMO antenna with DGS for millimetre wave communication applications. Int. J. Electron. Lett. 8(3), 329–343 (2019). https://doi.org/10.1080/21681724.2019.1600731
9. Jilani, S.F., Alomainy, A.: Millimetre-wave T-shaped MIMO antenna with defected ground structures for 5G cellular networks. IET Microw. Antennas Propag. 12(5), 672–677 (2018). https://doi.org/10.1049/iet-map.2017.0467
10. Jilani, S.F., Alomainy, A.: Millimetre-wave T-shaped antenna with defected ground structures for 5G wireless networks. In: 2016 Loughborough Antennas & Propagation Conference (LAPC) (2016). https://doi.org/10.1109/lapc.2016.7807477
11. Dahiya, A., Anand, R., Sindhwani, N., Kumar, D.: A novel multi-band high-gain slotted fractal antenna using various substrates for X-band and Ku-band applications. Mapan 37(1), 175–183 (2021). https://doi.org/10.1007/s12647-021-00508-3
12. Sharawi, M.S., Ikram, M., Shamim, A.: A two concentric slot loop based connected array MIMO antenna system for 4G/5G terminals. IEEE Trans. Antennas Propag. 65(12), 6679–6686 (2017). https://doi.org/10.1109/tap.2017.2671028
13. Liu, W.X., Yin, Y.Z., Xu, W.L., Zuo, S.L.: Compact open-slot antenna with bandwidth enhancement. IEEE Antennas Wirel. Propag. Lett. 10, 850–853 (2011). https://doi.org/10.1109/lawp.2011.2165197
14. Anand, R., Chawla, P.: Bandwidth optimization of a novel slotted fractal antenna using modified lightning attachment procedure optimization. Smart Antennas 379–392 (2022). https://doi.org/10.1007/978-3-030-76636-8_28
15. Anand, R., Chawla, P.: Optimization of inscribed hexagonal fractal slotted microstrip antenna using modified lightning attachment procedure optimization. Int. J. Microw. Wirel. Technol. 12(6), 519–530 (2020). https://doi.org/10.1017/s1759078720000148
16. Anand, R., Chawla, P.: A novel dual-wideband inscribed hexagonal fractal slotted microstrip antenna for C- and X-band applications. Int. J. RF Microw. Comput. Aided Eng. 30(9) (2020). https://doi.org/10.1002/mmce.22277
17. Kumari, S., Srivastava, S., Lal, R.K.: Design of monopole fractal antenna using annular ring for RFID applications. In: 2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI) (2015). https://doi.org/10.1109/icscti.2015.7489579
18. Kumari, S., Srivastava, S., Mittal, A., Lal, R.K.: Designing of a fractal annular array antenna with use of power divider. Indian J. Sci. Technol. 10(30), 1–5 (2017). https://doi.org/10.17485/ijst/2017/v10i30/115492
19. Kumari, S., Awasthi, Y.K., Bansal, D.: A miniaturized circularly polarized multiband antenna for Wi-Max, C-band & X-band applications. Progr. Electromagn. Res. C 125, 117–131 (2022)
Chapter 45
Affine Non-local Means Image Denoising Rohit Anand, Valli Madhavi Koti, Mamta Sharma, Supriya Sanjay Ajagekar, Dharmesh Dhabliya, and Ankur Gupta
Abstract Digital images are frequently corrupted by noise during capture and transmission, which degrades subsequent tasks such as image identification and medical diagnosis. Several denoising techniques have been suggested to increase the accuracy of these tasks when corrupted images must be used. However, the majority of these techniques either require assumptions about the statistical characteristics of the corrupting noise or are designed for only one kind of noise. The performance of traditional image denoising algorithms that use a single noisy image and generic image databases is approaching its limit. In this article, we suggest denoising images with targeted external image databases. Denoising is formulated as an optimal filter design problem, and we use the targeted databases to (1) find the basis of the optimal filter using group sparsity, and (2) find the spectral coefficients of the optimal filter using localized priors. Experiments show better denoising outcomes than current methods in a range of scenarios, including text images, multi-view images, and face images. In contrast to employing a generic database, we present an adaptive image denoising approach in this study. In this context, a targeted database is one that contains only images relevant to the noisy image. Targeted external databases can be obtained in several real-world settings, including text images, human faces, and images taken by multi-view camera systems, as demonstrated in later sections of this work.

R. Anand (B)
Department of ECE, G. B. Pant DSEU Okhla-1 Campus (Formerly G. B. Pant Engineering College), New Delhi 110020, India
e-mail: [email protected]
V. M. Koti
Department Computer Science, GIET Degree College, NH-16, Chaitanya Knowledge City, Rajahmundry, Andhra Pradesh 533294, India
e-mail: [email protected]
M. Sharma
Department of CSE and CSA, Arni University, Kangra, Himachal Pradesh, India
S. S. Ajagekar
Department of Engineering and Technology, Bharati Vidyapeeth Deemed University, Navi Mumbai, India
e-mail: [email protected]
D. Dhabliya
Department of Information Technology, Vishwakarma Institute of Information Technology, Pune, Maharashtra, India
e-mail: [email protected]
A. Gupta
Department of Computer Science and Engineering, Vaish College of Engineering, Rohtak, Haryana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_45
45.1 Introduction
An image is an array, or matrix, of square pixels (picture elements) arranged in rows and columns. Images are the most popular and practical means of sharing or distributing information [1–4]. They clearly communicate the locations, dimensions, and relationships of objects, and they depict spatial information that humans can recognize as objects. Because of our natural visual and cerebral capacities, humans are adept at extracting information from such images [5, 6]. Humans receive around 75% of their information in visual form.

Phase coding and local binary patterns are used to create multi-resolution histogram descriptors that are invariant to different kinds of image degradation. Such a representation has been proposed for face recognition, where it is more accurate than a single-scale pattern technique. The histogram reduces the impact of unstable, noise-induced pattern responses [7].

Image denoising is an important problem because of its clear applications. It also offers a suitable framework for evaluating image processing concepts and methods, since it is the simplest inverse problem that can be formulated [7, 8]. Over the last decade or more, research has focused heavily on redundant representations and sparsity. In this study, we consider two types of training [9]: training the dictionary on patches from the image itself, or on a corpus of high-quality images. Earlier work also proposed an SFDL method for image-set-based face identification, in which each training and testing example is a set of face images captured under different poses, lighting conditions, expressions, and resolutions [9].

Wavelet-based image compression and image denoising both rely on quantizing small coefficients to zero [10, 11]. By pruning the full wavelet packet decomposition of the image with respect to a preset cost function, one can obtain the optimal wavelet packet basis for image compression [10–12]. Conventional studies concentrated on the spatial division criterion and the procedure for histogram estimation. The biggest issue with this strategy is that it leaves the monotonic invariance property of the original uniform LBP unchanged [12]. Many alternative cost functions, including coding-based cost functions, have also been developed. The original image is not available during denoising; however, the MAD estimator can determine the noise level from the noisy image [13]. To reduce noise in a noisy image, the traditional wavelet transform is used, as in the EWF and the doubly local Wiener filtering. The resulting denoised image is then used as a pilot image to choose the most suitable wavelet packet basis. The matching of features across different images, or thresholding, is a typical difficulty in denoising [14]. To reliably match different views of an object or scene, the Scale Invariant Feature Transform (SIFT) extracts distinctive invariant features from an image. To perform binary thresholding, adaptive thresholding examines each pixel in relation to its immediate neighbourhood [15–17].

Image denoising has been a hot research area because of its many practical uses; surprisingly, the assessment of the quality of denoised images has received little attention in recent times [18, 19]. Our findings indicate that there is only a weak correlation between contemporary objective image quality models and subjective judgments. Future development of sophisticated objective models should include more research on structural fidelity and naturalness [20, 21]. In this study, we first construct a database of denoised images produced by both traditional and state-of-the-art denoising algorithms from noisy images with varying noise levels [22]. The quality of the denoised images is then assessed and compared using a multi-stimulus ranking technique in a subjective experiment. Data analysis reveals both considerable agreement and significant disagreement among human subjects in their perception of denoised images [23, 24].
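The MAD-based noise estimate mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a one-level Haar decomposition and the usual Gaussian consistency constant 0.6745; the chapter does not specify the wavelet, and the function names and the synthetic ramp image are hypothetical.

```python
import numpy as np

def haar_diagonal_detail(img):
    """Diagonal (HH) detail coefficients of a one-level orthonormal Haar
    transform, computed on non-overlapping 2 x 2 blocks."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # crop to even dimensions
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    return (a - b - c + d) / 2.0

def estimate_sigma_mad(img):
    """Robust noise standard deviation: median(|HH|) / 0.6745."""
    hh = haar_diagonal_detail(img)
    return float(np.median(np.abs(hh)) / 0.6745)

# Synthetic check (assumption): a smooth horizontal ramp plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 255.0, 256), (256, 1))
noisy = clean + rng.normal(0.0, 20.0, clean.shape)
sigma_hat = estimate_sigma_mad(noisy)
print(f"estimated sigma: {sigma_hat:.2f} (true value 20)")
```

The estimate works because smooth image content contributes little to the finest diagonal detail band, so the median of its magnitudes is dominated by noise.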
45.2 Proposed System
The proposed technique, targeted image denoising (TID), uses a localized prior and group sparsity minimization to determine the best denoising filter. Despeckling techniques fall into two broad categories: the homomorphic approach and the non-homomorphic approach [25]. Both decimated and undecimated wavelet domains are supported by these methods. Undecimated wavelets offer a benefit for signal-independent noise reduction comparable to that of targeted image denoising. To create the TID, circularly shifted versions of the noisy image are subjected to wavelet denoising; each result is then shifted back, and the results are averaged. This produces better results than traditional hard and soft thresholding in terms of noise reduction, sharpness preservation, and absence of ringing artifacts.

The proposed technique requires fewer targeted images in the database than current approaches. It also provides two fresh perspectives on the denoising problem. First, we demonstrate that the convex optimization [27–29] involving group sparsity, which determines the basis matrix of a linear denoising filter, can be solved by the classical eigendecomposition. This explains why many popular denoising algorithms use principal component analysis (PCA) as a learning step. Second, we show that the spectral components of the denoising filter can be estimated with the help of a localized prior, and that doing so improves the denoising quality by reducing the Bayesian mean squared error [26].

Unlike the Gore face dataset, in which the subject remains the same, the FEI dataset contains the faces of several people. The 2-D SA-DCT and the 1-D orthonormal transform have a decomposable composition on 3-D groups that are generalized cylinders with adaptive-shaped cross sections. The comparison shows that the proposed strategy yields the highest PSNR values across the board, with a small but statistically significant advantage over other approaches when compared to the Gore dataset. All of the aforementioned non-local approaches, and almost all patch-based methods, estimate the whole image by averaging its constituent clean patches [30, 31].

Our first experiment considers denoising a text image using text that is similar but not identical. The objective quality of the denoised images is assessed using two quality metrics, PSNR and SSIM. TID produces the highest PSNR and SSIM values of all the methods compared, and its PSNR exceeds that of the benchmark BM3D denoising technique by 5 dB [32, 33].

The benefits of the proposed system are as follows. The objective quality of the denoised images can be assessed, and because denoising is done on each patch individually [34], the computation can be parallelized on a GPU. TID makes better use of the targeted database than other external denoising techniques. TID produces the highest PSNR and SSIM values [35, 36].
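The circular-shift averaging described in this section is commonly called cycle spinning. The sketch below illustrates the idea only and is not the chapter's implementation: it assumes a one-level Haar transform with hard thresholding as the base denoiser, even image dimensions, and a hypothetical threshold of three times the noise standard deviation.

```python
import numpy as np

def haar_hard_threshold(img, thr):
    """One-level Haar transform, hard thresholding of the three detail
    bands, and inverse transform. Image dimensions must be even."""
    img = np.asarray(img, dtype=float)
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    details = [(a - b + c - d) / 2.0, (a + b - c - d) / 2.0, (a - b - c + d) / 2.0]
    # Keep only detail coefficients whose magnitude exceeds the threshold.
    lh, hl, hh = [np.where(np.abs(x) > thr, x, 0.0) for x in details]
    out = np.empty_like(img)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def cycle_spin(img, thr, shifts=(0, 1)):
    """Shift circularly, denoise, shift back, and average over all shifts."""
    img = np.asarray(img, dtype=float)
    acc = np.zeros_like(img)
    count = 0
    for dy in shifts:
        for dx in shifts:
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            denoised = haar_hard_threshold(shifted, thr)
            acc += np.roll(denoised, (-dy, -dx), axis=(0, 1))  # undo the shift
            count += 1
    return acc / count

# Synthetic example (assumption): a flat grey image with Gaussian noise.
rng = np.random.default_rng(0)
flat = np.full((64, 64), 128.0)
noisy = flat + rng.normal(0.0, 20.0, flat.shape)
denoised = cycle_spin(noisy, thr=3 * 20.0)
print("MSE noisy:", np.mean((noisy - flat) ** 2), "MSE denoised:", np.mean((denoised - flat) ** 2))
```

Each shifted copy must be shifted back before averaging; otherwise the averaged estimates would not be aligned with the original pixel grid.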
45.3 System Architecture
The proposed technique, targeted image denoising (TID), uses a localized prior and group sparsity minimization to choose the best denoising filter. TID applies wavelet denoising to circularly shifted copies of the noisy image, shifts each result back, and averages the outcomes. In terms of noise suppression, sharpness preservation, and absence of ringing artifacts, it performs better than traditional hard and soft thresholding. For this reason, transform-domain despeckling is generally done in the undecimated wavelet domain. The phases in the creation of the proposed system (shown in Fig. 45.1) are as follows:
Fig. 45.1 System architecture (blocks: Preprocessing, Adding Noise, Noise Removed by TID, Performance Evaluation, Estimation)
Pre-processing: At the beginning of the method, a basis matrix U is used. The projected coefficients are first hard-thresholded to create a filtered image p. In the next stage, the filtered image p is used as a rough guide to the required spectral components [34, 37].
Adding Noise: The experiment denoises a text image using other related but different texts. Two quality measures, PSNR and SSIM, are used to assess the objective quality of the denoised images [35].
Noise Removed by TID: TID produces the highest PSNR and SSIM values of all the techniques. Its PSNR is 5 dB higher than that of the benchmark BM3D (internal) denoising technique. TID makes better use of the targeted database than other external denoising techniques.
Estimation: Training a statistical prior on the targeted database is an alternative answer to question (Q). Because the denoising problem can be reformulated as MAP estimation, this technique has the advantage that its performance often comes with theoretical guarantees [36].
Performance Evaluation: The current implementation is in MATLAB. With a targeted database of 9 images of comparable size, denoising an image of size 301 × 218 takes roughly 144 s. The code is run on an Intel Core i7-3770 CPU. We present a runtime comparison of our system with alternative approaches.
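The pre-processing phase above (a basis matrix U, hard thresholding to obtain a pilot estimate, then spectral coefficients guided by that pilot) can be illustrated with a generic two-stage patch filter. This is a hedged sketch, not the authors' TID algorithm: it assumes the basis comes from an eigendecomposition of the reference-patch second-moment matrix, that the spectral coefficients take a Wiener-like form, and that the hard threshold is 2.7 times the noise standard deviation. All names and the synthetic data are hypothetical.

```python
import numpy as np

def denoise_patch(q, refs, sigma, hard_k=2.7):
    """Denoise one vectorised patch q (length d) using reference patches
    refs (shape d x n) drawn from a targeted database."""
    # Basis matrix U: eigenvectors of the reference second-moment matrix.
    _, U = np.linalg.eigh(refs @ refs.T)
    coeffs = U.T @ q
    # Stage 1: hard thresholding gives the coefficients of a pilot estimate.
    pilot = np.where(np.abs(coeffs) > hard_k * sigma, coeffs, 0.0)
    # Stage 2: Wiener-like spectral coefficients guided by the pilot energy.
    energy = pilot ** 2
    lam = energy / (energy + sigma ** 2)
    return U @ (lam * coeffs)

# Synthetic example (assumption): an 8 x 8 vertical-edge patch (d = 64),
# nine slightly perturbed reference patches, and noise with sigma = 20.
rng = np.random.default_rng(1)
clean = np.hstack([np.full((8, 4), 50.0), np.full((8, 4), 200.0)]).ravel()
refs = clean[:, None] + rng.normal(0.0, 2.0, (64, 9))
sigma = 20.0
noisy = clean + rng.normal(0.0, sigma, 64)
estimate = denoise_patch(noisy, refs, sigma)

print("MSE noisy:", np.mean((noisy - clean) ** 2))
print("MSE denoised:", np.mean((estimate - clean) ** 2))
```

Because each patch is filtered independently, a full-image version would loop (or vectorise) over patches and average the overlapping estimates, which is also what makes the computation easy to parallelise.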
45.4 Results
Instead of a generic database, we suggest employing a targeted external database with an adaptive image denoising technique. A targeted database in this context is one that includes only images relevant to the noisy image. This simplifies the setup for problems such as handwriting, bar codes, and license plates. To set up the experiment, we randomly select one document and add noise to it. Nine clean documents with the same font size are then used for denoising. We add zero-mean Gaussian noise to the test images, with standard deviations ranging from 20 to 80. The patch size is set to 8 × 8 (d = 64). The objective quality of the denoised images is assessed using three quality metrics: PSNR, MSE, and SSIM (as shown in Figs. 45.2, 45.3, and 45.4). Since the default search window size for BM3D is just 39 × 39, we conduct an analysis to evaluate how other search window sizes affect performance.

Fig. 45.2 PSNR outcomes (vertical axis: PSNR; horizontal axis: window sizes from 3 × 3 to 31 × 31)
Fig. 45.3 Mean MSE output (vertical axis: MSE; horizontal axis: window sizes from 3 × 3 to 31 × 31)
Fig. 45.4 Mean SSIM after denoising (vertical axis: SSIM; horizontal axis: window sizes from 3 × 3 to 31 × 31)
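Two of the three quality metrics used in this section, MSE and PSNR, are simple enough to state in code. The sketch below assumes 8-bit images with a peak value of 255 and uses a synthetic gradient image with Gaussian noise of standard deviation 20, the lowest noise level in the experiment; SSIM is omitted because it needs a windowed implementation that is longer than a short example allows.

```python
import numpy as np

def mse(reference, test):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.mean((reference - test) ** 2))

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

# Synthetic example (assumption): horizontal gradient plus zero-mean
# Gaussian noise, without clipping to the 0-255 range.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 255.0, 128), (128, 1))
noisy = clean + rng.normal(0.0, 20.0, clean.shape)

print(f"MSE  = {mse(clean, noisy):.1f}")
print(f"PSNR = {psnr(clean, noisy):.2f} dB")
```

The two metrics carry the same information for a fixed peak value, which is why curves of PSNR and MSE against window size mirror each other.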
45.5 Conclusion
The performance of traditional image denoising techniques that depend on a single noisy input and generic databases is beginning to plateau. We believe that future image denoising should be target-oriented, meaning that only images that include the same objects should be used for training [38, 39]. To handle this paradigm shift in image denoising, we offer techniques, with accompanying simulations, for leveraging targeted databases in optimal linear denoising filter design. Compared with a wide range of current algorithms, our proposed solution, which is based on group sparsity and localized priors, demonstrated better performance and robustness. A thorough sensitivity analysis of the method will be done in subsequent work.
45.6 Future Scope
The proposed technique, targeted image denoising (TID), uses a localized prior and group sparsity minimization to choose the best denoising filter. This research has provided a straightforward method for image denoising whose performance is comparable to, and occasionally better than, previously published leading alternatives. The approach relies on local operations and involves a simple averaging computation as well as sparse decompositions of each image block under a single fixed dictionary. The dictionary's content is crucial to the denoising process, and our research has demonstrated that both an adaptive dictionary trained on patches of the noisy image itself and a dictionary trained on realistic real-world images work well. In the future, further research could consider more advanced noise removal mechanisms to obtain better PSNR.
References
1. Sindhwani, N., Anand, R., Meivel, S., Shukla, R., Yadav, M.P., Yadav, V.: Performance analysis of deep neural networks using computer vision. EAI Endorsed Trans. Indu. Netw. Intell. Syst. 8(29), e3–e3 (2021)
2. Choudhary, P., Anand, M.R.: Determination of the rate of degradation of iron plates due to rust using image processing. Int. J. Eng. Res. 4(2), 76–84 (2015)
3. Juneja, S., Anand, R.: Contrast enhancement of an image by DWT-SVD and DCT-SVD. In: Data Engineering and Intelligent Computing: Proceedings of IC3T 2016, pp. 595–603. Springer Singapore (2018)
4. Buades, A., Coll, B., Morel, J.-M.: A review of image denoising algorithms, with a new one. SIAM Multiscale Model. Simul. 4(2), 490–530 (2005)
5. Kervrann, C., Boulanger, J.: Local adaptivity to variable smoothness for exemplar-based image regularization and representation. Int. J. Comput. Vis. 79(1), 45–69 (2008)
6. Singh, H., Ramya, D., Saravanakumar, R., Sateesh, N., Anand, R., Singh, S., Neelakandan, S.: Artificial intelligence-based quality of transmission predictive model for cognitive optical networks. Optik 257, 168789 (2022)
7. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
8. Saini, P., Anand, M.R.: Identification of defects in plastic gears using image processing and computer vision: a review. Int. J. Eng. Res. 3(2), 94–99 (2014)
9. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: A nonlocal and shape-adaptive transform-domain collaborative filtering. In: Proceedings of the International Workshop on Local and Non-local Approximation in Image Processing (LNLA), Aug 2008, pp. 1–8
10. Vyas, G., Anand, R., Hole, K.E.: Implementation of advanced image compression using wavelet transform and SPHIT algorithm. Int. J. Electron. Electr. Eng. 4(3), 249–254 (2011)
11. Kumar, R., Anand, R., Kaushik, G.: Image compression using wavelet method & SPIHT algorithm. Digit. Image Process. 3(2), 75–79 (2011)
12. Zhang, L., Dong, W., Zhang, D., Shi, G.: Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recognit. 43, 1531–1549 (2010)
13. Dong, W., Zhang, L., Shi, G., Li, X.: Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)
14. Rajwade, A., Rangarajan, A., Banerjee, A.: Image denoising using the higher order singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 35(4), 849–862 (2013)
15. Shao, L., Yan, R., Li, X., Liu, Y.: From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms. IEEE Trans. Cybern. 44(7), 1001–1013 (2014)
16. Zontak, M., Irani, M.: Internal statistics of a single natural image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2011, pp. 977–984
17. Gupta, M., Anand, R.: Color image compression using a set of selected bit planes. IJECT 2(3), 243–248 (2011)
18. Mosseri, I., Zontak, M., Irani, M.: Combining the power of internal and external denoising. In: Proceedings of the IEEE International Conference on Computational Photography (ICCP), Apr 2013, pp. 1–9
19. Burger, H.C., Schuler, C., Harmeling, S.: Learning how to combine internal and external denoising methods. In: Pattern Recognition, pp. 121–130. Springer, Berlin, Germany (2013)
20. Glasner, D., Bagon, S., Irani, M.: Super-resolution from a single image. In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 349–356 (2009)
21. Chatterjee, P., Milanfar, P.: Is denoising dead? IEEE Trans. Image Process. 19(4), 895–911 (2010)
22. Dushyant, K., Muskan, G., Gupta, A., Pramanik, S.: Utilizing machine learning and deep learning in cyber security: an innovative approach. In: Ghonge, M.M., Pramanik, S., Mangrulkar, R., Le, D.N. (eds.) Cyber Security and Digital Forensics. Wiley (2022). https://doi.org/10.1002/9781119795667.ch12
23. Bansal, R., Obaid, A.J., Gupta, A., Singh, R., Pramanik, S.: Impact of big data on digital transformation in 5G era. In: 2nd International Conference on Physics and Applied Sciences (ICPAS 2021) (2021). https://doi.org/10.1088/1742-6596/1963/1/012170
24. Pramanik, S., Bandyopadhyay, S.K., Ghosh, R.: Signature image hiding in color image using steganography and cryptography based on digital signature concepts. In: IEEE 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, pp. 665–669 (2020). https://doi.org/10.1109/ICIMIA48430.2020.9074957
25. Malik, S., Saroha, R., Anand, R.: A simple algorithm for the reduction of blocking artifacts using the SAWS technique based on fuzzy logic. Int. J. Comput. Eng. Res. 2(4), 1097–1101 (2012)
26. Gupta, A., Anand, R., Pandey, D., Sindhwani, N., Wairya, S., Pandey, B.K., Sharma, M.: Prediction of breast cancer using extremely randomized clustering forests (ERCF) technique: prediction of breast cancer. Int. J. Distrib. Syst. Technol. (IJDST) 12(4), 1–15 (2021)
27. Anand, R., Chawla, P.: A novel dual-wideband inscribed hexagonal fractal slotted microstrip antenna for C- and X-band applications. Int. J. RF Microw. Comput. Aided Eng. 30(9), e22277 (2020)
28. Dahiya, A., Anand, R., Sindhwani, N., Kumar, D.: A novel multi-band high-gain slotted fractal antenna using various substrates for X-band and Ku-band applications. Mapan 37(1), 175–183 (2022)
29. Anand, R., Chawla, P.: Bandwidth optimization of a novel slotted fractal antenna using modified lightning attachment procedure optimization. In: Smart Antennas: Latest Trends in Design and Application, pp. 379–392. Springer International Publishing, Cham (2022)
30. Babu, S.Z.D., et al.: Analysation of big data in smart healthcare. In: Gupta, M., Ghatak, S., Gupta, A., Mukherjee, A.L. (eds.) Artificial Intelligence on Medical Data. Lecture Notes in Computational Vision and Biomechanics, vol. 37. Springer, Singapore (2023). https://doi.org/10.1007/978-981-19-0151-5_21
31. Gupta, A., Singh, R., Nassa, V.K., Bansal, R., Sharma, P., Koti, K.: Investigating application and challenges of big data analytics with clustering. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing, and Automation (ICAECA), pp. 1–6 (2021). https://doi.org/10.1109/ICAECA52838.2021.9675483
32. Pandey, B.K., et al.: Effective and secure transmission of health information using advanced morphological component analysis and image hiding. In: Gupta, M., Ghatak, S., Gupta, A., Mukherjee, A.L. (eds.) Artificial Intelligence on Medical Data. Lecture Notes in Computational Vision and Biomechanics, vol. 37. Springer, Singapore (2023). https://doi.org/10.1007/978-981-19-0151-5_19
33. Pathania, V., et al.: A database application for monitoring COVID-19 in India. In: Gupta, M., Ghatak, S., Gupta, A., Mukherjee, A.L. (eds.) Artificial Intelligence on Medical Data. Lecture Notes in Computational Vision and Biomechanics, vol. 37. Springer, Singapore (2023). https://doi.org/10.1007/978-981-19-0151-5_23
34. Sharma, S., Rattan, R., Goyal, B., Dogra, A., Anand, R.: Microscopic and ultrasonic super-resolution for accurate diagnosis and treatment planning. In: Communication, Software, and Networks: Proceedings of INDIA 2022, pp. 601–611. Springer Nature Singapore, Singapore (2022)
35. Kaur, J., Sabharwal, S., Dogra, A., Goyal, B., Anand, R.: Single image dehazing with dark channel prior. In: 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1–5. IEEE (2021)
36. Veeraiah, V., Khan, H., Kumar, A., Ahamad, S., Mahajan, A., Gupta, A.: Integration of PSO and deep learning for trend analysis of meta-verse. In: 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), pp. 713–718 (2022). https://doi.org/10.1109/ICACITE53722.2022.9823883
37. Raghavan, R., Verma, D.C., Pandey, D., Anand, R., Pandey, B.K., Singh, H.: Optimized building extraction from high-resolution satellite imagery using deep learning. Multimed. Tools Appl. 81(29), 42309–42323 (2022)
38. Anand, R., Sindhwani, N., Dahiya, A.: Design of a high directivity slotted fractal antenna for C-band, X-band and Ku-band applications. In: 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 727–730. IEEE (2022)
39. Anand, R., Arora, S., Sindhwani, N.: A miniaturized UWB antenna for high speed applications. In: 2022 International Conference on Computing, Communication and Power Technology (IC3P), pp. 264–267. IEEE (2022)
Chapter 46
A Novel LRKS-WSQoS Model for Web Service Quality Estimation Using Machine Learning-Based Linear Regression and Kappa Methods K. Prakash and Kalaiarasan
Abstract Web services are used as the building blocks of IT applications. There are several competing factors to consider when choosing a reliable web service. The goal is to divide candidate services into groups based on end users' preferences, taking into account each service's distinctive qualities. Our method examines service characteristics in terms of quality using the kappa statistic value. The kappa statistic is an effective machine learning measure for evaluating service quality across a variety of quality attribute values, commonly known as QoS characteristics. For every service, our method calculates a classification accuracy score. The kappa statistic value (KV), a parametric value obtained using a nonlinear model, is used to evaluate a service's performance with a logistic regression-based accuracy model. The balanced accuracy of each service is then assessed.
K. Prakash (B) · Kalaiarasan
School of Computer Science and Engineering, Presidency University, Bengaluru, India
e-mail: [email protected]
Kalaiarasan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_46

46.1 Introduction
The recommendation is helpful in addressing this issue. However, service archives cannot verify the quality assurance claims made by web services. As the software development life cycle progresses from the requirements gathering phase to the deployment phase, the risk factor rises. It is therefore much more important to estimate web service QoS values in order to choose the best services. Software developers and service providers use Extensible Markup Language (XML) to create public client interfaces that are identified by Uniform Resource Identifiers (URIs). Other client programs communicate with these interfaces through XML-based messaging. The use of recognized protocols such as SOAP and WSDL makes it possible to couple several services over HTTP in a service-oriented architecture (SOA). The services that can be used as dynamic, reconfigurable web services are subject to a number of restrictions. These limitations concern availability, responsiveness, support for dynamic configuration, and failure management. In our study, we present a method for determining the service capabilities that are truly required to deliver a dynamic web service. The types of services that can be employed dynamically are further limited by the fact that few clients actually use the services that providers offer. The recommendation algorithm of the Dynamic Web Service Selection Framework suggests the best service to meet the user's demands. Users are encouraged to rate and evaluate web services as they use them in order to help one another find a better web service. This is crucial when various service providers offer the same functionality with varying degrees of service quality. We give the user a metric to aid in rating a web service. Along with the runtime behaviour of a web service, the predicted QoS metrics, such as the maximum execution time, average execution time, maximum response time, and average response time, are compared in a comparison matrix. Web services with higher service quality receive a higher grade than those with the same functionality but poorer service quality. This approach examines the set of web services that the target user has already rated and determines how similar each of them is to the web service whose rating is to be predicted. The prediction is then produced by averaging the target user's ratings of the most similar services.
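The rating projection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual algorithm: the cosine similarity measure, the neighbourhood size `k`, the toy `ratings` table, and the function names are all hypothetical and introduced here only for illustration. It follows the text in averaging the target user's ratings of the most similar services.

```python
from math import sqrt


def cosine_similarity(a, b, ratings):
    """Cosine similarity of services a and b over users who rated both (assumed measure)."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(user, target, ratings, k=2):
    """Average the user's ratings of the k services most similar to the target."""
    rated = [s for s in ratings[user] if s != target]
    if not rated:
        return None
    neighbours = sorted(
        rated,
        key=lambda s: cosine_similarity(target, s, ratings),
        reverse=True,
    )[:k]
    return sum(ratings[user][s] for s in neighbours) / len(neighbours)


# Hypothetical user ratings of three functionally equivalent web services.
ratings = {
    "u1": {"ws_a": 5, "ws_b": 4, "ws_c": 1},
    "u2": {"ws_a": 4, "ws_b": 5, "ws_c": 2},
    "u3": {"ws_b": 4, "ws_c": 1},
}

# u3 has not rated ws_a; project a rating from the single most similar service.
print(predict_rating("u3", "ws_a", ratings, k=1))
```

With `k=1` only the closest service contributes; a larger `k` smooths the projection at the cost of including less similar services.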
The item-based collaborative filtering approach consists of two phases: service similarity computation and service prediction generation. Service recommendation technology is crucial for the personalization of intelligent services. The services offered must satisfy all requirements, both functional and nonfunctional, and QoS-based service models were created as a result. Customers need to be matched with useful services on the basis of QoS and given sound service recommendations. Most service recommendation systems predict and recommend on the basis of user interaction data alone, ignoring the service-user correlation and varying QoS levels. In this paper, we present a novel service recommendation technique. Services are part of many aspects of our lives. Consumers increasingly make purchases online, such as hotel reservations and movie tickets, and web services are consequently becoming much more widespread. A web service is a self-describing, self-adapting, modular, and highly interoperable application module [1]. Because of these benefits, more corporate software and applications have recently been built to support web services. As more has been learned about the effects of various manufacturing processes, services have developed into a crucial trend for industrial success. Businesses utilize a range of information technology (IT) and load-testing approaches. JMeter and Tsung have been mentioned as load-testing tools [2]. The first is an open-source Apache product, and the second is likewise open source and available through a GitHub repository [3]. Due to the availability of both paid and free solutions that make running
web service test scripts easier, performance testing has grown in popularity and become a crucial part of the software development life cycle (SDLC). A problem is that the suggested alternatives and the existing benchmark tools only provide information on output measures. It is currently impractical to evaluate the efficacy of open-source performance testing tools and existing load-testing techniques using a single comprehensive metric. In the current study, we developed a load-testing technique and an APM measure to identify performance-related issues that arise from the increasing workload of web service users. The first outcome is a recommendation for a web service performance testing plan. The second is a proposed measurement tool for web service performance testing methods. Due to the constraints of the underlying message and transport protocols, web services may encounter performance bottlenecks. Because web services rely on widely adopted protocols such as HTTP and SOAP, these constraints are a lasting burden on their users, so it is essential to understand how they operate. HTTP is stateless, and the network beneath it offers only best-effort delivery. This data-forwarding mechanism typically causes two serious problems: there is no guarantee that packets will arrive at their intended destination, and there is no guarantee that they will arrive in a specific order. When bandwidth is insufficient, packets are simply discarded. Network bandwidth is clearly a concern as more people and more data use the network. Many programs operate under the assumption of virtually no latency and unlimited bandwidth. Synchronous messaging is common in applications. It works well when an application runs on local machines, because components communicate with latencies of just a few microseconds.
On the other hand, latencies for web services are measured in the tens, hundreds, or even thousands of milliseconds because they interact over the Internet.
46.2 Literature Review
In [2, 4], Q-learning is used to tackle the service composition problem in a dynamic environment. These approaches require that services be invoked in order to gather their QoS attribute values, and that those values be updated to reflect environmental changes, so that the appropriate workflow can be selected for carrying out
the required service invocation actions. According to [4], maximizing the expected cumulative reward effectively satisfies the user's QoS needs. Alternatively, the methods of Wang et al. [2] use a normalized reward function that takes into account the user's known preferences for features and QoS attributes. Two multi-objective MDP techniques are proposed in [5] for managing service composition under uncertain and unanticipated conditions. The first approach deals with situations in which there is a single policy and multiple objectives, whereas the second works well when there are several policies. In the first technique the user's preferences are predetermined, but in the second the set of all suitable service compositions is represented by a convex hull of extreme services. Software fault tolerance is a crucial strategy for creating dependable systems [3, 5]. Software fault tolerance, also known as design diversity, is achieved by using independently developed, functionally equivalent components. Because producing redundant software components is expensive, software fault tolerance is often used only for critical systems. Numerous service computing companies have already developed functionally equivalent web services and made them available online. These web services can be used as alternative components to build fault-tolerant service-oriented systems. The best fault tolerance solutions for service-oriented systems must consider the QoS of the alternative service candidates in order to enhance system performance. In a recently published article, Rhmann et al. [3] reported that the J48 classifier outperformed the other classifiers in fault prediction.
Recent research by Hasnain et al. [6] highlighted the significance of numerous quality parameters. They identified the critical parameters that had the greatest impact on the choice of web services datasets. The most influential quality indicators were throughput and response time, and the study also discussed the preliminary classification results in relation to the final confusion matrix. Some important QoS prediction techniques have also been proposed in the literature. The four parameters of the confusion matrix were used by Polat et al.
46.3 Implementation and Methodology
Figure 46.1 shows the architecture design, which supports the selection of a more realistic service by offering a more exact mechanism for identifying a web service based on its kappa statistic value. The suggested design assists in choosing the appropriate web service for a particular user-targeted metric. The architecture operates on a dataset of various web services, in which all attribute values are preprocessed and the quality values of each service are separated. The architecture diagram indicates a number of elements that determine
Fig. 46.1 Proposed accuracy model
whether a service is appropriate, but the most dependable factor for determining the characteristics and quality of a web service is its tangibility. The web service data collection includes a number of service quality metrics specific to each web service. These statistics were gathered from several sources. Polynomial logistic regression extends traditional logistic regression to the construction of nonlinear decision boundaries. As the model shows, parameter estimation remains a nonlinear optimization problem even after linear constraints are imposed. Given a new data point, the main goal of the method is to find the class from which that point most likely came. This classification scheme is applied in many settings and is effective for a wide range of problems. Regularization, a well-known technique, can be used to counter the overfitting that sometimes arises on such datasets. Before the accuracy of the data is verified, a variety of filters is applied to reduce the amount of inaccurate, partial, and null data. Most studies [8, 9] use the accuracy metric to evaluate the classification accuracy of classifiers. As Eqs. (46.1) and (46.2) show, the accuracy metric is computed from the confusion matrix measurements, where the numerator is the total number of instances that were predicted correctly.

\[ \text{Accuracy} = \frac{\text{Total no. of correct predictions}}{\text{Total no. of predictions}} \tag{46.1} \]

or

\[ \text{Accuracy} = \frac{TN + TP}{TP + TN + FP + FN} \tag{46.2} \]
We propose a performance of service (PS) measure that uses the confusion matrix measurements in a manner similar to Eq. (46.2). PS is derived from the confusion matrix and expressed as a percentage score, as shown in Eq. (46.3). Based on the classification results, we were able to estimate the rank of each individual web service. The PS prediction shows how well performance cases caused by web service requests can be correctly categorized.

\[ \text{PS\%} = \frac{TP}{TP + FP + TN + FN} \times 100 \tag{46.3} \]
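As a small worked illustration of Eqs. (46.2) and (46.3), the sketch below computes both scores from the four confusion matrix counts. The function names and the example counts are hypothetical and are not taken from the study's dataset.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (46.2): share of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def performance_of_service(tp, tn, fp, fn):
    """Eq. (46.3): true positives as a percentage of all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return 100.0 * tp / total


# Hypothetical counts for one web service (illustration only).
counts = dict(tp=40, tn=45, fp=5, fn=10)
print(accuracy(**counts))                # 0.85
print(performance_of_service(**counts))  # 40.0
```

Note that PS% counts only the true positives in the numerator, so it is always at most the accuracy expressed as a percentage.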
Equation (46.3) shows how to calculate the PS percentage for a service instance. Similar to Silva-Palacios' work, we were able to connect the classes from the confusion matrix. A confusion matrix shows how difficult it is for a classifier to distinguish between the classes. Instead of using the four confusion matrix measurements directly, we focused on the accurately predicted service instances in order to learn as much as possible about web service performance instances. According to Eq. (46.3), PS% corresponds to the share of performance instances that are predicted precisely. A publicly available QoS dataset was used to assess the proposed method. The results showed that the suggested model maintained a higher prediction accuracy than the models already in use, although further research is required to improve the performance accuracy of the selected models. To improve the accuracy of binary classifiers on the numerical dataset, we preprocessed the data from a few carefully chosen web services. The first step of the suggested technique was therefore to preprocess the data obtained from the web services dataset. We used min–max normalization to normalize the data, as shown in Eq. (46.4).

\[ P_i = \frac{Q_i - \min(q)}{\max(q) - \min(q)} \tag{46.4} \]
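Equation (46.4) can be illustrated with a few lines of code. This is a minimal sketch: the function name and the sample response times are hypothetical, and mapping a constant column to zeros is an assumption made here to avoid division by zero, not something the text specifies.

```python
def min_max_normalize(values):
    """Eq. (46.4): rescale one quality attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical response times in milliseconds for three service invocations.
response_times_ms = [120.0, 300.0, 480.0]
print(min_max_normalize(response_times_ms))  # [0.0, 0.5, 1.0]
```

Each quality attribute (for example throughput and response time) would be normalized separately with its own minimum and maximum.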
In Eq. (46.4), P_i stands for the normalized value of a particular quality attribute and Q_i for its observed value, while max(q) and min(q) stand for the highest and lowest values of that quality attribute. The normalized data were saved from Excel as .csv files and subsequently used for the binary classification of web service instances. A number of normalization techniques have been suggested in the literature; the two most widely used are min–max and z-score normalization. Min–max normalization scales features to values between 0 and 1, as in Eq. (46.4), and helps maintain the relationships among the ordinal input data. Normalization methods based on the mean and standard deviation of the data do not exhibit consistent performance, since the values of these measures
fluctuate over time. Min–max normalization is more appropriate here, since the values of both attributes (throughput and response time) are based on historical data and do not vary over time. After classifying the data, we tested the inter-rater reliability, or agreement, between the predicted and actual instances of web services using kappa statistics. The kappa statistic value compares the observed accuracy with the expected accuracy and is calculated as

\[ \text{Kappa statistic value} = \frac{\text{observed service accuracy} - \text{expected service accuracy}}{1 - \text{expected service accuracy}} \]

Here the observed service accuracy (OSA) is

\[ \text{OSA} = \frac{p + s}{N} \]

Similarly, the expected service accuracy (ESA) is

\[ \text{ESA} = \frac{(p + r)(p + q) + (q + s)(r + s)}{N^2} \]

The kappa statistic value (KV) is obtained by combining OSA and ESA:

\[ \text{KV} = \frac{\dfrac{p + s}{N} - \dfrac{(p + r)(p + q) + (q + s)(r + s)}{N^2}}{1 - \dfrac{(p + r)(p + q) + (q + s)(r + s)}{N^2}} \]
Here
p = TP (proper identification of positive values),
q = FP (improper identification of positive values),
r = FN (improper identification of negative values),
s = TN (proper identification of negative values), and
N is the number of predictions or observations.
We can also calculate the balanced accuracy, a measure of how well a classification model performs. This statistic is especially helpful when the two classes are imbalanced, meaning that one class appears substantially more frequently than the other. The closer the balanced accuracy is to 1, the better the model categorizes observations. In our case the balanced accuracy is relatively high, indicating that the logistic regression model predicts fairly well.

\[ \text{Balanced accuracy of a service} = \frac{\text{Sensitivity of service} + \text{Specificity of service}}{2} \]

where sensitivity is the proportion of positive cases that the model is able to identify (the true positive rate), and specificity is the proportion of negative cases that the model is able to identify (the true negative rate).
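The kappa calculation can be sketched in a few lines using the symbols p, q, r, s defined above. This is an illustrative sketch, not the authors' implementation: the function name and example counts are hypothetical, and the expected accuracy is divided by N squared, the standard form of Cohen's kappa, so that it is a proportion between 0 and 1.

```python
def kappa_value(p, q, r, s):
    """Kappa from confusion counts: p = TP, q = FP, r = FN, s = TN."""
    n = p + q + r + s
    if n == 0:
        raise ValueError("no predictions")
    osa = (p + s) / n  # observed service accuracy
    # Expected (chance) accuracy; N squared keeps it a proportion.
    esa = ((p + r) * (p + q) + (q + s) * (r + s)) / (n * n)
    if esa == 1.0:
        raise ValueError("kappa is undefined when expected accuracy is 1")
    return (osa - esa) / (1.0 - esa)


# Hypothetical counts: OSA = 0.85 and ESA = 0.5, so kappa is about 0.7.
print(round(kappa_value(p=40, q=5, r=10, s=45), 4))
```

A value near 1 means the classifier agrees with the actual labels far more often than chance would predict, while a value near 0 means its accuracy is no better than chance.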
46.3.1 Sensitivity and Specificity Calculation
The sensitivity and specificity analysis is performed to determine the factors that have a greater impact on availability. To maximize availability, it is necessary to identify the components that are of most concern [10]. The ranking of quality measures produced by the suggested WMR method was subjected to a sensitivity analysis. We compute the WMR score and then prioritize quality measures based on that value. The WMR score rises in proportion to the quality metrics in the datasets. Prior to our research, Ibrahim [11] and Li et al. [12] conducted sensitivity studies of QoS measures to examine their influence on decision outcomes. For our sensitivity analysis, however, we follow the procedure used in [13]. Sensitivity and specificity are measured as follows:

\[ \text{Sensitivity of service} = \frac{TP}{TP + FN} \]

\[ \text{Specificity of service} = \frac{TN}{FP + TN} \]
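The two rates and the balanced accuracy built from them can be sketched as follows. The function names and the example counts are hypothetical and serve only to illustrate the formulas.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (FP + TN)."""
    return tn / (fp + tn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2.0


# Hypothetical counts: sensitivity 0.8, specificity 0.9, balanced accuracy about 0.85.
tp, tn, fp, fn = 40, 45, 5, 10
print(sensitivity(tp, fn), specificity(tn, fp))
print(round(balanced_accuracy(tp, tn, fp, fn), 4))
```

Because each class contributes equally to the average, a classifier that labels everything as the majority class scores only 0.5, however imbalanced the data are.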
We use R, a statistical programming language, to calculate these metrics, since it makes developing a logistic model simple and straightforward. A glm model was fitted to the real-world dataset values, and its output was passed to a confusion matrix routine, one of the most widely used tools for evaluating machine learning classifiers. Figure 46.2 illustrates how the confusion matrix yields metrics for the various outcome values. Kappa statistic values ranged from 0 to 1. A kappa value below 0.4 suggests very low similarity, a value of 0.55–0.70 reasonable similarity, a value of 0.70–0.85 very high similarity, and a value above 0.86 a perfect match between predicted and actual web service instances. Another quality indicator used to assess the accuracy of web services is balanced accuracy, which is calculated as the mean of the sensitivity and specificity of a service.
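The workflow described here, fitting a logistic model and tabulating its predictions in a confusion matrix, can be sketched without any external library. This is a hedged stand-in, not the R code used in the study: plain batch gradient descent replaces R's glm fitting procedure, and the feature values, learning rate, epoch count, and 0.5 threshold are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict(w, b, x, threshold=0.5):
    """Label an instance 1 when the fitted probability reaches the threshold."""
    prob = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return 1 if prob >= threshold else 0


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical one-feature data: a centred QoS score and a binary class label.
xs = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
preds = [predict(w, b, x) for x in xs]
print(confusion_counts(ys, preds))  # (3, 0, 0, 3)
```

The four counts returned by `confusion_counts` are the inputs needed for the accuracy, kappa, sensitivity, and specificity formulas given in this section.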
Fig. 46.2 Process of computing kappa statistic values using a confusion matrix in R
According to Ben-David and Frank [9], kappa statistics reflect a classifier's prediction performance in binary classification tests; we therefore used them to test our suggested technique. Kappa statistics account for classification that occurs by pure chance. When web service instances are categorized, a high kappa value shows that the grouping of instances is not random. Kappa statistics thus indicate a classifier's capacity to classify. We calculated the average kappa value of each classifier over the web service datasets and used kappa statistics to investigate the inter-rater reliability, or agreement, between the predicted and actual instances of web services. For the WS3–WS5 datasets, the classifiers did not differ significantly in accuracy. Classifiers performed better than expected because of their ability to capture the categorization of web service instances in each dataset. The average kappa statistics for the binary classification of the users' invoked instances in the chosen web service datasets are shown in Fig. 46.3. After ranking the web services, we must assess the suggested strategy. To do so, we used the web service data to verify accuracy by means of the kappa coefficient, which was calculated for each classifier.
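Averaging kappa over datasets and reading the result against the similarity bands quoted in this section can be sketched as below. The kappa values are hypothetical placeholders, not results from the study, and the handling of the band edges is an assumption: the text leaves gaps between 0.4 and 0.55 and between 0.85 and 0.86, which the sketch reports as falling between the stated bands.

```python
from statistics import mean


def average_kappa(kappa_by_dataset):
    """Mean kappa of one classifier across web service datasets."""
    return mean(kappa_by_dataset.values())


def kappa_band(kappa):
    """Map a kappa value to the similarity bands quoted in the text (edges assumed)."""
    if kappa < 0.4:
        return "very low similarity"
    if 0.55 <= kappa < 0.70:
        return "reasonable similarity"
    if 0.70 <= kappa <= 0.85:
        return "very high similarity"
    if kappa > 0.86:
        return "perfect match"
    return "between the stated bands"


# Hypothetical kappa values for one classifier on three datasets.
kappas = {"WS1": 0.62, "WS2": 0.74, "WS3": 0.80}
avg = average_kappa(kappas)
print(round(avg, 2), kappa_band(avg))
```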
Fig. 46.3 Kappa statistic value representation for different services (plot title: Accuracy Curve)
46.4 Conclusion
Ensuring client satisfaction can be challenging when a service's accuracy evaluation is based on performance factors alone. In the service classification scenario, we propose using logistic regression to evaluate a service's results in terms of linear decision boundaries. This classification technique is well suited to the task because it constructs linear boundaries while allowing a probabilistic interpretation, and because each end user assesses satisfaction differently, which is itself a challenging metric. The kappa statistics approach takes the predicted linear boundaries into account to produce the kappa value, which makes the classifier effective for categorizing data. The kappa value computed from a service's expected and actual accuracy is used to analyse the variation among performance components. The kappa statistic values obtained lay between 0 and 1. Finally, the kappa value exhibits a very good accuracy distribution over the range of services.
References
1. Apte, V., Viswanath, T., Gawali, D., Kommireddy, A., Gupta, A.: AutoPerf: automated load testing and resource usage profiling of multitier internet applications. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, pp. 115–126. ACM (2017)
2. Wang, H., Zhou, X., Zhou, X., Liu, W., Li, W., Bouguettaya, A.: Adaptive Service Composition Based on Reinforcement Learning, pp. 92–107. Springer, Berlin, Germany (2010)
3. Rhmann, W., Pandey, B., Ansari, G., Pandey, D.K.: Software fault prediction based on change metrics using hybrid algorithms: an empirical study. J. King Saud Univ. Comput. Inf. Sci. 32(4), 419–424 (2020)
4. Ren, L., Wang, W., Xu, H.: A reinforcement learning method for constraint-satisfied services composition. IEEE Transactions on Services Computing, vol. 1. IEEE Computer Society, Los Alamitos, CA, USA (2017)
5. Mostafa, A., Zhang, M.: Multi-objective service composition in uncertain environments. IEEE Trans. Serv. Comput. (2015)
6. Hasnain, M., Pasha, M.F., Ghani, I., Mehboob, B., Imran, M., Ali, A.: Benchmark dataset selection of web services technologies: a factor analysis. IEEE Access 8, 53649–53665 (2020)
7. Lopes, F., Agnelo, J., Teixeira, C.A., Laranjeiro, N., Bernardino, J.: Automating orthogonal defect classification using machine learning algorithms. Future Gener. Comput. Syst. 102, 932–947 (2020)
8. Singh, D., Singh, B.: Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 105524 (2019)
9. Ben-David, A., Frank, E.: Accuracy of machine learning models versus 'hand crafted' expert systems – a credit scoring case study. Expert Syst. Appl. 36(3), 5264–5271 (2009)
10. Dantas, J., Matos, R., Araujo, J., Oliveira, D., Oliveira, A., Maciel, P.: Hierarchical model and sensitivity analysis for a cloud-based VoD streaming service. In: Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Jun 2016, pp. 10–16
11. Ibrahim, A.A.Z.A.: PRESENCE: a framework for monitoring, modelling and evaluating the performance of cloud SaaS web services. In: Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Jun 2018, pp. 83–86
12. Li, L., Liu, M., Shen, W., Cheng, G.: Recommending mobile services with trustworthy QoS and dynamic user preferences via FAHP and ordinal utility function. IEEE Trans. Mob. Comput. 19(2), 419–431 (2020)
13. Ouadah, A., Hadjali, A., Nader, F., Benouaret, K.: SEFAP: an efficient approach for ranking skyline web services. J. Ambient Intell. Human. Comput. 10(2), 709–725 (2019)
Chapter 47
Deep Learning-Based Optimised CNN Model for Early Detection and Classification of Potato Leaf Disease R. Chinnaiyan, Ganesh Prasad, G. Sabarmathi, Swarnamugi, S. Balachandar, and R. Divya
Abstract After rice and wheat, potatoes are the third-largest crop grown for human use worldwide. The various diseases that can harm a potato plant and lower the quality and quantity of the yield cause potato growers to suffer significant financial losses every year. The presence of disease in a potato plant can be determined from the state of its leaves. Early blight and late blight are two prevalent diseases. Early blight is caused by a fungus, while late blight is caused by a fungus-like organism (an oomycete). Farmers can avoid waste and financial loss if they can identify these diseases early and treat them successfully. Three types of data were used in this study's identification technique: healthy leaves, early blight, and late blight. In this study, we created a system based on a convolutional neural network (CNN) architecture that employs deep learning to classify the two diseases in potato plants from the condition of the leaves. The results of this experiment indicate that the CNN outperforms existing approaches to this task, reaching an accuracy of about 98% with a batch size of 32 and 50 epochs.
R. Chinnaiyan (B) Department of CSE , Alliance College of Engineering and Design, Alliance University, Bengaluru, India e-mail: [email protected] G. Prasad School of CSE, Presidency University, Bengaluru, India G. Sabarmathi School of Business and Management, CHRIST UNIVERSITY (Deemed to be University), Bengaluru, India Swarnamugi Department of CS, Jyoti Nivas College, Bengaluru, India S. Balachandar · R. Divya CMR Institute of Technology, Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_47
47.1 Introduction
In many cultures, potatoes are a staple food that is well known throughout the world. Although there are numerous other professions, farming is by far the most common. India has a largely agrarian economy that produces a wide variety of goods, with potatoes playing a significant role [1]. India is the world's second-largest producer of potatoes, with about 51 million tonnes produced in 2020. According to the Agricultural and Processed Food Products Export Development Authority (APEDA), Uttar Pradesh is India's top producer of potatoes [1], accounting for about 30.33% of overall production. Potatoes are rich in potassium, fibre, and vitamins (especially C and B6). They are said to help with conditions such as cancer, heart disease, and high blood pressure by lowering total cholesterol levels in the blood. However, significant potato leaf diseases such as early blight and late blight have caused exports and production to decline during the past few years, which also hurts farmers. Potato leaf diseases affect both plants and agricultural soils [2]. These ailments are caused by microorganisms, genetic abnormalities, and infectious agents such as bacteria, fungi, and viruses. Fungi and bacteria are the principal causes of diseases in potato leaves. Soft rot and common scab are bacterial diseases, whereas early blight is caused by a fungus and late blight by a fungus-like oomycete. We are therefore motivated to develop an automated system that identifies and diagnoses these diseases on such an important crop, which may boost agricultural output, increase farmers' profit, and strengthen the nation's economy. Farmers will benefit from the ability to distinguish diseased from healthy potato leaves using the CNN algorithm [3, 4]. The processed images fall into three classes: early blight, healthy, and late blight (Fig. 47.1). The whole collection of images is divided into two portions, one for training and one for testing. The suggested technique distinguishes healthy potato leaves from diseased ones. We use a confusion matrix [5] to characterise the performance of the classification model [6]. The aim is to help farmers identify these diseases easily and apply the appropriate fertilisers to remove the blights, thereby improving the growth and productivity of the plant.
47.2 Literature Survey
The following are a few examples of earlier investigations into the detection of potato leaf disease. A work titled “Detecting the contaminated area coupled with disease using deep learning [7] in tomato plant leaves” used R-CNN [6], a deep learning method. Its advantage is that disease detection in plant leaves can be enhanced by employing Mask R-CNN [8]; its drawback is that it takes longer to process. “Potato Disease Detection Using Machine Learning [6]” is
Fig. 47.1 Potato leaves
the name of a scientific paper whose technique is image processing. One of its primary benefits was the use of a CNN [9] model, which had a 90% validation accuracy; its main disadvantage is the need for a large training model. According to the literature review, the authors of [10–14] presented novel machine learning and deep learning models for predicting diseases in plants and in health care, with improved accuracy. Krishna Mitra proposed a model in “Using Machine Learning to Detect Diseases in Plants” that employed a CNN built with the TensorFlow framework. The benefit of this approach was that it could identify sugarcane problems caused by fungi and that only the leaf area was considered. Its disadvantage is that it is challenging to implement because of its high computational complexity.
47.3 Dataset Collection
The Plant Village Dataset [15] is available for research on Kaggle, a public repository. The dataset used for this study consists of pictures of potato leaves classified into three categories: healthy leaves, early blight, and late blight. Table 47.1 shows the number of images in each category. Of the 2152 potato leaf images, 1722 are used to train the model and 430 for validation. A bar graph of the training and validation images is shown in Fig. 47.2.
R. Chinnaiyan et al.
Table 47.1 Datasets
Samples         Numbers
Healthy leaf    152
Early blight    1000
Late blight     1000
Total           2152
Fig. 47.2 Bar graph of training and validation images
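The split reported above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class counts come from Table 47.1, while the 0.8 training fraction and the helper name `split_counts` are assumptions chosen because 1722 of 2152 images is almost exactly 80%.

```python
# Class counts taken from Table 47.1.
CLASS_COUNTS = {"Healthy leaf": 152, "Early blight": 1000, "Late blight": 1000}


def split_counts(total, train_fraction=0.8):
    """Return (train, validation) sizes for an assumed training fraction."""
    train = round(total * train_fraction)
    return train, total - train


total_images = sum(CLASS_COUNTS.values())
train_size, val_size = split_counts(total_images)

# Share of each class in the whole dataset (shows the class imbalance).
class_share = {name: count / total_images for name, count in CLASS_COUNTS.items()}

print(total_images, train_size, val_size)
for name, share in class_share.items():
    print(f"{name}: {share:.1%}")
```

The class shares make the imbalance visible: healthy leaves account for only about 7% of the images, which is worth keeping in mind when reading per-class metrics.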
47.4 Data Preprocessing
The 2152 potato leaf photos are sorted into three groups: healthy leaves, early blight, and late blight. Rather than simply collecting more data, we first augment the data we have. Data augmentation [18] produces several plausible variants of each training sample in order to enlarge the training dataset artificially, which reduces overfitting. Each image in the training set is slightly shifted, rotated, and scaled by various percentages, and all of the generated images are added to the training set. As a result, the model is better able to cope with changes in the object’s size, orientation, and location within the image. The contrast and lighting of the images can also be changed, and the images can be flipped both vertically and horizontally. Incorporating all of these modifications increases the size of the training set. We then form batches of 32 images with 3 channels each and train for 50 epochs. To determine which proportion is preferable for splitting the data, the accuracy obtained with each data-splitting scheme is compared.
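The augmentation and batching steps described above can be sketched in plain Python. This is a simplified illustration and not the authors' pipeline: images are represented as single-channel nested lists, only flips and a horizontal shift are shown (rotation, scaling, and contrast changes are omitted), and all function names are hypothetical.

```python
import random


def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top-bottom flip)."""
    return image[::-1]


def shift_right(image, pixels, fill=0):
    """Shift right by `pixels` columns (0 < pixels < width), padding with `fill`."""
    return [[fill] * pixels + row[:-pixels] for row in image]


def augment(image, rng):
    """Return the original image plus three simple variants."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        shift_right(image, rng.randint(1, 2)),
    ]


def batches(items, batch_size=32):
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


rng = random.Random(0)
tiny_image = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
augmented = augment(tiny_image, rng)
batch_sizes = [len(batch) for batch in batches(list(range(70)))]
print(len(augmented), batch_sizes)
```

In practice a deep learning framework's own augmentation layers would normally replace these helpers; the sketch only shows that every variant is a cheap, label-preserving transformation of an existing training image.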
47.5 Classification and Model Building
After the preprocessing steps above, the images are classified using a convolutional neural network (CNN) architecture. A CNN is a supervised learning technique that recognises images by learning their characteristic features from a training dataset. The convolutional layers help the network recognise potato leaves from the characteristics of the leaves, working directly on the pixels of the image. This project uses images with red, green, and blue (RGB) channels, each with a resolution of 256 × 256 pixels. A filter is first convolved with the leaf image. Pooling, here MaxPooling, is then used to reduce the resolution of the resulting feature map while retaining its most important information. The following stage flattens these layers, which converts the feature map created by pooling into a vector. The output layer uses the SoftMax activation function, whereas the hidden layers use the nonlinear activation function ReLu. Figure 47.3 depicts the proposed CNN architecture that was used in this experiment to detect diseases in potato leaves. It consists of six convolutional layers and six MaxPooling layers.
The convolution layer is the core component of a CNN and carries the majority of the network’s computational load. Its output is an activation map, a two-dimensional (2D) representation that gives the response of the kernel at each spatial position of the image. For an input of size W × W × D and $D_{out}$ kernels of spatial size F, applied with stride S and padding P, the size of the output volume is
$$W_{out} = \frac{W - F + 2P}{S} + 1.$$
Fig. 47.3 Architecture of the proposed model
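As a worked example of the output-size formula, a 256 × 256 input with F = 3, P = 0, and S = 1 gives W_out = (256 − 3 + 0)/1 + 1 = 254. The sketch below applies the same formula through six convolution and MaxPooling stages. The 3 × 3 kernels, the absence of padding, and the 2 × 2 pooling windows with stride 2 are assumptions made for illustration; the chapter does not state these values.

```python
def conv_output_size(w, f, p=0, s=1):
    """W_out = (W - F + 2P) / S + 1, rounded down as frameworks usually do."""
    return (w - f + 2 * p) // s + 1


def trace_sizes(input_size=256, blocks=6, kernel=3, pool=2):
    """Spatial size after each conv + max-pooling block (assumed settings)."""
    sizes = [input_size]
    size = input_size
    for _ in range(blocks):
        size = conv_output_size(size, kernel)        # convolution, stride 1, no padding
        size = conv_output_size(size, pool, s=pool)  # max pooling, window 2, stride 2
        sizes.append(size)
    return sizes


print(conv_output_size(256, 3))
print(trace_sizes())
```

Under these assumed settings the feature map shrinks to 2 × 2 after the sixth block, which shows why a 256 × 256 input leaves room for about six such stages.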
The fully connected layer maps the representation between the input and the output. It is computed as $Z_l = W_l \ast h_{l-1}$. The sequential model is built from CNN layers, and rectified linear units are used as the hidden activations. SoftMax, which is based on maximum likelihood, is used as the activation for the prediction. The SoftMax function is given by
$$\mathrm{Softmax}(Z)_j = \frac{e^{X^{T} W_j}}{\sum_{k} e^{X^{T} W_k}}.$$
Here, $X^{T} W_j$ denotes the inner product of the input X and the weights $W_j$ of class j, and k runs over the classes. The input width ($n_m$) and height ($n_h$) of the first convolutional layer are both 256. $f_m$, $f_h$, and $f_c$ denote the width, height, and number of channels of the kernel filter of the convolutional layer, and s is the stride. The dimension of the MaxPooling layer output was calculated using the equation
$$\mathrm{Dimension}(\mathrm{Conv}(n, k)) = \left( \frac{n_h - f_h}{s} + 1,\; \frac{n_m - f_m}{s} + 1,\; f_c \right).$$
Compared with sigmoid and tanh, ReLu is reported to be more reliable and to accelerate convergence by a factor of six. The ReLu activation function is defined as ReLu(x) = max(0, x). The total numbers of trainable and non-trainable parameters are summarised in Fig. 47.4. After the 14-DCNN model was built, the ideal hyperparameter values were found using hyperparameter tuning methods. The optimizer function, mini-batch size, and dropout probability of the 14-DCNN were determined using the random search and coarse-to-fine approaches. Adaptive moment estimation (Adam) is the most commonly used optimizer in hyperparameter searches. The most typical hyperparameters and their values for the 14-DCNN model for detecting plant leaf diseases are listed in Table 47.2.
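The two activation functions used in the model, ReLu for the hidden layers and SoftMax for the output layer, can be sketched as follows. This is a minimal standard-library illustration rather than the framework implementation; subtracting the maximum score before exponentiating is a common numerical-stability choice added here and does not change the result.

```python
import math


def relu(x):
    """ReLu(x) = max(0, x)."""
    return max(0.0, x)


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for the three classes (illustrative values only).
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(relu(-3.0), relu(2.5))
```

The class with the largest score always receives the largest probability, so the predicted class is simply the index of the maximum output.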
47.6 Evaluation and Results
Deep learning algorithms for identifying plant leaf diseases help to reduce crop yield losses, which increases agricultural output and quality. Our work proposes a rapid and simple multi-level deep learning model for categorising potato leaf diseases. In order to classify early blight and late blight potato infections from
Fig. 47.4 Parameters
Table 47.2 Common hyperparameters
Hyperparameter                        Value
Batch size                            32
Dropout value                         0.2
Loss                                  Categorical cross-entropy
Optimizer                             Adam
Activation function for conv layer    ReLu
those photographs, the first level separates the potato leaf images into batches. A second-level convolutional neural network was then built to identify potato leaf disease. It includes a layer for resizing and normalisation. The data were divided into a training portion of 80% and a test portion of 20%, and the latter was split further into 10% for validation and 10% for testing. In the third stage, data augmentation was used to lessen overfitting and raise the model’s predictive accuracy. In the fourth stage, the model was compiled with the Adam optimizer, sparse categorical cross-entropy as the loss, and accuracy as the metric, and it was trained for 50 epochs. The findings were as follows: after 50 epochs, the model reached a training accuracy of 0.99 and a validation accuracy of 0.97, with good performance and no apparent overfitting. The accuracy on our test dataset was about 98%, which we consider good. The accuracy and loss curves for the training and validation datasets are shown in Fig. 47.5 (see also Fig. 47.6).
The confusion matrix is a summary of the predictions made by a classification technique. It gives the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values of every class. The area under the receiver operating characteristic curve (AUC-ROC) is one of the popular metrics used to evaluate the performance of learning algorithms. The true positive rate (TPR) and false positive rate (FPR) are calculated as
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}.$$
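For a three-class problem, TP, FN, FP, and TN are obtained per class by treating that class as positive and the other two as negative. The sketch below shows this one-vs-rest counting together with the TPR and FPR formulas. The confusion matrix in it is hypothetical and is not the matrix of Fig. 47.7; the function names are likewise illustrative.

```python
def one_vs_rest_counts(matrix, cls):
    """Return (TP, FN, FP, TN) for class `cls`.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                 # true class `cls`, predicted otherwise
    fp = sum(row[cls] for row in matrix) - tp  # other classes predicted as `cls`
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, fn, fp, tn


def tpr_fpr(matrix, cls):
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    tp, fn, fp, tn = one_vs_rest_counts(matrix, cls)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical 3-class confusion matrix (illustrative numbers only).
example = [[50, 2, 0],
           [3, 45, 2],
           [0, 1, 9]]

for cls in range(3):
    print(cls, one_vs_rest_counts(example, cls), tpr_fpr(example, cls))
```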
Fig. 47.5 Graph for training and validation over accuracy and loss curves
Fig. 47.6 Implementation of success identification of plant leaves
A confusion matrix is a table that describes the performance of a classification model by visualising and summarising its predictions. Here, we plot the true values against the predicted values (Fig. 47.7).
Fig. 47.7 Confusion matrix
Recall is among the most significant performance measures. It is calculated by dividing the number of correctly identified positives by the total number of actual positives, that is, those correctly identified plus those wrongly rejected. The percentage of actual positives that were
Table 47.3 Evaluation metrics of proposed methodology
Category name    Precision (%)    Recall (%)    F1-score (%)    Support
Late blight      98               100           99              204
Early blight     98               97            98              197
Healthy          96               93            95              29
Macro avg.       98               97            97              430
Weighted avg.    98               98            98              430
Accuracy %       97.3             97            98              430
successfully detected is what recall measures. The F1 score is another metric frequently used to assess how well machine learning algorithms perform. It is the harmonic mean of recall and precision, and its value reflects the overall predictive quality of the classification technique. The accuracy, precision, recall, and F1 score of the classification algorithms were computed using the following formulas:
$$\text{Classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\text{Precision} = \frac{TP}{TP + FP},$$
$$\text{Recall} = \frac{TP}{TP + FN},$$
$$\text{F1 Score} = 2 \ast \frac{\text{precision} \ast \text{recall}}{\text{precision} + \text{recall}}.$$
The results obtained with the above evaluation techniques are given in Table 47.3 (Figs. 47.8 and 47.9).
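The macro and weighted averages in Table 47.3 can be recomputed from the per-class rows, which is a useful consistency check. In the sketch below the per-class precision, recall, and image counts are copied from Table 47.3; the helper names are hypothetical. Because the table entries are already rounded to whole percentages, recomputed values can differ from the printed ones by about one point.

```python
# Per-class values copied from Table 47.3: (precision %, recall %, number of images).
TABLE_47_3 = {
    "Late blight": (98, 100, 204),
    "Early blight": (98, 97, 197),
    "Healthy": (96, 93, 29),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over the classes."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean over the classes weighted by the number of images per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _, _ in TABLE_47_3.values()]
recalls = [r for _, r, _ in TABLE_47_3.values()]
supports = [n for _, _, n in TABLE_47_3.values()]

weighted_precision = weighted_average(precisions, supports)
weighted_recall = weighted_average(recalls, supports)

print(round(macro_average(precisions), 2), round(macro_average(recalls), 2))
print(round(weighted_precision, 2), round(weighted_recall, 2))
print(round(f1(98, 100), 2))
```

The weighted averages are dominated by the two blight classes, since the healthy class contributes only 29 of the 430 validation images.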
47.7 Comparison of Results See Table 47.4 and Fig. 47.10.
Fig. 47.8 Graph of evaluation metrics
Fig. 47.9 Graph of different classification techniques
Table 47.4 Authors’ results
S. No.   Model and authors                           Year   Data sets   Classifier   Accuracy (%)
1        Deep Kothari et al.                         2022   900         CNN          97
2        Anushka Bangal et al.                       2022   1150        CNN          91.41
3        Divyansh Tiwari et al.                      2020   2152        CNN          97.8
4        Harisha J and Mrs. Renuka Malge             2022   2152        CNN          97
5        Mr. Girish Athanikar and Ms. Priti Badar    2016   24          BPNN         92
6        Aditi Singh and Harjeet Kaur                2021   300         K-means      95
7        Abdul Jalil Rozaqi et al.                   2021   450         CNN          95
8        Md. Ashiqur Rahaman Nishad et al.           2022   2580        K-means      97
9        Dr. Tejashree T. Moharekar et al.           2022   2152        CNN          94.6
10       Our approach                                2022   2152        DL CNN       98
Fig. 47.10 Bar graph for comparison of different author results
47.8 Conclusion
In this study, we used deep learning techniques and convolutional neural networks (CNN) to construct a model that categorises potato leaves as early blight, late blight, or healthy, with a classification accuracy of nearly 98%. The use of data augmentation strengthens the model’s reliability. Our method can assist farmers in
boosting agricultural production and detecting disease early. In our opinion, this kind of undertaking will be extremely important for the agriculture industry. Many farmers in India have limited literacy and only a vague awareness of these diseases, so we believe that this programme could benefit Indian potato farmers. The proposed method is shown to differentiate successfully between the three potato leaf classes: early blight, late blight, and healthy.
References
1. Khalid Rayhan Asif, Md., Asfaqur Rahman, Md., Hasna Hena, M.: CNN based disease detection approach on potato leaves. In: 2020 Proceedings of the Third International Conference on Intelligent Sustainable Systems [ICISS 2020], pp. 428–432. IEEE (2020)
2. Suttapakti, U., Bunpeng, A.: Potato leaf disease classification based on distinct color and texture feature extraction. In: 2019 19th International Symposium on Communications and Information Technologies (ISCIT), pp. 82–85. IEEE (2019)
3. Dickson, M.A., Bausch, W.C.: Plant recognition using a neural network classifier with size and shape descriptors. Trans. ASAE 1, 97–102 (1997)
4. Sabarmathi, G., Chinnaiyan, R.: Mining patient health care service opinions for hospital recommendations. Int. J. Eng. Trends Technol. 69(9), 161–167 (2021)
5. Chinnaiyan, R., Alex, S.: Machine learning approaches for early diagnosis and prediction of fetal abnormalities. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–3 (2021). https://doi.org/10.1109/ICCCI50826.2021.9402317
6. Sharma, P., Berwal, Y.P.S., Ghai, W.: KrishiMitr (Farmer’s Friend): using machine learning to identify diseases in plants. In: 2018 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS), pp. 29–34. IEEE (2018)
7. Hari Pranav, A., Senthilmurugan, M., Pradyumna Rahul, K., Chinnaiyan, R.: IoT and machine learning based peer to peer platform for crop growth and disease monitoring system using blockchain. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–5 (2021)
8. Preetika, B., Latha, M., Senthilmurugan, M., Chinnaiyan, R.: MRI image based brain tumour segmentation using machine learning classifiers. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–9 (2021)
9. Merchant, M., Paradkar, V., Khanna, M., Gokhale, S.: Mango leaf deficiency detection using digital image processing and machine learning. In: 2018 3rd International Conference for Convergence in Technology (I2CT), pp. 1–3. IEEE (2018)
10. Sardogan, M., Tuncer, A., Ozen, Y.: Plant leaf disease detection and classification based on CNN with LVQ algorithm. In: 2018 3rd International Conference in Computer Science and Technology, pp. 382–385. IEEE (2018)
11. Sabarmathi, G., Chinnaiyan, R.: Reliable feature selection model for evaluating patient home health care services opinion mining systems. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–4 (2021). https://doi.org/10.1109/ICAECA52838.2021.9675485
12. Latha, M., Senthilmurugan, M., Chinnaiyan, R.: Brain tumor detection and classification using convolution neural network models. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–5 (2021)
13. Senthilmurugan, M., Latha, M., Chinnaiyan, R.: Analysis and prediction of tuberculosis using machine learning classifiers. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–4 (2021)
14. Chinnaiyan, R., Alex, S.: Optimized machine learning classifier for early prediction of fetal abnormalities. Int. J. Comput. Intell. Control 13(2) (2021)
15. Chinnaiyan, R., Alex, S.: Early analysis and prediction of fetal abnormalities using machine learning classifiers. In: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), pp. 1764–1767 (2021)
Chapter 48
Detection of Parkinson’s Disease in Brain MRI Images Using Deep Learning Algorithms
N. S. Kalyan Chakravarthy, Ch. Hima Bindu, S. Jafar Ali Ibrahim, Sukhminder Kaur, S. Suresh Kumar, K. Venkata Ratna Prabha, P. Ramesh, A. Ravi Raja, Chandini Nekkantti, and Sai Sree Bhavana
Abstract Parkinson’s disease is a neurological illness that usually affects elderly people. Dopamine, which is produced by neurons in the substantia nigra, is depleted in Parkinson’s disease, and movement is impaired as a result. As this is an emerging problem, early diagnosis of the disease is necessary. T1-weighted MRI scans of 181 people with Parkinson’s disease and 80 healthy individuals were obtained from the Parkinson’s Progression Markers Initiative (PPMI) database. Image registration is used to simplify the CNN architecture. A sequential model and ResNet50 were developed to distinguish people with Parkinson’s disease (PD) from healthy people. The accuracy of ResNet50 is 86%, while the accuracy of the sequential model is 96%. We compare the metrics of the two models.
N. S. Kalyan Chakravarthy (B) · Ch. Hima Bindu · S. Suresh Kumar
QIS College of Engineering and Technology, Ongole, Andhra Pradesh, India
S. Jafar Ali Ibrahim
Department of IoT, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India
S. Kaur
Taylors University Lake Side Campus, 47500 Subang Jaya, Selangor, Malaysia
K. Venkata Ratna Prabha · A. Ravi Raja · C. Nekkantti · S. S. Bhavana
Department of ECE, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, Andhra Pradesh, India
P. Ramesh
Dr Sudha and Nageswara Rao Siddhartha Institute of Dental Sciences, Gannavaram, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_48
N. S. Kalyan Chakravarthy et al.
48.1 Introduction
Parkinson’s disease is a progressive disorder that arises when dopamine-releasing nerve cells in the brain are damaged [1]. Dopamine is a neurotransmitter that regulates muscular action, so when its level is low, symptoms appear, including tremors in the hands, legs, and jaw, stiff muscles, shakiness and loss of coordination, and insomnia. According to studies, the disease may affect up to 10 million individuals, and nearly 60,000 individuals are diagnosed each year. Early therapy is essential because Parkinson’s disease is a degenerative, incurable condition that can eventually result in dementia [2]. MRI, PET, and SPECT scans can be used to diagnose Parkinson’s disease; MRI is frequently employed because it is reasonably priced. Magnetic resonance imaging (MRI) scans help to visualise the structure of the brain. Although an MRI scan produces a 3D image of the brain’s architecture, Parkinson’s disease (PD) is difficult to spot with the untrained eye. Owing to technological advances, analysis and disease detection are now carried out with computer-based approaches [3]. Because the MRI images used as inputs come from various sources, they have diverse spatial layouts. Image registration aligns two images spatially so that they can be compared easily. The Support Vector Machine (SVM) classifier and the Bayesian network classifier are two examples of algorithms that rely on manually created features characteristic of Parkinson’s disease. A CNN, by contrast, does not require hand-crafted features: it learns the crucial information from the image and finds the pattern autonomously, which greatly speeds up disease diagnosis. VGGNet, ResNet, and AlexNet are a few examples of CNN architectures that can be applied [4, 5].
48.2 Related Work
R. Prashanth and colleagues [6] identified Parkinson’s disease using non-motor markers such as ocular movement and olfactory loss. These characteristics were supplemented with biomarkers, namely dopaminergic imaging and cerebrospinal fluid (CSF) measurements, to improve detection. To compare performance, they used a variety of classifiers, including Random Forests, Support Vector Machine (SVM), Naive Bayes, and Boosted Trees. The SVM classifier achieved an accuracy, sensitivity, and specificity of 96.04%, 97.03%, and 95.01%, respectively. The goal of Lopez et al.’s paper [7] is to identify the isosurfaces that characterise Parkinson’s disease. Using isosurfaces reduces the complexity of the 3D CNN. The DaTSCAN images are provided by the PPMI database. The accuracy rises to
48 Detection of Parkinson’s Disease in Brain MRI Images Using Deep …
95.1% when the LeNet and AlexNet CNN architectures are used with the scans’ isosurfaces. Orozco-Arroyave et al. [8] analysed the continuous speech signals of people with Parkinson’s disease and classified the data using automatic segmentation. Their technique automatically divides the speech recordings into voiced and unvoiced frames. Depending on the language, the accuracy ranges from 83 to 98%, and for cross-language trials it ranges from 60 to 99%. Shinde et al. [9] developed diagnostic biomarkers using neuromelanin-sensitive magnetic resonance imaging (NMS-MRI) to determine the pattern changes related to PD. A CNN-based computational approach is used to discover the biomarkers. This method is reported to be 80% more accurate than other models at identifying affected regions in the images from the neuromelanin contrast. Sarraf et al. [10] describe a CNN model for identifying Alzheimer’s disease (AD) that takes fMRI and MRI scans as inputs. Because image intensities and brain patterns are similar in people over 75, it can be challenging to distinguish an AD-affected brain from a normal brain by image analysis. The use of preprocessing pipelines improves the classification, with accuracy rates of 99.9% for the fMRI pipeline and 98.84% for the MRI pipeline. Kollia et al. [11] suggested a technique for diagnosing Parkinson’s disease using information extracted from a deep neural network (DNN). The model, which combines k-nearest neighbour classification, k-means clustering, and transfer learning of deep neural networks, uses MRI and DaTscan data as its input. To help the DNN adjust to multiple data representations from varied clinical circumstances, a novel loss function is also designed. The acquired information is assessed in a medical environment. Billones et al. [12] investigated the distinction between Alzheimer’s disease and mild cognitive impairment, which is frequently mistaken for AD in its early stages. To categorise them, a CNN architecture based on structural MRI images was developed. A VGG-16 model is created to classify scans as normal, Alzheimer’s disease, or mild cognitive impairment. The accuracy attained for binary classification was 98.25%.
48.3 Proposed Model
48.3.1 Data Collection
The necessary data are obtained from the PPMI database. The Parkinson’s Progression Markers Initiative (PPMI) is a multidisciplinary, collaborative effort that gathers clinical and imaging data in order to identify and examine a number of biomarkers that are
Table 48.1 Specifications of downloaded MRI scans
Criteria                    Value
Modality                    MRI
Weighting                   T1
Acquisition type            3D
Group                       PD and Control
No. of images downloaded    Total = 261 (PD = 181, Control = 80)
Field strength              3.0 T
responsible for the onset and progression of Parkinson’s disease. The downloaded MRI scans meet the specifications listed in Table 48.1 (Fig. 48.1).
Fig. 48.1 Flowchart of the paper
Fig. 48.2 Image before registration
48.3.2 Data Preprocessing
Because PPMI is a collaborative, multicentre project, the MRI images obtained from the database came from diverse sources and had distinct spatial layouts. Image registration brings all of the images into the same spatial layout, which improves the CNN’s feature detection. Figures 48.2 and 48.3 show a downloaded MRI image before and after the affine registration method of the 3D Slicer software is applied. 3D Slicer is a free application for exploring biomedical images. The 88th slice of each registered 3D image is saved in PNG format and used as the input of the convolutional neural network. The critical area of the image is the PD-affected region, as depicted in Fig. 48.4. It is found with the snake (active contour) model, an image segmentation method based on the energy, or intensity, of the image: the contour points move towards the location of lowest energy, and connecting them produces the shape of the region.
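Affine registration searches for a transform made of rotation, scaling, and translation that maps one image onto the coordinate frame of another. The sketch below only shows how such a transform acts on a single point once its parameters are known; it is a 2D simplification of the 3D case, it does not estimate the parameters, and it is not the 3D Slicer implementation. The function names and the parameter values are hypothetical.

```python
import math


def make_affine(angle_deg, scale, tx, ty):
    """Build a 2 x 3 affine matrix: uniform scale and rotation, then translation."""
    angle = math.radians(angle_deg)
    c = scale * math.cos(angle)
    s = scale * math.sin(angle)
    return [[c, -s, tx],
            [s, c, ty]]


def apply_affine(matrix, point):
    """Map a point (x, y) through the affine matrix."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Hypothetical parameters: a pure translation and a 90 degree rotation.
shifted = apply_affine(make_affine(0, 1, 5, -2), (3, 4))
rotated = apply_affine(make_affine(90, 1, 0, 0), (1, 0))
print(shifted, rotated)
```

A registration tool repeats this mapping for every voxel and adjusts the parameters until the moving image matches the reference image as closely as possible.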
48.3.3 Convolutional Neural Network
A CNN is used to identify and classify features of the images that are invisible to the unaided eye and to make predictions from them. CNN architectures are considered more beneficial for medical applications than other approaches. Two distinct CNN architectures are considered in this research: the sequential model and ResNet50. CNN receives the
Fig. 48.3 Image after registration
Fig. 48.4 Region of interest
downloaded MRI pictures as inputs. Of the downloaded images, 80% are used for training and the remaining 20% for testing. The model is trained and tested over 50 epochs. The model then predicts whether an MRI image is PD-affected or normal, as shown in Fig. 48.5.
Fig. 48.5 Image before registration
Nine layers make up the sequential model in Fig. 48.6. They include three convolution layers with 32, 64, and 128 filters and ReLU activation, two pooling layers, a flattening layer, and two dense layers. The input image is 64 × 64 × 3 pixels in size. ResNet50 (Fig. 48.7), whose name stands for a 50-layer residual network, uses the pretrained ResNet model with ImageNet weights. The input dimensions of this model are 240 × 240 × 3. The pretrained model consists of one 7 × 7 convolution layer, one 3 × 3 max pooling layer, sixteen 3 × 3 convolution layers, thirty-two 1 × 1 convolution layers, and one 7 × 7 average pooling layer. These layers are connected to a fully connected layer.
Fig. 48.6 Sequential model architecture
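To give a feel for the size of the sequential model, the number of trainable parameters in its three convolution layers can be estimated from the filter counts 32, 64, and 128 and the three input channels. The 3 × 3 kernel size used below is an assumption, since the chapter does not state it, and the dense layers are left out because their widths are not given.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def stack_params(in_channels, filter_counts, kernel=3):
    """Parameter count of each convolution layer in a simple stack."""
    counts = []
    channels = in_channels
    for filters in filter_counts:
        counts.append(conv2d_params(kernel, channels, filters))
        channels = filters  # the next layer sees one channel per filter
    return counts


per_layer = stack_params(in_channels=3, filter_counts=[32, 64, 128])
print(per_layer, sum(per_layer))
```

Under the assumed kernel size the three convolution layers together hold fewer than 100,000 parameters, which is small compared with the roughly 25 million parameters of the standard ResNet-50.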
Fig. 48.7 ResNet50 architecture
48.4 Performance Metrics
The models are evaluated using specificity, sensitivity, accuracy, precision, and F1-score. These metrics are calculated with the following formulae, where
W = True Positive (TP)
X = True Negative (TN)
Y = False Positive (FP)
Z = False Negative (FN)
Accuracy (%) = (W + X) × 100/(W + Y + Z + X)
Specificity = X/(X + Y)
Sensitivity or Recall = W/(W + Z)
Precision = W/(W + Y)
F1 Score = 2 × (Recall × Precision)/(Recall + Precision)
Here, TP counts PD images classified as PD, TN counts control images classified as control, FP counts control images classified as PD, and FN counts PD images classified as control.
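The formulae above translate directly into code. The sketch below keeps the W, X, Y, Z notation of this section; the counts passed to it are hypothetical and are not the counts behind Table 48.2.

```python
def binary_metrics(w, x, y, z):
    """Compute the metrics of Sect. 48.4 from w=TP, x=TN, y=FP, z=FN."""
    recall = w / (w + z)
    precision = w / (w + y)
    return {
        "accuracy_percent": (w + x) * 100 / (w + x + y + z),
        "specificity": x / (x + y),
        "sensitivity": recall,
        "precision": precision,
        "f1": 2 * recall * precision / (recall + precision),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(w=40, x=45, y=5, z=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Since the F1 score is fully determined by precision and recall, recomputing it from those two values is a quick way to check a reported results table.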
48.5 Results
Two different 2D CNN models, the sequential model and ResNet50, are used to identify MRI scans with Parkinson’s disease. After image registration, the MRI images are split 80:20 into training and testing data. The CNN models are trained and tested for 50 epochs, and the accuracy and loss values of each epoch are recorded. Figure 48.8a, c shows the training and testing accuracy for the sequential model and ResNet50, respectively, with a blue line for training accuracy and a red line for testing accuracy. Figure 48.8b, d shows the training and testing loss for the sequential model and ResNet50, respectively, with a blue line for training loss and a red line for testing loss. To analyse the performance of both models, accuracy, sensitivity, specificity, precision, and F1-score are computed; Table 48.2 compares the classification performance of the sequential model and ResNet50. With an accuracy of 96%, the sequential model outperforms ResNet50, which has an accuracy of 86%.
48.6 Conclusion
A sequential model and ResNet50 are proposed in this study to distinguish MRI scans of Parkinson’s disease patients from those of healthy individuals. The models are trained on MRI scans of PD patients and healthy people. To make
Fig. 48.8 a Training and validation accuracy for sequential model, b training and validation loss for sequential model, c training and validation accuracy for ResNet50, d training and validation loss for ResNet50
Table 48.2 Classification performance comparison of sequential model and ResNet50
Metric          Sequential model    ResNet50
Accuracy (%)    96                  86
Sensitivity     0.97                0.90
Specificity     0.94                0.81
Precision       0.87                0.66
F1-score        0.93                0.80
the model simpler, preprocessing and image registration are utilised. Once classification is complete, the MRI scan is labelled as affected or not affected. The accuracy of ResNet50 is 86%, while the accuracy of the sequential model is 96%. The sequential model therefore produces the more accurate results.
References
1. Davie, C.A.: A review of Parkinson’s disease (2008). https://doi.org/10.1093/bmb/ldn013
2. Aich, S., Joo, M., Kim, H.-C., et al.: Improvisation of classification performance based on feature optimization for differentiation of Parkinson’s disease from other neurological diseases using gait characteristics. IJECE 9 (2019)
3. Acton, P.D., Newberg, A.: Artificial neural network classifier for the diagnosis of Parkinson’s disease using [99mTc]TRODAT-1 and SPECT. Phys. Med. Biol. (2006). https://doi.org/10.1088/0031-9155/51/12/004
4. Haller, S., Badoud, S., Nguyen, D., Barnaure, I., Montandon, M.L., Lovblad, K.O., Burkhard, P.R.: Differentiation between Parkinson disease and other forms of Parkinsonism using support vector machine analysis of susceptibility-weighted imaging (SWI): initial results. Eur. Radiol. (2013). https://doi.org/10.1007/s00330-012-2579-y
5. Morales, D.A., Vives-Gilabert, Y., Bielza, C., et al.: Predicting dementia development in Parkinson’s disease using Bayesian network classifiers. Psychiatry Res. (2013). https://doi.org/10.1016/j.pscychresns.2012.06.001
6. Sahay, S., Prashanth, R., Roy, S.D., Mandal, P.K., Ghosh, S.: High-accuracy detection of early Parkinson’s disease through multimodal features and machine learning. Int. J. Med. Inform. (2016). https://doi.org/10.1016/j.ijmedinf.2016.03.001
7. Andrés, O., Jorge, M., Manuel, M.-I., Górriz Juan, M., et al.: Parkinson’s disease detection using isosurfaces-based features and convolutional neural networks. Front. Neuroinform. 13 (2019)
8. Arroyave, J.R., Daqrouq, K., Rusz, J., Nöth, E., et al.: Automatic detection of Parkinson’s disease in running speech spoken in three different languages (2016). https://doi.org/10.1121/1.4939739
9. Shinde, S., Saboo, Y., Prasad, S., et al.: Predictive markers for Parkinson’s disease using deep neural nets on neuromelanin sensitive MRI (2019). https://doi.org/10.1016/j.nicl.2019.101748
10. Sarraf, S., Tofighi, G.: DeepAD: Alzheimer’s disease classification via deep CNN using MRI and fMRI (2016). https://doi.org/10.1101/070441
11. Kollia, I., Stafylopatis, A.-G., Kollias, S.: Predicting Parkinson’s disease using latent information extracted from deep neural networks (2019). arXiv:1901.07822
12. Billones, C.D., Earl, D., et al.: DemNet: a convolutional neural network for the detection of Alzheimer’s disease and mild cognitive impairment. In: IEEE-TENCON Conference (2016). https://doi.org/10.1109/TENCON.2016.7848755
13. Long, D., Xuan, M., Kong, D., et al.: Automatic classification of early Parkinson’s disease with multi-modal MR imaging (2019). https://doi.org/10.1371/journal.pone.0047714
14. Jeyaselvi, M., Jayakumar, C., Sathya, M., Jafar Ali Ibrahim, S., Kalyan Chakravarthy, N.S.: Cyber security-based multikey management system in cloud environment. In: 2022 International Conference on Engineering and Emerging Technologies (ICEET), Kuala Lumpur, Malaysia, pp. 1–6 (2022). https://doi.org/10.1109/ICEET56468.2022.100071044. https://ieeexplore.ieee.org/abstract/document/10007104
15. Jafar Ali Ibrahim, S., Rajasekar, S., Kalyan Chakravarthy, N.S., Varsha, Singh, M.P., Kumar, V., Saruchi: Synthesis, characterization of Ag/Tio2 nanocomposite: its anticancer and antibacterial and activities. Global Nest 24(2), 262–266 (2022). https://doi.org/10.30955/gnj.0042505
16. Shanmugam, S., Jafar Ali Ibrahim, S., Mariappan, S., Varsha, S., Kalyan Chakravarthy, N.S., Kumar, V., Saruchi: Recent advances in analysis and detection of tuberculosis system in chest X-ray using artificial intelligence (AI) techniques: a review. Curr. Mater. Sci. 16(1) (2023). https://doi.org/10.2174/2666145415666220816163634
17. Jafar Ali Ibrahim, S., et al.: Rough set based on least dissimilarity normalized index for handling uncertainty during E-learners learning pattern recognition. Int. J. Intell. Networks 3, 133–137 (2022). https://doi.org/10.1016/j.ijin.2022.09.001. https://www.sciencedirect.com/science/article/pii/S2666603022000148
18. Ramprasath, J., Krishnaraj, K., Seethalakshmi, V.: Mitigation services on SDN for distributed denial of service and denial of service attacks using machine learning techniques. IETE J. Res. 1–12 (2022). https://doi.org/10.1080/03772063.2022.2142163
19. Balasamy, K., Krishnaraj, N., Vijayalakshmi, K.: Improving the security of medical image through neuro-fuzzy based ROI selection for reliable transmission. J. Multimedia Tools Appl. 81, 14321–14337 (2022)
20. Krishnaraj, N., Vidhya, R., Shankar, R., Shruthi, N.: Comparative study on various low code business process management platforms. In: 2022 International Conference on Inventive Computation Technologies (ICICT), Nepal, pp. 591–596 (2022). https://doi.org/10.1109/ICICT54344.2022.9850581
21. Ramprasath, J., Seethalakshmi, V.: Secure access of resources in software-defined networks using dynamic access control list. Int. J. Commun. Syst. 34(1), e4607 (2020)
22. Ramprasath, J., Ramakrishnan, S., Saravana Perumal, P., Sivaprakasam, M., Manokaran Vishnuraj, U.: Secure network implementation using VLAN and ACL. Int. J. Adv. Eng. Res. Sci. 3(1), 2349–6495 (2016)
23. Yin, X., Vignesh, C.C., Vadivel, T.: Motion capture and evaluation system of football special teaching in colleges and universities based on deep learning. Int. J. Syst. Assur. Eng. Manag. 13, 3092–3107 (2022). https://doi.org/10.1007/s13198-021-01557-2
24. Zang, H., Chandru Vignesh, C., Alfred Daniel, J.: Influence of social and environmental responsibility in energy efficiency management for smart city. J. Interconnection Networks 22(Supp 01), 2141002 (2022). https://doi.org/10.1142/S0219265921410024
25. Wu, K., Li, C., Chandru Vignesh, C., Alfred Daniel, J.: Digital teaching in the context of Chinese universities and their impact on students for ubiquitous applications. Comput. Electr. Eng. 100, 107951 (2022). https://doi.org/10.1016/j.compeleceng.2022.107951
26. Sreethar, S., Nandhagopal, N., Anbu Karuppusamy, S., Dharmalingam, M.: SARC: search and rescue optimization-based coding scheme for channel fault tolerance in wireless networks. Wirel. Networks 27(6), 3915–3926 (2021). https://doi.org/10.1007/s11276-021-02702-2
27. Sreethar, S., Nandhagopal, N., Karuppusamy, S.A., et al.: A group teaching optimization algorithm for priority-based resource allocation in wireless networks. Wirel. Pers. Commun. 123, 2449–2472 (2022). https://doi.org/10.1007/s11277-021-09249-7
28. Sabarmathi, G., Chinnaiyan, R.: Big data analytics research opportunities and challenges - a review. Int. J. Adv. Res. Comput. Sci. Software Eng. (IJARCSSE) 6(10) (2016). ISSN: 2277 128X
29. Sabarmathi, G., Chinnaiyan, R.: Reliable data mining tasks and techniques for industrial applications. IAETSD J. Adv. Res. Appl. Sci. 4(7) (2017). ISSN: 2394-844
48 Detection of Parkinson’s Disease in Brain MRI Images Using Deep …
30. Sabarmathi, G., Chinnaiyan, R.: Investigations on big data features research challenges and applications. In: International Conference on ‘Intelligent Computing and Control Systems (ICICCS), pp.782–786. IEEE Xplore (2018). ISBN: 978-1-5386-2745-7 31. Sabarmathi, G., Chinnaiyan, R.: Envisagation and analysis of mosquito borne fevers: a health monitoring system by envisagative computing using big data analytics. In: International conference on Computer Networks, Big Data and IoT (ICCBI 2018). Lecture Notes on Data Engineering and Communications Technologies, vol. 31, pp.630–636. Springer, Cham (2019). ISBN: 978-3-030-24643-3 32. Sabarmathi, G., Chinnaiyan, R.: Reliable machine learning approach to predict patient satisfaction for optimal decision making and quality health care. In: 2019 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, pp. 1489–1493 (2019). https://doi.org/10.1109/ICCES45898.2019.9002593 33. Sabarmathi, G., Chinnaiyan, R.: Big data analytics framework for opinion mining of patient health care experience. In: 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, pp. 352–357 (2020). https://doi.org/10.1109/ICC MC48092.2020.ICCMC0066 34. Sabarmathi, G., Chinnaiyan, R.: Mining patient health care service opinions for hospital recommendations. Int. J. Eng. Trends Technol. 69(9), 161–167 (2021) 35. Sabarmathi, G., Chinnaiyan, R.: Sentiment analysis for evaluating the patient medicine satisfaction. Int. J. Comput. Intell. Control 13(2), 113–118 (2021) 36. Chinnaiyan, R., Alex, S.: Early analysis and prediction of fetal abnormalities using machine learning classifiers. In: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), pp. 1764–1767 (2021). https://doi.org/10.1109/ICOSEC51865.2021.959 1828. https://ieeexplore.ieee.org/abstract/document/9591828 37. 
Chinnaiyan, R., Alex, S.: Machine learning approaches for early diagnosis and prediction of fetal abnormalities. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–3 (2021). https://doi.org/10.1109/ICCCI50826.2021.9402317. https:// ieeexplore.ieee.org/abstract/document/9402317 38. Chinnaiyan, R., Alex, S.: Optimized machine learning classifier for early prediction of fetal abnormalities. Int. J. Comput. Intell. Control 13(2) (2021). https://www.mukpublications.com/ ijcic-v13-2-2021.php 39. Hari Pranav, A., Senthilmurugan, M., Pradyumna Rahul, K., Chinnaiyan, R.: IoT and machine learning based peer to peer platform for crop growth and disease monitoring system using blockchain. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–5 (2021) 40. Lavanya, L., Chandra, J.: Oral cancer analysis using machine learning techniques. Int. J. Eng. Res. Technol. 12(5) (2019). ISSN 0974-3154 41. Preetika, B., Latha, M., Senthilmurugan, M., Chinnaiyan, R.: MRI image based brain tumour segmentation using machine learning classifiers. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–9 (2021). https://doi.org/10. 1109/ICCCI50826.2021.9402508
Chapter 49
Malicious Bot Detection in Large Scale IoT Network Using Unsupervised Machine Learning Technique S. Pravinth Raja, Shaleen Bhatnagar, Ruchi Vyas, Thomas M. Chen, and Mithileysh Sathiyanarayanan
Abstract The extensive research of Internet of Things (IoT) apps and connected digital gadgets has been heavily targeted by intruders launching spread attacks due to lossy wireless networks. Attackers employ botnets, which are attack vectors made up of captured bots created for a specific purpose, to gain control of systems and components by acting maliciously. In order to minimize those problems, a distributed machine learning model was employed to extract the proper feature and pick features that would protect the application or network from hostile attacker behavior. To develop an effective and efficient secure identification of IoT-based risks in the heterogeneous network, a well-structured model must be built for training and testing along some distribution of the dataset toward verifying the recommended system. In order to build the best botnet attack detection model based on the numerous attack characteristics of the botnet, attack component analysis has been proposed in the study to classify the best attack feature subsets on various attack features acquired from the benchmark dataset. Genetic algorithms have been used to extract discriminating characteristics from network log data in order to provide the best feature subsets.
S. Pravinth Raja (B) Department of CSE, Presidency University, Bengaluru, India e-mail: [email protected] City University of London, London, UK S. Bhatnagar Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India R. Vyas Department of CSE, Geetanjali Institute of Technical Studies, Gadwa, Rajasthan, India T. M. Chen School of Science and Technology, City, University of London, London, UK M. Sathiyanarayanan MIT Square, London, UK © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_49
S. Pravinth Raja et al.
49.1 Introduction
As Internet of Things (IoT) networks have grown, so has the demand for defense mechanisms against cyber-attacks in their various forms [1], and cyber security has gained considerable importance as a result. Routing strategies for IoT networks must cope with the heterogeneity of attacks across bot networks. Botnet attacks, which are carried out by bots, are increasing exponentially. The large-scale availability of malicious bots produces botnets that let an attacker exploit network resources such as energy and bandwidth to disrupt normal day-to-day activities [2]. A botnet is a network of malicious devices that are activated and managed by a centralized command-and-control server. Botnets carry out several categories of attack, including distributed denial-of-service (DDoS) attacks, wormhole attacks, and node spamming attacks. Botnet attack detection is complex to manage because attacks vary the structures and protocols they use to interrupt the network, although a large number of existing solutions handle botnet attacks using supervised and unsupervised learning [3]. The challenges identified for malicious bot detection are:
• Large scale network
• The data are unsupervised.
Despite the many advantages of intrusion detection systems (IDS), they still require improvement to capture the dynamic malicious structure of the attacker's characteristics and the evolution of the networks. In this paper, efficient machine learning based on attack component analysis is used to classify the optimal features, which are extracted using a genetic algorithm, in order to detect attacks and secure the network. The rest of the paper is organized as follows. Section 49.2 reviews the literature on botnet detection. Section 49.3 presents the architecture for detecting botnet-based malicious attacks using machine learning. Section 49.4 discusses the experimental outcomes of validating the model on various datasets.
The motivation for the proposed work is that cybersecurity remains a constant challenge: cybercriminals are continuously looking for new ways to exploit vulnerabilities and use them for nefarious and unlawful purposes. Malware distribution techniques are evolving in new and creative ways. The malicious software is then used to carry out further attacks, such as data exfiltration and denial-of-service operations, using or targeting the compromised computers. Our primary contribution, as described in this article, is an intrusion detection solution based on deep autoencoders that can spot botnet-specific anomalies in IoT network traffic. Although the use of autoencoders for anomaly identification is an ongoing research subject, its implementation is still in its early phases. Unlike other methods [4], we chose to build a centralized solution rather than profiling each device with its own deep autoencoder, for a variety of reasons. Finally, Sect. 49.5 concludes the article.
49 Malicious Bot Detection in Large Scale IoT Network Using …
49.2 Related Work
This section summarizes the significant literature on malicious attack detection using machine learning architectures based on rule-based and flow-based features of network traffic.
49.2.1 Botnet Attack Detection Using Decision Tree
In this architecture, traffic data are used to distinguish normal from abnormal network behavior on the basis of flow-based structures. The decision tree technique represents the features of the traffic information as a tree structure [5]. It also mitigates information overload and optimizes the system's decision making by extracting and customizing attack information through a protocol that identifies patterns in the extensive data of the heterogeneous network. It is capable of detecting mutated variants of anticipated attack models.
49.2.2 IoT Security Model Based on Fuzzy Theory
In this method, a fuzzy-theory-based model for handling malicious attacks in IoT detects attacks from device behavior by analyzing the spatial and temporal features of IoT services. Fuzzy theory is used to characterize malicious behavior on the basis of relationship mapping and forwarding behavior with neighboring bots [6]. The architecture further includes an attack-feature weighting factor that computes the structural significance of an attack from the parametric weight value and the end-to-end packet forwarding ratio.
49.3 Proposed Model
In this section, a botnet attack detection model based on optimization of a multilayer perceptron with a genetic algorithm is designed for each IoT device interconnected in the distributed network. It is adapted to detect the heterogeneous architecture of IoT botnet attacks. The architecture of the proposed model is as follows:
49.3.1 Attack Characteristics of the Botnet
Attacks propagating through IoT devices are classified on the basis of the malicious characteristics of the bots employed for the specified activity.
• DDoS attack
The DDoS attack is an important feature and is treated as the magnitude of the bots. Each DDoS attack is uniquely identified by a DDoS ID, which carries a timestamp marking the start time of the attack [7]. Every DDoS attack is attributed to a botnet family and is treated as time-series data.
49.3.2 Feature Selection: Genetic Algorithm
Feature selection on the dataset is carried out using a genetic algorithm that is propagated in a distributed manner. Each bot is treated as a chromosome and the botnet as the population [8]; a fitness function is applied to select the optimal features of the dataset. A crossover function guides the network on the degree of botnet movement over the attack features according to the fitness scores. The notation for the bot properties is given in Table 49.1. Suppose node i is in a session of bot interactions with another bot j in the distributed architecture, and is also in an attack session with an additional bot k that shares a few network characteristics with j. Bot i may use the fitness computation of the mutation process of the genetic model to make an initial update of bot j's fitness score on its command. The architecture of the proposed approach is presented in Fig. 49.1. A fitness function [9] is therefore needed to compute the degree of any bot with respect to bot i from the features of the bot. The following assumptions are made:
• The purpose of the fitness function is to provide node i with a trust value when it intends to begin or organize a relationship with j. After node i has interacted with node j, the unintended attack features reported by k are easily eliminated from node i, since i then assesses the bot directly on the basis of the fitness score.
Table 49.1 Parameter description of the model notations
Symbol                     Description
$F_{ij}$                   Fitness value of the feature in node j estimated by node i at the present state and condition of the network
$F$                        Feature parameter of the node computed by node j in the current context
$\beta_{ij \leftarrow k}$  Crossover selection on node j by node k to node i
$w_i(p)$                   Node parameter weight
$V_{ij}(p)$                Fitness value of node j as assessed by node i
$S_{ij}(p)$                Node assessment value
Fig. 49.1 Architecture diagram of the proposed architecture
• Fitness estimation should be employed to avoid DDoS and Sybil attacks.
• Changes in the fitness score between past and current computations should be computed effectively.
• Mutation occurs when there is a large variation between communications as the node operates over time.
The covariance function computes the rate at which node i evaluates node j using the malicious characteristics reported by node k. It is the weight that i assigns to the node as an indirect assessment of the attack. The belief function for a bot attack is
\[ \beta_{ij \leftarrow k} = 1 - \left( V_{kj}(p) - V_{ij}(p) \right) \]
Thus, a malicious node can gain only a small variation by providing a poor service to i, and its ability to influence the assessment for parameter p is extremely restricted. Combined with the trust decay function, this provides efficient protection against static and dynamic attacks [4].
Correlation Computation Using Aggregation Function
A correlation function determines how node trusts are accumulated to estimate an overall trust score for a node. The aggregation function is selected according to the correlation situation using a dynamic weighted-sum approach [10]. Node security updates (on partial trust scores) are event-based and happen whenever nodes communicate with each other. The approach follows a multi-criteria method for computing node maliciousness. C is the set of correlated nodes in the collaboration context. The trust over the collaborating nodes is given by
\[ T[C] = \{ T_{ij}, P, W, S, F, L \} \quad \forall\, i, j \in C \]
where $T_{ij}$ is the trust score of node j estimated by node i; $P = \{p_1, p_2, p_3, \ldots, p_n\}$ is the set of parameters used to compute the trust; $W = \{w_i(p_1), w_i(p_2), w_i(p_3), \ldots, w_i(p_n)\}$ is the set of weights on the parameters of node j; $S = \{s_{ij}(p_1), s_{ij}(p_2), s_{ij}(p_3), \ldots, s_{ij}(p_n)\}$ is the set of trust values of node j on the mentioned parameters; and $F$ is the fitness aggregation function $f(W, S)$.
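The belief function and the weighted-sum aggregation $f(W, S)$ can be illustrated with a minimal Python sketch. The function names, the normalisation by the total weight, and the numeric values are assumptions made for illustration; the chapter only states that a dynamic weighted sum is used.

```python
def belief(v_kj, v_ij):
    """Belief beta_{ij<-k} = 1 - (V_kj(p) - V_ij(p)), as written in the text."""
    return 1.0 - (v_kj - v_ij)


def aggregate(weights, scores):
    """Weighted-sum aggregation f(W, S).

    Dividing by the total weight is an assumption of this sketch; it keeps
    the result on the same scale as the individual scores.
    """
    if len(weights) != len(scores):
        raise ValueError("W and S must have one entry per parameter")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(w * s for w, s in zip(weights, scores)) / total


# Hypothetical values for three parameters p1..p3 (not taken from the chapter)
W = [0.5, 0.3, 0.2]   # w_i(p)
S = [0.9, 0.6, 0.4]   # s_ij(p)
T_ij = aggregate(W, S)
beta = belief(0.8, 0.6)
```

With these hypothetical inputs the aggregated score is dominated by the parameter that carries the largest weight, which is the behaviour a weighted sum is meant to provide.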
49.3.2.1 Attack Parameter
A decision system makes its prediction from many attack-related factors of the nodes communicating for data transmission. Each factor is an attack parameter. Attack parameters can be objective or subjective. Parameters are objective if they can be verifiably computed; such properties include data transaction speed, reliability, rate of work, proximity, service cost, and stake in the attack collaboration [11]. For an objective parameter, the fitness value computed by each bot matches that of the other assessing bots. For a subjective parameter, the fitness values computed by the nodes need not match, even if they all assessed j under the same condition t. The condition for an objective parameter is
\[ S_{ij}(p_1) = S_{ij}(p_2) = S_{ij}(p_1) \]
Therefore, the trust method supports both QoS and malicious parameters implicitly, which makes applications more robust. In this work, the choice of parameters depends on the collaboration situation and should be made when C is set up.
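The genetic-algorithm feature selection of Sect. 49.3.2 can be sketched as follows. This is a generic illustration, not the authors' implementation: it encodes each chromosome as a binary feature mask (rather than treating a bot as a chromosome), uses tournament selection, single-point crossover, bit-flip mutation and elitism, and the `RELEVANCE` scores and `toy_fitness` function are hypothetical stand-ins for a real classifier-based fitness.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=40,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return the best binary feature mask found by a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick(population):
        # Tournament selection of size 2
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


# Hypothetical per-feature relevance scores (not taken from the chapter)
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]


def toy_fitness(mask):
    """Reward relevant features and charge a fixed cost of 0.3 per feature."""
    return sum(r - 0.3 for r, bit in zip(RELEVANCE, mask) if bit)


best_mask = ga_select(len(RELEVANCE), toy_fitness)
```

In practice the fitness function would train and score a classifier on the selected columns; the fixed seed only makes the sketch reproducible.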
49.4 Experimental Results
This section presents the experimental outcomes of the machine learning-based botnet attack detection, compared with existing methods for evolving attack categorization, using k-fold cross-validation. Precision, recall, F1-score, and computation time are analyzed in this evaluation.
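For reference, the evaluation measures and the fold construction can be sketched in a few lines of Python. The helper names and the confusion counts below are hypothetical; the chapter does not report confusion matrices or the value of k.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical counts: 90 attacks detected, 10 false alarms, 30 attacks missed
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every sample once for testing.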
49.4.1 Dataset Description
Extensive experiments on the classification of evolving botnet attacks such as DDoS, phishing, and other malicious attacks, using optimization of a convolutional neural network architecture, were carried out on two familiar datasets, the CTU-13 dataset and the KDD dataset, which contain attacks with diverse data integration for determining the attack class. The datasets are described as follows.
49.4.1.1 CTU-13 Dataset
The CTU-13 dataset consists of attack sequences of network traffic that were captured at CTU University and stored in pcap files. It is a labeled dataset composed of 13 attack scenarios whose traffic is labeled as normal, attack, or background. The 13 files include various botnet types with both centralized and decentralized structures [12]. Ten varied sequences are randomly produced from each dataset for the attack class, and the average outcome is reported on the basis of the feature extraction methods. Table 49.2 lists the botnet attack types. The proposed machine learning framework for attack classification under various constraints and strategies, using genetic algorithm-based optimization of attack component analysis, is evaluated against the following performance measures: precision, recall, F measure, and computation time (Fig. 49.2). The recall computed on the various datasets for attack classification is shown in Fig. 49.3. The classification accuracy of the attack detection, in terms of the F measure, is illustrated in Fig. 49.4, and the performance values are given in Table 49.3.
Table 49.2 Structure of CTU-13 dataset
Sequence  Botnet name  Structure
1         DDoS         P2P
2         DDoS         P2P
3         Phishing     HTTP
4         Phishing     HTTP
5         Mirai        IRC
6         Mirai        IRC
7         Virut        P2P
8         Rbot         IRC
9         Rbot         IRC
10        Murlo        HTTP
11        Neris        IRC
12        Neris        HTTP
13        Phishing     P2P
Fig. 49.2 Performance analysis of the methodologies with respect to precision against the different datasets
Fig. 49.3 Performance analysis of the methodologies on recall against the different datasets
Fig. 49.4 Performance analysis of the methodologies with respect to F measure against the different datasets
As shown in Table 49.3, the machine learning architectures enable the complete process to execute 2–3 orders of magnitude faster than the PCA learning approach for attack classification on the heterogeneous architecture of the botnet. Finally, it is verified that the approach achieves good results in terms of attack reliability analysis. Of the three models compared, the combinational model has the best performance, since its average F measure score is the highest.
Table 49.3 Performance evaluation of methodology against measures for datasets
Dataset         Models  Precision in %  Recall in %  F measure in %
CTU-13 dataset  ACA     96.54           88.63        93.76
                PCA     94.26           87.59        93.21
                KNN     96.23           87.56        92.77
KDD dataset     ACA     96.65           87.56        93.47
                PCA     95.25           86.56        93.21
                KNN     95.89           87.21        92.14
49.5 Conclusion and Future Work
A machine learning framework for IoT botnet attack detection on network traffic streams has been modeled and simulated in this article against multiple heterogeneous features and classes. First, a genetic algorithm produces the feature vector containing the feature subset for classification. The extracted feature subset is then processed using attack component analysis on the basis of covariance and correlation analysis with eigenvectors. An eigenvalue classifier is then obtained to generate the attack class on the basis of the heterogeneous features. Experimental outcomes, verified using k-fold cross-validation, show that the current architecture outperforms conventional approaches with respect to computation time and accuracy. We will use novel visual techniques together with machine learning techniques [13–17] to improve malicious bot detection. In the future, the model will be trained on a larger data collection. Random Forest and SVM machine learning models can also be evaluated. Deep learning models, in addition to ResNet50 and LSTM models, can be used in runtime botnet identification. In addition to being integrated with front-end web applications, the study paradigm can also be used with back-end web applications.
References
1. Hoang, X.D.: Botnet detection based on machine learning techniques using DNS query data. Future Internet 10(5), 1–11 (2018)
2. Wainwright, P., Kettani, H.: An analysis of botnet models. In: Proceedings of 3rd International Conference on Compute and Data Analysis, New York, NY, USA, pp. 116–121 (2019)
3. Li, S.-H., Kao, Y.-C., Zhang, Z.-C., Chuang, Y.-P., Yen, D.C.: A network behavior-based botnet detection mechanism using PSO and K-means. ACM Trans. Manage. Inf. Syst. 6(1), 1–30 (2015)
4. Alieyan, K., Almomani, A., Manasrah, A., Kadhum, M.M.: A survey of botnet detection based on DNS. Neural Comput. Appl. 28(7), 1541–1558 (2017)
5. Zhuang, D., Chang, J.M.: Enhanced PeerHunter: detecting peer-to-peer botnets through network-flow level community behavior analysis. IEEE Trans. Inf. Forens. Secur. 14(6), 1485–1500 (2019)
6. Ehsan, K., Hamid, R.S.: BotRevealer: behavioral detection of botnets based on botnet life-cycle. Int. J. Inf. Secur. 10(1), 55–61 (2018)
7. Moustafa, N., Hu, J., Slay, J.: A holistic review of network anomaly detection systems: a comprehensive survey. J. Netw. Comput. Appl. 128, 33–55 (2019)
8. Wang, Z., Tian, M., Jia, C.: An active and dynamic botnet detection approach to track hidden concept drift. In: Proceedings of International Conference on Information and Communications Security. Lecture Notes in Computer Science: Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 10631, Berlin, Germany, pp. 646–660 (2018)
9. Aamir, M., Zaidi, S.M.A.: Clustering-based semi-supervised machine learning for DDoS attack classification. J. King Saud Univ. Comput. Inf. Sci. (2019). https://doi.org/10.1016/j.jksuci.2019.02.003
10. Wang, C.-Y., Ou, C.-L., Zhang, Y.-E., Cho, F.-M., Chen, P.-H., Chang, J.-B., Shieh, C.-K.: BotCluster: a session-based P2P botnet clustering system on NetFlow. Comput. Netw. 145, 175–189 (2018)
11. Debashi, M., Vickers, P.: Sonification of network traffic for detecting and learning about botnet behavior. IEEE Access 6, 33826–33839 (2018)
12. Garg, S., Peddoju, S.K., Sarje, A.K.: Scalable P2P bot detection system based on network data stream. Peer-to-Peer Netw. Appl. 9(6), 1209–1225 (2016)
13. Sathiyanarayanan, M., Turkay, C., Fadahunsi, O.: Design and implementation of small multiples matrix-based visualisation to monitor and compare email socio-organisational relationships. In: IEEE COMSNETS (2018)
14. Sathiyanarayanan, M., Turkay, C.: Challenges and opportunities in using analytics combined with visualisation techniques for finding anomalies in digital communications. In: DESI VII Workshop on Using Advanced Data Analysis in eDiscovery & Related Disciplines to Identify and Protect Sensitive Information in Large Collections, London, UK, 12 June 2017
15. Sathiyanarayanan, M., Turkay, C., Fadahunsi, O.: Design of small multiples matrix-based visualisation to understand E-mail socio-organisational relationships. In: 2018 10th International Conference on Communication Systems & Networks (COMSNETS) (2017)
16. Sathiyanarayanan, M.: Visual analysis of E-mail communication to support digital forensics & E-discovery investigation in organisations. Thesis (2020)
17. Sathiyanarayanan, M., Fadahunsi, O.: Integrating digital forensics and digital discovery to improve E-mail communication analysis in organisations. In: Smart Computing Paradigms: New Progresses and Challenges, pp. 187–193. Springer, Singapore (2020)
Chapter 50
LSTM with Attention Layer for Prediction of E-Waste and Metal Composition T. S. Raghavendra, S. R. Nagaraja, and K. G. Mohan
Abstract Electronic garbage, or “e-waste,” may be hazardous to the environment because of its composition. Environmental scientists are focusing increasingly on heavy metal pollution at e-waste sites because it is both deadly and persistent. The bulk of electronic waste has been disposed of improperly through poor handling practices that compromise safety and environmental protection. To predict the composition of e-waste and metal, a Long Short-Term Memory (LSTM) network with an attention layer is proposed. Data from an e-waste recycling facility are used and preprocessed to exclude the extreme points that usually appear when alternative fuels are used. The preprocessed data are then passed to the training and testing operations. Because the gates allow the input characteristics to flow through the hidden layers without modifying the output, the proposed LSTM network is simple to optimise. By examining the well-known feature maps from the prediction branch, the attention layer takes advantage of the correlation between the class labels.
T. S. Raghavendra (B) Department of CSE, School of CSE and IS, Presidency University, Bengaluru, Karnataka, India e-mail: [email protected]
S. R. Nagaraja Department of CSE, School of Engineering, Presidency University, Bengaluru, Karnataka, India e-mail: [email protected]
K. G. Mohan Department of CSE, GITAM University, Bengaluru, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_50
50.1 Introduction
The race to produce the most inventive electronics has become more intense as a result of the rapid growth of technology. The composition of e-waste raises potential environmental issues [1]. When general e-waste is estimated using the Delay Model technique, the End-Of-Life (EoL) value is employed to take Indonesian society's perception of e-waste into account [2]. Environmental heavy metal pollution and the recycling of electronic trash are strongly related. E-waste recycling is one of the lucrative industries that has been effective in recovering precious metals such as gold, platinum, and copper [3]. Large developing countries employed recycling extensively, characterised by archaic methods including acid washing, open burning, smelting, and hand sorting. Significant amounts of heavy metals, such as waste lead and chromium and precious copper, were unavoidably released into the environment during recycling operations [4]. Environmental scientists are focusing increasingly on heavy metal pollution at e-waste sites because it is both deadly and persistent. Electrical and electronic equipment is always present when electronic trash that contains copper-leaching solution is recycled. Selective extraction is carried out where it is desirable, both economically and environmentally, for future resource purification [5]. Concern for nature is growing as a result of environmental behaviour, events, and potential negative consequences for people and wildlife [6]. The bulk of electronic waste has been disposed of improperly through poor handling practices that compromise safety and environmental protection. The hazardous metals and organic contaminants emitted during these operations include halogenated furans, polycyclic aromatic hydrocarbons, and FR additives [7]. The European Union and other developed countries are addressing the e-waste issue by taking the necessary action immediately. In addition, the majority of nations use the most environmentally friendly practices, such as methods for recovering resources from e-waste [8]. Yet only a few nations, including China, Sri Lanka, India, Bhutan, Ghana, Pakistan, and Cambodia, can manage e-waste in an appropriate manner. Certain job functions involve methods that endanger both people and the environment [9]. The body of the paper is organized as follows: Sect. 50.1 examines current methodologies, Sect. 50.2 discusses the proposed methodology for e-waste and metal composition, Sect. 50.3 contains the experimental results and a discussion, and Sect. 50.4 presents the research's conclusion.
50.1.1 Dataset
The data used in the proposed prediction approach come from the e-waste recycling facility at the Returkraft waste-to-energy (WtE) plant in Kristiansand, Norway. A steam turbine driven by the plant's steam generates electricity and heats the city's district heating system. The WtE plant burns 130 kilotons of MSW annually, of which 79 kilotons are predominantly residential trash, to produce 250 GWh of heat and 95 GWh of electricity. The plant supplied a dataset of Lower Heating Value (LHV) data. The LHV values were determined precisely from steam generation [16].
50 LSTM with Attention Layer for Prediction of E-Waste and Metal …
50.1.2 Data Preprocessing
The Lower Heating Value (LHV) data for municipal solid waste (MSW) had to be cleaned to remove the extreme points that commonly occurred when fuels other than MSW were used, particularly during startup and shutdown of the plant. An outlier was defined as a point that deviated from the median by more than three median standard deviations. Each removed extreme point was replaced by the average of its two neighbouring points. Daily weather data were obtained by averaging hourly weather data and, similar to the ML methods, any missing values were filled in using the average of the two closest points. Although there is a connection between the two, it is not quite obvious how much the LHV depends on precipitation and wind speed. The LHV was lowest on holidays and lower on weekends than on weekdays. The LHV was also lower in the winter weeks than in the hotter summer weeks. The findings and discussion section goes into further depth on the connections between the response and the predictors.
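The outlier rule above can be sketched in Python as follows. This is a minimal illustration under a stated assumption: the "median standard deviation" is interpreted here as the scaled median absolute deviation (MAD multiplied by 1.4826), which may differ from the authors' exact definition. The function name and the sample values are hypothetical.

```python
from statistics import median


def clean_series(values, n_dev=3.0):
    """Replace points further than n_dev scaled MADs from the median
    with the average of their nearest non-outlier neighbours."""
    med = median(values)
    # Scaled MAD: an assumed reading of "median standard deviation"
    mad = 1.4826 * median(abs(v - med) for v in values)
    is_outlier = [abs(v - med) > n_dev * mad for v in values]

    cleaned = list(values)
    for i, flagged in enumerate(is_outlier):
        if not flagged:
            continue
        # Nearest non-outlier value on each side, if any
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if not is_outlier[j]), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if not is_outlier[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


# Hypothetical daily LHV readings with one extreme point (25.0)
raw = [10.0, 10.5, 9.8, 25.0, 10.2, 10.1, 9.9]
cleaned = clean_series(raw)
```

Using the nearest non-outlier neighbours rather than the immediately adjacent points is a design choice of this sketch; it avoids averaging in a second extreme point when two outliers sit next to each other.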
50.1.3 Training and Validation Data The training dataset covered two years of daily data, from January 1, 2017, to December 31, 2018, and contained 730 averaged daily observations for all input and output variables. The validation dataset contained 294 daily average observations, from January 1, 2019, to October 21, 2019.
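As a quick check of the observation counts, the two date ranges can be counted with the standard library. The helper name `n_days` is an illustrative choice, and the sketch assumes one observation per calendar day with both end dates included.

```python
from datetime import date


def n_days(start, end):
    """Number of calendar days from start to end, both inclusive."""
    return (end - start).days + 1


train_days = n_days(date(2017, 1, 1), date(2018, 12, 31))
val_days = n_days(date(2019, 1, 1), date(2019, 10, 21))
print(train_days, val_days)
```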
50.1.3.1 Prediction The preprocessed data are passed to the training and testing procedures. The prediction of metal composition and e-waste uses the Long Short-Term Memory (LSTM) network with an attention layer. The LSTM is a recurrent neural network (RNN) whose architecture models temporal dependencies.
50.2 Proposed Methodology The block diagram of the proposed method is shown in Fig. 50.1. The hidden state $h_t$ for the $t$-th word in the sub-path is a function of the previous state $h_{t-1}$ and the present word $x_t$. The input is linearly transformed by the weight matrix and
T. S. Raghavendra et al.
Fig. 50.1 Block diagram of proposed LSTM with attention layer method for the prediction of e-waste and metal composition
squashed nonlinearly by the activation function, as shown in Eq. (50.1):

$$h_t = f\left(W_{in} x_t + W_{rec} h_{t-1} + b_h\right), \tag{50.1}$$
where $W_{in}$ and $W_{rec}$ are the weight matrices for the input and recurrent connections, respectively, $b_h$ is the bias vector of the hidden state, and $f$ is the nonlinear activation function. The LSTM-based RNN includes four components: the input gate $i_t$, the output gate $o_t$, the forget gate $f_t$, and the memory cell $c_t$. The three adaptive gates $i_t$, $f_t$, and $o_t$ depend on the previous hidden state $h_{t-1}$ and the current input $x_t$. The gates and the extracted feature vector $g_t$ are calculated using Eqs. (50.2)–(50.5).
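$$i_t = \sigma\left(W_i \cdot x_t + U_i \cdot h_{t-1} + b_i\right), \tag{50.2}$$

$$f_t = \sigma\left(W_f \cdot x_t + U_f \cdot h_{t-1} + b_f\right), \tag{50.3}$$

$$o_t = \sigma\left(W_o \cdot x_t + U_o \cdot h_{t-1} + b_o\right), \tag{50.4}$$

$$g_t = \tanh\left(W_g \cdot x_t + U_g \cdot h_{t-1} + b_g\right), \tag{50.5}$$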
where $i_t$ is the input gate, $f_t$ is the forget gate, $o_t$ is the output gate, and $g_t$ is the extracted feature vector. $W$ and $U$ are the input and recurrent weights of the respective gate, and $b$ is its bias. The current memory cell $c_t$ combines the previous cell content $c_{t-1}$, weighted by the forget gate $f_t$, with the candidate content $g_t$, weighted by the input gate $i_t$, as given in Eq. (50.6):

$$c_t = i_t \otimes g_t + f_t \otimes c_{t-1}. \tag{50.6}$$
The output of the LSTM unit is the hidden state of the recurrent network, which is calculated using Eq. (50.7):

$$h_t = o_t \otimes \tanh(c_t), \tag{50.7}$$
where $\sigma$ is the sigmoid function and $\otimes$ is element-wise multiplication. The importance of the predictors in terms of their impact on the response was visualised using partial dependence plots. The response values for a given predictor are estimated using the trained model and the mean values of the predictors that are not being assessed for importance.

Pseudocode of LSTM
  Input: x = [x_1, ..., x_365], x_i ∈ R^n
  Given parameters: W_f, U_f, b_f, W_c̃, U_c̃, b_c̃, W_i, U_i, b_i, W_o, U_o, b_o
  Initialize h_0, c_0 = 0
  for t = 1, ..., 365 do
    Calculate f_t, c̃_t, i_t
    Update the cell state c_t
    Calculate o_t and the hidden state h_t
  end for
  Output: h = [h_1, ..., h_365], h_i ∈ R^m

Here, x = [x_1, ..., x_365], x_i ∈ R^n, is the complete input sequence of meteorological observations, which is processed time step by time step. The next layer takes the output h = [h_1, ..., h_365] of the previous layer as input, and h is the last output of the second LSTM layer. In the pseudocode, f_t is the forget gate, c̃_t is the candidate cell state, i_t is the input gate, and o_t is the output gate.
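The recurrence in Eqs. (50.2)–(50.7) and the pseudocode above can be sketched in NumPy as follows. This is an illustrative forward pass only, not the authors' implementation: the parameter dictionary layout, the helper names `init_params` and `lstm_forward`, the random initialisation, and the sizes n = 4 and m = 8 are all assumptions, and the attention layer is not included.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n, m, seed=0):
    """Hypothetical small random parameters for the gates i, f, o and g."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "ifog":
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(m, n))  # input weights
        params[f"U_{gate}"] = rng.normal(scale=0.1, size=(m, m))  # recurrent weights
        params[f"b_{gate}"] = np.zeros(m)                         # bias
    return params


def lstm_forward(x, params, hidden_size):
    """Apply Eqs. (50.2)-(50.7) to a sequence x of shape (T, n)."""
    h = np.zeros(hidden_size)  # h_0
    c = np.zeros(hidden_size)  # c_0
    outputs = []
    for x_t in x:
        i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h + params["b_i"])
        f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h + params["b_f"])
        o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h + params["b_o"])
        g_t = np.tanh(params["W_g"] @ x_t + params["U_g"] @ h + params["b_g"])
        c = i_t * g_t + f_t * c   # Eq. (50.6)
        h = o_t * np.tanh(c)      # Eq. (50.7)
        outputs.append(h)
    return np.stack(outputs)      # shape (T, m)


# One year of daily observations with n = 4 hypothetical predictors.
x = np.random.default_rng(1).normal(size=(365, 4))
h_seq = lstm_forward(x, init_params(n=4, m=8), hidden_size=8)
print(h_seq.shape)
```

A second LSTM layer, as mentioned in the text, would take `h_seq` as its input sequence.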
50.3 Results and Discussion This section details the experimental findings of the proposed LSTM with attention layer technique. The proposed approach is implemented in Python 3.7.3 on a system with 8 GB of memory and a 2.2 GHz processor. The performance measures of the proposed LSTM with attention layer and its comparison with existing approaches are as follows.
50.3.1 Performance Metrics The performance metrics used to evaluate the proposed LSTM with attention layer technique for the prediction of e-waste and metal composition are as follows:

• Accuracy:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Overall predictions}}. \tag{50.8}$$

• Precision:

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{50.9}$$

• Recall:

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{50.10}$$

• F-score:

$$\text{F-score} = \frac{TP}{TP + \frac{1}{2}(FP + FN)}. \tag{50.11}$$
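Equations (50.8)–(50.11) can be computed directly from confusion-matrix counts, as in the minimal sketch below. The function name and the counts are hypothetical and are not taken from the experiments; accuracy is written as (TP + TN) / total, which equals the number of correct predictions over all predictions for a binary task.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,              # Eq. (50.8)
        "precision": tp / (tp + fp),                # Eq. (50.9)
        "recall": tp / (tp + fn),                   # Eq. (50.10)
        "f_score": tp / (tp + 0.5 * (fp + fn)),     # Eq. (50.11)
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Written this way, the F-score is the harmonic mean of precision and recall, so it always lies between the two.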
50.3.2 Quantitative Analysis This section presents the quantitative analysis of the proposed LSTM with attention layer for the prediction of e-waste and metal composition (Table 50.1). The proposed method achieved an accuracy of 98.67%, precision of 94.32%, recall of 95.54%, and F-score of 96.29%. Table 50.2 compares the proposed LSTM with attention layer method with existing techniques. The existing Convolutional Neural Network (CNN) achieved an accuracy of 89.45%, precision of 86.38%, recall of 87.93%, and F-measure of 88.57%. The existing Artificial Neural Network (ANN) achieved an accuracy of 90.25%, precision of 87.80%, recall of 88.54%, and F-measure of 89.73%. The recurrent neural network's capacity to retain information from earlier inputs over time makes time series prediction possible. In this quantitative comparison, the proposed LSTM with attention layer method for the prediction of e-waste and metal composition performed better than the other methods. For clarity, Fig. 50.2 shows this comparison graphically.

Table 50.1 Quantitative analysis of proposed LSTM with attention layer for prediction of e-waste and metal composition

| Metrics | Proposed LSTM (%) |
|---|---|
| Accuracy | 98.67 |
| Precision | 94.32 |
| Recall | 95.54 |
| F-score | 96.29 |

Table 50.2 Quantitative analysis of proposed LSTM with attention layer method for the prediction of e-waste and metal composition compared with existing techniques (CNN, ANN)

| Methods | Accuracy (%) | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|---|
| CNN | 89.45 | 86.38 | 87.93 | 88.57 |
| ANN | 90.25 | 87.80 | 88.54 | 89.73 |
| Proposed LSTM with attention layer | 98.67 | 94.32 | 95.54 | 96.29 |

Fig. 50.2 Comparative analysis graphical representation of proposed LSTM with attention layer method for the prediction of e-waste and metal composition
0.065
135
0.06
< 0.001
Cobalt
Copper
Dysprosium
Europium
0.22
0.04
< 0.001
5.3
< 0.001
Gold
Indium
Lanthanum
Lead
Mercury
Glass
Gallium
Gadolinium
< 0.001
0.07
Chromium
Ferrite
< 0.001
Cerium
Cadmium
< 0.001
5.3
0.04
0.22
0.0016
< 0.001
< 0.001
0.06
135
0.065
0.07
< 0.001
2.5
1319
15,760
656
0.03
0.2
2.5
Barium
Beryllium
14
0.01
0.01
0.77
Arsenic
III
Antimony
0.77
II 67
g/unit
Materials
Aluminium
I
Materials’ Composition Products
0.007
0.003
0.11
162
< 0.001
0.008
824
0.005
0.71
IV
0.003
0.11
216
0.005
0.002
< 0.001
824
< 0.001
V
464
0.31
6845
483
952
1
240
VI
< 0.001
16
< 0.001
0.079
0.2
590
0.003
< 0.001
0.001
< 0.001
130
VII
0.001
0.082
0.2
590
0.003
0.002
< 0.001
< 0.001
130
VIII
1
1
0.024
26
3.8
12
IX
0.6
0.038
10.6
14
6.3
0.003
0.084
2.9
X
0.119
6915
0.119
78
0.407
1370
XI
0.005
0.06
15
441
XII
< 0.001
1.1
< 0.001
0.008
0.044
< 0.001
< 0.001
0.012
27
0.013
0.014
< 0.001
0.49
0.002
0.154
XIV
0.005
15
441
XIII
0.04
2.1
3.6
0.04
Molybdenum
Neodymium
Nickel
Palladium
2088
0.45
1.25
4
0
3
3
2
1
4
8
# of precious metals
13
10
14
# of critical raw material
1
0.004
Zinc
0.004
1
0.002
Yttrium 8.6
20
1 0.005
18
Vanadium 0.11
18
0.002 24
< 0.001
2530
3
10
0.016
0.633
32
3322
Tungsten
0.002
1.7
0.633
< 0.001
1.7
Titanium
Tin
Terbium
Tellurium
Tantalum
Steel/iron
0.52
0.52
1780
0.04
0.633
VIII
63
0.009
1
IX
3
7
< 0.001
0.633
0.633
24
2530
3
2
4
1
11
4
8
1
1
8
0.244
0.01
0.004
60
0.015
1.5
0.05
X
1172
0.295
XI
0
2
0.4
0.116
0.406
226
0.45
< 0.001
1780
0.04
0.633
VII
1
0.25
2481
199
VI
Silver
0.25
573
0.044
V
0.119
< 0.001
612
0.044
IV
5
0.274
8755
III
Silicon
0.274
Praseodymium
0.004
0.04
3.6
2.1
0.04
II
Selenium
0.004
Platinum
Plastics
I
Products
3
4
62
0.031
0.145
44
0.003
XII
3
1
62
3
14
< 0.001
< 0.001
< 0.001
0.05
0.055
0.008
0.722
0.427
0.008
XIV
0.031
44
0.003
XIII
I
II
III
IV
V
VI
VII
VIII
IX
X
XI
XII
XIII
XIV
I LCD notebooks; II LED notebooks; III CRT TVs; IV LCD TVs; V LED TVs; VI CRT monitors; VII LCD monitors; VIII LED monitors; IX cell phones; X smart phones; XI PV panels; XII HDDs; XIII SSDs; XIV tablets. Source: Cucchiella et al. [10]
Products
Table 50.3 Comparison results of the proposed LSTM with attention layer method with existing method

| Methods | Life span |
|---|---|
| Kosai et al. [11] | 0.8 |
| Proposed LSTM with attention layer | 0.9 |
WEEE volume: minimal dataset of WEEE products

| Products | Weight (kg) | AS IS (kt) | TO BE (kt) | Δ change (%) |
|---|---|---|---|---|
| LCD notebooks | 3.5 | 80 | 97 | 21 |
| LED notebooks | 3.5 | 22 | 45 | 105 |
| CRT TVs | 25 | 85 | 67 | −21 |
| LCD TVs | 10 | 35 | 399 | 1040 |
| LED TVs | 10 | 10 | 504 | 4940 |
| CRT monitors | 16 | 340 | 133 | −61 |
| LCD monitors | 5 | 155 | 194 | 25 |
| LED monitors | 5 | 43 | 244 | 467 |
| Cell phones | 0.08 | 11.5 | 5.2 | −55 |
| Smart phones | 0.12 | 19 | 39 | 105 |
| PV panels | 80 | 8.3 | 10 | 20 |
| HDDs | 0.58 | 32 | 52 | 63 |
| SSDs | 0.4 | 0.4 | 6 | 1400 |
| Tablets | 0.5 | 4.9 | 10 | 104 |
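The Δ change column of the WEEE volume table appears to be the relative change from the AS IS volume to the TO BE volume. The sketch below reproduces a few rows under that assumption; the formula is inferred from the tabulated values rather than stated in the source, and the function name is illustrative.

```python
def percent_change(as_is_kt, to_be_kt):
    """Relative change from the AS IS to the TO BE volume, in percent."""
    return (to_be_kt - as_is_kt) / as_is_kt * 100.0


# (AS IS, TO BE) volumes in kt, copied from the table.
volumes = {
    "LCD notebooks": (80, 97),
    "CRT TVs": (85, 67),
    "LED TVs": (10, 504),
    "SSDs": (0.4, 6),
}

changes = {name: percent_change(a, b) for name, (a, b) in volumes.items()}
for name, delta in changes.items():
    print(f"{name}: {delta:+.1f}%")
```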
50.3.3 Comparative Analysis Based on performance criteria including accuracy, precision, recall, and F-score, Table 50.3 compares the proposed method with existing methods. The proposed LSTM with attention layer strategy is compared with other popular techniques, such as [11], for the prediction of e-waste and metal composition. The proposed LSTM with attention layer strategy attained a life span of 0.9%, whereas the existing methodology indicated a life span of only 0.8%. Consequently, the proposed strategy outperformed the existing strategy. Figure 50.3 shows this comparison graphically.
Fig. 50.3 Graphical representation of quantitative analysis of proposed LSTM with attention layer method for the prediction of e-waste and metal composition method
50.4 Conclusion In this study, the composition of e-waste and metal was estimated using the LSTM with attention layer technique. Since the gates allow the input characteristics to flow through the hidden layers without modifying the output, the proposed LSTM network is simple to optimise. The results of the experiments demonstrate that the proposed attention layer LSTM performed better than the current approach.
References
1. Gundupalli, S.P., Hait, S., Thakur, A.: Classification of metallic and non-metallic fractions of e-waste using thermal imaging-based technique. Process Saf. Environ. Prot. 118, 32–39 (2018)
2. Siddiqi, M.M., Naseer, M.N., Abdul Wahab, Y., Hamizi, N.A., Badruddin, I.A., Hasan, M.A., Zaman Chowdhury, Z., Akbarzadeh, O., Johan, M.R., Kamangar, S.: Exploring E-waste resources recovery in household solid waste recycling. Processes 8(9), 1047 (2020)
3. Liu, J., Chen, X., Shu, H.Y., Lin, X.R., Zhou, Q.X., Bramryd, T., Shu, W.S., Huang, L.N.: Microbial community structure and function in sediments from e-waste contaminated rivers at Guiyu area of China. Environ. Pollut. 235, 171–179 (2018)
4. Jiang, B., Adebayo, A., Jia, J., Xing, Y., Deng, S., Guo, L., Liang, Y., Zhang, D.: Impacts of heavy metals and soil properties at a Nigerian e-waste site on soil microbial community. J. Hazard. Mater. 362, 187–195 (2019)
5. Debnath, B., Chowdhury, R., Ghosh, S.K.: Sustainability of metal recovery from E-waste. Front. Environ. Sci. Eng. 12(6), 1–12 (2018)
6. Gundupalli, S.P., Hait, S., Thakur, A.: Thermal imaging-based classification of the E-waste stream
7. Vieira, B.D.O., Guarnieri, P., Camara Silva, L., Alfinito, S.: Prioritizing barriers to be solved to the implementation of reverse logistics of e-waste in Brazil under a multicriteria decision aid approach. Sustainability 12(10), 4337 (2020)
8. Kyere, V.N., Greve, K., Atiemo, S.M., Amoako, D., Aboh, I.K., Cheabu, B.S.: Contamination and health risk assessment of exposure to heavy metals in soils from informal e-waste recycling site in Ghana. Emerging Sci. J. 2(6), 428–436 (2018)
9. de Oliveira Vieira, B., Guarnieri, P., e Silva, L.C., Alfinito, S.: Prioritizing barriers to be solved to the implementation of reverse logistics of E-waste in Brazil under a multicriteria decision aid approach. Sustainability 12(10), 4337 (2020)
10. Cucchiella, F., et al.: Recycling of WEEEs: an economic assessment of present and future e-waste streams. Renew. Sustain. Energy Rev. 51, 263–272 (2015)
11. Kosai, S., Kishita, Y., Yamasue, E.: Estimation of the metal flow of WEEE in Vietnam considering lifespan transition. Resour. Conserv. Recycl. 154, 104621 (2020)
Chapter 51
ADN-BERT: Attention-Based Deep Network Model Using BERT for Sarcasm Classification Pallavi Mishra, Omisha Sharma, and Sandeep Kumar Panda
Abstract Sarcasm classification has gained popularity among researchers due to its complexity and implicit form of context representation. The most challenging aspect of sarcasm identification is understanding the exact context behind a statement. Therefore, a context-based model helps solve this critical task in the research domain. Sarcasm detection and classification help society avoid misinterpreting statements or reviews in ways that could affect one's mental condition and perception. Unfortunately, to retain their audience, the media often incorporate sarcasm in news headlines. People find it difficult to detect sarcasm in news headlines, so they form a false impression of the news and spread it to those around them. It has therefore become a significant concern for society to understand the hidden context behind a statement before forming any impression or judgment. In this paper, we focus on this problem and develop a novel technique called an Attention-based deep network model using Bidirectional Encoder Representations from Transformers (BERT), i.e., ADN-BERT, to classify news headlines into sarcastic and non-sarcastic categories. The News Headlines dataset is collected from Kaggle, and we have incorporated text augmentation to increase the size of the dataset and yield better results and accuracy. The evaluation metrics used in our study are accuracy, recall, precision, and F1-score. Our proposed model (ADN-BERT) outperformed recent state-of-the-art techniques with 94.1% accuracy, 95.2% recall, 94.5% precision, and 94.8% F1-score.
P. Mishra (B) Faculty of Science and Technology (IcfaiTech), Dept. of Computer Science and Engineering, ICFAI Foundation for Higher Education, Hyderabad, Telangana, India e-mail: [email protected]
O. Sharma · S. K. Panda Faculty of Science and Technology (IcfaiTech), Dept. of Artificial Intelligence and Data Science, ICFAI Foundation for Higher Education, Hyderabad, Telangana, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_51
P. Mishra et al.
51.1 Introduction In today’s data-driven world, people tend to share their opinions and feelings about important events such as online products purchasing, restaurants reviews, sports events, current trends, natural disasters, and political issues on a global social platform. The growth of social media platforms, which are becoming a major hub for global communication, produces enormous amounts of data. These user-generated data are of great importance for both the common people and business organizations, not only in terms of decision-making, but also have become one of the major sources of social awareness about various events occurring within and across the globe. However, the analysis of these vast amounts of data manually is a cumbersome task in terms of both time complexity and cost complexity. As a result, efficient and precise data analysis requires an automated system. Sentiment analysis (SA), otherwise known as opinion mining, is an automated approach in analyzing the tone or emotions of a speaker or writer with respect to some topic or the overall contextual polarity of a document. SA is the use of natural language processing (NLP), text analysis, and computational linguistics to identify and extract in-depth subjective information, i.e., information based on personal opinions, feelings, or beliefs, rather than on facts or evidence (objective information). In general, subjective information can be expressed through natural language, for example, in the form of opinions, reviews, or personal narratives. The importance of extraction of subjective information proves beneficial in certain contexts such as providing valuable insights into how people perceive and experience a product, service, or situation in case of business domains. For example, customer reviews can help a business understand how their products are perceived by their customers, while survey responses can help a company gain insights into employee satisfaction. 
On the other hand, these user reviews and opinions also act as a valuable informative domain, where other customers rely for their decision-making in online purchasing, stock market investments, booking restaurants, flights, etc. The usage of sentiment analysis has widened its scope in different areas such as understanding customer opinions and feedback, identifying trends and patterns useful for market research and product development, monitoring brand reputation, and improving marketing and advertising. However, there exist several challenges associated with sentiment analysis such as (a) contextual understanding, where the meaning of words and phrases can change depending on the context in which they are used; (b) ambiguous nature of natural language, which can make it difficult for sentiment analysis models to accurately identify the sentiment of a statement; and (c) sarcasm and irony, as they often involve saying the opposite of what is meant. This can lead to sentiment analysis models misclassifying sarcastic or ironic statements as having a different sentiment than intended. In this study, sarcasm classification using deep learning models is primarily focused. Sarcasm classification is one of the most significant challenges of sentiment analysis due to its varying meaning of the text. The presence of metaphors and ironic words in a text is critical for a model to understand the real context behind it. A
sarcastic text, whether in the form of customer reviews or opinionated text about individuals or events, involves an implicit criticism of the specified target. In other words, sarcasm detection is one of the implicit aspect-oriented tasks of aspect-based sentiment analysis, since many public opinions are expressed in an unconventional way that conflicts with the context of the matter. The wrong interpretation of a sarcastic statement may lead to serious issues, especially in the political domain and in global issues. To avoid such ambiguity and misinterpretation, the detection of sarcastic text helps derive the intended meaning behind the opinions. In this work, we propose a novel method, i.e., the Attention-based deep network model using BERT (ADN-BERT), to classify sarcastic and non-sarcastic headlines from the News Headlines dataset. We have incorporated one of the data augmentation techniques, i.e., text augmentation, to increase the size of our original dataset and improve the performance and generalization of deep learning models. The rest of the paper is organized as follows: Sect. 51.2 discusses related works on the News Headlines dataset, Sect. 51.3 provides exploratory dataset analysis, Sect. 51.4 discusses the proposed methodology, Sect. 51.5 presents experimental result analysis and a comparative study, and finally, Sect. 51.6 describes the conclusion and future work of our proposed research.
51.2 Related Work This section briefly describes past research on sarcasm classification using the News Headlines dataset. Bharti et al. [1] used seven different machine learning classifiers to perform sarcasm classification. In this study, the authors proposed a voted classifier using bag-of-words (BoW) and n-gram frequency as feature selection methods and achieved 86.4% accuracy. Misra et al. [2] designed a hybrid neural network model that uses a Bidirectional LSTM to capture context in both directions, adds an attention layer to assign higher weights to the important context, and finally passes the output to a convolutional neural network. The proposed model obtained an accuracy of 89.7%. Nayak et al. [3] compared the performance of machine learning and deep learning classifiers using both traditional pre-trained word embedding models, such as Word2vec and GloVe, and more recent pre-trained models, such as Bidirectional Encoder Representations from Transformers (BERT), for sarcasm classification. The authors concluded that LSTM with BERT word embedding outperformed the other classifiers with 89.5% accuracy. Sharma et al. [4] proposed a hybrid model comprising an autoencoder-based LSTM, the universal sentence encoder (USE), and the pre-trained BERT word embedding model to classify sarcastic reviews. That paper deals with three different datasets, SARC, Twitter, and News Headlines, but we are only concerned with the accuracy on the News Headlines dataset, on which the hybrid model achieved 90.8%.
Zanchak et al. [5] adapted logistic regression and a Bayesian classifier alongside neural network models, applied TF-IDF, GloVe, and Word2vec as feature extraction techniques, and concluded that neural networks with GloVe achieved the highest accuracy among them, 80.5%. Liu et al. [6] used a BERT-LSTM-based model for sarcasm classification. In this paper, BERT, a pre-trained word embedding model, converts raw input text into high-dimensional word embedding vectors. The output of the BERT model is then used by the LSTM to classify sentences as sarcastic or non-sarcastic. The proposed model achieved an accuracy of 91.8% on test data. Sharma et al. [7] proposed an ensemble-based technique using two different word embedding models, GloVe and Word2vec, and a single LSTM model. GloVe and Word2vec convert text into word embedding vectors, while the LSTM converts the input into high-dimensional sentence embeddings. Finally, the dense-layer outputs of the individual classifiers are combined to achieve higher accuracy than any individual classifier. This experiment achieved 88.9% accuracy. Shrikhande et al. [8] suggested a supervised learning model for sarcasm detection that uses GloVe word embedding vectors as input to a convolutional neural network (CNN), followed by max-pooling with a dropout layer to avoid overfitting. Its output is then passed to a Bi-LSTM model for classification. This proposed model obtained an accuracy of 86.13%. Ajnadkar et al. [9] proposed a model for sarcasm classification that combines 9 layers of deep learning models based on word embedding, Bidirectional LSTM, an attention mechanism, and convolutional neural networks (CNNs). The proposed model achieved the highest accuracy, 89.19%, when compared with individual deep learning models. For sarcasm detection on the News Headlines dataset, Jayaraman et al. [10] used six different types of supervised learning models with feature extraction techniques, including Naive Bayes, support vector machine, logistic regression, bidirectional gated recurrent units, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, and RoBERTa. RoBERTa performed relatively well among these models. Another approach by author [11] designed a deep neural model using an embedding layer as the first layer, followed by a convolutional neural network and an LSTM model. This study was conducted on the News Headlines dataset, and the proposed model reported 86.16% accuracy. Jariwala et al. [12] used a support vector machine (SVM) as the supervised classifier for sarcasm classification. This study focused on feature extraction using intensifiers, sentiments, frequency counts of words, and special symbols in the text. The extracted features are given as input to the SVM, and the proposed model achieved 78.82% accuracy. Although much other research has been carried out on sarcasm classification, it has used Twitter datasets, which are noisy and sparse. This results in poor vocabulary coverage, since many of the informal words may not be indexed in the vocabulary of a pre-trained word embedding model. Therefore, to avoid sparsity of the
Table 51.1 Summary of related works on News Headlines dataset

| Work | Technique | Accuracy results |
|---|---|---|
| Bharti et al. [1] | Implemented voted classification of 7 different machine learning classifiers | 86.4% |
| Misra et al. [2] | Bi-LSTM + Attention + CNN | 89.7% |
| Nayak et al. [3] | Compared multiple combinations of word embedding techniques and supervised classification techniques for sarcasm classification task | 89.5% |
| Sharma et al. [4] | USE + BERT + LSTM autoencoder | 90.8% |
| Zanchak et al. [5] | Logistic regression, Naïve Bayes classifier | 80.5% |
| Liu et al. [6] | BERT word embedding with LSTM | 91.8% |
| Sharma et al. [7] | Ensemble-based technique with word embedding of GloVe and Word2vec with LSTM | 88.9% |
| Shrikhande et al. [8] | CNN + GloVe embedding technique | 86.13% |
| Ajnadkar et al. [9] | Bi-LSTM + Attention + CNN with word embedding deep learning-based model | 89.19% |
| Mandal et al. [10] | Deep learning models based on CNN and LSTM | 86.0% |
| Jariwala et al. [11] | Support vector machine (SVM) on News Headlines dataset with optimal feature selection | 78.82% |
embedding matrix and spelling errors, we have preferred the News Headlines dataset, which was formally written by professionals [2] (Table 51.1).
51.3 Exploratory Dataset Analysis (EDA) In this study, we collected the News Headlines dataset from the Kaggle website. This dataset contains headlines of both sarcastic and non-sarcastic categories, collected from two different websites, "The Onion" and "HuffPost", respectively. We chose this dataset because its headlines are formally written by professionals, so misspellings are unlikely and the vocabulary matrix has low sparsity. The summary of the News Headlines dataset is shown in Table 51.2. There are 3 attributes in this dataset: Headlines, "is_sarcastic", and article_link. However, we dropped article_link to improve feature selection. The total number of instances in this dataset is 28,619, with no missing or null values. In the "is_sarcastic" attribute, the value 0 is assigned to non-sarcastic sentences and 1 to sarcastic sentences. To increase the training size so that the model yields better performance, we
Table 51.2 Descriptive statistics of original News Headlines dataset

| Statistic | Value |
|---|---|
| # Headlines | 28,619 |
| Sarcastic headlines (in %) | 52 |
| Non-sarcastic headlines (in %) | 47.6 |

Table 51.3 Descriptive statistics of News Headlines dataset after applying text-augmentation technique

| Statistic | Value |
|---|---|
| # Headlines | 133,545 |
| Sarcastic headlines (in %) | 56.1 |
| Non-sarcastic headlines (in %) | 43.9 |
| Max. word length in headlines | 51 |
need a large dataset. Therefore, we incorporated a text augmentation technique, the nlp_aug algorithm, to synthesize new data from existing data. The methods used by nlp_aug for text augmentation include substituting words with their synonyms, replacing them with words that have similar word embedding vectors, using a context-based transformer model to replace them with words of similar contextual meaning, and back translation. After applying nlp_aug text augmentation to the original News Headlines dataset, we increased the size of our dataset. The statistics of the newly synthesized dataset are summarized in Table 51.3. The class distribution of sarcastic and non-sarcastic sentences in the newly synthesized dataset (Table 51.3), which we consider our final dataset for this study, is shown in Fig. 51.1. The mean value of the data column ("is_sarcastic") is approximately 0.43, which signifies that the dataset is not heavily affected by skewness and is thus appropriate for our study.
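The synonym-substitution idea described above can be illustrated with a small standard-library sketch. This is not the nlp_aug implementation: the lexicon `SYNONYMS`, the function `synonym_augment`, the replacement probability `p`, and the sample headline are all hypothetical stand-ins chosen for illustration.

```python
import random

# Hypothetical toy lexicon; a real augmenter would draw on a thesaurus,
# word embeddings, or a contextual language model.
SYNONYMS = {
    "big": ["large", "huge"],
    "says": ["claims", "states"],
    "new": ["fresh", "recent"],
}


def synonym_augment(headline, rng, p=0.5):
    """Replace each word that has a known synonym with probability p."""
    augmented = []
    for word in headline.lower().split():
        if word in SYNONYMS and rng.random() < p:
            augmented.append(rng.choice(SYNONYMS[word]))
        else:
            augmented.append(word)
    return " ".join(augmented)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = "local man says new plan is big"
variants = [synonym_augment(original, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Each variant keeps the label of the headline it was derived from, which is how augmentation enlarges the training set without new annotation.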
Fig. 51.1 Class distribution of “is_sarcastic” attribute of News Headlines dataset
51.4 Proposed Methodology In this section, we describe our proposed model, i.e., the Attention-based deep network model using BERT (ADN-BERT), which classifies sarcastic and non-sarcastic headlines from the dataset (Table 51.3). The ADN-BERT model comprises five stages: input layer, embedding layer (BERT), attention layer, dense layer, and output layer. The architecture of the ADN-BERT model is shown in Fig. 51.2.
51.4.1 Input Layer The News Headlines dataset is fed to the input layer, which is connected to the embedding layer and generates word embeddings (i.e., dimensional vector/numerical representations) using Bidirectional Encoder Representations from Transformers
Fig. 51.2 Architecture of our proposed model (ADN-BERT)
(BERT). All input text is converted to lower case before BERT tokenization begins.
51.4.2 Embedding Layer In this study, we used "bert-base-uncased", one of the pre-trained BERT word embedding models, for tokenization (i.e., encoding) of the input text. This model has been trained on lower-cased text and includes 110 million parameters with 768 hidden units. The BERT encoding process converts each input text into a sequence of numbers (tokens) of a fixed maximum length, consisting of input_ids and attention_masks. In this case, we consider at most 60 words of the input text for tokenization, and words beyond this limit are ignored. The input_ids represent the tokenized input text, where each token is assigned an integer corresponding to a unique word in the BERT vocabulary. The attention_masks contain binary values that indicate which positions hold real tokens and which are padding added to reach the maximum length. After tokenization, we create a model for binary sarcasm classification using the TensorFlow Keras library. The model takes two input tensors: input_ids and attention_masks. The input_ids is a tensor of shape (batch_size, 60), which holds the tokenized representation of the text inputs. The attention_masks is a tensor of the same shape, holding the attention mask for each token in the input sequence. In this study, we set the batch_size hyperparameter to 32. The first step in the model construction is to pass the input tensors through the pre-trained BERT model. The result is a tuple containing the hidden states of the BERT model, which are then extracted by indexing the output (pooled output).
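The relationship between input_ids, attention_masks, and the maximum length of 60 can be illustrated without loading BERT. The sketch below is a simplified stand-in for the real tokenizer: it splits on whitespace instead of using WordPiece, `TOY_VOCAB` holds hypothetical ids, and it assumes the 60-position limit includes the two special tokens. Only the special-token ids (0, 100, 101, 102) match the bert-base-uncased vocabulary.

```python
# Special-token ids as defined in the bert-base-uncased vocabulary.
PAD_ID, UNK_ID, CLS_ID, SEP_ID = 0, 100, 101, 102

# Hypothetical word ids, for illustration only.
TOY_VOCAB = {"man": 2001, "bites": 2002, "dog": 2003}
MAX_LEN = 60


def encode(text, max_len=MAX_LEN):
    """Return (input_ids, attention_mask), both of length max_len."""
    tokens = text.lower().split()  # whitespace split stands in for WordPiece
    ids = [TOY_VOCAB.get(tok, UNK_ID) for tok in tokens][: max_len - 2]
    ids = [CLS_ID] + ids + [SEP_ID]
    mask = [1] * len(ids)          # 1 marks a real token
    n_pad = max_len - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad  # 0 marks padding


input_ids, attention_mask = encode("Man bites dog")
print(input_ids[:6], attention_mask[:6])
```

Stacking 32 such pairs gives the two (batch_size, 60) tensors that the model receives.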
51.4.3 Attention Layer The extracted hidden states are then processed by an attention layer, which computes an attention score for each token in the input sequence. This score is used to compute a weighted sum of the hidden states, which is then passed through a dense layer with 32 units and a ReLU activation function. The result of this dense layer is then passed through a dropout layer to reduce overfitting.
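The attention step described above (a score per token, then a weighted sum of the hidden states) can be sketched in a few lines. The chapter does not specify the scoring function, so the dot product with a single weight vector used here, the softmax normalization, and the names `attention_pool` and `score_weights` are assumptions for illustration only.

```python
import math


def attention_pool(hidden_states, score_weights):
    """Score each token vector, normalize the scores with softmax,
    and return (weights, weighted sum of the hidden states)."""
    # One scalar score per token: dot product with an assumed learned vector.
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, score_weights))
              for h in hidden_states]
    # Numerically stable softmax over the token axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the token vectors gives one context vector.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Three tokens with two-dimensional hidden states (toy values).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(hidden, score_weights=[0.5, -0.5])
```

The context vector is what would then be passed to the 32-unit dense layer.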
51 ADN-BERT: Attention-Based Deep Network Model Using BERT …
51.4.4 Dense Layer Finally, the result of the dropout layer is passed through a dense layer with a single unit and a sigmoid activation function, which outputs a binary prediction for each input.
51.4.5 Output Layer The model is then compiled with the Adam optimizer, a binary cross-entropy loss function, and accuracy as a metric. The compiled model is returned as the final output of the function.
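As a worked illustration of the sigmoid output and the binary cross-entropy loss named above, the following sketch computes both by hand for three toy logits. The logits, labels, the 0.5 decision threshold, and the clipping constant `eps` are assumptions and are not taken from the chapter.

```python
import math


def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


logits = [2.0, -1.0, 0.0]      # toy outputs of the final single-unit dense layer
labels = [1, 0, 1]             # toy ground truth (1 = sarcastic)
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
preds = [1 if p >= 0.5 else 0 for p in probs]   # assumed 0.5 threshold
accuracy = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
```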
51.5 Experimental Results and Analysis This section describes the experimental results achieved by our proposed model (ADN-BERT) on the News Headlines dataset and compares them with other baseline models. We split the dataset into three parts: 60% for training, 20% for validation, and the remaining 20% for testing. The data is randomly shuffled before splitting to avoid overfitting and bias. The performance comparison between the baseline models and our proposed model is shown in Table 51.4. The accuracy and loss curves of our proposed model are shown in Figs. 51.3 and 51.4, respectively.
Table 51.4 Performance comparison of baseline models and proposed model (ADN-BERT) on News Headlines dataset
Model                   Accuracy (in %)   Precision (in %)   Recall (in %)   F1-score (in %)
Nayak et al. [3]        89.5              89.5               90.2            89.7
Sharma et al. [4]       90.8              92                 91.1            91.5
Sharma et al. [7]       88.9              91.1               88.67           89.87
Shrikhande et al. [8]   86.13             85.4               86.8            85.6
Proposed model          94.1              94.5               95.2            94.8
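The 60/20/20 shuffled split described at the start of this section could be implemented along the following lines. This is a sketch under assumptions: the fixed seed, the integer rounding of the split sizes, and the function name `split_60_20_20` are illustrative and not taken from the chapter.

```python
import random


def split_60_20_20(items, seed=42):
    """Shuffle a copy of the data, then cut it into 60% / 20% / 20% parts."""
    items = list(items)
    random.Random(seed).shuffle(items)      # seeded so the split is reproducible
    n_train = len(items) * 60 // 100
    n_val = len(items) * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]          # whatever remains goes to testing
    return train, val, test


# Toy data: 100 integer ids standing in for headlines.
train, val, test = split_60_20_20(range(100))
```

Shuffling before cutting keeps the three parts disjoint while removing any ordering present in the source file.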
Fig. 51.3 Accuracy of our proposed model
Fig. 51.4 Loss curve of our proposed model
51.6 Conclusion and Future Work In our study, the proposed attention-based deep network model using BERT achieved better results than other recent baseline models in classifying sarcastic sentences, with an accuracy of 94.1%. The BERT pre-trained word embedding combined with the attention model proved efficient in dealing with context-based word embeddings, which resulted in better performance. In the future, more advanced deep learning models with optimization techniques can be applied for better results.
References
1. Bharti, S.K., Gupta, R.K., Pathik, N., Mishra, A.: Sarcasm detection in news headlines using voted classification. In: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, pp. 208–212 (2022)
2. Misra, R., Arora, P.: Sarcasm detection using hybrid neural network (2019). arXiv preprint arXiv:1908.07414
3. Nayak, D.K., Bolla, B.K.: Efficient deep learning methods for sarcasm detection of news headlines. In: Machine Learning and Autonomous Systems: Proceedings of ICMLAS 2021, pp. 371–382. Springer Nature, Singapore (2022)
4. Sharma, D.K., Singh, B., Agarwal, S., Kim, H., Sharma, R.: Sarcasm detection over social media platforms using hybrid auto-encoder-based model. Electronics 11(18), 2844 (2022)
5. Zanchak, M., Vysotska, V., Albota, S.: The sarcasm detection in news headlines based on machine learning technology. In: 2021 IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), vol. 1, pp. 131–137. IEEE (2021)
6. Liu, H., Xie, L.: Research on sarcasm detection of news headlines based on Bert-LSTM. In: 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), pp. 89–92. IEEE (2021)
7. Sharma, D.K., Singh, B., Garg, A.: An ensemble model for detecting sarcasm on social media. In: 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 743–748. IEEE (2022)
8. Shrikhande, P., Setty, V., Sahani, A.: Sarcasm detection in newspaper headlines. In: 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), pp. 483–487. IEEE (2020)
9. Ajnadkar, O.: Sarcasm detection of media text using deep neural networks. In: Computational Intelligence and Machine Learning: Proceedings of the 7th International Conference on Advanced Computing, Networking, and Informatics (ICACNI 2019), pp. 49–58. Springer, Singapore (2021)
10. Mandal, P.K., Mahto, R.: Deep CNN-LSTM with word embeddings for news headline sarcasm detection. In: 16th International Conference on Information Technology-New Generations (ITNG 2019), pp. 495–498. Springer International Publishing, Berlin (2019)
11. Jariwala, V.P.: Optimal feature extraction based machine learning approach for sarcasm type detection in news headlines. Int. J. Comput. Appl. 975, 8887 (2020)
12. Goel, P., Jain, R., Nayyar, A., Singhal, S., Srivastava, M.: Sarcasm detection using deep learning and ensemble learning. Multimedia Tools Appl. 81(30), 43229–43252 (2022)
Chapter 52
Prediction of Personality Type with Myers–Briggs Type Indicator Using DistilBERT
Suresh Kumar Grandh, K. Adi Narayana Reddy, D. Durga Prasad, and L. Lakshmi
Abstract Personality refers to the distinguishing set of features of an individual that affects that individual’s habits, actions, points of view, and approaches; it helps us to recognize how people socialize, influence, communicate, and cooperate. This work aims to predict a person’s personality. The existing approach to personality prediction involves a trained psychologist conducting various tests, many of which must be completed before a prediction can be made. This is time-consuming and error-prone. Our proposed system uses a machine learning approach to predict the personality of users from their information and chats on social media. There is a strong correlation between a person’s personality and the behavior they exhibit on social media networks through comments or other activity. Predicting personality in this way is therefore useful in marketing, when choosing business partners, for social media advertising networks that categorize user behavior in order to serve more pertinent advertisements, and for employers who wish to know the personality of their new employees.
52.1 Introduction Personality is a psychological pattern that explains a broad range of an individual’s attitudes in connection with that individual’s qualities. It also refers to the combination of qualities or characteristics that form a person’s idiosyncratic character. The idea of a personality prediction system is regarded as important yet vaguely defined in the world of psychology. The existing personality prediction system involves a trained psychologist who conducts various tests to predict the personality. Hence, more specific and unbiased tests of personality would be of considerable value to psychologists. In the past few years, the usage of social media such as Facebook, YouTube, and Instagram has grown rapidly, and these have become popular platforms where people of different kinds express their likes and dislikes, interact with each other, and share knowledge. As these platforms hold a huge amount of user information, they can be utilized to predict the personality of an individual. Since social media accounts enable us to post about our preferences, feelings, and points of view without the fear of being judged, these actions can be used to understand the personality of a particular user. Surveys and research on personality prediction are done on a large scale based on the MBTI model. The MBTI is a self-report questionnaire designed to indicate specific psychological patterns in how individuals perceive their surroundings and make decisions. It recognizes 16 personality types, based on four dimensions. The main applications of this work are screening potential candidates, marketing, advertising, and many other domains. The goal of the proposed system is to use deep learning to construct a model that accepts the text of an individual’s social media profile (social media posts) as input and produces the predicted personality of that person based on the Myers–Briggs Type Indicator.
S. K. Grandh (B) · K. Adi Narayana Reddy · D. Durga Prasad · L. Lakshmi, ICFAI Foundation for Higher Education, ICFAITech (FST), Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_52
52.2 Related Work Personality prediction mainly works on the individual characteristics, communication skills, and habits of a person. Personality determines how an individual behaves, responds to a situation, and communicates with others, and it shapes a person’s desires and outlook [1]. Personality analysis is useful in many real-life situations, such as HR interviews, where a person is assessed for a job role, and the analysis of medical records. The authors of [2] created a machine learning model for analyzing the personality attributes of an individual. They used a Myers–Briggs Type Indicator dataset with 8500 tweets of persons belonging to four traits. In [3], the authors carried out a study with several classifiers and feature vector combinations to predict personality. The classifiers used are a neural network classifier, a naive Bayes classifier, and an SVM classifier. In [3], the researchers used Logistic Regression and a support vector machine to predict personality. The multi-class accuracy obtained was around 31%, and they suggested using binary classification to obtain higher accuracy. In [4], the authors used an LSTM model with hyperparameters such as a dropout rate of 0.1, a kernel initialized to zero, a recurrent dropout rate of 0.2, and a sigmoid activation function. In [5], the authors implemented an RNN model and found that the precision of the recurrent neural network with the user classification procedure was better than that of the recurrent neural network with the post-classification methodology. In [6], the authors used multiple algorithms such as Random Forest, Extra Trees, and XGBoost. They reported that although Gradient Boosting typically performs better on most datasets after parameter optimization, it likely performed slightly worse in this case because of the amount of noise in the data, which is to be expected given the source of the dataset. In [7], the authors used multiple algorithms with several feature vector combinations. In [8], the authors found Random Forest to be the best model. In [9], the authors implemented Logistic Regression, KNN, and XGBoost. In [10], the authors implemented an Extra Trees classifier, a naive Bayes classifier, Logistic Regression, and SVM. Their results showed that Logistic Regression can achieve better accuracy when its parameters are tuned accordingly, and that the accuracy estimates are reasonably good. For future work, they plan to collect more data and to use XGBoost and other algorithms to improve the classification.
52.3 Methodology Personality prediction plays a vital role in recommendation systems, marketing science, employee recruitment, counseling, and advertising. In our proposed methodology, we use the MBTI dataset with four traits. The proposed system architecture is shown in Fig. 52.1.
Fig. 52.1 Architecture
52.3.1 Architecture
52.3.2 Dataset In this work, we use the Myers–Briggs Type Indicator (MBTI) Personality Type Dataset, which is openly accessible on Kaggle. The Myers–Briggs Type Indicator is a personality type model that assigns every person to one of 16 personality types across four dimensions:
• Introversion (I) / Extroversion (E)
• Intuition (N) / Sensing (S)
• Thinking (T) / Feeling (F)
• Judging (J) / Perceiving (P)
This dataset contains around 8600 rows and two columns: type and posts. The type column holds one of the 16 personality types, and the posts column holds raw text containing the person’s last 50 posts. The data was gathered from the Personality Café forum.
52.3.3 Preprocessing In the proposed work, the data is collected from online Web sites; it is not in a clean state, and the dataset is biased. After a complete evaluation of the dataset, we applied several preprocessing steps: data cleaning, splitting each row into individual tweets, sampling based on the relative frequency of the class labels, and creating separate train/test splits for each personality trait (four in total). In data cleaning, the tweets are converted to lowercase and all HTTP URLs are dropped. Each row is then split on the separator ||| into multiple rows, with one tweet per row. Next, every five rows are merged into one, so that each row consists of five tweets instead of 50. Finally, the traits with unbalanced labels (I/E and N/S) are undersampled by 1:5.
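The cleaning, splitting, and regrouping steps above can be sketched as follows. The URL pattern, the whitespace normalization, the choice to drop posts that become empty after cleaning, the space used to join posts, and the function name `clean_and_chunk` are assumptions for illustration; the undersampling step is not shown.

```python
import re

# Assumed pattern for "HTTP URLs": http:// or https:// up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def clean_and_chunk(raw_posts, chunk_size=5, sep="|||"):
    """Split one dataset row on the separator, clean each post,
    and regroup the posts into rows of `chunk_size` posts each."""
    posts = []
    for post in raw_posts.split(sep):
        post = URL_RE.sub("", post.lower())   # lowercase and drop URLs
        post = " ".join(post.split())         # collapse leftover whitespace
        if post:                              # assumption: skip now-empty posts
            posts.append(post)
    return [" ".join(posts[i:i + chunk_size])
            for i in range(0, len(posts), chunk_size)]


# Toy row with seven posts (the real rows hold 50).
row = "Hello World|||see http://example.com now|||A|||B|||C|||D|||E"
chunks = clean_and_chunk(row)
```

With 50 posts per person and a chunk size of five, each person would contribute ten training rows, all carrying that person's type label.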
52.3.4 Encodings Using the AutoTokenizer of the Hugging Face “distilbert-base-uncased” model, all the tweets are converted into encodings: sub-word token encodings (input IDs) and attention masks.
52.3.5 Training Using the Hugging Face AutoModelForSequenceClassification class and “distilbert-base-uncased” with the classification head set to two classes, the model is fine-tuned on the training data for each MBTI trait (i.e., I/E, N/S, F/T, P/J) for three epochs with a learning rate of 0.00002 and a batch size of 16, or a smaller batch size depending on the GPU allocated. The batch size is chosen by trial and error based on the GPU type and RAM allocated by Google Colab.
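Because one two-class model is trained per trait, each four-letter type has to be turned into four binary targets. The sketch below shows one way to do that; the label convention (1 for the first letter of each pair), the dictionary layout, and the names `binary_labels` and `TRAINING_CONFIG` are assumptions, while the hyperparameter values are the ones quoted above.

```python
# Trait pairs as written in the text; one binary classifier is trained per pair.
TRAITS = ("I/E", "N/S", "F/T", "P/J")

# Hyperparameters quoted in the text (the batch size may be lowered on smaller GPUs).
TRAINING_CONFIG = {"epochs": 3, "learning_rate": 2e-5, "batch_size": 16}


def binary_labels(mbti_type):
    """Map a type such as 'INTJ' to four 0/1 labels.

    Assumed convention: 1 if the type contains the first letter of the pair
    (I, N, F, P), otherwise 0.
    """
    if len(mbti_type) != 4:
        raise ValueError("expected a four-letter MBTI type")
    letters = set(mbti_type.upper())
    return {trait: int(trait[0] in letters) for trait in TRAITS}


labels = binary_labels("INTJ")
```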
52.4 Results The accuracy of the model for each trait is tabulated as follows. A simple UI is built using Flask to facilitate users in accessing the personality prediction. The tweets of Elon Musk and Narendra Modi are taken from Kaggle for testing.
52.4.1 Comparative Study of Accuracies Table 52.1 shows the accuracies obtained from various machine learning models for personality prediction. Using Hugging Face Transformers (AutoModelForSequenceClassification and distilbert-base-uncased) with the classification head set to two classes, we trained the model for each MBTI trait (i.e., I/E, N/S, F/T, P/J) for three epochs with a learning rate of 0.00002 and a batch size of 16, or a smaller batch size depending on the GPU allocated. We implemented naive Bayes as a baseline model, which tends to overfit as it treats each feature as independent. The comparison between naive Bayes and DistilBERT is shown in Fig. 52.2. We then added GloVe embeddings and implemented an RNN model: a sequential model with an embedding layer, an LSTM layer, and a dense layer with sigmoid activation, trained with the Adam optimizer. The comparison between the RNN and DistilBERT is shown in Fig. 52.3.
Table 52.1 Accuracies (in %) obtained from various models
Type   Naïve Bayes   RNN     AdaBoost   Gradient Boosting   Random Forest   DistilBERT
I/E    53.19         50.56   51.11      51.72               50.66           60.57
N/S    52.68         55.46   53.32      52.95               50.76           72.27
F/T    57.91         53.07   56.18      57.20               53.95           72.27
P/J    53.94         43.48   50.47      51.67               51.10           72.27
Fig. 52.2 Accuracy comparison between naive Bayes and DistilBERT
Fig. 52.3 Accuracy comparison between RNN and DistilBERT
We then implemented ensemble models to improve the accuracy of prediction. We implemented an AdaBoost classifier with n_estimators = 500 and learning_rate = 0.1. The comparison between AdaBoost and DistilBERT is shown in Fig. 52.4. Next, we implemented a Gradient Boosting classifier with n_estimators = 500 and learning_rate = 0.1. The Gradient Boosting algorithm performs well, but its performance is almost equivalent to that of AdaBoost. The comparison between Gradient Boosting and DistilBERT is shown in Fig. 52.5. We then implemented a Random Forest classifier with n_estimators = 5 and learning_rate = 0.1. The performance of the Random Forest classifier is good when compared with the Gradient Boosting and AdaBoost classifiers. The comparison between Random Forest and DistilBERT is shown in Fig. 52.6.
Fig. 52.4 Accuracy comparison between AdaBoost and DistilBERT
Fig. 52.5 Accuracy comparison between Gradient Boosting and DistilBERT
Finally, we implemented DistilBERT, which provides superior accuracy compared with naïve Bayes, the RNN, AdaBoost, Gradient Boosting, and Random Forest. Although DistilBERT gives the best accuracy of the models tested, the accuracies are still not very high because of some imbalance in the traits of the dataset.
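To make the comparison concrete, the margin of DistilBERT over the strongest baseline can be computed for each trait directly from the values reported in Table 52.1. The sketch below only performs arithmetic on those reported numbers; the dictionary layout and the function name `gain_over_best_baseline` are illustrative.

```python
# Accuracies (in %) copied from Table 52.1.
ACCURACY = {
    "I/E": {"Naive Bayes": 53.19, "RNN": 50.56, "AdaBoost": 51.11,
            "Gradient Boosting": 51.72, "Random Forest": 50.66, "DistilBERT": 60.57},
    "N/S": {"Naive Bayes": 52.68, "RNN": 55.46, "AdaBoost": 53.32,
            "Gradient Boosting": 52.95, "Random Forest": 50.76, "DistilBERT": 72.27},
    "F/T": {"Naive Bayes": 57.91, "RNN": 53.07, "AdaBoost": 56.18,
            "Gradient Boosting": 57.20, "Random Forest": 53.95, "DistilBERT": 72.27},
    "P/J": {"Naive Bayes": 53.94, "RNN": 43.48, "AdaBoost": 50.47,
            "Gradient Boosting": 51.67, "Random Forest": 51.10, "DistilBERT": 72.27},
}


def gain_over_best_baseline(row, proposed="DistilBERT"):
    """Return (best baseline name, proposed accuracy minus best baseline accuracy)."""
    baselines = {model: acc for model, acc in row.items() if model != proposed}
    best = max(baselines, key=baselines.get)
    return best, round(row[proposed] - baselines[best], 2)


gains = {trait: gain_over_best_baseline(row) for trait, row in ACCURACY.items()}
for trait, (best, gain) in gains.items():
    print(f"{trait}: +{gain:.2f} points over {best}")
```

On these figures the margin is smallest for I/E (7.38 points over naive Bayes) and largest for P/J (18.33 points over naive Bayes).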
Fig. 52.6 Accuracy comparison between Random Forest and DistilBERT
52.5 Conclusion The objective of this project is to build a machine learning model for personality type prediction with the Myers–Briggs Type Indicator. Social media is used extensively by many people to share their thoughts and opinions on various topics. The posts shared on social media can be used to draw conclusions about the user’s personality. We used the DistilBERT classifier to predict personality from social media posts. In the future, we would like to improve the accuracy by improving the quality of the dataset and by trying out the full BERT models.
References
1. Das, K., Prajapati, H.: Personality identification based on MBTI dimensions using natural language processing. Int. J. Creative Res. Thoughts (IJCRT) 8(6), 1653–1657 (2020). ISSN: 2320-2882
2. Bharadwaj, S., Sridhar, S., Choudhary, R., Srinath, R.: Persona traits identification based on Myers-Briggs Type Indicator (MBTI) – a text classification approach. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1076–1082 (2018). https://doi.org/10.1109/ICACCI.2018.8554828
3. Vaddem, N., Agarwal, P.: Myers–Briggs personality prediction using machine learning techniques. Int. J. Comput. Appl. 175(23), 41–44 (2020)
4. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6839354.pdf
5. Amirhosseini, M.H., Kazemian, H.: Machine learning approach to personality type prediction based on the Myers-Briggs type indicator®. Multimodal Technol. Interact. 4, 9 (2020). https://doi.org/10.3390/mti4010009
6. Gottigundala, T.: Predicting personality type from writing style (2020)
7. Patel, S., Nimje, M., Shetty, A., Kulkarni, S.: Personality analysis using social media. Int. J. Eng. Res. Technol. (IJERT) 09(03) (2020)
8. Abidin, N., Akmal, M., Mohd, N., Nincarean, D., Yusoff, N., Karimah, H., Abdelsalam, H.: Improving intelligent personality prediction using Myers Briggs type indicator and random forest classifier. Int. J. Adv. Comput. Sci. Appl. 11 (2020). https://doi.org/10.14569/IJACSA.2020.0111125
9. Shilpa, R., Supriya, V., Sweta, P., Vinaya Varshini, R., Uday Shankar, S.V.: Personality prediction using machine learning. Sci. Eng. J. (2021)
10. Chaudhary, S., Singh, R., Hasan, S.T., Kaur, M.I.: A comparative study of different classifiers for Myers-Brigg personality prediction model. Int. Res. J. Eng. Technol. (IRJET) (2018)
Chapter 53
A Novel Deep Learning Approach to Find Similar Stocks Using Vector Embeddings
Rohini Pinapatruni and Faizan Mohammed
Abstract In today’s highly integrated and globalized world, it has been observed many times that a group of stocks moves in tandem, showing almost similar price movements. Stocks belonging to the same sector usually exhibit this phenomenon. However, the relation or similarity between stocks can go beyond belonging to the same sector, and the relationship between stocks can vary over time. In this paper, a deep learning approach is used: an autoencoder is created to find similar stocks in a certain period of time using vector embeddings. The premise is that stocks with similar vector representations (the bottleneck of the autoencoder) will also be similar and will display similar behavior and characteristics. Stock data of the Nifty 50 was considered, and performance was evaluated using Euclidean distance, cosine similarity, and a combination of the two. The Euclidean distance is found to outperform the other metrics.
53.1 Introduction In recent times, there has been significant progress in the fundamental aspects of information technology, which has transformed the course of business. Financial markets are among the most fascinating innovations, with a considerable impact on the economy of any nation. The financial market is a complex and dynamic system in which individuals actively buy and sell stocks, currencies, and derivatives via virtual platforms backed by brokers. A stock market enables investors to purchase shares of publicly traded firms through exchange or over-the-counter trading. One of the most important indicators of the health of any economy is the state of its stock market. The financial well-being of most people in most nations is directly or indirectly affected by fluctuations in the stock market. Modeling stock prices and their direction is a very important problem in quantitative finance. At any moment in time, countless factors affect the price of a stock. Market participants are most probably analyzing every factor and all publicly available information regarding a company in order to value it and determine the fair price of its stock at that moment. Every investor does their own research, has their own biases, and may arrive at a different conclusion about what the price of a stock should be. Nobody can say with 100% certainty what the future business growth and profitability of a company will be. Since the future attributes of the company are uncertain, so is the price of its stock. Uncertainty, or stochasticity, is therefore inherent to the stock market. Also, the many stocks in the world are interconnected and may affect one another. Stocks can be positively or negatively correlated. This correlation structure is not stationary, meaning that the statistical nature of the relation between stocks changes through time. Researchers and analysts have devised tools and techniques to forecast stock price changes, find the similarity among stocks, and assist investors in making intelligent decisions. Researchers can use advanced trading algorithms to predict the market by means of non-conventional textual data from media. The use of sophisticated machine learning methods such as text data analytics and ensemble algorithms has significantly improved the accuracy of prediction. Further, because the data are dynamic, inconsistent, and chaotic, stock market analysis and prediction have become one of the most complex topics.
R. Pinapatruni (B) · F. Mohammed, Department of Data Science and Artificial Intelligence, Faculty of Science and Technology (IcfaiTech), ICFAI Foundation for Higher Education, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_53
53.2 A Review of Existing Models In recent years, deep learning has transformed many fields. Deep learning has brought much growth and advancement to the domains of computer vision (CV) and Natural Language Processing (NLP). The introduction of Convolutional Neural Networks (CNN) [1] has made deep learning the state of the art for all computer vision tasks [2]. Deep learning is also a state-of-the-art approach for NLP [3]. Before the advent of transformer-based architecture [4] and BERT [5], long short-term memory (LSTM) was mainly used for NLP tasks. The application of machine learning and deep learning in the field of finance has been mainly focused on stock price prediction [6–8] and volatility forecasting [9]. There is a lack of research on applying deep learning techniques to important tasks such as portfolio creation and optimization. Effectively figuring out the complex relationships and correlations between various stocks and other financial instruments that are traded in the financial markets is essential and crucial for portfolio construction, optimization, and management.
Harry Markowitz, in his pioneering work [10], used the covariance between stock returns as a way of measuring the relationship between stocks and of assessing risk. Even today, [10] is the core foundation of the portfolio management techniques in use. Correlation has become the standard way of measuring similarity between the time series of financial assets. This work is inspired by innovations in Natural Language Processing (NLP), which has particularly benefitted from advancements in deep learning approaches to language modeling. Long short-term memory (LSTM) neural networks [11], transformers [4], and BERT [5] are the deep learning approaches currently in wide use in NLP. One of the key innovations that has contributed significantly to these methods and to the field of NLP is the concept of a vector embedding. Specifically, words are represented numerically by vectors, which are called word embeddings. Dolphin et al. [12] used time series of stock returns to create vector embeddings. Word embeddings have been shown to be capable of numerically capturing the essence of the word they represent and can even capture the semantic relation between two words. For example, the vector representations of the words “Airplane” and “Aircraft” would be very similar and would reside very close to each other in the embedding space. The two most popular methods of creating word embeddings are GloVe [13] and Word2Vec [14]. Embeddings are heavily used for representing language data in NLP. The application of embeddings in the financial domain has been limited to the NLP idea of word embeddings. Word embeddings have been used in financial ML for sentiment analysis of stock-related news; the extracted sentiment information is then fed to the main forecasting model as a feature [15]. Vector embeddings can also be used for handling categorical data. Rather than using one-hot encoding, which is sparse and inefficient, entity embeddings can be used to represent the categorical features in data numerically [16]. De-autoencoders [17] are neural networks that are capable of compressing high-dimensional input data into a condensed lower dimensional space. The lower dimensional representation of the input can be thought of as a vector embedding that keeps only the essential information required by the decoder to reconstruct the input. The autoencoder is a nonlinear transformation, in contrast to principal component analysis (PCA), which makes the autoencoder more flexible and powerful [18]. Autoencoders can also be used for dimensionality reduction [19]. Image reconstruction networks based on encoder and decoder architectures have also been proposed [20, 21].
The significant contributions of the proposed approach are:
• Create a denoising autoencoder that is capable of encoding stock time series data to a low-dimensional latent vector space.
• Use the vector representation of stock data to find similar stocks during a specified time frame.
• Experiment with different similarity measures such as Euclidean, cosine, and a combination of these metrics.
The manuscript is organized as follows. A brief introduction and the existing state-of-the-art models are discussed in Sects. 53.1 and 53.2. The proposed denoising autoencoder model is discussed in detail in Sect. 53.3.1, and its implementation and training details are given in Sect. 53.3.2. The results are discussed in Sect. 53.4, and the concluding remarks of the proposed work are given in Sect. 53.5.
53.3 Proposed Approach The proposed work creates vector embeddings (latent representations of stocks) using an autoencoder to find similar stocks in a certain period of time. The most important layer in the autoencoder is the bottleneck layer, which holds the compressed nonlinear latent representation of the input. The proposed approach is divided into three sub-parts:
1. Creating a denoising autoencoder model
2. Implementing the autoencoder using vector embeddings to predict similar stocks
3. Result analysis using different similarity metrics.
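The third sub-part compares stocks through their embedding vectors. The sketch below shows the Euclidean and cosine measures together with one possible way of combining them. The chapter does not state how the two are combined, so `combined_score`, its `alpha` weight, and the 1 / (1 + distance) transformation are assumptions; the tickers and two-dimensional embeddings are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two embeddings (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two embeddings (larger = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combined_score(u, v, alpha=0.5):
    """Hypothetical blend of both measures; larger = more similar."""
    euclid_sim = 1.0 / (1.0 + euclidean_distance(u, v))   # map distance into (0, 1]
    return alpha * euclid_sim + (1.0 - alpha) * cosine_similarity(u, v)


def most_similar(target, embeddings, score=combined_score):
    """Rank every other stock by its similarity to `target`, best first."""
    others = [name for name in embeddings if name != target]
    return sorted(others,
                  key=lambda name: score(embeddings[target], embeddings[name]),
                  reverse=True)


# Hypothetical tickers with toy bottleneck vectors.
embeddings = {"AAA": [1.0, 0.0], "BBB": [0.9, 0.1], "CCC": [-1.0, 0.2]}
ranking = most_similar("AAA", embeddings)
```

Computing the chosen score for every pair of stocks yields the similarity matrix used for the result analysis.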
53.3.1 Creating a Denoising Autoencoder Model An autoencoder can be a very powerful and expressive dimensionality reduction method, akin to a nonlinear principal component analysis (PCA). The autoencoder consists of three parts, the encoder, the bottleneck (latent space), and the decoder, as shown in Fig. 53.1. The encoder is responsible for compressing the input to a lower dimensional latent representation that holds as much information about the input as possible. The bottleneck layer is also called the latent representation layer because it is the lower dimensional vector representation of a given input. The decoder takes the latent representation as input and tries to reconstruct the original input with as little error as possible. The proposed model uses long short-term memory (LSTM) layers for both the encoder and the decoder, since LSTM is able to capture the temporal nature of the stock time series. An LSTM unit contains three gates: an input gate, a forget gate, and an output gate. The gates act as rules applied to information as it enters the network: information that satisfies them is retained, and information that does not is erased via the forget gate. LSTM with an embedding layer and an autoencoder is employed. LSTM is used instead of a plain RNN to avoid exploding and vanishing gradients. The proposed model architecture is shown in Fig. 53.3.
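The role of the three gates can be seen in a single step of a scalar LSTM cell, written here with the standard LSTM update equations. This is a didactic sketch, not the layer used in the proposed model: real layers use weight matrices and vectors, and the parameter names (`w_*`, `u_*`, `b_*`) and toy values are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p holds input weights (w_*), recurrent weights (u_*), and biases (b_*)
    for the forget (f), input (i), and output (o) gates and the candidate (g).
    """
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g      # keep part of the old memory, add new information
    h = o * math.tanh(c)        # expose part of the memory as the hidden state
    return h, c


KEYS = [f"{kind}_{gate}" for kind in ("w", "u", "b") for gate in ("f", "i", "o", "g")]
params = {key: 0.5 for key in KEYS}     # toy parameter values

h, c = 0.0, 0.0
for x in [0.1, -0.2, 0.3]:              # toy standardized price series
    h, c = lstm_step(x, h, c, params)
```

Because the cell state is updated additively (`f * c_prev + i * g`), gradients can pass through many time steps without vanishing as quickly as in a plain RNN.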
Fig. 53.1 Autoencoder architecture
53.3.2 Implementation Using Vector Embeddings The autoencoder tries to recreate the input as perfectly as possible, so the target is the input itself. Thus, it is an unsupervised regression problem; it is a regression problem because prices are usually positive real (continuous) numbers. The autoencoder model consists of three parts: the encoder, the decoder, and the bottleneck. All three components use LSTM layers to capture the temporal nature of the data. Dense, LSTM, GRU, and batch normalization layers from TensorFlow are used to build the autoencoder deep learning model. The vector representation of every stock’s data is obtained from the model. The autoencoder model is built using the following steps, as shown in Fig. 53.2:
1. Data collection and preprocessing
2. Train the model
3. Make similarity matrices.
Fig. 53.2 Process flow
53.3.2.1 Data Collection and Preprocessing
Stock data of the NIFTY 50 is collected using the yfinance (Yahoo Finance) Python library and preprocessed to handle missing values, wrong values, etc. To standardize the time series data, a moving average and a moving standard deviation with a window size of 10 days are used. The longer the window, the less reactive the moving average is to the current market state; if the window is too small, the moving average fluctuates too much.
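The moving-window standardization can be sketched as a rolling z-score. The chapter gives only the window size, so the details below are assumptions: the window includes the current day, the population standard deviation is used, a zero is emitted when the window is constant, and the function name `rolling_standardize` is illustrative.

```python
import math


def rolling_standardize(prices, window=10):
    """Return z-scores computed from the mean and standard deviation of the
    trailing `window` observations (the current one included)."""
    z_scores = []
    for t in range(window - 1, len(prices)):
        recent = prices[t - window + 1:t + 1]
        mean = sum(recent) / window
        variance = sum((p - mean) ** 2 for p in recent) / window
        std = math.sqrt(variance)
        # Guard against a flat window, where the z-score is undefined.
        z_scores.append((prices[t] - mean) / std if std > 0 else 0.0)
    return z_scores


# Toy price series rising by one unit per day.
prices = [100.0 + day for day in range(12)]
z = rolling_standardize(prices)
```

The first `window - 1` days produce no output because a full window is not yet available.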
53.3.2.2 Training the Autoencoder
Python is used to train the model, while MATLAB is utilized to reduce the dimensions of the input. MySQL is used for storage and retrieval. The model is fairly small, with approximately 3300 parameters. Batch normalization layers are used to handle internal covariate shift and also to regularize the model. Dropout layers are used at various places in the model to avoid overfitting and to encourage the model to learn new pathways for compressing the input data. The Adam optimizer [22] was used with a small learning rate of 0.0003. A small learning rate helps the model converge as close to the global minimum as possible. The proposed model can run on commodity hardware: an Intel i5 processor with 4 cores running at 2.2 GHz, 8 GB of RAM, and a 3 GB Nvidia graphics unit. The first training phase takes 5–10 min, while the second phase takes a few seconds to produce the similarity matrix.
Gaussian Noise Addition While Training
Gaussian noise with a relatively small standard deviation is added to the input to encourage the model to generalize and as a tool to avoid overfitting. Financial time series data has a very low signal-to-noise ratio, so adding noise also ensures that the model never sees exactly the same data twice, which helps out-of-sample performance. The mean of the Gaussian from which the noise is sampled is held constant at zero, but at the beginning of every epoch the standard deviation is sampled from a uniform distribution between 0.005 and 0.015. The addition of Gaussian noise helps the model compress the input to a lower dimensional latent space and remove noise from the input data. A custom TensorFlow callback is used to sample and update the standard deviation of the noise Gaussian during training.
Orthogonal Regularization
An orthogonal regularizer encourages vectors to be orthogonal to each other. Being orthogonal means that the dot product between the vectors is zero, so (for centered vectors) they are uncorrelated; each vector then captures a distinct aspect of the data and is not driven by the same factors. An L2 orthogonal regularizer was used.
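The two regularization devices described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' TensorFlow callback or regularizer: the function names are hypothetical, and the penalty shown (sum of squared pairwise dot products) is one common squared form of orthogonal regularization that may differ from the one actually used.

```python
import random


def sample_epoch_sigma(rng, low=0.005, high=0.015):
    """Draw the noise standard deviation for one epoch from U(low, high)."""
    return rng.uniform(low, high)


def add_gaussian_noise(window, sigma, rng):
    """Corrupt an input window with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in window]


def orthogonality_penalty(rows, weight=1.0):
    """Sum of squared dot products between distinct rows.

    The penalty is zero exactly when the rows are mutually orthogonal.
    """
    penalty = 0.0
    for i in range(len(rows)):
        for j in range(len(rows)):
            if i != j:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                penalty += dot * dot
    return weight * penalty


rng = random.Random(0)
clean = [0.1, -0.3, 0.25, 0.0]  # hypothetical standardized input window
for epoch in range(3):
    sigma = sample_epoch_sigma(rng)  # resampled at the start of every epoch
    noisy = add_gaussian_noise(clean, sigma, rng)
    # A denoising autoencoder would be trained to map `noisy` back to `clean`.
```

Because the standard deviation changes every epoch, the model sees a differently corrupted copy of each input on each pass, which is the effect the text attributes to the noise schedule.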
53.3.3 Similarity Metrics
A similarity measure quantifies how alike two data objects are. It is a data mining or machine learning concept in which dimensions represent features of objects. A small distance indicates a high degree of similarity, while a large distance indicates a low degree of similarity. A similarity matrix is created to record the similarity measure for every pair of stocks. The similarity measures used in the current work are cosine similarity, Euclidean distance, and a hybrid of the cosine and Euclidean similarity measures.
Cosine Similarity
Cosine similarity is the normalized dot product of two vectors, that is, the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is a judgment of orientation: two vectors with the same orientation have a cosine similarity of 1.
Euclidean Distance
Euclidean distance is one of the most commonly used distance measures. It is zero if the vectors are identical and positive otherwise, so it belongs to the interval [0, +∞). A similarity measure, however, should be bounded, yielding 1 for identical vectors and approaching 0 for very dissimilar vectors. So, one is added to the Euclidean distance and the result is inverted. The output now belongs to the interval (0, 1], as required.
A Hybrid of Cosine and Euclidean Similarity Measures
The hybrid approach is the product of cosine similarity and Euclidean similarity. Cosine similarity does not take into account the magnitude of the vectors; it only measures the cosine of the angle between them, which means that not all available information is utilized. On the other hand, Euclidean distance only measures the distance between two vectors and does not take into account their orientation. Thus, the two similarity measures are combined to consider both the angle and the distance between the two vectors (Fig. 53.3).
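The three measures defined above can be written down directly. The following is a minimal sketch using only the standard library; the function names and the dictionary-based `similarity_matrix` helper are illustrative choices, and the cosine function assumes non-zero vectors.

```python
import math


def cosine_similarity(u, v):
    """Normalized dot product; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_similarity(u, v):
    """Map the Euclidean distance d in [0, inf) to 1 / (1 + d) in (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))


def hybrid_similarity(u, v):
    """Product of the cosine and Euclidean similarities."""
    return cosine_similarity(u, v) * euclidean_similarity(u, v)


def similarity_matrix(embeddings, measure):
    """Pairwise similarity for a dict mapping stock name to embedding vector."""
    names = list(embeddings)
    return {a: {b: measure(embeddings[a], embeddings[b]) for b in names} for a in names}


# Hypothetical four-dimensional embeddings, for illustration only.
embeddings = {
    "STOCK_A": [0.2, -0.1, 0.4, 0.3],
    "STOCK_B": [0.4, -0.2, 0.8, 0.6],
    "STOCK_C": [-0.3, 0.5, 0.1, -0.2],
}
hybrid = similarity_matrix(embeddings, hybrid_similarity)
```

In this example STOCK_B is a scaled copy of STOCK_A, so their cosine similarity is 1 while their Euclidean similarity is below 1; the hybrid measure therefore ranks them as less similar than cosine similarity alone would.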
53.4 Results and Discussions
The performance of the proposed denoising autoencoder model is analyzed in this section. Stock data of NIFTY50 is used. The historical stock data table includes information such as the opening price, the highest price, the lowest price, the transaction date, the closing price, the volume, and so on. As can be seen, the three similarity measures are not significantly different from each other and are highly positively correlated. Figure 53.4 shows the similarity matrix heat map. Figure 53.5 shows the returns from the different similarity measures.
Fig. 53.3 Autoencoder model architecture
Fig. 53.4 Similarity matrix heat map
Fig. 53.5 Returns from different similarity measures
53.4.1 Eigen Portfolio
Eigen portfolios are created to ascertain which similarity measure is the best. The covariance matrix is closely related to the correlation matrix: the correlation matrix is the covariance matrix of the standardized returns, so for standardized data the two have the same eigenvalues and eigenvectors. This matrix has a few useful properties in the case of stock returns. It is symmetric and positive semidefinite; hence its eigenvalues are real and non-negative, and its eigenvectors can be chosen orthogonal to each other. The eigenvectors are the “eigen portfolios,” strategy weight allocations that are uncorrelated with the other eigen portfolios. The eigenvalues are the “risk” of the given eigen portfolio. Figures 53.6 and 53.7 show the cumulative explained variance and the weights derived from the market mode, respectively. The eigenvector associated with the highest eigenvalue is called the market mode. It captures the most important economic factors, such as inflation, interest rates, currency exchange rates, and other essential macro-economic factors, that affect economic activity on a wide scale.
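As an illustration of the market mode, the dominant eigenvector of a symmetric matrix can be approximated by power iteration. This is a minimal standard-library sketch, not the authors' procedure: the 3×3 matrix is hypothetical, the function names are illustrative, and normalizing the eigenvector so that the weights sum to one is an assumption about how stock weights are derived.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def dominant_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue estimate.
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def portfolio_weights(eigenvector):
    """Rescale an eigenvector so that its entries sum to one (an assumption)."""
    total = sum(eigenvector)
    return [x / total for x in eigenvector]


# Hypothetical symmetric similarity (or correlation) matrix for three stocks.
similarity = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
risk, market_mode = dominant_eigenpair(similarity)
weights = portfolio_weights(market_mode)
```

For a matrix with strictly positive entries, as here, the dominant eigenvector has entries of a single sign, so the resulting weights are all positive. In practice a library eigen-decomposition would normally be used instead of hand-written power iteration.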
Fig. 53.6 Cumulative explained variance
Fig. 53.7 Weights derived from market mode
Fig. 53.8 Stock weights derived from cosine similarity matrix
Fig. 53.9 Stock weights derived from Euclidean similarity matrix
Stock weights derived from the cosine similarity matrix are shown in Fig. 53.8, those derived from the Euclidean similarity matrix in Fig. 53.9, and those derived from the hybrid similarity matrix in Fig. 53.10.
Fig. 53.10 Stock weights derived from hybrid similarity matrix
53.4.2 Vector Embeddings and Model Output
The trained autoencoder model has a four-dimensional bottleneck, which means that the input is compressed into a four-dimensional vector, and we use an orthogonal regularizer to make sure that each dimension captures as much distinct information about the input as possible. Vector embeddings of Sunpharma through time are shown in Fig. 53.11. The reconstructed input is very smooth and captures the trend and seasonality in the data. A graphical representation of real versus reconstructed stock similarities is shown in Fig. 53.12.
Fig. 53.11 Vector embedding of Sunpharma through time
Fig. 53.12 Real versus reconstructed
53.5 Conclusion
The proposed denoising autoencoder deep learning model performs effectively when applied to compressing high-dimensional heteroscedastic time series data, such as stock market data. The trained autoencoder successfully compressed the input stock price time series to a four-dimensional latent space. Performance is evaluated using different similarity measures to construct the similarity matrix from the vector embeddings obtained from the trained autoencoder. Stock weights derived from the Euclidean similarity matrix outperform those derived from the other metrics. Future work to improve the model could use a probability distribution other than the Gaussian for the noise, experiment with different bottleneck sizes, and incorporate multi-head attention for a more robust model and better context modeling.
References
1. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
2. Zaidi, S.S.A., Ansari, M.S., Aslam, A., Kanwal, N., Asghar, M.N., Lee, B.: A survey of modern deep learning based object detection models. CoRR. https://arxiv.org/abs/2104.11892 (2021)
3. Torfi, A., Shirvani, R.A., Keneshloo, Y., Tavaf, N., Fox, E.A.: Natural language processing advancements by deep learning: a survey. CoRR. https://arxiv.org/abs/2003.01200 (2020)
4. Vaswani, A., et al.: Attention is all you need. CoRR. http://arxiv.org/abs/1706.03762 (2017)
5. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR. http://arxiv.org/abs/1810.04805 (2018)
6. Mehtab, S., Sen, J.: Stock price prediction using CNN and LSTM-based deep learning models. In: Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), pp. 447–453 (2020)
7. Ghosh, P., Neufeld, A., Sahoo, J.K.: Forecasting directional movements of stock prices for intraday trading using LSTM and random forests. CoRR. https://arxiv.org/abs/2004.10178 (2020)
8. Kamalov, F., Smail, L., Gurrib, I.: Stock price forecast with deep learning. In: Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), pp. 1098–1102 (2020)
9. Liao, S., Chen, J., Ni, H.: Forex Trading Volatility Prediction Using Neural Network Models. arXiv preprint arXiv:2112.01166 (2021)
10. Markowitz, H.: Portfolio selection. J. Fin. 7(1), 77–91 (1952). https://doi.org/10.2307/2975974
11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
12. Dolphin, R., Smyth, B., Dong, R.: Stock Embeddings: Learning Distributed Representations for Financial Assets. arXiv preprint arXiv:2202.08968 (2022)
13. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014). https://doi.org/10.3115/v1/D14-1162
14. Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR (2013)
15. Rahimikia, E., Zohren, S., Poon, S.H.: Realised Volatility Forecasting: Machine Learning Via Financial Word Embedding. arXiv preprint arXiv:2108.00480 (2021)
16. Guo, C., Berkhahn, F.: Entity embeddings of categorical variables. CoRR. http://arxiv.org/abs/1604.06737 (2016)
17. Bengio, Y., Yao, L., Alain, G., Vincent, P.: Generalized denoising auto-encoders as generative models. CoRR. http://arxiv.org/abs/1305.6663 (2013)
18. Ladjal, S., Newson, A., Pham, C.H.: A PCA-like autoencoder. CoRR. http://arxiv.org/abs/1904.01277 (2019)
19. Fournier, Q., Aloise, D.: Empirical comparison between autoencoders and traditional dimensionality reduction methods. CoRR. https://arxiv.org/abs/2103.04874 (2021)
20. Pinapatruni, R., Shoba Bindu, C.: Learning image representation from image reconstruction for a content-based medical image retrieval. SIViP 14, 1319–1326 (2020)
21. Pinapatruni, R., Chigarapalle, S.B.: Adversarial image reconstruction learning framework for medical image retrieval. SIViP 16, 1197–1204 (2022)
22. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Chapter 54
Investigating Vulnerabilities of Information Solicitation Process in RPL-Based IoT Networks
Rashmi Sahay and Cherukuri Gaurav Sushant
Abstract The Routing Over Low power and Lossy networks (ROLL) working group of the IETF proposed the IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) to fulfill the multiple traffic requirements of Internet of Things (IoT) environments. RPL structures ubiquitous IoT networks in the form of Directed Acyclic Graphs (DAGs) that comprise basic sensor nodes (motes), motes with routing capability, and one or more root (sink) nodes. RPL ensures optimal topological organization of IoT networks through its rank and version number properties. However, vulnerabilities in the RPL routing process expose the IoT environment to several protocol violations. In RPL, information solicitation is the process by which sensor nodes seek the network configuration information required to join an IoT network. This paper investigates the vulnerabilities associated with the information solicitation process in RPL-based IoT networks that lead to flooding of control messages. Consequently, the energy and queue buffers of sensor nodes in IoT networks are depleted. On the one hand, queue buffer depletion may lead to the dropping of packets by sensor nodes. On the other hand, energy depletion may render sensor nodes nonfunctional altogether. Therefore, IoT network administrators must pay attention to the devices' power consumption. Recent research focuses on energy depletion due to flooding of control packets. Nevertheless, flooding also affects other significant parameters, such as the network convergence time, beacon interval, and packet arrival rate. In this work, we perform an in-depth evaluation of the performance of RPL-based IoT networks under the threat of vulnerabilities associated with the information solicitation process. The results of our analysis will assist in devising mitigation mechanisms.
R. Sahay (B) Department of Computer Science and Engineering, Faculty of Science and Technology, IcfaiTech, The ICFAI Foundation For Higher Education, Hyderabad, India e-mail: [email protected] C. G. Sushant Department of Data Science and Artificial Intelligence, Faculty of Science and Technology, IcfaiTech, The ICFAI Foundation For Higher Education, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_54
54.1 Introduction
The phenomenal growth of Internet of Things-based applications in recent times has increased the number of security breaches. IoT environments comprise three significant components, namely the edge tier, the platform tier, and the enterprise tier. The edge tier constitutes a network of sensors, RFIDs, and actuators, also known as Low power and Lossy Networks (LLNs), that realizes IoT's vision of connecting all physical devices to the Internet. IoT-based applications like home automation, industrial automation, and smart cities entail large-scale deployment of IoT nodes that must handle Point to Point traffic (P2P), Multipoint to Point traffic (MP2P), and Point to Multipoint traffic (P2MP). To fulfill these traffic requirements, the IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) [1] was proposed by the IETF. Although RPL fulfills all traffic requirements of IoT-LLNs, its vulnerability to malicious intruders increases security risks [2]. Insider security threats in the RPL-based Internet of Things lead to depletion of node resources, sub-optimal routing paths, and increased packet loss [3]. Recently, many researchers have worked on analyzing the vulnerabilities of RPL against routing attacks. Nevertheless, there is a lack of research on risk analysis associated with RPL attacks and on the development of robust security mechanisms. In this work, we focus on analyzing the vulnerabilities of RPL against information solicitation attacks and the associated performance impact on the IoT network. Recent research focuses on flooding of information solicitation messages, commonly known as Hello flooding attacks [4]. However, an in-depth analysis of the parameters associated with information solicitation messages that may be exploited to degrade the performance of IoT-LLNs is lacking. In our work, we not only address this research gap but also analyze the flooding of UDP data packets and compare it with the flooding of information solicitation messages. The following section discusses RPL's information solicitation process and its vulnerabilities.
54.2 RPL and Information Solicitation
RPL structures motes in IoT-LLNs in the form of Directed Acyclic Graphs (DAGs). A DAG may consist of a single or multiple Destination Oriented Directed Acyclic Graphs (DODAGs). A DODAG consists of the following types of nodes:
– Basic sensor nodes: These nodes are the terminal nodes in a DODAG.
– Sink (root) node: A sink node collects data packets from the other sensor nodes in the DODAG. A sink node is also responsible for disseminating network configuration information by broadcasting DODAG Information Object (DIO) messages. A DIO carries configuration parameters such as rank, version number, mode of operation, and objective function, which the motes in the DODAG use to determine the most efficient traffic route to the sink node. The sink node broadcasts DIO messages periodically, at intervals governed by the Trickle algorithm [5].
– Sensor nodes with routing capabilities: In an IoT application that requires large-scale deployment of sensor nodes, the sink node cannot be in the communication range of all the other motes in the IoT-LLN. Therefore, sensor nodes with routing capabilities are required; these receive data packets from neighboring nodes and forward them to the sink directly or via another router node.
A new node may join a DODAG by multicasting DODAG Information Solicitation (DIS) messages to nodes in its communication range that are already part of the DODAG. When the neighboring nodes receive a DIS message, they suspend their data forwarding operations and respond to the DIS request with a DIO message. The soliciting node compares the rank values in the received DIO messages and selects the routing node with the least rank value as its ancestor node. The node then unicasts a Destination Advertisement Object (DAO) message to the ancestor. The ancestor forwards the DAO message to the sink. Rank depicts the proximity of a node to the sink node in a DODAG. Therefore, a new node must always choose a node with minimum rank as its parent node. Rank is computed based on the objective function and routing metric advertised in the DIO messages. In this paper, the Expected Transmission Count (ETX) is used as the routing metric. RPL can organize DODAGs and satisfy the three routing requirements, namely MP2P, P2MP, and P2P, using just the DIO and DAO messages. However, without DIS control messages, new nodes trying to join a DODAG have to wait for the next Trickle timer expiry to receive network configuration information through DIO messages. As a result, the joining process of the nodes in the DODAG may be delayed. To avoid this delay, RPL allows a new node to use DIS messages to solicit configuration information from neighboring nodes. Nevertheless, malicious nodes may exploit DIS messages to instigate flooding attacks. In the following subsection, we explain the threat posed by exploiting DIS messages and the associated parameters.
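The parent selection step described in this section, in which a joining node picks the neighbor advertising the least rank, can be sketched as follows. The `Dio` record, its fields, and the rank values are hypothetical simplifications for illustration; a real DIO carries more fields, and real implementations apply the objective function rather than a bare minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dio:
    """Simplified, hypothetical view of a received DIO message."""
    sender: str
    rank: int
    version: int


def select_parent(dios):
    """Return the sender advertising the lowest rank, or None if no DIO arrived."""
    if not dios:
        return None
    return min(dios, key=lambda dio: dio.rank).sender


# A joining node receives DIO replies from two neighbors; rank values are made up.
received = [
    Dio(sender="node2", rank=512, version=1),
    Dio(sender="node3", rank=768, version=1),
]
parent = select_parent(received)
```

Here the joining node would choose node2, the neighbor closer to the sink, and then unicast its DAO message to it.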
54.2.1 Threat Model of Information Solicitation Attacks
A node soliciting network configuration information multicasts a DIS message, and the receiving motes in its radio range respond with a unicast DIO message. For example, in Fig. 54.1a, Node 7 multicasts DIS messages to Node 2 and Node 3, which lie in its radio range (represented by the green circle). In response, Node 2 and Node 3 suspend their data forwarding operations to unicast a DIO message to Node 7, as shown in Fig. 54.1b. Popular implementations of RPL employ two important parameters associated with DIS messages, explained below.
Fig. 54.1 Network configuration solicitation
1. DIS_interval: RPL allows a node to multicast DIS messages at intervals governed by DIS_interval until it receives a DIO message from at least one of the neighboring nodes. Usually, the DIS_interval is set to 60 s. The expected time within which the soliciting node may receive a DIO response can be approximately estimated by the following equation:
$\mathit{ResponseTime} = \mathit{DIS\_Tran} + \mathit{DIS\_Pro} + \mathit{DIO\_Start\_Delay} + \mathit{DIO\_Tran}$ (54.1)
where
– $\mathit{DIS\_Tran}$ is the DIS transmission time, i.e., the time taken by the DIS message to reach the neighboring nodes.
– $\mathit{DIS\_Pro}$ is the time taken by the neighboring nodes to process the DIS message.
– $\mathit{DIO\_Start\_Delay}$ is the wait time before a node may initiate DIO transmission. $\mathit{DIO\_Start\_Delay}$ is usually 5 s.
– $\mathit{DIO\_Tran}$ is the DIO transmission time, i.e., the time taken by the DIO message to reach the soliciting node.
2. DIS_delay: the wait time before a node may initiate multicasting of DIS messages. DIS_delay is also usually 5 s.
A malicious node may exploit the above two parameters to degrade the performance of IoT-LLNs. On the one hand, a malicious node may intentionally reduce its DIS_interval to much less than the $\mathit{ResponseTime}$ in order to overwhelm its neighboring nodes with DIS messages. On the other hand, a malicious node may intentionally increase its DIS_delay to delay network convergence. With this motivation, we present our problem statement in the following section.
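To make Eq. (54.1) concrete, the sketch below evaluates the response time and counts how many DIS multicasts a node issues before the first DIO can arrive. Only the 5 s DIO start delay and the 60 s DIS_interval come from the text; the transmission and processing times (10 ms each) and the 1 s malicious interval are assumed values chosen for illustration, and the model ignores DIS_delay and assumes the first DIS is sent at time zero.

```python
import math


def response_time(dis_tran, dis_pro, dio_start_delay, dio_tran):
    """Eq. (54.1): expected time until the soliciting node receives a DIO."""
    return dis_tran + dis_pro + dio_start_delay + dio_tran


def dis_sent_before_response(dis_interval, resp_time):
    """Number of DIS multicasts issued at t = 0, dis_interval, 2 * dis_interval, ...

    Only transmissions strictly before resp_time are counted.
    """
    return math.ceil(resp_time / dis_interval)


# Assumed timings in seconds; only dio_start_delay = 5 s is taken from the text.
rt = response_time(dis_tran=0.01, dis_pro=0.01, dio_start_delay=5.0, dio_tran=0.01)

normal = dis_sent_before_response(dis_interval=60.0, resp_time=rt)    # default interval
malicious = dis_sent_before_response(dis_interval=1.0, resp_time=rt)  # assumed attack value
```

With the default interval, a single DIS is outstanding when the DIO arrives, whereas an interval below the response time makes the node send several DIS messages per solicitation, each of which forces its neighbors to respond.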
54.3 Problem Statement
The goal of our work is to analyze the vulnerabilities associated with the process of information solicitation through DIS messages in RPL-based IoT networks. Based on our in-depth study of the information solicitation process in RPL, we summarize the following observations:
1. Reducing the DIS_interval may lead to flooding of DIS messages. As a result, nodes will be busy processing DIS messages rather than sending and forwarding data packets to sink nodes.
2. Increasing the DIS_delay time slows down the process of network organization.
Table 54.1 Parameters considered for performance evaluation
S. No | Parameter | Definition
1 | Packet arrival rate (PA) | Total number of data packets received by the root node over simulation time
2 | Transmit power (TP) | Power consumed by a node in transmitting data packets and control messages
3 | CPU power (CP) | Power consumed by a node in processing data packets and control messages
4 | Listen power (LP) | Power consumed by a node in listening to probe packets from neighboring nodes
5 | Low mode power (LMP) | Power consumed by a node in detecting channel activity by periodically turning on its radio
6 | Routing metric (RM) | The measure of selecting an optimal path for routing data packets
7 | Network overhead (NOH) | Total number of control messages exchanged by the nodes over simulation time
8 | Network convergence time (NCT) | The time taken by a group of routers in a network to reach the state of convergence
9 | Churn (CH) | Number of parent-child link changes over simulation time
10 | Beacon interval (BI) | Time lag between subsequent beacons sent by a node. Nodes transmit beacon frames to announce their presence
This paper aims to achieve the following objectives:
1. Experimental analysis of decreasing the DIS_interval and its impact on the efficiency of IoT-LLNs.
2. Experimental analysis of increasing the DIS_delay and its impact on IoT-LLNs.
3. Analysis of the impact on IoT-LLNs of increasing the number of DIS packets multicast.
4. Comparison of the impact of the above three cases with that of sending multiple duplicate UDP packets.
To perform the impact analysis, we consider the ten parameters explained in Table 54.1. In the following section, we explain the experimental setup used in this work.
54.4 Experimental Setup
To analyze the vulnerabilities of the information solicitation process in RPL and its impact on the performance of IoT-LLNs, we use the Cooja simulator. Contiki OS is a popular operating system for IoT-LLNs, and the Cooja simulator, available in Contiki OS, facilitates the emulation of IoT-LLNs. In this work, we also use the collect view and radio messages tools available in Contiki OS. The collect view tool provides payload details of the packets received by the root node. The radio messages tool is used to analyze the exchange of data and control messages among the nodes in the IoT-LLN. The details of the experimental setup are given in Table 54.2.
54.5 Results and Discussion
To examine the vulnerabilities of the information solicitation process, we analyzed the following five simulation scenarios:
1. Case 1 (Normal Scenario): All the nodes are fair nodes. None of the nodes alters any of the configuration parameters.
2. Case 2 (Increased no. of DIS Messages): The malicious node multicasts five DIS messages in succession, instead of one, to the nodes in its radio range.
3. Case 3 (Decreasing the DIS_delay): The malicious node decreases the DIS_delay by 5 s.
4. Case 4 (Reducing DIS interval): The malicious node reduces the DIS interval by 60 s.
5. Case 5 (Increasing UDP Send Packets): The malicious node forwards five UDP packets in succession, instead of one, to the sink node via parent nodes. We simulate Case 5 in order to compare the impact of flooding of control messages, as a result of attacks exploiting the information solicitation process, with the impact of flooding of UDP packets.
The sensor map of the simulation setup is depicted in Fig. 54.2. In the following subsections, we compare the five simulation scenarios in terms of the parameters mentioned in Table 54.1.
Table 54.2 Details of the experimental setup
Operating system | Contiki OS
Simulator | Cooja simulator
Radio medium | Unit disk graph model
Routing protocol | RPL
Routing metric | ETX
Mote type | Sky mote
Radio range | 50 m
No. of fair nodes | 14
No. of malicious nodes | 1
Fig. 54.2 Simulation setup
54.5.1 CPU Power and Transmit Power Consumed
It may be observed from Fig. 54.3 that the average CPU power and transmit power increase drastically, compared to the normal scenario, when the malicious node multicasts DIS packets successively. Figure 54.3 also shows that successive forwarding of UDP packets, i.e., Case 5, does not increase the power consumption compared to the normal scenario. Therefore, IoT-LLNs tolerate the flooding of UDP packets better than the flooding of DIS messages.
54.5.2 Listen and LMP Power Consumed
It may be observed from Fig. 54.4a that the average listen power consumed increases by approximately 1000% in Case 2 compared to the normal scenario. Cases 3–5 do not affect the listen power consumed, as depicted in Fig. 54.4b. In contrast, an increase in the number of DIS or UDP packets multicast or forwarded reduces the sleep time of nodes, and the average LMP power consumed decreases by approximately 7–8% in both Cases 2 and 5.
Fig. 54.3 Analysis of average CPU and transmit power consumed
Fig. 54.4 Analysis of average listen and LMP power consumed
54.5.3 Routing Metric and Beacon Interval
From Fig. 54.5a, it may be observed that the average beacon interval for Case 2 is lower than in the other scenarios. As the malicious node multicasts successive DIS messages, neighboring nodes are forced to reset their Trickle timers in order to respond with DIO messages. Consequently, the average beacon interval reduces. There is a slight drop in the average beacon interval in Case 3 as well, when the DIS interval is reduced. However, as observed from Fig. 54.5b, the average routing metric is unaffected in all the attack cases. Therefore, it may be deduced that neither the vulnerabilities of the information solicitation process nor the flooding of UDP packets affect the topology of IoT-LLNs.
Fig. 54.5 Average beacon interval and routing metric
54.5.4 Network Overhead
The network overhead is computed as the number of DIS, DIO, and DAO messages exchanged among the nodes in the IoT-LLN. It may be observed from Fig. 54.6 that the network overhead increases drastically with an increase in DIS messages (Case 2). However, reducing the DIS_interval (Case 4) or the DIS_delay (Case 3) does not affect network overhead. Also, although network overhead increases slightly with an increased number of UDP packets (Case 5), it remains much lower than in Case 2.
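The overhead computation defined above amounts to counting control messages. The following minimal sketch assumes a hypothetical log in which each observed message has already been reduced to a type label; it does not reproduce the output format of the Cooja radio messages tool.

```python
from collections import Counter

# RPL control message types counted as overhead, per the definition in the text.
CONTROL_TYPES = {"DIS", "DIO", "DAO"}


def network_overhead(message_types):
    """Return the total number of control messages and a per-type breakdown."""
    counts = Counter(t for t in message_types if t in CONTROL_TYPES)
    return sum(counts.values()), dict(counts)


# Hypothetical message log; UDP data packets are not counted as overhead.
log = ["DIS", "DIO", "UDP", "DAO", "DIS", "UDP", "DIO"]
total, per_type = network_overhead(log)
```

Keeping the per-type breakdown makes it possible to see whether an increase in overhead comes from the DIS messages themselves or from the DIO replies they trigger.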
54.5.5 Network Convergence Time and Packet Arrival
As observed from Fig. 54.7a, the network convergence time increases in Case 2 compared to the normal scenario. In contrast, the network convergence time reduces in the other scenarios. Figure 54.7b depicts the percentage of packets received by the sink node from the immediate victim nodes (nodes in the radio range of the attacker node). In Case 2, the sink node receives only 14% of UDP packets from the victim nodes, compared to 28% in Case 1. In Case 5, by contrast, the sink node receives 57% of UDP packets from the victim nodes, compared to 28% in Case 1. For large-scale networks, these consequences are highly unfavorable.
Fig. 54.6 Network overhead
Fig. 54.7 Analyzing network convergence time and packet arrival
54.6 Comparison with Related Work
Rajasekar et al. [6] studied the impact of the DIS flooding attack on packet delivery ratio and end-to-end delay in RPL-based networks. The authors in [7] analyzed the impact of DIS attacks on node joining time and the power consumed by the nodes. Bokka et al. [8] analyzed the network overhead and throughput loss incurred due to DIS attacks. Medjek et al. [9] analyzed the impact of DIS attacks on network overhead as well as power consumed. Based on our review of the existing literature, we found that most existing work on the analysis of information solicitation attacks considers only a subset of the parameters listed in Table 54.1. As information solicitation attacks may lead to unavailability of nodes [10], it is essential to perform an in-depth analysis considering all possible parameters. In our work, we have considered ten different parameters for analysis, which provides a detailed understanding of the attack's impact.
54.7 Conclusion and Future Scope of Work
Our research presents an in-depth analysis of the vulnerabilities associated with RPL's information solicitation process. We analyzed the parameters of RPL's DIS multicasting and the impact of these vulnerabilities on the performance of RPL-based IoT-LLNs. Based on the results of our experimental analysis, we conclude that RPL's information solicitation attacks deplete network resources, delay network convergence, reduce throughput, and increase network overhead. Based on the results of the present work, we will design a robust mitigation mechanism in the future.
References
1. Winter, T., Thubert, P., Brandt, A., Hui, J., Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, J.P., Alexander, R.: RPL: IPv6 Routing Protocol for Low-power and Lossy Networks (2012)
2. Raoof, A., Matrawy, A., Lung, C.H.: Routing attacks and mitigation methods for RPL-based Internet of Things. IEEE Commun. Surv. Tutorials 21(2), 1582–1606 (2018)
3. Sahay, R., Geethakumari, G., Modugu, K.: Attack graph-based vulnerability assessment of rank property in RPL-6LOWPAN in IoT. In: 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), pp. 308–313. IEEE (2018)
4. Hkiri, A., Karmani, M., Machhout, M.: The routing protocol for low power and lossy networks (RPL) under attack simulation and analysis. In: 2022 5th International Conference on Advanced Systems and Emergent Technologies, pp. 143–148. IEEE (2022)
5. Aghaei, A., Torkestani, J.A., Kermajani, H., Karimi, A.: LA-Trickle: a novel algorithm to reduce the convergence time of the wireless sensor networks. Comput. Netw. 4(196), 108241 (2021)
6. Rajasekar, V.R., Rajkumar, S.: A study on impact of DIS flooding attack on RPL-based 6LowPAN network. Microprocess. Microsyst. 1(94), 104675 (2022)
7. Kalita, A., Brighente, A., Khatua, M., Conti, M.: Effect of DIS attack on 6TiSCH network formation. IEEE Commun. Lett. 26(5), 1190–1193 (2022)
8. Bokka, R., Sadasivam, T.: DIS flooding attack impact on the performance of RPL based Internet of Things networks: analysis. In: 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 1017–1022. IEEE (2021)
9. Medjek, F., Tandjaoui, D., Djedjig, N., Romdhani, I.: Multicast DIS attack mitigation in RPL-based IoT-LLNs. J. Inf. Secur. Appl. 1(61), 102939 (2021)
10. Mangelkar, S., Dhage, S.N., Nimkar, A.V.: A comparative study on RPL attacks and security solutions. In: 2017 International Conference on Intelligent Computing and Control (I2C2), pp. 1–6. IEEE (2017)
Chapter 55
Computer-Based Numerical Analysis of Bioconvective Heat and Mass Transfer Across a Nonlinear Stretching Sheet with Hybrid Nanofluids

Madhu Aneja, Manoj Gaur, Tania Bose, Pradosh Kumar Gantayat, and Renu Bala

M. Aneja · T. Bose · R. Bala (corresponding author)
Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, India
M. Gaur
APEX Institute of Technology (CSE), Chandigarh University, Gharuan, Mohali, India
P. K. Gantayat
Department of Computer Science and Engineering, ICFAI Foundation for Higher Education, Hyderabad, 501203, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Evolution in Computational Intelligence, Smart Innovation, Systems and Technologies 370, https://doi.org/10.1007/978-981-99-6702-5_55

Abstract The goal of the current work is to study bioconvection heat and mass transfer over a nonlinear stretching sheet with the hybrid nanofluid Cu-Al$_2$O$_3$ and Joule dissipation. By applying the appropriate transformations, the governing equations are converted into ordinary differential equations, which are then solved numerically with the MATLAB-based method "BVP4C". The effects of auxiliary variables on velocity, temperature $(T)$, concentration $(C)$, and gyrotactic microorganisms ($\chi$) are depicted in graphs together with those of Prandtl number, bioconvection Rayleigh number, couple stress parameter, Schmidt number, and bioconvection Schmidt number.

Nomenclature
$c_p$: Specific heat (J kg$^{-1}$ K$^{-1}$)
$C$: Concentration of hybrid nanoparticles
$C_f$: Skin-friction coefficient
$D_B$: Brownian diffusion coefficient (m$^2$ s$^{-1}$)
$D_n$: Diffusivity of microorganisms (m$^2$ s$^{-1}$)
$Ec$: Eckert number
$Gr$: Grashof number
$k_{hnf}$: Thermal conductivity of hybrid nanofluid
$N$: Concentration of the microorganisms at the surface
$N_n$: Microorganism's density number
$Nr$: Buoyancy ratio parameter
$Nu$: Nusselt number
$Pe$: Peclet number
$Pr$: Prandtl number
$Rb$: Bioconvection Rayleigh number
$Sb$: Bioconvection Schmidt number
$Sc$: Schmidt number
$Sh$: Sherwood number
$T$: Temperature (K)
$u, v$: Interstitial velocity components (m/s)
$W_c$: Maximum cell swimming speed (m/s)

Greek Symbols
$\gamma$: Dimensionless temperature difference
$\rho_f$: Density (kg m$^{-3}$)
$\mu_{hnf}$: Viscosity of hybrid nanofluid
$\lambda$: Mixed convection parameter
$\eta$: Similarity variable
$\sigma$: Microorganism's concentration difference parameter
$\phi$: Volume fraction of nanoparticles
$\chi$: Dimensionless microorganism's density number

Subscripts
$1$: Alumina nanoparticles
$2$: Copper nanoparticles
$eff$: Effective
$hnf$: Hybrid nanofluid
$w$: Wall
$\infty$: Free stream
55.1 Introduction
Recent advances in nanotechnology and the downsizing of electronic equipment have made cooling a major factor in productivity and long-term device reliability. A high heat transfer rate is a primary requirement of large corporations and other technical industries, so scientists and engineers continue to develop new techniques to address these shortcomings in electronics. Classical heat transfer fluids such as oil, water, and air cannot meet industrial requirements because of their low thermal conductivity. The idea of adding tiny solid particles of micrometer or millimeter size to base fluids was introduced in the past few decades to improve their thermal conductivity. In practice, these large particles were found to obstruct the flow of the fluid. To address this problem, Choi and Eastman [1] introduced the concept of employing nanosized particles instead of large particles in the host fluid to boost the heat transfer rate. Nanofluids are fluids that contain nanosized particles with high thermal conductivity. The nanoparticles most commonly examined with base fluids are metals (Cu, Ag, Au), metal oxides (Al$_2$O$_3$ and CuO), semiconductors (TiO$_2$ and SiO$_2$), non-metals (graphite and carbon nanotubes), carbides (SiC), and nitrides (SiN). Further, low cost, long-term stability, and high flexibility are the three most important prerequisites for using nanofluids in heat transfer applications. Although nanofluids have a high thermal efficiency, researchers are eager to further enhance their thermophysical characteristics because employing conventional nanofluids as heat transfer fluids has several disadvantages. To create nanofluids with high thermal conductivity, high heat transfer rate, low concentration, low cost, high stability, and flexibility, it is advantageous to combine the attributes of different nanoparticles. This motivated researchers to create a novel nanofluid consisting of two distinct types of nanoparticles contained in a specialized carrier liquid. This subclass of nanofluids is referred to as hybrid nanofluids. Hybrid nanofluids have numerous applications in nanotechnology, such as bionanofluids for the treatment of cancer, power generation, coolants for various machinery (computers or CPUs, microelectronics), aeronautics, refrigeration, cooling of nuclear reactors, chemical processes, and many more. Therefore, in recent years, researchers have become increasingly interested in hybrid nanofluids, including their modeling, composition, experimental verification, and applications. The studies in [2–6] theoretically examine the flow and heat transfer properties of various hybrid nanofluids under various conditions.
Additionally, microorganisms are now added to nanoparticle suspensions to provide stability and micro-scale mixing in the base fluid. Kuznetsov first introduced the idea of nanofluid bioconvection [7]. Bioconvection is a natural phenomenon caused by the random motion of microorganisms in single-cell or colony-like forms. It is regarded as a large-scale phenomenon that results from the microscopic motion (convection) generated by microorganisms in a liquid. Microorganisms are significant in industrial microbiology because bacteria produce biosurfactants and other surface-active chemicals. Biosurfactants are used in enhanced oil recovery and other forms of environmental bioremediation. Industrial applications of the combined effects of nanofluids and microbes include micro-fluidic devices such as micro-channels and micro-reactors, the treatment of cancer, biomedicine (drug delivery), and others. Owing to these applications, numerous authors have contributed to research on the above-mentioned phenomenon [8–13].
Based on the aforementioned literature review, the behavior of a bioconvective hybrid nanofluid (Al$_2$O$_3$ and Cu in water) flowing over a nonlinear stretching surface under dissipative effects has not yet been considered. This investigation focuses on bioconvection heat and mass transport over a nonlinear stretching sheet with hybrid nanofluid and Joule dissipation. Solutions are obtained with the BVP4C solver, which numerically solves boundary value problems for highly nonlinear systems of ordinary differential equations. The numerical results illustrate the effect of the relevant parameters on the flow profiles and the effect of the hybrid nanofluid on physical quantities.
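55.2 Problem Formulation
A two-dimensional steady-state bioconvective hybrid nanofluid flow over a nonlinear stretching sheet is studied. The x-axis is taken along the stretching surface, and the y-axis is normal to it (see Fig. 55.1). The volumetric concentration $\phi_{hnf} \le 4\%$ is considered to avoid bioconvection instability. The stretching velocity is $u = Dx^b$. The symbols $T_w$, $T_\infty$, $C_w$, $C_\infty$, $N_w$, and $N_\infty$ stand for the wall and ambient values of temperature, nanoparticle concentration, and microbial concentration, respectively. After applying the boundary layer assumption, the governing equations are written as Eqs. (55.1)–(55.5).

Fig. 55.1 Schematic diagram along with co-ordinates system

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0, \tag{55.1}$$
$$u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} = \frac{\mu_{hnf}}{\rho_{hnf}}\frac{\partial^2 u}{\partial y^2} + g\beta_{hnf}(T - T_\infty) + g\beta_c(C - C_\infty) - \left(\frac{\mu_{hnf}}{\rho_{hnf}}\right)\frac{u}{K}, \tag{55.2}$$
$$u\frac{\partial T}{\partial x} + v\frac{\partial T}{\partial y} = \frac{k_{hnf}}{(\rho c_p)_{hnf}}\frac{\partial^2 T}{\partial y^2} + \frac{\mu_{hnf}}{(\rho c_p)_{hnf}}\,[\ldots], \tag{55.3}$$
$$u\frac{\partial C}{\partial x} + v\frac{\partial C}{\partial y} = D_B\frac{\partial^2 C}{\partial y^2}, \tag{55.4}$$
$$u\frac{\partial N}{\partial x} + v\frac{\partial N}{\partial y} + \frac{bW_c}{(C_w - C_\infty)}\frac{\partial}{\partial y}\left(N\frac{\partial C}{\partial y}\right) = D_n\frac{\partial}{\partial y}\left(\frac{\partial N}{\partial y}\right), \tag{55.5}$$

along with the boundary conditions in Eq. (55.6).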
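Table 55.1 Thermophysical properties and adapted framework of hybrid nanofluid, Type II by Takabi and Salehi [14]
Property | Hybrid nanofluid
$\mu$ | $\dfrac{\mu_{hnf}}{\mu_f} = \dfrac{1}{(1-\phi_{hnf})^{2.5}}$
$\rho$ | $\rho_{hnf} = \rho_f - \rho_f\phi_{hnf} + \phi_1\rho_{s1} + \phi_2\rho_{s2}$
$\rho c_p$ | $(\rho c_p)_{hnf} = (\rho c_p)_f - \phi_{hnf}(\rho c_p)_f + \phi_1(\rho c_p)_{s1} + \phi_2(\rho c_p)_{s2}$
$k$ | $\dfrac{k_{hnf}}{k_f} = \dfrac{\dfrac{\phi_1 k_1 + \phi_2 k_2}{\phi_{hnf}} + 2k_f + 2(\phi_1 k_1 + \phi_2 k_2) - 2\phi_{hnf}k_f}{\dfrac{\phi_1 k_1 + \phi_2 k_2}{\phi_{hnf}} + 2k_f - (\phi_1 k_1 + \phi_2 k_2) + \phi_{hnf}k_f}$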
Table 55.2 Thermophysical properties for water (H$_2$O) and nanoparticles [15]
Thermophysical properties | Pure water | Alumina | Copper
$C_p$ (J kg$^{-1}$ K$^{-1}$) | 4179 | 765 | 385
$\rho$ (kg m$^{-3}$) | 997.1 | 3970 | 8933
$k$ (W m$^{-1}$ K$^{-1}$) | 0.6130 | 40 | 400
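$$u(x) = Dx^b,\quad v(x) = 0,\quad T(x) = T_w,\quad C(x) = C_w,\quad N(x) = N_w \quad \text{at } y = 0,$$
$$u \to 0,\quad v \to 0,\quad T \to T_\infty,\quad C \to C_\infty,\quad N \to N_\infty \quad \text{as } y \to \infty. \tag{55.6}$$

The thermophysical properties are listed in Tables 55.1 and 55.2.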
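As an illustration of how the relations in Table 55.1 combine with the data in Table 55.2, the following Python sketch evaluates the hybrid nanofluid properties for given volume fractions. It assumes $\phi_{hnf} = \phi_1 + \phi_2$, which is not stated explicitly above, and takes subscript 1 as alumina and subscript 2 as copper, following the nomenclature. The names `Material` and `hybrid_properties` are hypothetical and serve only this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    cp: float   # specific heat, J kg^-1 K^-1
    rho: float  # density, kg m^-3
    k: float    # thermal conductivity, W m^-1 K^-1


# Values taken from Table 55.2.
WATER = Material(cp=4179.0, rho=997.1, k=0.6130)
ALUMINA = Material(cp=765.0, rho=3970.0, k=40.0)   # subscript 1
COPPER = Material(cp=385.0, rho=8933.0, k=400.0)   # subscript 2


def hybrid_properties(phi1, phi2, fluid=WATER, s1=ALUMINA, s2=COPPER):
    """Evaluate the Table 55.1 relations for volume fractions phi1 and phi2.

    Assumption (not stated in the text): phi_hnf = phi1 + phi2, with phi_hnf > 0.
    """
    phi = phi1 + phi2
    if phi <= 0.0:
        raise ValueError("phi1 + phi2 must be positive")

    # Viscosity ratio mu_hnf / mu_f.
    mu_ratio = 1.0 / (1.0 - phi) ** 2.5

    # Density and volumetric heat capacity of the mixture.
    rho_hnf = fluid.rho - fluid.rho * phi + phi1 * s1.rho + phi2 * s2.rho
    rho_cp_f = fluid.rho * fluid.cp
    rho_cp_hnf = (rho_cp_f - phi * rho_cp_f
                  + phi1 * s1.rho * s1.cp + phi2 * s2.rho * s2.cp)

    # Conductivity ratio k_hnf / k_f.
    ks = phi1 * s1.k + phi2 * s2.k
    num = ks / phi + 2.0 * fluid.k + 2.0 * ks - 2.0 * phi * fluid.k
    den = ks / phi + 2.0 * fluid.k - ks + phi * fluid.k

    return {
        "mu_ratio": mu_ratio,
        "rho_hnf": rho_hnf,
        "rho_cp_hnf": rho_cp_hnf,
        "k_ratio": num / den,
    }
```

For example, `hybrid_properties(0.01, 0.01)` gives a density of about 1106.2 kg m$^{-3}$ and a conductivity ratio $k_{hnf}/k_f$ of about 1.06, which stays within the $\phi_{hnf} \le 4\%$ range considered in this chapter.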
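55.3 Similarity Transformations
The following similarity variables are used:

$$\eta = y\sqrt{\frac{D(b+1)x^{b-1}}{2\nu}},\quad u = Dx^b f'(\eta),\quad v = -\sqrt{\frac{D(b+1)\nu}{2}}\,x^{\frac{b-1}{2}}\left(f + \frac{b-1}{b+1}\,\eta f'\right),$$
$$\theta(\eta) = \frac{T - T_\infty}{T_w - T_\infty},\quad \phi(\eta) = \frac{C - C_\infty}{C_w - C_\infty},\quad \chi(\eta) = \frac{N - N_\infty}{N_w - N_\infty}. \tag{55.7}$$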
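The claim that the similarity variables in Eq. (55.7) satisfy the continuity equation (55.1) can be checked numerically. The sketch below is a minimal illustration: it builds $u$ and $v$ from a trial profile and evaluates $\partial u/\partial x + \partial v/\partial y$ by central differences. The trial profile $f(\eta) = 1 - e^{-\eta}$ is an assumption chosen only because it is smooth and has $f(0) = 0$, $f'(0) = 1$; it is not a solution of the problem, and the function names are hypothetical.

```python
import math


def eta(x, y, D, b, nu):
    """Similarity variable of Eq. (55.7)."""
    return y * math.sqrt(D * (b + 1) * x ** (b - 1) / (2.0 * nu))


def velocity(x, y, D, b, nu, f, fp):
    """Velocity components (u, v) of Eq. (55.7) for a profile f with derivative fp."""
    e = eta(x, y, D, b, nu)
    u = D * x ** b * fp(e)
    v = -math.sqrt(D * (b + 1) * nu / 2.0) * x ** ((b - 1) / 2.0) * (
        f(e) + (b - 1) / (b + 1) * e * fp(e)
    )
    return u, v


def continuity_residual(x, y, D, b, nu, f, fp, h=1e-5):
    """Central-difference estimate of du/dx + dv/dy at the point (x, y)."""
    du_dx = (velocity(x + h, y, D, b, nu, f, fp)[0]
             - velocity(x - h, y, D, b, nu, f, fp)[0]) / (2.0 * h)
    dv_dy = (velocity(x, y + h, D, b, nu, f, fp)[1]
             - velocity(x, y - h, D, b, nu, f, fp)[1]) / (2.0 * h)
    return du_dx + dv_dy


# Trial profile used only for the check (an assumption, not the computed solution).
def trial_f(e):
    return 1.0 - math.exp(-e)


def trial_fp(e):
    return math.exp(-e)
```

Because continuity holds for any differentiable $f$, the residual returned by `continuity_residual` should be zero up to the finite-difference error, whichever trial profile is supplied.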
With these transformations, Eq. (55.1) is satisfied identically and Eqs. (55.2)–(55.5) reduce to the following form:

$$\frac{\mu_{hnf}/\mu_f}{\rho_{hnf}/\rho_f}\,f''' + f f'' - \frac{2f'^2}{b+1} - \frac{4f'}{(b+1)K_1} - \lambda\left(\theta - Nr\,\phi - Rb\,\chi\right) = 0, \tag{55.8}$$
$$\frac{1}{Pr}\,\frac{k_{hnf}/k_f}{(\rho c_p)_{hnf}/(\rho c_p)_f}\,\theta'' + f\theta' - f'\theta + \frac{\rho_f}{\rho_{hnf}}\,Pr\,Ec\,f'^2 = 0, \tag{55.9}$$
$$\phi'' + Sc\,f\phi' = 0, \tag{55.10}$$
$$\chi'' + Sb\,f\chi' - Pe\left[(\chi + \sigma)\phi'' + \phi'\chi'\right] = 0. \tag{55.11}$$
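With the boundary conditions:

$$f'(0) = 1,\quad f(0) = 0,\quad \theta(0) = 1,\quad \phi(0) = 1,\quad \chi(0) = 1,$$
$$f'(\eta) \to 0,\quad \theta(\eta) \to 0,\quad \phi(\eta) \to 0,\quad \chi(\eta) \to 0 \quad \text{as } \eta \to \infty, \tag{55.12}$$

where the flow is governed by the following set of physical parameters:

$$Pr = \frac{\mu_f c_f}{k_f},\quad Sc = \frac{\nu_f}{D_B},\quad Sb = \frac{\nu_f}{D_n},\quad Pe = \frac{bW_c}{D_n},\quad \sigma = \frac{N_\infty}{N_w - N_\infty},\quad Nr = \frac{\beta_c(C_w - C_\infty)}{\beta_f(T_w - T_\infty)},$$
$$\lambda = \frac{Gr}{Re^2},\quad Gr = \frac{g\beta_f(T_w - T_\infty)x^3}{\nu_f^2},\quad Re = \frac{ux}{\nu_f},\quad Ec = \frac{u^2}{c_f(T_w - T_\infty)},\quad K_1 = \frac{K}{\nu_f}.$$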
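A collocation solver such as MATLAB's bvp4c works on a first-order system supplied through two callbacks, one returning the derivatives and one returning the boundary residuals. The Python sketch below shows one way Eqs. (55.8)–(55.12) could be cast in that form, with the state vector $y = (f, f', f'', \theta, \theta', \phi, \phi', \chi, \chi')$. It is an illustration under stated assumptions rather than the authors' implementation: the placement of the property ratios `A1`, `A2`, `A3` and the sign of the $\lambda$ term follow one reading of the printed equations, the far-field conditions are imposed at a finite end point, and the names `Params`, `rhs`, and `bc_residual` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    b: float      # stretching exponent
    K1: float     # porosity factor
    lam: float    # mixed convection parameter (lambda)
    Nr: float     # buoyancy ratio parameter
    Rb: float     # bioconvection Rayleigh number
    Pr: float     # Prandtl number
    Ec: float     # Eckert number
    Sc: float     # Schmidt number
    Sb: float     # bioconvection Schmidt number
    Pe: float     # Peclet number
    sigma: float  # microorganism concentration difference parameter
    A1: float = 1.0  # assumed: (mu_hnf/mu_f) / (rho_hnf/rho_f)
    A2: float = 1.0  # assumed: (k_hnf/k_f) / ((rho c_p)_hnf/(rho c_p)_f)
    A3: float = 1.0  # assumed: rho_f / rho_hnf on the dissipation term


def rhs(eta, y, p):
    """Derivatives of y = [f, f', f'', theta, theta', phi, phi', chi, chi']."""
    f, fp, fpp, th, thp, ph, php, ch, chp = y
    # Eq. (55.8) solved for f'''.
    fppp = (-f * fpp + 2.0 * fp ** 2 / (p.b + 1.0)
            + 4.0 * fp / ((p.b + 1.0) * p.K1)
            + p.lam * (th - p.Nr * ph - p.Rb * ch)) / p.A1
    # Eq. (55.9) solved for theta''.
    thpp = -(p.Pr / p.A2) * (f * thp - fp * th + p.A3 * p.Pr * p.Ec * fp ** 2)
    # Eq. (55.10) solved for phi''.
    phpp = -p.Sc * f * php
    # Eq. (55.11) solved for chi''.
    chpp = -p.Sb * f * chp + p.Pe * ((ch + p.sigma) * phpp + php * chp)
    return [fp, fpp, fppp, thp, thpp, php, phpp, chp, chpp]


def bc_residual(ya, yb):
    """Residuals of Eq. (55.12); ya is the state at eta = 0, yb at the far end."""
    return [
        ya[0],        # f(0) = 0
        ya[1] - 1.0,  # f'(0) = 1
        ya[3] - 1.0,  # theta(0) = 1
        ya[5] - 1.0,  # phi(0) = 1
        ya[7] - 1.0,  # chi(0) = 1
        yb[1],        # f' -> 0
        yb[3],        # theta -> 0
        yb[5],        # phi -> 0
        yb[7],        # chi -> 0
    ]
```

The ninth-order system has nine boundary residuals, five at the wall and four in the far field, so the problem is formally closed. In practice the far-field point would be chosen large enough that the profiles no longer change when it is increased.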
55.4 Results and Discussion
The mathematical equations that govern the physical model are nonlinear coupled partial differential equations. Together with the boundary conditions, they are transformed into ordinary differential equations and then solved numerically with the BVP4C method in MATLAB. Numerical results are computed by varying parameters such as the mixed convection parameter $\lambda$, porosity factor $K_1$, buoyancy ratio parameter $Nr$, bioconvection Rayleigh number $Rb$, Schmidt number $Sc$, and bioconvection Schmidt number $Sb$. The results are presented graphically (Figs. 55.2, 55.3, 55.4 and 55.5) in terms of the velocity, temperature, nanoparticle concentration, and microorganism density profiles.
Fig. 55.2 Impact of varied physical parameters on velocity profile
Figure 55.2a shows the effect of mixed convection on velocity: the velocity profile rises as $\lambda$ increases. Figure 55.2b shows that the system's resistance decreases with an increase in the porosity factor $K_1$. Figures 55.2c and 55.2d illustrate the change in the velocity profile with respect to $Nr$ and $Rb$. Velocity increases for rising values of $Nr$ and shows the opposite behavior for increasing values of $Rb$.
Fig. 55.3 Impact of varied physical parameters on temperature profile
Fig. 55.4 Impact of physical parameter $Sc$ on concentration profile
Figure 55.3a shows the change in temperature with respect to $Pr$: rising values of $Pr$ lead to a decrease in temperature. Figure 55.3b shows that temperature increases with $Ec$. Figure 55.4 shows that concentration decreases with an increase in the Schmidt number $(Sc)$. Figure 55.5a illustrates the decreasing behavior with $Pe$, which causes a fast reduction in the thickness of the motile microorganism layer. Figure 55.5b shows the increase in microorganism distribution when the bioconvection Schmidt number increases. Figure 55.5c shows that rising values of $\sigma$ reduce the density of motile microorganisms.
Fig. 55.5 Impact of varied physical parameters on density of microorganisms profile
55.5 Concluding Remarks
In the present problem, numerical results have been obtained with a computer-based numerical technique for the analysis of heat and mass transfer over a nonlinear stretching sheet with hybrid nanofluids. The results show the effect of varying crucial parameters on the velocity profile, temperature profile, concentration of nanoparticles, and density of microorganisms. Velocity increases with the mixed convection parameter, whereas temperature decreases as the Prandtl number increases.
References
1. Choi, S.U., Eastman, J.A.: Enhancing Thermal Conductivity of Fluids with Nanoparticles. Argonne National Lab (ANL), Argonne, IL (United States) (1995)
2. Suresh, S., Venkitaraj, K.P., Selvakumar, P., Chandrasekar, M.: Synthesis of Al$_2$O$_3$–Cu/water hybrid nanofluids using two step method and its thermo physical properties. Colloids Surfaces A: Physicochemical Eng. Aspects 388(1–3), 41–48 (2011)
3. Jana, S., Salehi-Khojin, A., Zhong, W.H.: Enhancement of fluid thermal conductivity by the addition of single and hybrid nano-additives. Thermochimica Acta 462(1–2), 45–55 (2007)
4. Madhesh, D., Kalaiselvam, S.: Experimental analysis of hybrid nanofluid as a coolant. Proc. Eng. 97, 1667–1675 (2014)
5. Tlili, I., Nabwey, H.A., Ashwinkumar, G.P., Sandeep, N.: 3-D magnetohydrodynamic AA7072-AA7075/methanol hybrid nanofluid flow above an uneven thickness surface with slip effect. Sci. Rep. 10(1), 1–13 (2020)
6. Aneja, M., Sharma, S.: Analysis of heat transfer characteristics of a Al$_2$O$_3$-SiO$_2$/water hybrid nanofluid in a localized heated porous cavity. Arab. J. Sci. Eng. (2022)
7. Kuznetsov, A.V.: The onset of nanofluid bioconvection in a suspension containing both nanoparticles and gyrotactic microorganisms. Int. Commun. Heat Mass Transf. 37(10), 1421–1425 (2010)
8. Aneja, M., Sharma, S.: Numerical study of bioconvection flow of nanofluids using non-Fourier's heat flux and non-Fick's mass flux theory. Int. J. Modern Phys. B 33(31), 950376 (2019)
9. Alsenafi, A., Beg, O.A., Ferdows, M., Beg, T.A., Kadir, A.: Numerical study of nano-biofilm stagnation flow from a nonlinear stretching/shrinking surface with variable nanofluid and bioconvection transport properties. Sci. Rep. 11(1), 1–21 (2021)
10. Awais, M., Ehsan, S.A., Asif, M.R.Z., Parveen, N., Khan, W.U., Yousaf, M.M., He, Y.: Effects of variable transport properties on heat and mass transfer in MHD bioconvective nanofluid rheology with gyrotactic microorganisms: numerical approach. Coatings 11(2), 231 (2021)
11. Neethu, T.S., Sabu, A.S., Mathew, A., Wakif, A., Areekara, S.: Multiple linear regression on bioconvective MHD hybrid nanofluid flow past an exponential stretching sheet with radiation and dissipation effects. Int. Commun. Heat Mass Transf. 135, 106115 (2022)
12. Rattan, M., Bose, T., Chamoli, N., Singh, S.B.: Creep analysis of anisotropic functionally graded rotating disc subject to thermal gradation. In: Materials Physics and Chemistry, pp. 71–88. Apple Academic Press (2020)
13. Bose, T., Rattan, M., Chamoli, N., Gupta, K.: Investigation into steady state creep of AlSiC cylinder on account of residual stress with internal and external pressure. In: AIP Conference Proceedings, vol. 2357, no. 1, p. 110001. AIP Publishing LLC (2022)
14. Takabi, B., Salehi, S.: Augmentation of the heat transfer performance of a sinusoidal corrugated enclosure by employing hybrid nanofluid. Adv. Mech. Eng. 6, 147059 (2014)
15. Oztop, H.F., Abu-Nada, E.: Numerical study of natural convection in partially heated rectangular enclosures filled with nanofluids. Int. J. Heat Fluid Flow 29(5), 1326–1336 (2008)