English Pages 512 [498] Year 2022
Lecture Notes in Electrical Engineering 837
Goutam Sanyal · Carlos M. Travieso-González · Shashank Awasthi · Carla M. A. Pinto · B. R. Purushothama Editors
International Conference on Artificial Intelligence and Sustainable Engineering Select Proceedings of AISE 2020, Volume 2
Lecture Notes in Electrical Engineering Volume 837
Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Università di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Stanford University, Stanford, CA, USA Yong Li, Hunan University, Changsha, Hunan, China Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, 
Nanyang Technological University, Singapore, Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China Walter Zamboni, DIEM - Università degli studi di Salerno, Fisciano, Salerno, Italy Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering - quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and application areas of electrical engineering. The series covers classical and emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please contact [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country: China Jasmine Dou, Editor ([email protected]) India, Japan, Rest of Asia Swati Meherishi, Editorial Director ([email protected]) Southeast Asia, Australia, New Zealand Ramesh Nath Premnath, Editor ([email protected]) USA, Canada: Michael Luby, Senior Editor ([email protected]) All other Countries: Leontina Di Cecco, Senior Editor ([email protected]) ** This series is indexed by EI Compendex and Scopus databases. ** More information about this series at https://link.springer.com/bookseries/7818
Editors
Goutam Sanyal, National Institute of Technology Durgapur, Durgapur, West Bengal, India
Carlos M. Travieso-González, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Shashank Awasthi, Department of Computer Science and Engineering, G. L. Bajaj Institute of Technology and Management, Greater Noida, India
Carla M. A. Pinto, School of Engineering, Polytechnic of Porto, University of Porto, Porto, Portugal
B. R. Purushothama, Department of Planning and Development, National Institute of Technology Goa, Goa, India
ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-981-16-8545-3 ISBN 978-981-16-8546-0 (eBook) https://doi.org/10.1007/978-981-16-8546-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
Artificial intelligence (AI) has accelerated progress in every sphere of human life. AI is also helping the next generation of companies to reduce their environmental and social impact by improving efficiency and developing new products. However, it also brings greater challenges to the sustainable development of engineering products. Sustainability is the greatest challenge in a variety of areas such as AI, AR/VR, robotics, IoT, non-conventional energy and environment, agriculture, health and transportation. Therefore, the relationship between AI and sustainable engineering is worth studying.
This book presents select proceedings of the International Conference on Artificial Intelligence and Sustainable Engineering (AISE-2020). It covers topics such as artificial intelligence in security and surveillance, health care, big data analytics and engineering design for sustainable development using IoT/AI. The book can be a valuable resource for academicians, researchers and professionals working in the field of artificial intelligence, and it can provide solutions to the challenges faced in sustainable engineering based on AI and supporting tools. It will contribute to a better understanding of knowledge and research-related issues in the domain of artificial intelligence and sustainable engineering.
Goutam Sanyal, Durgapur, India
Carlos M. Travieso-González, Las Palmas de Gran Canaria, Spain
Shashank Awasthi, Greater Noida, India
Carla M. A. Pinto, Porto, Portugal
B. R. Purushothama, Goa, India
Contents
Drowsiness Detection System Using OpenCV and Raspberry Pi: An IoT Application
Abhijeet A. Urunkar, Aditi D. Shinde, and Amruta Khot

Deep Multi-agent Reinforcement Learning for Tag Game
Reshma Raj and A. Salim

A Proposed Framework to Achieve CIA in IoT Networks
Monika Mangla, Smita Ambarkar, Rakhi Akhare, Sanjivani Deokar, Sachi Nandan Mohanty, and Suneeta Satpathy

Localization in Underground Area Using Wireless Sensor Networks with Machine Learning
P. Rama and S. Murugan

Master–Slave Robots Using Swarm Intelligence
T. Suraj Duncan, T. R. Jayanthi Kumari, Rithin John, and B. P. Aniruddha Prabhu

An IoT-Based Approach in Automatic Garbage Segregation to Develop an Intelligent Garbage Segregator
Amara Aditya Manikanta, Rohit Tanwar, Aviral Kumar Srivastava, and Kratika Arora

Multi-view Deep Learning for Weather Recognition
Shweta Mishra, Saurabh Kumar, and Vipin Kumar

Optimizing Gender Detection Using Deep Learning Technique for Android Devices
Ashish Chopra, Nishita Shah, and Yash Jain

A Critical Analysis of VQA Models and Datasets
Himanshu Sharma

Impact on Steady-State Security of Power System Using TCSC
Mayank Goyal and Gaurav Kumar Gupta

Game Theory-Based Proof of Stake Mining in Blockchain for Sustainable Energy Efficiency
Nitin K. Tyagi, Mukta Goyal, and Adarsh Kumar

Proposal to Emphasize on Power Production from Solar Rooftop System in the University Campus
Subhash Chandra and Arvind Yadav

Path Planning of E-puck Mobile Robots Using Braitenberg Algorithm
Bhaskar Jyoti Gogoi and Prases K. Mohanty

Deep Neural Networks on Acoustic Emission in Stress Corrosion Cracking
R. Monika and S. Deivalakshmi

Device for Position Tracking System with GPS for Elderly Person’s Health Aspect to Make Them Equally Accessible in the Developmental and Monitoring Process in the Society
Shyamal Mandal, Samar Jyoti Hazarika, and Ranjit Sil

Garbage Monitoring System Using LoRa Technology
Amarjeet Singh Chauhan, Abhishek Singhal, and R. S. Pavithr

Optimal Controller Design for Buck Converter Fed PMBLDC Motor Using Emperor Penguin Optimization (EPO) Algorithm
Deepak Paliwal and Dhanesh Kumar Sambariya

Reducing Start-Up Delay During Churn in P2P Tree-Based Video Streaming System Using Probabilistic Model Checking
Debjani Ghosh, Shashwati Banerjea, Mayank Pandey, Akash Anand, and Satya Sankalp Gautam

Operational Flexibility with Statistical and Deep Learning Model for Electricity Load Forecasting
Ayush Sinha, Raghav Tayal, Ranjana Vyas, and O. P. Vyas

Compressive Spectrum Sensing for Wideband Signals Using Improved Matching Pursuit Algorithms
R. Anupama, S. Y. Kulkarni, and S. N. Prasad

Controller Design for Steering and Diving Model of an AUV
Ravishankar P. Desai and Narayan S. Manjarekar

Hassle-Free Food Ordering Superintendence
Geetanjali Raj, Upasana Sharma, Ranjeeta Yadav, Anjana Bhardwaj, and Sachin Yadav

Aggressive Packet Combining Scheme for Co-Operative Wireless Communication Ensuring Enhanced Throughput
Mayuri Kundu, Swarnendu Kumar Chakraborty, Argha Sarkar, and D J Nagendra Kumar

Textural Feature Analysis Technique for Copy-Move Forgery Detection
Prince Sapra

Identification of Face Mask Detection Using Convolutional Neural Networks
Lingineni Pavan Kalyan, G. Nagaraju, R. Keerthi Reddy, and Srinivas Mulkalapalli

A Discriminative Learning-Based Deep Learning Approach for Diabetic Retinopathy Classification
Nitigya Sambyal, Poonam Saini, and Rupali Syal

Blur and Noise Removal from the Degraded Face Images for Identifying the Faces Using Deep Learning Networks
T. Shreekumar, N. V. Sunitha, N. Suhasini, K. Suma, and K. Karunakara

Importance of Self-Learning Algorithms for Fraud Detection Under Concept Drift
S. Kotekani Shamitha and V. Ilango

A Robust Approach of COVID-19 Indian Data Analysis Using Support Vector Machine
Deepshikha Jain, Venkatesh Gauri Shankar, and Bali Devi

Viability and Applicability of Deep Learning Approach for COVID-19 Preventive Measures Implementation
Alok Negi and Krishan Kumar

A Review on Security and Privacy Issues in Smart Metering Infrastructure and Their Solutions in Perspective of Distribution Utilities
Rakhi Yadav and Yogendra Kumar

Detecting Image Forgery Over Social Media Using Residual Neural Network
Bhuvanesh Singh and Dilip Kumar Sharma

Content-Based Video Retrieval Based on Security Using Enhanced Video Retrieval System with Region-Based Neural Network (EVRS-RNN) and K-Means Classification
B. Satheesh Kumar, K. Seetharaman, and B. Sathiyaprasad

Approximate Bipartite Graph Matching by Modifying Cost Matrix
Shri Prakash Dwivedi

Comparative Analysis of Texture-Based Algorithms LBP, LPQ, SIFT, and SURF Using Touchless Footprints
Anshu Gupta and Deepa Raj

Air Quality Index (AQI) Using Time Series Modelling During COVID Pandemic
Aayush Tyagi, Loveleen Gaur, Gurinder Singh, and Anil Kumar

PVO Based Reversible Secret Data Hiding Technique in YCbCr Color Space
Neeraj Kumar, Dinesh Kumar Singh, and Shashank Awasthi

Sign Language Recognition Using Diverse Deep Learning Models
Neeraj Gupta

Local and Global Features Based on Ear Recognition System
Rohit Agarwal

QOSCR: Quantification of Source Code Resemblance
Mayank Agrawal
About the Editors
Dr. Goutam Sanyal has served as the Head of the Department of Computer Science and Engineering at the National Institute of Technology (NIT) Durgapur, India. He received B.Tech. and M.Tech. degrees from NIT Durgapur and a Ph.D. (Engineering) from Jadavpur University, Kolkata, in robot manipulator path planning. He has over 35 years of experience in teaching, research, and administration. He has published nearly 200 papers in reputed international journals and conferences. He has guided 22 Ph.D. scholars in steganography, wireless sensor networks, computer vision, and natural language processing. He has supervised more than 40 PG and 300 UG theses. He is a co-author of a book on computer graphics and multimedia and of 8 book chapters. He is a regular Member of IEEE, a Life Member of CSI, and a Fellow of IEI. His biography has been selected for inclusion in the 2016, 2017, 2018, 2019, and 2020 editions of Marquis Who’s Who in the World.
Carlos M. Travieso-González is a Full Professor of signal processing and pattern recognition and head of the signals and communications department at the University of Las Palmas de Gran Canaria (ULPGC), Spain. He received the M.Sc. degree in Telecommunication Engineering in 1997 from the Polytechnic University of Catalonia (UPC), Spain, and a Ph.D. degree in 2002 from ULPGC. His research lines are biometrics, biomedical signals and images, data mining, classification systems, signal and image processing, machine learning, and environmental intelligence. He has worked on 51 international and Spanish research projects. He has 4 authored books, 24 edited books, 440 journal papers, and 7 patents at the Spanish Patent and Trademark Office to his credit. He has supervised 8 Ph.D. theses (12 more are under supervision) and 130 master’s theses.
Dr. Shashank Awasthi is a professor in the computer science and engineering department of GL Bajaj Institute of Technology and Management, India. He holds a Ph.D. degree in Computer Science and Engineering and an M.Tech. in Computer Science and Engineering from Dr. A. P. J. Kalam Technical University, Lucknow, and an MCA from Dr. B. R. Ambedkar University Agra. His area of interest is wireless sensor networks. He has more than 18 years of experience in teaching and research. He has published over 30 research papers in international journals and conferences of repute. He is a member of IEEE and the International Association of Engineers, Hong Kong. He is a member of the editorial boards of various reputed international journals.
Carla M. A. Pinto is an adjunct professor at the school of engineering at Polytechnic of Porto. She completed her Doctorate in Mathematics in 2004 at Universidade do Porto Faculdade de Ciências, Portugal. Prof. Pinto has published over 46 articles in high-impact peer-reviewed journals and more than 50 in international peer-reviewed conferences. She works in the area of applied mathematics with a focus on epidemiology and robotics.
B. R. Purushothama obtained his Ph.D. in computer science and engineering from the National Institute of Technology (NIT) Warangal, India, and his M.Tech. in computer science and engineering from NIT Surathkal, India. He is currently working as an assistant professor of computer science and engineering at NIT Goa, India. He has academic experience of over 16 years. His areas of interest are cryptography and information security, security analytics, cloud security, and machine learning applications to security. He has published several works in peer-reviewed journals and proceedings.
Drowsiness Detection System Using OpenCV and Raspberry Pi: An IoT Application Abhijeet A. Urunkar, Aditi D. Shinde, and Amruta Khot
Abstract In today’s era, many major accidents occur because drivers lose concentration, and the main cause of lost concentration while driving is exhaustion. The proposed system is an IoT-based application that detects the drowsiness of a person from his/her eyelid and facial movements. The system can be installed in a moving car and works by continuously capturing the driver’s eyelid movements and yawning. These actions help to determine the driver’s current fitness for driving and to give an early warning so that mishaps can be avoided. Facial movements are analysed on a Raspberry Pi using the OpenCV and Dlib libraries.
Keywords Driver drowsiness · IoT application · Raspberry Pi · OpenCV
1 Introduction
Detecting drowsiness, whether manually or automatically, is a challenging task for current researchers because it depends on the actions of drivers. A manual approach is difficult to apply in a running car and cannot ensure the prevention of accidents, since it relies on a person’s comprehension of the situation before any action is taken. In most cases, the police follow such a manual approach, considering characteristics such as alcohol consumed by the person, blood stains present on clothes, vehicle speed, traffic rules followed by the driver, mechanical defects in the car, the health of the driver together with visibility, weather conditions and road conditions. These techniques, however, cannot ensure the prevention of road traffic accidents.
A. A. Urunkar (B) · A. D. Shinde · A. Khot Department of Information Technology, Walchand College of Engineering, Sangli, India e-mail: [email protected] A. D. Shinde e-mail: [email protected] A. Khot e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_1
To detect the driver’s drowsiness automatically from eyelid and facial movements, we propose a system that uses OpenCV, Dlib and Python libraries on a Raspberry Pi [1–3]. Drowsiness detection techniques are classified as intrusive or non-intrusive. Intrusive techniques measure biological signals such as brain waves, heart rate and body temperature. These may give an acceptable level of accuracy but are not realistic to implement. Such a system may distract or annoy the driver, because the driver’s body is fitted with electronic sensors that hinder driving. It might also make the driver more self-conscious rather than more focused. In addition, body factors such as temperature and sweating may hamper the performance of the system. In such situations, non-intrusive drowsiness detection techniques are the most appropriate for actual driving [1–4].
Data collection for a non-intrusive drowsiness detection system considers physical movements such as leaning gestures, the driver’s head movement and direction, the state of the eyes (open or closed), blinking frequency and interval, and continuous eyelid frequency, which are popular among researchers [5].
A system is needed that measures the driver’s drowsiness and prevents the accidents it causes. This system will help the driver by giving an early warning about his or her unfitness for driving and a prompt to slow down the car. The proposed system works on a live stream that captures the driver’s facial expressions and eyelid movements using a Raspberry Pi installed in the car and processes the live video stream using the OpenCV and Dlib libraries. The system measures the driver’s level of drowsiness and, after processing the expressions, suggests an action to the driver accordingly [2].
Finally, the driver’s response can be observed frequently by capturing the driver’s movements, so that the driver conveys a response to the detection system to show his or her concentration. An audio or visual signal can be delivered to the driver intermittently, and the driver responds to indicate his or her concentration. If the driver fails to respond within a stipulated time interval, the frequency of the signal is increased, and an alarm or alert is then generated to indicate the drowsy state of the driver [1, 4].
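The prompt-and-response escalation described above can be sketched as a small state machine. This is only an illustrative sketch, not the authors' implementation: the class name `PromptScheduler`, the 60 s starting interval, the halving rule and the limit of three missed responses are all assumptions chosen for the example.

```python
class PromptScheduler:
    """Hypothetical escalation logic: prompt more often after each missed
    response and raise an alarm after several consecutive misses."""

    def __init__(self, base_interval_s=60.0, min_interval_s=5.0, max_misses=3):
        # All three defaults are assumed values, not taken from the paper.
        self.base_interval_s = base_interval_s
        self.min_interval_s = min_interval_s
        self.max_misses = max_misses
        self.interval_s = base_interval_s
        self.misses = 0

    def record(self, responded):
        """Update the state after one prompt and return the next action."""
        if responded:
            # The driver reacted in time: reset to the relaxed schedule.
            self.misses = 0
            self.interval_s = self.base_interval_s
            return "ok"
        self.misses += 1
        if self.misses >= self.max_misses:
            return "alarm"
        # Missed response: prompt more frequently, down to a floor.
        self.interval_s = max(self.min_interval_s, self.interval_s / 2.0)
        return "prompt_sooner"
```

On a real device, the `"alarm"` result would be the point at which the buzzer or on-screen alert is triggered; that wiring is left out here so the sketch stays runnable without hardware.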
2 Proposed System
2.1 Architecture of the Proposed System
In the proposed architecture, a Raspberry Pi 3 Model B is used with a 5 V power supply drawn from the vehicle battery. The Raspbian operating system is installed on a micro SD card with at least 32 GB of storage space, as needed for video processing, and a camera is connected to a USB port of the Raspberry Pi 3. The system processes video at runtime and generates an audio and on-screen textual alarm; hence, it does not require storage for long-term usage. A piezo buzzer is also connected to the GPIO ports of the Raspberry Pi (Fig. 1).
Fig. 1 Architecture of the proposed system
The OpenCV open-source computer vision library is installed on the Raspberry Pi to process images captured from the web camera. It is used together with a pre-trained model from the Dlib machine learning toolkit for face detection and for analysis of the driver’s expressions and eyelid movements, which helps to recognize the state of drowsiness. Dlib is a toolkit that contains various machine learning algorithms and tools for analysing facial expressions. It is also used to obtain facial geometry points and to compute the aspect ratio.
2.2 Data Set to Be Used Dlib is used to estimate the location of 68 (X, Y) coordinates on facial structures to map the pre-trained facial geometry detection.
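The paper does not spell out how the aspect ratio is computed from the 68 points. A common choice in the literature is the eye aspect ratio (EAR), which divides the vertical eyelid distances by the horizontal eye width; the sketch below assumes that formulation. The index ranges 36–41 and 42–47 for the two eyes follow the widely used 68-point annotation and are an assumption here, as are the function names and the synthetic test points.

```python
import math

# Assumption: 0-based index ranges of the two eyes in the common
# 68-point facial landmark annotation.
RIGHT_EYE = slice(36, 42)
LEFT_EYE = slice(42, 48)


def eye_aspect_ratio(eye):
    """EAR for six (x, y) points ordered: outer corner, two upper-lid
    points, inner corner, two lower-lid points."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def mean_ear(landmarks):
    """Average EAR over both eyes, given a list of 68 (x, y) points."""
    left = eye_aspect_ratio(landmarks[LEFT_EYE])
    right = eye_aspect_ratio(landmarks[RIGHT_EYE])
    return (left + right) / 2.0


# Synthetic eyes for illustration only (not measured data).
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.05), (2, 0.05), (3, 0), (2, -0.05), (1, -0.05)]
```

A low ratio indicates a nearly closed eye. Any threshold below which the eye counts as closed would have to be tuned on real footage and is deliberately not fixed in this sketch.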
2.3 Hardware Used
(a) Raspberry Pi 3B model
(b) 3.2 inch Raspberry Pi LCD display
(c) 2.5 Megapixel web camera
(d) Piezo buzzer (Fig. 2).
Fig. 2 Hardware set-up
3 Result Analysis
The system shows an alert after detecting the drowsiness of the driver (Fig. 3).
Fig. 3 Result of drowsiness for closed eye
We tested this system with 20 different people, ten wearing eyeglasses and the others without, of whom half pretended to be drowsy and the others remained alert. During testing, different lighting situations for driving were considered:
1. Normal brightness: a controlled lighting condition
2. High brightness: a bright light shone on the camera or the driver
3. Low brightness: poor lighting.
4 Summary and Future Work
The proposed system meets its objectives and remains robust in different operating environments. The system was tested and validated with different people, some wearing eyeglasses and the rest without. Variation in lighting conditions, which affects the performance of the system across environments, was also taken into consideration during testing. Hence, this technique helps to detect the driver’s drowsiness so that road accidents can be avoided.
Future Scope: The system can be used with IR cameras to enable a night mode, which is essential for long night drives. Optimized enhancement methods can be added to reduce noise and distortion caused by the movement of the vehicle.
References
1. Ratna Kaavya M, Ramya V, Ramya G Franklin (2019) Alert system for driver’s drowsiness using image processing. In: International conference on vision towards emerging trends in communication and networking (ViTECoN)
2. Lin SD, Lin J, Chung C (2013) Sleepy eye’s recognition for drowsiness detection. In: 2013 international symposium on biometrics and security technologies, Chengdu, pp 176–179. https://doi.org/10.1109/ISBAST.2013.31
3. Hirata Y, Nishiyama J, Kinoshita S (2009) Detection and prediction of drowsiness by reflexive eye movements. In: 31st annual international conference of the IEEE EMBS Minneapolis, Minnesota, USA
4. Artanto D, Prayadi Sulistyanto M, Deradjad Pranowo I, Pramesta EE (2017) Drowsiness detection system based on eye-closure using a low-cost EMG and ESP8266. In: 2nd international conferences on information technology, information systems and electrical engineering (ICITISEE)
5. Ursulescu O, Ilie B, Simion G (2018) Driver drowsiness detection based on eye analysis. In: 2018 international symposium on electronics and telecommunications (ISETC). IEEE, pp 1–4
6. Fouzia, Roopalakshmi R, Rathod JA, Shetty AS, Supriya K (2018) Driver drowsiness detection system based on visual features. In: 2018 second international conference on inventive communication and computational technologies (ICICCT), Coimbatore, pp 1344–1347. https://doi.org/10.1109/ICICCT.2018.8473203
7. Alshaqaqi B, Baquhaizel AS, Amine Ouis ME, Boumehed M, Ouamri A, Keche M (2013) Driver drowsiness detection system. In: 2013 8th international workshop on systems, signal processing and their applications (WoSSPA), Algiers, pp 151–155. https://doi.org/10.1109/WoSSPA.2013.6602353
8. OpenCV (2001) Open source computer vision library reference manual
9. Lienhart R, Maydt J (2002) An extended set of Haar-like features for rapid object detection. In: Proceedings of the IEEE international conference on image processing
10. Bhowmick B, Kumar C (2009) Detection and classification of eye state in IR camera for driver drowsiness identification. In: Proceeding of the IEEE international conference on signal and image processing applications
Deep Multi-agent Reinforcement Learning for Tag Game Reshma Raj and A. Salim
Abstract Deep reinforcement learning (DRL) is one of the most promising branches of machine learning. DRL has paved the way for an intelligent society made of autonomous agents that learn from their experience. The world itself can be considered a collection of agents. The agents take actions in an environment based on their observations to obtain the maximum reward. Inspired by the breakthrough success in the game of Go, a lot of research has been conducted to optimize the behavior of agents in a single-agent framework. However, the ability of agents to interact and cooperate in a multi-agent setting is still a complex problem. This work examines how individual agents in a Tag Game simulation develop cooperative and competitive abilities by mere manipulation of their reward function. The paper also compares the behavior of agents under various environmental scenarios, from single-agent to multi-agent cases.
Keywords Reinforcement learning · Multi-agent systems · Neural networks · Predator–prey games
R. Raj (B), The Federal Bank Ltd, Kochi, India
A. Salim, Department of CSE, College of Engineering, Trivandrum, Trivandrum, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_2
1 Introduction
Reinforcement learning (RL) [1] is a mathematical framework that supports self-learning from experience. Learning takes place through continuous interaction with the environment, whose consequences are fed back as rewards. The agent modifies its behavior to obtain an optimal sequence of actions that leads to the maximum reward. The applications of RL range from playing challenging games to building robots [2]. However, traditional RL algorithms are limited to a single-agent setting with a small number of states and actions.
The integration of deep learning [3] with RL initially remained unstable and complex. However, the groundbreaking paper from DeepMind, which introduced the Deep-Q framework (DQN) [4] to play Atari games, revolutionized the field. Apart from their astonishing ability to beat experts in various games, the DQN agents could learn on their own from raw inputs without any prior knowledge. Additionally, self-play has proved to be a useful training strategy. The DQN structure was later enhanced to double DQN [5] to improve its performance.
Compared to a single-agent framework, the multi-agent setting poses numerous challenges [6]. First, the environment for each agent is dynamic: when multiple agents interact in an environment, their actions can directly affect those of other agents. Second, communication between the agents is difficult, and hence, there can be ambiguity among ally agents. Third, the complexity increases with the number of agents. This work explores the multi-particle environment for Tag Game [7]. An adaptation of DQN for a decentralized multi-agent framework that is capable of learning policies by self-play is studied. Finally, the behavior of agents in single-agent and multi-agent scenarios is compared and analyzed.
2 Related Work
The survey in [8] explains the core concepts of reinforcement learning, including value functions, discounted rewards, and optimization equations. Initially, the Q-values were stored in tables where the rows represent the states and the columns represent the actions, so the size of the table is the number of states multiplied by the number of actions. Bing-Qiang Huang et al. used a single-layered neural network for robot obstacle avoidance problems [9]. Volodymyr Mnih et al. used advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory input using end-to-end reinforcement learning [10]. The agent was tested on the challenging domain of classic Atari 2600 games. The experiment demonstrates that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 Atari games. J. Pan et al., however, pointed out a drawback of the Deep-Q approach: it takes more training episodes than the traditional RL algorithms to learn the gameplay [11].
Dealing with a multi-agent environment is markedly different. The challenges and application potential of multi-agent reinforcement learning (MARL) are discussed in [12], which also identifies the factors that determine the learning goal in a MARL problem. Recent successful research on decentralized learning of individual agents uses Deep-Q networks for the Pong game [13]. The work was further extended to examine the coordination among agents that play doubles Pong on the same side [14]. Each agent learns individually, and its behavior is shaped by manipulating the reward function. In that approach, the agents could learn their policies jointly.
Deep Multi-agent Reinforcement Learning for Tag Game
3 Background and Problem
3.1 Single-Agent Reinforcement Learning
In RL, an autonomous agent takes actions in an environment by observing the state of the environment. Each action changes the state of the environment and can result in a reward or punishment. The reward (positive or negative) indicates whether that particular action is good or bad. Often, the reward for an action is obtained only after a number of timesteps or even at the end of the task. Formally, the interaction in RL is modeled as a Markov decision process (MDP), defined as a tuple $(S, A, T, R)$ where $S$ is the set of states, $A$ is the set of actions, $T$ is the transition function, and $R$ is the reward function. The transition function $T$ gives the probability of moving to the next state $s'$ from the current state $s$ by performing an action $a$ at timestep $t$:
$$T(s, a, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a) \tag{1}$$
The reward function $R$ gives the probability of obtaining the reward $r_{t+1}$ when action $a$ is performed in state $s$ and the environment moves to state $s'$:
$$R(s, a, s') = \Pr(r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s') \tag{2}$$
The policy function $\pi$ decides which action is to be chosen in the next timestep. At a timestep $t$, the probability of taking an action $a$ in the state $s$ is defined by the policy function:
$$\pi(s, a) = p(a_t = a \mid s_t = s) \tag{3}$$
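As a concrete reading of Eqs. (1) and (3), the following minimal sketch samples a short trajectory from a tabular transition function and a stochastic policy. The two states, two actions, and all probabilities are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import random

# Hypothetical two-state, two-action MDP used only for illustration.
# T[(s, a)] maps each next state s' to Pr(s_{t+1} = s' | s_t = s, a_t = a), as in Eq. (1).
T = {
    ("s0", "left"): {"s0": 0.9, "s1": 0.1},
    ("s0", "right"): {"s0": 0.2, "s1": 0.8},
    ("s1", "left"): {"s0": 0.7, "s1": 0.3},
    ("s1", "right"): {"s0": 0.05, "s1": 0.95},
}

# PI[s] maps each action a to p(a_t = a | s_t = s), as in Eq. (3).
PI = {
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 0.1, "right": 0.9},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def rollout(start, steps, rng):
    """Follow the policy for a fixed number of steps and record (s, a, s')."""
    state = start
    trajectory = []
    for _ in range(steps):
        action = sample(PI[state], rng)
        next_state = sample(T[(state, action)], rng)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory
```

Calling `rollout("s0", 5, random.Random(0))` returns five `(s, a, s')` triples; the reward function of Eq. (2) is omitted here to keep the sketch short.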
3.2 Deep Reinforcement Learning
The basic Q-learning algorithm [15] stores the Q-values in a table where the rows represent the states and the columns represent the actions. But since even a moderate task in the real world has an exponential number of states and actions, it is not always feasible to store and manipulate Q-tables. Hence, an approximation is required that can act as a function to map states to actions. Such a nonlinear approximation is provided by neural networks (NNs). The network, characterized by its parameters $\theta$, can be trained by minimizing the loss function
$$L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a', \theta) - Q(s, a, \theta)\right)^2\right] \tag{4}$$
where $\gamma$ is the discount factor and $s'$ is the next state.
The goal is to minimize this loss between the predicted and target values. Unlike in supervised learning, both the target and the predicted values change during training. The DQN framework obtains human-level control on the Atari 2600 platform using two key ideas. The first is the use of a replay buffer to store experiences, which reduces data correlation and improves data efficiency. The second is the use of a separate target network for updating Q-values, which improves the stability of the network. Since the DQN uses convolutional neural networks [16], it can learn end-to-end by accepting the image pixels directly as input to the network.
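The loss in Eq. (4) can be sketched for a mini-batch in plain Python as follows. The function name, the list-based representation of Q-values, and the handling of terminal transitions (no bootstrapping when `done` is true) are assumptions made for illustration; the paper does not spell them out.

```python
def dqn_loss(q_pred, q_next_target, actions, rewards, dones, gamma=0.9):
    """Mean squared TD error over a mini-batch, following Eq. (4).

    q_pred[i]        : Q(s_i, ., theta) from the policy network, one value per action
    q_next_target[i] : Q(s'_i, ., .) from the separate target network
    actions[i]       : index of the action taken in s_i
    rewards[i]       : reward r_i received after that action
    dones[i]         : True if s'_i ended the game (assumption: no bootstrap then)
    """
    squared_errors = []
    for q_s, q_s_next, a, r, done in zip(q_pred, q_next_target, actions, rewards, dones):
        bootstrap = 0.0 if done else gamma * max(q_s_next)
        target = r + bootstrap
        squared_errors.append((target - q_s[a]) ** 2)
    return sum(squared_errors) / len(squared_errors)
```

In a real implementation the Q-values would come from the two neural networks, and the gradient of this loss would be taken only with respect to the policy network's parameters.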
3.3 Multi-Agent Reinforcement Learning
In the real world, many of the complex problems encountered every day can be modeled using a multi-agent setting. From social dilemmas to traffic control, each agent plays a role that cannot be ignored. Hence, rather than formulating a separate environment for each agent, it is better to consider multiple agents in a shared environment. Formally, the problem can be expressed as an extension of the single-agent Markov decision process, as a tuple $(n, S, A_1, \ldots, A_n, T, R_1, \ldots, R_n)$ where $n$ is the total number of agents, $S$ is the set of states, $A_i$ is the set of actions for agent $i$, $T$ is the transition function, and $R_i$ is the reward function of agent $i$. A joint policy $\pi = (\pi_i, \pi_{-i})$ is the collection of the agents' policies. Here, $-i$ denotes the set of all agents other than $i$. Similar to a single agent's goal, the goal of multiple agents is to maximize an expected reward over timesteps.
3.4 Problem
The world in a Tag Game consists of runners in green, chasers in red, and landmarks in black, as shown in Fig. 1. The chasers chase the runners in a 2D world where the runners always try to escape. The landmarks provide places for the runners to hide, which makes the environment more challenging and complex. Within a given limit of timesteps, the chaser needs to collide with the runner to obtain a chaser-win. Otherwise, the runner is considered to have escaped, which counts as a runner-win. In this unpredictable environment, an agent can learn different sport skills such as stopping and changing direction, which are important for decision making in the real world. Cooperative and competitive strategies are also developed with the aim of winning the game. Each agent corresponds to one of the colored moving balls in the environment. The environment has a wall on all four sides, so that the agents always remain within the window boundary. An agent can take one of five actions in each timestep: move north, move south, move east, move west, or stay at the same position. The game ends when a chaser collides with a runner or when the maximum number of iterations is reached.
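The rules above (five actions, walls on all sides, and the two ways a game can end) can be sketched as a single environment step. This is a simplified illustration on a hypothetical integer grid; the paper uses a 2D multi-particle world, and landmarks are left out here. All names and the grid size are assumptions.

```python
# Displacement for each of the five actions named in the text.
MOVES = {
    "north": (0, 1),
    "south": (0, -1),
    "east": (1, 0),
    "west": (-1, 0),
    "stay": (0, 0),
}


def move(pos, action, size):
    """Apply an action and clamp the result so the agent stays inside the walls."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0), size - 1)
    y = min(max(pos[1] + dy, 0), size - 1)
    return (x, y)


def step(chasers, runners, chaser_actions, runner_actions, t, size=10, max_steps=250):
    """Advance all agents by one timestep and report the outcome, if any.

    Returns the new positions and "chaser-win", "runner-win", or None
    when the game continues. Timesteps are counted from t = 0.
    """
    chasers = [move(p, a, size) for p, a in zip(chasers, chaser_actions)]
    runners = [move(p, a, size) for p, a in zip(runners, runner_actions)]
    caught = any(c == r for c in chasers for r in runners)
    if caught:
        outcome = "chaser-win"
    elif t + 1 >= max_steps:
        outcome = "runner-win"
    else:
        outcome = None
    return chasers, runners, outcome
```

A collision is modeled here as two agents occupying the same cell, which is the grid analogue of the ball collision described in the text.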
Fig. 1 Tag game environment
We have used a challenging and realistic environment in which the number of agents can be varied to analyze the performance of agents in various scenarios. Hyper-parameters such as the learning rate and the discount factor can also be modified to explore the agents' strategies. The agents learn to act, compete, and cooperate from scratch, using the reward as the sole reinforcement signal. Rather than using a centralized training scheme, we have used a decentralized concurrent learning strategy in which each agent receives its own feedback (positive or negative) and optimizes itself, trying not to lose.
4 Method
In the given environment, N slower chasers chase M faster runners, with randomly placed landmarks providing hideouts. A collision results in a sudden jump in the reward obtained by the agents. Agents observe the relative positions and velocities of other agents and the positions of the landmarks. Each agent is implemented as a variant of DQN. At each timestep, the agent observes its surroundings, and the spatial relationship with its neighborhood is provided as input to the network. The structure of the network is shown in Fig. 2. The output of the network is the Q-value for each of the five possible movements: north, south, east, west, and stay. To balance exploration and exploitation, an ε-greedy action selection strategy is used to determine which action is performed next: the action with the optimal Q-value or a randomly selected action to explore. The method uses a concurrent learning strategy in which the agents run independently. Multi-agent settings are often found to be unstable because decision making is difficult for an agent. The reason is the dynamic environment of the agent, where the outcome of an agent's action also depends on the policies of all the other agents. Concurrent learning overcomes this by keeping the environment of the agent stationary while it is deciding its action. The changes made by all the agents are reflected to the agent in the successive timesteps. Thus, agents can learn independently, although they have the same goal.
Fig. 2 DQN structure
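The ε-greedy selection over the five Q-values described in this section can be sketched as below. The function name and the explicit random-number generator argument are illustrative choices, not part of the paper.

```python
import random

# The five movements whose Q-values the network outputs, in a fixed (assumed) order.
ACTIONS = ["north", "south", "east", "west", "stay"]


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: random with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

For example, `ACTIONS[epsilon_greedy(q, 0.25, random.Random())]` yields the movement to execute for a list `q` of five Q-values.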
4.1 Rewarding Scheme
The reward that each agent receives is the reinforcement signal that indicates the quality of the action taken. Hence, shaping the reward function is important for learning an optimal policy. At each timestep, the agent obtains a reward based on its distance to its adversaries. If a chaser collides with a runner, the chaser obtains a reward of +10, while the runner obtains a negative reward of −10. To motivate an agent in any non-colliding state, a runner gets a higher reward for a greater distance from the chasers, and a chaser gets a lower reward for a greater distance from the runners. In addition, we have provided landmarks where runners can hide, so the runners get an additional reward for getting near the landmarks.
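One possible shape for such a reward function is sketched below. Only the ±10 collision rewards come from the text; the distance scale, the landmark bonus, and the use of the nearest adversary and nearest landmark are assumptions introduced for illustration.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def chaser_reward(chaser, runners, collided, scale=0.1):
    """+10 on collision; otherwise the reward decreases as the nearest runner gets farther."""
    if collided:
        return 10.0
    return -scale * min(distance(chaser, r) for r in runners)


def runner_reward(runner, chasers, landmarks, collided, scale=0.1, landmark_bonus=0.05):
    """-10 on collision; otherwise the reward grows with distance from the nearest chaser,
    plus a small (hypothetical) bonus for being close to a landmark."""
    if collided:
        return -10.0
    reward = scale * min(distance(runner, c) for c in chasers)
    if landmarks:
        nearest = min(distance(runner, m) for m in landmarks)
        reward += landmark_bonus / (1.0 + nearest)
    return reward
```

Changing `scale` and `landmark_bonus` is one way to explore how reward manipulation alone shifts the balance between fleeing and hiding.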
4.2 Experience Replay
Experience replay, one of the key ideas of DQN, helps the agent learn from past experiences. Experiences are stored as tuples $(s, a, r, s')$ of state, action, reward, and next state. During training, random batches of stored tuples are sampled from the memory, which allows learning from varied circumstances and yields greater data efficiency. However, the scheme becomes more demanding as the number of agents increases, since it requires more memory.
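A replay memory of this kind can be sketched with a bounded queue. The class name, the capacity, and the eviction of the oldest experience when the memory is full are assumptions for illustration; the paper does not state its buffer size or eviction rule.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity memory of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        """Store one experience tuple."""
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

In a decentralized setup, each agent would hold its own buffer, which is why memory use grows with the number of agents.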
4.3 Separate Target Network
The DQN improves the stability of the network by using a separate target network. If a single policy network is used both for predicting the Q-values of actions and for calculating the target Q-values in the next state, an update that increases $q(s, a)$ also increases $q(s', a)$ for all $a$, so the target moves together with the prediction and the loss increases. This divergence can be avoided by calculating the target value $\gamma \max_{a'} q^*(s', a')$ with another network that is a clone of the policy network. The weights of the target network are updated only after a fixed number of timesteps.
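The periodic cloning of the policy network into the target network can be sketched as follows. Weights are represented here as a plain dictionary rather than a real neural network, and the class and method names are hypothetical; the default interval of 400 is the value reported in the experimental section.

```python
import copy


class TargetSync:
    """Keeps a frozen copy of the policy weights and refreshes it periodically."""

    def __init__(self, policy_weights, sync_every=400):
        self.policy_weights = policy_weights
        # The target network starts as an exact clone of the policy network.
        self.target_weights = copy.deepcopy(policy_weights)
        self.sync_every = sync_every
        self.steps = 0

    def after_update(self):
        """Call once per training step; returns True when the target was refreshed."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_weights = copy.deepcopy(self.policy_weights)
            return True
        return False
```

Between refreshes, targets are computed from `target_weights`, so updates to the policy network do not immediately shift the values it is trained toward.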
5 Experimental Results
In this work, we have implemented each agent as a separate linear deep neural network. At the start of an episode, a stored network is loaded if one exists; otherwise, a new network is created. An ε-greedy policy is used to select the action at each timestep, with ε set to 0.25. The discount factor γ is set to 0.9, which gives more importance to rewards obtained in the future. We have kept the number of agents (runners and chasers) and the number of iterations as hyper-parameters that can be varied to study the performance in various scenarios. To give both the runner and the chaser a fair opportunity to win, the game ends after 250 iterations with a runner-win if the chaser fails to capture the runner within that interval. The network is fed with mini-batches of 32 samples, and the target network is updated after every 400 iterations to stabilize the network.
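For reference, the settings reported above can be gathered in one configuration object. The values are those stated in the text; the class and field names are hypothetical, and the default of one chaser and one runner is only a placeholder for the varied agent counts.

```python
from typing import NamedTuple


class TagGameConfig(NamedTuple):
    """Hyper-parameters reported in the experimental section."""

    epsilon: float = 0.25            # exploration rate of the epsilon-greedy policy
    gamma: float = 0.9               # discount factor
    max_iterations: int = 250        # game ends with a runner-win after this many steps
    batch_size: int = 32             # mini-batch size sampled from replay memory
    target_update_every: int = 400   # iterations between target-network updates
    num_chasers: int = 1             # varied per scenario (placeholder default)
    num_runners: int = 1             # varied per scenario (placeholder default)
```

A two-versus-two scenario would then be written as `TagGameConfig(num_chasers=2, num_runners=2)`.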
5.1 Loss Convergence
Figure 3 shows the loss for an episode (about 250 timesteps) of an agent in a single runner-chaser setting. Figure 4 plots the loss of an agent in a multi-agent setting (2 runners vs. 2 chasers). In both cases, the learning curves converge gradually, although the loss is more consistent when there are fewer agents. This can be explained by the dynamic environment to which the agent is exposed, in which it also needs to take random actions.
Fig. 3 One runner versus one chaser
Fig. 4 Two runners versus two chasers
A single game is considered an episode in the experiment. In each episode, the networks that represent the agents run in parallel. For each agent network, the mean squared error loss is calculated, and the Adam optimizer is used to minimize it. For analysis purposes, the scores of the runner and the chaser are updated at the end of each episode. If the chaser wins the game, the Chaser-Score is incremented by 1; otherwise, the Runner-Score is incremented by 1.
5.2 Rewards and Q-Values
The rewards and Q-values obtained by a losing and a winning agent are shown in Figs. 5 and 6, respectively. For the agent under consideration, the two curves follow almost identical patterns. This indicates that the behavior of the agent is adapting to its environment and that agents are adjusting to the behavior of other agents in the environment.
Fig. 5 Reward and Q-values of a losing agent
Fig. 6 Reward and Q-values of a winning agent
Fig. 7 Game score when both runner and chaser are given equal and fair opportunity
Table 1 Relationship of game score with varying iterations and agents
Scenario        No. of iterations   Chaser score   Runner score
1C versus 1R    75                  96             104
3C versus 3R    500                 188            12
5.3 Game Results
Figure 7 compares the total game scores of the runner and the chaser when both are given a fair chance of winning. It can be observed that the probability of a chaser winning the game increases with the total number of agents in the environment. The random location of each agent at the beginning of the game also plays an important role in the decision making. Table 1 indicates that the agents are trained according to the reward, as they show the expected behavior. If the chaser is given more time, its opportunity to catch a runner increases, and the runner escapes most of the time if the number of iterations is reduced. Thus, the efficiency of the chaser increases with more iterations and more agents, constrained by their initial random locations.
5.4 Analysis Based on Random Dynamic Landmarks
The landmarks are entities that determine the environment of the agents. Two types of landmarks are used in the world: the actual landmarks that provide hiding positions for the runner and the boundary landmarks that limit the visible environment. Experiments are conducted by varying the number and position of the actual landmarks in a limited environment.
Fig. 8 Chaser score under different number of landmarks with two runners and two chasers
Figure 8 shows the results of analysis conducted over 800 episodes for 2, 4, and 8 landmarks. At the beginning of each game, a new random position is given to all the landmarks, so that the environment becomes more unpredictable. It can be seen that with increasing numbers of landmarks, it becomes more difficult for the chaser to win. However, we can see that the chaser score improves steadily irrespective of the number of landmarks. This is because eventually the chaser learns to cope with the changing environment. It is also influenced by the completely random position of other entities and landmarks.
6 Conclusions
Although a lot of research has been conducted in reinforcement learning, few works address how different types of agents compete and converge in multi-agent settings. In this work, we have presented a simple Tag Game environment in which each agent is an independent DQN. The complexity is further reduced by using spatial information as input to a linear network rather than using convolutional neural networks, which introduce a large number of parameters. The time taken for capturing frames and preprocessing them is also eliminated, which makes training faster. The analysis shows that the agents can converge on their own when only the reward structure is manipulated. The work offers an interpretation of the behavior of agents in varying environments, considering changes in the number of iterations and co-agents.
References
1. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
2. Touzet CF (2000) Robot awareness in cooperative mobile robot learning. Auton Robots 8(1):87–97
3. Deng L (2014) A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process 3
4. Li L (2017) Deep reinforcement learning: an overview. arXiv. [Online]. Available: https://arxiv.org/abs/1701.07274
5. van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the association for the advancement of artificial intelligence, pp 2094–2100
6. Neto G (2005) From single-agent to multi-agent reinforcement learning: foundational concepts and methods. Learning theory course
7. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: NIPS 2017
8. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) A brief survey of deep reinforcement learning. IEEE Signal Process Mag, special issue on Deep Learning for Image Understanding
9. Huang B-Q, Cao G-Y, Guo M (2005) Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance. In: Proceedings of the 4th international conference on machine learning and cybernetics, Guangzhou, People’s Republic of China, pp 85–89
10. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
11. Pan J, Wang X, Cheng Y, Yu Q (2018) Multisource transfer double DQN based on actor learning. IEEE Trans Neural Netw Learn Syst
12. Busoniu L, Babuska R, De Schutter B (2008) A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern 38(2):156–172
13. Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Vicente R (2015) Multiagent cooperation and competition with deep reinforcement learning. arXiv:1511.08779
14. Diallo EAO, Sugiyama A, Sugawara T (2017) Learning to coordinate with deep reinforcement learning in doubles pong game. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA), Cancun, pp 14–19
15. Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
16. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1106–1114
A Proposed Framework to Achieve CIA in IoT Networks Monika Mangla, Smita Ambarkar, Rakhi Akhare, Sanjivani Deokar, Sachi Nandan Mohanty, and Suneeta Satpathy
Abstract Internet of things (IoT) is a ubiquitous technology that has found use in numerous applications ranging from agriculture and transportation to healthcare. An IoT platform aims to connect numerous sensors and actuators (sometimes referred to as edge devices) via the Internet. IoT is widely accepted as the prime standard for connecting low-power, lossy networks of resource-constrained devices. However, the real-time deployment of these sensors faces various challenges such as multi-hop communication, heterogeneous topology, and unattended deployment. Apart from these, the most concerning challenge is unsecured communication. As sensors are low power and have limited computational capability (in terms of memory and processing), they do not have any inbuilt security mechanism. Furthermore, the communication protocols used in IoT networks, such as 6LoWPAN, LoRa, Zigbee, and RFID, also have limited security. Thus, security takes a backseat despite rigorous automation and digitization, although it is undoubtedly the most concerning aspect of any application. Hence, our paper focuses on security issues of IoT applications and proposes a secured framework to preserve confidentiality, integrity, and availability (CIA) with respect to various attacks on IoT networks.
Keywords IoT · Sensors · Network attack · CIA principle · Denial of service · Cloud computing
M. Mangla (B) Department of Information Technology, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India S. Ambarkar · R. Akhare · S. Deokar Department of Computer Engineering, Lokmanya Tilak College of Engineering, Navi Mumbai, India S. N. Mohanty Department of Computer Science & Engineering, Vardhman College of Engineering (Autonomous), Hyderabad, India S. Satpathy Faculty of Emerging Technologies, Sri Sri University, Cuttack, Odisha, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_3
1 Introduction
IoT is a recent advancement in technology that has revolutionized the way communication takes place. In IoT, every object possesses some intelligent behavior and communicates with all connected objects in the network [1]. The generic architecture of IoT is illustrated in Fig. 1. This capability supports the application of IoT in many domains such as agriculture, health, and weather monitoring. However, real-time and widespread deployment of IoT calls for efficient maintenance of privacy and security in the network. This requirement arises from the use of open networks for communication, which increases vulnerability to attacks. Network attacks can be broadly classified as internal and external attacks. Internal attacks are caused by protocol frailty, whereas external attacks are caused by vulnerability of the overall network. The aim of providing security is to protect IoT networks against these internal and external attacks.
Fig. 1 Illustration of generic IoT architecture
A network is said to be secure if it keeps data confidential, protects data integrity, supports valid authentication mechanisms, and maintains availability of resources. The principles of confidentiality, integrity, and availability in communication networks are referred to as the CIA principles [2]. They can be applied to data security to ensure that all dimensions of security have been considered. The implementation of CIA principles is necessitated by the growing integration of IoT with the global Internet. This integration exposes sensitive and confidential data to a large audience and thus greatly widens the scope of network vulnerability. The CIA principles of network security can be detailed as follows:
Confidentiality: Communication through wireless technologies in IoT increases the risk to confidentiality. Moreover, it is demanding to provide data confidentiality on the capacity-constrained devices involved in IoT. For instance, the majority of devices in IoT are sensors, tablets, RFID tags, etc., which have computational and storage limitations. Hence, obtaining data confidentiality in IoT networks is a huge challenge.
Integrity: IoT networks mainly operate on their own in an unattended environment with limited maintenance, which increases the avenues for data tampering in comparison with supervised networks. IoT networks are therefore quite vulnerable to integrity loss, and preventive approaches must be adopted.
Availability: The future of IoT involves connecting billions of devices to the network, where each device requires uninterrupted access to the data. Devices and services require the same continuous access as data. Hence, another vital challenge for IoT networks is to maintain uninterrupted availability of data, services, and devices in the network.
In addition to maintaining CIA principles, IoT also faces several considerable challenges such as scalability, heterogeneity, and energy efficiency. These challenges are shown in Fig. 2.
Fig. 2 Illustration of challenges in IoT security
In this paper, the authors discuss the various challenges of maintaining CIA principles in IoT networks and propose an architecture for efficient and effective IoT networks. The paper is organized as follows: the fundamentals of IoT networks and the associated CIA principles are introduced in Sect. 1. Challenges and threats to CIA principles are discussed in Sect. 2. Section 3 focuses on related work proposed by various researchers, considering the use of IoT networks in various domains. An efficient architecture for maintaining CIA principles in IoT networks is presented in Sect. 4. The experimental setup is described in Sect. 5. Finally, conclusions and future directions for research are given in Sect. 6.
2 CIA Challenges in IoT Networks
One way to maintain CIA principles in IoT networks is an efficient authentication mechanism, which ensures that the CIA principles are not compromised. However, despite the best preventive measures, an IoT network remains vulnerable to various security attacks. The major classes of attacks that affect network security are described in the following subsections.
2.1 Network Attacks on Confidentiality
Confidentiality is a vital security goal that cannot be compromised at any cost, as a violation of confidentiality gives the attacker details of, and/or access to, the entire network. The attacker performs a sniffing attack (a passive attack) to gain access to the network with the help of a compromised device. The attacker thus gains access to information including routing information, data content, and network topology. Possession of this information gives the attacker an opportunity to carry out routing attacks, sinkhole attacks, man-in-the-middle (MITM) attacks, etc. [3].
2.2 Network Attacks on Availability
The availability of a node is compromised through interference by the attacker, who mainly targets the nodes with high traffic. Denial of service (DoS) is a prevalent attack on availability. DoS disrupts the availability of various network resources [4]. It results from a node receiving too many requests, which eventually slow down or completely interrupt the system service. Hence, DoS can cripple an IoT network and degrade network performance.
2.3 Network Attacks on Integrity
An integrity attack damages the network by causing the transmission of inconsistent information [3]. It does so by manipulating routing information or by replaying data and/or control messages. With either method, an integrity attack disrupts communication. Wormhole attacks and replay attacks are examples of integrity attacks. In a wormhole attack, the attacker places itself in a dominating position and advertises to other nodes in the network that it has the shortest route for data transfer. As a result, the attacker exhausts the bandwidth and disturbs the complete network.
A replay attack fraudulently retransmits valid data a number of times [5]. As a result, it becomes difficult for the receiver to authenticate the message, which eventually leads to integrity loss. In a replay attack, the attacker captures genuine network traffic and then communicates with the receiver while masquerading as the original sender. Thus, a replay attack breaches network security by storing information without permission, resulting in falsified identification and authentication. The literature survey shows that several efficient and secure IoT architectures exist. Some of these architectures are discussed in the following section.
3 Related Work
Consideration of diverse application domains enables enhanced understanding of the security architectures.
3.1 IoT Architecture in Healthcare
The authors in [6] present a security context framework for distributed IoT systems in healthcare. The proposed framework employs adaptive security contexts to legitimate data of interest. The framework decides the audit list for the security context, which is used to evaluate the strength of the selected network and technology. For this purpose, it may employ blockchain technology [7] to maintain a distributed ledger. Thus, the proposed framework achieves accountability of devices, services, and responsible parties. It helps to leverage legal and policy aspects in addition to the technical aspects of healthcare systems. The major advantage of this architecture is its enhanced security when devices are equipped with an identification unit. Embedding identification units in the devices reduces the scope for unauthorized access to health data. However, the framework also has a downside: it is difficult to implement in practice because current health monitoring devices are not equipped with an identification unit. This problem has been addressed by associating devices with user names, but that eliminates the scope for resource sharing.
In another work [8], the authors propose a framework with three different kinds of communication channels:
1. Communication channel between the biometric sensor nodes and the internal processing unit (IPU)
2. Communication channel between the IPU and the gateway
3. Communication channel between the gateway and the cloud
In this framework, these channels are not secured, which gives an attacker scope to attack them. The authors use AES-256 and the SHA-3 hash function to maintain the CIA principles of security. They also propose a key exchange method based on the RSA algorithm for registration at every level. The authors claim to achieve the CIA principles using the proposed approach. However, its downside is that it cannot be implemented on low power and lossy network (LLN) devices, because the algorithms involved are computationally complex.
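To make the role of the SHA-3 hash concrete, the following minimal sketch computes and verifies a SHA-3 digest of a sensor payload using Python's standard `hashlib` module. It illustrates only the integrity check; it is not the scheme of [8], the function names and payload format are hypothetical, and the AES-256 encryption and RSA key exchange mentioned above are not shown.

```python
import hashlib
import hmac


def digest(payload: bytes) -> str:
    """Return the SHA3-256 digest of a payload as a hexadecimal string."""
    return hashlib.sha3_256(payload).hexdigest()


def verify(payload: bytes, expected_digest: str) -> bool:
    """Check that a received payload matches the digest computed by the sender."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest(payload), expected_digest)
```

A receiver that obtains the digest over a protected channel can detect any modification of the payload in transit; a bare hash sent alongside the data over the same unprotected channel would not stop an attacker who can replace both.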
3.2 IoT Architecture in Smart Cities
This section presents the prevalent security architectures for smart cities. The motive for considering smart cities is that cities are being transformed by technology at an unprecedented rate, and this transformation requires the associated technological challenges to be handled.
The author in [9] presents a communication protocol that aims to enhance security through identification management. It suggests a mechanism in which entities interact with the help of unique IDs. The author proposes a smart city security layer (CSL) architecture. CSL is responsible for changing the sent identifier into another identifier, which is generated by combining the ID and the service. Thus, the identity of the entity is confidential and unique within the overall environment. The major strength of CSL is its ability to present a unique ID for each entity and service type. Hence, in a system employing CSL, the real user is protected because the real ID is not exposed. This protection comes at the additional cost of an extra layer that handles the authorization and authentication process. CSL continues to evolve in order to provide an optimal solution and widen its application.
The authors in [10] propose an architecture based on ArchiMate 3.0 consisting of four layers. In this model, the business layer includes products and services. The application layer manages application services and supports the business layer. The technology layer manages the required infrastructural services and devices. Finally, the physical layer models the physical components in the system. The authors in [10] focus on the CIA principles for security development and address security issues with respect to information security, IoT security, and smart city security. The downside of this architecture is that it is static and can be employed only in specific kinds of smart cities. The architecture is also quite complex to implement, as it involves several aspects.
A Proposed Framework to Achieve CIA in IoT Networks
3.3 IoT Architecture in Agriculture The authors present a few popular IoT architectures from the literature that can be implemented in the agriculture domain. Authors in [11] proposed a system that monitors vital agriculture parameters such as soil moisture, humidity, and temperature and stores them in a dataset. If these parameters are tampered with by an unauthorized person, it may lead to a huge loss for the crop and the farmer, eventually affecting the growth of the nation. Hence, a secure and authentic method is needed to maintain the CIA principles for the data. In this regard, authors in [11] proposed using Burrows-Abadi-Needham (BAN) logic to verify the authenticity of the message. Once the message is authenticated, the system uses the Automated Validation of Internet Security Protocols and Applications (AVISPA) tool to specify the security properties. This architecture provides a shield against active as well as passive attacks. Although it is claimed to provide security and privacy against all possible attacks, it still fails to achieve the highest accuracy because of insufficient datasets. Another downside of this architecture is its increased complexity. Authors in [12] presented an RFID-based user authentication approach for IoT-based smart farming. A farm management information system (FMIS) interconnects agricultural machines and allows in-field communication. Current FMISs lack security and user authentication. The proposed architecture overcomes this limitation by providing a decentralized and open-source FMIS. In this architecture, confidential communication between actors is ensured and data sovereignty is maintained. Additionally, it employs TLS based on certificates issued by a PKI to secure communication channels. It also integrates OAuth2 to achieve RAC. User authentication is specifically emphasized in the suggested approach. Thus, it is concluded that this approach is cost efficient for RFID-based authentication.
The challenge of the proposed approach is that it inherits the limitations of RFID technology. In [13], the authors propose a secure IoT system for farm monitoring. The proposed architecture uses AES-128 for enhanced security. It also employs methods such as division and rearranging, and a checksum using private and public keys, to encode the data. The proposed system is capable of obtaining optimal latency, improved packet delivery rates, and enhanced security. This architecture focuses on reducing the complexity of preserving confidential agriculture information. The challenge of this approach is the time taken by the algorithm, which needs to be reduced, particularly for the cryptographic process. The following section discusses the proposed architecture for IoT networks, which aims to maintain the CIA principles.
M. Mangla et al.
4 Proposed Framework As stated earlier, attacks on an IoT network result in loss of the CIA principles. Such a loss is not acceptable; hence, preventive measures should be taken. In this paper, the authors propose a framework that maintains the CIA principles in the network. The proposed framework, shown in Fig. 3, is multi-layered and prevents loss of data at every layer using an appropriate preventive approach. As evident from the figure, the architecture consists of sensing nodes in the IoT network. This IoT network uses IPv6 over low-power wireless personal area networks (6LoWPAN) as its communication protocol and the routing protocol for low-power and lossy networks (RPL) as its routing protocol. RPL forms a destination-oriented directed acyclic graph (DODAG). The sensor network forwards data toward the gateway. Sensors forward data to the gateway in a streaming fashion, which may create congestion at the gateway. The streaming data is therefore packetized at the gateway, and the packets are transmitted to the fog node at regular intervals. At the fog node, this data is aggregated, and the aggregated data is forwarded to the cloud. Aggregating data at the fog node helps to minimize the likelihood of a bottleneck at the cloud.
4.1 Security in Proposed Architecture As discussed in the previous section, 6LoWPAN and RPL are vulnerable to various security attacks. A survey of the relevant literature shows that popular security architectures provide a security mechanism at one side of the network or the other, which does not mitigate the attacks completely. Hence, to achieve a completely secured network, a security mechanism should be incorporated at each layer. The authors therefore propose a multilayer secured architecture that provides security at each layer, as shown in Fig. 3. Layer 1 (connecting edge devices and gateway) is secured using a symmetric key approach. Here, each edge device has its own identification key, which is also stored at the gateway. As per the proposed approach, each edge device uses a lightweight encryption algorithm to encrypt its data with its identification key. However, encrypting every message at the edge device may incur excessive computational cost. Therefore, the authors propose that not every message need be encrypted. For instance, suppose a device monitors the blood pressure (BP) level of a patient and the tolerable range of its readings is 80–120. While the reading lies within this range, it is forwarded without encryption. If the reading goes beyond the threshold in either direction, the data is encrypted. This technique provides security in the network and also helps to detect intervention in the network. If an attacker modifies an in-range reading so that it falls below 80 or above 120, the modified value is forwarded to the gateway as plain text. Whenever the gateway receives a value outside the permissible range as plain text, it infers the presence of intervention in the network, because in the proposed approach any value outside the range is always forwarded in encrypted form. The second layer (between gateway and fog) is secured using an encapsulation and tunneling approach. Here, only encrypted packets are stamped using the gateway's identification key and tunneled to the fog node. The encrypted packets received at the fog node are aggregated and transmitted to the cloud node at regular intervals. This third layer, between fog and cloud, is secured by a traditional confidentiality algorithm.
Fig. 3 Illustration of proposed approach
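The selective encryption rule for layer 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the message format, and the key are hypothetical, and `toy_cipher` is only a stand-in (a SHA-256 based keystream XOR) for the unspecified lightweight encryption algorithm. Only the 80–120 range is taken from the text.

```python
import hashlib

LOW, HIGH = 80, 120  # tolerable BP range quoted in the text


def _keystream(key: bytes, n: int) -> bytes:
    # Hypothetical keystream: SHA-256 over key + counter, truncated to n bytes.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Placeholder for the lightweight cipher; XOR makes it its own inverse.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def edge_send(reading: int, key: bytes) -> dict:
    # In-range readings go out as plain text; out-of-range ones are encrypted.
    payload = str(reading).encode()
    if LOW <= reading <= HIGH:
        return {"encrypted": False, "payload": payload}
    return {"encrypted": True, "payload": toy_cipher(key, payload)}


def gateway_receive(msg: dict, key: bytes) -> tuple:
    # Returns (value, tampered). A plain-text value outside the range can
    # only arise from modification in transit, so it is flagged.
    if msg["encrypted"]:
        return int(toy_cipher(key, msg["payload"]).decode()), False
    value = int(msg["payload"].decode())
    return value, not (LOW <= value <= HIGH)
```

For example, `gateway_receive(edge_send(100, key), key)` yields `(100, False)`, whereas a plain-text message whose payload has been changed to `b"150"` is flagged as tampered. As in the scheme described above, the sketch flags only modifications that move a plain-text value out of the permissible range.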
4.2 Advantages of Proposed Approach The proposed framework achieves the CIA goals of security in an economical manner. It suggests implementing a lightweight algorithm at the sensor level, in view of the limited computational and storage capability of sensors. The use of a lightweight authentication algorithm is an improvement offered by the proposed approach. Moreover, the proposed approach also reduces data transmission by introducing an aggregation algorithm at the fog node.
5 Experimental Setup The authors have used the IoT network shown in Fig. 4 to validate the efficiency of the proposed framework. This network has been developed using Contiki, a tiny operating system for realizing IoT networks, and its Cooja simulator. The network consists of 21 sensor nodes. Among these 21 nodes, node 1 (shown in green) acts as the gateway node and the remaining 20 nodes serve as sender nodes. These sender nodes continuously sense data and forward it toward the gateway node.
Fig. 4 Illustration of implemented IoT network
This network uses the 6LoWPAN communication protocol. During network setup, the gateway node (also known as the sink node) sends a DODAG information object (DIO) message to every node. This DIO invites all nodes to join the network. If a sender node is willing to join the network, it replies with a destination advertisement object (DAO). Upon receipt of this DAO, the network is set up as a destination-oriented directed acyclic graph (DODAG). Our paper analyzes the efficiency of the illustrated IoT network using discrete parameters: average power consumption, radio duty cycle, network hops, and received packets per node. The average power consumption is calculated using CPU, low-power mode (LPM), network hops, and the power incurred in transmission and reception. Here, LPM represents the power consumed by sensors when they are in sleep mode. The average power consumption of a node is in direct proportion to its distance from the sink node. It is evident from Fig. 5d that node 3 is two hops away from the sink node and therefore requires considerable power and radio duty. The maximum power consumption and radio duty cycle of node 3 are shown in Fig. 5a, b, respectively. Here, radio duty cycle represents the total number of packets transmitted and received by the node. The radio duty cycle for all nodes is shown in Fig. 5b. The network shown in Fig. 4 is an unsecured network, as the underlying 6LoWPAN routing protocol is an unsecured protocol. It is worth mentioning that the proposed approach not only addresses the security issue for 6LoWPAN but also secures the link up to the cloud. Thus, the proposed approach provides a complete security solution for an IoT (6LoWPAN) network in any manufacturing industry.
Fig. 5 a Graph for average power consumption. b Graph for average radio duty cycle. c Graph for received packets per node. d Graph for network hops for each node
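The DODAG set-up described in the experimental setup (a DIO sent out from the sink, a DAO returned by each joining node) can be approximated by a breadth-first traversal from the sink. The sketch below is a simplification under stated assumptions: the rank of a node is taken to be its hop count from the sink, whereas RPL derives rank from an objective function, and the function name and four-node topology are hypothetical rather than the 21-node network of Fig. 4.

```python
from collections import deque


def build_dodag(neighbors: dict, sink: int) -> tuple:
    """Return (rank, parent) maps for a hop-count DODAG rooted at the sink.

    neighbors maps each node id to the ids within its radio range.
    """
    rank = {sink: 0}      # the sink is the DODAG root
    parent = {}           # preferred parent chosen by each joining node
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        # 'node' advertises the DODAG (DIO) to the nodes in its range.
        for nb in neighbors[node]:
            if nb not in rank:
                # 'nb' joins through the first advertiser it hears and
                # would reply toward the sink with a DAO.
                rank[nb] = rank[node] + 1
                parent[nb] = node
                queue.append(nb)
    return rank, parent
```

With the hypothetical topology `{1: [2, 4], 2: [1, 3], 3: [2], 4: [1]}` and sink 1, node 3 obtains rank 2, so its packets are relayed through node 2 before reaching the sink.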
6 Conclusion and Future Scope In this paper, we discussed the CIA principles of security, followed by the various kinds of attacks on these principles. The paper presented the most common secured IoT architectures from various domains, including healthcare, smart cities, and agriculture. The paper also presented an approach to maintain the CIA principles in IoT networks that overcomes the limitations of existing architectures. The proposed architecture achieves the CIA principles with minimal resources. The proposed work can be extended in the direction of designing lightweight algorithms for authentication and encryption. It can also be taken further toward real-life implementation.
References 1. Zeinab KAM, Elmustafa SAA (2017) Internet of things applications, challenges and related future technologies. World Sci News 2(67), 126–148 2. Liu X, Zhao M, Li S, Zhang F, Trappe W (2017) A security framework for the internet of things in the future internet architecture. Future Internet 9(3):27 3. Jain A, Jain S (2019) A survey on miscellaneous attacks and countermeasures for RPL routing protocol in IoT. In: emerging technologies in data mining and information security. Springer, Singapore, pp 611–620 4. Khan MA, Salah K (2018) IoT security: review, blockchain solutions, and open challenges. Future Gener Comput Syst 82:395–411 5. Mangelkar S, Dhage SN, Nimkar AV (2017) A comparative study on RPL attacks and security solutions. In: 2017 international conference on intelligent computing and control (I2C2). IEEE, pp 1–6 6. Sangpetch O, Sangpetch A (2016) Security context framework for distributed healthcare IoT platform. In: International conference on IoT technologies for healthcare, pp 71–76. Springer, Cham 7. Nakamoto S (2019) Bitcoin: a peer-to-peer electronic cash system. Manubot 8. Chattopadhyay AK, Nag A, Ghosh D, Chanda K (2019) A secure framework for IoT-Based healthcare system. In: Proceedings of international ethical hacking conference 2018. Springer, Singapore, pp 383–393 9. Ferraz FS, Sampaio C, Ferraz C (2015) Towards a smart-city security architecture: proposal and analysis of impact of major smart-city security issues. In: SOFTENG 2015: the first international conference on advances and trends in software engineering information, pp 108–114 10. Berkel ARR, Singh PM, van Sinderen MJ (2018) An information security architecture for smart cities. In: International Symposium on business modeling and software design. Springer, Cham, pp 167–184 11. Diaz Lopez D, Uribe MB, Cely CS, Torres AV, Guataquira NM, Castro SM, Nespoli P, Marmol FG (2018) Shielding IoT against cyber-attacks: an event-based approach using SIEM. 
Wirel Commun Mobile Comput 12. Sicari S, Rizzardi A, Grieco LA, Coen-Porisini A (2015) Security, privacy and trust in Internet of Things: the road ahead. Comput Netw 76:146–164 13. Pacheco J, Hariri S (2016) IoT security framework for smart cyber infrastructures. In: 2016 IEEE 1st international workshops on foundations and applications of self* systems (FAS* W). IEEE, pp 242–247
Localization in Underground Area Using Wireless Sensor Networks with Machine Learning P. Rama and S. Murugan
Abstract This study considers the problem of estimating the geographic positions of sensor nodes in a wireless sensor network in which most sensors lack an effective self-positioning capability. We propose WUSN-SVMs, a novel solution with the following merits. First, WUSN-SVMs localizes the network based on mere connectivity information (i.e., hop counts only); it is therefore simple and does not require specialized ranging devices or assisting mobile devices, as most existing methods do. Second, WUSN-SVMs is based on support vector machine (SVM) learning. Although SVM is a classification method, we show its applicability to the target problem and show that the localization error can be upper-bounded by any small threshold given an appropriate training data size. Third, WUSN-SVMs addresses the border and coverage-hole problems effectively. Last but not least, WUSN-SVMs offers fast localization in a distributed manner with efficient use of processing and communication resources. We also propose a modified version of mass-spring optimization to further improve the location estimation of WUSN-SVMs. The promising performance of WUSN-SVMs is exhibited by our simulation study. Keywords Wireless underground sensor networks · Support vector machine · Localization · Mass-spring · Coverage · Data size
1 Introduction Wireless sensor networks typically consist of inexpensive sensing hardware with limited resources. In normal situations the nodes are not fully equipped with GPS receivers, and even when such units are installed they may not work properly because of environmental conditions. On the other hand, knowing the geographic locations of the sensor nodes is important for several tasks of a sensor network, such as network management, event detection, geography-based query processing, and routing. Therefore, a vital problem is to devise an accurate, efficient, and fast-converging method for estimating the node locations when true location information is minimal or unknown [1, 2].
P. Rama (B) · S. Murugan Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_4
2 Related Work and Our Motivation A straightforward localization approach is to gather the information (for instance, connectivity and pairwise distance measurements) about the whole network into one place, where the gathered information is processed centrally to estimate the sensors' locations using mathematical algorithms such as semidefinite programming and multidimensional scaling. Despite its excellent approximation, this centralized approach is impractical for large-scale sensor networks because of its high computation and communication costs [1]. Many techniques have been proposed to attempt localization in a distributed manner. The relaxation-based technique starts with all the nodes in initial positions and keeps refining their positions using algorithms such as local neighborhood multilateration and convex optimization [3, 4]. The coordinate-system stitching techniques divide the network into overlapping regions, with the nodes in each region positioned relative to the region's local coordinate system (a centralized algorithm may be used here). The local coordinate systems are then merged, or "stitched," together to form a global coordinate system. Localization accuracy can be improved by using beacon-based techniques, which exploit nodes of known location, called beacons, and extrapolate unknown node locations from the beacon locations [5]. Most current techniques assume that the distance between two neighboring nodes can be measured, typically via a ranging process [6, 7]. For example, pairwise distance can be estimated from the received signal strength indication (RSSI), time difference of arrival (TDOA), or angle of arrival (AOA). The problem with distance measurements is that the ranging process (or hardware) is subject to noise, and its complexity and cost increase with the accuracy required.
For a large wireless sensor network with low-end sensors, it is often not affordable to equip all of them with ranging capability. In this work, we address the localization problem under the following modest requirements: (R1) beacon nodes exist, (R2) a sensor node need not communicate directly with a beacon node, and (R3) only connectivity information may be used for location estimation (pairwise distance measurements are not required). Requirement (R1) is for improved localization accuracy. Requirement (R2) relaxes the strong requirement on the communication range of beacon nodes. Requirement (R3) avoids the expensive ranging process. All these requirements are reasonable for large networks in which sensor nodes have few resources [1]. Few range-free techniques have been proposed. APIT assumes that a node can hear from a large number of beacons; hence, it does not satisfy requirement (R2). Spotlight offers good results but needs an aerial vehicle to project light onto the sensor field. Other methods require enough pairwise distance measurements to converge on a "globally rigid" state in which the sensor locations can be uniquely determined; these works do not satisfy requirement (R3). A popular approach that has the same requirements {R1, R2, R3} as the proposed method is Flow, in which every node is repeatedly positioned at the centroid of its neighbors until convergence [8, 9]. Figure 1 illustrates its two main problems: first, the convergence problem (many averaging loops result in long localization time and significant bandwidth consumption); second, the border problem (sensors close to the edge of the sensor field are poorly located). The same problems occur in a number of earlier methods.
Fig. 1 A network of 1000 sensor nodes on a 100 m × 100 m area, with 50 random beacons. A line joins the true and estimated locations of each sensor node. a Flow: the border problem remains after 10,000 averaging loops. b WUSN-SVMs: the border problem disappears
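The averaging step of the Flow approach can be sketched as follows. This is a minimal illustration, not the implementation evaluated in Fig. 1: the function name, the synchronous update order, the common starting position for unknown nodes, and the fixed iteration count are all assumptions made for the example.

```python
def flow_localize(neighbors, beacons, n_iters=100, start=(0.0, 0.0)):
    """Repeatedly move every non-beacon node to the centroid of its neighbors.

    neighbors: node id -> list of neighboring node ids
    beacons:   node id -> known (x, y) position; beacons never move
    """
    pos = {n: beacons.get(n, start) for n in neighbors}
    for _ in range(n_iters):
        new_pos = dict(pos)
        for n, nbs in neighbors.items():
            if n in beacons or not nbs:
                continue
            # Centroid of the neighbors' current position estimates.
            new_pos[n] = (
                sum(pos[m][0] for m in nbs) / len(nbs),
                sum(pos[m][1] for m in nbs) / len(nbs),
            )
        pos = new_pos
    return pos
```

On a four-node chain with beacons at (0, 0) and (9, 0), the two unknown nodes converge to x = 3 and x = 6. In larger networks many such averaging loops are needed, which is the convergence problem noted above.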
3 Contributions We propose WUSN-SVMs, a new solution that satisfies the requirements above and remedies the border problem significantly (see Fig. 1b). WUSN-SVMs is also effective in networks with coverage holes or obstacles. WUSN-SVMs localizes the network using the learning concept of support vector machines (SVMs). SVM is a classification method with two main components: a kernel function and a set of support vectors. The support vectors are obtained through the training phase, given the training data. New data is classified using a simple computation involving the kernel function and the support vectors. For the localization problem, we define a set of geographical regions in the sensor field and classify each sensor node into these regions. Its location can then be estimated inside the intersection of the containing regions. Recent work has investigated the applicability of SVM to the localization problem. That technique, however, assumes that every node can measure the direct signal strength from all the beacons, which violates requirement (R2). In a large network, a node can receive signals from only a small subset of beacons, so that technique would be significantly less accurate. The proposed method is better suited to networks of larger scale, because it relies on connectivity rather than on direct signal strength. Our contribution includes definitions of the kernel function and of the classes used to classify the sensor nodes, a strategy for training the classifiers, and a theoretical analysis of the localization error. We also propose a mass-spring optimization (MSO) method to improve the location accuracy.
3.1 SVM with WUSN Support vector machine (SVM) relies on the principle of structural risk minimization. The idea of SVM is to find an optimal hyperplane in the feature space that separates the two classes of data with the largest margin. SVM is a supervised machine learning algorithm, so it has to be trained to build the model. SVM can solve linear and nonlinear classification problems. If the training samples are not linearly separable, SVM maps the samples into a high-dimensional feature space through a nonlinear mapping function. The dimension of this feature space may be very large, even infinite, but the samples become linearly separable. SVM constructs the optimal classification hyperplane in the high-dimensional feature space, and the classifier's decision function is thereby obtained. To overcome the curse of dimensionality in high-dimensional spaces, SVM introduces the kernel function. By design, the SVM hyperplane not only separates the two classes of data but also maximizes the classification margin. Suppose there is a linearly separable sample set (x_i, y_i), i = 1, 2, …, n, where x ∈ R^d and y ∈ {+1, −1} is the class label. The classification hyperplane in the d-dimensional space is given by Eq. (1):
$$\omega * x + b = 0 \tag{1}$$
After normalization, the classification margin is given by Eq. (2):
$$\frac{|1 - b + 1 + b|}{\|\omega\|} = \frac{2}{\|\omega\|} \tag{2}$$
The problem of searching for the optimal classification hyperplane is converted into finding the minimum value subject to the following constraints:
$$\min \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.} \quad y_i\big((\varphi(x_i) * w) + b\big) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, l \tag{3}$$
where * denotes the inner product, w is the coefficient vector (written ω in Eqs. (1) and (2)), ξ_i are slack variables, and C is the penalty parameter, which is chosen by the user. Equation (3) turns the search for the optimal hyperplane into an optimization problem, whose minimum can be evaluated with Lagrange multipliers. Equation (4) gives the resulting quadratic programming problem:
$$\min_{\alpha} Q(\alpha) = \min_{\alpha}\left\{ \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \right\} \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0,\ \ 0 \le \alpha_i \le C,\ \ i = 1, 2, \ldots, n \tag{4}$$
where α_i is the Lagrange multiplier of x_i, K(x_i, x_j) = φ(x_i) · φ(x_j) is the kernel function, and φ(x) is the mapping function. The kernel function simplifies the inner product operation: even if the vectors are mapped into an infinite-dimensional space, the separating hyperplane can be obtained and the computation remains practicable. Finally, the decision function is given by Eq. (5):
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^{*} y_i K(x_i, x) + b^{*} \right) \tag{5}$$
Here α_i^* is a Lagrange multiplier satisfying 0 < α_i^* < C, the samples x_i whose α_i^* is not 0 are the support vectors (SVs), n is the number of SVs, and b^* is the bias. Equation (5) shows that the decision function depends only on the support vectors, which implies that the optimal hyperplane is built from these support vectors. The objective of training is to obtain the support vectors.
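The decision function of Eq. (5) can be evaluated directly once the support vectors, their labels, their multipliers, and the bias are known. The following Python sketch is only illustrative: the function and parameter names are hypothetical, the kernel is passed in as an argument, sgn(0) is mapped to +1 by assumption, and the values in the usage example are made up rather than obtained from training.

```python
def svm_decide(x, support_vectors, labels, alphas, bias, kernel):
    """Evaluate f(x) = sgn(sum_i alpha_i* y_i K(x_i, x) + b*)."""
    score = sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias
    # Only support vectors contribute; samples with alpha = 0 are not stored.
    return 1 if score >= 0 else -1


def linear_kernel(u, v):
    # Plain inner product, i.e., the identity mapping for phi.
    return sum(a * b for a, b in zip(u, v))
```

For instance, with support vectors (1, 0) and (−1, 0), labels +1 and −1, both multipliers equal to 0.5, and zero bias, the point (2, 0) is assigned to class +1 and the point (−3, 1) to class −1.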
4 Problem Definition and System Architecture Suppose that N wireless sensor nodes (s_1, s_2, s_3, …, s_N) are deployed in a 2-D region [0, D] × [0, D] (D > 0) and that the communication range R of every node is the same. Two nodes can communicate with each other if no signal obstruction exists between them and their geographic distance is within their communication range. There are K < N nodes s_i (i = 1, …, K) that know their own positions; these are called the beacon nodes of the wireless sensor network. The beacon nodes are reachable from one another, and the remaining nodes s_j (j = K + 1, …, N) can reach the beacon nodes.
5 SVM Model The 2-D area is partitioned into cells: each dimension is split into M cells, so the area covered by the network is divided into M × M cells. Each cell serves as a class label for the training data. Intuitively, the x coordinate has M = 2^m classes cx_i and the y coordinate has M = 2^m classes cy_j (Fig. 2 shows the decision tree used for the x dimension). The model assumes that every node belongs to a class [cx_i, cy_j] in the 2-D area; in other words, every node lies in the cell [(i − 1)D/M, iD/M] × [(j − 1)D/M, jD/M]. We denote the center of each cell by (x_i, y_j) and take the center of the predicted cell as the estimated position. If the above classification is correct, the location error for each node is at most D/(M√2). The training data is collected from the beacons in the network and sent to the head beacon, where the SVM algorithm is run; the SVM model is built through this training. The resulting model is then broadcast to all the nodes, and each node uses it to compute the cell in which it resides. Let (x(s_i), y(s_i)) denote the true coordinates of node s_i's position, and h(s_i, s_j) the hop-count length of the shortest path between nodes s_i and s_j. Each node s_i is represented by the vector s_i = ⟨h(s_i, s_1), h(s_i, s_2), …, h(s_i, s_K)⟩. The training data for the SVM is the set of beacons {s_i} (i = 1, …, K). We define the kernel function as a radial basis function because of its empirical effectiveness:
$$K(s_i, s_j) = e^{-\gamma |s_i - s_j|}$$
where |s_i − s_j| is the l_2 norm and γ > 0 is a constant to be computed during the cross-validation phase of the training procedure.
Fig. 2 Decision tree used for x dimension classification
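The quantities used by the model (hop-count feature vectors, the kernel, and the cell-center estimate with its error bound) can be sketched in Python as follows. The function names and the four-node chain in the usage note are hypothetical, the network is assumed to be connected, and the kernel follows the formula as printed above, with the l_2 norm not squared (the more common radial basis form squares it).

```python
import math
from collections import deque


def hop_counts(neighbors, source):
    """Breadth-first hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hop_vector(neighbors, node, beacons):
    """Feature vector <h(node, s_1), ..., h(node, s_K)> over the beacon list."""
    return [hop_counts(neighbors, b)[node] for b in beacons]


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * |u - v|), with |.| the l2 norm as printed."""
    return math.exp(-gamma * math.dist(u, v))


def cell_center(i, j, D, M):
    """Center of cell [(i-1)D/M, iD/M] x [(j-1)D/M, jD/M], with i, j in 1..M."""
    w = D / M
    return ((i - 0.5) * w, (j - 0.5) * w)


def max_cell_error(D, M):
    """Half the cell diagonal: the error bound when the cell is predicted correctly."""
    return D / (M * math.sqrt(2))
```

On a chain a, b, c, d with beacons a and d, node b is represented by the vector [1, 2]. For D = 100 m and M = 128, each cell is about 0.78 m wide, so a correctly classified node is estimated to within roughly 0.55 m.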
6 Large-Scale Training Sample Reduction Scheme for SVM SVM has good generalization performance and good classification accuracy. However, for a large-scale wireless sensor deployment, a localization algorithm based on SVM faces the problem of large-scale training samples. A large training sample set brings slow learning speed and a large storage demand. Furthermore, if the training samples are mixed with outlier data, the trained model suffers a decline in classification accuracy. These problems directly hinder the application of SVM to localization. Therefore, before training the SVM, the training samples have to be preprocessed to reduce their size and remove the outlier data. From the basic principle of SVM in Sect. 3.1, only the support vector (SV) samples are useful for classification, and the SVM training output is independent of the non-support vectors.
7 Determination of the Sample Point Type Based on FCM To decide the type of a sample point, one would need to compute the distance between every two points to obtain the neighborhood of each point. However, this requires O(n²) distance computations. When the training sample set is large, the time cost is high, and when the wireless sensor network is very large, deciding the point types in this way is not practical. Therefore, we propose a new approach based on a fuzzy clustering algorithm to decide the sample point types. First, the training samples are divided into c sub-classes. Second, the cluster center of each sub-class is taken to stand for all the points that belong to that sub-class; each cluster center must have a definite class. Then, the membership degree between every point and every cluster center is computed, and the neighborhood of every point is determined. Finally, the type of every point is decided according to the definitions. The fuzzy C-means clustering algorithm (FCM) has the advantages of a simple idea, easy understanding, fast convergence, and high efficiency, so it is chosen as the basis for deciding the sample point types.
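The clustering step can be illustrated with a small pure-Python sketch of the standard fuzzy C-means iteration, which alternates a membership update and a center update. The function names, the fuzzifier m = 2, the fixed iteration count, and the explicit initial centers are assumptions of this example; the rule that turns memberships into sample point types is not reproduced here.

```python
import math


def memberships(points, centers, m=2.0):
    """u[i][j]: degree to which point i belongs to cluster j (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by 0.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        table.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return table


def fcm(points, init_centers, m=2.0, n_iters=50):
    """Return (centers, memberships) after n_iters fuzzy C-means iterations."""
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u^m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / total
                for k in range(dim)
            ))
        centers = new_centers
    return centers, memberships(points, centers, m)
```

Each point's memberships sum to one, so a point close to a single cluster center receives a membership near 1 for that cluster, while a point lying between clusters receives comparable memberships for several of them.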
8 Simulation We tested the model on a network of 1000 sensor nodes located in a 100 m × 100 m 2-D area. We assumed a uniform random distribution for the sensor locations and for the selection of the beacon sensors. Three different beacon populations were considered: 20% of the network size (k = 200 beacons), 25% (k = 250 beacons), and 30% (k = 300 beacons). The x and y dimensions are each split into 128 divisions (M = 128). All simulations were run on a computer with an Intel Core2 Duo E7500 CPU and 2 GB of memory. The simulation platform is MATLAB 7.0, and the libsvm software is used for SVM classification. The output is given in Fig. 2. The results indicate that the localization accuracy of WUSN-SVM is higher than that of the machine learning baseline, because the outlier data mixed into the training samples can be removed by WUSN-SVM. As a result, the generalization of the trained model is better, and the node positions can be estimated more accurately. From Fig. 3, the average localization error of WUSN-SVM is about 2% lower than that of the machine learning baseline. Figure 4 compares the training time of WUSN-SVM with that of the machine learning baseline. When the beacon population is 20%, 26%, and 31%, the WUSN-SVM training time is about 20.13 s, 38.34 s, and 47.33 s, while the LSVM training time is about 37.81 s, 72.58 s, and 87.045 s. The simulation results show that the localization accuracy of WUSN-SVM is increased by 1.8% and that the overall training time is reduced by up to 54.7%.
Fig. 3 Node localization error of WUSN-SVM versus LSVM
Fig. 4 Difference in training time between WUSN-SVM and LSVM
9 Conclusion This study has presented WUSN-SVM, a distributed localization technique based on the concept of SVMs. LSVM learns from the already-known locations of a suitable set of beacons and uses them as training data for the learning procedure. Only mere connectivity information is used in WUSN-SVM, making it suitable for networks that do not require pairwise distance measurement or specialized (and/or mobile) assisting devices. WUSN-SVM nevertheless gives improved results. The simulation study has shown that WUSN-SVM outperforms Flow, a popular method that shares the same assumptions as LSVM. For small beacon populations, LSVM has been shown to be more robust and more accurate than AFL, a popular method that requires pairwise distance measurement. WUSN-SVM alleviates the border problem and remains effective in networks with coverage holes or obstacles, from which many other methods currently suffer. The communication and processing overhead is kept small. In network scenarios that can afford assisting devices to measure pairwise distances, WUSN-SVM, because it is simple and converges quickly, may be used to provide good starting locations for the devices.
Master–Slave Robots Using Swarm Intelligence T. Suraj Duncan, T. R. Jayanthi Kumari, Rithin John, and B. P. Aniruddha Prabhu
Abstract Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, artificial or natural. The idea is used in work on artificial intelligence. Nature-inspired optimization algorithms are applied efficiently to solve low-, mid- and high-level computer vision problems. Typically, classical strategies depend on cost function derivatives to reach an optimal solution. This can be infeasible in many practical circumstances. Nature-inspired algorithms tend either to reproduce the most suitable results or to move the individual closer to the best feasible solution over time, mimicking phenomena found in nature. The artificial bee colony algorithm can be used to solve traveling salesman problems. Its principles are applied to such problems using partial optimization techniques, which provide theoretical foundations and some experimental outcomes on several datasets. Hence, it is dependable in effectiveness, usability and researchability. Keywords Swarm intelligence · Asymmetric communication · Artificial bee colony algorithm · Nature-inspired optimization
T. Suraj Duncan (✉) · R. John · B. P. Aniruddha Prabhu, Department of Computer Science and Engineering, Cambridge Institute of Technology, Bengaluru, Karnataka, India
T. R. Jayanthi Kumari, Department of Electronics and Communication Engineering, East Point College of Engineering and Technology, Bengaluru, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_5
1 Introduction Human life has changed considerably over the past 10 years. In the current circumstances, it is not practical for a human to do menial work such as carrying material from one place to another, or from source to destination [1]. In places such as factories where lifting heavy weights is necessary, these transport tasks need not be carried out by people. In such cases, robots can be helpful. The main purpose of our paper is to demonstrate that a group of robots exhibiting swarm intelligence can be deployed to perform tasks that are unfavorable or even harmful to humans. This not only reduces the load on humans but also reduces the risks posed by human intervention. Swarm robotics can be employed to deal with multirobot coordination and its frameworks. The collaborative behavior emerges from the cooperation among the swarm members and their communication with the leader. This technique of information gathering arose from the swarm behavior observed in biological studies of insects such as ants, bees and wasps, which are present in different geographical locations and in which swarm intelligence occurs. Individuals within the group cooperate by exchanging locally available information so that the problem (global objective) is solved more effectively than it would be by a single individual. The problem-solving behavior that emerges from such interactions is called swarm intelligence [2]. Our paper places its emphasis on two kinds of robots. The first kind is the explorer robot, which predominantly performs the activities of a leader. The second kind is the carrier robot, which follows the orders of the explorer robot. In other terms, the explorer robot can be called the master and the carrier robot the slave. Ultimately, the swarm robots follow the master–slave concept, like an ant colony or the honey bees in a beehive. This gives rise to the concept of swarm intelligence [3]. The motivation for this work is given in Sect. 2. Section 3 throws light on the two key concepts that are incorporated in this paper. Section 4 demonstrates the architectural background of the master–slave robot. The simulation of swarm intelligence is carried out in Sect. 5, which demonstrates the clustering, aggregation and dispersion of the robot swarm. The article concludes in Sect. 6 with some highlights of future research directions.
2 Motivation The collaborative behavior of insects, such as the honey bee's dance, the nest building of wasps and the trail following of ants [4], can be considered one of the most fascinating and mysterious aspects of biology. The concept of a leader and a group of followers can be applied to solve modern-day problems. These swarm robots have the characteristics of robustness, scalability, flexibility in movement, tolerance of non-critical failures, adaptability to the environment, etc. Besides load sharing, they are intended for various applications such as firefighting, farming and data sharing. This prevents overuse of resources.
3 Related Work 3.1 Camel's Behavior Camels generally spend time searching for food over large areas during the cooler part of the day. Camels do not go out alone; they always travel in a group, which is called a "caravan". These groups can be of different kinds, such as a group of only male camels, a group of only female camels, or a group of female camels led by one male camel. A camel group has a leader, usually an older male camel, and the rest of the camels simply follow the leader. Camels can find the way back home without any human help or guidance. They find or sense food and water by smell. The leader of the herd directs the other camels to places where there is a higher probability of finding food [5].
3.2 Artificial Bee Colony Algorithm
A. Basic Principle of the ABC: To understand the ABC, it is necessary to first learn about the following roles:
• Hunting bees: Travel to a food source and broadcast information about it.
• Following bees: Stay in the vicinity of the waggle dance and decide upon a promising food source based on the information given by the hunting bees.
• Detecting bees: Search at random for new food sources in the vicinity of the beehive.
The bees are able to exchange their roles in different situations. The block diagram of the bees' role exchange is shown in Fig. 1.
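The three bee roles described above can be illustrated with a minimal sketch of an artificial bee colony loop. This is not the authors' implementation: the cost function `sphere`, the function name `abc_minimize` and all parameter values (number of food sources, abandonment limit, number of cycles) are assumptions chosen only to keep the example small.

```python
import random


def sphere(x):
    """Toy cost function (an assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def abc_minimize(cost, dim=2, n_sources=10, limit=20, cycles=200, bound=5.0, seed=0):
    """Minimal ABC loop for a non-negative cost; parameter values are illustrative."""
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources  # failed improvement attempts per food source
    best_cost = min(costs)
    best_pos = list(sources[costs.index(best_cost)])

    def explore(i):
        """Perturb one coordinate of source i relative to another random source."""
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = max(-bound, min(bound, candidate[d]))
        c = cost(candidate)
        if c < costs[i]:  # greedy selection: keep the better source
            sources[i], costs[i], trials[i] = candidate, c, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        # Hunting bees: each one exploits its own food source.
        for i in range(n_sources):
            explore(i)
        # Following bees: choose sources with probability weighted by quality.
        weights = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            explore(i)
        # Remember the best source found so far before any source is abandoned.
        i_best = costs.index(min(costs))
        if costs[i_best] < best_cost:
            best_cost, best_pos = costs[i_best], list(sources[i_best])
        # Detecting bees: abandon exhausted sources and search at random.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
    return best_pos, best_cost
```

Calling `abc_minimize(sphere)` returns a position close to the origin together with its cost. The role exchange appears as the reset of an exhausted source: the bee that was exploiting it effectively becomes a detecting bee for one cycle.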
4 System Architecture Figure 2 shows the architecture of the master–slave configuration controlled using an RF receiver and transmitter. It involves master–slave asymmetric communication implemented in a centralized fashion, where one device controls one or more other devices, thereby enabling communication among them. It supports effective use of microcontrollers with an IDE (integrated development environment) for various improvements and developments. Programming the robots logically plays a major role in efficient and effective performance during operation. The basic framework of the robots (swarms) is achieved by a uniform and systematic chassis assembly, which provides a rigid base structure for placing the various components.
Fig. 1 Role exchanging of bees
Fig. 2 System architecture of master–slave configuration-based controlled using RF receiver and transmitter
Radio communication via an RF (radio frequency) transmitter and receiver takes place among the robots (swarms) for proper coordination and cooperation [6]. DC (direct current) motors and motor drivers are used for uniform and systematic movement of the robots (swarms). The chassis, the programmed Arduino Uno and the various other components are interfaced systematically to obtain a robot (swarm) that is ready to operate. Finally, a line-following application is used as the task to verify successful master–slave working. Following this simple, step-by-step procedure gives a proper methodology for swarm robots using swarm intelligence and the master–slave concept, which is an integral part of their working (Figs. 3 and 4).
Fig. 3 Asymmetric communication of master–slave robot
Fig. 4 The master (left) and the slave (right) robot
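The one-way master–slave command flow described in this section can be sketched in plain Python, without any hardware. The command names, the two-sensor line-following rule and the classes `Master` and `Slave` are hypothetical; the paper does not specify the message format sent over the RF link, so the broadcast is modeled here as a simple method call.

```python
from dataclasses import dataclass, field
from typing import List


def line_follow_command(left_on_line: bool, right_on_line: bool) -> str:
    """Assumed two-sensor rule: steer toward the side that sees the line."""
    if left_on_line and right_on_line:
        return "STOP"
    if left_on_line:
        return "LEFT"
    if right_on_line:
        return "RIGHT"
    return "FORWARD"


@dataclass
class Slave:
    executed: List[str] = field(default_factory=list)

    def receive(self, command: str) -> None:
        # A real slave would drive its motors here; the sketch only logs.
        self.executed.append(command)


@dataclass
class Master:
    slaves: List[Slave]
    executed: List[str] = field(default_factory=list)

    def step(self, left_on_line: bool, right_on_line: bool) -> str:
        command = line_follow_command(left_on_line, right_on_line)
        self.executed.append(command)
        for slave in self.slaves:  # asymmetric: master talks, slaves listen
            slave.receive(command)
        return command


def run_demo():
    slaves = [Slave(), Slave()]
    master = Master(slaves)
    readings = [(False, False), (True, False), (False, False),
                (False, True), (True, True)]
    for left, right in readings:
        master.step(left, right)
    return master, slaves
```

In this sketch every slave ends up with the same command log as the master, which mirrors the behavior reported in the paper where the slave robot repeats the actions of the line-following master.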
5 Simulation Swarm simulation is an open-source package which is useful for showing the interaction of agents, such as social or biological agents, and their emergent collective behavior [7]. It also plays a major role in various scientific, industrial and commercial applications. Simulation and other agent-based modeling platforms give scientists an opportunity to study, run and visualize different types of applications in synthetic macro- and micro-system environments. Simulation, along with computational studies and mathematical models, supports the study of effective and efficient swarm behavior [8]. Computer simulation of swarms also allows trial-and-error experiments before the actual experiment is conducted in real-time applications and environments. Various algorithms, optimizations and real-life problems can be explored easily using simulators. Simulation also helps in efficient analysis and swarm prediction, which can be used in forecasting problems. Offline programming (OLP) and simulation are powerful integration tools and help end-users save time and money when performing a particular task [9]. Minimizing cost and saving time are among the most important reasons for simulating swarm robots [10]. Simulation also helps in innovation and in obtaining the required target outputs for various test cases in a short period of time. It can imitate a particular process or task under rigorous conditions, which helps increase the performance of the existing physical swarms (robots) and provides scope for future enhancements. Figure 5 shows the formation of clusters of the robots. It also shows the aggregation, dispersion and formation of various geometrical shapes. In Fig. 5, demo_1.py is the first demo, in which the robots first aggregate to form a random network [11].
They run consensus decision making to choose the target loop shape, and then perform role assignment, relaying messages for target assignment [12]. The robots disperse and aggregate again to form a loop with the robots in their designated order. The loop then reshapes to the chosen shape. In Fig. 5, demo_2.py is the second demo, which combines the previous simulations differently. The robots first aggregate to form a loop. They run consensus decision making to choose the target loop shape, followed by role assignment on the loop using an adapted consensus algorithm. While role assignment is running, the robots dynamically reshape to the chosen shape.
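The consensus step in the two demos can be illustrated with a small sketch. The algorithm used inside demo_1.py and demo_2.py is not given in the text, so the following swaps in a simple, well-known scheme as an assumption: every robot repeatedly adopts the highest-priority proposal among itself and its neighbors, which converges on any connected network. The function name `run_consensus`, the ring topology and the priority values are all hypothetical.

```python
def run_consensus(neighbors, preferences, priorities):
    """Flood the highest-priority shape proposal through the network.

    neighbors:   dict robot -> list of neighboring robots (connected graph)
    preferences: dict robot -> preferred shape name
    priorities:  dict robot -> unique number used to break disagreements
    Returns (dict robot -> agreed shape, number of rounds that changed state).
    """
    state = {r: (priorities[r], preferences[r]) for r in neighbors}
    rounds = 0
    while True:
        new_state = {
            r: max([state[r]] + [state[n] for n in nbrs])
            for r, nbrs in neighbors.items()
        }
        if new_state == state:  # nothing changed: every robot agrees
            break
        state = new_state
        rounds += 1
    return {r: shape for r, (_, shape) in state.items()}, rounds


# Six robots on a loop; robot 2 holds the highest priority.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
prefs = {0: "circle", 1: "square", 2: "triangle",
         3: "circle", 4: "square", 5: "circle"}
prio = {0: 0.31, 1: 0.52, 2: 0.97, 3: 0.15, 4: 0.64, 5: 0.08}
decision, rounds = run_consensus(ring, prefs, prio)
```

On this loop the proposal of robot 2 reaches the farthest robot after three hops, so the swarm settles on one shape after three rounds of neighbor-to-neighbor exchange.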
6 Conclusion and Future Enhancements We have successfully built swarm robots that work on the principle of master–slave communication. The slave robot follows the actions of the master robot, which has the capability to follow a line.
Fig. 5 a Aggregation. b Formation of predefined shapes. c Representation of swarm robots forming clusters
The data is transferred from the master robot to the slave robot using an RF receiver and transmitter. A line-following application is used as the task to verify successful master–slave working. The proposed system is a master–slave relationship based on a nature-inspired optimization technique, swarm intelligence. This system can be enhanced by including additional technologies such as artificial intelligence, machine learning and deep learning, which would give the master–slave robots the capability to operate on their own under rigorous conditions without human intervention.
References
1. Yogeswaran M (2010) Swarm robotics: an extensive research review
2. Hasan Y (2017) Swarms robots and their applications. J Comput Eng 19(1):46–47
3. Mohan Y, Ponnambalam SG (2009) An extensive review of research in swarm robotics. In: 2009 world congress on nature & biologically inspired computing (NaBIC). IEEE, pp 140–145
4. Sun X, Liu T, Hu C, Fu Q, Yue S (2019) ColCOSΦ: a multiple pheromone communication system for swarm robotics and social insects research. In: 2019 IEEE 4th international conference on advanced robotics and mechatronics (ICARM), pp 59–66
5. Ahmed ZO, Sadiq AT, Abdullah HS (2019) Solving the traveling salesman's problem using camels herd algorithm. In: 2019 2nd scientific conference of computer sciences (SCCS), Baghdad, Iraq, pp 1–5
6. Li G et al (2019) A master-slave separate parallel intelligent mobile robot used for autonomous pallet transportation. Appl Sci 9(3):368
7. Li X, Zheng Y (2019) Artificial bee colony algorithm and its application in traveling salesman problems. In: 2019 Chinese control and decision conference (CCDC), Nanchang, China, pp 1–5
8. Vaughan R (2008) Massively multi-robot simulation in stage. Swarm Intell 2(2–4):189–208
9. Census C et al (2019) Particle subswarms collaborative clustering. IEEE Trans Comput Soc Syst 6(6):1165–1179
10. Shi Z, Tu J, Wei J, Zhang Q, Zhang X (2011) The simulation scenario for swarm robots based on open-source software Player/Stage. In: 2011 IEEE international workshop on open-source software for scientific computation, Beijing, pp 107–113
11. Trianni V, Groß R, Labella TH, Şahin E, Dorigo M (2003) Evolving aggregation behaviors in a swarm of robots. In: Banzhaf W, Ziegler J, Christaller T, Dittrich P, Kim JT (eds) Advances in artificial life. ECAL 2003. Lecture notes in computer science, vol 2801
12. Valentini G (2017) Discrete consensus achievement in artificial systems. In: Achieving consensus in robot swarms. Springer, Cham, pp 9–32
An IoT-Based Approach in Automatic Garbage Segregation to Develop an Intelligent Garbage Segregator Amara Aditya Manikanta, Rohit Tanwar, Aviral Kumar Srivastava, and Kratika Arora
Abstract Effective waste management is a major problem across the world. The lack of dumping space, low awareness among people and the need for substantial financial resources are some of the issues that add to the criticality of the problem. Segregation of waste is the preliminary step in managing waste. A lack of adequate knowledge about it leaves a majority of the population unable to segregate and store waste at the domestic level or while dumping. Generally, the waste is segregated manually at dumping sites, and from there the different types of waste are sent to their destinations. This requires more workforce and more money for the daily wages of the workers. In this paper, a brief study of smart bins has been done. A smart device has been proposed to segregate garbage using an IoT-based approach. The main aim of the project is segregation of the waste into biodegradable and non-biodegradable. Keywords IGS · IoT · Waste sensing · Inductive sensor · Capacitive sensor · Sensors · Garbage segregation
A. Aditya Manikanta (✉) · R. Tanwar · A. K. Srivastava · K. Arora, University of Petroleum & Energy Studies (UPES), Bidholi, Dehradun, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_6
1 Introduction Wastes in pre-modern times were mainly ashes and biodegradable in nature, but today we have to deal with a large amount of garbage that is divided into various categories such as biodegradable or non-biodegradable, metallic or non-metallic, and liquid or solid waste. This has become a major issue for waste management authorities, who are unable to manage this huge and growing amount of garbage [1]. Our "Intelligent Garbage Segregator" aims to resolve this problem by segregating the garbage at the initial phase of garbage disposal itself. This device will categorize the input waste into different categories and transfer it to a separate portion of the bin. The population of India is growing by the day, but the area of land remains the same. Such a mismatch between the growth of population and the resources results in a rapid increase in demand for the daily needs of the people and consequently increases the dumping of waste from each house. India has an estimated five thousand towns, and the amount of waste produced in India grew from 6 million tons in 1947 to 48 million tons in 1997, a yearly growth rate of 4.25%. By a rough calculation, there will be 300 million tons of waste by 2047. The amount of waste dumped by each house can be large or small. Throughout the history of India, a changing civilization and a multitude of cultures have resulted in many types of traditions and many groups of waste producers. Even today, India does not match many other countries in the funding it devotes to the waste it produces. Therefore, even though other countries produce more waste than our country, we are still lagging behind because we do not properly allocate the financial framework required to support overall waste management in India [1]. India is the seventh-largest country in the world by area and one of the most populous, but has comparatively little land that is suitable for habitation. This shortage of land plays a major role in waste management in India today. Moreover, the waste is separated only at the dumping sites, and therefore these dumping sites occupy a great deal of land. Many large areas are allocated for dumping waste, and most of them store the waste for a long period of time. Under the "Swachh Bharat Abhiyan," the government has allocated seventeen hundred billion rupees for the segregation of the waste collected under this scheme. These different types of waste are segregated by uneducated people who work for daily wages and sustain a simple livelihood. Recently, an article published on June 15, 2018, "'The Dump Killed My Son': Mountains of Garbage Engulf India's Capital" [2], drew attention to the dangers of this work. Figure 1 relates to the story of two people who worked as garbage separators and died when a huge dump collapsed and carried both of them, along with the waste, into a nearby canal.
Relying on more manpower increases the time and the cost per day. Moreover, although manpower is available in the service sector in this decade, this will not remain the case, since most people are completing higher education and developing their skills. So, we cannot expect more people to work in the service sector for daily wages. Many industries have already adopted automation techniques by deploying a greater number of machines and using less manpower. Some of the solutions that have already been proposed address the separation of waste, locating the bins and knowing the level of the waste inside the bin. This paper presents a study of previously proposed bins and a new design of an IoT-enabled smart bin. The intelligent garbage segregator is intended as an upgraded version of many of the proposed and implemented ideas.
Fig. 1 In the area of Delhi, which includes our capital New Delhi, the heaps of waste are immense testaments to India's growing waste disaster [2]
2 Literature Survey The research work "SWS—Smart Waste Segregator Using IoT Approach" by Payal Srivastava and her team in 2019 focuses on developing a device that separates garbage into various categories using inductive sensors, capacitive sensors and IR sensors [3]. This paper played a key role in providing references for our research. Paul Couderc and Yann Glouche in 2013 proposed a framework for the waste collected in Europe. The authors' approach of identifying the type of waste and segregating it inspired the idea for our device. Their original idea was to recognize waste when it is placed in front of the bins and then identify its type. The major limitation of their idea was the use of RFID tags for the identification of the waste. In India, many items in shops, including eatables, do not carry RFID tags [4]. Similarly, Vuayalakshmi et al. in 2016 proposed a system that alerts the municipal department at the nearby municipal office. Here RFID tags are used for the bin implementation, but there is no separate mechanism for the separation of the waste [5]. As discussed earlier, we cannot implement a segregator with RFID tags in India, but their idea of using the cloud for the locality contributed a crucial approach to our framework. Chen et al. in 2018 discussed a smart waste management system using an IoT architecture, but it did not separate the types of waste. They implemented a proper IoT framework for the bins that need to be collected and dumped [6]. Sharma et al., in their paper titled "Smart Bin Implementation for Smart Cities", used the GSM cellular network to analyze and gain insights into the status of waste in the entire city. This was a unique feature for waste management in urban cities [7]. Das et al. also proposed a smart garbage monitoring and alert system that uses an Android application, a NodeMCU with an inbuilt Arduino microcontroller and IFTTT.
This system notifies the cleaning authority to empty the bin on time. It also has a special feature to detect harmful gases when they reach a certain level [8]. Shankar Balu et al. in 2020 implemented a real smart bin in which the waste is separated into dry and wet waste with the help of sensors [9]. However, it is limited to small places such as office cabins and schools. Harshith and his team developed a smart garbage segregator used to collect recyclable and reusable waste, which can further be categorized into wet and dry waste, something other models had not done until then. The model used a metal detector, a moisture sensor, a conveyor roller and an HTLS-geared DC motor [10]. Rafeeq et al. in 2016 proposed an automation of waste material segregation which uses inductive, metallic and capacitive sensors. They improved this model by using the automation of material segregation (AMS) method, which effectively segregates metal, glass and plastic [11]. Table 1 compares the above proposed bins in terms of the sensors used and their drawbacks.
3 Problem Statement Formulation The inadequate gathering of waste leads to several drawbacks in the field of waste management. The proposed IGS is intended not only to solve the problem of garbage collection but also to address many of the technical, economic, cultural, organizational, legal, environmental and health aspects that the earlier solutions were not capable of resolving. The problem statement is therefore, "The absence of a suitable garbage segregator with IoT capabilities for use as garbage bin equipment for proper waste segregation and management services." Based on the literature reviewed and the analysis of other proposed models, only capacitive, inductive and IR sensors are used for the initial implementation of the project. Waste collection facilities are to be set up, and the IGS bins are to be made available at such points. Besides this, a recycling plant is also to be set up. The revenue generated from the sale of recyclable items can be used for hiring local employees for the scheduled transport and delivery of the bins to and from the recycling plant. As an efficient smart garbage disposal system helps to reduce environmental hazards such as landfills and the spread of diseases, using such equipment would prove beneficial for society. A network of schools, businessmen and volunteers can be set up to help spread awareness of the program and thus encourage society at large to use the smart systems and help conserve the environment. Using an economically stable system would result in considerable economic benefits over the traditional methods. Adequate staff must be selected, and the employees must be trained accordingly to obtain the best results. Thus, the proper use of such a system has long-term benefits, and IGS can play a vital role in improving the position of waste management in our country.
However, such a model is only going to be proposed by us for the consideration of this research paper, but on the ground there will be many more factors which have to be taken in consideration before formation of any such model. This is why the framework of the above said model is going to be project specific, as the requirements and the challenges faced by each locality are going to be different.
Table 1 Remarks of the proposed models in citations (2013–2020)
Citation | Year | Method of segregation | Types of sensors used | Remarks
Couderc and Glouche [4] | 2013 | RFID tags are used for segregation of the waste | RFID tags | It cannot segregate waste that does not carry RFID tags
Sharma et al. [7] | 2015 | There is no method of segregation of waste, but it can notify when the bin gets full | Ultrasonic sensor to sense the level of waste | It cannot segregate the waste
Vuayalakshmi et al. [5] | 2016 | RFID tags are used to separate the waste, and a notification to the municipal corporation when the bin gets filled is implemented | RFID tags | In India, not every product has RFID tags, so this is not practical in India
Rafeeq et al. [11] | 2016 | Automation of material segregation to segregate into metal, plastic and glass categories | Inductive, metallic and capacitive sensors | This is not a smart bin, but a segregator for metal, plastic and glass material
Chen et al. [6] | 2018 | A complete framework for management of bins over a smart city was proposed. No method for segregation of waste | No sensors are used for sensing | It does not solve the problem of segregation of waste for a particular bin
Srivastava et al. [3] | 2019 | Three sensors are used to sense the waste, with cloud connectivity to send a notification when the bin is full | Inductive sensor, capacitive sensor and IR sensor | No proper design of the bin. Only the hardware components and connections are given
Shankar Balu et al. [9] | 2020 | It is an Arduino-based implemented project for segregation of dry and wet waste using sensors | Inductive, capacitive and IR sensor | It is implemented in cardboard, and it is very small, so it cannot be used at offices, schools, etc.
Jayita Das et al. [8] | 2020 | Only notifies if the bin reaches its full level. No method of segregation of waste | NodeMCU and Arduino for sending notification | The waste cannot be segregated into biodegradable and non-biodegradable
Harshith et al. [10] | 2020 | The waste is separated into dry and wet waste by passing it on a conveyor roller and sensing it | Metal detector and moisture sensor | The setup is large, it is not feasible over a small area, and it is not designed as a bin
4 Proposed Work On the basis of the models discussed earlier and their outcomes, a serious concern toward the waste segregation and waste management is the need of the hour. The objective, the design and the process flow are discussed in the following sections, which would further provide a better picture of the smart garbage segregator.
4.1 Objective The waste is to be segregated into different bins before dumping, in an efficient manner that solves environmental problems without creating any additional ones. Dumping is scheduled according to the decomposition time of the particular type of waste in the bin. The target accuracy of the implementation is set at 80% in differentiating the basic waste dumped daily, and the proposed model is to include an IoT design architecture.
4.2 Proposed Device The device is designed to be suitable for most households and offices. It consists of a compartment at the top in which the sensing is done. The waste placed here is categorized by the sensors attached inside it, which are connected to a microcontroller that senses and segregates the waste. The lower portion of the device is separated into two sections, where one side collects biodegradable waste and the other side collects non-biodegradable waste. A turning disk inside the tray, attached to servo motors, opens according to the type of waste. The device was designed after considering many different types of smart waste bins and is intended to be deployable across an entire city. The device can also be monitored by setting up the required network.
Fig. 2 Conceptual design of smart garbage segregator
Ultrasonic sensor is used for the level detection of the bin so that the end user is easily notified about the status of the filled bin via the front-end services. Figure 2 represents the conceptual design of the smart garbage segregator.
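The level detection described above reduces to a short calculation once the ultrasonic sensor reports the distance from the lid to the top of the waste. The sketch below assumes the sensor is mounted at the top of the bin and looks straight down; the function names, the bin depth and the 90% notification threshold are hypothetical values, not taken from the paper.

```python
def fill_percentage(distance_cm: float, bin_depth_cm: float) -> float:
    """Convert a lid-to-waste distance into a fill level between 0 and 100."""
    if bin_depth_cm <= 0:
        raise ValueError("bin depth must be positive")
    filled_cm = bin_depth_cm - distance_cm
    percent = 100.0 * filled_cm / bin_depth_cm
    # Clamp noisy readings that fall outside the physical range of the bin.
    return max(0.0, min(100.0, percent))


def is_full(distance_cm: float, bin_depth_cm: float,
            threshold_percent: float = 90.0) -> bool:
    """Decide whether the end user should be notified (threshold is assumed)."""
    return fill_percentage(distance_cm, bin_depth_cm) >= threshold_percent
```

For a 40 cm deep bin, a reading of 10 cm corresponds to a 75% fill level, and a reading of 2 cm would trigger the notification under the assumed threshold.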
4.3 Block Diagram of the Circuit Further hands-on research is planned by implementing the IGS with the sensors mentioned above. The block diagram shows the connections of these sensors to the microcontroller. The garbage is put into the tray section of the device, where the IR sensor tests the fullness or emptiness of the tray. Sensors required: Inductive sensor, capacitive sensor, IR sensor and servo motors for opening the tray. Sensing: The various sensors sense the waste and categorize it into different types of garbage. For example, this garbage bin can categorize the waste into two types: biodegradable waste on one side of the bin and non-biodegradable waste on the other side, as shown in Fig. 3. Furthermore, the ultrasonic sensors test the fullness or emptiness of each bin and alert the user once the bin is full. Actuation: The two perpendicular DC servo motors switch the flaps and transfer the garbage to the right bin. Note: As the project has not yet been implemented on a real-time basis, the Raspberry Pi is assumed as the working microcontroller device for this model.
A. Aditya Manikanta et al.
Fig. 3 Block diagram of the sensing unit (ultrasonic sensor, inductive sensor, capacitive sensor, IR sensor and servo motors connected to a Raspberry Pi or other microcontroller)
4.4 Flowchart
Three sensors are used to differentiate the waste. The inductive sensor module detects the amount of metallic substance in the waste; once it senses a large amount of metal, the waste is sent directly to the metallic (non-biodegradable) waste bin. If the result is negative, the waste goes to the IR sensor module, which senses opaque objects; an opaque object is considered biodegradable and is sent to the biodegradable bin. If this result is also negative, the waste goes to the capacitive sensor. If the capacitive sensor detects a very small amount of metallic and non-metallic waste together with some dielectric containing metal, the waste is considered non-biodegradable. Refer to Table 2 and the flowchart (Fig. 4) for the implementation model, as discussed in [3]. Table 2 compares the components and design of our proposed model with those of the previously cited works.
Table 2 Components comparison of all the discussed models with IGS
Citations
[10]
[9]
[8]
[3]
[6]
[11]
[5]
[7]
[4]
IGS
Inductive sensor
✓
✓
✓
✓
Capacitive sensor
✓
✓
✓
✓
IR sensor
✓
✓
✓ ✓
Ultrasonic sensor Check for biodegradable and non-biodegradable Notification to hub when bin gets full Product design
✓
✓
✓ ✓
✓
✓ ✓
✓ ✓
✓
✓ ✓
Fig. 4 Flowchart of the working of sensors
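The decision cascade of Sect. 4.4 can be sketched as a short function. This is a minimal illustration, not the authors' implementation: the boolean sensor flags, the class and function names, and the "unclassified" fallback (the text does not say what happens when all three sensors report a negative result) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SensorReadings:
    metal_detected: bool       # inductive module reports a large amount of metal
    opaque_detected: bool      # IR module reports an opaque object
    capacitive_detected: bool  # capacitive module reports a detection


def classify_waste(readings: SensorReadings) -> str:
    """Apply the sensors in the order inductive, IR, capacitive."""
    if readings.metal_detected:
        return "non-biodegradable"
    if readings.opaque_detected:
        return "biodegradable"
    if readings.capacitive_detected:
        return "non-biodegradable"
    # Assumption: the source does not define this case.
    return "unclassified"
```

Because the checks are ordered, a metallic object is routed to the non-biodegradable side even if the IR module would also have reported it as opaque.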
5 Challenging Factors in India
Implementing IGS under real-world conditions may pose many challenges. This section deals with the major ones.
5.1 Device Implementation
Accurate sensors are needed for sensing and separating the waste. To accomplish this, a small-scale industry must be established to design the device and test it under real-world conditions before it is delivered to an end user. An office for software and hardware implementation is also required, and it should be able to ensure the quality and safety parameters of the device. While it is easy to set up such an office, the price of the device must also be regulated so that it is affordable to every citizen.
5.2 Project-Based Investment
The cost of the components and the design of the model will differ with the application of IGS, because for every project different factors govern the overall segregation and waste disposal. These factors may be local to the area where the bin is used, or may depend on the type of place or building where it is installed. More money is needed to install IGS in every house individually, while less is required if it is part of a larger waste disposal system with shared responsibilities. One of the best solutions is to make people aware of such devices and encourage them to install them with their own savings or with the collective funds of several NGOs, so that the government's share of the investment in many such devices is reduced.
5.3 Acceptance by People
India is becoming a younger country every year, and many of its youths are educated. However, many people are busy with their jobs and would not be willing to spend on the installation of such devices, and people in rural areas may not know how to install or use them. A proper information drive is therefore necessary, and this will be a major challenge in the successful implementation of IGS. Firstly, the government has to help people below the poverty line by providing subsidies and monthly installment schemes in both rural and urban areas. Secondly, road shows and awareness camps must be organized to explain the importance of waste management and the consequences of its absence. Finally, the government must be assertive and strict about the implementation and take action against people who do not install the device, considering that waste management is a larger problem for all of us; only when we understand its need can we build a healthy and safe society.
5.4 Regular Monitoring
To monitor these devices, service stations are needed in their vicinity so that the present waste management system can be changed. As discussed before, the devices must be monitored daily, and a good administration must be put in place for the system. As an added benefit, this creates new job opportunities. The people employed must have a good knowledge of the implemented model and must be capable of providing support and maintenance at any time. The government can provide internship opportunities for students to teach them the importance and implementation of the project, which will further help in recruiting more monitoring and support employees.
6 Conclusion and Future Scope
Implementing the device described in this paper would give a safe and secure system for waste management. Our further research aims to minimize the cost of the device so that every person can afford to install IGS at home or in the office. Research on this will not end here, as different aspects remain to be studied, but the device will definitely have an impact on the waste management system if it is deployed efficiently. Improvements can be made to segregate the mixed waste produced daily in our country by the use of buffer spaces, and advanced processing techniques can be incorporated once the waste has been segregated. The future scope in the development of IGS is to improve the accuracy of waste identification through more research on the types of sensors used. The challenges and features were formulated hypothetically, and a survey of citizens is needed on adopting such a waste management architecture over a sufficiently large area. The objective of this paper is only to motivate students of coming generations to see that engineering is used to solve daily-life problems, and to keep refining their solutions iteratively, obtaining better outputs each time.
Acknowledgements We would like to express our gratitude to our mentor, Dr. Rohit Tanwar, for his sincere guidance throughout the research work, from the selection of the title to the analysis and integration of the different components of this paper. His vast knowledge, motivation and exemplary guidance were key to our completing the paper on this challenging domain with such ease.
References
1. Egware HO, Ebu-nkamaodo OT, Linus GS (2016) Experimental determination of the combustion characteristics of combustible dry solid wastes. Res J Eng Environ Sci 1(1):154–161
2. https://www.todayonline.com/world/dump-killed-my-son-mountains-garbage-engulf-indiascapital
3. Srivastava P, Deep V, Garg N, Sharma P (2019) SWS: smart waste segregator using IoT approach. In: Malik H, Srivastava S, Sood Y, Ahmad A (eds) Applications of artificial intelligence techniques in engineering. Advances in intelligent systems and computing, vol 698. Springer, Singapore
4. Glouche Y, Couderc P (2013) A smart waste management with self-describing objects. In: Leister W, Jeung H, Koskelainen P (2013) The second international conference on smart systems, devices and technologies (SMART'13), June 2013, Rome, Italy
5. Kumar NS, Vuayalakshmi B, Prarthana RJ, Shankar A (2016) IOT based smart garbage alert system using Arduino UNO. In: Region 10 conference (TENCON), 2016 IEEE. IEEE, pp 1028–1034
6. Chen W-E, Wang Y-H, Huang P-C, Huang Y-Y, Tsai M-Y (2018) A smart IoT system for waste management. In: 2018 1st international cognitive cities conference
7. Sharma N, Singha N, Dutta T (2015) Smart bin implementation for smart cities. Int J Sci Eng Res 6(9). ISSN 2229-5518
8. Das J, Pramanik A, Parui SK (2020) Smart garbage monitoring and alert system using IoT. In: Kundu S, Acharya U, De C, Mukherjee S (eds) Proceedings of the 2nd international conference on communication, devices and computing. Lecture notes in electrical engineering, vol 602. Springer, Singapore
9. Balu TMBS, Raghav RS, Aravinth K, Vamshi M, Harikumar ME, Rolant Gini J (2020) Arduino based automated domestic waste segregator. In: 2020 5th international conference on communication and electronics systems (ICCES), Coimbatore, India, pp 906–909. https://doi.org/10.1109/ICCES48766.2020.9137977
10. Harshith R, Karthik Y, Hegde P, Tejas SBN, Shivalingappa D, Kumarswamy HS (2020) Development and fabrication of smart waste segregator. In: Vijayaraghavan L, Reddy K, Jameel Basha S (eds) Emerging trends in mechanical engineering. Lecture notes in mechanical engineering. Springer, Singapore
11. Rafeeq M, Ateequrahman, Alam S, Mikdad (2016) Automation of plastic, metal and glass waste materials segregation using arduino in scrap industry. In: International conference on communication and electronics systems (ICCES), Coimbatore, pp 1–5. https://doi.org/10.1109/CESYS.2016.7889840
12. Oma R et al (2018) An energy-efficient model of fog and device nodes in IoT. In: 2018 32nd international conference on advanced information networking and applications workshops (WAINA). IEEE
Multi-view Deep Learning for Weather Recognition
Shweta Mishra, Saurabh Kumar, and Vipin Kumar
Abstract Multi-class weather recognition is a challenging task in computer vision and machine learning. In this research work, a novel multi-view deep learning method called MMDeep is proposed to handle the weather recognition problem in a multi-class scenario. The proposed method obtains multiple views (multiple distinct interpretations) of the images by using pixel-wise computer vision operations and then deploys multi-view deep learning on these views. In this method, natural views and mathematical views are used to learn models, and the collective performance is obtained from an ensemble of the results of the individual views. The proposed method is compared with baseline methods as well as other models proposed for the same task. The model produces a mean accuracy of 0.9746 with a standard deviation of 0.0127, which is significantly better than all the models compared with this method. The statistical analysis of the results also shows the effectiveness of the proposed method compared with state-of-the-art methods.
Keywords Weather recognition · Multi-view deep learning · Ensemble learning · Computer vision · Machine learning
S. Mishra (B) · S. Kumar · V. Kumar, Department of Computer Science and Information Technology, Mahatma Gandhi Central University, Motihari, Bihar, India
S. Kumar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_7
1 Introduction
Weather is an important aspect of human life, and in an era in which we are advancing toward artificial intelligence and automating our day-to-day tasks, it becomes equally important for machines. The performance of any intelligent machine operating in an open environment can be severely affected if the weather component is not considered. A truly automated system must be able to tune its algorithm according to the real-time weather scenario. A self-driving car [1] or an in-vehicle driver assistance system [2] integrates the weather component to operate seamlessly in normal as well as adverse weather scenarios. It is also recommended for other vision-based systems such as video surveillance [3], robot navigation [4], and crowd counting [5]. It is a simple task for humans to recognize the weather condition by looking at the surroundings and to identify it as rainy, sunshine, cloudy, or sunrise. For machines, however, it is a tedious task and is hence often ignored by assuming that the weather is clear [6]. Many sensors are used to collect meteorological data and identify the weather condition, but such systems are usually complex and costly to implement at the physical level. Therefore, attempts to reach human-like performance in weather recognition often rely on image-based prediction, since simple video surveillance can identify weather conditions with low complexity and at much lower cost. However, models using this approach have a major shortcoming: they consider only a single view. In a real-world scenario, multiple views (perspectives) of a single scene are possible [7]. It is well known that an image carries abundant information, and in the literature different sets of features are extracted from the same image depending on the task. For instance, a person with normal vision and a color-blind person may look at the same scene but perceive it differently. Recently, researchers have started to utilize multiple possible representations (views) of data items. The effective utilization of views has given rise to an emerging sub-field of machine learning known as multi-view learning [8–10]. Multi-view learning under the framework of deep learning is known as multi-view deep learning [9]. In this work, an image-based multi-class multi-view deep learning model (MMDeep) for weather classification is proposed, which takes advantage of a multi-view representation of a scene.
In this work, the original view, a nature-inspired view, and a mathematical view are considered for multi-view learning. A deep convolutional neural network (pre-trained on ImageNet) is used as an automated feature extractor and classifier. The proposed MMDeep method is compared with ResNet50 trained on each of the three views individually. Its performance is also compared with K-nearest neighbors (k-NN), Naïve Bayes (NB), support vector machine (SVM), and the baseline methods SAID_1 and SAID_2. The novelties of the proposed work are as follows:
• Multi-view learning has been applied successfully.
• Three views for multi-view learning have been identified and investigated based on their collective suitability for the weather recognition problem of computer vision.
• A deep learning framework has been employed successfully to investigate the performance of multi-view learning.
The paper is organized as follows. Section 2 describes related works. Section 3 gives a detailed description of the proposed MMDeep method. Section 4 presents and analyzes the comparative results. Finally, Sect. 5 concludes the proposed work and outlines future research directions.
2 Related Works
Image-based weather classification is a multi-class problem, and the progress made in this field is still limited, which leaves considerable room for improvement. Song et al. [11] classified weather as sunny, snowy, foggy, or rainy using K-nearest neighbor (KNN). Chen et al. [12] proposed using a support vector machine (SVM) along with active learning to classify images as cloudy, sunny, or overcast. Binary classification of outdoor images into sunny or cloudy has been performed using collaborative learning [13]. Weather features have been extracted from visual images of the sky using multiple kernel learning and dictionary learning along with active learning, with an SVM for classification [14, 15]. Task-specific implementations have integrated weather recognition with vehicle cameras; these are heavily biased toward rainy images and depend on vehicle information [16, 17]. Weather classification has also been done using multi-task learning [18] and for sub-tasks such as cloud recognition [19, 20] and weather-based traffic control [21]. Feature extraction and selection [22] is a major challenge for such tasks, and suitable feature extraction methods lead to better results. Ensemble-based approaches [23] can also give better results than baseline single models. The major shortcoming of the aforementioned models is that they do not take the diversity of images into account, i.e., the multiple features present in an image are ignored. Therefore, Gbeminiyi Oluwafemi et al. [24] proposed a stacked ensemble-based approach that includes both data diversity (local binary patterns; hue, saturation and value; gradient; and contrast of an image) and parameter diversity (an ensemble of KNN, NB, and SVM). It classifies weather images into cloudy, sunrise, sunshine, and rainy classes.
The idea of this model is interesting, but the accuracy achieved (86%) is not sufficient for use in a real-world scenario, which led us to perform this study and try to build a more accurate framework for the task. Multi-view learning is a field of machine learning that deals with the information in, and the relationships between, multiple distinct representations of data sharing a common target variable. The term was first introduced by Blum et al. [25]; the field is also known as data fusion or data integration from multiple feature sets [9]. Since its introduction, the field has grown widely and has been applied in diverse areas. To mention some real-world use cases of multi-view learning, Kang et al. [26] used a 3D multi-view convolutional neural network (CNN) for classifying lung nodule images. Another multi-view CNN was used for mammographic image classification by Sun et al. [27]. A multi-view approach for shape recognition using 3D images was introduced by Su et al. [28]. Zheng et al. [29] proposed a multi-view method for facial expression recognition. It has also been seen that multi-view learning improves the results of deep learning tasks both independently and as an ensemble of multiple models [30–32]. In image-based classification, CNNs are widely used as state-of-the-art methods for complex image classification tasks; the deep CNN (DCNN) was introduced by Srivastava et al. [33]. One of the initial breakthroughs was the introduction of AlexNet by Krizhevsky et al. [34], which revolutionized the area of deep learning. Many complex networks, such as ResNet [35], DenseNet [36], InceptionNet [37], and SqueezeNet [38], were designed to approach human-level accuracy. DCNNs are now used widely, in works including but not limited to COVID-19 classification [39], multiple object recognition [40], texture classification [41], image matching [42], and hand-written digit classification [43].
3 Proposed Model and Experimental Design
For image classification tasks, it is important to consider multiple representations (views) of a scene. Therefore, a novel multi-class multi-view deep learning (MMDeep) method is proposed for multi-class weather classification, which exploits the generated views. It involves three major steps, namely view generation, training of views, and view ensemble, which are described in Sects. 3.2, 3.3, and 3.4, respectively. The workflow of MMDeep is shown in Fig. 1.
3.1 Data Description and Preprocessing
The dataset used in our study was obtained from Mendeley and was contributed by Gbeminiyi [44]. It provides a platform for weather analysis through still, unobstructed images of various weather conditions. The dataset comprises 1125 RGB images labeled into four classes, namely cloudy, rainy, sunrise, and sunshine. As part of preprocessing, all the images were resized to a uniform dimension of 64 × 64 × 3. The resized images were then split into train, validation, and test sets in the ratio 3:1:1, with 675 images in the train set and 225 images each in the validation and test sets.
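The 3:1:1 split can be checked with a few lines of code. The sketch below is only an illustration: the paper does not say whether the split was random or stratified, so the shuffling, the fixed seed and the function name are assumptions.

```python
import random


def split_indices(n_samples, ratios=(3, 1, 1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: a plain random split
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(1125)
print(len(train), len(val), len(test))  # 675 225 225
```

With 1125 images, the three parts contain 675, 225 and 225 samples, matching the counts stated above.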
3.2 View Generation
The images in the dataset are single-view, three-channel RGB images. To convert the dataset into a multi-view one, view generation was performed. We considered three distinct representations of the images: the original view, i.e., the images as they are; a nature-inspired view; and a mathematical view. We also decided to give prominence to the view that gave the best result among the three independently.
Fig. 1 Workflow of MMDeep model
Original View For this view, the images from the dataset are used unaltered, to replicate the normal color vision of a human being. This view is named the normal view, as the images are unchanged and in their natural composition. Some examples are shown in Fig. 2.
Fig. 2 Images of various weather classes in normal view, a Cloudy, b Rainy, c Sunrise, d Sunshine
Fig. 3 Images of various weather classes in Seagull-view, a Cloudy, b Rainy, c Sunrise, d Sunshine
Nature-Inspired View (Seagull-view) Unlike humans, not all animals and birds perceive the world in RGB form. Some have monochromatic vision, i.e., they see the world in grayscale. One such bird is the seagull, often found in coastal regions, which has an excellent sense of the weather. It was therefore decided to exploit this advantage in the MMDeep model. The grayscale representations of the images are used as the second view, named the Seagull view. Each three-channel RGB image is converted to a grayscale image; examples are shown in Fig. 3.
Mathematical View There is a plethora of mathematical functions, each capable of giving an image a different transformation. One such function is the Fourier transform, which decomposes an image into its sine and cosine components. A useful property of the Fourier transform is that it identifies noise-like patterns in an image and gives them more weight. This property is handy for weather classification: it can give more weight to the tiny droplets present in a rainy scene or to the small rays emerging from the sun in a sunrise image. Therefore, the proposed third view is the Fourier transform of the grayscale images, termed the mathematical view. For any grayscale image I of dimension N × N, the Fourier transform image F can be written as in Eq. 1, and examples are shown in Fig. 4.
$$F[u, v] = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} I[m, n]\, e^{-i 2\pi u m / N} \cdot e^{-i 2\pi v n / N} \tag{1}$$
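The two generated views can be illustrated with a small, dependency-free sketch. The grayscale weights used here are the common ITU-R BT.601 luma coefficients, which is an assumption because the paper does not state its conversion formula. Taking the magnitude of the complex transform to obtain a real-valued image is likewise an assumption. The transform itself is a direct evaluation of the standard two-dimensional discrete Fourier transform; in practice a fast implementation such as `numpy.fft.fft2` would be used instead of this O(N^4) loop.

```python
import cmath


def to_grayscale(rgb):
    """rgb: rows of (r, g, b) tuples -> rows of luminance values (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def dft2(gray):
    """Direct 2-D discrete Fourier transform of an N x N image (illustration only)."""
    size = len(gray)
    out = [[0j] * size for _ in range(size)]
    for u in range(size):
        for v in range(size):
            total = 0j
            for m in range(size):
                for n in range(size):
                    angle = -2.0 * cmath.pi * (u * m + v * n) / size
                    total += gray[m][n] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def magnitude(spectrum):
    """Real-valued image from the complex spectrum (assumed choice for the third view)."""
    return [[abs(z) for z in row] for row in spectrum]
```

For a constant image, all the energy lands in the `F[0][0]` term and every other coefficient is zero, which is a quick sanity check of the implementation.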
Fig. 4 Images of various weather classes in mathematical view, a Cloudy, b Rainy, c Sunrise, d Sunshine
The independent performances of the views suggest that the normal view performs best and hence should be given more weight than the other two. Therefore, the normal view is replicated, making the resulting number of views four (two instances of the normal view and a single instance each of the Seagull view and the mathematical view). The independent single-view accuracies are shown in Fig. 6, where the x-axis and y-axis represent the distinct views and the accuracies, respectively.
3.3 Model Training
The multi-view learning method is deployed on each of the generated views independently, as described in the previous section. Owing to the limited size of the dataset, a deep transfer learning approach is used. The neural network used for training is ResNet50 pre-trained on ImageNet [45]. The FastAI library [47], built on top of PyTorch [46], is used to obtain the pre-trained models and execute the training task. The initial layers of ResNet50 serve as automatic feature extractors, and a fully connected layer at the end is used for classification. Each generated view is used to train the ResNet50 architecture separately for 50 epochs. Predictions on the validation set are recorded after the training of each view. Instead of the class predicted directly by the network, the predicted class probabilities of each sample are recorded. For any sample, the prediction made by a view can be written as in Eq. (2):
$$P_{ij} = \left[ P_{ij}^{\text{Cloudy}},\ P_{ij}^{\text{Rainy}},\ P_{ij}^{\text{Sunrise}},\ P_{ij}^{\text{Sunshine}} \right] \tag{2}$$
where $P_{ij}^{\text{Cloudy}}$, $P_{ij}^{\text{Rainy}}$, $P_{ij}^{\text{Sunrise}}$, and $P_{ij}^{\text{Sunshine}}$ are the respective probabilities of the ith sample belonging to the Cloudy, Rainy, Sunrise, and Sunshine classes under the jth view. In the proposed model, ResNet50 (a residual deep convolutional neural network), introduced by He et al. [35], is used for feature extraction and classification. Its central idea is to pass the input not only to the immediately following neural network layer but also to subsequent layers. The architecture of a generic ResNet50 is shown in Fig. 5, where Conv-1 to Conv-5 are convolution blocks, shown with the dimensions and number of convolution filters iterated within them.
Fig. 5 Schematic architecture of a ResNet-50 utilized for training on ImageNet data
Fig. 6 Independent performance of individual view using ResNet50
3.4 View Ensemble
In this step, the prediction probabilities of the individual views are combined, with each view assigned a participation weight in the final prediction for a sample. Various ensemble methods are available for combining multiple views, such as bagging, bootstrapping, model averaging, and stacking. MMDeep uses model averaging, in which the mean of the prediction probabilities of all the views is taken as the final prediction probability. We also introduce a variation of model averaging that gives more weight to the best-performing view. A threshold value is defined such that, if the averaged prediction probability does not reach the threshold for any class, the prediction made by the best-performing view (the normal view in our case) is taken as the final prediction. For any image sample i with predictions on the normal view (NV), Seagull view (SV), and mathematical view (MV) denoted $P_{NV}^{i}$, $P_{SV}^{i}$, and $P_{MV}^{i}$, respectively, the average prediction probability is defined as in Eq. 3:
$$P_{avg}^{i} = \frac{2 P_{NV}^{i} + P_{SV}^{i} + P_{MV}^{i}}{4} \tag{3}$$
and the final predicted class $pred^{i}$ of image i, for a threshold value $P_{thresh}$ of 0.5, is calculated as in Eq. 4:
$$pred^{i} = \begin{cases} \arg\max P_{avg}^{i}, & \text{if } P_{avg}^{i} \ge P_{thresh} \\ \arg\max P_{NV}^{i}, & \text{otherwise} \end{cases} \tag{4}$$
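The weighted averaging and threshold fallback of Eqs. 3 and 4 can be sketched as follows. This is an illustrative reading rather than the authors' code: because the averaged prediction is a vector, the threshold is assumed here to be compared with its largest entry. The function names and the probability values in the usage example are hypothetical.

```python
CLASSES = ["Cloudy", "Rainy", "Sunrise", "Sunshine"]


def average_probabilities(p_nv, p_sv, p_mv):
    """Eq. 3: the normal view is counted twice, so the divisor is 4."""
    return [(2.0 * a + b + c) / 4.0 for a, b, c in zip(p_nv, p_sv, p_mv)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda k: values[k])


def predict(p_nv, p_sv, p_mv, threshold=0.5):
    """Eq. 4, assuming the threshold applies to the largest averaged probability."""
    p_avg = average_probabilities(p_nv, p_sv, p_mv)
    if max(p_avg) >= threshold:
        return argmax(p_avg)
    return argmax(p_nv)  # fall back to the best-performing (normal) view


# Hypothetical probabilities: the views disagree and the average is not confident.
label = predict([0.4, 0.3, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1], [0.1, 0.2, 0.6, 0.1])
print(CLASSES[label])  # Cloudy
```

In the usage example the averaged vector peaks at 0.35 for Rainy, which is below the threshold, so the normal view's choice (Cloudy) is returned instead.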
The proposed model and all the other models selected for comparison were implemented in Python on the Google Colaboratory platform.
4 Results and Analysis
4.1 Results
Deep learning models commonly show run-to-run variation in performance. Therefore, the models (including MMDeep) were executed ten times to reduce the effect of this variation. Accuracy is used as the measure for comparing model performance, and the test-set accuracies are recorded together with their variation in the form of the standard deviation. Table 1 shows the mean accuracies with standard deviation achieved by the single-view models compared with the proposed multi-view model. For the comparative study, MMDeep is compared with state-of-the-art techniques such as SVM, Naïve Bayes, and k-NN, along with two models (SAID Ensemble I and SAID Ensemble II) [24], on the same dataset. To reduce inconsistency, all models perform tenfold cross-validation ten times. The boxplot of the performances of the six models is shown in Fig. 7. Table 2 shows the minimum, maximum, and mean accuracy with standard deviation for all the models.
Table 1 Mean accuracies with standard deviation achieved by single view models compared to the proposed multi-view model MMDeep
Name of classifiers | Mean (± std deviation) %
Normal view ResNet50 | 95.98 (±4.65)
Seagull view ResNet50 | 92.82 (±2.8)
Mathematical view ResNet50 | 76.41 (±6.82)
Proposed model (MMDeep) | 97.46 (±1.27)
The bold values are the values corresponding to the model proposed by authors
Fig. 7 Box-plot of performances of the proposed model MMDeep and others k-NN, SVM, NB, SAID_1, and SAID_2
Table 2 Minimum, maximum, and mean accuracy with standard deviation for all the models
Name of classifiers | Min. Acc (%) | Max. Acc (%) | Mean Acc (±std-deviation) %
SVM | 67.86 | 94.74 | 82.25 (±9.08)
Naïve Bayes (NB) | 55.36 | 78.43 | 66.16 (±8.76)
K-NN | 60.72 | 87.72 | 72.72 (±8.34)
SAID Ensemble I [24] | 81.19 | 87.45 | 85.13 (±1.78)
SAID Ensemble II [24] | 80.22 | 90.23 | 86.18 (±2.62)
Proposed model (MMDeep) | 95.55 | 99.55 | 97.46 (±1.27)
The bold values are the values corresponding to the model proposed by authors
4.2 Analysis
To evaluate the quality of the results produced by the proposed model MMDeep, the chosen metric is accuracy, given by:
$$\text{Accuracy} = \frac{1}{S} \sum_{s=1}^{S} \mathbb{1}\left(\hat{y}_s = y_s\right) \tag{5}$$
where S is the sample size, $\hat{y}_s$ is the predicted label, $y_s$ is the ground truth label, TP is true positive, TN is true negative, FP is false positive, and FN is false negative. The results have been analyzed using a box plot of accuracies, in which the performance of MMDeep is compared alongside the other models; it shows that MMDeep is clearly better than the others in terms of accuracy. The statistical analysis has been performed using the Friedman aligned ranks test, whose ranking establishes that MMDeep outperforms all the other models by a wide margin. As mentioned above, three views (normal view, Seagull view, and mathematical view) are considered for multi-view learning. The individual view performances are therefore compared with the multi-view performance; the accuracies are given in Table 1. It is evident from the mean accuracies that the proposed model MMDeep performs best (97.46 (±1.27)) compared with the single-view models on the normal view, Seagull view, and mathematical view. In Table 2, MMDeep performs better than k-NN, SVM, NB, SAID_1, and SAID_2 in terms of minimum, maximum, and mean accuracy. The boxplot of accuracies (Fig. 7) may be analyzed based on the following parameters:
• Minimum Accuracy: The minimum accuracy of the proposed MMDeep method is higher than that of k-NN, SVM, NB, SAID_1, and SAID_2. Therefore, the proposed method is better than the other models on minimum accuracy.
• Maximum Accuracy: The maximum accuracies of k-NN, SVM, NB, SAID_1, and SAID_2 are all lower than the maximum accuracy of the proposed MMDeep model. The outlier maximum accuracies of SAID_1 and SAID_2 are also lower than that of the proposed method.
• Median Accuracy: The boxplot shows that the median accuracy of the proposed MMDeep method is higher than that of all the compared methods (k-NN, SVM, NB, SAID_1, and SAID_2). Therefore, the proposed method is better on the median accuracy parameter.
• Variation within the 50% Accuracies: The variation of the middle 50% of accuracies around the median (below and above the median) is also better for the proposed method than for k-NN, SVM, NB, SAID_1, and SAID_2. Therefore, the proposed MMDeep method performs better on this parameter. The consistency of the accuracies of the proposed model in the lower and upper 50% of observations is also better than that of the compared methods.
• Total Variation of Accuracies: The total variation of accuracies (from minimum to maximum) of the proposed MMDeep method is smaller than that of the state-of-the-art methods k-NN, SVM, NB, SAID_1, and SAID_2. This indicates that the overall consistency of the proposed method is better than that of the other methods.
The above analysis of five parameters concludes that the proposed MMDeep method performs effectively and also outclasses the state-of-the-art approach suggested in [24].
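The accuracy metric of Eq. 5 and the five box-plot parameters discussed above can be computed as in the following sketch. The function names are hypothetical, and the use of the sample standard deviation and of inclusive quartiles is an assumption, since the paper does not state which variants were used.

```python
import statistics


def accuracy(predicted, truth):
    """Eq. 5: fraction of samples whose predicted label equals the ground truth."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def summarize(accuracies):
    """Box-plot style summary of repeated accuracy measurements."""
    q1, median, q3 = statistics.quantiles(accuracies, n=4, method="inclusive")
    lowest, highest = min(accuracies), max(accuracies)
    return {
        "min": lowest,
        "max": highest,
        "median": median,
        "iqr": q3 - q1,             # spread of the middle 50% of runs
        "range": highest - lowest,  # total variation, minimum to maximum
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # assumed: sample standard deviation
    }
```

Applied to the per-run accuracies of each model, `summarize` yields the quantities that the bullet points above compare across models.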
4.3 Statistical Analysis
To evaluate the quality of the results produced by our model, we subjected the accuracies achieved by our model and by the other models selected for comparison, over all ten iterations, to statistical analysis. The analysis uses the Friedman aligned test to compare MMDeep with the other state-of-the-art methods. It is a nonparametric test, conducted to check how statistically significant the performance of the MMDeep model is in comparison with the other five models. The accuracy of every model was measured over ten iterations, with the null hypothesis that, during each iteration, all six algorithms were tested on the same data distribution. The output of this test is a mean rank score for each model, shown in Table 3. MMDeep is the best-performing of the six models, with a mean rank of 5.8, which is lower than that of the next best model (SAID_2, 22.8) by a margin of 17.0.
Table 3 Ranking of the algorithms using the Friedman aligned test
Algorithm    Ranking
SVM          30
NB           53.3
KNN          45.6
SAID_1       25.5
SAID_2       22.8
MMDeep       5.8
The bold values are the values corresponding to the model proposed by authors
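As an illustration of how mean ranks of this kind can be obtained, the sketch below follows the usual aligned-ranks procedure: subtract each iteration's mean accuracy from its observations, rank all aligned values jointly (rank 1 for the best), and average the ranks per algorithm. The function name `aligned_mean_ranks` and the 3 × 3 score matrix are hypothetical; this is not the authors' code. One observation that can be checked from Table 3 itself is that its six rankings average to 30.5 = (60 + 1)/2, which is what a joint ranking of 6 algorithms × 10 iterations would give.

```python
def aligned_mean_ranks(scores):
    """scores[i][j] = accuracy of algorithm j in iteration i (higher is better)."""
    n_blocks, k = len(scores), len(scores[0])
    # Align: subtract each iteration's mean so that block effects are removed.
    aligned = []
    for row in scores:
        block_mean = sum(row) / k
        aligned.extend((value - block_mean, j) for j, value in enumerate(row))
    # Rank all n_blocks * k aligned values together; rank 1 = largest value.
    aligned.sort(key=lambda pair: pair[0], reverse=True)
    rank_sums = [0.0] * k
    pos = 0
    while pos < len(aligned):
        end = pos
        while end + 1 < len(aligned) and aligned[end + 1][0] == aligned[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # tied values share the average rank
        for _, j in aligned[pos:end + 1]:
            rank_sums[j] += avg_rank
        pos = end + 1
    return [total / n_blocks for total in rank_sums]

# Hypothetical accuracies: 3 iterations (rows) x 3 algorithms (columns).
scores = [[90, 80, 70], [91, 82, 69], [89, 81, 72]]
mean_ranks = aligned_mean_ranks(scores)
print(mean_ranks)

# Rankings copied from Table 3; their average equals (6 * 10 + 1) / 2.
table3 = {"SVM": 30, "NB": 53.3, "KNN": 45.6,
          "SAID_1": 25.5, "SAID_2": 22.8, "MMDeep": 5.8}
print(round(sum(table3.values()) / len(table3), 2))
```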
5 Conclusion and Future Scope
This research work proposes a novel method, MMDeep, for multi-view learning using deep learning for the image-based multi-class weather recognition task. During learning it uses three generated views, called the normal view, the Seagull view, and the mathematical view. The method also uses the classification accuracies of the individual views as weights when ensembling the views for the final prediction of samples. The proposed MMDeep method shows promising performance compared with other approaches available for multi-class weather recognition on the use case. It can be deployed in real-world scenarios with a margin of error. The proposed method is not confined to the discussed use case; it can also be applied to other real-life problems. Therefore, its theoretical and practical prospects may be explored in the future.
References
1. Daily M, Medasani S, Behringer R, Trivedi M (2017) Self-driving cars. Computer 50(12):18–23
2. Kurihata H et al (2005) Rainy weather recognition from in-vehicle camera images for driver assistance. In: IEEE Proceedings. Las Vegas, NV, USA
3. Woo H, Jung YM, Kim J, Seo JK (2010) Environmentally robust motion detection for video surveillance. IEEE Trans Image Process 19(11):2838–2848
4. Oishi S, Inoue Y, Miura J, Tanaka S (2019) SeqSLAM++: view-based robot localization and navigation. Robot Auton Syst 112:13–21
5. Boominathan L, Kruthiventi S, Babu R (2016) CrowdNet: a deep convolutional network for dense crowd counting. In: MM '16: Proceedings of the 24th ACM international conference on multimedia
6. Nashashibi F, de Charrette R, Lia A (2010) Detection of unfocused raindrops on a windscreen using low level image processing. In: 2010 11th International conference on control automation robotics & vision. Singapore
7. Zhao B, Wu X, Cheng ZQ, Liu H, Jie Z, Feng J (2018) Multi-view image generation from a single-view. In: Proceedings of the 26th ACM international conference on multimedia
8. Xu C, Tao D, Xu C (2013) A survey on multi-view learning. arXiv preprint
9. Zhao J, Xie X, Xu X, Sun S (2017) Multi-view learning overview: recent progress and new challenges. Inf Fusion 38:43–54
10. Sun S (2013) A survey of multi-view machine learning. Neural Comput Appl 23(7–8):2031–2038
11. Song H, Chen Y, Gao Y (2014) Weather condition recognition based on feature extraction and K-NN. Adv Intell Syst Comput 215:199–210
12. Chen Z, Yang F, Lindner A, Barrenetxea G, Vetterli M (2012) How is the weather: automatic inference from images. In: 2012 19th IEEE International conference on image processing
13. Lu C, Lin D, Jia J, Tang C (2017) Two-class weather classification. IEEE Trans Pattern Anal Mach Intell 39(12):2510–2524
14. Zheng C, Zhang F, Hou H, Bi C, Zhang M, Zhang B (2016) Active discriminative dictionary learning for weather recognition. Math Prob Eng 1–12
15. Zhang Z, Ma H, Fu H, Zhang C (2016) Scene-free multi-class weather classification on single images. Neurocomputing 206:365–373
16. Roser M, Moosmann F (2008) Classification of weather situations on single color images. In: 2008 IEEE intelligent vehicles symposium, pp 798–803
17. Yan X, Luo Y, Zheng X (2009) Weather recognition based on images captured by vision system in vehicle. In: Advances in Neural Networks, pp 390–398
18. Li X, Wang Z, Lu X (2017) A multi-task framework for weather recognition. In: Proceedings of the 25th ACM international conference on multimedia (MM '17). New York, NY, USA
19. Zhang Z, Li D, Liu S, Xiao B, Cao X (2018) Multi-view ground-based cloud recognition by transferring deep visual information. Appl Sci 8:748
20. Ye L, Cao Z, Xiao Y, Li W (2015) Ground-based cloud image categorization using deep convolutional visual features. In: 2015 IEEE International conference on image processing (ICIP)
21. Xia J, Xuan D, Tan L, Xing L (2020) ResNet15: weather recognition on traffic road with deep convolutional neural network. Adv Meteorol 11
22. Kumar V, Minz S (2014) Feature selection: a literature review. Smart Comput Rev 211–229
23. Kumar V, Minz S (2016) Multi-view ensemble learning: an optimal feature set partitioning for high-dimensional data classification. Knowl Inf Syst 1–59
24. Gbeminiyi Oluwafemi A, Zenghui W (2019) Multi-class weather classification from still image using said ensemble method. In: 2019 Southern African universities power engineering conference/robotics and mechatronics/pattern recognition association of South Africa (SAUPEC/RobMech/PRASA), pp 135–140
25. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: COLT '98: Proceedings of the eleventh annual conference on computational learning theory
26. Kang G, Liu K, Hou B, Zhang N (2017) 3D multi-view convolutional neural networks for lung nodule classification. PLOS ONE 12(11):e0188290
27. Sun L, Wang J, Hu Z, Xu Y, Cui Z (2019) Multi-view convolutional neural networks for mammographic image classification. IEEE Access 7:126273–126282
28. Su H, Maji S, Kalogerakis E, Learned-Miller E (2015) Multi-view convolutional neural networks for 3D shape recognition. In: 2015 IEEE International conference on computer vision (ICCV), pp 945–953
29. Zheng W, Zhou X, Zou C, Zhao L (2006) Facial expression recognition using kernel canonical correlation analysis (KCCA). IEEE Trans Neural Netw 17(1):233–238
30. Kumar V, Minz S (2014) Multi-view ensemble learning for poem data classification using SentiWordNet. In: Advanced computing, networking and informatics, vol 1, pp 57–66. Springer, Cham
31. Minz S, Kumar V (2018) Reinforced multi-view ensemble learning for high dimensional data classification. In: International conference on communication and computing (ICC-2014)
32. Kumar V, Minz S (2015) Multi-view ensemble learning: a supervised feature set partitioning for high dimensional data classification. In: Third international symposium on women in computing and informatics (WCI-2015)
33. Srivastava RK, Greff K, Schmidhuber J (2015) Training very deep networks. In: NIPS'15: Proceedings of the 28th international conference on neural information processing systems
34. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International conference on neural information processing systems
35. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR)
36. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
37. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR)
38. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv preprint arXiv:1602.07360
39. Kumar S, Mishra S, Singh SK (2020) Deep transfer learning-based COVID-19 prediction using Chest Xrays. medRxiv:2020.05.12.20099937
40. Ba J, Mnih V, Kavukcuoglu K (2015) Multiple object recognition with visual attention. In: Proceedings of the 3rd International conference on learning representations
41. Basu S, Karki M, DiBiano R, Mukhopadhyay S, Ganguly S, Nemani R, Gayaka S (2016) A theoretical analysis of deep neural networks for texture classification. arXiv:1605.02699
42. Yu W, Yang K, Bai Y, Yao H, Rui Y (2014) DNN flow: DNN feature pyramid based image matching. In: Proceedings of the British machine vision conference
43. Ciresan DC, Meier U, Gambardella LM, Schmidhuber J (2010) Deep, big, simple neural nets for handwritten digit recognition. Neural Comput 22(12):3207–3220
44. Gbeminiyi A (2018) Multi-class weather dataset for image classification. Mendeley Data, v1
45. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition
46. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. In: Advances in neural information processing systems
47. Howard J, Gugger S (2020) Fastai: a layered API for deep learning. Information 11(2):108
Optimizing Gender Detection Using Deep Learning Technique for Android Devices Ashish Chopra, Nishita Shah, and Yash Jain
Abstract In recent times, researchers have built algorithms that allow a machine to imitate human behavior to a certain extent. One such class of deep neural network is the convolutional neural network (CNN). The most common use case of a CNN is to analyze and process visual images and deduce important information from them. Neural networks can be trained to distinguish gender from human faces. Automated gender classification is an emerging area of study in image processing, computer vision, and artificial intelligence. Advances in machine learning over the past decade have produced a set of state-of-the-art techniques that reduce the complexity of traditionally used gender detection algorithms and can potentially make the prediction process considerably more efficient. The main focus of this paper is detecting faces at different orientations and detecting the gender of each face using deep learning techniques. The use of deep and convolutional neural network concepts is also described.
Keywords Face detection · Feature extraction · Gender detection · Deep neural network · ResNet
A. Chopra (B) · N. Shah · Y. Jain, Samsung Research Institute Noida, Noida, India, e-mail: [email protected]; Y. Jain, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_8
1 Introduction
Humans have an extraordinary ability to identify people’s faces based on the subtle variations of the face irrespective of the environmental factors, different expressions and also their age. Over the years, researchers have tried and successfully built multiple variations of a system that is not merely capable of detecting human faces and their facial expressions but also robust enough to classify them into different labels. These labels can either be their names or certain distinctive features like
gender. Systems like these have attracted a significant amount of attention owing to their applications in many areas. The face is the most expressive part of the human body, and current research focuses increasingly on using facial features for gender classification. The human face provides important information about gender, age, race, etc. [1]. The underlying principle of face recognition involves the verification and recognition of faces. Pre-processing, feature extraction, and classification are the major steps in gender classification [1]. Human beings have been observed to change their social interaction depending on gender. Similarly, improved gender classification will play a vital role in computer applications that use computer vision [2]. The primary aim of gender classification is to reduce the search space; a smaller search space reduces the search time for identification [1]. Some of the more common ways to implement such algorithms are linear discriminant analysis (LDA), Fisher faces, principal component analysis (PCA), independent component analysis (ICA), Eigen faces, etc. [1]. The general algorithm for a face detection and classification system consists of four stages: face detection, face configuration, facial feature extraction, and classification based on those features [3]. Physical biometric traits used for identification include the iris, finger, palm, voice, and face. 3D faces are also used for gender classification, with features such as range and intensity [2]. Despite these difficulties, robot platforms designed to handle everyday tasks need facial analysis components that are robust and computationally feasible. Convolutional neural networks (CNNs) form the basis of certain advanced image-related systems such as object and image classification. Such architectures have a very large number of parameters, so their use in robot platforms and real-time systems can be impractical [1]. An Android application for gender, age, and face recognition using OpenCV was presented in [4]; it deals with face recognition using an OpenCV model on Android devices but offers little detailed analysis of CNN model architectures. To address this gap, we analyze the gender detection model. Few of the earlier methods were designed to handle the many challenges of unconstrained imaging conditions. Moreover, the machine learning techniques used by these systems did not fully exploit the enormous number of images, examples, and data available through the web to improve classification capabilities [5]. Balancing accuracy against the number of parameters is difficult partly because the learned features remain hidden. Hence, to validate the features of such a model, a guided-gradient backpropagation implementation is needed [1]. Bianco et al. [6] present a benchmark analysis of different DNN architectures, which gives an idea of their performance but does not cover mobile devices (Fig. 1).
Fig. 1 Dataset
In this paper, we propose a ResNet architecture for facial feature detection together with gender classification. We first compare existing methods with deep learning architectures and then describe the proposed model. The remainder of the paper is organized as follows. Section 2 covers related work, comprising the comparative analysis and the proposed model, along with a description of the classifiers. Section 3 presents the experimental results and compares them with previous work. The last section contains the conclusion and the references.
2 Related Work
2.1 Comparative Analysis
Deriving a better representation of the data is the most important task in most pattern recognition problems. The gender detection problem consists of three phases: face detection, facial landmark detection, and finally gender detection. For this purpose, current state-of-the-art approaches use appearance-based methods, which perform data pre-processing consisting of operations such as transformation and augmentation [7]. Before describing the proposed technique, we briefly survey related methods for face detection and gender detection and give an overview of convolutional neural networks.
Fig. 2 Flowchart
2.1.1 Face Detection
Eigen Faces
This method is based on principal component analysis. Since it is an unsupervised statistical model, it is not used for gender detection. Fisher faces, in contrast, provide a class-specific linear projection, so they are a better approach for gender detection tasks. Fisher faces achieved about 95% accuracy when images of the person under test were not used for training the model [8]. [Clause 1] However, this model could not classify gender correctly when the faces were aligned at different angles and orientations. We therefore examined four different face detection methods in OpenCV with which the proposed gender detection model can be trained for different orientations [9].
Haar Cascade Model
The Haar cascade model takes a greyscale image and outputs a list of the detected faces. It detects faces at different scales, but it is not accurate. [Clause 2] For instance, if we draw two dots and a curve, the model detects it as a face, which indicates that it produces many false predictions. Hence, this method is not used for real-time face detection. The other approach we used was a deep neural network [10] (Fig. 2).
Deep Neural Network
One of the first neural networks to be implemented for optical character recognition was LeNet-5 [11]. One of the main features of this model was its limited computational resource requirement and fewer training challenges compared with today's deep neural networks. Beyond optical character recognition, deep neural networks offer a variety of applications, for instance speech recognition, human pose estimation, and facial component detection. With the significant increase in computational power provided by state-of-the-art graphics processing units, deeper neural networks have become more ubiquitous. This has also been promoted by the abundance of varied training data sets and better training algorithms. One area with an abundance of data is face detection. Deep neural networks can and should be used to extract human facial components for detecting faces. Although certain facial features develop with a person's age, others remain unique throughout a person's life. Anthropometric models trace the developing features and combine them with statistical models to determine a person's age. Developing and training accurate models of this kind can be challenging. Deep neural networks, however, work well with features that do not change with time. These features can be used to infer important human characteristics such as gender and emotion [12]. Conventional deep neural network models for face detection can be grouped into two categories: template fitting models and regression-based models. As the name suggests, a template fitting model develops facial templates that fit faces in the input images, whereas a regression-based model uses regression to detect faces in a series of input images. Studies have implemented this technique with slight algorithmic variations. For example, support vector regression was used in [13], and pixel-difference features combined with cascaded fern regression were used in [14]. Several studies employ random regression on local image patches with Haar features. Another way to implement this is to use cascaded neural networks, in which the faces are split into different chunks that are processed separately by independent deep neural networks [15]. The outputs of the individual neural networks are then regularized and forwarded to another layer that deals with the combined facial components. We also combined a histogram of oriented gradients (HOG) with support vector machines (SVM) to build a model that upscales the images, works faster, and yields better results for frontal and slightly non-frontal faces. The deep neural network that was applied consists of a Single Shot MultiBox Detector and a ResNet-10 architecture. The convolutional neural network of the Single Shot MultiBox Detector is used to diminish the filter size, and it increases the intensity as it goes deeper. The deep layers cover a large number of receptive fields. Here, trivial layers are used for predicting big objects.
A major drawback of the deep neural network was that, unlike the Haar cascade model, it could not detect faces with a resolution lower than 70 × 70 pixels. The Maximum Margin Object Detector, a convolutional neural network model, can also be used for training, but it needs GPU support for good performance; we were not able to use it, as OpenCV doesn't support GPU. Of all these options, the DNN model is therefore used, with additional layers added to it for more feature selection. Although this model detects more faces than the Haar cascade model, it could not cover all features while detecting faces: it excluded the forehead and chin portions of the face. Hence, when it was used for the gender detection model, we obtained false predictions and the performance diminished [16] [Clause 3].
2.1.2 Gender Detection
Gender detection is the next step after face detection. After a face has been detected, relevant information such as gender, age, and emotion can be extracted by analyzing its facial features. The aspect discussed in this paper is gender detection. Systems with more accurate gender detection can be used to create friendlier user environments. Applications of gender detection systems include intelligent user interfaces, demographic studies, visual surveillance, marketing models, etc. Earlier work on gender detection primarily paid attention to distinguishable characteristics for classification [17]. Most of it used a collection of features such as local binary patterns (LBP), speeded up robust features (SURF), histograms of oriented gradients (HOG), or the scale-invariant feature transform (SIFT). Recently, attribute-based methods for face recognition have attracted a great deal of attention. Binary classifiers with many new features were used: for each attribute, different features were calculated and used to train an individual support vector machine (SVM). CNN-based methods for learning attribute-based representations are developed in [18–20]. Gender detection models generally use principles that fall into two kinds: methods based on appearance and methods based on geometric features. Geometric features rely on the distances between facial features such as the nose, lips, eyes, ears, and chin. One extension of this approach is to calculate the Euclidean distance between these features and apply Delaunay triangulation [15]. For a classifier to extract useful information, the data must be converted into a much simpler form; this is the principle behind representation learning (Bengio et al. 2013).
Deep learning methods are a distinct kind of representation learning technique that represent data at several levels of abstraction using neural networks. The features of the upper layers represent deep concepts of the data that are close to its semantic content, which makes them more useful than the raw data. Our brain works in a similar manner when dealing with complex tasks. These methods have proved their excellence in the computer vision domain. However, difficulty has been observed when dealing with normal images using standard DNNs [7]. Architectures can be pre-trained in several ways; one is to use unsupervised models such as the Restricted Boltzmann Machine (Smolensky 1986). These models extract better features from unlabeled data, which has proved quite useful in the gender recognition problem (Mansanet et al. 2014). Recently published work that estimates age and gender from facial images focuses on DCNNs (Levi and Hassner 2015).
Xception
The gender detection model is a CNN model. Among neural networks, the convolutional neural network is one of the main categories used for image recognition, image classification, object detection, face recognition, and similar tasks. This model is designed with the idea of achieving the best ratio of accuracy to number of parameters. Minimizing the total parameter count helps overcome two important issues. First, minimal CNNs alleviate slow and degraded performance on hardware-constrained machines such as robot platforms. Second, reducing the number of parameters gives better generalization. The architecture removes the fully connected layers at the end and combines residual modules with depth-wise separable convolutions. It is inspired by the Xception architecture. Residual modules modify the mapping between two subsequent layers so that the learned features become the difference between the original feature map and the desired features. Thus, the desired features H(x) are modified to solve an easier learning problem F(x) such that H(x) = F(x) + x (Fig. 3). Depth-wise separable convolutions combine a depth-wise convolution with a point-wise convolution. The main purpose of these layers is to separate the spatial cross-correlations from the channel cross-correlations. To accomplish this, a D × D filter is applied to each of the M input channels, followed by N convolution filters of size 1 × 1 × M that combine the M input channels into N output channels. The 1 × 1 × M convolutions combine each value in the feature map without taking into account its spatial relation within the channel. Depth-wise separable convolutions reduce the computation to a fraction 1/D² + 1/N of that of a standard convolution. The final architecture is a fully convolutional neural network containing 4 residual depth-wise separable convolutions, each followed by a batch normalization operation and a ReLU activation function. The last layer applies global average pooling and a softmax activation function to produce the output. The accuracy of this model is 90%.
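The reduction factor stated above can be checked by counting multiply-add operations for both kinds of convolution. The following is a minimal sketch; the output feature map size `f` and the concrete layer sizes are illustrative assumptions, not values from the paper.

```python
def standard_conv_cost(d, m, n, f):
    """Multiply-adds of a standard d x d convolution with M input channels,
    N output channels, and an f x f output feature map (f is an assumption)."""
    return d * d * m * n * f * f

def separable_conv_cost(d, m, n, f):
    """Multiply-adds of a depth-wise separable convolution of the same shape."""
    depthwise = d * d * m * f * f  # one d x d filter per input channel
    pointwise = m * n * f * f      # N filters of size 1 x 1 x M
    return depthwise + pointwise

# Illustrative layer sizes, not taken from the paper.
d, m, n, f = 3, 64, 128, 32
ratio = separable_conv_cost(d, m, n, f) / standard_conv_cost(d, m, n, f)
expected = 1 / d**2 + 1 / n
print(f"cost ratio = {ratio:.4f}, 1/D^2 + 1/N = {expected:.4f}")
```

For a 3 × 3 filter the ratio is dominated by the 1/D² term, so the separable layer needs roughly one ninth of the operations of the standard layer when N is large.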
Fig. 3 a Standard convolutions, b convolutions that are depth-wise separable
Smaller VGGNet
This model consists of convolutional layers. It uses only 3 × 3 convolutions but a large number of filters. It is trained on 2 GPUs for 2–3 weeks and can extract many features from images. Initially, training is done with fewer weight layers. The final architecture is a fully convolutional neural network comprising 5 convolutional layers, where every convolution is followed by a ReLU activation function. Batch normalization is applied after each convolution, followed by max pooling. A dropout rate of 25% is applied so that the network learns better and more specific features. Lastly, the output is flattened and a dropout of 50% is applied. In the last layer, a sigmoid activation function is used. A ReLU does not change the gradient of positive activations as it passes back during backpropagation. ReLU is used only for hidden layers, and softmax is used for classification. The gender detection steps discussed in this section are as follows. The minimum distance classifier is used for matching and classification; it is based on the Euclidean distance. The distance between a pair of points is calculated as the square root of the sum of the squared differences between their coordinates. In geometric classification, the gender is decided based on a threshold value calculated from the ratios of the distances between all the essential feature points. To increase performance and efficiency, different classifiers such as K nearest neighbor (KNN) and support vector machines are combined. Such a combination of classifiers is known as an ensemble classifier [16].
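The minimum distance classifier described above can be sketched as a nearest-centroid rule: each class is represented by the mean of its training feature vectors, and a new sample is assigned to the class whose mean is closest in Euclidean distance. The function names, the two-dimensional feature vectors (imagined here as ratios of facial landmark distances), and the labels are all hypothetical and serve only as an illustration.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fit_centroids(samples, labels):
    """Compute the mean feature vector of every class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [value / counts[y] for value in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda y: euclidean(centroids[y], x))

# Hypothetical feature vectors and labels; not data from the paper.
samples = [(0.8, 1.1), (0.9, 1.0), (1.4, 0.6), (1.5, 0.7)]
labels = ["class_a", "class_a", "class_b", "class_b"]
centroids = fit_centroids(samples, labels)
prediction = predict(centroids, (1.0, 1.0))
print(prediction)
```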
Fig. 4 Data processing
With the recent success of CNN-based systems, a growing number of researchers are working on better versions of neural networks that provide more flexibility. Some methods even try to visualize the features learned and other parameters used by CNNs [15]. In Ref. [21], a guided backpropagation approach is analyzed. In Ref. [22], a modification of this approach was proposed that improved the outputs. However, the outputs are not class-discriminative: the visualizations are nearly identical, however fine-grained they are. Other methods attempt to maximize the activations or invert the representations [23]; they visualize the models globally and do not explain specific predicted images, although they do try to provide class-discriminative, high-resolution images. In contrast, methods that employ gradient-based activation mapping produce a coarse localization map that highlights the relevant regions, using the gradients of the target concept flowing through the convolutional layers. Reference [22] altered convolutional neural networks by substituting the fully connected layers with global max pooling and convolutional layers. In Ref. [23], log-sum-exp pooling is examined alongside global max pooling [20] (Fig. 4).
3 Proposed Model
3.1 Face Detection
The drawbacks discussed previously in Clauses 1, 2, and 3 can be overcome by implementing a ResNet architecture, as it makes it possible to train a large number of layers and yields better performance. This method works for different facial orientations and detects faces at different scales as well as under occlusion. The proposed model has been implemented in Keras and consists of a ResNet architecture. ResNet is used because in other models, once deeper networks start converging, a degradation problem appears: as the network depth increases, the accuracy saturates and then degrades rapidly [1]. The residual mapping H(x) = F(x) + x is used, where F(x) is realized by stacked nonlinear layers and x by an identity mapping. The assumption is that it is easier to optimize the residual mapping than the original, unreferenced mapping H(x). ResNet is used because it converges faster. ResNet uses identity shortcuts, with projection shortcuts used only when the dimensions change. The greater the number of hidden layers, the better the network. Feature extraction consists of deriving higher-level information from raw pixel values that can capture the differences among the classes involved. This is done in an unsupervised manner, that is, the information retrieved from the pixels does not depend on the class labels. After the features are retrieved, a classification module is trained with the images and their associated labels. The problem with this pipeline is that feature extraction cannot be adjusted according to the classes and images. So if the selected features are inadequate for discriminating among the categories, the accuracy of the detection model suffers [1]. A common theme among state-of-the-art methods that follow the conventional pipeline has been to pick several feature extractors and combine them inventively to obtain better features.
However, this involves a large number of heuristics as well as manual work to tune parameters according to the domain in order to reach a decent level of accuracy. Neural networks are multiple layers of closely interconnected neurons that process inputs and produce outputs. A deep neural network has so many parameters that a considerable training dataset is required to train it without the risk of overfitting. However, with convolutional neural networks (CNNs), the mammoth task of training and testing the whole architecture becomes feasible, because parameters are shared between neurons. During training, convolutional neural networks are notorious for requiring huge amounts of memory and computational power, which is an important limitation. Their applicability also depends on their size, since large models cannot be deployed locally on a mobile device. There is, however, a tradeoff between accuracy and computational cost, because a more computationally intensive model will in turn produce a more accurate output
Optimizing Gender Detection Using Deep Learning Technique …
and vice versa. Other factors to consider for neural networks are ease of use, generality, etc. The proposed model is therefore designed to be accurate while keeping the above-discussed factors in mind [3]. Increasing the depth is one way to improve the accuracy of the model, provided overfitting is taken care of. However, as the depth of the model increases, the weights must be updated by comparing the actual value with the predicted value, and the resulting error signal becomes ever smaller as it propagates through more layers, until it is negligible. In other words, the layers farthest from the output learn very little. This phenomenon is known as the vanishing gradient. Another problem that arises while training deeper neural networks is that adding more layers may lead to higher training error. This is known as degradation. Residual networks address this problem by constructing the architecture from residual modules that allow deep networks to be trained. Here the input data is of size 300 × 300. First, batch normalization is applied and the image is scaled. A convolution layer follows, with the ReLU activation function and max pooling. Max pooling downsamples the data while retaining as many features as possible, which lets the network learn more general features. In the intermediate convolutions, the bias is omitted and only the weights are used. This removal of the bias causes scaling of the image. Normalization is performed to bring pixel values into a particular range. The stride defines how far the filter moves from one position to the next. A small stride leads to large overlaps, so the output volume is large and distortion is observed in the learned data. A larger stride requires less memory for the output, and processing the output is more feasible because its volume is smaller.
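The effect of the stride on the output volume can be illustrated with the standard output-size formula for a convolution or pooling layer, floor((n + 2p − f)/s) + 1. The following Python sketch is only an illustration: the 3 × 3 kernel and the zero padding are assumptions, since the text does not state them for this layer.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Spatial output size along one axis of a convolution or pooling layer."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 300 x 300 input (as in the text), assumed 3 x 3 kernel, no padding.
for stride in (1, 2, 3):
    side = conv_output_size(300, 3, stride)
    print(f"stride={stride}: output is {side} x {side}")
```

A larger stride gives a smaller output along each axis, which is why it needs less memory for the output.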
Overfitting is avoided because a large number of attributes are covered. In addition, mean subtraction during training was removed and replaced by a batch normalization layer on the input, which results in an adaptively determined mean during training. All images are resized to 300 × 300. A stochastic-depth residual network is used to improve test accuracy and reduce training time (Fig. 5).
3.2 Gender Detection
We propose a model trained on our own dataset. We collected images of males and females online that are licensed for commercial use. In addition, using video processing, we extracted frames from videos that we recorded ourselves. The dataset is split into two categories, male and female, then labeled and divided into batches. It consists of 11,000 images in total, with 6336 images of males and 4664 images of females. The training and testing split is in the ratio 80:20. There are 32 batches. Training took about 5–6 days on a GPU (NVIDIA 920MX). ResNet is used mainly because it allows hundreds of layers or more to be trained while still achieving compelling performance. Because of this advantage,
Fig. 5 ResNet basic architecture
the performance of computer vision applications other than image classification, for example object detection and face recognition, has also improved greatly. Training deep neural networks is a demanding task. The residual framework makes it feasible to train networks that are deeper than those used previously. ResNets can be optimized easily and gain accuracy from increased depth. The layers are explicitly reformulated as learning residual functions with reference to the layer inputs, rather than learning unreferenced functions [24]. Increasing the depth of a network does not work by simply stacking layers together. Such deep networks are difficult to train because of the vanishing gradient problem. As deeper networks start converging, a degradation problem appears: as the depth of the network increases, the accuracy saturates and then degrades rapidly. In a deeper network, the extra layers approximate the mapping better than its shallower counterpart and reduce the error by a significant margin [25]. Neural networks are universal function approximators, and accuracy increases with the number of layers. Yet there is a limit to the number of layers that can be added with an improvement in accuracy [18]. However, because of issues such as vanishing gradients and the curse of dimensionality, a sufficiently deep network may be unable to learn even simple functions such as the identity function, which is clearly undesirable. Experiments have shown that, in the worst case, a shallow network and a deeper network give the same accuracy, whereas in the favorable case the deeper network performs better than the shallow one. In ResNet, skip connections, or shortcuts, are used to jump over some layers. Models with several parallel skips are known as DenseNets. A non-residual network is referred to as a plain network.
Layers are skipped in order to avoid vanishing gradients. Skipping effectively simplifies the network, since fewer layers are used in the initial training stages. This speeds up learning by reducing the impact of vanishing gradients, as there are fewer layers to propagate through. The skipped layers are later restored as the network learns the feature space. A neural network without residual parts explores more of the feature space [16].
3.2.1 Forward Propagation
For single skips, the layers may be indexed either as n − 2 to n or as n to n + 2. During forward propagation it is easier to describe the skip as n + k from a given layer, whereas during backward propagation the activation layer is reused as n − k. The skip number is k − 1. Let $W^{k-1,k}$ be the weight matrix for the connection weights from layer k − 1 to k, and $W^{k-2,k}$ the weight matrix for the connection weights from layer k − 2 to k. Forward propagation through the activation function is then

$$a^{k} = g\left(z^{k} + W^{k-2,k} \cdot a^{k-2}\right), \qquad z^{k} = W^{k-1,k} \cdot a^{k-1} + b^{k},$$

where
$a^{k}$: activations of the neurons in layer k,
$g$: activation function in layer k,
$W^{k-1,k}$: weight matrix for the neurons between layers k − 1 and k.
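The forward pass through a single skip connection can be sketched in plain Python as follows. This is a minimal illustration rather than the model's implementation: the helper names, the use of ReLU for g, and the small example matrices are assumptions chosen only to make the formula concrete.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    """Element-wise ReLU, used here as the activation function g."""
    return [max(0.0, x) for x in v]


def forward_with_skip(a_km2, a_km1, W_km1_k, W_km2_k, b_k, g=relu):
    """a^k = g(z^k + W^{k-2,k} a^{k-2}) with z^k = W^{k-1,k} a^{k-1} + b^k."""
    z_k = [s + b for s, b in zip(matvec(W_km1_k, a_km1), b_k)]
    skip = matvec(W_km2_k, a_km2)
    return g([z + s for z, s in zip(z_k, skip)])


identity = [[1.0, 0.0], [0.0, 1.0]]  # identity shortcut: W^{k-2,k} = I
a_k = forward_with_skip(
    a_km2=[1.0, -2.0],
    a_km1=[0.5, 0.5],
    W_km1_k=identity,
    W_km2_k=identity,
    b_k=[0.0, 0.0],
)
print(a_k)
```

With an identity shortcut, the activations of layer k − 2 are simply added to the pre-activation of layer k before g is applied.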
3.2.2 Backward Propagation

$$\Delta W^{k-2,k} := -\eta \frac{\partial E^{k}}{\partial W^{k-2,k}} = -\eta\, a^{k-2} \cdot \delta^{k}$$

where
$\eta$: learning rate ($\eta > 0$),
$\delta^{k}$: error signal of the neurons at layer k,
$a^{k-2}$: activations of the neurons at layer k − 2.
We present a ResNet model that has 50 times fewer layers and is twice as fast as other networks. It is a 16-layer-deep network whose performance is comparable to that of a thin, 1000-layer-deep network with many more parameters. The recursion limit is set to prevent infinite recursion from causing a stack overflow; if more recursion is required, the limit is set higher. Initially, random weight values are used.
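The update of the skip weights can be sketched as an outer product of the error signal at layer k with the activations that feed the skip connection. The sketch below assumes a positive learning rate and takes those activations from layer k − 2; the function name and the example numbers are hypothetical.

```python
def skip_weight_update(a_km2, delta_k, eta):
    """Return Delta W^{k-2,k} = -eta * delta^k (a^{k-2})^T as a list of rows.

    Row i corresponds to neuron i of layer k, column j to neuron j of layer k-2.
    """
    if eta <= 0:
        raise ValueError("the learning rate eta is assumed to be positive")
    return [[-eta * d * a for a in a_km2] for d in delta_k]


delta_W = skip_weight_update(a_km2=[1.0, 2.0], delta_k=[0.5, -1.0], eta=0.1)
for row in delta_W:
    print(row)
```

A positive error signal lowers the corresponding weights and a negative one raises them, in proportion to the activation passed along the skip connection.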
The image size is 64 × 64 and the network depth is 16 layers. We used the TensorFlow backend with three RGB channels for the input image. Batch normalization is applied with ReLU as the activation function. To speed up learning, the inputs to each layer are normalized by adjusting and scaling the activations. This reduces the amount of covariate shift and allows each layer to learn more independently of the other layers [26] (Figs. 6 and 7).
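The normalization step itself can be sketched in a few lines of Python. This is a simplified illustration for a single feature over one batch; the learnable scale and shift parameters of a full batch normalization layer are omitted, and the small constant eps is an assumed value added for numerical stability.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize one feature over a batch: subtract the mean, divide by the std."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    std = math.sqrt(variance + eps)
    return [(x - mean) / std for x in batch]


normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

After this step the activations in the batch have approximately zero mean and unit variance, whatever their original scale.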
Fig. 6 Our model
Fig. 7 Block even and block odd
Batch normalization also reduces overfitting through a slight regularization effect, since it adds some noise to the activations of every hidden layer. When batch normalization is used, the dropout rate can be lowered. It increases the stability of the neural network by subtracting the batch mean and dividing by the batch standard deviation. A regularizer is used, which applies penalties to layer parameters or layer activity during optimization. These penalties are incorporated into the loss function that the network optimizes and are applied on a per-layer basis. The convolution uses 16 filters with a kernel size of 3 × 3 and a stride of 1 × 1. In the classifier block, average pooling is applied with a pool size of 8 × 8, followed by flattening, and softmax activation is used for prediction [26]. The softmax function turns numbers into probabilities that sum to one. It outputs a vector that represents the probability distribution over a list of potential outcomes. Softmax is not a black box; it has two components, exponentiation and normalization. It approximates the one-hot encoded labels better than absolute values do. If absolute values were used, information would be lost, whereas the exponential intrinsically takes care of this. The learning rate is set as a function of the epoch. Either the Adam optimizer or stochastic gradient descent with a momentum of 0.9 is used. Dropout is applied on the top layers, which contain a considerable number of parameters, to prevent overfitting and feature co-adaptation. It was later largely replaced by batch normalization, a method that normalizes neural network activations to a specific distribution in order to reduce covariate shift. Dropout should be inserted between convolutional layers. The 16-layer-deep ResNet with dropout achieves 1.86% error. ResNet consists of two types of blocks [26] (Fig. 8).
Bottleneck: one 3 × 3 convolution surrounded by dimensionality-reducing and dimensionality-expanding 1 × 1 convolutional layers.
Basic: two consecutive convolutions, each preceded by batch normalization and ReLU.
Fig. 8 a ReLU bottleneck and b ReLU basic
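The softmax step used for prediction can be illustrated with a short Python sketch. It shows the two components, exponentiation followed by normalization; subtracting the maximum logit is a common numerical-stability measure added here as an assumption, and the two example scores for the male and female classes are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the max does not change the result
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the two classes (male, female).
probabilities = softmax([2.0, 0.5])
print(probabilities, sum(probabilities))
```

The class with the largest raw score keeps the largest probability, and the outputs can be compared directly with one-hot encoded labels.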
Within a residual block, the order of batch normalization, activation and convolution is: batch normalization first, followed by ReLU, and finally convolution. Filters larger than 3 × 3 are not required. Two further factors are the deepening factor l and the widening factor k, where l is the number of convolutions in a block and k multiplies the number of features in the convolutional layers. Our model consists of an initial convolutional layer followed by three groups of ResNet blocks (conv2, conv3 and conv4), followed by average pooling and a final classification layer. Widening the layers is computationally more effective than having a great many small kernels, because GPUs are much more efficient at parallel computations on large tensors; we are therefore interested in an optimal d to k ratio. We added a dropout layer to every residual block, between the convolutions and after the ReLU, to perturb batch normalization in the subsequent residual block and thus prevent overfitting.
4 Algorithm
1. The image is taken in the form of a bitmap from the texture view <EQ1>.
2. Frame classification is done using a synchronized lock <EQ2>.
3. Face detection and gender detection:
   Convert the bitmap to a frame, followed by conversion into a Mat.
   The RGBA image is converted into an RGB image <EQ3>.
   The image is normalized by the mean values (104.0, 177.0, 123.0); an image size of 300 × 300 is used.
   The image is fed into the ResNet-10 model, which detects faces on the basis of the confidence obtained.
   Loop over all faces: if (confidence > 0.5), consider the face authentic and call gender detection.
   Convert the face bitmap to the required size of 64 × 64 (scaled face).
   Gender is detected using the face coordinates by calling the ML model.
   The result is shown and stored.
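The control flow of step 3 can be sketched in Python as below. This is only an outline of the confidence filter, not the Android implementation: the `Detection` type, the `classify_gender` callback and the stub classifier are hypothetical stand-ins for the face detector output and the gender model.

```python
from typing import Callable, List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (left, top, right, bottom)


class Detection(NamedTuple):
    confidence: float
    box: Box


def detect_genders(
    detections: List[Detection],
    classify_gender: Callable[[Box], str],
    threshold: float = 0.5,
) -> List[Tuple[Box, str]]:
    """Keep faces whose confidence exceeds the threshold and classify each one."""
    results = []
    for detection in detections:
        if detection.confidence > threshold:
            results.append((detection.box, classify_gender(detection.box)))
    return results


def stub_classifier(box: Box) -> str:
    """Placeholder for the gender model; always returns the same label."""
    return "female"


faces = [Detection(0.9, (0, 0, 64, 64)), Detection(0.3, (10, 10, 74, 74))]
print(detect_genders(faces, stub_classifier))
```

Only the first face passes the 0.5 threshold, so the gender model is called once.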
EQ:1
  textureView.getBitmap(300, 300);
EQ:2
  synchronized (lock) {
    if (runClassifier) {
      classifyFrame();
    }
  };
  classifyFrame(){
EQ:3
  cvtColor(mat, mat, COLOR_RGBA2BGR);
EQ:4
  blobFromImage(mat, 1.0, new Size(300, 300), new Scalar(104.0, 177.0, 123.0, 0), false, false, CV_32F);
EQ:5
EQ:6
For(all the faces detected ){ If(confidence of face is >0.5){ Result 0.0069(honest). It is more profitable to cannibalize pools than mine honestly. Figure 2 shows the algorithm for distributed network in blockchain.
Fig. 1 Pool-war in blockchain
Game Theory-Based Proof of Stake Mining in Blockchain …
Fig. 2 Distributed network algorithm
4 Implementation and Result Analysis
In the implementation, the block index, proof of work, hash value, etc., are used in a self-created blockchain. The solution demonstrates the real-time working of a blockchain and how the timestamp changes every time a block is mined. Because a real blockchain is distributed, and applying a consensus mechanism to it in real time is the hardest part, the Postman software is used to set up such blockchain nodes and let them communicate with each other as in a traditional blockchain. The functions that support the self-created blockchain are as follows:
get_chain(): Returns the chain present in a particular node in JSON format, which gives the block index, the transactions included, the timestamp and the proof.
Add_transaction(): Allows a node to add a transaction. The transaction contains the sender, the receiver and the amount to be transferred. It returns a message stating whether the transaction has been added and, if so, which block will contain it.
Connect_nodes(): A POST method whose body contains the list of addresses of the nodes to be connected. It connects the other nodes in our distributed network to our node and returns a status code indicating whether the nodes were connected successfully.
Replace_chain(): Manages the consensus mechanism of our self-created distributed blockchain. It finds the longest chain present in the network and replaces the node's current chain with it, thereby applying the consensus mechanism.
Mine_block(): The basic function used to mine a block. It receives a JSON file that contains the loyalty of the miner. Benevolent cooperation is implemented in this function: it distributes the rewards to the respective miners according to their behavior and the incentive.
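The longest-chain rule behind Replace_chain() can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the block fields (`index`, `timestamp`, `proof`, `previous_hash`, `transactions`), the SHA-256 hashing of the JSON-serialized block, and the helper `make_chain` are hypothetical, and the proof-of-work check is left out.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 hash of a block serialized as sorted-key JSON."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def is_chain_valid(chain):
    """Every block must reference the hash of the block before it."""
    return all(cur["previous_hash"] == block_hash(prev) for prev, cur in zip(chain, chain[1:]))


def replace_chain(own_chain, peer_chains):
    """Return the longest valid chain among our own chain and the peers' chains."""
    longest = own_chain
    for chain in peer_chains:
        if len(chain) > len(longest) and is_chain_valid(chain):
            longest = chain
    return longest


def make_chain(length):
    """Build a toy chain of the given length (illustrative helper only)."""
    chain = [{"index": 1, "timestamp": 0, "proof": 1, "previous_hash": "0", "transactions": []}]
    while len(chain) < length:
        prev = chain[-1]
        chain.append({"index": prev["index"] + 1, "timestamp": prev["index"], "proof": 1,
                      "previous_hash": block_hash(prev), "transactions": []})
    return chain


own = make_chain(2)
peer = make_chain(3)
print(len(replace_chain(own, [peer])))
```

A node running this rule adopts a peer's chain only if that chain is both longer and internally consistent, which is how all nodes converge on one history.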
N. K. Tyagi et al.
Table 4 Result analysis
Test case | Input | Functionality | Expected output | Actual output | Result
1 | Miner1: Honest; Miner2: Honest | Do reward distribution as per benevolent cooperation outcome | Miner1: 2; Miner2: 2 | Miner1: 2; Miner2: 2 | Pass
2 | Miner1: Honest; Miner2: Dishonest | Do reward distribution as per benevolent cooperation outcome | Miner1: 1; Miner2: −1 | Miner1: 1; Miner2: −1 | Pass
3 | Miner1: Dishonest; Miner2: Honest | Do reward distribution as per benevolent cooperation outcome | Miner1: −1; Miner2: 1 | Miner1: −1; Miner2: 1 | Pass
4 | Miner1: Dishonest; Miner2: Dishonest | Do reward distribution as per benevolent cooperation outcome | Miner1: 0; Miner2: 0 | Miner1: 0; Miner2: 0 | Pass
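The four test cases of Table 4 can be expressed as a small payoff lookup in Python. The reward values are those listed in the table; the function name and the boolean encoding of honesty are assumptions made for this sketch.

```python
# Payoffs (Miner1, Miner2) keyed by (Miner1 honest?, Miner2 honest?), from Table 4.
REWARDS = {
    (True, True): (2, 2),
    (True, False): (1, -1),
    (False, True): (-1, 1),
    (False, False): (0, 0),
}


def distribute_rewards(miner1_honest: bool, miner2_honest: bool):
    """Return the rewards of both miners under benevolent cooperation."""
    return REWARDS[(miner1_honest, miner2_honest)]


for case in REWARDS:
    print(case, "->", distribute_rewards(*case))
```

Whatever the other miner does, a miner earns more by being honest (2 instead of −1, or 1 instead of 0), which is consistent with honest mining being the equilibrium strategy.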
The implementation focuses on finding out how a blockchain works in practice; hosting a blockchain and mining real blocks requires a huge amount of computational power, which is not the prime focus of this paper. Since this paper focuses on the different equilibrium points in the reward distribution mechanism, the result analysis verifies whether the implemented blockchain works as required. Thus, the test cases focus on the outcome for each combination of the miners' honesty (Table 4). From the above test cases, it is found that the reward distribution policy in a blockchain is very important for maintaining the trust level among the miners. It is also found that maintaining all these consensus mechanisms in real time among the trillions of nodes of the distributed network is very difficult.
5 Conclusion
It is found that benevolent cooperation is best suited as a blockchain reward distribution policy: it encourages miners to mine honestly and establishes a Nash equilibrium. A distributed blockchain was also created and implemented. The implementation shows how every blockchain functionality would work in a real-time distributed architecture. We conclude that, although blockchain still has scalability issues, reward distribution policies optimized using game theory can promote fair play between miners. Thus, honest play of miners
increases the trust level in the blockchain, which directly encourages more nodes to become part of it. This increases the distributed nature of the blockchain, which might improve its security.
References 1. Lamport L, Shostak R, Pease M (1982) The Byzantine generals problem. ACM Trans Program Lang Syst (TOPLAS) 4(3):382–401 2. Alsunaidi SJ, Alhaidari FA (2019) A survey of consensus algorithms for blockchain technology. In: International conference on computer and information sciences (ICCIS). Sakaka, Saudi Arabia, pp 1–6 3. Rifkin J (2011) The third industrial revolution: how lateral power is transforming energy, the economy, and the world. Macmillan 4. Ma Z, Zhou XX, Shang YW, Sheng WX (2015) Exploration of the concept, key technology and development model of energy internet. Proc Power Syst Technol 39:3014–3022 5. Lin I-C, Liao T-C (2017) A survey of blockchain security issues and challenges. IJ Netw Secur 19(5):653–659 6. Watanabe H, Fujimura S, Nakadaira A, Miyazaki Y, Akutsu A, Kishigami J (2016) Blockchain contract: securing a blockchain applied to smart contracts. In: 2016 IEEE international conference on consumer electronics (ICCE), pp 467–468 7. Seang S, Torre D (2018) Proof of work and proof of stake consensus protocols: a blockchain application for local complementary currencies. Universite Cote d’Azur-GREDEG-CNRS, France Str 3.4 8. Saleh F (2019) Blockchain without waste: proof-of-stake. SSRN Scholarly Paper ID 3183935. Social Science Research Network, Rochester NY 9. Auer R (2019) Beyond the doomsday economics of proof-of-work in cryptocurrencies 10. Gao Y, Nobuhara H (2017) A proof of stake sharding protocol for scalable blockchains. Proc Asia-Pacific Adv Netw 44:13–16 11. Singh R, Dwivedi AD, Srivastava G, Wiszniewska-Matyszkiel A, Cheng X (2020) A game theoretic analysis of resource mining in blockchain. Cluster Comput 1–12 12. Dey S (2018) Securing majority-attack in blockchain using machine learning and algorithmic game theory: a proof of work. In: 2018 10th computer science and electronic engineering (CEEC). IEEE, pp 7–10 13. Liu X et al (2018) Evolutionary game for mining pool selection in blockchain networks. 
IEEE Wirel Commun Lett 7(5):760–763 14. Cong, Lin William, and Zhiguo He: Blockchain disruption and smart contracts. The Review of Financial Studies 32, no. 5. (2019)1754–1797 15. Wang L, Liu Y (2015) Exploring miner evolution in bitcoin network. In: International conference on passive and active network measurement. Springer, Cham, pp 290–302 16. Kim S, Hahn SG (2019) Mining pool manipulation in blockchain network over evolutionary block withholding attack. IEEE Access 7:144230–144244 17. Göbel J, Keeler HP, Krzesinski AE, Taylor PG (2016) Bitcoin blockchain dynamics: the selfishmine strategy in the presence of propagation delay. Perform Eval 104:23–41 18. Liu X, Wang W, Niyato D, Zhao N, Wang P (2018) Evolutionary game for mining pool selection in blockchain networks. IEEE Wirel Commun 7(5):760–763 19. Chatterjee K, Goharshady AK, Pourdamghani A (2019) Hybrid mining: exploiting blockchain’s computational power for distributed problem solving. In: Proceedings of the 34th ACM/SIGAPP symposium on applied computing, pp 374–381 20. Wang Y, Tang C, Lin F, Zheng Z, Chen Z (2019) Pool strategies selection in PoW-based blockchain networks: game-theoretic analysis. IEEE Access 7:8427–8436
21. Kopp H, Kargl F, Bösch C, Peter A (2018) uMine: a blockchain based on human miners. In: International conference on information and communications security. Springer, Cham, pp 20–38 22. Wang W, Hoang DT, Hu P, Xiong Z, Niyato D, Wang P, Wen Y, Kim DI (2019) A survey on consensus mechanisms and mining strategy management in blockchain networks. IEEE Access 7:22328–22370 23. Kumar A, Krishnamurthi R, Nayyar A, Luhach AK, Khan MS, Singh A (2020) A novel software-defined drone network (SDDN)-based collision avoidance strategies for on-road traffic monitoring and management. Veh Commun 100313 24. Kumar A, Sharma K, Singh H, Naugriya SG, Gill SS, Buyya R (2020) A drone-based networked system and methods for combating coronavirus disease (COVID-19) pandemic. Future Gener Comput Syst 115:1–19 25. Singhal R, Kumar A, Singh H, Fuller S, Gill SS (2020) Digital device-based active learning approach using virtual community classroom during the COVID-19 pandemic. Comput Appl Eng Educ 26. Kumar A, Aggarwal A (2012) Survey and taxonomy of key management protocols for wired and wireless networks. Int J Netw Secur Appl 4(3):21–40 27. Kumar A, Gopal K, Aggarwal A (2015) Novel trusted hierarchy construction for RFID sensorbased MANETs using ECCs. ETRI J 37(1):186–196 28. Kumar A, Gopal K, Aggarwal A (2014) Cost and lightweight modeling analysis of RFID authentication protocols in resource constraint internet of things. J Commun Softw Syst 10(3):179–187 29. Kumar A, Krishnamurthi R, Nayyar A, Sharma K, Grover V, Hossain E (2020) A novel smart healthcare design, simulation, and implementation using healthcare 4.0 processes. IEEE Access 8:118433–118471 30. Kumar A, Jain S, Yadav D (2020) A novel simulation-annealing enabled ranking and scaling statistical simulation constrained optimization algorithm for internet-of-things (IoTs). Smart Sustain Built Environ 31. 
Krishnamurthi R, Kumar A, Gopinathan D, Nayyar A, Qureshi B (2020) An overview of IoT sensor data processing, fusion, and analysis techniques. Sensors 20(21):6076 32. Singh V, Aggarwal A, Kumar A, Sanwal S (2019) The transition from centralized (Subversion) VCS to decentralized (Git) VCS: a holistic approach. IUP J Electr Electron Eng 12(1) 33. Kumar A, Aggarwal A, Yadav D (2018) A multi-layered outlier detection model for resource constraint hierarchical MANET. In: 2018 5th IEEE Uttar Pradesh section international conference on electrical, electronics and computer engineering (UPCON). IEEE, pp 1–7
Proposal to Emphasize on Power Production from Solar Rooftop System in the University Campus
Subhash Chandra and Arvind Yadav
Abstract The Government of India has a very ambitious mission to produce 175 GW of electricity from renewable energy sources. Out of this, 100 GW must be contributed by solar energy. A country with a high population density faces a scarcity of the land required for solar power plant installation. In this situation, institutions like colleges and universities should come forward to support the mission and take advantage of the clean development mechanism. In this paper, an analysis is carried out to submit a proposal for the installation of a solar PV-based power plant on the roof of the GLA University (Mathura) campus residential buildings to fulfil the demand of all residents. The total load is estimated and the required initial investment is calculated, considering battery backup for one day. A total investment of around Rs. 7,500,000 is required, and the system will generate 289,080 units annually, saving Rs. 3,758,040 yearly. The loss of load probability is 5%, which is acceptable.
Keywords Solar PV system · Size estimation · Cost · Loss of load
S. Chandra (B) · A. Yadav
Electrical Engineering Department, GLA University, Mathura 281406, India
e-mail: [email protected]
A. Yadav
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_12
1 Introduction
The Government of India has an ambitious target to generate 175 GW from renewable energy sources. The contributions from the various sources are 100 GW from solar energy, 60 GW from wind energy, 10 GW from biomass and 5 GW from small hydropower plants. The 100 GW solar target is the toughest, but according to reports of the Ministry of New and Renewable Energy, good progress has been made. As of 31 March 2019, India was producing 30 GW of electricity from solar energy, as shown in Fig. 1 [1]. The dream of generating such a large amount of electricity seems to be coming true through the installation of solar parks in various states. The top five solar parks are listed here:
Fig. 1 Present electricity generation from solar energy against the target
I. Bhadla Solar Park, Rajasthan (2245 MW)
II. Pavagada Solar Park, Karnataka (2050 MW)
III. Kurnool Solar Park, Andhra Pradesh (1000 MW)
IV. Rewa Solar Park, Madhya Pradesh (750 MW)
V. Kamuthi Solar Park, Tamil Nadu (648 MW).
Besides ground-mounted installations, solar PV systems can be installed on the roofs of public and private buildings [2]. The land required is a real concern in states with a highly dense population. India has been able to close the gap between demand for and domestic supply of energy while addressing the environmental externalities associated with energy use. Despite the high growth rates experienced in energy-intensive sectors, energy consumption and carbon dioxide (CO2) emissions have not grown as rapidly as the gross domestic product (GDP). Electricity supply is growing in line with economic growth, while its carbon intensity is in decline thanks to the increasing share of renewables and the declining utilization of coal power plants [3]. According to the Government of India, every stakeholder needs to contribute to the mission of phasing out electricity generation from coal and other conventional methods. In addition, strong coordination of energy policy across the central government is needed, along with alignment between the central and state governments on key energy policy matters, notably electricity market design and renewable targets. Building-integrated photovoltaic systems are not yet popular, especially in India; in contrast, roof-mounted systems are growing rapidly owing to their ease of installation and performance [4–7]. However, the performance of these systems is site-dependent, so it would be unjustified to use a generalized approach for performance evaluation [8]. In the Indian scenario, climate conditions vary widely from north to south and east to west. The key performance parameters, radiation and temperature, do not show uniform characteristics, which poses challenges for technical and economic parameters [9]. Several researchers have evaluated performance using soft computing methods to forecast output and attract customers [10].
Although India is fortunate to have 300 sunny days a year and most regions are favourable for solar PV installation, there are several challenges that remain to be addressed. In this paper, an attempt is made to propose the installation of a solar PV system to fulfil the electricity demand of the families residing on the university campus. The PVSYST software is used for the analysis.
2 Challenges and Scope of Improvement
Based on the above literature review, the key challenges and scope of improvement in enhancing the generation of electricity from solar PV can be listed as below.
i. Scarcity of land: In densely populated states, the scarcity of land can be compensated for by installing PV systems on the roofs of public and private buildings such as railway stations, roadways, candies, schools, colleges and universities.
ii. Data resource centre at the local level: Improve the collection, consistency, transparency and availability of energy data across the energy system at central and state government levels.
iii. Policies for skill development and employment: Local people must be trained, with payment, for maintenance and fault-rectification work. In this way, they can be drawn to the mission.
iv. Local advertisement about Government policies: A public information brochure must be prepared in the regional language and be very clear to the project planners, so that they can convey it to the public with all its ifs and buts.
v. High initial cost: Although the price per unit of electricity generated from solar PV is coming down, the installation cost of a project is still too high. The government can provide a few attractive schemes and emphasize the mitigation of pollution, and hence clean and green development for their children in the future.
3 Methodology
For the proposal for electricity generation from a solar PV-based power plant, the load of all the residential flats of the university is taken into account, and some assumptions are made to avoid any ambiguity in the data collected from the householders. Family flats of the same size are provided by the university management, and on average four members, including children, reside in each flat. Therefore, the following assumptions are made about the daily energy need.
1: Every flat contains four LED lamps of 15 W each, generally used from 6 p.m. to 11 p.m. and from 5 a.m. to 6 a.m.
2: Every family has only one television; all televisions are assumed to be of equal rating and used for about the same duration.
3: Each family has a laptop, used for the same number of hours.
4: Similarly, each family has a fridge and a washing machine, assumed to run for a similar number of hours.
5: Each house has four fans of the same rating. To calculate the consumption of the fans, the following assumption is made: “A fan is a seasonal load, generally used 9 h daily from March to October in India. So, averaged over 12 months, 2/3 of the daily hours are considered.”
Table 1 Standard load profile of all the families in the university campus
Type of load | Power (W) per appliance | Total No | Mean daily use (h) | Daily total energy (Wh)
LED lamp | 15 | 92 × 4 = 368 | 6 | 33,120
TV | 55 | 92 | 6 | 30,360
Laptop | 35 | 92 | 6 | 19,320
Fridge | 300 | 92 | 20 | 552,000
W/machine | 300 | 92 | 0.5 | 13,800
Fan | 65 | 92 × 4 = 368 | 6 | 143,520
$$E_{\text{daily}} = P_e \cdot n \cdot d \qquad (1)$$

$$\text{Total } E_{\text{daily}} = \sum_{\text{Appliance}=1}^{6} E_{\text{daily}} \qquad (2)$$
Daily consumption of each appliance is shown in Table 1. Thus total energy demand for daily use is obtained by adding all these. This data is utilized for sizing the solar PV system with the help of PVSYST.
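Equations (1) and (2) can be checked against Table 1 with a short Python sketch. It assumes that P_e is the power per appliance in watts, n the total number of appliances and d the mean daily use in hours, which is how the table columns read; the variable names are chosen for this illustration only.

```python
# (type of load, power per appliance in W, total number, mean daily use in h), from Table 1.
fan_hours = 9 * 8 / 12  # 9 h daily for 8 months (March to October), averaged over 12 months
LOADS = [
    ("LED lamp", 15, 92 * 4, 6),
    ("TV", 55, 92, 6),
    ("Laptop", 35, 92, 6),
    ("Fridge", 300, 92, 20),
    ("Washing machine", 300, 92, 0.5),
    ("Fan", 65, 92 * 4, fan_hours),
]


def daily_energy_wh(power_w, count, hours):
    """Eq. (1): daily energy of one appliance type in Wh."""
    return power_w * count * hours


# Eq. (2): total daily energy summed over the six appliance types.
total_wh = sum(daily_energy_wh(p, n, d) for _, p, n, d in LOADS)
for name, p, n, d in LOADS:
    print(f"{name}: {daily_energy_wh(p, n, d):,.0f} Wh")
print(f"Total: {total_wh:,.0f} Wh (about {total_wh / 1000:.0f} kWh)")
```

The total of 792,120 Wh, roughly 792 kWh per day, is the daily demand used for sizing the system.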
4 Result and Discussion
Figure 2 depicts the monthly available solar energy at the considered site and the load demand. It is observed that the PV system can fulfil the load demand in each month
Fig. 2 Monthly available average solar energy and user need
Fig. 3 Monthly average state of charge of batteries and loss of load probability
for every family. The average daily energy required is 792 kWh, while the system will produce 1068 kWh daily. A good match between demand and supply is seen for the entire year. The nominal array power required is 236,707 Wp, which can be approximated as 250 kWp. For a system of this size, based on present market trends, an initial investment of Rs. 7,500,000 is required. This cost includes the battery cost for one day of autonomy; the cost of the system is proportional to the number of autonomy days. During summer, the loss of load probability is negligible, but during winter it is around 10% because of weather conditions. Both parameters are calculated and shown in Fig. 3. The average state of charge (SOC) of the batteries is found to be 66.1% with one-day autonomy, and the loss of load probability is acceptable to the customer.
5 Conclusion and Future Scope
The above analysis supports the proposal for the installation of a solar PV-based electricity generation plant. If the estimated cost is invested in such a plant, it will save Rs. 3,758,040 annually and also benefit the environment, because the diesel generator supply currently costs each family residing on campus an average of Rs. 13 per unit (792 kWh per day × 365 days × Rs. 13 per kWh = Rs. 3,758,040). Loss of load is likely only in winter, and even then its chance is small. In addition, the university will contribute to the government mission of producing 100 GW of solar power from solar PV-based power plants. Tenders may be invited from various project planners and, after proper scrutiny, approval should be given with proper terms and conditions. After successful execution of this project, the plant capacity may be increased, and any generation in excess of demand can be sold to the power corporation.
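The annual saving quoted above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the Rs. 13 figure is read as a cost per kWh, the daily demand of 792 kWh is assumed to hold for all 365 days, and the simple payback period is an illustrative addition that ignores maintenance, battery replacement and financing costs.

```python
DAILY_DEMAND_KWH = 792            # average daily energy required (Sect. 4)
DIESEL_COST_RS_PER_KWH = 13       # assumption: Rs. 13 is a per-kWh rate
INITIAL_INVESTMENT_RS = 7_500_000 # initial investment quoted in Sect. 4

# Annual cost of meeting the same demand from the diesel generator
annual_saving_rs = DAILY_DEMAND_KWH * 365 * DIESEL_COST_RS_PER_KWH

# Simple payback: investment divided by annual saving (no discounting)
simple_payback_years = INITIAL_INVESTMENT_RS / annual_saving_rs

print(f"Annual saving: Rs. {annual_saving_rs:,}")
print(f"Simple payback: {simple_payback_years:.2f} years")
```

Under these assumptions the saving matches the Rs. 3,758,040 stated in the text, and the investment would be recovered in roughly two years.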
Acknowledgements Authors are thankful to the electrical maintenance department for their kindhearted support as and when required. Authors are thankful to the persons from NISE for their support.
References
1. https://mnre.gov.in/solar/current-status/. Accessed on 10 Aug 2020
2. https://mercomindia.com/top-solar-parks-india-infographics/. Accessed on 10 Sept 2020
3. https://niti.gov.in/sites/default/files/2020-01/IEA-India%202020-In-depth-EnergyPolicy_0.pdf
4. Madessa HB (2015) Performance analysis of roof-mounted photovoltaic systems – the case of a Norwegian residential building. Energy Procedia 83:474–483
5. Padmavathi K, Arul Daniel S (2013) Performance analysis of a 3 MWp grid connected solar photovoltaic power plant in India. Energy Sustain Dev 17(6):615–625
6. Kumar BS, Sudhakar K (2015) Performance evaluation of 10 MW grid connected solar photovoltaic power plant in India. Energy Rep 1:184–192
7. Sharma V, Chandel SS (2013) Performance analysis of a 190 kWp grid interactive solar photovoltaic power plant in India. Energy 55:476–485
8. Chandra S et al (2020) Material and temperature dependent performance parameters of solar PV system in local climate conditions. Mater Today: Proc
9. https://www.mapsofindia.com/maps/india/annualtemperature.htm. Accessed on 15 May 2020
10. Chandra S, Agrawal S, Chauhan DS (2018) Soft computing based approach to evaluate the performance of solar PV module considering wind effect in laboratory condition. Energy Rep 4:252–259
Path Planning of E-puck Mobile Robots Using Braitenberg Algorithm Bhaskar Jyoti Gogoi and Prases K. Mohanty
Abstract In this paper, an efficient method is proposed and tested for navigating a mobile robot safely to its goal point past different types of obstacles. A Braitenberg vehicle method is used for obstacle avoidance; it can be implemented without expensive equipment or complex structures. An E-puck robot is used as the mobile robot. The robot must be able to locate its goal point and move towards it. Among the various ways to do this, a new technique based on the coordinates of the robot and of the destination point is studied and implemented here. The robot's efficiency has been tested in simulation with the Webots software in different environments. Some problems were identified during the simulation and were effectively resolved. The paths followed by the robot during the simulations are presented.
Keywords Braitenberg vehicle · Goal point · Starting point · Obstacle avoidance · Path planning · Mobile robot
B. J. Gogoi (B) Department of Electronics and Communication, The Assam Kaziranga University, Jorhat, Assam 785006, India
P. K. Mohanty Department of Mechanical Engineering, National Institute of Technology, Nirjuli, Arunachal Pradesh 791112, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_13
1 Introduction
For a mobile robot to navigate to a goal point, it must be aware of its surroundings and react accordingly to avoid obstacles. The application of mobile robotics has increased widely in areas such as space exploration, security, distribution of goods in warehouses, and medical and personal assistance. For these applications the robot needs to read its surroundings constantly and process the data. Braitenberg vehicles use simple input data from proximity sensors to show complex behaviours. Therefore, simple infrared sensors on the mobile robot can be used to detect obstacles and, following the Braitenberg vehicle concept, to perform the complex behaviours needed to avoid them and reach the goal position.
There has been a lot of previous work on the path planning of mobile robots using methods such as the wall-following method [1], fuzzy methods [2] and the virtual target method [3]. The Braitenberg vehicle approach, however, makes path planning simple and effective. In this paper, movements are assigned to the robot based on the Braitenberg methods and the sensor readings. The movement of the robot is controlled through the velocity of each wheel.
Many classical techniques are used for path planning, such as cell decomposition (CD), the roadmap approach (RA), neural networks (NN) and the Voronoi diagram [4]. In the cell decomposition technique, the configuration space is transformed into non-overlapping cells, and connectivity graphs are used to travel across the cells to reach the goal [5]. The roadmap approach connects the free spaces to form a network of one-dimensional curves [6]. An artificial neural network is made up of many layers of interconnected elements; the connections are weighted, and adjusting these weights lets the system learn on its own [7]. The Voronoi diagram divides a plane containing a set of generating points into polygons such that every point inside a polygon is closer to that polygon's generating point than to any other [8].
In this paper, a new approach is taken to reach the final destination by using the coordinates of the robot and estimating the time needed to reach the destination. Braitenberg vehicle 2a is used to avoid the obstacles on the way. The Braitenberg vehicle method is simple yet effective in making a vehicle show complex behaviours. The paper is organized as follows: Sect. 2 briefly discusses Braitenberg vehicle concepts, Sect. 3 describes how the robot behaves and moves towards its final destination, and Sect. 4 presents the paths followed by the robot in the simulation environment.
2 Braitenberg Vehicles
The 1984 book "Vehicles: Experiments in Synthetic Psychology" by Valentino Braitenberg describes various experiments in which simply structured machines show complex behaviours [9]. In the book he considered the brains of animals as a "piece of machinery", which led him to think of vehicles as animals with a nature of their own. Using different connections and system structures in the vehicles, he tried to illustrate the internal structure and basic working of animals. Because these simple machines show remarkably complex behaviours, they are used as a subject of study in computational neuroscience. In Braitenberg vehicles, the connections from sensors to motors determine how a vehicle reacts to the environment [10]. Based on these connections, the vehicles show behaviours such as fear, aggression, liking and love. To give the vehicles various abilities to cope with the environment, different types of Braitenberg vehicles are used with a wide variety of stimuli, such as light, darkness, or free space in front of the robot. Some of the simplest examples of Braitenberg's vehicles are described below.
Fig. 1 Braitenberg vehicle 1
2.1 Vehicle 1
Vehicle 1 has one sensor and one wheel, and the wheel is driven in proportion to the sensor stimulus. The vehicle can move along one direction only; it cannot turn or show more complex behaviour. Depending on the stimulus, it either stands still or moves forward or backward. This vehicle makes the concept of Braitenberg vehicles easy for beginners to understand. Vehicles with more sensors and wheels can show more complex behaviours and are more sensitive to the environment. Figure 1 shows the basic diagram of Braitenberg vehicle 1.
2.2 Vehicle 2
Vehicle 2 has two wheels and two sensors; one sensor is connected to one wheel and the other sensor to the other wheel. Each sensor independently controls the behaviour of the wheel connected to it. With two sensors and two wheels, two different wirings are possible, which give two types of Vehicle 2: Vehicle 2a and Vehicle 2b.
2.2.1 Vehicle 2a
In Vehicle 2a, the sensor on the right side is wired to the right wheel and the sensor on the left side is wired to the left wheel; in other words, the wiring is symmetric. This vehicle shows negative (avoiding) behaviour. To understand its behaviour, consider a light source on the right side of the vehicle. The right sensor then has a higher reading than the left sensor, so the right wheel turns faster and the vehicle turns left, away from the light source. This behaviour is well suited to escape situations. Vehicle 2a can also be used the other way round, to move towards the light and away from the dark. As shown in Fig. 2, the '+' sign means that the motor rotates faster as the sensor reading increases, whereas the '−' sign means that the motor rotates more slowly as the sensor reading increases. If the vehicle is moving directly towards the light source, it may hit the source because both sensor readings increase simultaneously. If the light source is to one side of the vehicle, the motor on that side rotates faster and the vehicle moves away from the source [11]. Figure 2 shows the action of Braitenberg vehicle 2a.
Fig. 2 Braitenberg vehicle 2a
2.2.2 Vehicle 2b
In Vehicle 2b, the right sensor is connected to the left motor and the left sensor is connected to the right motor; in other words, the sensors and motors are cross-connected. Consider again a light source on the right side: the right sensor has a high reading and the left sensor a low reading. Because the right sensor drives the left motor, the left motor rotates faster than the right motor and the vehicle turns towards the light. Figure 3 represents vehicle 2b.
Fig. 3 Braitenberg vehicle 2b
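The difference between the two wirings of Vehicle 2 can be illustrated with a small sketch in Python. The function names, the gain parameter and the normalized sensor readings are hypothetical choices made for illustration; they are not taken from the authors' controller.

```python
def vehicle2_wheel_speeds(left_sensor, right_sensor, wiring="2a", gain=1.0):
    """Return (left_wheel, right_wheel) speeds for Braitenberg Vehicle 2.

    Both variants are excitatory: a higher reading drives a wheel faster.
    "2a": each sensor drives the wheel on its own side.
    "2b": each sensor drives the wheel on the opposite side.
    """
    if wiring == "2a":
        return gain * left_sensor, gain * right_sensor
    if wiring == "2b":
        return gain * right_sensor, gain * left_sensor
    raise ValueError("wiring must be '2a' or '2b'")


def turn_direction(left_wheel, right_wheel):
    """A differential-drive robot turns towards its slower wheel."""
    if left_wheel > right_wheel:
        return "right"
    if right_wheel > left_wheel:
        return "left"
    return "straight"


# Light source on the right: the right sensor reads higher than the left.
left_reading, right_reading = 0.2, 0.8
for wiring in ("2a", "2b"):
    speeds = vehicle2_wheel_speeds(left_reading, right_reading, wiring)
    print(wiring, speeds, "turns", turn_direction(*speeds))
```

With the source on the right, the 2a wiring turns the vehicle left (away from the source) and the 2b wiring turns it right (towards the source), as described in the text.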
Fig. 4 Braitenberg vehicle 3a
2.3 Vehicle 3
Vehicle 3 is similar to Vehicle 2: it has two sensors and two wheels and, depending on the connection between sensors and wheels, comes in two types, Vehicle 3a and Vehicle 3b. The only difference from Vehicle 2 is that the motors rotate more slowly when the sensor reading is high. In other words, this vehicle slows down when it faces the light, and under certain conditions it may stop in front of the light source.
2.3.1 Vehicle 3a
In Vehicle 3a, the right sensor is wired to the right motor and the left sensor is wired to the left motor. To understand this vehicle, consider a light source on its right side. As the right sensor reading increases, the right wheel rotates more slowly, so the vehicle turns towards the light source. Once the vehicle faces the light source, it decelerates and gradually stops in front of the source, as shown in Fig. 4.
2.3.2 Vehicle 3b
The sensors and wheels of Vehicle 3b are cross-connected: the right sensor is wired to the left wheel and the left sensor is wired to the right wheel [12]. The action of vehicle 3b is shown in Fig. 5.
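The inhibitory wiring of Vehicle 3 can be sketched in the same way. In this minimal Python example, the linear inhibition rule (base speed minus gain times reading, clamped at zero), the parameter names and the normalized readings are assumptions made for illustration only.

```python
def vehicle3_wheel_speeds(left_sensor, right_sensor, wiring="3a",
                          base_speed=1.0, gain=1.0):
    """Return (left_wheel, right_wheel) speeds for Braitenberg Vehicle 3.

    Both variants are inhibitory: a higher reading slows a wheel down.
    "3a": each sensor inhibits the wheel on its own side.
    "3b": each sensor inhibits the wheel on the opposite side.
    """
    def inhibit(reading):
        # Assumed linear inhibition, never below zero (no reversing)
        return max(0.0, base_speed - gain * reading)

    if wiring == "3a":
        return inhibit(left_sensor), inhibit(right_sensor)
    if wiring == "3b":
        return inhibit(right_sensor), inhibit(left_sensor)
    raise ValueError("wiring must be '3a' or '3b'")


# Light source on the right: the right sensor reads higher than the left.
left_wheel, right_wheel = vehicle3_wheel_speeds(0.2, 0.8, "3a")
print("3a:", left_wheel, right_wheel)  # right wheel slower: turns towards the source

left_wheel, right_wheel = vehicle3_wheel_speeds(0.2, 0.8, "3b")
print("3b:", left_wheel, right_wheel)  # left wheel slower: turns away from the source

# Facing a strong source with both sensors saturated: the vehicle stops.
print("stop:", vehicle3_wheel_speeds(1.0, 1.0, "3a"))
```

In this sketch Vehicle 3a turns towards the source and comes to rest in front of it, while the crossed wiring of Vehicle 3b slows the wheel on the far side and steers the vehicle away.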
Fig. 5 Braitenberg vehicle 3b
3 Navigation and Obstacle Avoidance
In this project, three main actions play a critical role in letting the mobile robot navigate, avoid obstacles and reach the goal point: rotating the robot towards the goal point, moving forward, and avoiding obstacles using the Braitenberg vehicle concept. To view the situation from the Braitenberg vehicle's perspective, consider the goal point as one source (S–G) and the obstacles as another source (S–O). The vehicle is attracted to the source (S–G) and tries to avoid the source (S–O). Under these conditions, the robot has to follow the path shown in Fig. 6 [13].
To avoid obstacles, Vehicle 2a is used with a slight modification. A total of eight sensors are mounted on the robot, and each sensor is assigned a threshold value for detecting obstacles on its way. At the starting point, the robot first rotates to face the goal point. It then moves forward with all sensors active, and the sensor data are processed in real time. Whenever an obstacle is detected, the robot is brought to a stop and then makes the moves required to avoid the obstacle. Once the obstacle is cleared, the robot stops again, rotates by the angle required to face the goal point, and then moves towards the goal [14]. Figure 7 shows the general actions performed by the robot.
Fig. 6 Simple obstacle avoidance
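The threshold-based detection and the Vehicle 2a style avoidance described above could look roughly like the following Python sketch. The sensor indexing, the threshold, the gains and the speed limit are all hypothetical values chosen for illustration; the paper does not give the authors' actual parameters or their modification of Vehicle 2a.

```python
# Assumed layout of the eight proximity sensors (hypothetical indices):
# 0-2 cover the right half of the robot, 5-7 cover the left half,
# 3 and 4 point backwards and are ignored here.
RIGHT_SENSORS = (0, 1, 2)
LEFT_SENSORS = (5, 6, 7)
THRESHOLD = 80.0  # hypothetical raw reading above which an obstacle is assumed


def obstacle_detected(readings, threshold=THRESHOLD):
    """True if any forward or side sensor exceeds its threshold."""
    return any(readings[i] > threshold for i in RIGHT_SENSORS + LEFT_SENSORS)


def avoidance_speeds(readings, base_speed=2.0, gain=0.01, max_speed=6.28):
    """Vehicle 2a style avoidance: each side's sensors excite that side's wheel.

    An obstacle on the right speeds up the right wheel, so the robot
    turns left, away from the obstacle (and vice versa).
    """
    left_stimulus = sum(readings[i] for i in LEFT_SENSORS)
    right_stimulus = sum(readings[i] for i in RIGHT_SENSORS)
    left = min(max_speed, base_speed + gain * left_stimulus)
    right = min(max_speed, base_speed + gain * right_stimulus)
    return left, right


# Example: an obstacle ahead and to the right of the robot.
readings = [150, 120, 60, 0, 0, 0, 0, 10]
if obstacle_detected(readings):
    left, right = avoidance_speeds(readings)
    print(f"obstacle: left wheel {left:.2f}, right wheel {right:.2f}")
```

Here the right wheel receives the larger command, so the robot steers left, away from the obstacle on its right.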
Fig. 7 Actions performed by the robot
Various situations are considered when programming the robot's obstacle avoidance pattern, and the robot acts differently for different types of obstacles. Although the robot is capable of overcoming obstacles, it may also repeat the same path indefinitely without making any actual progress towards the goal point. It is therefore important to account for such situations by observing the obstacle patterns in which the robot is likely to fall into such loops.
There are various ways to implement path planning on a robot, such as the potential field [15] and the navigation function (NF1) [16]. In the potential field method, a virtual potential field is created around the robot. The goal point has an attractive field, so the robot follows the field to reach the goal point; in simple terms, it moves like a ball rolling downhill.
In this paper a different approach is taken. First, a goal point is set. The robot then rotates at its starting position to face the goal point. Next, the robot checks its current location and, from the coordinates of the current location and the final destination, calculates the time required to reach the goal point. The robot then moves forward for the calculated amount of time. If it encounters an obstacle, it performs the set of instructions required to overcome it, after which it again rotates towards the goal point, recalculates the time required to reach it, and moves forward.
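The coordinate-based step of this approach can be sketched as follows. This is a minimal Python illustration assuming planar coordinates in metres, headings in radians and a constant forward speed; the function names and the example numbers are hypothetical and are not taken from the authors' controller.

```python
import math


def heading_to_goal(x, y, goal_x, goal_y):
    """Bearing from the current position to the goal, in radians."""
    return math.atan2(goal_y - y, goal_x - x)


def rotation_needed(current_heading, target_heading):
    """Smallest signed rotation that turns the robot onto the target heading."""
    diff = target_heading - current_heading
    return math.atan2(math.sin(diff), math.cos(diff))


def travel_time(x, y, goal_x, goal_y, speed):
    """Time to reach the goal in a straight line at a constant speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return math.hypot(goal_x - x, goal_y - y) / speed


# Example: robot at the origin facing along +x, goal at (0.3, 0.4) m,
# assumed forward speed of 0.1 m/s.
target = heading_to_goal(0.0, 0.0, 0.3, 0.4)
print(f"rotate by {math.degrees(rotation_needed(0.0, target)):.1f} degrees")
print(f"then drive forward for {travel_time(0.0, 0.0, 0.3, 0.4, 0.1):.1f} s")
```

After each obstacle avoidance manoeuvre the same two calculations would be repeated from the robot's new position and heading.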
4 Simulation and Experiment
The path planning of the e-puck robot is simulated in Webots, a free and open-source software package [17]. On this platform, programming languages such as C, C++, Python and MATLAB can be used. Webots is a prototyping environment in which ideas can be simulated to see how they would behave in a real-time environment, and it can be used for both research and educational purposes. A virtual 3-D environment can be created with physical properties such as mass, velocity, joints and friction coefficient. Many built-in nodes, such as objects, humans, vehicles and robots, can be used to create an environment as required. The platform also allows any robot or structure that is not available in its library to be built and imported [18]; 3-D objects can be built outside Webots using any CAD software.
The e-puck robot is used for the simulations in this paper. The robot weighs 150 g and has a battery, 2 stepper motors, 8 infrared sensors with a measuring range of up to 4 cm, a camera with a resolution of 640 × 480, 3 omni-directional microphones, an accelerometer, a gyroscope, 8 red LEDs on the ring, 1 green LED on the body, a speaker, a switch and a remote control. Figure 8 shows the e-puck robot in the simulation environment.
Fig. 8 E-puck robot in the simulation environment
Several types of environment are used to test the path planning of the e-puck robot with the Braitenberg vehicle approach. Observing how the robot behaves in various situations is important for updating and modifying its behaviour. Each simulation uses one e-puck robot, a rectangular arena and different types of obstacles. Initially only a goal point is set; the robot then moves on its own towards the final destination, overcoming any obstacle on its way.
Simulation Environment-1 In this simulation, a simple wall-like obstacle is placed in front of the goal point. Figure 9 shows the path followed by the robot to reach its final destination.
Simulation Environment-2 In simulation environment-2, a thicker wall-like obstacle is placed in front of the goal point.
Figure 10 shows the path followed by the robot to reach its final destination.
Fig. 9 Environment-1
Fig. 10 Environment-2
Simulation Environment-3 In this environment, a U-shaped obstacle is placed in front of the goal point. This is one of the obstacles where the robot keeps repeating an infinite loop without actually overcoming the obstacle: it gets stuck inside the U shape. A different approach is therefore needed for this type of obstacle. First, the robot needs to detect that there is an L-shaped obstacle in front of it, as shown in Fig. 11. The following steps can then be used to overcome the hurdle:
1. Detect the L shape.
2. Rotate left or right by 100°, depending on the type of L.
3. Move forward for an amount of time x.
4. Stop and rotate left or right by 45°.
5. Move forward.
These steps proved effective in overcoming the U-shaped obstacle. The final path followed by the robot is shown in Fig. 12.
Simulation Environment-4 In this environment, many small cylindrical obstacles are placed randomly in front of the goal point. The robot passed through all the obstacles without any problem. Figure 13 shows the path followed by the robot.
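The escape steps described for the U-shaped obstacle in Environment-3 can be written down as a simple action sequence. The Python sketch below is only an illustration: the action names are hypothetical, the turn directions are left to the caller because the text does not say how they depend on the type of L, and the forward time x is passed in as a parameter because its value is not given.

```python
def u_shape_escape_plan(first_turn, second_turn, forward_time_s):
    """Return the escape manoeuvre as a list of (action, value) pairs.

    first_turn / second_turn: "left" or "right", chosen by the caller
    according to the type of L shape detected.
    forward_time_s: the unspecified forward time x, in seconds.
    """
    for turn in (first_turn, second_turn):
        if turn not in ("left", "right"):
            raise ValueError("turn must be 'left' or 'right'")
    if forward_time_s <= 0:
        raise ValueError("forward_time_s must be positive")
    return [
        ("rotate_" + first_turn, 100.0),   # degrees
        ("forward_for", forward_time_s),   # seconds
        ("stop", None),
        ("rotate_" + second_turn, 45.0),   # degrees
        ("forward", None),                 # resume normal forward motion
    ]


plan = u_shape_escape_plan("left", "right", forward_time_s=2.0)
for action, value in plan:
    print(action, value)
```

Keeping the manoeuvre as data rather than hard-coded motor commands makes it easy to try different turn directions and forward times in simulation.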
Fig. 11 Detecting L shape by the robot
Fig. 12 Environment-3
Fig. 13 Path followed by the robot for environment-4
5 Conclusion
In this paper an approach to the path planning of mobile robots is presented. For path planning it is very important that the robot is able to reach the goal point while avoiding all kinds of obstacles. The following points can be drawn from the experiments on the path planning method:
1. The Braitenberg vehicle approach is simple yet effective for obstacle avoidance. Therefore, the Braitenberg vehicle is implemented in this paper for the path planning of a mobile robot.
2. The Webots environment is used for the simulation of the mobile robot. The robot was tested in various environments to check its response and its efficiency in reaching the final destination, and it was found capable of reaching its goal while avoiding the obstacles on its way.
3. A new technique was proposed for the robot to reach the goal point using the coordinates of the current location and the goal point.
4. Rather than a conventional Braitenberg vehicle, a slightly adjusted approach was used for obstacle avoidance.
5. The robot was able to detect and overcome obstacles of all the shapes tested to reach the goal point.
In future, dynamic obstacles may be considered in place of static obstacles, and multiple robots instead of a single robot.
References
1. Bemporad A, Marco MD, Tesi A (1997) Wall-following controllers for sonar-based mobile robots. In: Proceedings of 1997 IEEE International conference on decision and control, vol 3, pp 3063–3068
2. Kim C, Kim Y, Yi H (2020) Fuzzy analytic hierarchy process-based mobile robot path planning. Electronics 9(2):290. https://doi.org/10.3390/electronics9020290
3. Sun L, Lin R, Wang W, Du Z (2011) Mobile robot real-time path planning based on virtual targets method. In: 2011 Third International conference on measuring technology and mechatronics automation. https://doi.org/10.1109/icmtma.2011.429
4. Patle BK, Babu LG, Pandey A, Parhi DRK, Jagadeesh A (2019) A review: on path planning strategies for navigation of mobile robot. Defence Technol. https://doi.org/10.1016/j.dt.2019.04.011
5. Milos S (2007) Roadmap methods vs. cell decomposition in robot motion planning. In: Proceeding of the 6th WSEAS international conference on signal processing, robotics and automation. World Scientific and Engineering Academy and Society (WSEAS), pp 127–32
6. Choset H, Burdick J (2000) Sensor-based exploration: the hierarchical generalized Voronoi graph. Int J Robot Res 19(2):96–125
7. Sung I, Choi B, Nielsen P (2020) On the training of a neural network for online path planning with offline path planning algorithms. Int J Inf Manage 102142. https://doi.org/10.1016/j.ijinfomgt.2020.102142
8. Bhattacharya P, Gavrilova ML (2007) Voronoi diagram in optimal path planning. In: 4th International symposium on Voronoi diagrams in science and engineering (ISVD 2007). https://doi.org/10.1109/isvd.2007.43
9. Braitenberg V (1984) Vehicles. Experiments in synthetic psychology. The MIT Press
10. Braitenberg V (1984) Vehicles: experiments in synthetic psychology. MIT Press, Cambridge, MA. Archived copy. Archived from the original on 29 Jan 2010. Retrieved 18 Jun 2012
11. Yang X, Patel RV, Moallem M (2006) A fuzzy–Braitenberg navigation strategy for differential drive mobile robots. J Intell Robot Syst 47:101–124. https://doi.org/10.1007/s10846-006-9055-3
12. Marzbali JM, Nikpour M (2004) Combination of reinforcement learning and Braitenberg techniques for faster mobile robot navigation. In: International conference on robotics and automation engineering
13. Shayestegan M, Marhaban MH (2012) A Braitenberg approach to mobile robot navigation in unknown environments. IRAM 2012, CCIS 330, pp 75–93
14. Yang X, Patel RV, Moallem M (2006) A fuzzy–Braitenberg navigation strategy for differential drive mobile robots. Received 5 Mar 2004, Accepted 2 May 2006, Published Online 21 Sept 2006
15. Orozco-Rosas U, Montiel O, Sepúlveda R (2019) Mobile robot path planning using membrane evolutionary artificial potential field. Appl Soft Comput 77:236–251. https://doi.org/10.1016/j.asoc.2019.01.036
16. Yuan F, Twardon L, Hanheide M (2010) Dynamic path planning adopting human navigation strategies for a domestic mobile robot. In: 2010 IEEE/RSJ International conference on intelligent robots and systems. https://doi.org/10.1109/iros.2010.5650307
17. Guyot L, Heiniger N, Michel O, Rohrer F (2011) Teaching robotics with an open curriculum based on the e-puck robot, simulations and competitions
18. Pacheco J, Benito F (2005) Development of a webots simulator for the Lauron IV robot. In: Lopez BZ (eds) Artificial intelligence research and development. IOS Press
Deep Neural Networks on Acoustic Emission in Stress Corrosion Cracking R. Monika and S. Deivalakshmi
Abstract Corrosion of high-strength prestressed steel strands is considered one of the significant causes of failure in civil infrastructure, leading to tremendous loss. The daunting challenge facing engineers is to identify microcracking at an early stage and prevent the structures from deteriorating further. The condition of steel strands therefore needs to be monitored in real time to avoid fatal accidents. In this study, a non-destructive technique (NDT), acoustic emission, is used to detect crack defects in prestressed strands subjected to tensile stress in a universal testing machine (UTM). The strand failure inspection process is automated using deep neural networks (DNNs) that classify the acoustic signals as crack or no-crack. The proposed neural network model showed a more precise qualitative prediction during training and testing.
Keywords Acoustic emission · Prestressed strands · Deep neural networks · Python
R. Monika (B) Corrosion and Materials Protection Division, CSIR Central Electrochemical Research Institute, Karaikudi 630003, India e-mail: [email protected]
S. Deivalakshmi Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirapalli 620015, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_14
1 Introduction
The prestressing method was invented by the French engineer Eugene Freyssinet. In this method, high-strength steel wires are employed to counterbalance the tensile stresses caused by the weight of the concrete beam in civil engineering structures such as concrete bridges, nuclear reactor containments and high-rise buildings. Prestressing is carried out either by pre-tensioning, in which the wires are tensioned and anchored and the concrete is cast around the tensioned wires, or by post-tensioning, in which the wires are tensioned after the concrete member is cast [1]. Failure of steel strands is caused by pitting corrosion, hydrogen embrittlement and stress corrosion cracking (SCC) [2]. SCC is a major phenomenon of material cracking under tensile stress or strain, anodic dissolution, or aggressive environmental conditions; it reduces the load-bearing capacity and leads to severe degradation of structures [3]. To avoid such circumstances, real-time monitoring and early detection of failure warnings have to be carried out periodically using effective techniques to enhance the durability of structures.
The existing non-destructive testing (NDT) techniques, such as ultrasonic, radiographic, magnetic flux leakage and eddy current testing, have certain limitations in the early detection of crack growth [4]. Compared with these methods, acoustic emission (AE) is regarded as an advanced, less intrusive monitoring technique for real-time measurement in applications such as degradation of materials and flow and leakage of materials [5]. In ASTM E1316, AE is defined as a transient elastic wave produced by the rapid release of energy due to structural changes in the material. AE signals fall into two broad classes: burst signals and continuous signals. They are typically nonstationary, that is, their frequency and statistical characteristics shift with time. For field applications, AE is one of the promising non-destructive methods for detecting SCC, based on the energy released at a discrete location in a material under stress. Conventionally, AE analysis has been carried out using acoustic signal parameters in the time or frequency domain [6]. In an industrial environment, the AE measurement is superimposed with external noise, such as electrical noise and electromagnetic interference, which alters the acoustic emission pattern. Filtering the noise in AE measurements is complex, and an average filtering method is generally adopted.
To bridge this gap in crack classification, a deep neural network is used that obviates the filtering complexity and automatically classifies the AE signal superimposed with noise. In recent years, deep neural networks have become popular in AE technology because of the available computing power and the accessibility of big data. A neural network with multiple layers between the input and output layers can be classified as a deep architecture. Apart from the number of layers, a challenging aspect is to automate the construction of more complex features through every layer of neurons. In some tasks in fields such as computer vision and speech recognition, such networks have been reported to perform better than humans. The main aim of this work is to automate the identification of cracks in prestressed strands from the AE signal, posed as a binary classification problem. The AE signals are converted into a feature space using time-domain, spectral-domain and statistical features, and the non-linear instances in this feature space are then classified using deep neural networks.
Fig. 1 Acoustic emission measurement system (block diagram: AE sensor, pre-amplifier, amplifier, NI-DAQ system, acquired AE signal)
2 Experimental Setup and Working
2.1 Acoustic Emission System
AE technology is a real-time NDT technique that has been widely used for in-depth detection of material deformation and corrosion processes. An AE signal is generated by the release of energy within a material undergoing localized physical changes. The acoustic wave propagates in all directions and reaches the surface of the material. A sensor fastened to the surface of the test specimen translates the stress waves into an electrical signal, which is amplified by a low-noise preamplifier. A high-speed data acquisition system is used to acquire the sensor signal, as shown in Fig. 1.
2.2 Composition of Prestressed Strands
IS 6003 (indented wire for prestressed concrete) covers cylindrical wire with a nominal diameter of 4, 5, 6, 7 or 8 mm. Seven-ply prestressed strands are formed by six wires wound helically around a central wire, as shown in Fig. 2. Table 1 gives the mechanical properties of the prestressed strands. The sample is subjected to a tensile test in a universal testing machine (UTM) at a constant elongation rate of $10^{-2}\ \mathrm{s}^{-1}$. The acoustic emission activity generated during tensile testing was recorded using an acoustic emission sensor and measurement system.
Fig. 2 High tensile 7 ply prestressed wire strand
Table 1 Mechanical properties of prestressing wire strand

Material: IS 6003
Composition (wt. %): S = 0.05 max; P = 0.05 max

Mechanical properties:
Nominal (mm) | Tolerance (mm) | Tensile strength, min | Proof stress (0.2%), min | Min. % elongation on 200 mm GL | Min. bends, no. (radius, mm) | Coil diameter (m) | Nominal weight, approx.
4.00 | ± 0.05 | 175 | 148.7 | 3.0 | 3 (12.5) | 15 | 98.7
5.00 | ± 0.05 | 160 | 136.1 | 4.0 | 3 (15.0) | 15 | 154.0
2.3 Acoustic Emission Measurement System
In this work, the prestressed wire test specimen is fixed to the load cell of a universal testing machine. The acoustic emission recording system consists of a Kistler 8152B111 acoustic emission sensor, a Kistler 5125 preamplifier, an NI PXIe5105 12-bit, 60 MS/s, eight-channel high-speed data acquisition digitizer card, and a PXI 8115 controller with a dual-core Intel Core i5-2510E processor running Windows 7 (64-bit). Application software was developed in LabVIEW 2012 to acquire the AE spectra in the time domain at a sampling frequency of 1 MHz. The acquired data are stored in CSV format on a solid-state drive (SSD) built into the measurement system for further processing and graphical display.
3 Proposed Methodology
In machine learning, training on a large amount of raw data increases the computational cost and can lead to overfitting, which reduces accuracy and increases classification errors. Noise in the data, for example from measurement errors, also hinders machine learning algorithms. In the case of AE, the raw data are convoluted by the many physical phenomena that occur during wave propagation. Here, AE propagates in the prestressed strand in all directions and becomes more complex as the wave travels through the steel strands. When an AE wave originating in the interior of the material reaches a surface, part of the wave is reflected and part is transmitted. Since the wave path and the angle of incidence on the material surface are uncontrolled, the wave reaching the sensor is composed of different AE sources, their multiple reflections, and background noise. Hence, the AE spectrum captured by the sensor is a complex pattern that alters physical parameters of the data such as count, energy, amplitude, and rise time. Figure 3 shows a typical AE event without background noise and real data captured from an acoustic emission sensor, which combine the electrical noise of the measuring equipment with environmental noise. In a machine learning model, such raw data provide little useful information to the network, lead to overfitting, and restrain the convergence of the model parameters during training. To overcome this problem, either dimensionality reduction or a regularization technique is needed. The steps involved in the proposed methodology are signal acquisition, feature selection and extraction, preprocessing, training, testing, and validation. The flow diagram of crack detection in the prestressed strands using the deep neural network is given in Fig. 4. Training a neural network on a large number of labeled examples makes the model more proficient. Hence, the data obtained in each experiment are divided into subsets of 2000 data points each and labeled accordingly as “crack” or “no-crack.”
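The segmentation and labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and dropping a trailing remainder shorter than 2000 points is an assumption, since the text does not say how leftover samples are handled.

```python
def split_into_windows(samples, window_size=2000):
    """Split a 1-D sequence into consecutive, non-overlapping windows.

    Assumption: a trailing remainder shorter than window_size is dropped.
    """
    n_windows = len(samples) // window_size
    return [samples[i * window_size:(i + 1) * window_size] for i in range(n_windows)]


def label_windows(windows, label):
    """Attach the same class label ("crack" or "no-crack") to every window."""
    return [(window, label) for window in windows]


# Usage: 5000 synthetic samples give two full windows of 2000 points each.
signal = [0.0] * 5000
segments = label_windows(split_into_windows(signal), "crack")
```

Each labeled segment would then be passed to feature extraction rather than being fed to the network as raw samples.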
Fig. 3 AE signal: a AE parameters [5], b real AE spectrum
Fig. 4 The sequence of steps involved in the present work. Data preprocessing: read data files, label data, feature extraction. Neural network architecture: downscale features, model architecture, validation, test unknown data
3.1 Feature Extraction in ML
Feature extraction is a popular method of dimensionality reduction in which the raw input data are transformed into lower-dimensional vectors. A raw dataset with a large number of dimensions needs more computing resources for analysis. The feature extraction process extracts the significant characteristics, or fingerprint, of the dataset and rejects irrelevant attributes. Reducing the dimension of the training data in this way speeds up the learning process. Acoustic emission data were acquired with a high sampling rate of about 2–5 MHz. The raw AE data were subjected to feature extraction in three domains: statistical, temporal, and spectral [7]. Statistical features represent trends and inherent patterns in large data and put decision-making on a more rigorous basis. AE data contain considerable background noise, and statistical features are not strongly affected by noise or distortion. Spectral features describe the signal in the frequency domain, for example the power spectral density and the spectral centroid; a Fourier transform is used to convert the time-domain signal into spectral form. Temporal features, such as the autocorrelation and zero crossings, are easy to extract. These parameters are used to characterize probability densities and the shape of the signal spectrum. The 12 features used in our model are listed in Fig. 5.
Fig. 5 Features used for building a neural network. Acoustic emission raw data are described in four domains: statistical (mean, standard deviation, variance, skewness, kurtosis), frequency (spectral centroid, maximum frequency, power spectral density), temporal (zero crossing, energy, maximum amplitude), and time-frequency (wavelet transform)
3.1.1 Statistical Features
The statistical features considered in this work are mean, variance, standard deviation, kurtosis, and skewness. The arithmetic mean is a measure of central tendency that indicates the most typical value of a probability distribution; it is calculated by dividing the sum of the values by the number of values in the dataset. Variance measures the spread of the values about the mean: to make every deviation positive, the difference between each value and the mean is squared, and the squared differences are averaged. Skewness measures the asymmetry of the distribution, that is, how far it is distorted from a symmetric normal curve. Kurtosis measures the tailedness of the probability distribution of the dataset. A higher kurtosis indicates that more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations. Standard deviation indicates the spread of the dataset about its mean and is calculated as the square root of the variance.
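The five statistical features can be computed for one data window as in the sketch below. It is an illustration under stated assumptions: the function name is hypothetical, population (divide-by-N) moments are used throughout, and skewness is the moment-based form rather than the mean-minus-mode form listed in Table 2. A constant window (zero standard deviation) is not handled.

```python
import math


def statistical_features(window):
    """Return mean, variance, standard deviation, skewness and kurtosis.

    Assumptions: population moments (divide by N); moment-based skewness;
    the window is not constant, so the standard deviation is non-zero.
    """
    n = len(window)
    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n
    std = math.sqrt(variance)
    skewness = sum((v - mean) ** 3 for v in window) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in window) / (n * std ** 4)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Usage: a small symmetric sample has zero skewness.
features = statistical_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the sample above, the mean is 3, the variance is 2, and the symmetric spread gives zero skewness.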
3.1.2 Spectral Features
An acoustic emission signal contains many different frequencies, each with its own amplitude and phase. The acquired AE signal is transformed into the frequency domain using the discrete Fourier transform [8] given by Eq. (1).
$$x_k = \sum_{i=0}^{n-1} X_i \, e^{-j 2\pi i k / n}, \quad k = 0, 1, 2, \ldots, n-1 \tag{1}$$
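Eq. (1), together with two of the spectral features listed in Fig. 5 (spectral centroid and maximum frequency), can be illustrated with the sketch below. The function names are hypothetical. The direct O(n²) sum is used only to mirror Eq. (1); a practical implementation would use an FFT routine. Reading "maximum frequency" as the frequency of the largest spectral magnitude is an assumption, and the centroid is weighted by frequency in hertz, which differs from a bin-index weighting only by a constant factor.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform following Eq. (1); O(n^2), for illustration only."""
    n = len(samples)
    return [
        sum(samples[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]


def spectral_features(samples, sampling_rate_hz):
    """Return (spectral centroid in Hz, frequency of the largest magnitude in Hz)."""
    n = len(samples)
    half = n // 2  # keep non-negative frequencies only (real-valued input)
    magnitudes = [abs(c) for c in dft(samples)[:half]]
    frequencies = [k * sampling_rate_hz / n for k in range(half)]
    centroid = sum(f * m for f, m in zip(frequencies, magnitudes)) / sum(magnitudes)
    peak_index = max(range(half), key=lambda k: magnitudes[k])
    return centroid, frequencies[peak_index]


# Usage: a synthetic 100 kHz tone sampled at 1 MHz (200 samples, 20 full cycles).
fs = 1_000_000
tone = [math.sin(2 * math.pi * 100_000 * t / fs) for t in range(200)]
centroid_hz, peak_hz = spectral_features(tone, fs)
```

For this pure tone, both the centroid and the peak frequency fall at 100 kHz, because all spectral magnitude sits in a single bin.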
Fig. 6 FFT output of AE spectrum: a real part amplitude, b imaginary part phase angle versus frequency
The fast Fourier transform (FFT) yields complex spectral data, from which the amplitude spectrum and the phase spectrum are obtained, as shown in Fig. 6. Spectral features such as power spectral density (PSD), spectral centroid, and maximum frequency are used to represent the AE spectrum. Power spectral density describes how the signal power varies as a function of frequency; in effect, it decomposes the input data into a series of sinusoidal waves of different frequencies. The spectral centroid is the center of gravity of the magnitude spectrum and indicates where the frequency content of the signal is concentrated. It is computed as the sum of the frequencies weighted by their amplitudes, divided by the sum of the amplitudes. Wavelet transforms provide time and frequency information simultaneously. The wavelet coefficients of the signal, in continuous form, are used as a feature for the given data segment [9].
$$W_\phi[j_0, k] = \langle x, \phi_{j_0,k} \rangle = \frac{1}{\sqrt{N_0}} \sum_{m=0}^{N-1} x[m]\, \phi_{j_0,k}[m], \quad \text{for all } k \tag{2}$$
3.1.3 Temporal Features
Time-domain features are commonly used to describe the temporal shape of transient or regular patterns in a time series. The first feature extracted is the energy envelope of the signal, computed in decibels. The zero-crossing rate is a more complex feature, defined as the number of times a waveform changes from positive to negative or back within a definite time frame. It is used to distinguish AE activity from background noise, which shows a very high zero-crossing rate. The zero-crossing rate of a signal can be determined from Eq. (3).
$$\mathrm{ZCR} = \sum_{n=-\infty}^{\infty} \left| \operatorname{sgn}(s(n)) - \operatorname{sgn}(s(n-1)) \right| \tag{3}$$
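Eq. (3) and the signal energy can be sketched for a finite window as follows. The function names are hypothetical; the infinite sum in Eq. (3) is restricted to the samples of one window, and sgn uses the convention stated with Table 2 (1 for non-negative samples, 0 otherwise). The energy here is the plain sum of squares, without conversion to decibels.

```python
def sgn(value):
    """Sign convention used with Eq. (3): 1 if value >= 0, else 0."""
    return 1 if value >= 0 else 0


def zero_crossings(window):
    """Count sign changes between consecutive samples of a finite window."""
    return sum(abs(sgn(window[n]) - sgn(window[n - 1])) for n in range(1, len(window)))


def energy(window):
    """Sum of squared sample values of the window."""
    return sum(v * v for v in window)


# Usage: the sign alternates on every sample, so four crossings are counted.
example = [1.0, -1.0, 2.0, -2.0, 3.0]
crossings = zero_crossings(example)
total_energy = energy(example)
```

For the example window, four crossings are counted and the energy is 1 + 1 + 4 + 4 + 9 = 19.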
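Table 2 Formula for feature extraction
S. No. | Feature | Formula
1 | Mean ($\mu$) | $\mu = \dfrac{\text{Sum of elements in the dataset}}{\text{Number of elements in the dataset}}$
2 | Variance ($\delta$) | $\delta = \dfrac{\sum (\mu - i\text{th element of the dataset})^2}{\text{Number of elements}}$
3 | Skewness | $\text{Skewness} = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
4 | Kurtosis | $\text{Kurtosis} = \dfrac{\sum_{i=1}^{N} (i\text{th element in dataset} - \text{Mean})^4}{N \cdot \text{Standard deviation}^4}$
5 | Power spectral density ($S_g(f)$) | $S_g(f) = \lim_{T \to \infty} \dfrac{\lvert G_T(f) \rvert^2}{T}$
6 | Zero-crossing | $\mathrm{ZCR} = \sum_{n=-\infty}^{\infty} \lvert \operatorname{sgn}(s(n)) - \operatorname{sgn}(s(n-1)) \rvert$
7 | Spectral centroid ($C_i$) | $C_i = \dfrac{\sum_{k=1}^{W_{fL}} k\, X_i(k)}{\sum_{k=1}^{W_{fL}} X_i(k)}$
8 | Energy ($E_s$) | $E_s = \langle x(n), x(n) \rangle = \sum_{n=-\infty}^{\infty} \lvert x(n) \rvert^2$
9 | Wavelet transform | $W_\phi[j_0, k] = \langle x, \phi_{j_0,k} \rangle = \dfrac{1}{\sqrt{N_0}} \sum_{m=0}^{N-1} x[m]\, \phi_{j_0,k}[m]$ (for all $k$)
10 | Maximum amplitude | $\max(x(t))$
11 | Maximum frequency | $F(n) = \sum_{k=0}^{N-1} f[k]\, e^{-j \frac{2\pi}{N} n k}$ $(n = 0 : N-1)$; find max $F$
12 | Standard deviation ($\sigma$) | $\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N-1}}$
In the zero-crossing formula, sgn(s(n)) = 1 if s(n) ≥ 0 and 0 if s(n) < 0. Table 2 presents the formulas for extracting features from the AE signal.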
3.2 Data Preprocessing
Data preprocessing is a decisive step that converts raw data into a machine-readable format. Real data acquired from a sensor contain many unintended aspects, such as background noise, missing values, and impractical formats, that make them unsuitable for machine learning models. To model the architecture, suitable data have to be selected. After selection, the data are preprocessed by the following steps:
i. Formatting the data into a structured format that ML tools can parse easily
ii. Cleaning the data to remove incomplete variables
iii. Sampling the data further to reduce algorithm running times and memory requirements.
The sampled signal from the AE measurement system is divided into fixed-length windows of 2000 samples (2 ms at the 1 MHz sampling frequency) and stored in CSV format. The data format acquired from the equipment is shown in Fig. 7.
3.3 Scaling of Features
The features are scaled with a min-max scaler because the ranges of the feature values differ widely; in particular, skewness has a wider range than kurtosis and energy. Many machine learning algorithms do not work correctly without normalization. Hence, the feature values are scaled before they are input to the machine learning model. Figure 8 shows the kurtosis, skewness, and energy features before and after scaling.
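Min-max scaling maps each feature column to the range 0 to 1 using its own minimum and maximum. The sketch below shows the idea for a single feature; the function name is hypothetical, and returning zeros for a constant column is an assumption made to avoid division by zero.

```python
def min_max_scale(values):
    """Scale one feature column to [0, 1]: (v - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Usage: the smallest value maps to 0, the largest to 1.
scaled = min_max_scale([2.0, 4.0, 10.0])
```

Each of the 12 features would be scaled independently in this way, with the minimum and maximum taken from the training data.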
Fig. 7 a Acoustic emission signal of the crack, b sampled data of acoustic emission signal
Fig. 8 Plot showing features: a before and b after scaling (x1-kurtosis, x2-skewness, x3-energy)
3.4 Model Architecture
Figure 9 shows the proposed deep neural network (DNN) model for classification of the AE signal. It contains a sequence of fully connected layers that take the 12 features of each AE data segment as input and classify the segment as crack or no-crack. The DNN is a multilayer feed-forward neural network with an input layer followed by four hidden layers and an output layer. The AE features extracted from the data are provided to the input layer, and the output values are determined successively at each layer of the network. Every hidden layer is parameterized by a weight matrix, a bias vector, and an activation function. The output of each node in a hidden layer is calculated by multiplying the weights by the input vector from the previous layer, adding the bias, and applying the activation function, as given in Eq. (4).
$$y = \phi\left( \sum_{i=1}^{n} w_i x_i + b \right) \tag{4}$$
Here, w designates the weights, x is the input vector, b is the bias, y is the output of a given node, and $\phi$ is the rectified linear unit (ReLu) activation function [10]. The output layer is a classification layer that predicts the result from the probability value estimated from the previous layers. In a binary classifier neural network, a sigmoid activation function is used in the output layer.
Fig. 9 Deep neural network architecture
Hyperparameters are configuration variables [11] that are set for, and tuned across, training runs of the model, whereas the weights and biases are updated on every iteration toward a minimum of the loss function. The model hyperparameters used for training, including the dataset size and the size of each data point, are given in Table 3.
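The forward pass of Eq. (4) through fully connected layers can be sketched in plain Python as below. This is an illustration only, not the authors' TensorFlow/Keras model: the function names are hypothetical, and the tiny network, its weights, and the 0.5 decision threshold are assumptions chosen so that the arithmetic can be followed by hand.

```python
import math


def relu(z):
    """Rectified linear unit used in the hidden layers."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid used in the output layer of a binary classifier."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: y = activation(sum_i w_i * x_i + b) per node."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(features, layers):
    """Propagate a feature vector through a list of (weights, biases, activation)."""
    output = features
    for weights, biases, activation in layers:
        output = dense(output, weights, biases, activation)
    return output


# Hypothetical network: 2 inputs -> 2 hidden nodes (ReLU) -> 1 output (sigmoid).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
probability = forward([1.0, 2.0], layers)[0]
label = "crack" if probability >= 0.5 else "no-crack"  # assumed threshold
```

The model in the text has the same structure with 12 inputs and four hidden layers; only the layer sizes and the learned weights differ.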
4 Results and Discussion
The deep neural network model was evaluated on a dataset of 725 samples from two classes, of which 80% (580 samples) was used for training and 20% (145 samples) for cross-validation. A further 74 samples were used to find the test accuracy of the model. The total number of samples used is 799, as shown in Table 4.
Table 3 Model hyperparameters
Parameter | Value
Dataset size | 725
Size of each data point | 2000
Number of features | 12
Train and test data split | 80% and 20%
Number of classifications | 2 (crack or no-crack)
Number of layers | 6
Number of hidden layers | 4
Activation functions | Rectified linear unit (ReLu) and sigmoid
Loss | Binary cross-entropy
Optimizer | Adam
Batch_size | 32
Epochs | 500
Size of the testing data | 74
Table 4 Training statistics
Model | Number of samples
Training | 580
Cross-validation | 145
Testing | 74
4.1 Confusion Matrix
A confusion matrix is a summary of the prediction results of a machine learning classification model for two or more classes [12]. The numbers of correct and incorrect predictions are summarized as counts. Table 5 is a 2 × 2 table containing the four combinations of predicted and true values, namely true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), on a set of test data for which the true values are known. It shows the errors made by a classifier and, more importantly, the types of errors being made.
Table 5 Confusion matrix
Actual value | Predicted: positive class | Predicted: negative class
Positive class | True positive (TP) | False negative (FN)
Negative class | False positive (FP) | True negative (TN)
Table 6 Confusion matrix for cross-validation
Data | Actual value | Positive (predicted) | Negative (predicted)
Training | Positive | 57 | 1
Training | Negative | 1 | 86
Testing | Positive | 49 | 0
Testing | Negative | 0 | 25
Table 7 Model training summary
Performance measure | Relation | (%) for training
Accuracy | (TP + TN) / (TP + TN + FP + FN) | 98.62
Recall | TP / (TP + FN) | 98.27
Precision | TP / (TP + FP) | 98.27
F-measure | | 98.27
4.2 Model Summary for Training and Testing
The features were extracted from the AE signal in the temporal, spectral, and time–frequency domains with a time window of 2 ms; at a sampling frequency of 1 MHz, this gives 2000 samples per window. After feature extraction and selection, the performance of the classification model is determined using additional metrics. Cross-validation of the trained model shows an accuracy of 98.62%. The confusion matrices for cross-validation and testing are given in Table 6. The performance of the model is evaluated by the standard measures of classification accuracy, precision, recall, and F-measure. The model attains an accuracy of 98.62%, and both precision and recall are 98.27% in training. Table 7 shows the evaluation results of the model.
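The performance measures can be recomputed from the first block of Table 6 (57, 1, 1, 86) as a check. The sketch below uses the relations given in Table 7 for accuracy, recall, and precision; the F-measure relation is not given in Table 7, so the usual harmonic mean of precision and recall is assumed. The function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall, precision and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Assumption: F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_measure


# Counts from Table 6: actual positive -> 57 / 1, actual negative -> 1 / 86.
accuracy, recall, precision, f_measure = classification_metrics(tp=57, fn=1, fp=1, tn=86)
```

With these counts the accuracy is 143/145, or 98.62%, and recall and precision are both 57/58, about 98.3%, in line with the values reported in Table 7.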
4.3 Accuracy and Loss
When the entire dataset is passed forward and backward through the neural network exactly once, it is called an epoch. If the dataset is too large to be passed to the algorithm at once, it must be subdivided into mini-batches. The batch size is the number of training samples in a single mini-batch, and the number of iterations is the number of batches needed to complete one epoch. The optimal number of epochs is determined by checking the loss and accuracy of the model on the training and validation data. Here, the model is trained for 500 epochs with a batch size of 32. Figure 10 shows the model accuracy and loss as functions of the epoch number for the training, validation, and testing data. At around 15 epochs, the accuracy is 97% for both training and testing data. The training accuracy reaches its maximum at epoch 48 and remains constant thereafter. The test accuracy is better than the training accuracy, which indicates that the proposed model generalizes well. In cross-validation, the model achieves a maximum accuracy of 98% at epoch 75. As the graph shows, the training, validation, and testing errors all decrease as the number of epochs increases.
Fig. 10 Accuracy, loss versus epochs for the training, cross-validation, and test data
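The relation between batch size, iterations, and epochs can be worked through for the 580 training samples, batch size of 32, and 500 epochs stated in the text. The sketch assumes that the final, smaller batch of each epoch is kept rather than dropped; the function name is hypothetical.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to pass over the training set once.

    Assumption: the last, partially filled batch is still processed.
    """
    return math.ceil(n_samples / batch_size)


per_epoch = iterations_per_epoch(580, 32)   # 18 full batches plus one of 4 samples
total_updates = per_epoch * 500             # weight updates over 500 epochs
```

Under this assumption, each epoch takes 19 iterations and the full training run performs 9500 weight updates.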
4.4 Comparison of Algorithms
A comparative analysis of different binary classification models against the proposed DNN model was carried out. The datasets and their 12 extracted features in the training set are the same for every model. The accuracy metric is used to compare the proposed model with the other standard classifiers. The comparison shows that support vector machines (SVMs) [13] achieve 73.793% accuracy, logistic regression [14] 75.86%, naïve Bayes [15] 93.79%, KNN [16] 96.55%, and the decision tree [17] 96.5517%, which is essentially the same as KNN, while the proposed model shows 100%. For an example set, naïve Bayes predicts 90% while random forest predicts 88% on the 725 data in the dataset. The comparison of the proposed model with the other algorithms is illustrated in Fig. 11.
Fig. 11 Comparison of machine learning algorithms (bar chart on a scale of 0 to 100 for SVM, logistic regression, naive Bayes, decision tree, KNN, and the proposed model)
5 Conclusion
A supervised deep neural network binary classifier was successfully employed to distinguish crack from no-crack conditions in AE signals. Twelve distinct features were extracted from the raw AE data generated in the tensile test of the prestressed strand. The architecture was modeled with four hidden layers between an input and an output layer. Rectified linear unit (ReLu) and sigmoid activation functions are employed in the hidden layers and the output layer, respectively. The model was coded in Python with TensorFlow as the backend. Libraries such as SciPy, Seaborn, Keras, and Sklearn were used to construct the neural network model. The model was trained with 725 data samples, of which 20% were assigned to cross-validation. An accuracy of 98.62% was obtained on the 145 cross-validation samples, which indicates that the model is effective. The proposed DNN model was compared with five known classifiers and was found to outperform them on the two-class problem.
Acknowledgements The authors thank the Director of CSIR-CECRI and the Director of NIT-T for providing the facilities.
References
1. Eugene F (1943) Method for tensioning reinforcements. US Patent No. US2579183A
2. Poursaee A (2016) Corrosion of steel in concrete structure. Woodhead Publishing, Elsevier Ltd. ISBN: 978-1-78242-381-2
3. Barnartt S (1962) General concepts of stress-corrosion cracking. Corrosion 18(9):322–331
4. NCHRP Project 10-53 (1999) Non-destructive methods for condition evaluation of prestressing steel strands in concrete bridges final report phase I: technology review
5. Gholizadeh S et al (2015) A review of the application of acoustic emission. Struct Eng Mech 54(6):1075–1095
6. Nesvijski E, Marasteanu M (2006) Spectral analysis of acoustic emission of cold cracking asphalt. NDT.net 11(1)
7. Segreto T et al (2015) Feature extraction and pattern recognition in acoustic emission monitoring of robot assisted polishing. Procedia CIRP 22–27
8. Cooley J et al (1969) The finite Fourier transform. IEEE Trans Audio Electroacoust 17(2):77–85
9. Sheng W et al (2018) Wavelet packet transform-based feature extraction for acoustic emission pattern recognition. In: 9th European workshop on structural health monitoring 2018
10. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. ICML
11. Probst P et al (2019) Tunability: importance of hyper parameters of machine learning algorithms. J Mach Learn Res 20:1–32
12. Stehman SV (1997) Selecting and interpreting measures of thematic classification accuracy. Remote Sens Environ 62(1):77–89
13. Cortes C et al (1995) Support-vector networks. Mach Learn 20(3):273–297
14. Tolles et al (2016) Logistic regression relating patient characteristics to outcomes. JAMA 316(5):533
15. Mullachery V et al. Bayesian neural networks. https://arxiv.org/ftp/arxiv/papers/1801/1801.07710.pdf
16. Altman NS (1992) An introduction to kernel and nearest-neighbour nonparametric regression. Am Stat 46(3):175–185
17. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Device for Position Tracking System with GPS for Elderly Person’s Health Aspect to Make Them Equally Accessible in the Developmental and Monitoring Process in the Society
Shyamal Mandal, Samar Jyoti Hazarika, and Ranjit Sil
Abstract India has one of the largest populations in the world. About 15–25% of people aged 65 years lose their memory due to old age. People diagnosed with Alzheimer’s disease are sometimes unable to reach their destination. A position tracking device with GPS may help the family or caretaker of the patient to identify the patient’s location. The system includes a GPS modem that continuously tracks the patient’s location as latitude and longitude, along with heartbeat and body temperature as physical parameters. The sensor data are sent to a microcontroller interfaced to a GSM modem. In this paper, an attempt has been made to develop a device that includes a GPS tracking system and also collects the health status of the elderly person. This device may help the country adopt a policy for the proper health care and management of elderly persons.
Keywords Position · GPS · Energy · Tracker · Human rights · Society · Alzheimer
S. Mandal (B) Department of Biomedical Engineering, North Eastern Hill University, Shillong, Meghalaya 732202, India
S. J. Hazarika Department of Energy Engineering, North Eastern Hill University, Shillong, Meghalaya 732202, India
R. Sil Department of Law, North Eastern Hill University, Shillong, Meghalaya 732202, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_15
1 Introduction
A position tracking system is a hardware-based device that identifies the location of an individual. With the help of the GPS system, it provides latitude and longitude to the microcontroller. GPS is a space-based network of orbiting satellites that sends precise location and time information back to Earth in all weather conditions. It can be accessed from anywhere on Earth where there is an unobstructed line of sight to the satellites. GPS was initially developed for military navigation, but now anyone with a GPS device can receive the radio signals broadcast by the satellites. GPS technology has also been brought to mobile phones, laptops, and computers to help track one’s location. The most important application of GPS is satellite navigation in aircraft, vehicles, and ships. The device also updates its location as soon as the user changes position. A GPS receiver pinpoints its current location using a process called trilateration [1].
Bag et al. outlined a project that can monitor heart rate, blood sugar level, and body temperature using wireless communication technologies [2]. The National Collaborating Centre for Mental Health produced a NICE-SCIE guideline on supporting people with dementia and their carers in health and social care [3]. Brodaty et al. designed a screening test for dementia for use in general practice [4]. Aziz et al. designed a real-time healthcare monitoring system for monitoring blood pressure with a GPS system [5]. Saranya et al. designed a tracking device for Alzheimer’s patients that monitors them via MiWi [6]. Protopappas et al. designed an information system for screening, management, and tracking of demented patients [7]. In this work, the authors demonstrate the design of a GPS-based position tracking device for the position and health of elderly persons. The fabricated device was tested with different volunteers standing in different places. The tracking system is described in detail to explain how the position and health of an elderly person are obtained.
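The trilateration process mentioned in the introduction can be illustrated with a simplified two-dimensional sketch. This is not how the device or a real GPS receiver is implemented: actual receivers work in three dimensions with at least four satellites and also solve for the receiver clock error. Here, three reference points with known coordinates and measured distances are assumed, and the function name is hypothetical.

```python
import math


def trilaterate_2d(p1, r1, p2, r2, p3, r3):
    """Locate a point from its distances r1, r2, r3 to three known points.

    Subtracting the circle equations pairwise gives two linear equations
    in x and y, which are solved with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a = 2 * (x2 - x1)
    b = 2 * (y2 - y1)
    c = r1 ** 2 - r2 ** 2 - x1 ** 2 + x2 ** 2 - y1 ** 2 + y2 ** 2
    d = 2 * (x3 - x2)
    e = 2 * (y3 - y2)
    f = r2 ** 2 - r3 ** 2 - x2 ** 2 + x3 ** 2 - y2 ** 2 + y3 ** 2
    det = a * e - b * d
    if det == 0:
        raise ValueError("reference points are collinear")
    return (c * e - f * b) / det, (a * f - c * d) / det


# Usage: distances measured from the point (3, 4) to three reference points.
x, y = trilaterate_2d((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
```

The three distances in the usage example were generated from the point (3, 4), which the function recovers up to floating-point rounding.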
2 Methods and Materials
All the materials were procured from the local market and online from Amazon and Flipkart. All the components were tested for functionality before use.
2.1 Hardware Requirements
The following are the hardware requirements for the system:
a. Battery
b. Transformer (230-12 V AC)
c. Voltage Regulator (LM7805/7809)
d. Filter
e. Rectifier
f. Microcontroller
g. LCD (16 × 2)
h. Heartbeat Sensor
i. Temperature Sensor
j. GPS Module (Arduino Uno (ATmega328))
k. GSM Modem
l. 1N4007
m. LED
n. Resistors.
2.2 Fabrication of Device for Position Tracking
The proposed system is described in terms of its main blocks and their interconnections. The system aims to provide an end-to-end smart health application built from two functional building blocks. The first building block gathers all sensor data related to the monitored person, whereas the second block stores, processes, and presents the resulting information to the monitoring person. Figure 1 shows the architecture of the system. The microcontroller is connected to the GPS system and the GSM modem, and the temperature sensor and heart rate sensor are also connected to the microcontroller. All the information comes into the microcontroller, which distributes it to the other devices. The temperature sensor and heart rate sensor are attached to the elderly person, the monitoring system is installed in a control room, and all are connected through the GPS system. Information on the health status and position of any elderly person can be collected from the control room. Figure 2 shows the connectivity of the different devices, and Fig. 3 is the circuit diagram of the position tracking system.
Fig. 1 Basic architecture of the system
Fig. 2 Block diagram of the system
Fig. 3 Circuit diagram of position tracking system
3 Results and Discussion
The device was tested on the North Eastern Hill University premises, with volunteers placed at different distances: volunteer-1 at 100 m, volunteer-2 at 200 m, volunteer-3 at 300 m, volunteer-4 at 400 m, and volunteer-5 at 500 m. A temperature sensor and a heartbeat sensor were fixed on each volunteer to continuously monitor body temperature and heartbeat. Table 1 shows the data of all the volunteers, with the different parameters, as available in the control room in tabular form. The error between the original distance and the measured distance was calculated for the different volunteers and found to be around 2 m. This means that the tracked location of a volunteer may be about ±2 m away from the exact location. All data were collected via the GSM system, which was connected with the GPS system for tracking the positions of the volunteers.
The tracking device will help society, families, and doctors to manage elderly people and Alzheimer’s patients in primary care settings; that is, anyone may monitor health parameters such as body temperature and heartbeat, as well as the location of the person. If the information from the device is effectively collected by experts, neurologists, and scientists, all patients can receive a common high standard of care, and the occurrence of problems can be analyzed. The care of elderly persons is vital in society; their right to health and to being cared for within the family and society is as important as that of others, and it is an integral part of the right to life. Most importantly, elderly persons suffering from Alzheimer’s disease are being deprived of proper care. Such a device may help the country’s policy-making process to incorporate proper management of elderly persons’ health so that they are equally included in the developmental and monitoring process in society.
Table 1 Different parameters of different volunteers placed at different distance
S. No. | Name | Heart rate (beats per minute) | Body temperature (°C) | Original distance (m) | Monitor display distance (m)
1 | Volunteer-1 | 76 | 36 | 100 | 96 ± 2
2 | Volunteer-2 | 70 | 37 | 200 | 195 ± 2
3 | Volunteer-3 | 80 | 38 | 300 | 297 ± 2
4 | Volunteer-4 | 74 | 35 | 400 | 394 ± 2
5 | Volunteer-5 | 75 | 36 | 500 | 495 ± 2
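The comparison between the original and displayed distances in Table 1 can be worked through as below. The sketch uses only the central displayed values from the table; the ± 2 m attached to each displayed value is treated as the stated display uncertainty and is not used in the arithmetic. The variable names are hypothetical.

```python
# Distances from Table 1, in metres (central displayed values only).
original_m = [100, 200, 300, 400, 500]
displayed_m = [96, 195, 297, 394, 495]

# Difference between the placed distance and the value shown on the monitor.
differences_m = [o - d for o, d in zip(original_m, displayed_m)]
mean_difference_m = sum(differences_m) / len(differences_m)
```

The same calculation could be repeated for any further test positions to track how the displayed distance behaves with range.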
4 Conclusion
The device is easy to use and cost-effective, and it may turn out to be a lifesaver. A trial run was carried out at laboratory scale to check the parameters with mobile compatibility; the location was identified through mobile GPS, together with physical parameters such as body temperature and heartbeat. The device provides a reliable and effective application for real-time health monitoring and tracking. The merit of this project rests on two factors: first, Alzheimer’s, mental, and motion patients could benefit from tracking combined with health monitoring; second, GPS technologies are used to avoid complex connections such as wires, which may limit patient mobility.
References
1. Balaji S, Raju R, Sandosh KSP, Ramachandiran R (2018) Smart way tracking to identify individuals location using android system with GPS. Int J Pure Appl Math 119:9–14
2. Bag S, Bhowmick A (2017) Smart health care monitoring and tracking system. Int Res J Eng Tech 04:3085–3088
3. National Collaborating Centre for Mental Health UK (2007) Dementia: a NICE-SCIE guideline on supporting people with dementia and their carers in health and social care. British Psychological Society
4. Brodaty H, Pond D, Kemp NM, Luscombe G, Harding L, Berman K, Huppert FA (2002) The GPCOG: a new screening test for dementia designed for general practice. J Am Geri Soc 50:530–534
5. Aziz K, Tarapiah S, Smile SH (2002) Smart real time healthcare monitoring and tracking system using GSM/GPS technologies. In: MEC international conference on big data and smart city
6. Saranya S, Jayarin PJ (2017) An efficient tracking device for alzheimer patient using MiWi. Int Res J Eng Tech 04
7. Protopappas V, Tsiouris K, Chondrogiorgi M, Tsironis C, Konisiotis S, Fotiadis DI (2016) ALZCARE: an information system for screening, management and tracking of demented patients, vol 16. IEEE, pp 5364–5367
Garbage Monitoring System Using LoRa Technology
Amarjeet Singh Chauhan, Abhishek Singhal, and R. S. Pavithr
Abstract At present, garbage is a major problem: how can garbage be stopped from spreading outside dustbins, and how can it be monitored and collected from the dustbins on time? In this project, we have developed an automated garbage monitoring system using LoRa technology. LoRa technology provides long-range connectivity between the dustbin (node) and the gateway. A LoRa garbage node is fixed on each dustbin, and multiple nodes at different locations are connected to a common gateway. The ultrasonic sensor connected to the LoRa node measures the level of garbage inside the dustbin; the node pushes the sensor data to the server through the LoRa gateway and sends a notification (e.g., email or simple text message) to authorized personnel when the dustbin is full. LoRa nodes also indicate the garbage level in the dustbin with LEDs: a red LED shows that the bin is full, a yellow LED that it is half filled, and a green LED that it is empty.
Keywords LoRa · LoRa garbage node · Ultrasonic sensor · Garbage bin
A. S. Chauhan (B) · A. Singhal · R. S. Pavithr IoT Lab, Department of Physics and Computer Science, Dayalbagh Educational Institute, Agra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_16
1 Introduction
Nowadays, a huge amount of garbage spreads outside dustbins and is scattered around, making the environment dirty. To resolve this problem, we introduce a garbage monitoring system based on the Internet of Things and on LoRa, a recent technology known for long-range connectivity. Cleanliness is essential for a good lifestyle, and it begins with garbage collection. The population of our country is increasing day by day, and garbage overflowing from dustbins results in a dirty and unhealthy environment. Dirt spreads harmful and dangerous diseases that damage health and lower the standard of living. In this paper, we therefore introduce a garbage monitoring system using LoRa technology for both rural and urban areas as an innovative project to keep the environment clean. LoRa technology provides long-range connectivity between the dustbin (node) and the gateway. LoRa garbage nodes, in which ultrasonic sensors are connected to an Arduino LoRa shield, are placed at different locations. The system monitors the garbage level of dustbins placed in different areas and posts their status to the server from time to time. The ultrasonic sensor is fixed at the top of the dustbin to measure the garbage level. The LoRa node sends the sensor data to a server through the LoRa gateway and sends a notification (e.g., email or simple text message) to an authorized person when the dustbin is full. The LoRa garbage node also gives local people a visual indication of the garbage level with LEDs: a red LED shows that the bin is full, a yellow LED that it is half filled, and a green LED that it is empty.
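The level measurement and LED indication described above can be sketched as follows. This is an illustrative model in Python, not the node firmware: the function names, the bin depth, and the threshold values (50% for half filled, 90% for full) are assumptions, since the text gives no numeric thresholds. The sensor is taken to sit at the top of the bin and to report the distance down to the garbage surface.

```python
def fill_fraction(distance_cm, bin_depth_cm):
    """Fraction of the bin that is filled, from the ultrasonic distance reading.

    The sensor is at the top of the bin, so a shorter distance means a
    fuller bin. The result is clamped to the range 0..1.
    """
    level = (bin_depth_cm - distance_cm) / bin_depth_cm
    return min(1.0, max(0.0, level))


def led_colour(fraction, half_threshold=0.5, full_threshold=0.9):
    """Map a fill fraction to the LED colour (thresholds are assumptions)."""
    if fraction >= full_threshold:
        return "red"      # bin is full: notify the authorized person
    if fraction >= half_threshold:
        return "yellow"   # bin is about half filled
    return "green"        # bin is empty or nearly empty


# Usage for a hypothetical bin that is 100 cm deep.
status = [led_colour(fill_fraction(d, 100.0)) for d in (5.0, 45.0, 90.0)]
```

In the usage example, readings of 5 cm, 45 cm, and 90 cm map to red, yellow, and green respectively; the red state is the point at which the node would also trigger the notification.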
1.1 LoRa Technology
LoRa is a radio frequency (RF) modulation technology that covers long ranges with low power consumption. In urban areas, LoRa provides communication over distances of up to five kilometers; in rural areas, up to 15 km. Multiple LoRa nodes can connect to a single gateway. The range of LoRa connectivity depends on the surroundings of the area. LoRa allows data communication over long ranges using low power [1]. It is flexible and can penetrate deep indoors. By comparison, technologies such as Bluetooth and Wi-Fi have short range, require high bandwidth for data communication, and use more power, so LoRa is more beneficial for this purpose. LoRa can be used in smart agriculture, in smart city projects, and in other long-range applications. The main specifications of LoRa technology as used in this project are as follows:
• It has a long range of coverage, greater than 10 km.
• It provides a secured network.
• A single gateway can cover an area of 8 km to 12 km.
LoRa operates in specific frequency bands. In India, the frequency band for LoRa is 865–868 MHz.
2 Literature Survey
Inspecting garbage bins in different areas takes time and money. Municipal corporation workers do not inspect the bins daily, and as a result garbage keeps spilling out of them. A conventional smart dustbin needs an Internet connection to send sensor data to the server, but streets and open areas often have no Internet connectivity. To overcome this problem, we use LoRa, a recent IoT technology, with which the system can be set up easily wherever it is needed. For this project, we studied several research papers on LoRa technology. One of them [7] is the paper by Umber Noreen, Ahcène Bounceur and Laurent Clavier, which presents LoRa as an emerging transmission technology for IoT networks. The authors describe three basic parameters that characterize LoRa, namely code rate, spreading factor, and bandwidth, and present the impact of these three parameters. In a LoRa deployment, the gateway is set up at a location with a good Internet connection, and multiple nodes within a range of about 10 km can be connected to and monitored from a single gateway. In this project, a node consists of the sensors, an Arduino controller board that controls the device, and a LoRa shield mounted on the Arduino for connectivity. Because the nodes communicate over the LoRa radio link, no Wi-Fi is needed in the area where the garbage nodes are placed. The garbage level of each dustbin, detected by the ultrasonic sensor, can be monitored easily and its status is updated on the server.
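To make the influence of the three LoRa parameters named in the survey (coding rate, spreading factor and bandwidth) concrete, the following minimal sketch evaluates the commonly quoted nominal LoRa bit-rate relation Rb = SF × (BW / 2^SF) × 4/(4 + CR). The function name and the example settings are illustrative assumptions, not values taken from this project.

```python
def lora_bit_rate(spreading_factor: int, bandwidth_hz: float, coding_rate_index: int) -> float:
    """Nominal LoRa bit rate in bit/s.

    coding_rate_index is 1..4 and stands for the coding rates 4/5..4/8.
    """
    if not 6 <= spreading_factor <= 12:
        raise ValueError("spreading factor is expected to lie in 6..12")
    if coding_rate_index not in (1, 2, 3, 4):
        raise ValueError("coding rate index is expected to lie in 1..4")
    # One symbol carries SF bits and lasts 2**SF / BW seconds.
    symbol_rate = bandwidth_hz / (2 ** spreading_factor)
    # The forward error correction keeps 4 useful bits out of every 4 + CR.
    return spreading_factor * symbol_rate * 4.0 / (4.0 + coding_rate_index)


if __name__ == "__main__":
    # Hypothetical settings: 125 kHz bandwidth, coding rate 4/5.
    for sf in (7, 9, 12):
        print(sf, round(lora_bit_rate(sf, 125_000, 1), 2))
```

A higher spreading factor lowers the bit rate, which is the usual price paid for longer range; a dustbin node that sends a few bytes occasionally can afford that trade-off.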
As a future extension of this project, we will add more sensors, such as a gas sensor to detect harmful gases in the surroundings and a fire sensor to detect fire in or near the dustbin area. Information about a fire will also be sent automatically and promptly to the municipal corporation so that it can take immediate action. In line with current trends, the dustbin status can also be checked in an Android application instead of on Web pages.
3 System Architecture
The garbage monitoring system using LoRa technology is designed to address the problem of garbage spilling out of bins, which creates an unhealthy environment and increases pollution. The system thus helps to keep rural and urban areas clean. The sensors used in this project were selected to give reliable readings. The dustbin node is built from an Arduino LoRa shield, an ultrasonic sensor, and three small LEDs. The Arduino LoRa shield is powered by a 5 V DC adapter. The LoRa gateway receives data from the dustbin nodes. The ultrasonic sensor is fixed at the top of the dustbin to measure the garbage level. A dashboard shows the dustbin status, and a notification (e.g., email, simple text message) is sent when the dustbin is full. Figure 1 shows the architecture of the project and the overall process. Figure 2 shows the block diagram of a single dustbin node: the ultrasonic sensor and three LEDs (green, yellow, red) are connected to the Arduino LoRa shield, which is powered by the power supply, and together they form a complete dustbin node.
Fig. 1 Project architecture
Fig. 2 Block diagram of single dustbin node
Fig. 3 Arduino with LoRa shield. www.dragino.com
4 Hardware Used and Their Roles
4.1 Arduino LoRa Shield
The LoRa shield is a long-range transceiver that is mounted on the Arduino (other controller boards such as the Arduino Mega and Leonardo can also be used). The shield sends data at low data rates over long ranges. In this project, its role is to collect data from the sensors and send it to the gateway: using the ultrasonic sensor, the node collects the garbage-level data and sends it to the server through the gateway (Fig. 3).
4.2 Ultrasonic Sensor
This sensor measures the garbage level in the bin. It has four pins. The GND and VCC pins of the ultrasonic sensor are connected to the GND pin and the +5 V pin of the Arduino, respectively. The trig and echo pins can be connected to any digital pins of the Arduino LoRa shield (Fig. 4). The distance is calculated using the formula D = (T × S)/2, where D is the distance between the sensor and the reflecting surface (here, the garbage), T is the round-trip time of the ultrasonic pulse, and S is the speed of sound.
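The distance formula D = (T × S)/2 can be sketched in a few lines. This is an illustrative sketch only: the speed of sound (about 343 m/s at room temperature), the bin depth, and the function names are assumptions and do not come from the project firmware.

```python
# Assumed speed of sound: about 343 m/s, i.e. 0.0343 cm per microsecond.
SPEED_OF_SOUND_CM_PER_US = 0.0343


def echo_to_distance_cm(echo_time_us: float) -> float:
    """D = (T * S) / 2: the pulse travels to the surface and back."""
    return echo_time_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def fill_percent(distance_cm: float, bin_depth_cm: float) -> float:
    """Convert the sensor-to-garbage distance into a fill level in percent.

    bin_depth_cm is the (hypothetical) distance from the sensor to the
    bottom of an empty bin.
    """
    level_cm = bin_depth_cm - distance_cm
    level_cm = max(0.0, min(bin_depth_cm, level_cm))  # clamp noisy readings
    return 100.0 * level_cm / bin_depth_cm


if __name__ == "__main__":
    d = echo_to_distance_cm(1000.0)  # a 1 ms round trip
    print(round(d, 2), round(fill_percent(d, 100.0), 1))
```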
Fig. 4 Ultrasonic sensor. www.robu.in
Fig. 5 Dragino LoRa gateway. www.dragino.com
4.3 Dragino LoRa Gateway
The role of the gateway is to receive the data from all nodes and pass it to the server, where it is shown on the data dashboard. The gateway acts as a receiver and operates in the regional LoRa frequency band (in India, 865 MHz to 868 MHz) (Fig. 5).
4.4 Power Supply
In this project, a +5 V DC power supply powers the node. A rechargeable battery can also be used so that the node remains powered at all times (Fig. 6).
Fig. 6 Lithium-ion rechargeable battery. www.dragino.com
Fig. 7 Small LED. www.robu.in
4.5 LED
In this project, LEDs give a visual indication of the amount of garbage in the dustbin. With the help of three LEDs, local people can see how full the dustbin is: the red LED shows that the bin is full, the yellow LED shows that it is half full, and the green LED shows that it is empty (Fig. 7).
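The three-LED indication can be expressed as a simple threshold rule. The sketch below is an assumption about how such a rule might look: the text only states the meaning of each colour, so the 50% and 90% thresholds and the function name are hypothetical.

```python
def led_for_fill(fill_pct: float,
                 half_threshold: float = 50.0,
                 full_threshold: float = 90.0) -> str:
    """Map a fill level in percent to the LED colour that should glow.

    The thresholds are illustrative; the paper does not state them.
    """
    if fill_pct >= full_threshold:
        return "red"      # bin is (nearly) full
    if fill_pct >= half_threshold:
        return "yellow"   # bin is about half full
    return "green"        # bin is empty or nearly empty


if __name__ == "__main__":
    for level in (10.0, 55.0, 95.0):
        print(level, led_for_fill(level))
```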
5 Working and Flowchart
The smart garbage collection system using LoRa technology consists of an ultrasonic sensor, fixed at the top of the garbage bin to measure the garbage level, and a LoRa client node. The LoRa node sends the sensor data to the server through the LoRa gateway. The data on the dashboard is updated from time to time, and a notification (e.g., email, message) is sent automatically to the authorized person when the garbage bin is full (Fig. 8).
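The notification step of the workflow can be sketched as follows. This is not the project's implementation: it is a minimal illustration, assuming that the server should alert once when a bin becomes full and stay silent until the bin has been emptied. The class name and both thresholds are hypothetical.

```python
class FullBinNotifier:
    """Decide when a 'bin full' notification should be sent.

    A notification is raised once when the fill level reaches
    full_threshold and is re-armed only after the level has dropped to
    reset_threshold or below (i.e. the bin was emptied).
    """

    def __init__(self, full_threshold: float = 90.0, reset_threshold: float = 70.0):
        self.full_threshold = full_threshold
        self.reset_threshold = reset_threshold
        self._notified = False

    def update(self, fill_pct: float) -> bool:
        """Return True exactly when a new notification should be sent."""
        if not self._notified and fill_pct >= self.full_threshold:
            self._notified = True
            return True
        if self._notified and fill_pct <= self.reset_threshold:
            self._notified = False  # bin emptied, re-arm the alert
        return False


if __name__ == "__main__":
    notifier = FullBinNotifier()
    print([notifier.update(v) for v in (50, 92, 95, 60, 91)])
```

Using two thresholds avoids sending a new message for every reading while the bin stays full.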
6 Results
Some result pictures of the project are given below.
• Figure 9 shows the complete dustbin node together with the LoRa gateway.
• Figures 10 and 11 show the online data on ThingSpeak received from the sensors from time to time.
• Figure 12 shows the notification alert that is sent automatically to the authorized person's phone when the dustbin is full.
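The results refer to sensor data displayed on ThingSpeak. As a minimal sketch, assuming the readings reach a ThingSpeak channel through its HTTP update endpoint, the request URL could be built as shown below. The write key, the use of field1 for the fill level, and the function name are placeholders rather than details of the project; the sketch only constructs the URL and does not send anything.

```python
from urllib.parse import urlencode

# ThingSpeak channel update endpoint.
THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key: str, fill_pct: float) -> str:
    """Build the URL that would write one fill-level reading to field1.

    write_api_key is a placeholder for the channel's write key.
    """
    query = urlencode({"api_key": write_api_key, "field1": fill_pct})
    return f"{THINGSPEAK_UPDATE_URL}?{query}"


if __name__ == "__main__":
    print(build_update_url("YOUR_WRITE_KEY", 75.0))
```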
7 Future Scope
Initially, we set up the garbage node on the dustbins of our college, and this deployment was successful. In future, we plan to use this project in the following government projects:
Fig. 8 Working flowchart
Fig. 9 Working dustbin node with LoRa gateway
• We will use this project in the Government's Smart City project.
• This project will be very helpful for the Government's Swachh Bharat Abhiyan project.
Fig. 10 Online data
Fig. 11 Online data
Fig. 12 Notification alert
8 Conclusion
The main aim of this project is to maintain sanitation in villages and cities and to create a hygienic environment. With the help of this system, the garbage level of each dustbin can be monitored easily, and the authorized person receives a notification (e.g., email, simple text message) when a dustbin is full. Data collected from the dustbin nodes are presented on a dashboard hosted on a Web server, and LoRa technology is used to overcome the connectivity problem between the garbage nodes and the gateway. Used properly, the system saves both time and money and helps to improve the quality of our surroundings. It is a smart way of keeping our cities clean and hygienic.
References
1. Ziouzios D, Dasygenis M (2019) A smart bin implementation using LoRa. In: 2019 IEEE 4th South-East Europe design automation, computer engineering, computer networks and social media conference (SEEDA-CECNSM)
2. Devalal S, Karthikeyan A (2018) LoRa technology-an overview. In: 2018 second international conference on electronics, communication and aerospace technology (ICECA)
3. SathishKumar N, Vijayalakshmi B, Jenifer Prarthana R, Shankar A (2016) IOT based smart garbage alert system using Arduino UNO. In: 2016 IEEE Region 10 Conference (TENCON), pp 1028–1034. https://doi.org/10.1109/TENCON.2016.7848162
4. Parkash PV (2016) IoT based waste management for smart city. Int J Innov Res Comput Commun Eng 4(2):1267–1274
5. Chaware SM, Dighe S, Joshi A, Bajare N, Korke R (2017) Smart garbage monitoring system using internet of things (IOT). Int J Innov Res Electr Electron Instrum Control Eng 5(1):74–77
6. Folianto F, Low YS, Yeow WL (2015) Smartbin: smart waste management system. In: 2015 IEEE tenth international conference on intelligent sensors sensor networks and information processing (ISSNIP)
7. Noreen U, Bounceur A, Clavier L (2017) A study of LoRa low power and wide area network technology. In: 2017 international conference on advanced technologies for signal and image processing (ATSIP)
Optimal Controller Design for Buck Converter Fed PMBLDC Motor Using Emperor Penguin Optimization (EPO) Algorithm Deepak Paliwal and Dhanesh Kumar Sambariya
Abstract This manuscript presents an EPO-optimized PI controller design for a buck-converter-fed PMBLDC drive employing a current multiplier control technique. In the classical method, the PMBLDC motor is controlled through a two-loop structure, one loop for the converter control and one for the speed control. In this work, we first design DC-link control of the drive, which eliminates the two loops and reduces the total harmonic distortion. Next, to improve the dynamic response and speed regulation of the motor, we propose a PI controller optimized by the EPO algorithm for the buck-fed PMBLDC motor, formulated as a two-objective minimization. The problem is treated as a multi-objective optimization and solved using the weighted sum technique.
Keywords Emperor penguin optimization (EPO) · PI optimization · Speed control · Power quality
D. Paliwal (B) · D. K. Sambariya
University Department, EE, Rajasthan Technical University, Kota, Rajasthan 324010, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_17
1 Introduction
In recent years, the growing complexity of real-world problems has created a need for better evolutionary algorithms, which are used to obtain the best possible solutions to real-life engineering problems [1–4]. PMBLDC motors have a wide range of applications, such as electric vehicles [5–10], aerospace [11, 12], home appliances [13, 14], healthcare services [15], robotics [16], and non-conventional energy systems [17, 18]. Usually, the PMBLDC motor drive is fed from a single-phase AC source through a converter for medium- and high-power applications, which lowers its running costs [19]. Such a system draws harmonic-rich current from the AC side, which leads to high THD and a poor power factor (PF). Power factor correction (PFC) converters are used to solve this problem [20–26]. Two PFC converter control modes exist, namely current multiplier control and voltage follower control. The voltage follower control mode suffers from higher current stress on the converter switches [19, 27–29]. Recently, intelligent algorithms have provided tools for solving engineering optimization problems [30–35]. Over the past decades, various soft computing techniques have been used to optimize controllers according to the research objective. In this research, the authors use the EPO algorithm [36] to obtain an optimal PI controller that reduces THD and regulates the speed of the PMBLDC drive, with the multi-objective problem handled by the weighted sum method [37, 38]. The traditional control using two control loops is shown in Fig. 1, and the proposed scheme is shown in Fig. 2. The manuscript consists of five sections. The problem formulation is given in Sect. 2, and Sect. 3 presents the controller design and the EPO algorithm. The results are presented in Sect. 4. Finally, Sect. 5 concludes the manuscript, followed by the references.
Fig. 1 Classical control strategy for the drive
Fig. 2 Proposed control strategy for the drive
2 Problem Formulation
2.1 Modeling for Converter System
When switch S is turned on, diode D is reverse biased and the current flows through the inductor, as illustrated in Fig. 3a. When switch S is turned off, the diode operates in forward-bias mode, as shown in Fig. 3b. In this work, the parameter values are $L_0 = 2$ µH, $C_0 = 1500$ µF, and filter capacitor $C_f = 7.5$ µF.
$$V_0 = D\,V_{in} \tag{1}$$
$$L_0 = \frac{V_0\,(1 - D)}{\Delta I_{L_0}\, f_s} \tag{2}$$
$$C_0 = \frac{1 - D}{8\, L_0\, f_s^2 \left(\Delta V_{C_0}/V_0\right)} \tag{3}$$
where $D$ is the duty ratio, $f_s$ is the switching frequency, $\Delta I_{L_0}$ is the inductor ripple current, and $\Delta V_{C_0}$ is the ripple voltage of the output capacitor.
Fig. 3 Converter operation a when switch S is turned on, b when switch S is turned off
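Equations (1) to (3) can be evaluated directly. The sketch below assumes that the current and voltage terms in Eqs. (2) and (3) denote the permitted inductor ripple current and the relative output-voltage ripple, as in the usual buck-converter design relations. The numerical operating point is hypothetical and is not the design point of this paper.

```python
def buck_design(v_in: float, v_out: float, f_s: float,
                ripple_current: float, ripple_voltage_fraction: float):
    """Return (duty ratio, inductance in H, capacitance in F).

    ripple_current          : permitted inductor ripple current in A
    ripple_voltage_fraction : permitted output ripple as a fraction of v_out
    """
    duty = v_out / v_in                                            # Eq. (1)
    inductance = v_out * (1.0 - duty) / (ripple_current * f_s)     # Eq. (2)
    capacitance = (1.0 - duty) / (
        8.0 * inductance * f_s ** 2 * ripple_voltage_fraction)     # Eq. (3)
    return duty, inductance, capacitance


if __name__ == "__main__":
    # Hypothetical operating point: 100 V in, 50 V out, 20 kHz switching,
    # 1 A ripple current, 1 % output-voltage ripple.
    print(buck_design(100.0, 50.0, 20e3, 1.0, 0.01))
```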
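2.2 Modeling for PMBLDC Motor
The PMBLDC motor drive can be described by the following set of nonlinear equations:
$$p\,i_x = \frac{v_x - i_x R - e_x}{L_s + M} \tag{4}$$
$$p\,\omega_r = \frac{(P/2)\,(T_e - T_l - B\,\omega_r)}{J} \tag{5}$$
$$p\,\theta = \omega_r \tag{6}$$
$$e_x = K_b\, f_x(\theta)\, \omega_r \tag{7}$$
$$f_a(\theta) = \begin{cases} 1 & \text{for } 0 < \theta < 2\pi/3 \\ \{(6/\pi)(\pi - \theta)\} - 1 & \text{for } 2\pi/3 < \theta < \pi \\ -1 & \text{for } \pi < \theta < 5\pi/3 \\ \{(6/\pi)(\theta - 2\pi)\} + 1 & \text{for } 5\pi/3 < \theta < 2\pi \end{cases} \tag{8}$$
Here $p$ denotes the differential operator $d/dt$, and the subscript $x$ denotes phase a, b, or c. The last segment of Eq. (8) is written with $(\theta - 2\pi)$ so that $f_a(\theta)$ is continuous at $\theta = 5\pi/3$ and returns to 1 at $\theta = 2\pi$.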
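The trapezoidal back-EMF shape function of Eq. (8) and the back-EMF of Eq. (7) can be sketched as below. The last segment is implemented with (θ − 2π), an assumption made here so that the waveform is continuous at 5π/3 and returns to 1 at 2π; the values of K_b and ω_r in the example are hypothetical.

```python
import math


def f_a(theta: float) -> float:
    """Trapezoidal back-EMF shape function of phase a, per Eq. (8)."""
    theta = theta % (2.0 * math.pi)          # wrap the rotor angle into [0, 2*pi)
    if theta < 2.0 * math.pi / 3.0:
        return 1.0                           # flat top
    if theta < math.pi:
        return (6.0 / math.pi) * (math.pi - theta) - 1.0   # falling ramp
    if theta < 5.0 * math.pi / 3.0:
        return -1.0                          # flat bottom
    return (6.0 / math.pi) * (theta - 2.0 * math.pi) + 1.0  # rising ramp


def back_emf(k_b: float, theta: float, omega_r: float) -> float:
    """Phase-a back-EMF e_a = K_b * f_a(theta) * omega_r, per Eq. (7)."""
    return k_b * f_a(theta) * omega_r


if __name__ == "__main__":
    for deg in (30, 150, 240, 330):
        print(deg, round(back_emf(0.5, math.radians(deg), 100.0), 3))
```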
3 Controller Optimization
3.1 Objective Function
The optimal solution is obtained with a weighted sum strategy, which converts the multi-objective problem into a single optimization problem. In this approach, each objective function is weighted according to its priority, and the weighted objectives are combined into one objective:
$$\begin{aligned} J_{\text{Minimize}} &= \sum_{i=1}^{2} w_i\,|J_i| = w_1 J_1 + w_2 J_2 = 0.5\,J_1 + 0.5\,J_2 \\ &= 0.5\left(\frac{\sqrt{I_{rms}^2 - I_0^2 - I_{1\,rms}^2}}{I_{1\,rms}} \times 100\right)\% + 0.5\;\mathrm{ISE}_{\text{DC link Error}} \\ &= 0.5\,(\mathrm{THD})\% + 0.5 \int_{0}^{T_{sim}} \left|V_{dc}^{*}(t) - V_{dc}(t)\right|^2 dt \end{aligned} \tag{9}$$
Subject to:
$$w_1 + w_2 = 1, \qquad 0 \le \mathrm{THD} \le 3\%, \qquad 0 \le V_{dc}^{*} - V_{dc} \le 50$$
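The weighted-sum objective of Eq. (9) can be illustrated numerically. The sketch below takes THD as sqrt(I_rms² − I_0² − I_1rms²)/I_1rms × 100, which is how Eq. (9) appears to define it, and approximates the ISE integral by a rectangle rule on sampled data. All sample values and function names are hypothetical.

```python
import math


def thd_percent(i_rms: float, i_dc: float, i1_rms: float) -> float:
    """THD in percent from total RMS, DC and fundamental RMS current."""
    return 100.0 * math.sqrt(i_rms ** 2 - i_dc ** 2 - i1_rms ** 2) / i1_rms


def ise(reference, measured, dt: float) -> float:
    """Integral of squared error, approximated by a rectangle rule."""
    return sum((r - m) ** 2 for r, m in zip(reference, measured)) * dt


def weighted_objective(thd_pct: float, ise_value: float,
                       w1: float = 0.5, w2: float = 0.5) -> float:
    """J = w1 * THD + w2 * ISE with w1 + w2 = 1."""
    if abs(w1 + w2 - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return w1 * thd_pct + w2 * ise_value


if __name__ == "__main__":
    thd = thd_percent(5.0, 0.0, 4.0)                       # toy currents
    err = ise([1.0, 1.0, 1.0], [0.0, 0.5, 1.0], dt=0.1)    # toy DC-link samples
    print(thd, err, weighted_objective(thd, err))
```

An optimizer such as EPO would call the objective once per candidate pair of PI gains, with THD and ISE obtained from a simulation run.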
3.2 Emperor Penguin Optimization (EPO)
The emperor penguin optimization (EPO) algorithm is a recent nature-inspired algorithm based on the huddling behavior of emperor penguins, which huddle to keep their bodies warm in the Antarctic region. The algorithm was proposed by G. Dhiman and V. Kumar in 2018 [36]. The EPO algorithm consists of the following steps (Fig. 4):
Generate and Establish the Huddle Boundary
Consider a two-dimensional, L-shaped polygon boundary for the huddling process.
Wind Profile Around the Huddle
Consider the wind profile as
Fig. 4 Huddling nature of emperor penguin
$$\alpha = \nabla\beta \tag{10}$$
$$G = \beta + i \tag{11}$$
where $G$ is the polygon plane function, $\beta$ is the velocity of the wind, $\alpha$ is the gradient of $\beta$, $x$ is the current iteration, and $Z$ is the time to find the best solution.
Temperature Profile Around the Huddle
The huddle temperature is $Z = 0$ if $R > 1$ and $Z = 1$ if $R < 1$, where $R$ defines the polygon radius. The temperature profile $Z'$ is calculated as follows:
$$Z' = Z - \frac{\mathrm{Max}_{iteration}}{x - \mathrm{Max}_{iteration}} \tag{12}$$
$$Z = \begin{cases} 0, & \text{if } R > 1 \\ 1, & \text{if } R < 1 \end{cases} \tag{13}$$
Distance Calculation Between Penguins
The emperor penguins update their positions mathematically as follows:
$$\vec{Q}_{ep} = \mathrm{Abs}\left(V(\vec{A}) \cdot \vec{P}(x) - \vec{D} \cdot \vec{P}_{ep}(x)\right) \tag{14}$$
$$\vec{A} = \left(M \times \left(Z' + P_{grid}(\mathrm{Accuracy})\right) \times \mathrm{Rand}()\right) - Z' \tag{15}$$
$$P_{grid}(\mathrm{Accuracy}) = \mathrm{Abs}\left(\vec{P} - \vec{P}_{ep}\right) \tag{16}$$
$$\vec{D} = \mathrm{Rand}() \tag{17}$$
$$V(\vec{A}) = \left(\sqrt{K_1\, e^{-x/K_2} - e^{-x}}\right)^2 \tag{18}$$
where $x$ is the current iteration, $\vec{A}$ and $\vec{D}$ are the collision-avoidance vectors, $\vec{P}$ is the best solution, $\vec{P}_{ep}$ is the position vector of the penguin, $V(\vec{A})$ represents the social forces, $M = 2$ is the movement parameter, $K_1$ and $K_2$ are control parameters whose values lie in the ranges [2, 3] and [1.5, 2], respectively, and $\vec{Q}_{ep}$ is the distance.
Updating the Position of the Mover
The next position of an emperor penguin is obtained from the following equation (Fig. 5):
$$\vec{P}_{ep}(x + 1) = \vec{P}(x) - \vec{A} \cdot \vec{Q}_{ep} \tag{19}$$
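The update rules of Eqs. (12) to (19) can be sketched as a single position update for one penguin. This is an illustrative reading of the equations, not the authors' implementation: the temperature profile is written z_prime to distinguish it from the huddle temperature Z, the social-force term is taken with the square root used in the original EPO formulation, random numbers are drawn independently per dimension, and all function names and example values are assumptions.

```python
import math
import random


def temperature_profile(x: int, max_iter: int, radius: float) -> float:
    """Eqs. (12)-(13): Z' = Z - Max_iter / (x - Max_iter)."""
    if x >= max_iter:
        raise ValueError("x must be smaller than max_iter")
    z = 0.0 if radius > 1.0 else 1.0
    return z - max_iter / (x - max_iter)


def social_force(x: int, k1: float, k2: float) -> float:
    """Eq. (18): V = (sqrt(K1 * exp(-x / K2) - exp(-x)))**2."""
    return math.sqrt(k1 * math.exp(-x / k2) - math.exp(-x)) ** 2


def epo_update(position, best, x, max_iter, radius, rng,
               m=2.0, k1=2.0, k2=1.5):
    """Return the next position of one penguin, Eqs. (14)-(19)."""
    z_prime = temperature_profile(x, max_iter, radius)
    v = social_force(x, k1, k2)
    new_position = []
    for p_ep, p in zip(position, best):
        p_grid = abs(p - p_ep)                                # Eq. (16)
        a = m * (z_prime + p_grid) * rng.random() - z_prime   # Eq. (15)
        d = rng.random()                                      # Eq. (17)
        q = abs(v * p - d * p_ep)                             # Eq. (14)
        new_position.append(p - a * q)                        # Eq. (19)
    return new_position


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-D search point, e.g. candidate (Kp, Ki) gains.
    print(epo_update([1.0, 0.2], [2.7, 1.0], x=10, max_iter=100,
                     radius=0.5, rng=rng))
```

In a complete optimizer this update would be applied to every search agent in each generation, with the best solution refreshed from the objective values.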
Fig. 5 Flowchart of EPO algorithm
4 Simulink Outcomes
The Simulink results are shown in Figs. 6, 7, 8, 9 and 10. To assess the performance of the EPO-based control, its results are compared with those of a model using a traditional PI controller. Figure 6 shows the dynamic response of the DC-link voltage of the drive. It indicates that the EPO-PI-based drive takes less time to settle at the required output voltage.
Fig. 6 Dynamic characteristic of the drive for DC-link voltage
Fig. 7 Unit-step speed response of the drive
Fig. 8 THD harmonic analysis of classical methodology
Fig. 9 Harmonic analysis of derived methodology
Fig. 10 Fitness cost of EPO-based topology
Figure 7 shows the unit-step speed response of the system. The EPO-based controller exhibits a shorter settling time and a smaller overshoot than the system with the traditional PI controller. Figures 8 and 9 show the THD analysis of the classical and proposed schemes, respectively. Table 1 lists the optimized controller gains for the inner-loop control (ILC) and the outer-loop control (OLC) of the proposed scheme. The THD values of the proposed schemes are compared with those of previously published systems in Table 2. Table 3 reports the dynamic performance of the schemes. Table 4 gives the simulation configuration of the EPO algorithm used for the proposed objective function.

Table 1 Optimized gains of controller
| Controller | ILC parameters: Kp,ILC | ILC parameters: Ki,ILC | OLC parameters: Kp,OLC | OLC parameters: Ki,OLC |
| --- | --- | --- | --- | --- |
| EPO-PI controller | 2.736 | 1.023 | 0.734 | 1.621 |
| Classical-PI controller | – | – | 0.185 | 1.850 |

Table 2 THD study of different methods
| Scheme | Input current THD (%) |
| --- | --- |
| EPO-PI | 1.850 |
| Classical-PI (DC-link) | 4.270 |
| Classical-PI (no DC-link) [19] | 10.550 |

Table 3 Assessment of relative outcomes of the suggested schemes
| Controller | Settling time, DC-link voltage (s) | Speed control indices: settling time | Speed control indices: overshoot (%) |
| --- | --- | --- | --- |
| EPO-PI (DC-link) | 0.0231 | 0.0451 | 2.181 |
| Classical-PI (DC-link) | 0.0691 | 0.0951 | 8.727 |

Table 4 Parameter settings of the EPO algorithm
| Algorithm | Parameter | Value |
| --- | --- | --- |
| Emperor penguin optimization (EPO) algorithm | Search agents and generations | 100 |
| | Temperature profile (Z′) | [1, 1000] |
| | A constant | [−1.5, 1.5] |
| | Function V() | [0, 1.5] |
| | Parameter M | 2 |
| | Parameters K1, K2 | [2, 3] and [1.5, 2] |
5 Conclusions
This research proposed DC-link control of the PMBLDC drive and its optimization by the EPO algorithm. The proposed scheme reduces the THD and achieves speed regulation of the motor with improved dynamic performance relative to previously published systems. The results demonstrate the superiority of the proposed control strategy.
References
1. Gieras JF (2009) Permanent magnet motor technology: design and applications. CRC Press
2. Singh B, Singh S (2009) State of the art on permanent magnet brushless DC motor drives. J Power Electron 9:1–17
3. Xia C-L (2012) Permanent magnet brushless DC motor drives and controls. Wiley
4. Krishnan R (2017) Permanent magnet synchronous and brushless DC motor drives. CRC Press
5. Barcaro M, Bianchi N, Magnussen F (2008) PM motors for hybrid electric vehicles. In: 2008 43rd International universities power engineering conference, pp 1–5
6. Chan CC (2007) The state of the art of electric, hybrid, and fuel cell vehicles. Proc IEEE 95:704–718
7. Chau K, Chan CC, Liu C (2008) Overview of permanent-magnet brushless drives for electric and hybrid electric vehicles. IEEE Trans Ind Electron 55:2246–2257
8. Chau K, Zhang D, Jiang J, Liu C, Zhang Y (2007) Design of a magnetic-geared outer-rotor permanent-magnet brushless motor for electric vehicles. IEEE Trans Magn 43:2504–2506
9. Lu J, Mallik A, Khaligh A (2017) Dynamic strategy for efficiency estimation in a CCM-operated front-end PFC converter for electric vehicle onboard charger. IEEE Trans Transport Electr 3:545–553
10. Nian X, Peng F, Zhang H (2014) Regenerative braking system of electric vehicle driven by brushless DC motor. IEEE Trans Ind Electron 61:5798–5808
11. Cao W, Mecrow BC, Atkinson GJ, Bennett JW, Atkinson DJ (2011) Overview of electric motor technologies used for more electric aircraft (MEA). IEEE Trans Ind Electron 59:3523–3531
12. Jiang X, Huang W, Cao R, Hao Z, Jiang W (2015) Electric drive system of dual-winding fault-tolerant permanent-magnet motor for aerospace applications. IEEE Trans Ind Electron 62:7322–7330
13. Hsiao H-C, Hsiao C-Y, Huang Y-H, Chien Y-K, Zheng Y-W (2018) Design and economical evaluation of small-capacity motor used in household appliances by Taguchi method. In: 2018 IEEE student conference on electric machines and systems, pp 1–6
14. Singh B (2014) Power quality improvements in permanent magnet brushless DC motor drives for home appliances. In: 2014 9th International conference on industrial and information systems (ICIIS), pp 1–1
15. Santhosh P, Vijayakumar P (2017) Performance study of BLDC motor used in wireless medical applications. Wireless Pers Commun 94:2451–2458
16. Patel SS, Botre B, Krishan K, Kaushal K, Samarth S, Akbar S et al (2016) Modeling and implementation of intelligent commutation system for BLDC motor in underwater robotic applications. In: 2016 IEEE 1st international conference on power electronics, intelligent control and energy systems (ICPEICES), pp 1–4
17. Dursun M, Ozden S (2012) Application of solar powered automatic water pumping in Turkey. Int J Comput Electr Eng 4:161
18. Madichetty S, Pullaguram D, Mishra S (2019) A standalone BLDC based solar air cooler with MPP tracking for improved efficiency. CSEE J Power Energy Syst 5:111–119
19. Singh B, Singh S (2010) Single-phase power factor controller topologies for permanent magnet brushless DC motor drives. IET Power Electronics 3:147–175
20. Singh B, Bist V (2015) Power factor correction (PFC) converters feeding brushless DC motor drive. Int J Eng Sci Technol 7(3):65–75
21. Singh B, Bist V (2013) Power-quality improvement in PFC bridgeless SEPIC-fed BLDC motor drive. Int J Emerging Electric Power Syst 14:285
22. Narula S, Bhuvaneswari G, Singh B (2014) A PFC based power quality improved bridgeless converter for welding applications. In: 2014 6th IEEE power india international conference (PIICON), pp 1–6
23. Bist V, Singh B, Chandra A, Al-Haddad K (2015) An adjustable speed PFC bridgeless-SEPIC fed brushless DC motor drive. In: 2015 IEEE energy conversion congress and exposition (ECCE), pp 4886–4893
24. Singh S, Singh B, Bhuvaneswari G, Bist V (2016) A power quality improved bridgeless converter-based computer power supply. IEEE Trans Ind Appl 52:4385–4394
25. Anand A, Singh B (2018) Power factor correction in Cuk–SEPIC-based dual-output-converter-fed SRM drive. IEEE Trans Industr Electron 65:1117–1127
26. Singh B, Anand A (2018) Power factor correction in modified SEPIC fed switched reluctance motor drives. IEEE Trans Ind Appl 54:4494–4505
27. Singh S, Singh B (2010) Voltage controlled PFC SEPIC converter fed PMBLDCM drive for an air-conditioner. In: 2010 Joint international conference on power electronics, drives and energy systems & 2010 power India, pp 1–6
28. Singh S, Singh B (2012) A voltage-controlled PFC Cuk converter-based PMBLDCM drive for air-conditioners. IEEE Trans Ind Appl 48:832–838
29. Singh B, Bist V (2013) DICM and DCVM of a PFC-based SEPIC-fed PMBLDCM drive. J Res 59:141–149
30. Mirjalili S, Gandomi AH, Mirjalili SZ, Saremi S, Faris H, Mirjalili SM (2017) Salp Swarm algorithm: a bio-inspired optimizer for engineering design problems. Adv Eng Softw 114:163–191
31. Yang X-S (2010) Nature-inspired metaheuristic algorithms. Luniver Press
32. Yang X-S (2014) Nature-inspired optimization algorithms. Elsevier
33. Mirjalili S, Dong JS, Lewis A (2020) Nature-inspired optimizers: theories, literature reviews and applications, vol 811. Springer
34. Kaliannan J, Baskaran A, Dey N, Ashour AS (2018) Bio-inspired algorithms in PID controller optimization. CRC Press
35. Mirjalili S (2019) Evolutionary algorithms and neural networks
36. Dhiman G, Kumar V (2018) Emperor penguin optimizer: a bio-inspired algorithm for engineering problems. Knowl-Based Syst 159:20–50
37. Rangaiah GP (2009) Multi-objective optimization: techniques and applications in chemical engineering, vol 1. World Scientific
38. Paliwal D, Sambariya DK (2020) A novel control approach for buck converter based PMBLDC motor using BAT algorithm. In: 2020 4th International conference on electronics, communication and aerospace technology (ICECA), pp 159–164
Reducing Start-Up Delay During Churn in P2P Tree-Based Video Streaming System Using Probabilistic Model Checking
Debjani Ghosh, Shashwati Banerjea, Mayank Pandey, Akash Anand, and Satya Sankalp Gautam
Abstract Major multimedia streaming companies, including Adobe, Netflix and Google, use adaptive video streaming over HTTP to disseminate video content. Adaptive video streaming requires video chunks to be encoded at different bit rates and stored on different content delivery network (CDN) servers. With the increasing number of consumers and the need to store multiple versions of the same video chunks, CDN service providers have to continuously increase their storage and network capacity. Also, in order to provide the desired Quality of User eXperience (QoUX), the CDN servers have to be placed at many different geographic locations. Peer-to-peer video streaming is a good alternative in which different peers collaborate to provide CDN-like services. However, the dynamic leaving and joining of peers and the need for VCR functionalities (rewind, forward, etc.) make these systems difficult and non-trivial to reason about. In this paper, we take P2Cast, a widely adopted tree-based video streaming overlay, and attempt to capture the effect of the early departure of a parent peer during an ongoing video streaming session. We model P2Cast as a Continuous Time Markov Chain (CTMC) process and use the Probabilistic Symbolic Model Checker (PRISM) to evaluate the start-up delay of a child peer after its parent peer has departed. Based on the evaluation, we propose modifications to the P2Cast system that provide fast departure recovery and minimize start-up delay.
Keywords Video · Streaming · P2P · Tree · Modeling · CTMC · P2Cast · Churn · Departure · Peer
D. Ghosh (B) · A. Anand
Department of Computer Science and Engineering, Amity University Uttar Pradesh, Noida, India
S. Banerjea · M. Pandey
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, India
S. S. Gautam
Uneva Automations India, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_18
1 Introduction
With the rapid proliferation of handheld devices and the easy availability of Internet connectivity, video-on-demand (VoD) content providers such as YouTube, Netflix, Hulu and Amazon Video are attracting a large number of Internet users. According to a survey by Cisco Systems, video traffic will account for 82% of all consumer Internet traffic by 2021. VoD streaming is already being used by educational institutions and commercial industries to expand the reach and impact of knowledge, entertainment, advertisement, etc. Video streaming involves three components: the media source, the communication medium and the user. The media source may be centralized or distributed. In a centralized architecture, multiple clients are connected to a single server. This architecture works as long as the server has enough capacity to handle the incoming connections. A sudden increase in demand for a particular stream creates a hotspot, and the entire streaming system stops functioning. Thus, quality of service (QoS) parameters such as scalability, robustness and fault tolerance suffer in a centralized architecture. Distributed architectures such as content delivery networks (CDNs) and proxy streaming networks address this limitation. In a CDN, the content delivery nodes are distributed globally, and the server pushes the entire content to each of these nodes, which in turn serve it to their nearby clients and thus improve the user experience. Some well-known CDN providers are Akamai, AT&T, Amazon CloudFront, CacheFly, CDNetworks, ChinaCache, EdgeCast and MaxCDN. Media companies such as YouTube and Netflix hire CDN services to deliver content to their audience. A proxy streaming network works in a similar fashion but differs in buffering capacity: the buffering capacity of proxy servers is smaller, which limits the scalability of the model.
Hiring the services of CDN providers offers numerous advantages, such as low start-up delay, scalability, high throughput, and low latency and jitter. However, the ever-increasing demand from media companies forces CDN service providers to continuously increase the number of servers, making the services expensive. Also, in order to provide the desired Quality of User eXperience (QoUX), measured by throughput, latency, jitter, error rate and frame loss rate [1], CDN service providers are forced to deploy a large number of servers in different geographical locations, because delay, jitter and error rate can be kept low only when a CDN server holding the desired content is found in close geographical proximity. To address the issues associated with CDN-based VoD streaming, the focus [2] has shifted to utilizing the large number of VoD client machines that are engaged
in displaying streaming content. These machines can serve as alternatives to CDN-based video placeholders. They can form P2P overlays among themselves for timely content discovery and use their available upload bandwidth and storage capacity to stream the content. Further, the P2P VoD streaming architecture requires no dedicated infrastructure and has a lower deployment cost with adequate fault tolerance [3]. Owing to these benefits, streaming over this architecture has attracted significant research. Organizations such as Netflix, Peer5 and PeerTube are utilizing P2P architecture for streaming. The P2P tree-based architecture is extensively used to build VoD systems. The tree-based approach relies on a simple application overlay: the peers are organized in a tree structure for delivering the stream, and parents forward the stream they receive to their children. However, deploying P2P for VoD streaming poses a number of challenges. The first challenge is the abrupt arrival of viewers during a streaming session. A newly arrived viewer may want to watch the video from the beginning, which requires the streaming server to continuously move the ongoing streaming session back and forth. This in turn affects the performance of the system. Secondly, the participating peers are free to leave and join the system at any time. This dynamic behavior of peers hampers the streaming capability and thus degrades the QoUX. To address these two challenges in P2P tree-based VoD systems, the authors of [4] presented P2Cast. In this paper, we aim to capture the behavior of P2Cast at a level of abstraction that does not make the model complex. The authors of [1, 5–8] have adopted formal methods and tools for the specification and verification of real-time systems.
Formal specification is a collection of mathematical techniques used to describe a system succinctly at a convenient abstraction level. Formal verification, on the other hand, checks that the formal specification of the model satisfies the properties it is expected to exhibit. However, modeling the behavior of a P2P system involves many dimensions of uncertainty and unpredictability. It is uncertain whether a peer acting as the source of a P2P stream is available. The behavior of the Internet (the medium between the media source and the destination) is uncertain with respect to whether a packet is lost or delivered. On closer inspection, the presentation of a stream packet on the client side depends on the current state and not on the past history. Clearly, the P2P VoD system exhibits stochastic and probabilistic behavior, in which the subsequent sequence of execution steps is determined by the probabilities of occurrence (or non-occurrence) of the associated events. We use a Continuous Time Markov Chain (CTMC) [9] to model the behavior of P2Cast, since it is well suited to systems in which the next step to be executed depends solely on the current state and not on past states. We use PRISM [10], a probabilistic model checker suitable for the formal modeling and analysis of systems that exhibit probabilistic and stochastic behavior, and continuous stochastic logic (CSL) [10] to specify and verify the desired properties of the system. From the analysis, we identify a shortcoming of P2Cast during departure recovery that leads to a start-up delay performance issue.
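To make the CTMC idea concrete, the following minimal sketch computes the steady-state distribution of a small chain by solving the balance equations πQ = 0 together with Σπ = 1. The three states (streaming, orphaned, rejoining) and their rates are hypothetical and chosen only for illustration; they are not the P2Cast model analyzed in this paper, and the function name `steady_state` is an assumption of this sketch.

```python
from fractions import Fraction


def steady_state(rates, n):
    """Solve pi * Q = 0 with sum(pi) = 1 for an irreducible n-state CTMC.

    rates maps (i, j) -> transition rate from state i to state j (i != j).
    """
    # Generator matrix Q: off-diagonal entries are rates, each row sums to 0.
    q = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), r in rates.items():
        q[i][j] += Fraction(r)
        q[i][i] -= Fraction(r)
    # Linear system A x = b: row j is the balance equation of state j
    # (column j of Q); the last one is replaced by the normalisation.
    a = [[q[i][j] for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    a[n - 1] = [Fraction(1)] * n
    b[n - 1] = Fraction(1)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
                b[r] -= f * b[col]
    return [b[i] / a[i][i] for i in range(n)]


# Hypothetical chain: 0 = streaming, 1 = orphaned, 2 = rejoining.
example_rates = {(0, 1): 1, (1, 2): 4, (2, 0): 2}
pi = steady_state(example_rates, 3)
print([str(p) for p in pi])
```

Because the next transition depends only on the current state, the long-run behavior is fully determined by the rate table; a model checker such as PRISM performs this kind of computation on much larger state spaces.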
In this paper, we present our proposed model, which rectifies the shortcoming revealed in the analysis and provides fast departure recovery. Section 2 describes the behavior of P2Cast during departure recovery. Section 3 presents our proposed modification to address the limitation of P2Cast for departure recovery. Section 4 gives a comparative analysis of departure recovery using our proposed modification and P2Cast. Finally, Sect. 5 presents the conclusion.
2 Model of P2Cast

The P2Cast tree-based VoD system consists of three components: the media server (server), the communication medium (network) and the participating peers. The participating peers that are directly connected to the media server are called candidate parents. In each session, the entire video is streamed over an application-level multicast tree (called the base tree) so that it can be shared among the participating peers. Peers that arrive later than the earlier peers in the session can retrieve a patch from the server or from other peers that have already cached the initial part. The chunks between the missed start of the stream and the ongoing current stream are called the patch stream, and the ongoing current stream is called the base stream. A peer may need the patch, the base, or both. Figure 1 shows examples of P2Cast session-based streaming to the requesting client Peer1. The patch and base servers are selected through the best-fit process of P2Cast. For a requesting client, the search for the base server, the patch server, or both starts at the media server. Panels a, b and c of Fig. 1 show Peer1 retrieving both the patch and the base from Parent1, Child1 and Child1Child, respectively. Panels d, e and f of Fig. 1 show the requesting client retrieving the patch from one parent and the base from another. For example, in panel e of Fig. 1, Peer1 retrieves the patch from Parent1 and the base from Child1. Peer1 may also retrieve the patch from the server (panel d). For example, if Parent1 is the only child and has enough bandwidth to stream the base but not the patch, or if no peer has sufficient bandwidth to stream the patch, then Peer1 can retrieve the patch from the Server (as long as the Server has enough bandwidth) and the base from Parent1.
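The split between patch and base can be illustrated with a small sketch. It assumes, purely for illustration, that positions in the video are measured in seconds and that a peer arriving at time `arrival` joins a session that started at `session_start`; the function name and its parameters are hypothetical and do not come from P2Cast.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) positions in the video, in seconds


def streams_needed(session_start: float, arrival: float,
                   video_length: float) -> Dict[str, Optional[Interval]]:
    """Split the video into the patch and base parts for a late-arriving peer."""
    offset = arrival - session_start  # how far the session has already played
    if not 0 <= offset < video_length:
        raise ValueError("peer must arrive while the session is streaming")
    # Patch: the part streamed before the peer arrived (nothing if offset is 0).
    patch = (0.0, offset) if offset > 0 else None
    # Base: the part the session streams from the arrival point onwards.
    base = (offset, video_length)
    return {"patch": patch, "base": base}


print(streams_needed(session_start=0.0, arrival=30.0, video_length=600.0))
```

A peer that arrives exactly at the session start needs only the base stream, whereas a later peer needs both parts and may obtain them from different nodes.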
2.1 CTMC Specification of Best-Fit Process

Figure 2 shows the CTMC model of the best-fit process that executes at each candidate parent peer to select the patch server, the base server, or both. The candidate parent process can receive a handover request in any state StateX while it is retrieving or transmitting a stream. After receiving a handover request from the Server, the candidate parent follows the steps described below.

1. At State1 there are two possibilities:
Fig. 1 Session wise base and patch streaming
(a) it has no children, or (b) it has children. Each scenario is described below.

(a) The Candidate Parent has no children. At State1, it estimates its own bandwidth to the requesting client via the action calculatingItsOwnBandwidth. If the bandwidth is sufficient to stream the patch, the base, or both, the Candidate Parent streams the patch or the base via the actions transmissionOfPatchStream and transmissionOfBaseStream and reaches StateTp and StateTb, respectively. From both of these states, it goes back to State3, i.e., the state in which it listens for further requests such as handover, patch, base, or combined patch and base requests. If the Candidate Parent has neither bandwidth nor children, it rejects the request via the action rejectRequest and goes back to StateX.

(b) The Candidate Parent has children. It estimates its own bandwidth to the requesting client via the action calculatingItsOwnBandwidth. Meanwhile, it sends messages to all of its children (Fig. 2), asking them to measure their respective bandwidth to the requesting client, via the action sendCalculateBandwidthToChildren and thereby reaches State2. The bandwidth
Fig. 2 CTMC of best-fit process executing at Candidate Parent
calculation message is received by the children at StateA via the action receivedBandwidthCalculationFromParent, and the state of each child transits to State1, as shown in Fig. 3.

2. At State2, the Candidate Parent receives the bandwidth estimation results via the action receivedBandwidthEstimationFromChildren; the results are sent by its children via the action sendComputedBandwidthToParent shown in Fig. 3. It compares the bandwidth reported by the children with its own bandwidth via the action comparingBandwidth and reaches State4. After the comparison, there are two possibilities:

(a) The Candidate Parent's bandwidth is not the maximum among its children, and it reaches State5 via the action bandwidthNotMaximum. From this state, the Candidate Parent hands the request over to the child with the maximum bandwidth via the action handoverToChildHavingMaxBandwidth(v) and reaches State7. From this state, it goes back to StateX.

(b) The Candidate Parent has the maximum bandwidth. Depending on whether the request is for a new joining or for departure recovery, it follows a different flow of transitions.

3. For a new joining, the patch has priority over the base, so the Candidate Parent first accepts the patch request via the action maxBandwidthAcceptPatchRequest and reaches State8. From this state, there are two possibilities:
Fig. 3 CTMC of bandwidth estimation process executing at Child Peer
(a) bandwidth is available for the base stream, or (b) there is not enough bandwidth to stream the base.

(a.1) If bandwidth is also available to stream the base, the state transits from State8 → State9 via the action bandwidthLeftForBaseStream. Reaching State9 means that the new requesting client has found both its patch server and its base server in the same Candidate Parent. The transmission of the patch and the base then starts from State8 and State9, respectively, and continues until StateTp and StateTb are reached. From these states, the process transits to State3 via the action transmissionCompleted, which marks the completion of the transmission.

(b.1) If, apart from the patch, there is not enough bandwidth to stream the base, the state transits from State8 → State10 via the action bandwidthNotLeftForBaseStream. In this state, the Candidate Parent hands the request over to the child with the maximum bandwidth via the action handoverToChildHavingMaxBandwidth and reaches State11. After the handover, it goes back to StateX. In this condition, the new peer is not in a joined state until it has obtained both a patch server and a base server. Although this parent is capable of streaming the patch, the streaming starts (at State8) only when the new peer has also found a base server. Once the new peer is in the joined state, the patch stream starts from this Candidate Parent via the action transmissionOfPatchStream from State8.

4. However, if the request is not a new joining, e.g., it is a departure recovery request, and the Candidate Parent has the maximum bandwidth, then it can stream (a) the patch via State4 → State8 → StateTp, (b) the base via State4 → State6 → StateTb, or (c) both the patch and the base via State4 → State8 → State9 → StateTb.

In the next section, we describe the start-up delay after departure recovery in P2Cast, which we are interested in analyzing.
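The handover logic of the steps above can be sketched as a walk down the tree. This is a simplified illustration under stated assumptions, not the P2Cast implementation: it handles a single stream with one bandwidth requirement, treats the bandwidth estimates as already known, and uses hypothetical names (`Peer`, `best_fit`, `required`).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Peer:
    name: str
    bandwidth: float  # estimated bandwidth to the requesting client
    children: List["Peer"] = field(default_factory=list)


def best_fit(candidate: Peer, required: float) -> Optional[Peer]:
    """Return the peer that serves the request, or None if it is rejected."""
    node = candidate
    while True:
        if not node.children:
            # No children: accept if the bandwidth suffices, otherwise reject.
            return node if node.bandwidth >= required else None
        # Compare the node's own bandwidth with the best estimate of its children.
        best_child = max(node.children, key=lambda c: c.bandwidth)
        if node.bandwidth >= best_child.bandwidth and node.bandwidth >= required:
            return node  # the node has the maximum bandwidth and can stream
        node = best_child  # hand the request over to the child with max bandwidth


# Hypothetical tree using the peer names of Fig. 1.
child1 = Peer("Child1", 5.0, [Peer("Child1Child", 3.0)])
parent1 = Peer("Parent1", 2.0, [child1, Peer("Child2", 1.0)])
chosen = best_fit(parent1, required=1.0)
print(chosen.name if chosen else "rejected")
```

In the full process, the patch and the base are accepted separately, so the same walk may end at different nodes for the two streams.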
2.2 Departure Recovery in P2Cast

In this section, we present the start-up delay property of a peer after it recovers from a departure. The authors of [4] apply the same best-fit process for departure recovery. In departure recovery, however, the requesting client may need the patch, the base, or both. If the orphan client requests the patch during recovery, the selection follows the same best-fit flow starting from the server and terminates once a patch server is found. Similarly, the best-fit process is used to select a base server; once one is found, the entire subtree is recovered. A parent peer may abruptly leave the system at any time. The sudden departure of a parent peer interrupts the retrieval of chunks, and the peer must join another parent peer as soon as possible to restart retrieval. We capture the average start-up delay of playback in the case where playback has already started and the next frames for playback are not available because retrieval was interrupted. The total average start-up delay in this case is given by Eq. 1.

$$\mathit{startupDelay}_{\mathit{depart}} = \mathit{joiningDelay} + \mathit{restartingStreamingDelay} \tag{1}$$
Next, we describe the joining delay and the restarting streaming delay in detail.

1. Joining Delay

The joining delay of a peer is the sum of the average latencies induced at each component (server, parent and network) on the path. To determine the delay of an individual component, we must consider the true rate of arrival and exit of a peer at that component. The delay imposed at each component is calculated as follows.

(a) Peer

Peer recovery requests form a Poisson process, so the delay is given by the mean time between recovery activities, which is the inverse of the true recovery rate based on rateOfRecovery. Here and in the following equations, S[·] is read as the steady-state probability of the bracketed state predicate, so that S[·] × rate is the true rate of the activity. In addition, the total average joining delay of the peer component is reached when the peer recovery request is fulfilled. The orphan peer may request a patch server, a base server, or both. For example, the delay of obtaining both a patch and a base server is the mean time between accepting activities, which is the inverse of the true acceptance rates based on rateOfPatchAccepting and rateOfBaseAccepting. The average peer joining delay is given in Eq. 2:

$$
\begin{aligned}
jd_1 ={}& \frac{1}{S[\mathit{peer1} = 3] \times \mathit{rateOfRecovery}} \\
&+ \frac{1}{S[(\mathit{peer1} = 1) \,\&\, (\mathit{acceptPatchStream} = 1)] \times \mathit{rateOfPatchAccepting}} \\
&+ \frac{1}{S[(\mathit{peer1} = 1) \,\&\, (\mathit{acceptBaseStream} = 1)] \times \mathit{rateOfBaseAccepting}}
\end{aligned}
\tag{2}
$$
(b) Server

The Server component hands the departure recovery request, i.e., the joining request, over to the candidate parent. When a peer joins directly under the server, the joining delay is very small, identical to that of a centralized system. To obtain the variation of the joining delay in a P2P tree-based hierarchical system, we are interested in the average delay incurred by the server component when it hands the joining request over to the candidate parent. The delay is given by the mean time between server handover activities, which is the inverse of the true handover rate based on rateOfHandover. The mean server handover delay is given in Eq. 3:

$$
jd_2 = \frac{1}{S[(\mathit{serverCandidateParent1} = 2) \,\&\, (\mathit{serverSession} = 0) \,\&\, (\mathit{parent1Handover} = 0)] \times \mathit{rateOfHandover}}
\tag{3}
$$
(c) Parent

The Parent sends a message to all its children through the parent handover activity (rateOfParentHandover), asking them to measure their respective bandwidth to the peer. The Parent estimates its own bandwidth to the peer through the bandwidth estimation activity (rateOfEstimatingBandwidth). It collects the measured bandwidth from all n of its children through the replying activity (rateOfReplying), and it identifies the child node with the maximum bandwidth to the peer through the bandwidth comparison activity (rateOfBandwidthComparison). The average delay of these activities is the mean delay between each of them, which is the inverse of the true rate of the activity, as shown in Eq. 4. After the bandwidth comparison, if the Parent has the maximum bandwidth to stream both the patch and the base, it accepts the request through the patch accepting and base accepting activities (rateOfPatchAccepting and rateOfBaseAccepting). The average delay of the acceptance activities is the mean delay between each of them, which is the inverse of the true rate of the activity, as shown in Eq. 5. The total average delay from the handover activity to the acceptance activity is therefore given by Eq. 6.

$$
\begin{aligned}
jd_{13} ={}& \frac{1}{S[(\mathit{parent1Handover} = 1) \,\&\, ((\mathit{child1} = 1) \,\&\, (\mathit{child1Handover} = 0)) \,\&\, ((\mathit{child2} = 1) \,\&\, (\mathit{child2Handover} = 0))] \times \mathit{rateOfHandover}} \\
&+ \frac{1}{S[\mathit{parent1EstBandwidth} = 0] \times \mathit{rateOfEstimatingBandwidth}} \\
&+ n \times \frac{1}{S[\mathit{parent1GettingReply} = 0] \times \mathit{rateOfReplying}} \\
&+ \frac{1}{S[\mathit{parent1BandwidthMaximum} = 0] \times \mathit{rateOfBandwidthComparison}}
\end{aligned}
\tag{4}
$$
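$$
\begin{aligned}
jd_{23} ={}& \frac{1}{S[(\mathit{parent1Accepting} = 0) \,\&\, (\mathit{parent1BandwidthMaximum} = 1)] \times \mathit{rateOfPatchAccepting}} \\
&+ \frac{1}{S[(\mathit{parent1Accepting} = 1) \,\&\, (\mathit{parent1BandwidthMaximum} = 1)] \times \mathit{rateOfBaseAccepting}}
\end{aligned}
\tag{5}
$$

$$jd_3 = jd_{13} + jd_{23} \tag{6}$$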
So the joining delay for the tree when the peer gets the patch and the base from the same parent is calculated by adding Eqs. 2, 3 and 6 together with the average network delay at each hop, as shown in Eq. 7:

$$jd_{\mathit{tree}} = jd_1 + jd_2 + jd_3 + \mathit{avgNetworkDelayAtEachHop} \tag{7}$$
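The pattern shared by Eqs. 2 to 7 is that each activity contributes the inverse of its true rate, i.e., of the steady-state probability of the enabling state multiplied by the nominal rate. A minimal numerical sketch follows; the probabilities, rates and network delay are invented for illustration and are not results of the model, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def activity_delay(steady_state_prob: float, rate: float) -> float:
    """Mean time between occurrences of an activity: 1 / (S[state] * rate)."""
    if steady_state_prob <= 0 or rate <= 0:
        raise ValueError("probability and rate must be positive")
    return 1.0 / (steady_state_prob * rate)


def joining_delay(terms: Iterable[Tuple[float, float]],
                  avg_network_delay_per_hop: float) -> float:
    """Sum the per-activity delays and add the network delay, as in Eq. 7."""
    return sum(activity_delay(p, r) for p, r in terms) + avg_network_delay_per_hop


# Illustrative (probability, rate) pairs for three activities on the path.
example_terms = [(0.5, 4.0), (0.25, 8.0), (0.2, 10.0)]
print(joining_delay(example_terms, avg_network_delay_per_hop=0.1))
```

An activity that is rarely enabled (small steady-state probability) therefore contributes a large delay even when its nominal rate is high.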
However, if the parent's bandwidth is not the maximum, i.e., parent1BandwidthMaximum is not equal to 1, the handover algorithm is repeated until a patch server is found. For each repetition of the handover, jd13 of Eq. 4 is calculated again with the state variables changed to those of the parent at the corresponding height. For example, when the peer gets the patch from one parent at some height and the base from another parent at another height, the joining delay formula changes to Eq. 8.

$$jd_3 = jd_{13} + \dots + jd_{231} + \dots + jd_{13} + \dots + jd_{232} \tag{8}$$
where jd231 is:

$$
jd_{231} = \frac{1}{S[(\mathit{child1Accepting} = 0) \,\&\, (\mathit{child1BandwidthMaximum} = 1)] \times \mathit{rateOfPatchAccepting}}
\tag{9}
$$

and jd232 is:

$$
jd_{232} = \frac{1}{S[(\mathit{child1ChildAccepting} = 0) \,\&\, (\mathit{child1ChildBandwidthMaximum} = 1)] \times \mathit{rateOfBaseAccepting}}
\tag{10}
$$
2. Restarting Streaming Delay

This delay is the mean end-to-end delay derived from the true rates of entry and exit of frames. To obtain the mean end-to-end delay, we sum the average delays imposed by each individual component on the communication path. If the peer retrieves the stream directly from the server component, the frames do not have an explicit entry activity, since we consider that a frame is generated as soon as the previous one is transmitted. The delay is therefore given by the mean time between transmit activities, which is the inverse of the true transmit rate based on rateOfServerUnicasting, as shown in Eq. 11.

$$
\mathit{sourceDelay} = \frac{1}{S[\mathit{serverCurrentBaseStream} = 0] \times \mathit{rateOfServerUnicasting}}
\tag{11}
$$
The peer component has only one input and one output activity, so its delay is given by Little's Law [1], as shown in Eq. 12, where avgFramesOnBufferAtState1 is the average number of frames in the peer buffer that will eventually be received, as given by Eq. 13.

$$
\mathit{peer1Delay} = \frac{\mathit{avgFramesOnBufferAtState1}}{S[\mathit{peer1RearBase} = 1] \times \mathit{rateOfPlayback}}
\tag{12}
$$

$$
\mathit{avgFramesOnBufferAtState1} = F[\mathit{peer1RearBase} = 1]
\tag{13}
$$
We consider the network component to be an unreliable medium, so not all frames are delivered to the client or peer: some are lost via the loss activity (rateOfLoss). Applying Little's Law to the successfully transmitted frames, their average delay is given by Eq. 14.

$$
\mathit{networkDelay} = \frac{\mathit{averageNumberFramesReceived}}{\mathit{trueRateOfReceive}}
\tag{14}
$$
The start-up delay of playback excluding the joining delay is the sum of the component delays, as shown in Eq. 15.

$$\mathit{sD}_{\mathit{treeHt1}} = \mathit{sourceDelay} + \mathit{networkDelay} + \mathit{peer1Delay} \tag{15}$$
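Little's Law relates the average number of items in a component, its throughput and the average time an item spends in it (delay = items / throughput). The sketch below applies it to the network and peer terms and then sums the components as in Eq. 15. All numbers are illustrative assumptions rather than values obtained from the model, and the function names are hypothetical.

```python
def littles_law_delay(avg_items: float, throughput: float) -> float:
    """Average time in a component: average occupancy divided by throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_items / throughput


def startup_delay_height1(source_delay: float, network_delay: float,
                          peer_delay: float) -> float:
    """Start-up delay excluding the joining delay, summed as in Eq. 15."""
    return source_delay + network_delay + peer_delay


# Illustrative values: 4 frames in transit at 20 received frames/s, and
# 5 buffered frames consumed at a true playback rate of 25 frames/s.
network_delay = littles_law_delay(avg_items=4.0, throughput=20.0)
peer_delay = littles_law_delay(avg_items=5.0, throughput=25.0)
print(startup_delay_height1(0.05, network_delay, peer_delay))
```

Only successfully received frames enter the throughput of the network term, which is why lost frames increase the average delay of the frames that do arrive.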
However, if the peer connects to another parent peer rather than to the server component, the start-up delay of playback at different heights, excluding the joining delay, is given by Eqs. 16–19 when the initial start-up playback content is the patch. If the arrival delay between the parent and the peer is very small, the initial start-up playback content is the base stream retrieved from the parent peer, and the average delay changes to Eq. 20.
$$\mathit{sD}_{\mathit{treeHt2}} = \mathit{parent1Delay} + \mathit{networkDelay} + \mathit{peer1Delay} \tag{16}$$

where

$$
\mathit{parent1Delay} = \frac{1}{S[(\mathit{peer1RetrievingPatchStreamFrom} = \mathit{PARENTID}) \,\&\, (\mathit{parent1GivingPatchPeer1} = 0)] \times \mathit{rateOfParent1SendingPatch}}
\tag{17}
$$

$$
\mathit{peer1Delay} = \frac{\mathit{avgFramesOnBufferAtState1}}{S[(\mathit{peer1RetrievingPatchStreamFrom} = \mathit{PARENTID}) \,\&\, (\mathit{peer1RearPatch} = 1)] \times \mathit{rateOfPlayback}}
\tag{18}
$$

where

$$
\mathit{avgFramesOnBufferAtState1} = F[(\mathit{peer1RetrievingPatchStreamFrom} = \mathit{PARENTID}) \,\&\, (\mathit{peer1RearPatch} = 1)]
\tag{19}
$$

$$
\mathit{sD}_{\mathit{treeHt3}} = \mathit{sourceDelay} + \mathit{networkDelay} + \mathit{parent1Delay} + \mathit{networkDelay} + \mathit{peer1Delay}
\tag{20}
$$

where

$$
\mathit{parent1Delay} = \frac{1}{S[(\mathit{peer1RetrievingBaseStreamFrom} = \mathit{PARENTID}) \,\&\, (\mathit{parent1GivingBasePeer1} = 0)] \times \mathit{rateOfParent1Broadcasting}}
\tag{21}
$$

$$
\mathit{peer1Delay} = \frac{\mathit{avgFramesOnBufferAtState1}}{S[(\mathit{peer1RetrievingBaseStreamFrom} = \mathit{PARENTID}) \,\&\, (\mathit{peer1RearBase} > 0)] \times \mathit{rateOfPlayback}}
\tag{22}
$$

where

$$
\mathit{avgFramesOnBufferAtState1} = F[(\mathit{peer1RetrievingBaseStreamFrom} = \mathit{PARENTID}) \,\&\, (\mathit{peer1RearBase} > 0)]
\tag{23}
$$
Fig. 4 View of family message
Fig. 5 Parent send family message to children
3 Proposed Architecture for Departure Recovery

In P2Cast, the media server gets involved whenever a failure occurs, which causes disruption due to a bottleneck at the source. Orphaned clients reconnect using the join algorithm starting from the media server, resulting in a long waiting time before service can resume. In our architecture, in contrast, parent departures are handled locally and, most of the time, without involving the media server. To maintain connectivity, the proposed architecture allows a peer to exchange messages with its neighbors, which include its parent, its children and its siblings. We use a message called FAMILY to maintain connectivity. The parent delivers this message to a child when the child joins it. The message contains family information, namely the IP addresses of the grandparent and the siblings, as illustrated in Fig. 4. The message exchange between parent and children is presented in Fig. 5. For example, if client 'C' in Fig. 5 is currently joined under 'b', then 'b' can deliver to it the family message containing the IP addresses of a, e and d (shown in Fig. 4). The FAMILY message is delivered periodically. However, to reduce the network overhead, peers forward the message only when the information has been updated.
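One possible in-memory representation of the FAMILY message, together with the rule of forwarding it only when its content has changed, is sketched below. The class names, fields and IP addresses are hypothetical; the paper specifies only that the message carries the IP addresses of the grandparent and the siblings.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FamilyMessage:
    grandparent_ip: str           # IP of the sender's own parent
    sibling_ips: Tuple[str, ...]  # IPs of the receiver's siblings


class ParentNode:
    def __init__(self, own_parent_ip: str) -> None:
        self.own_parent_ip = own_parent_ip
        self.children: List[str] = []
        self._last_sent: Dict[str, FamilyMessage] = {}

    def add_child(self, child_ip: str) -> None:
        self.children.append(child_ip)

    def family_updates(self) -> Dict[str, FamilyMessage]:
        """Return the messages for children whose family view has changed."""
        updates = {}
        for child in self.children:
            siblings = tuple(ip for ip in self.children if ip != child)
            msg = FamilyMessage(self.own_parent_ip, siblings)
            if self._last_sent.get(child) != msg:  # suppress unchanged messages
                updates[child] = msg
                self._last_sent[child] = msg
        return updates


# Node b (parent a at 10.0.0.1) with children c and d.
b = ParentNode(own_parent_ip="10.0.0.1")
b.add_child("10.0.0.3")
b.add_child("10.0.0.4")
print(b.family_updates())   # both children receive a message
print(b.family_updates())   # nothing changed, so nothing is sent
```

Comparing each new message with the last one sent keeps the periodic check cheap while avoiding redundant traffic.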
3.1 Failure Recovery

Section 3 introduced the FAMILY message, which provides family information as shown in Figs. 4 and 5. The parent periodically sends this message to the child to keep it updated about its current grandparent and siblings. The idea behind the FAMILY message is to let a child connect directly to its grandparent during departure recovery when its parent departs. Before departing, the parent was retrieving the stream from its own parent, which is the grandparent of the parent's child. When the parent departs, the orphan child may therefore be able to obtain the missed and ongoing stream from its grandparent. If so, the recovery time is better than that of the P2Cast departure recovery process, and the entire subtree of the child is also recovered. A failure recovery request can take two forms: patch stream recovery and base stream recovery.

1. Base Recovery

In our proposed design, the recovery process does not start by contacting the media server. Instead, the node disrupted by the failure contacts its grandparent. The grandparent information is obtained from the family message (shown in Fig. 4) provided by the departed parent. For example, suppose client C is disrupted by a failure. It contacts its grandparent A, which in turn executes the joining process based on the bandwidth principle, as in P2Cast. If a suitable parent is found through bandwidth comparison, C joins that node and the entire subtree under C is recovered. As in P2Cast, we assume that only the node at the root of the subtree below the departed node executes the recovery procedure. However, if the recovery fails, the disrupted node contacts the media server and the new client admission process is executed.

2. Patch Recovery

As in the base stream recovery procedure, the disrupted node contacts its grandparent. If a suitable parent is found in terms of bandwidth, the disrupted node is recovered. Otherwise, it contacts the media server and the new client admission process is executed, but it terminates once a patch server is found.
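The two-stage recovery described above (grandparent first, media server as fallback) can be sketched as follows. The selection procedures themselves are passed in as callables because their details are those of the best-fit process; the function name `recover`, the `Selector` type and the returned labels are assumptions of this sketch.

```python
from typing import Callable, Optional, Tuple

# A selector runs a parent-selection procedure for the given stream type and
# returns the name of the chosen parent, or None if no suitable parent exists.
Selector = Callable[[str], Optional[str]]


def recover(stream: str, via_grandparent: Selector,
            via_server: Selector) -> Tuple[Optional[str], str]:
    """Return (new parent or None, which entry point handled the request)."""
    if stream not in ("patch", "base"):
        raise ValueError("stream must be 'patch' or 'base'")
    # First attempt: local recovery starting at the grandparent.
    parent = via_grandparent(stream)
    if parent is not None:
        return parent, "grandparent"
    # Fallback: the admission process starting at the media server.
    return via_server(stream), "server"


# Hypothetical selectors: the grandparent can serve the base but not the patch.
grandparent = lambda s: "A" if s == "base" else None
server = lambda s: "Server"
print(recover("base", grandparent, server))
print(recover("patch", grandparent, server))
```

The media server is contacted only when the local attempt fails, which is the property the proposed architecture relies on to reduce the load at the source.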
3.2 CTMC Specification

In this section, we analyze the recovery of a child peer under a parent peer using a CTMC. Figure 6 shows the CTMC of the departure recovery of a child node. After joining a parent node, the child node can retrieve, forward and display the stream. However, the time at which the parent departs is not known in advance, and the child has to decide on the next step based on the current situation. The flow of CTMC transitions of the child node is described as follows:

1. The child peer joins the parent via the action rateOfJoining (r16), and the state transits from State0 → State1.
2. After joining the parent, the child is in State1 and starts retrieving the stream via the action rateOfReceiving (r2), and the state transits from State1 → State3. From State3, it can perform playback, listen for handover requests, patch requests, etc. However, the child's parent can depart in any state, in which case the child has to execute the departure recovery process described in Sect. 3.1.

3. When the parent departs from the session, the child executes the action rateOfDepartureRecovery (r1). After joining, departure recovery may be triggered in any state (e.g., State1, State3) until the child peer reaches State6, which indicates the completion of playback. For example, if the child process is in State1 after joining the parent and the parent departs in this state, the child state transits from State1 → State2 via the action rateOfDepartureRecovery (r1).

4. When the child is in State2, the departure recovery request goes to the grandparent and the state transits from State2 → State7 via the action rateOfRequesttoGrandParent (r3).

5. At State7, the best-fit process of P2Cast executes. However, the rejoining process starts with the grandparent instead of the server.

6. After the rejoining process has executed, there are three possibilities for the new parent: (a) the grandparent itself, (b) any child node of the grandparent, or (c) any node from the session.

• If the grandparent is found to be the most suitable node, the state transits from State7 → State7. After the orphan child joins its grandparent as its parent, the child retrieves the stream, the state transits from State7 → State3 and the receiving process continues.

• If the grandparent's child or any other node from the session is a more suitable parent than the grandparent, the orphan child joins that node and the state transits from State7 → State8. At State8 the child retrieves the stream from the parent and the state transits from State8 → State3.

• However, when all of these possibilities fail, the departure recovery request goes to the server via the action rateOfRequestToServer (r6) and the state transits from State7 → State9. At State9, if the departure request is for the patch stream and the server has enough bandwidth to stream it, the state transits from State9 → State3 and the receiving of the stream continues. If the server denies the patch stream because bandwidth is unavailable, it forwards the stream request to its super parent and the state transits from State9 → State10 via the action rateOfRejoiningTONode (r7). The same flow applies when the request is for the base stream.

7. When the peer is at State10, which indicates that the child has found a parent, the state transits from State10 → State3 via the action rateOfReceiving (r2) and the process continues. If the child has not found a parent, the child's request is rejected via the action rateOfRejection (r8) and the state transits from State9 → State0.

8. During the receiving state State3 or the playback state State4, the parent may depart; for departure recovery the state then transits from State3 → State2 and State4 → State2, respectively, and the recovery process continues.
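The transitions listed in the steps above can be written down as a table and checked mechanically, for example to confirm that a recovering child can always get back to the receiving state. The sketch below encodes only the transitions named in the text; transitions that appear only in Fig. 6 (such as those into the playback states) are omitted, and the rate values are left out because only the structure is examined.

```python
from collections import deque

# (source state, target state, action); None means the text names no action.
TRANSITIONS = [
    (0, 1, "rateOfJoining"),
    (1, 3, "rateOfReceiving"),
    (1, 2, "rateOfDepartureRecovery"),
    (3, 2, "rateOfDepartureRecovery"),
    (4, 2, "rateOfDepartureRecovery"),
    (2, 7, "rateOfRequesttoGrandParent"),
    (7, 7, None),   # grandparent itself is the most suitable parent
    (7, 3, None),   # stream retrieved from the grandparent
    (7, 8, None),   # another node is the most suitable parent
    (8, 3, None),
    (7, 9, "rateOfRequestToServer"),
    (9, 3, None),   # server streams directly
    (9, 10, "rateOfRejoiningTONode"),
    (10, 3, "rateOfReceiving"),
    (9, 0, "rateOfRejection"),
]


def reachable(start):
    """Breadth-first search over the transition table."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for src, dst, _ in TRANSITIONS:
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


print(sorted(reachable(0)))
print(3 in reachable(2))  # can a recovering child return to receiving?
```

A table of this kind is also a convenient starting point for writing the corresponding module in a model checker's input language.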
Fig. 6 CTMC of departure recovery process
4 Performance of Departure Recovery

In this section, we analyze the performance of the proposed departure recovery method, which claims that, instead of going directly to the server for departure recovery (the recovery principle of the P2Cast architecture), it is better for a node to ask its grandparent first. For the analysis, we compare the performance of departure recovery through the grandparent (proposed architecture) with that through the server (P2Cast architecture) using PRISM properties. The transition rates used to calculate the performance of both P2Cast and our proposed model are shown in Table 1. The PRISM property specifications of departure recovery through the grandparent and through the server are shown in Eqs. 24 and 25, respectively.

R{"GrandParent"}=? [ S ]    (24)

R{"Server"}=? [ S ]    (25)
The reward structures are defined in Eqs. 26 and 27.
Fig. 7 Departure recovery cost varying N and M
Fig. 8 Departure recovery cost varying M
rewards "GrandParent" [joinToGrandParent] true : 1; endrewards    (26)

rewards "Server" [forwardToSuperParent] true : 1; endrewards    (27)
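For a CTMC, a steady-state query over a transition reward structure such as those above evaluates to the long-run reward accumulated per unit time: the sum, over all states, of the steady-state probability multiplied by the rate and the reward of each matching transition. With a reward of 1 per transition, this is the long-run frequency of the labelled action. The sketch below illustrates this on a hypothetical three-state chain whose steady-state distribution is assumed to be known; it is not the model of this paper.

```python
from fractions import Fraction


def steady_state_transition_reward(pi, transitions, action, reward=1):
    """Long-run reward rate: sum of pi[src] * rate * reward over the
    transitions labelled with the given action."""
    return sum(pi[src] * rate * reward
               for src, _dst, label, rate in transitions
               if label == action)


# Hypothetical chain: (source, target, action label, rate).
example_transitions = [
    (0, 1, "departure", Fraction(1)),
    (1, 2, "request", Fraction(4)),
    (2, 0, "joinToGrandParent", Fraction(2)),
]
# Steady-state distribution of this chain (assumed given for the sketch).
example_pi = [Fraction(4, 7), Fraction(1, 7), Fraction(2, 7)]

print(steady_state_transition_reward(example_pi, example_transitions,
                                     "joinToGrandParent"))
```

In this small cycle every action occurs equally often in the long run, so the same value is obtained for each of the three labels.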
Table 1 Activity and transition rates of P2P tree-based VoD with VCR

| Activity | Rate of activity | Specific rate |
|---|---|---|
| [handoverToCandidateParent], [handoverToChildren], [childHandoverToChildren] | rateOfHandover | N/L, N: no. of peer arrivals |
| [parentEstimatingBandwidth] | rateOfEstimatingBandwidth | N/L |
| [childBandwidthMaximum] | rateOfBandwidthComparison | N/L |
| [childrenReplyingBandwidth] | rateOfReplying | N/L |
| [departure], [child1ChildDeparture] | rateOfDeparture | M/L, M: no. of peer departures |
| [peer1PatchBaseRecovery] | rateOfRecovery | M/L |
| [forwardToSuperParent], [requestToServer] | rateOfForward | 2 ∗ (m/L), m: no. of node departures, L: length of video in seconds |
| [joiningToServer], [joiningToSuperParent], [joiningToGrandGrandParent], [joiningToGrandParent], [joiningToGrandParent], [joiningToGrandParent], [joiningToGrandParentChild3] | rateOfJoining | N/L, N: no. of node arrivals |
| [executingSelection] | rateOfExecutingSelection | m/L |
| [reJoining], [joinToGrandParent], [joinToGrandParentChild1], [joinToGrandParentChild2] | rateOfRejoining | m/L |
| [gettingEstimationFromChildren] | rateOfSendingTheEstimation | m/L |
| [request] | rateOfDepartureRecovery | m/L |
The steady-state reward formulas of Eqs. 24 and 25 are applied to the transitions named joinToGrandParent and forwardToSuperParent, respectively. The expected steady-state reward values shown in Fig. 7 for our proposed architecture and for P2Cast are obtained using Eqs. 24 and 25, respectively. Figure 7 shows departure recovery through the grandparent and through the server when both the number of peer arrivals (N) and the number of peer departures (M) vary. Figure 8 shows departure recovery through the grandparent and through the server when the number of peer departures (M) varies while the number of peer arrivals remains constant at 50. Figure 7 shows that the steady-state reward value of departure recovery through the server is twice that of departure recovery through the grandparent. This means that recovery through the grandparent takes half the time of recovery through the server, which indicates that departure recovery performs better in our proposed architecture than in the P2Cast architecture. Figures 7 and 8 also show that, for both departure recovery procedures (via grandparent or via server), the steady-state reward value changes very little when the number of departures (M) varies while the number of arrivals (N) is constant, whereas it changes more when the number of arrivals increases while the number of departures is constant.
5 Conclusion

In this paper, we have proposed a modified architecture for P2P tree-based VoD streaming systems. The modified architecture resolves the issue of start-up delay during departure recovery. The proposed architecture is specified formally using a CTMC, and its performance is compared with that of P2Cast. The architecture is evaluated against failures, and the performance analysis reveals that departure recovery of an affected client takes less time in our proposed system than in P2Cast.
References

1. Ghosh D, Pandey M, Tyagi N (2014) Stochastic modeling and analysis of video on demand system with VCR functionalities. In: IEEE international conference of computer science & engineering, Thailand
2. Ghosh D, Rajan P, Pandey M (2014) P2p-VoD streaming: design issues & user experience challenges. In: Advanced computing, networking and informatics-vol 2: wireless networks and security. Springer Berlin Heidelberg. ISBN 978-3-319-07350-7
3. Rodrigues R, Druschel P (2010) Peer-to-peer systems. Commun. ACM
4. Guo Y, Suh K, Kurose J, Towsley D (2007) P2cast: peer-to-peer patching for video on demand service. In: Multimedia tools and applications
5. Aygun R, Zhang A (2002) Modeling and verification of interactive flexible multimedia presentations using promela/spin. Model Checking Software, vol 2318. Springer, Berlin Heidelberg, pp 205–212
6. Martin FI, Alins-Delgado JJ, Aguilar-Igartua M, Mata-Diaz J (2004) Modelling an adaptive-rate video-streaming service using Markov-rewards models. IEEE Computer Society
7. Norman G, Parker D (2014) Quantitative verification: formal guarantees for timeliness, reliability and performance. A knowledge transfer report from the London Mathematical Society and Smith Institute for Industrial Mathematics and System Engineering
8. Hanczewski S, Stasiak M (2011) Performance modelling of video-on-demand systems. In: The 17th Asia Pacific conference on communications, Sabah, pp 784–788
9. Keshav S (2012) Mathematical foundations of computer networking
10. Kwiatkowska M, Norman G, Parker D (2011) Prism4.0: Verification of probabilistic real-time systems 6806:585–591
Operational Flexibility with Statistical and Deep Learning Model for Electricity Load Forecasting Ayush Sinha, Raghav Tayal, Ranjana Vyas, and O. P. Vyas
Abstract Electricity differs from other material products because electrical energy cannot be stored: it must be generated when it is demanded and consumed at the time it is produced, so its demand must be forecast accurately. Since electricity data is stored as time series data, it is assumed to contain both linear and nonlinear patterns. Statistical approaches were initially a popular choice among researchers for forecasting simple time series, but with the advent of deep learning techniques, forecasting has become more accurate. The advantage of deep learning is its capability to handle large-scale data and heterogeneous data simultaneously. In this work, a hybrid approach based on vector auto regression (VAR) and CNN-LSTM is proposed for power load forecasting. The VAR model separates out the linear patterns in the time series, and the CNN-LSTM models the nonlinear patterns: the CNN extracts complex features from the electricity data, and the LSTM models the temporal information. The approach can therefore derive both temporal and spatial features of electricity data. Our experiments show that the VAR-CNN-LSTM hybrid approach performs better than more recent methods such as CNN-LSTM on the Household Power Consumption dataset as well as the Ontario Electricity Demand dataset. It also performs better than more traditional approaches such as the multilayer perceptron and long short-term memory. Mean squared error, mean absolute error and root mean squared error are used to evaluate the performance of the discussed approaches.

Keywords Vector auto regression · Convolutional neural network · Long short term memory · Electrical load forecasting

A. Sinha (B) · R. Tayal · R. Vyas · O. P. Vyas
Department of IT, Indian Institute of Information Technology, Allahabad, U.P., India
e-mail: [email protected]
R. Vyas e-mail: [email protected]
O. P. Vyas e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_19
1 Introduction

Electricity demand forecasting plays an important role because it enables the electric industry to make informed decisions when planning power system demand and supply. Accurate power demand forecasting is also necessary because, owing to its physical characteristics, energy must be used at the same time as it is produced [1]. Electricity demand forecasts can be short term (a day), medium term (a week to a month) or long term (a year). These forecasts are necessary for the proper operation of electric utilities. Precise power load forecasting helps in financial planning, in devising a power supply strategy, in electricity management and in market research [2]. Load forecasting is a multivariate time series problem: electrical energy depends on many characteristics, and the prediction makes use of temporal data. Temporal data depends on time and is represented using time stamps. Prediction using classical load forecasting methods is very difficult because power consumption can have a uniform seasonal pattern but an irregular trend component.
1.1 Time Series

A time series is a sequence of discrete data points taken at fixed intervals of time [3]. The time dimension adds an explicit order dependence between observations, and this ordering is a source of extra information that can be used in forecasting. A time series may have one or more variables. A series with a single variable changing over time is univariate; if more than one variable varies with time, the series is multivariate. Time series have applications in many domains such as weather forecasting, power load forecasting, stock market prediction, signal processing and econometrics.

Time Series Analysis and Forecasting Time series analysis consists of methods for analyzing data and drawing out meaningful information and patterns, which helps in choosing methods and obtaining better forecasting results [4]. It helps to understand the nature of the series that needs to be predicted. Time series forecasting involves creating a model, fitting it on a training set (historical data) and then using that model to make future predictions. In classical statistical handling, making forecasts into the future is called extrapolation. A time series model can be evaluated by forecasting into the future and analyzing the performance with evaluation metrics such as MSE, MAE and RMSE.

Time Series Evaluation Metrics The most commonly used error metrics for forecasting are:

• Mean Squared Error: It is the average of the squared prediction errors. It is formulated as
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{1}$$

• Mean Absolute Error: It is the average of the absolute values of the prediction errors. It is formulated as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{2}$$

• Root Mean Squared Error: It is the square root of the mean of the squared prediction errors. It is formulated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{3}$$

Here $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of observations.
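The three metrics in Eqs. (1) to (3) can be computed directly from paired lists of actual and predicted values. The following is a minimal sketch using only the Python standard library; the function names and the sample values are illustrative and do not come from the paper.

```python
import math


def mse(actual, predicted):
    """Mean squared error, Eq. (1)."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


def mae(actual, predicted):
    """Mean absolute error, Eq. (2)."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean squared error, Eq. (3): the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Illustrative values only (not taken from the datasets used in the paper)
actual = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]
print(mse(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE can rank them differently because it weights large errors less heavily.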
2 Literature Survey

H. K. Choi et al. discussed an ARIMA-LSTM hybrid model for time series forecasting [5]. ARIMA is used to capture the linear properties, and LSTM, chosen for its handling of temporal dependencies and its long-term predictive properties, is applied to the residuals, which contain the nonlinear and temporal properties. The hybrid model was compared with other methods and gave better results on evaluation metrics such as MSE, RMSE and MAE.

Tae-Young Kim and Sung-Bae Cho proposed a hybrid CNN-LSTM model evaluated on power consumption data [6]. The CNN extracts temporal and spatial features across several variables of the data, whereas the LSTM takes the CNN output as input and models temporal information and irregular trends. The proposed model was compared with other models such as GRU and BiLSTM, and it performed considerably better on evaluation metrics such as MSE, RMSE, MAE and MAPE.

G. Mahalakshmi et al. surveyed various methods for forecasting time series data and discussed the types of time series data being forecast [7]. Research covering data such as electricity data and stock market data, evaluated with parameters such as MAE and MSE, indicates that hybrid forecasting models yield good results compared to other models.

A. Gasparin et al. discussed the problem of accurately forecasting power load given its nonlinear nature [8]. The authors worked on two power load datasets and applied state-of-the-art deep learning techniques for short-term prediction. The deep learning models most relevant to short-term load forecasting are surveyed and experimentally evaluated, with a focus on three families: sequence-to-sequence architectures, recurrent neural networks and the more recently developed temporal convolutional neural networks. LSTM performed better than other traditional models.

S. Siami-Namini et al. compared deep learning methods such as LSTM with traditional statistical methods such as ARIMA on financial time series datasets. According to them, LSTM-based forecasting reduces the error rate by 85% compared to ARIMA [9].

Wang et al. [10] worked on a regional CNN-LSTM consisting of two parts: a regional CNN and an LSTM that predicts the VA ratings. According to their evaluation, the regional CNN-LSTM outperformed regression and traditional neural network-based methods.

Alex Sherstinsky explains the essential fundamentals of RNN and LSTM [11]. The author discusses the problems faced when training a standard RNN and addresses them by transforming the RNN into the 'Vanilla LSTM' through a series of logical arguments.

C. Hartmann, M. Hahmann, D. Habich and W. Lehner worked on a cross-sectional autoregression approach that consumes the available data of many time series from the same domain in a single model. It thus covers a wide domain of data, compensates for missing values and quickly calculates accurate forecasts [12].

J. Y. Choi and B. Lee presented a novel LSTM ensemble forecasting algorithm that combines the forecasts from a set of individual LSTM networks [13]. The method can capture nonlinear statistical properties, is easy to implement and is computationally efficient.

G. Chniti et al. presented robust forecasting methods for phone price prediction using support vector regression (SVR) and long short-term memory (LSTM) neural networks [14]. The models were compared on both univariate and multivariate data; in the multivariate setting, LSTM performed better than the others.

K. Yan et al. worked on short-term forecasting of electricity power consumption. Owing to the varying nature of electricity data, traditional algorithms performed poorly compared to LSTM [15].
To further increase accuracy, the authors discussed a hybrid approach consisting of a CNN on top of an LSTM, which was tested on five different datasets and performed better than ARIMA, SVR and LSTM alone.

C. N. Babu et al. proposed a combination of linear and nonlinear models in which ARIMA handles the linear component and an ANN the nonlinear component. As a further improvement, the authors proposed that the nature of the time series be taken into account: the volatility of the series is handled with a moving-average filter before the hybrid model is applied. The proposed hybrid model was compared with the individual models and some other models, and it performed well in comparison [16].
3 Problem Statement

The goal is to build a model that can accurately forecast power load data. The problem is formulated as follows:
1. Given fully observed time series data $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension, the aim is to predict a series of future time series data.
2. That is, assuming $\{y_1, y_2, \ldots, y_T\}$ is available, predict $y_{T+h}$, where $h$ is the desired time horizon ahead of the current time stamp [17].
3. The following constraints need to be satisfied by the model:
   (a) The model should be able to handle numerous series data.
   (b) The model should be able to handle incomplete data.
   (c) The model should be able to handle noisy data.
4 Proposed Methodology: VAR-CNN-LSTM Hybrid Model

This model combines the learning ability of a statistical model with that of deep learning models. Time series data is known to be made up of a linear and a nonlinear segment, which can be expressed as

$$d_t = N_t + L_t + \varepsilon \tag{4}$$

where $L_t$ is the linear component at time $t$, $N_t$ is the nonlinear component at time $t$ and $\varepsilon$ is the error component. Vector auto regression is a traditional statistical model for time series forecasting that is known to perform well on linear problems. Neural network models such as CNN-LSTM, on the other hand, work well on problems with nonlinearity in the data. A combination of both models is therefore capable of identifying both linear and nonlinear patterns. In this model, VAR identifies the linear interdependence in the data, and the residuals left by VAR are used by the CNN-LSTM to capture the nonlinear patterns. We now discuss each of the components used in the algorithm.
4.1 Vector Auto Regression (VAR) Model

Vector auto regression can be used when two or more time series influence each other. The model is autoregressive because each variable is formulated as a function of past values of the variables [18]. In VAR, the output for a variable is built as a linear combination of its own past values and the past values of the other variables, whereas in models such as ARIMA the output depends only on past values of the variable being predicted. A typical autoregression of order $p$ can be formulated as

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon \tag{5}$$

where $\alpha$ is a constant denoting the intercept and $\beta_1, \beta_2, \ldots, \beta_p$ are the lag coefficients. To understand the VAR equations, assume there are two time series $Y_1$ and $Y_2$ to be forecast at time $t$. To calculate the predicted values, VAR considers the past data of all related variables, so the equations for the values predicted at time $t$ with order $p$ become:

$$Y_{1,t} = \alpha_1 + \beta_{11,1} Y_{1,t-1} + \beta_{12,1} Y_{2,t-1} + \cdots + \beta_{11,p} Y_{1,t-p} + \beta_{12,p} Y_{2,t-p} \tag{6}$$

$$Y_{2,t} = \alpha_2 + \beta_{21,1} Y_{1,t-1} + \beta_{22,1} Y_{2,t-1} + \cdots + \beta_{21,p} Y_{1,t-p} + \beta_{22,p} Y_{2,t-p} \tag{7}$$
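To make Eqs. (6) and (7) concrete, the sketch below evaluates a one-step VAR(p) prediction for any number of series given already-estimated coefficients. It is an illustration only: the function name `var_predict`, the coefficient layout and all numeric values are hypothetical and are not taken from the paper.

```python
def var_predict(intercepts, lag_coeffs, history):
    """One-step VAR(p) prediction following Eqs. (6) and (7).

    intercepts : list of k constants (alpha_1 ... alpha_k)
    lag_coeffs : list of p matrices (k x k); lag_coeffs[l][i][j] multiplies
                 the value of series j at lag l + 1 in the equation for series i
    history    : past observations, oldest first, each a list of k values
    """
    k = len(intercepts)
    p = len(lag_coeffs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    forecast = list(intercepts)
    for lag in range(1, p + 1):
        past = history[-lag]          # observation at time t - lag
        coeff = lag_coeffs[lag - 1]   # coefficient matrix for this lag
        for i in range(k):
            forecast[i] += sum(coeff[i][j] * past[j] for j in range(k))
    return forecast


# Hypothetical coefficients for two series and order p = 2
alphas = [0.1, -0.2]
betas = [
    [[0.5, 0.1], [0.2, 0.3]],  # lag 1
    [[0.2, 0.0], [0.0, 0.1]],  # lag 2
]
history = [[1.0, 2.0], [3.0, 4.0]]  # y_{t-2}, then y_{t-1}
print(var_predict(alphas, betas, history))
```

Each predicted value uses the lags of both series, which is the difference from a univariate autoregression such as Eq. (5).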
To apply the VAR model, the time series need to be stationary. If a series is stationary, we can predict directly with the VAR model; otherwise we need to difference the data to make it stationary. Stationarity can be checked with the Augmented Dickey-Fuller test (ADF test), which is a unit root test. A unit root is the property that makes a time series non-stationary, and the number of unit roots determines how many differencing operations are needed to make the series stationary. Consider the following equation:

$$Y_t = \alpha + \beta t + \gamma Y_{t-1} + \delta_1 \Delta Y_{t-1} + \delta_2 \Delta Y_{t-2} + \cdots + \delta_p \Delta Y_{t-p} + \varepsilon \tag{8}$$
For the Augmented Dickey-Fuller test, the null hypothesis is that a unit root is present, that is, $\gamma = 1$ in Eq. (8). If the null hypothesis cannot be rejected, the series is treated as non-stationary; otherwise it is stationary. The p-value of the test should therefore be less than the significance level of 0.05 to reject the null hypothesis and conclude that the series is stationary. Once the series has been made stationary by differencing and this has been verified with the ADF test, we need to find the right order for the vector auto regression. For that purpose, we iterate over different order values, fit the model for each and pick the order that gives the lowest AIC. AIC stands for Akaike information criterion, a score-based method for selecting a model. Let $m$ be the number of parameters estimated for the model and $L$ the maximum likelihood. Then the AIC value is:

$$\mathrm{AIC} = 2m - 2\ln(L) \tag{9}$$
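The lag search scored by Eq. (9) can be sketched as follows. This is a minimal illustration that assumes NumPy is available, fits each candidate VAR(p) by ordinary least squares and evaluates a Gaussian log-likelihood; these modeling choices, the function names and the simulated data are assumptions made for the example, not details confirmed by the paper.

```python
import numpy as np


def fit_var_aic(data, p, max_lag):
    """Fit a VAR(p) by least squares and return its AIC as in Eq. (9).

    data is a (T, k) array. The first max_lag rows serve only as lags,
    so every candidate order is scored on the same target rows.
    """
    T, k = data.shape
    targets = data[max_lag:]
    # Design matrix: intercept column followed by lags 1..p of every series
    lagged = [data[max_lag - lag:T - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((T - max_lag, 1))] + lagged)
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    resid = targets - X @ coef
    n = targets.shape[0]
    sigma = resid.T @ resid / n                  # ML estimate of residual covariance
    _, logdet = np.linalg.slogdet(sigma)
    log_lik = -0.5 * n * (k * np.log(2 * np.pi) + logdet + k)  # Gaussian assumption
    m = coef.size                                # number of estimated coefficients
    return 2 * m - 2 * log_lik


def select_order(data, max_lag):
    """Return the order with the lowest AIC and the score of every candidate."""
    scores = {p: fit_var_aic(data, p, max_lag) for p in range(1, max_lag + 1)}
    return min(scores, key=scores.get), scores


# Simulated two-variable VAR(2) series, used only to exercise the functions
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[-0.4, 0.0], [0.0, -0.3]])
y = np.zeros((400, 2))
for t in range(2, 400):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.normal(size=2)

best, scores = select_order(y, max_lag=4)
print(best, scores)
```

Scoring every order on the same rows keeps the likelihoods comparable; library implementations may count parameters or normalize the score differently, so absolute AIC values from this sketch should not be compared with values reported elsewhere.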
We select the model with the lowest AIC value. AIC rewards goodness of fit, but it includes a penalty that grows with the number of estimated parameters. Once all the tests are done and the requisite parameters obtained, forecasting can be performed on the data. The residual obtained by subtracting the forecast data from the original test data contains the nonlinear patterns and is used as input to the convolutional neural network. It is formulated as:

$$d_t - L_t = N_t + \varepsilon \tag{10}$$
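The residual in Eq. (10) and its relation to Eq. (4) can be illustrated with a few lines of Python. The sketch assumes that a linear forecast and a nonlinear forecast of the residual are already available, and that the final hybrid forecast is their sum as Eq. (4) suggests; the additive recombination, the function names and the numbers are assumptions for illustration rather than the authors' exact procedure.

```python
def residuals(observed, linear_forecast):
    """Left-hand side of Eq. (10): observed value minus the linear (VAR) part."""
    return [d - l for d, l in zip(observed, linear_forecast)]


def hybrid_forecast(linear_forecast, nonlinear_forecast):
    """Recombine the two parts following Eq. (4), ignoring the error term."""
    return [l + n for l, n in zip(linear_forecast, nonlinear_forecast)]


# Hypothetical values
observed = [3.0, 5.0]
var_part = [2.5, 5.5]
res = residuals(observed, var_part)        # what the nonlinear model is asked to learn
print(res)
print(hybrid_forecast(var_part, res))      # a perfect residual model recovers the data
```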
4.2 CNN-LSTM Model

Neural networks perform well on nonlinear data, mostly owing to their large number of versatile parameters. Moreover, because the layers use nonlinear activation functions, they can easily adapt to nonlinear trends, so they can model the residuals received from VAR very effectively. The CNN-LSTM model extracts temporal and spatial features for forecasting time series data. It consists of a convolutional layer with a max pooling layer on top of an LSTM. The CNN has an input layer that accepts various correlated variables as input and an output layer that sends the derived features to the LSTM. The hidden layers include convolution layers, ReLU activation layers and pooling layers. A convolutional layer reads the multivariate input time series, applies the convolution operation with filters and sends the results to the next layer, reducing the number of parameters and making the network deeper. If $x_i^0 = \{x_1, x_2, \ldots, x_n\}$ is the input vector, the output $y_{ij}^1$ of the first convolutional layer is:

$$y_{ij}^{1} = \sigma\left(b_j^{1} + \sum_{m=1}^{M} w_{m,j}^{1}\, x_{i+m-1,j}^{0}\right) \tag{11}$$

Here $y_{ij}^1$ is calculated from the input $x_{ij}^0$ of the previous layer, $b_j^1$ is the bias for the $j$th feature map, $w$ denotes the kernel weights and $\sigma$ denotes a rectified linear unit (ReLU)-like activation function. Similarly, the output of the $l$th convolutional layer is formulated as:

$$y_{ij}^{l} = \sigma\left(b_j^{l} + \sum_{m=1}^{M} w_{m,j}^{l}\, x_{i+m-1,j}^{l-1}\right) \tag{12}$$

where $x^{l-1}$ denotes the input received from layer $l-1$.
The convolutional layer is followed by a pooling layer, which reduces the spatial size of the convolutional output and thereby the number of parameters and the computing cost. The max pooling operation is formulated as:

$$P_{ij}^{l} = \max_{z \in Z} y_{i \times k + z,\, j}^{l-1} \tag{13}$$

where $z$ ranges over the pooling window $Z$ and $k$ acts as the stride.
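A single-channel version of the convolution and pooling operations can be written in plain Python to show what these layers compute. The sketch below uses one kernel, one input series and ReLU as the activation, so the feature-map index $j$ is dropped; the helper names and the input values are illustrative.

```python
def relu(value):
    """Rectified linear unit."""
    return max(0.0, value)


def conv1d(x, kernel, bias):
    """Valid 1-D convolution followed by ReLU, one feature map of Eq. (11).

    Output position i sums kernel[m] * x[i + m] over the kernel width,
    which is the zero-based form of the index i + m - 1 in the equation.
    """
    width = len(kernel)
    return [
        relu(bias + sum(kernel[m] * x[i + m] for m in range(width)))
        for i in range(len(x) - width + 1)
    ]


def max_pool1d(y, size, stride):
    """Max pooling: keep the largest value in each window of the feature map."""
    return [max(y[start:start + size]) for start in range(0, len(y) - size + 1, stride)]


x = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 2.0]
feature_map = conv1d(x, kernel=[1.0, -1.0], bias=0.0)
pooled = max_pool1d(feature_map, size=2, stride=2)
print(feature_map)  # [0.0, 1.0, 0.0, 1.0, 4.0, 0.0]
print(pooled)       # [1.0, 1.0, 4.0]
```

Pooling halves the length of the feature map here, which is the reduction in size and computing cost that the text describes.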
After the convolution operation, the LSTM, which forms the lower layer of the CNN-LSTM network, stores temporal information from the features extracted by the convolution layers. It is well suited to forecasting because it reduces the vanishing and exploding gradient problems that recurrent neural networks generally face. Gates that control the flow of information along the memory line allow the LSTM to remember early trends in the data. An LSTM consists of cells that capture and store the data streams, and the gates in each cell make it possible to filter, add or discard data. The gates are based on a sigmoid layer, which lets the LSTM cells selectively pass data on or discard it. There are three main types of gates:
• Forget Gate: This gate filters out the information that the cell state should discard. It is formulated as

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{14}$$

• Input Gate: This gate decides what new information should be put in the cell. It consists of a sigmoid layer that decides which values need to be updated and a tanh layer that creates a vector of new candidate values, $\tilde{C}_t$, to be added to the state. The two are combined to define the update.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{15}$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{16}$$

The cell state is updated by first forgetting what was earlier decided to be forgotten from the previous state and then adding $i_t * \tilde{C}_t$. It is formulated as:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{17}$$

• Output Gate: This gate decides the output of each cell. A sigmoid layer is run on the input data and the hidden state to decide what to output. The cell state $C_t$ is then passed through a tanh layer and multiplied by the output gate, so that only the parts we decided to output are emitted.

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t * \tanh(C_t) \tag{18}$$
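The gate equations (14) to (18) translate almost line by line into code. The following sketch assumes NumPy and implements a single time step of one LSTM cell; the dictionary layout of the weights, the layer sizes and the random initial values are choices made for this illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W and b are dicts keyed by 'f', 'i', 'c', 'o'. Each W[key] has shape
    (hidden, hidden + inputs) and each b[key] has shape (hidden,).
    """
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])       # forget gate, Eq. (14)
    i_t = sigmoid(W["i"] @ concat + b["i"])       # input gate, Eq. (15)
    c_tilde = np.tanh(W["c"] @ concat + b["c"])   # candidate values, Eq. (16)
    c_t = f_t * c_prev + i_t * c_tilde            # cell state update, Eq. (17)
    o_t = sigmoid(W["o"] @ concat + b["o"])       # output gate, Eq. (18)
    h_t = o_t * np.tanh(c_t)                      # hidden state, Eq. (18)
    return h_t, c_t


# Illustrative sizes: 7 input variables, 3 hidden units
hidden, inputs = 3, 7
rng = np.random.default_rng(0)
W = {key: rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for key in "fico"}
b = {key: np.zeros(hidden) for key in "fico"}
h_t, c_t = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, b)
print(h_t.shape, c_t.shape)
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate vector to zero, so the cell simply halves its previous state; this makes a convenient sanity check for an implementation.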
The last unit of the CNN-LSTM is a dense layer (also known as a fully connected layer), which generates the final output. Since we forecast one hour ahead, the dense layer has a single neuron.
5 Experimentation and Results

The experiments were carried out on the Household Electricity Consumption dataset, which is described in detail below.
5.1 Household Power Consumption Dataset

It is a multivariate time series dataset of household energy consumption over a span of four years (2006–2010), sampled once per minute and provided by the UCI machine learning repository [19]. It consists of seven time series, namely:

1. global active power: Total active power consumption by household (measured in kilowatt).
2. global reactive power: Total reactive power consumption by household (in kilowatt).
3. voltage: Average voltage of household (in volts).
4. global intensity: Average intensity of current (measured in amperes).
5. sub metering 1: Active energy utilized for kitchen (watt-hours).
6. sub metering 2: Active energy utilized for laundry (watt-hours).
7. sub metering 3: Active energy utilized for climate control systems (watt-hours).
5.2 Preliminary Analysis

A preliminary analysis of the data is carried out and its patterns are evaluated to support correct predictions. Figure 1 shows that the time series follow a seasonal pattern but have irregular trend components. Figure 2 shows a positive correlation between global active power (GAP) and global intensity (GI); global intensity therefore has a significant impact on forecasting the GAP value. Figure 3 shows that global active power and voltage do not have a strong correlation.
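Correlations of the kind reported in this analysis can be quantified with a correlation coefficient. The sketch below computes the Pearson coefficient in plain Python; the paper does not state which coefficient was used, so Pearson is an assumption here, and the sample values are made up for illustration rather than taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up readings: intensity rises in step with active power
gap = [1.0, 2.0, 3.0, 4.0]
gi = [4.2, 8.4, 12.6, 16.8]
print(pearson(gap, gi))
```

A value near 1 indicates a strong positive linear relationship, a value near 0 indicates a weak one, and a value near −1 indicates a strong negative one.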
5.3 Performance Comparison of Models

The best-fitting model depends on the availability of historical data and on the relationship between the variables to be forecast. To verify the effectiveness of the proposed model, experiments were also conducted with other neural network models, including MLP, LSTM and CNN-LSTM, and the results were evaluated with MSE and RMSE. Next, we go through the architecture of each of these models and compare the results (Figs. 4 and 5):
Fig. 1 Time series patterns in household power consumption data
Fig. 2 Correlation between GAP and GI
Fig. 3 Correlation between GAP and voltage
Fig. 4 Correlation between variables without resampling over hour
Fig. 5 Correlation between variables with resampling over hour
5.4 MultiLayer Perceptron Model

The architecture of the multilayer perceptron depends on the parameter settings and the number of hidden layers in the network. A multilayer perceptron consists of an input layer of input neurons, hidden layers and an output layer. The hidden layers are dense layers. Parameters such as the number of neurons in the hidden layers, the learning algorithm and the loss function can be optimized based on the input data. Here, the input data is resampled to hourly sampling. The input consists of a sliding window of 24 data points, from which we predict the next hour. Each input is therefore of size 24 × 7, where 24 is the number of time steps and 7 is the number of variables in each time step.

Architecture of MLP Model and Forecasting Result As shown in Fig. 6, two hidden layers with 100 neurons each are used to extract patterns from the data. The model is trained for up to 50 epochs with early stopping at a patience of eight: if the validation loss stays similar over eight consecutive epochs, training stops and the best weights are stored as output. ReLU activation is used in the hidden layers, and the weights are optimized with the Adam optimizer. Table 1 shows the model's performance with respect to the error metrics MAE, MSE and RMSE, and Fig. 7 shows the predicted versus actual values for the MLP model.

Fig. 6 Model architecture of multilayer perceptron (household electricity data)

Table 1 Prediction performance with multilayer perceptron model

| Mean Absolute Error (MAE) | Mean Square Error (MSE) | Root Mean Square Error (RMSE) |
|---|---|---|
| 0.395 | 0.303 | 0.551 |

Fig. 7 Prediction versus actual result of multilayer perceptron (household electricity data)
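The input preparation described in this section, hourly resampling followed by 24-step sliding windows, can be sketched in plain Python. The sketch assumes that per-minute rows are aggregated to hourly rows by averaging and that the prediction target is the first column (global active power); neither choice is stated explicitly in the paper, and the function names and synthetic data are illustrative.

```python
def resample_hourly(minute_rows):
    """Average consecutive blocks of 60 per-minute rows into hourly rows."""
    hourly = []
    for start in range(0, len(minute_rows) - 59, 60):
        block = minute_rows[start:start + 60]
        hourly.append([sum(column) / 60 for column in zip(*block)])
    return hourly


def make_windows(hourly_rows, window=24, target_col=0):
    """Pair each window of `window` hourly rows with the next hour's target value."""
    inputs, targets = [], []
    for end in range(window, len(hourly_rows)):
        inputs.append(hourly_rows[end - window:end])   # shape: window x variables
        targets.append(hourly_rows[end][target_col])   # value one hour ahead
    return inputs, targets


# Synthetic data: 48 hours of per-minute rows with 7 variables each,
# where every value equals the index of the hour it belongs to
minute_rows = [[float(minute // 60)] * 7 for minute in range(48 * 60)]
hourly = resample_hourly(minute_rows)
inputs, targets = make_windows(hourly)
print(len(hourly), len(inputs), len(inputs[0]), len(inputs[0][0]), targets[0])
```

Each sample produced this way has the 24 × 7 shape described above, and the same windows can be fed to the MLP (after flattening), the LSTM and the CNN-LSTM.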
5.5 Long Short-Term Memory (LSTM)

The architecture of the LSTM depends on the types of layers and the parameter settings of the layers in the network. It consists of an LSTM layer, a dropout layer (to prevent overfitting) and a dense layer to predict the output. After a preliminary analysis of the data, parameters such as the number of layers, the neurons in each layer, the loss function and the optimization algorithm are adjusted to give the best possible outcome. The input data consists of a sliding window of 24 data points (resampled to hourly), so the input to the LSTM is of size 24 × 7. A total of seven variables are used to make the prediction.

Architecture of Proposed LSTM Model and Forecasting Result As shown in Fig. 8, an LSTM layer with 100 neurons is used to extract patterns from the data. The model is trained for up to 100 epochs with early stopping at a patience of eight: if the validation loss stays similar over eight consecutive epochs, training stops and the best weights are stored as output. The weights are optimized with the Adam optimizer at a learning rate of 0.0001 and a batch size of 256. Table 2 shows the model's performance with respect to the error metrics MAE, MSE and RMSE, and Fig. 9 shows the predicted versus actual values for the LSTM model.
Fig. 8 Model architecture for LSTM (household electricity data)

Table 2 Prediction performance with LSTM model

| Mean absolute error | Mean square error | Root mean square error |
|---|---|---|
| 0.382 | 0.262 | 0.512 |
Fig. 9 Prediction versus actual result of LSTM model (household electricity data)
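The early-stopping rule used when training the MLP and LSTM models can be sketched independently of any deep learning framework. The function below takes a sequence of per-epoch validation losses and reports when training would stop and which epoch held the best weights; the function name, the `min_delta` parameter and the loss values are hypothetical.

```python
def early_stopping_epochs(val_losses, patience=8, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve on the best value
    by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical loss curve: improves for three epochs, then plateaus
losses = [1.0, 0.8, 0.7] + [0.71] * 20
print(early_stopping_epochs(losses))  # (10, 2)
```

In this example training halts at epoch 10 and the weights from epoch 2 are kept, so the remaining epochs of the budget are never run.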
5.6 CNN-LSTM Model

The architecture of the CNN-LSTM can be varied according to the number of layers, the type of layers and the parameter settings in each layer. It consists of convolution layers, pooling layers, a flatten layer, LSTM layers and a dense layer to predict the corresponding output. For the convolution, the number of filters, the filter size and the strides need to be adjusted. Tuning these parameters to an optimal level can significantly improve accuracy, and the data should be analyzed properly to tune them well. As already noted, the CNN layers in a CNN-LSTM take multiple variables and extract features between them, which significantly improves time series forecasting. As the correlation matrix in Fig. 10 shows, several variables of the time series are highly correlated with the variable we want to predict, i.e., global active power (GAP). The input data consists of a sliding window of 24 data points (resampled to hourly), so the input to the CNN-LSTM is of size 24 × 7. A total of seven variables are used to make the prediction (Fig. 11).

Architecture of CNN-LSTM Model and Forecasting Result Figure 12 shows the predicted versus actual values for the CNN-LSTM model, and Table 3 reports its prediction performance.
Fig. 10 Correlation matrix (household electricity data)
Fig. 11 CNN-LSTM model summary (household electricity data)
Fig. 12 Prediction versus actual result of CNN-LSTM model (household electricity data)

Table 3 Prediction performance with CNN-LSTM model

| Mean absolute error | Mean square error | Root mean square error |
|---|---|---|
| 0.320 | 0.221 | 0.470 |

5.7 Proposed VAR-CNN-LSTM

In this model architecture, we first estimate the VAR on the training data and then use what the VAR has learned to refine the training of the CNN-LSTM, which gives better results. To create the VAR model properly, the data should be stationary. As already discussed, the Augmented Dickey-Fuller test can verify whether a time series is stationary. The ADF results for the given time series (Figs. 13, 14, 15 and 16) show that all the series are stationary, so differencing is not needed. After these preliminary checks, we need to find the lag order, which can be determined using the AIC (Akaike information criterion): we iterate through lag orders and pick the one with the minimum AIC score. In this case, 31 comes out as the best lag order, as Table 4 shows. With the best order in hand, we fit the VAR model on the differenced data. VAR learns the linear interdependencies in the time series. This information is subtracted from the raw data to obtain the residuals, which contain the nonlinear patterns. Figure 17 shows the residuals left after applying VAR.

Fig. 13 Results of ADF on global active power
Fig. 14 Results of ADF on global reactive power
Fig. 15 Results of ADF on voltage
Fig. 16 Results of ADF on global intensity
Fig. 17 Residuals left after applying VAR (household electricity data)

Table 4 Akaike information criterion

| Lag order | AIC | BIC |
|---|---|---|
| 29 | −5.3868 | **−5.0376** |
| 30 | −5.3882 | −5.0271 |
| 31 | **−5.3893** | −5.0161 |
| 32 | −5.3892 | −5.0040 |

Best value is in bold
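The selection in Table 4 amounts to taking the lag order with the smallest score. The short sketch below reproduces that step from the four rows listed in the table; the variable names are illustrative.

```python
# (AIC, BIC) for each lag order, copied from Table 4
table4 = {
    29: (-5.3868, -5.0376),
    30: (-5.3882, -5.0271),
    31: (-5.3893, -5.0161),
    32: (-5.3892, -5.0040),
}

best_by_aic = min(table4, key=lambda lag: table4[lag][0])
best_by_bic = min(table4, key=lambda lag: table4[lag][1])
print(best_by_aic, best_by_bic)  # 31 29
```

Among the listed rows, AIC is lowest at lag 31, the order used in the paper, while BIC is lowest at lag 29, which reflects the heavier penalty BIC places on additional parameters.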
Architecture of VAR-CNN-LSTM Proposed Model The architecture of the VAR model is shown below. As can be seen, the important parameters of the model, such as the number of equations, AIC and BIC, are visible. After the forecasting results are obtained from VAR, the CNN-LSTM is trained on those forecasts along with the original data so that it learns all the intricacies of the data. The architecture of the VAR-CNN-LSTM model, shown in Fig. 18, can be varied according to the number of layers, the type of layers and the parameter settings in each layer. It consists of convolution layers, pooling layers, a flatten layer, LSTM layers and a dense layer to predict the corresponding output. For the convolution, the number of filters, the filter size and the strides need to be adjusted. Tuning these parameters to an optimal level can significantly improve accuracy, and the data should be analyzed properly to tune them well. The input provided to the model consists of a sliding window of 24 data points (resampled to hourly), so the input to the CNN-LSTM is of size 24 × 7.

Results Figure 19 shows the predicted versus actual values for the VAR-CNN-LSTM model, and Table 5 reports its prediction performance.
Fig. 18 VAR CNN-LSTM model summary (household electricity data)
Fig. 19 Prediction versus actual result of VAR CNN-LSTM model (household electricity data)

Table 5 Prediction performance with VAR CNN-LSTM model

| Mean absolute error | Mean square error | Root mean square error |
|---|---|---|
| 0.317 | 0.210 | 0.458 |
Table 6 Comparative analysis of proposed model with existing methods

| Model | Mean absolute error | Mean squared error | Root mean squared error |
|---|---|---|---|
| MLP | 0.395 | 0.303 | 0.551 |
| LSTM | 0.382 | 0.262 | 0.512 |
| CNN-LSTM | 0.320 | 0.221 | 0.470 |
| VAR CNN-LSTM | 0.317 | 0.210 | 0.458 |
5.8 Combined results

The combined results of all the algorithms are displayed in Table 6. Both CNN-LSTM and the proposed approach perform well on the given data, but the proposed model performs slightly better on every error metric.
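The size of the improvement can be read off Table 6 with a short calculation. The sketch below computes the relative reduction in each metric when moving from CNN-LSTM to the proposed model and checks that each reported RMSE agrees with the square root of the reported MSE; the dictionary layout and function name are illustrative, while the numbers are those of Table 6.

```python
import math

# Values copied from Table 6
results = {
    "MLP": {"MAE": 0.395, "MSE": 0.303, "RMSE": 0.551},
    "LSTM": {"MAE": 0.382, "MSE": 0.262, "RMSE": 0.512},
    "CNN-LSTM": {"MAE": 0.320, "MSE": 0.221, "RMSE": 0.470},
    "VAR CNN-LSTM": {"MAE": 0.317, "MSE": 0.210, "RMSE": 0.458},
}


def relative_improvement(baseline, proposed):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


improvement = {
    metric: relative_improvement(results["CNN-LSTM"][metric], results["VAR CNN-LSTM"][metric])
    for metric in ("MAE", "MSE", "RMSE")
}
print(improvement)

# Consistency check: each reported RMSE should match sqrt(MSE) up to rounding
for model, row in results.items():
    print(model, round(math.sqrt(row["MSE"]), 3), row["RMSE"])
```

Relative to CNN-LSTM, the proposed model lowers MAE by about 0.9%, MSE by about 5.0% and RMSE by about 2.6% on these figures.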
6 Conclusion

A VAR-CNN-LSTM model is proposed in this paper for efficient power load forecasting. A time series consists of linear and nonlinear components; the linear components are handled by vector auto regression (VAR) and the residuals containing the nonlinear components by CNN-LSTM. The output is further improved by data preprocessing and analysis. Data preprocessing solves the problem of missing values and normalizes the data to bring the values of the dataset to a common scale [20]. Data analysis reveals the correlation between variables. For example, in the household power consumption data, global active power was found to be correlated with all the variables in the time series, so all variables are used for forecasting, whereas in the Ontario Demand dataset only two variables are correlated, so all others are filtered out. The proposed method is modeled on the Household Power Consumption dataset for short-term forecasting. The evaluation metrics show that the hybrid VAR-CNN-LSTM model performed better than other state-of-the-art deep learning techniques such as CNN-LSTM, LSTM and multilayer perceptron on all evaluation metrics. The model also has some limitations. One is that all the hyperparameters, such as the number of neurons, learning rate, number of epochs and batch size, need to be determined by trial and error, which requires considerable effort and time. Another is that the model has only been tested for short-term forecasting; as future work, it can be extended to medium- and long-term forecasting.
References
1. Perron J, Ibrahim H, Ilinca A (2008) Energy storage systems-characteristics and comparisons. Renew Sustain Energy Rev 12(5):1221–1250
2. Stoll HG (1989) Least-cost electric utility planning. Wiley, New York
3. Timeseries (2013)
4. Cohen I. Time series-introduction. Accessed on 13 Apr 2020
5. Choi H-K (2018) Stock price correlation coefficient prediction with ARIMA-LSTM hybrid model. CoRR abs/1808.01560. Accessed on 13 Apr 2020
6. Cho S-B, Kim T-Y (2018) Predicting residential energy consumption using CNN-LSTM neural networks, vol 182. Elsevier, pp 72–81
7. Sridevi S, Mahalakshmi G, Rajaram S (2016) A survey on forecasting of time series data. In: 2016 International conference on computing technologies and intelligent data engineering, Kovilpatti
8. Gasparin A, Lukovic S, Alippi C (2019) Deep learning for time series forecasting: the electric load case. CoRR abs/1907.09207. Accessed on 13 Apr 2020
9. Siami-Namini S, Tavakoli N, Siami Namin A (2018) A comparison of ARIMA and LSTM in forecasting time series. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), pp 1394–1401
10. Wang J, Yu L-C, Robert Lai K, Zhang X (2016) Dimensional sentiment analysis using a regional CNN-LSTM model. In: Proceedings of the 54th annual meeting of the association for computational linguistics (vol 2: short papers), Berlin, Germany, August 2016. Association for Computational Linguistics, pp 225–230
11. Sherstinsky A (2018) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Wiley, New York
12. Hartmann C, Hahmann M, Habich D, Lehner W (2017) CSAR: the cross-sectional autoregression model. In: 2017 IEEE international conference on data science and advanced analytics (DSAA), pp 232–241
13. Choi JY, Lee B (2018) Combining LSTM network ensemble via adaptive weighting for improved time series forecasting. Math Probl Eng 1–8
14. Chniti G, Bakir H, Zaher H (2017) E-commerce time series forecasting using LSTM neural network and support vector regression. In: Proceedings of the international conference on big data and internet of thing (BDIOT2017)
15. Yan K, Wang X, Du Y, Jin N, Huang H, Zhou H (2018) Multi-step short-term power consumption forecasting with a hybrid deep learning strategy. In: MDPI J Energies
16. Narendra Babu C, Eswara Reddy B (2014) A moving-average filter based hybrid ARIMA–ANN model for forecasting time series data. Appl Soft Comput 23:27–38
17. Chatfield C (1996) The analysis of time series: an introduction. Chapman and Hall
18. Prabhakaran S (2020) Vector autoregression (VAR): comprehensive guide with examples in Python
19. Individual household electric power consumption data set (2012)
20. Jaitley U (2019) Why data normalization is necessary for machine learning models
Compressive Spectrum Sensing for Wideband Signals Using Improved Matching Pursuit Algorithms R. Anupama, S. Y. Kulkarni, and S. N. Prasad
Abstract Upcoming wireless communication technologies increase the demand for spectrum, and dynamic spectrum allocation is a promising solution to this demand. Spectrum sensing plays a key role in dynamic spectrum allocation: it scans a wideband for spectrum holes, which requires a very high sampling rate. Compressive sensing (CS) addresses this problem by using sub-Nyquist samples. This paper shows how the orthogonal matching pursuit (OMP), compressive sampling matching pursuit (CoSaMP) and stage wise orthogonal matching pursuit (StOMP) signal reconstruction algorithms can be used for the detection of spectrum holes. Simulation results reveal that the probability of detection of these algorithms is considerably higher for the detection problem than for the estimation problem.
Keywords Cognitive radio · Orthogonal matching pursuit · Spectrum sensing · Compressive sensing · Wide band signal
R. Anupama (B) · S. N. Prasad: School of Electronics and Communication Engineering, REVA University, Bengaluru, Karnataka, India
S. Y. Kulkarni: BNM Institute of Technology, Bengaluru, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_20

1 Introduction
Dynamic spectrum allocation, in which an unlicensed secondary user (SU) transmits in the spectrum while the licensed primary user (PU) is absent, has emerged as a promising solution to the increasing demand for spectrum. Like every other natural resource, spectrum is limited, yet reports of the Federal Communications Commission (FCC) show that the licensed spectrum is underutilized [1]. The allocation
of the licensed spectrum to the unlicensed user (SU) in the absence of the licensed user (PU) is the idea behind cognitive radio (CR). Spectrum sensing plays a significant role in CR. It is a method of scanning the spectrum to identify spectrum holes, which are the frequency bands not in use by the PU. Identifying whether the spectrum is in use by the PU is therefore the primary objective of spectrum sensing. Thus, spectrum sensing is envisaged as a binary hypothesis detection problem that does not entail reconstructing the original bandpass signal.
There are various algorithms to perform spectrum sensing in CR. Spectrum sensing for wideband signals is more challenging than narrowband spectrum sensing, and the traditional algorithms designed for narrowband sensing cannot be applied directly to wideband sensing. Narrowband spectrum sensing can be studied as a binary hypothesis test in which the sensing decision is taken for the entire spectrum as a single channel [16]. Methods designed for scanning a narrowband spectrum will therefore fail to identify the individual spectral holes in the sub-bands of a wideband spectrum. A straightforward solution is to divide the wideband spectrum into many narrow bands and then apply narrowband sensing methods, but this suffers from long processing delays due to high computational complexity. Scanning a wideband spectrum for spectrum holes also poses other challenges, such as an unacceptably high sampling frequency.
Compressive sensing (CS) is a potential tool for wideband spectrum sensing. Instead of taking samples according to the Nyquist sampling theorem, CS samples at a sub-Nyquist rate and can be applied to any signal that is sparse in some domain [4]. The wideband signal is block sparse in the frequency domain, as shown in Fig. 1; therefore CS can be employed for spectrum sensing of wideband signals, which reduces the computational complexity. Another premise for the application of CS is that the sensing matrix should satisfy the restricted isometry property (RIP). The spectrum is allocated to various applications but is not always in use, which gives rise to sparsity of the signal in the Fourier domain, as the signals occupy only some frequencies. This makes it possible to employ CS for wideband spectrum sensing. CS was used for wideband spectrum sensing for the first time in [1], where CS was used to reconstruct the signal and a wavelet-based edge detection method was then used to sense the spectrum holes. In this paper the spectrum sensing problem is treated as a detection problem, where exact reconstruction of the signal is not required. The algorithms are simulated to check their capability of detecting spectrum holes. We have applied
Fig. 1 Wideband spectrum model [14]
three reconstruction algorithms, orthogonal matching pursuit (OMP), compressive sampling matching pursuit (CoSaMP) and stage wise orthogonal matching pursuit (StOMP). The detection level of these algorithms has been compared with respect to different parameters of the signal model. The efficiency of algorithms for reconstruction problem and detection problem is also presented.
2 Compressive Sensing
Compressive sensing is applicable provided the signal is sparse in some domain. The wideband signal is sparse in the Fourier domain, as seen in Fig. 1. In CS, the measurement samples can be considered as a linear combination of the columns of the sensing matrix. Consider a wideband N-dimensional signal g(t), which can be represented mathematically as a weighted sum of the basis vectors $\psi_i(t)$ of a matrix $\Psi$ [12]:

$$g(t) = \sum_{i=1}^{N} s_i \psi_i(t) \qquad (1)$$

In matrix form,

$$g = \Psi s \qquad (2)$$
Sparsity means that only a few of the coefficients $s_i$ are non-zero. The signal is K-sparse if $s$ has K non-zero elements, where K is less than N. The signal g(t) can be faithfully reconstructed using CS from fewer than N samples. Let the compressed measurement signal be y, which is an M × 1 vector. Knowledge of y and the degree of sparsity K suffices for the reconstruction of g(t) [6]. The measurement vector y is obtained by projecting g onto an M × N matrix $\Phi$. The model can be represented as

$$[y]_{M \times 1} = [\Phi]_{M \times N}\,[g]_{N \times 1} \qquad (3)$$

Therefore,

$$y = \Phi g \qquad (4)$$

With the signal model stated above, where K < M < N, Eq. (4) is an underdetermined system. The goal of the sensing algorithms is to estimate the signal g from y [7]. The intention is to reduce the computations and achieve reconstruction with minimum error. Each column of $\Phi$ is referred to as an 'atom'. Since the signal is K-sparse, y is a linear combination of K atoms.
Consider the signal $g = [g_1, g_2, \ldots, g_N]^T$, which is K-sparse. Let the support set J represent the indices of the non-zero elements of g, that is, $J = \{i : g_i \neq 0\}$. If the indices of the atoms contributing to the non-zero values in g can be extracted from y, then g can be reconstructed [4]. The sensing matrix $\Phi$ can be constructed using independent and identically distributed random numbers. For the simulation, the elements of $\Phi$ are drawn independently from normally distributed pseudorandom numbers. With this idea of CS, let us discuss the algorithms designed for the reconstruction of $[g]_{N \times 1}$ from $[y]_{M \times 1}$, where M < N.
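The measurement model of Eq. (4) can be illustrated with a short NumPy sketch. It builds a K-sparse vector, draws a Gaussian sensing matrix and forms the compressed measurements. The function names, the random seed, the Gaussian amplitudes of the non-zero entries and the 1/sqrt(M) column scaling are assumptions made for illustration; the paper only states that the matrix entries are normally distributed.

```python
import numpy as np

def make_sparse_signal(n, k, rng):
    """Return a length-n vector with k non-zero entries and its sorted support."""
    g = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    g[support] = rng.standard_normal(k)  # assumed amplitudes
    return g, np.sort(support)

def compress(g, m, rng):
    """Draw an m x n Gaussian sensing matrix and return (phi, y = phi @ g)."""
    phi = rng.standard_normal((m, g.size)) / np.sqrt(m)  # assumed scaling
    return phi, phi @ g

rng = np.random.default_rng(0)
g, support = make_sparse_signal(n=90, k=10, rng=rng)
phi, y = compress(g, m=50, rng=rng)

# y depends only on the K atoms (columns of phi) indexed by the support.
y_from_atoms = phi[:, support] @ g[support]
print(np.allclose(y, y_from_atoms))
```

The sizes N = 90, M = 50 and K = 10 match the values used later in the simulation section.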
3 Sensing Algorithms
Wideband spectrum sensing involves two steps: obtaining the compressed measurements, and estimating the support set to detect the spectrum holes [9]. In this work the compressive measurements are obtained using analog information conversion (AIC), and the support set is acquired using three algorithms, namely OMP, StOMP and CoSaMP [5].
3.1 Orthogonal Matching Pursuit (OMP)
OMP is a stepwise forward selection algorithm and is easy to implement. The algorithm adds a single element to the support estimate in each iteration [3, 9, 13]. The element to be inserted into the support estimate is calculated by projecting the residue onto the sensing matrix. If the sparsity parameter K is known, the iteration stops after K cycles; otherwise the iteration continues until the residual signal is less than the preset threshold.
Algorithm Steps
Input: Sensing measurement matrix $\Phi$, measurement vector y and order of sparsity K.
Output: Estimated signal of g, denoted as $\hat{g}$.
1. Initialize: $\hat{g}_0 = 0$, residue $r_0 = y$, the count value i = 0 and the index set $V = \emptyset$
2. Set i = i + 1
3. Generate $u_i = \Phi^T r_{i-1}$ and select $J = \mathrm{supp}(u)$, such that $J \notin V$
4. Update $V_i = V_{i-1} \cup J$
5. Reconstruct $\hat{g} = \Phi_i^{+} y$
6. Update the residue $r_i = y - \Phi_i \hat{g}$
7. Termination condition: if $\|r_i\|_2 > \|r_{i-1}\|_2$ or i > K, go to step 8 with i = i − 1. Otherwise go to step 2.
8. Output $\hat{g}$
In the algorithm, $\Phi_i$ denotes the matrix formed by the columns of $\Phi$ indexed by $V_i$, $\Phi^{+}$ is called the pseudo inverse of the matrix $\Phi$, and supp() is the support of a vector, $\mathrm{supp}(u) = \{m : u_m \neq 0\}$.
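The OMP steps above can be sketched in NumPy as follows. This is a minimal illustration of the standard OMP iteration, in which one atom (the column most correlated with the residue) is added per iteration, and is not the authors' simulation code. The function name `omp`, the tolerance `tol` and the demo signal are assumptions.

```python
import numpy as np

def omp(phi, y, k, tol=1e-10):
    """Greedy OMP: return (g_hat, sorted support) after at most k iterations."""
    n = phi.shape[1]
    support = []
    residual = y.astype(float)
    g_hat = np.zeros(n)
    for _ in range(k):
        proxy = np.abs(phi.T @ residual)          # step 3: correlate with residue
        if support:
            proxy[support] = 0.0                  # never reselect an atom
        support.append(int(np.argmax(proxy)))     # step 4: grow the support
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)  # step 5
        g_hat = np.zeros(n)
        g_hat[support] = coef
        residual = y - phi @ g_hat                # step 6: update the residue
        if np.linalg.norm(residual) < tol:        # early stop on exact fit
            break
    return g_hat, sorted(support)

# Demo with the sizes used in the simulation section (N = 90, M = 50, K = 10).
rng = np.random.default_rng(0)
phi = rng.standard_normal((50, 90)) / np.sqrt(50)
g = np.zeros(90)
g[rng.choice(90, size=10, replace=False)] = 1.0
g_hat, support = omp(phi, phi @ g, k=10)
print("estimated occupied bands:", support)
```

For spectrum sensing only the returned support matters, since it marks the bands declared occupied; the amplitudes in `g_hat` are needed only for the reconstruction problem.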
3.2 Stage Wise Orthogonal Matching Pursuit (StOMP)
The stage wise orthogonal matching pursuit (StOMP) algorithm is a slight modification of the original OMP algorithm. OMP addresses one vector after another, whereas StOMP uses several vectors at a time [8, 11]. The inner product of the residue with the basis vectors of the measurement matrix $\Phi$ is computed, and every vector whose inner product crosses the preset threshold is selected. The approximation is done using the least squares method. The improvement in the results therefore depends on the selection of the threshold.
Algorithm Steps
Input: Sensing measurement matrix $\Phi$, measurement vector y and threshold z.
Output: Estimated signal of g, denoted as $\hat{g}$.
1. Set the initial value of the residue as $r_0 = y$, the count value i = 0 and the support set V = {}
2. Declare i = i + 1
3. Generate $u_i = \Phi^T r_{i-1}$ and save the indices of the values of u which are higher than the preset threshold z to J, $J = \{m : |u_m| > z\}$
4. Create $V_i = V_{i-1} \cup J$ and select the unique values of $V_i$
5. Reconstruct $\hat{g} = \Phi_i^{+} y$
6. Calculate the residue, $r_i = y - \Phi_i \hat{g}$
7. Termination condition: if $\|r_i\|_2 < z$ or $V_i = V_{i-1}$, go to step 8 with i = i − 1. Otherwise go to step 2.
8. Output $\hat{g}$
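A minimal NumPy sketch of the StOMP steps listed above is given below. It uses a fixed threshold z, as in the listing, and stops when the residue norm falls below z or when no new atom is selected. The function name `stomp` and the `max_stages` cap are assumptions added for illustration.

```python
import numpy as np

def stomp(phi, y, z, max_stages=10):
    """Stagewise OMP with fixed threshold z: return (g_hat, support array)."""
    n = phi.shape[1]
    support = np.array([], dtype=int)
    residual = y.astype(float)
    g_hat = np.zeros(n)
    for _ in range(max_stages):
        proxy = phi.T @ residual                      # step 3: correlations
        new = np.flatnonzero(np.abs(proxy) > z)       # all atoms above threshold
        merged = np.union1d(support, new)             # step 4: unique union
        if merged.size == support.size:               # V_i == V_{i-1}: stop
            break
        support = merged
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)  # step 5
        g_hat = np.zeros(n)
        g_hat[support] = coef
        residual = y - phi @ g_hat                    # step 6: new residue
        if np.linalg.norm(residual) < z:              # step 7: small residue
            break
    return g_hat, support
```

Because several atoms may enter the support in one stage, the number of stages is usually much smaller than K, but a threshold that is too low admits spurious atoms and one that is too high misses weak bands.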
3.3 Compressive Sampling Matching Pursuit (CoSaMP)
Compared with other greedy algorithms, CoSaMP can detect multiple atoms in one iteration [2, 10]. It can therefore converge more quickly than OMP-based algorithms. Another advantage of CoSaMP is its back-check step, which re-examines previously selected atoms and improves the reconstruction. In addition, CoSaMP avoids the problem of threshold selection.
The CoSaMP algorithm converges quickly and has good immunity to noise. Its disadvantage is that it requires knowledge of the sparsity degree of the signal. For this reason CoSaMP has limited practical applications.
Algorithm Steps
Input: Sensing measurement matrix $\Phi$, measurement vector y and order of sparsity K.
Output: Estimated signal of g, denoted as $\hat{g}$.
1. Initialize: $\hat{g}_0 = 0$, residue $r_0 = y$, the count value i = 0 and the index set V is empty
2. Set i = i + 1
3. Generate $u_i = \Phi^T r_{i-1}$ and select $J = \mathrm{supp}(u)$
4. Update $V_i = V_{i-1} \cup J$ and $\Phi_i = [\Phi_{i-1}\ \Phi_J]$
5. Reconstruct $\tilde{g} = \Phi_i^{+} y$
6. Retain the K maximum values of $\tilde{g}$ to obtain $\hat{g}$
7. Residue updating $r_i = y - \Phi_i \hat{g}$
8. Termination condition: if i < K, return to step 2. Otherwise go to step 9.
9. Output $\hat{g}$
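For comparison, the sketch below follows the standard CoSaMP formulation of Needell and Tropp [2], which selects the 2K largest correlations in each iteration, merges them with the current support, solves a least squares problem and prunes back to K entries. It may differ in detail from the variant listed above (for example in the termination rule). The function name `cosamp` and the parameters `max_iter` and `tol` are assumptions.

```python
import numpy as np

def cosamp(phi, y, k, max_iter=20, tol=1e-10):
    """Standard CoSaMP: return (g_hat, support array) for sparsity level k."""
    n = phi.shape[1]
    g_hat = np.zeros(n)
    residual = y.astype(float)
    for _ in range(max_iter):
        proxy = phi.T @ residual
        candidates = np.argsort(np.abs(proxy))[-2 * k:]           # 2K best atoms
        merged = np.union1d(candidates, np.flatnonzero(g_hat))    # merge supports
        coef, *_ = np.linalg.lstsq(phi[:, merged], y, rcond=None) # least squares
        keep = np.argsort(np.abs(coef))[-k:]                      # prune to K
        g_hat = np.zeros(n)
        g_hat[merged[keep]] = coef[keep]
        residual = y - phi @ g_hat
        if np.linalg.norm(residual) < tol:
            break
    return g_hat, np.flatnonzero(g_hat)
```

The pruning step is what allows an atom chosen in an earlier iteration to be discarded later, which OMP and StOMP cannot do.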
4 Simulation and Results
The algorithms discussed are simulated to demonstrate and evaluate their performance. The signal g is a random signal. The values of M and N are chosen as 50 and 90, respectively. AWGN is added to the signal to simulate a practical model. The number of active bands in use by the PU is K = 10. Random frequency slots are allotted to the K primary users. The performance is assessed using the probability of detection $P_d$:

$$P_d = \frac{\text{Number(tests satisfying condition)}}{\text{Number(iterations)}} \qquad (5)$$
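Equation (5) can be estimated by Monte Carlo simulation, as in the hedged sketch below. Several details are assumptions, since the paper does not specify them: a trial counts as a detection only when the estimated support equals the true support, the noise is added to the compressed measurements, the active bands have unit amplitude, and a compact OMP routine serves as the detector. All function names are illustrative.

```python
import numpy as np

def omp_support(phi, y, k):
    """Return the set of k column indices selected by a basic OMP loop."""
    support, residual = [], y.copy()
    for _ in range(k):
        proxy = np.abs(phi.T @ residual)
        if support:
            proxy[support] = 0.0
        support.append(int(np.argmax(proxy)))
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    return set(support)

def estimate_pd(n=90, m=50, k=10, snr_db=10.0, trials=200, seed=0):
    """Fraction of trials in which the occupied bands are found exactly."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        g = np.zeros(n)
        true_support = rng.choice(n, size=k, replace=False)
        g[true_support] = 1.0                      # assumed unit-amplitude bands
        phi = rng.standard_normal((m, n)) / np.sqrt(m)
        clean = phi @ g
        noise_power = np.mean(clean ** 2) / 10 ** (snr_db / 10)
        y = clean + rng.standard_normal(m) * np.sqrt(noise_power)
        hits += omp_support(phi, y, k) == set(true_support.tolist())
    return hits / trials

print(estimate_pd(snr_db=10.0, trials=50))
```

Sweeping `snr_db` or `k` in this function would produce curves of the kind discussed for the figures in this section, although the numerical values depend on the assumptions listed above.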
Figure 2 depicts the difference between the original samples and the reconstructed samples for different SNR values. This result is simulated using the CoSaMP algorithm for 10 active sub-bands. The detection improves as the SNR improves. Figure 3 shows the performance of the OMP and CoSaMP algorithms on the reconstruction and sensing problems for different SNR values. Both algorithms perform better on the detection problem than on the reconstruction of the signal, because reconstruction requires estimating the exact signal whereas sensing only requires deciding between the presence and absence of the signal. The performance of the OMP, StOMP and CoSaMP algorithms for different values of SNR with K = 10, and for different numbers of active bands with SNR = 0 dB in the
Fig. 2 The comparison between the actual signal samples and the reconstructed signal samples
Fig. 3 Detection probability of OMP and CoSaMP for estimation and detection
wideband signal, is illustrated in Figs. 4 and 5, respectively. The trend of the curves demonstrates that the probability of detection increases with increasing SNR and that CoSaMP gives better detection than the other two algorithms. In Fig. 5, as the number of sub-bands carrying a PU signal increases, the probability of detection
Fig. 4 Detection probability of OMP, StOMP and CoSaMP for detection problem
Fig. 5 Detection probability of OMP, StOMP and CoSaMP for detection and estimation problem
decreases. If the threshold value of StOMP algorithm is chosen appropriately, then it outperforms the other two algorithms. The threshold chosen for this simulation is 0.5.
5 Conclusion and Future Scope
The inference from the results discussed is that spectrum sensing of wideband signals does not require exact reconstruction of the signal. Therefore, CS can be used to sense the spectrum with less computational complexity. Simulation results for OMP, CoSaMP and StOMP show that CoSaMP performs better when the sparsity level of the signal is known; otherwise, StOMP with a suitable threshold gives the best results. Estimating the sparsity level is challenging, hence algorithms with suitable feature extraction capability must be designed to increase the detection probability. The computational complexity can be reduced further so that the SU can quickly access the spectrum in the absence of the PU and vacate it when the PU emerges. Reducing computational complexity thus plays an important role in preventing the SU from causing interference to the licensed user.
References
1. Tian Z, Giannakis GB (2007) Compressed sensing for wideband cognitive radios. Proc IEEE Int Conf Acoust Speech Signal Process 4:IV-1357–IV-1360
2. Needell D, Tropp JA (2009) CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl Comput Harmon Anal 26(3):301–321
3. Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53(12):4655–4666
4. Jin Y, Rao BD (2013) Support recovery of sparse signals in the presence of multiple measurement vectors. IEEE Trans Inf Theory 59(5):3139–3157
5. Blanchard JD, Cermak M, Hanle D, Jing Y (2014) Greedy algorithms for joint sparse recovery. IEEE Trans Signal Process 62(7):1694–1704
6. Mishali M, Eldar YC (2009) Blind multiband signal reconstruction: compressed sensing for analog signals. IEEE Trans Signal Process 57(3):993–1009
7. Tropp JA, Laska JN, Duarte MF, Romberg JK, Baraniuk RG (2010) Beyond Nyquist: efficient sampling of sparse bandlimited signals. IEEE Trans Inf Theory 56:520–544
8. Donoho DL, Starck JL (2010) Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans Inf Theory 58(2):520–544
9. Mourad N, Sharkas M, Elsherbeny MM (2016) Orthogonal matching pursuit with correction. IEEE Int Colloquium Signal Process Appl 247–252
10. Wang X, Jia M, Gu X, Guo Q (2018) Sub-Nyquist spectrum sensing based on modulated wideband converter in cognitive radio sensor networks. IEEE Access Special Section Mission Critical Sens Sensor Networks 6:40411–40419
11. Dongxue L, Sun G, Li Z, Wang S (2019) Improved CoSaMP reconstruction algorithm based on residual update. J Comput Commun 7:6
12. Wang H, Xia K, Niu W (2017) Improved research on Stagewise orthogonal matching pursuit (StOMP) algorithm. Comput Eng Appl 53(16):55–61
13. Wei Y, Lu Z, Yuan G, Fang Z, Huang Y (2017) Sparsity adaptive matching pursuit detection algorithm based on compressed sensing for radar signals. Sensors 17:1120
14. Zhenzhen Y, Zheng Y, Sun L (2013) A survey on orthogonal matching pursuit type algorithms for signal compression and reconstruction. J Signal Process 29(4):486–496
15. Aswathy GP, GopaKumar K (2018) Wideband spectrum sensing using modulated wideband converter by revised orthogonal matching pursuit. IEEE international conference on control, power, communication and computing technologies, 179–184
16. Bhagate SV, Patil S (2017) Maximizing spectrum utilization in cognitive radio network. IEEE international conference on big data, IoT and data science, 82–90
17. Anupama R, Kulkarni SY, Prasad SN (2019) Comparative study of narrowband and wideband opportunistic spectrum access techniques. IEEE international conference on distributed computing, VLSI, electrical circuits and robotics, Manipal, India, 1–5
Controller Design for Steering and Diving Model of an AUV Ravishankar P. Desai and Narayan S. Manjarekar
Abstract Linearised, decoupled steering/yaw and diving/depth subsystems of an autonomous underwater vehicle (AUV) are considered in this paper. Two linear control schemes are proposed to achieve steady-state tracking and disturbance rejection for the steering and diving subsystems. The proposed control schemes are compared in the presence of time-variant (TV) and time-invariant (TIV) disturbances for set-point tracking and sinusoidal reference tracking, respectively. The effectiveness of the proposed controllers is demonstrated through simulation results.
Keywords AUV · PD-controller · R-controller · Reference tracking · Disturbance rejection
R. P. Desai (B) · N. S. Manjarekar: Department of Electrical and Electronics Engineering, BITS Pilani, K K Birla Goa Campus, NH 17B, Zuarinagar, Goa 403726, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_21

1 Introduction
Vehicles are an essential and inevitable part of human life. Compared with the ubiquitous ground and air vehicles, a marine underwater vehicle operates in a challenging environment: the underwater ocean environment is aggressive, communication is difficult, and external ocean disturbances (tides, waves and ocean currents) can adversely affect the vehicle dynamics during a mission. Underwater vehicle modelling and control synthesis are challenging problems due to highly coupled dynamics, payload variation, uncertainties in the environment and external disturbances. Hence, the control system should have learning and adaptive capability to deal with these problems, and its structure must be simple, robust, fast-converging, adaptive and intelligent with respect to the underwater environment. It should also serve potential applications in military, naval, civil, scientific and industrial areas. The proposed work is suitable for multirole applications depending on
mission capability/requirement, such as military (search and rescue, investigation, payload delivery, mapping of the ocean floor, inspection of wreckage, antisubmarine warfare, etc.), navy (deploying and retrieving devices, transmitting or gathering all types of information), research (archaeological and geological surveys, environmental monitoring, marine biological studies, etc.) and industry (underwater cable tracking, structure inspection, etc.). In this work, linear control techniques are proposed to control the linearized decoupled model of an AUV for the steering and diving subsystems. The control schemes are designed to achieve the desired steering/yaw and diving/depth motion of the vehicle by rejecting TV and TIV disturbances during set-point and sinusoidal reference tracking. The rest of the paper is organised as follows. Section 2 summarises related work and Sect. 3 introduces the mathematical modelling of the steering and diving subsystems of the AUV. The design and implementation of the controllers are presented in Sect. 4. Sections 5 and 6 present the controller structure and simulation results, respectively. Finally, a conclusion is drawn in Sect. 7.
2 Related Work
The complete modelling details, analysis and control schemes of ocean vehicles are found in [1], and the decoupled subsystem model in [2]. Various control methodologies have been presented in the literature to control an AUV in the presence and absence of external disturbances and uncertainties. Some of the techniques for steering and diving control are considered here. For a non-linear decoupled model, a backstepping controller is designed in [3] to stabilize the yaw and pitch orientation while ensuring robustness and internal stability of the AUV. In [4], a sliding mode controller (SMC) based on state feedback is designed for a decoupled subsystem of a slow-speed AUV under the influence of vehicle speed, modelling uncertainty and disturbance. Better sensitivity and improved stability of the closed-loop system in the presence of environmental disturbance are achieved using the robust PD-like fuzzy control presented in [5]. LQG and H∞ controller designs are applied to the diving and steering systems in [6], and their performance and robustness are verified using a frequency-domain technique with wave disturbance. For a linear decoupled model of an AUV, a UDE-based SMC scheme is proposed in [7], and its effectiveness is shown through simulation results in the presence of external disturbances and parametric uncertainties. In [8], a multimode control strategy based on soft switching is proposed for different operating conditions of the steering control of the vehicle. An adaptive control strategy using a NARMAX structure is developed in [9] and validated through experimental results in the presence of payload changes and parameter variations. In [10], non-linear PD and PD+ controllers are designed for a non-linear decoupled model, and their robustness towards changes in damping and buoyancy is shown through experimental results. An AUV control problem with input saturation and uncertain dynamics is considered in [11], where an adaptive anti-windup strategy is proposed to control the
pitch and yaw channel of an AUV. Underwater vehicle parameters are estimated in [12] for a prototype AUV. Linear controller design basics are adopted from [13]. A large dive depth control problem of an underactuated AUV is considered in [14]; to deal with it, a backstepping controller is designed based on a bio-inspired model, and virtual velocity variables are used instead of the velocity error. To solve the depth control problem, a model-free reinforcement learning framework in discrete time is proposed in [15], and to reduce roll at low speed near the surface, a variable structure controller is presented in [16]. A two-loop PD+ neuro-adaptive generalised dynamic inversion (NAGDI) control structure is demonstrated in [17]; it gives better attitude and position tracking performance and is unaffected by parametric variations and external disturbances. A heading control system based on SMC is designed in [18], in which a smoothing filter applied to the reference signal reduces the tracking error. In [19], feedback control with an observer is proposed for the heading subsystem of an AUV to enhance the heading control action.
3 Modelling of an AUV
The fundamental building blocks of a marine vehicle model are kinematics, which provides the geometrical aspects of motion, and dynamics, which provides an analysis of the forces that contribute to the motion. A mathematical model of an AUV is used to control the vehicle motion in 6-DOF (surge, sway, heave, roll, pitch, yaw) in Cartesian coordinates, using an earth-fixed frame ($X_E - Y_E - Z_E$) and a body-fixed frame ($X_0 - Y_0 - Z_0$), as shown in Fig. 1. The earth-fixed frame is used to describe the vehicle position and orientation, while the body-fixed frame is used for the velocity and acceleration of the vehicle. The two frames must therefore be related to each other: the linear and angular velocity transformations describe the mapping between the two coordinate frames. Usually, the origin of the body-fixed frame is set at the centre of gravity/buoyancy. The motion of the vehicle is described by 6-DOF non-linear ordinary differential equations, with three translations along and three rotations about the x, y and z axes. The kinematic equation of an AUV is formulated using Euler angles and consists of the transformation between the linear and angular velocities and the position and attitude. It is expressed as

$$\dot{\eta} = J(\eta)\,\nu \qquad (1)$$

The dynamic equation of an AUV is formulated using a Newtonian approach and consists of hydrodynamic forces and moments (restoring forces and moments, added mass and inertia, hydrodynamic damping) and environmental disturbances (wave, wind and ocean currents). It is expressed as

$$M\dot{\nu} + C(\nu)\nu + D(\nu)\nu + g(\eta) = \tau + \tau_d \qquad (2)$$
Fig. 1 A schematic of an AUV
A highly coupled non-linear 6-DOF equation is decoupled into lightly interacting or non-interacting subsystems. The control system consists of the following three subsystems of an AUV, with their state variables and control variables:
1. Speed control system: u(t) and η(t)
2. Steering/yaw control system: v(t), r(t), ψ(t) and $\delta_r(t)$
3. Diving/depth control system: w(t), q(t), θ(t), z(t) and $\delta_s(t)$
In this paper, the steering and diving subsystems are considered as the control targets, with and without TV and TIV disturbances. To linearize the AUV dynamics for the decoupled system, it is assumed that the vehicle is geometrically symmetric about the x-z and x-y planes. A submerged AUV moves by means of its control surfaces and propeller.
3.1 Steering Subsystem Model
When the vehicle steers in the lateral direction, a rudder deflection causes a yaw moment on the vehicle, which changes the steering direction. The following assumptions are made while linearizing the steering/yaw subsystem:
1. the vehicle forward velocity $u_0$ is constant while steering,
2. the linear velocity w = 0 for motion in the Z-axis direction,
3. the angular velocity p = 0 and the roll angle φ = 0 for rotation about the X-axis,
4. the angular velocity q = 0 and the pitch angle θ = 0 for rotation about the Y-axis.
Then the equations of the steering subsystem are

$$\dot{\psi} = \frac{\sin\phi}{\cos\theta}\,q + \frac{\cos\phi}{\cos\theta}\,r \qquad (3)$$

$$m\left[ur + \dot{v} - wp - y_g(r^2 + p^2) + x_g(qp + \dot{r}) + z_g(qr - \dot{p})\right] = Y_{HS} + Y_{\dot{v}}\dot{v} + Y_{\dot{r}}\dot{r} + Y_{v|v|}v|v| + Y_{r|r|}r|r| + Y_{uv}uv + Y_{ur}ur + Y_{wp}wp + Y_{pq}pq + Y_{uu\delta_r}u^2\delta_r \qquad (4)$$

$$m\left[x_g(\dot{v} - wp + ur) - y_g(\dot{u} - vr + wq)\right] + I_{zz}\dot{r} + (I_{yy} - I_{xx})pq = N_{HS} + N_{\dot{v}}\dot{v} + N_{\dot{r}}\dot{r} + N_{r|r|}r|r| + N_{v|v|}v|v| + N_{ur}ur + N_{uv}uv + N_{pq}pq + N_{wp}wp + N_{uu\delta_r}u^2\delta_r \qquad (5)$$

The desired sway velocity is very small, i.e. v = 0 during steering, so the simplified linearized equation represented in state-space form is
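$$\begin{bmatrix} I_z - N_{\dot{r}} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \dot{r} \\ \dot{\psi} \end{bmatrix} = \begin{bmatrix} N_r & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} r \\ \psi \end{bmatrix} + \begin{bmatrix} N_{\delta_r} \\ 0 \end{bmatrix} \delta_r; \quad Y = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} r \\ \psi \end{bmatrix} \qquad (6)$$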
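and the open-loop transfer function (OLTF) from the rudder deflection to the yaw angle is written as

$$G_\psi(s) = \frac{\psi(s)}{\delta_r(s)} = \frac{\dfrac{N_{\delta_r}}{I_z - N_{\dot{r}}}}{s^2 - \dfrac{N_r}{I_z - N_{\dot{r}}}\,s} \qquad (7)$$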
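The transfer function in Eq. (7) follows from the state-space model in Eq. (6) through $G(s) = C(sI - A)^{-1}B$. The NumPy sketch below checks this numerically at one complex frequency. The function names are illustrative, and the coefficient values are hypothetical placeholders chosen only to exercise the formulas; they are not the parameters of the vehicle studied in the paper.

```python
import numpy as np

def steering_state_space(i_z, n_rdot, n_r, n_delta):
    """Return (A, B, C) of Eq. (6) after dividing the first row by I_z - N_rdot."""
    inertia = i_z - n_rdot
    a = np.array([[n_r / inertia, 0.0], [1.0, 0.0]])   # states: [r, psi]
    b = np.array([[n_delta / inertia], [0.0]])
    c = np.array([[0.0, 1.0]])                         # output: psi
    return a, b, c

def tf_from_state_space(a, b, c, s):
    """Evaluate C (sI - A)^{-1} B at the complex frequency s."""
    n = a.shape[0]
    return (c @ np.linalg.solve(s * np.eye(n) - a, b)).item()

def steering_tf(i_z, n_rdot, n_r, n_delta, s):
    """Closed-form G_psi(s) of Eq. (7)."""
    inertia = i_z - n_rdot
    return (n_delta / inertia) / (s ** 2 - (n_r / inertia) * s)

# Hypothetical coefficients, for illustration only.
params = dict(i_z=3.45, n_rdot=-4.88, n_r=-6.87, n_delta=-6.15)
s = 0.5 + 1.0j
a, b, c = steering_state_space(**params)
print(abs(tf_from_state_space(a, b, c, s) - steering_tf(s=s, **params)))
```

The printed difference should be at the level of floating-point rounding, which confirms that Eq. (7) is consistent with Eq. (6).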
3.2 Diving Subsystem Model
When the vehicle dives in the longitudinal direction, a deflection of the stern planes (two horizontal fins) changes the fin lift force and the corresponding pitch moment on the vehicle, and the pitch angle changes as a result. A change in pitch angle at constant forward speed makes the vehicle rise or submerge in the diving direction. The following assumptions are made while linearising the diving/depth subsystem:
1. the vehicle forward velocity $u_0$ is constant while holding the depth,
2. the linear velocity v = 0 for motion in the Y-direction,
3. the angular velocity p = 0 and the roll angle φ = 0 for rotation about the X-axis,
4. the angular velocity r = 0 and the yaw angle ψ = 0 for rotation about the Z-axis.
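Then the equations of the diving subsystem are

$$\dot{z} = -(\sin\theta)\, u + (\cos\theta \sin\phi)\, v + (\cos\theta \cos\phi)\, w \tag{8}$$

$$\dot{\theta} = (\cos\phi)\, q - (\sin\phi)\, r \tag{9}$$

$$m\left[-uq + vp + \dot{w} + x_g(rp - \dot{q}) + y_g(rq + \dot{p}) - z_g(p^2 + q^2)\right] = Z_{HS} + Z_{\dot w}\dot{w} + Z_{\dot q}\dot{q} + Z_{uq}\, uq + Z_{vp}\, vp + Z_{w|w|}\, w|w| + Z_{q|q|}\, q|q| + Z_{uw}\, uw + Z_{rp}\, rp + Z_{uu\delta_s}\, u^2 \delta_s \tag{10}$$

$$m\left[-x_g(\dot{w} - uq + vp) + z_g(\dot{u} - vr + wq)\right] + I_{yy}\dot{q} + (I_{xx} - I_{zz})\, rp = M_{HS} + M_{\dot w}\dot{w} + M_{\dot q}\dot{q} + M_{w|w|}\, w|w| + M_{q|q|}\, q|q| + M_{uq}\, uq + M_{vp}\, vp + M_{uw}\, uw + M_{rp}\, rp + M_{uu\delta_s}\, u^2 \delta_s \tag{11}$$

The desired heave velocity is very small, i.e. w = 0 during diving, so the simplified linearized equation in state-space form is

$$\begin{bmatrix} I_y - M_{\dot q} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \dot{q} \\ \dot{z} \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} M_q & 0 & M_\theta \\ 0 & 0 & -u_0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} q \\ z \\ \theta \end{bmatrix} + \begin{bmatrix} M_{\delta_s} \\ 0 \\ 0 \end{bmatrix} \delta_s ; \qquad Y = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} q \\ z \\ \theta \end{bmatrix} \tag{12}$$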
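and the OLTFs are written as

$$G_\theta(s) = \frac{\theta(s)}{\delta_s(s)} = \frac{\dfrac{M_{\delta_s}}{I_y - M_{\dot q}}}{s^2 - \dfrac{M_q}{I_y - M_{\dot q}}\, s - \dfrac{M_\theta}{I_y - M_{\dot q}}} \tag{13}$$

$$G_z(s) = \frac{z(s)}{\theta(s)} = -\frac{u_0}{s} \tag{14}$$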
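The two diving transfer functions can be recovered from the linearized pitch and depth dynamics in the same way as for steering. The following sketch assumes zero initial conditions and uses the linearized relations that pitch rate q is the derivative of θ and depth rate is −u_0 θ:

```latex
\begin{aligned}
(I_y - M_{\dot q})\, s^2\, \theta(s) &= M_q\, s\, \theta(s) + M_\theta\, \theta(s) + M_{\delta_s}\, \delta_s(s)
&&\Rightarrow\quad
\frac{\theta(s)}{\delta_s(s)} = \frac{M_{\delta_s}}{(I_y - M_{\dot q})\, s^2 - M_q\, s - M_\theta} \\[4pt]
s\, z(s) &= -u_0\, \theta(s)
&&\Rightarrow\quad
\frac{z(s)}{\theta(s)} = -\frac{u_0}{s}
\end{aligned}
```

Dividing the numerator and denominator of the first result by (I_y − M_q̇) gives the monic form of Eq. (13).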
4 Controller Design and Implementation
To achieve the desired control objective in the presence of modelling errors, uncertainties, and disturbances, the controller design and its structure play a vital role in ensuring steady-state tracking and desirable dynamics. The closed-loop system must be stable, provide good set-point tracking and disturbance rejection, and remain robust in the presence of modelling errors and uncertainties. The design and implementation of the linear controllers are adopted from [13].
4.1 Proportional (P)-Controller
The P-controller is one of the simplest controllers and is mainly used when load changes and dead time are small. The proportional gain (K_P) is found using the root locus method. The behaviour of the feedback system depends on K_P, because any variation in K_P changes the location of the closed-loop poles. The control signal U for a P-controller is written as

$$U(s) = K_P\,(y(s) - r(s)) = K_P\, e(s) \tag{15}$$

where y(s) is the system output, r(s) is the reference input (set point), and e(s) is the error between the output and the reference input.
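To illustrate how K_P moves the closed-loop poles, the following minimal sketch closes the loop around a second-order plant with the sign convention of Eq. (15). The default plant numbers are the steering values reported later in Eq. (29); the swept gains and the function names are illustrative assumptions, not values from the chapter.

```python
import cmath


def closed_loop_poles(kp, z0=-4.4, p1=1.5044, p0=0.0):
    """Closed-loop poles for G(s) = z0 / (s^2 + p1*s + p0) with u = kp * (y - r).

    With the sign convention of Eq. (15), the characteristic equation is
    1 - kp * G(s) = 0, i.e. s^2 + p1*s + (p0 - kp*z0) = 0.
    """
    c = p0 - kp * z0
    root = cmath.sqrt(p1 * p1 - 4.0 * c)
    return (-p1 + root) / 2.0, (-p1 - root) / 2.0


def is_stable(poles):
    """True when every pole lies strictly in the left half-plane."""
    return all(p.real < 0.0 for p in poles)


if __name__ == "__main__":
    # Sweep a few illustrative gains (hypothetical values, not from the chapter).
    for kp in (-1.0, 0.1, 1.0, 5.0):
        poles = closed_loop_poles(kp)
        print(f"K_P = {kp:5.2f}  poles = {poles}  stable = {is_stable(poles)}")
```

Because the plant gain is negative, a positive K_P stabilizes this loop under the (y − r) convention, while a negative K_P pushes one pole into the right half-plane.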
4.2 Proportional+Derivative (PD)-Controller
The PD-controller is used to handle fast-varying loads when a small offset error is tolerable. Its main functions are to improve damping, reduce overshoot, and stabilize the response quickly. From an application perspective, the PD-controller is a suitable choice when the model shows oscillatory or unstable behaviour. Consider a second-order linear time-invariant (LTI) system that describes the AUV dynamics in the form

$$G(s) = \frac{Z_0}{s^2 + P_1 s + P_0} \tag{16}$$

and the PD-controller in the form

$$C(s) = K_p + \frac{K_d\, s}{\tau_f\, s + 1} \tag{17}$$

where K_p is the proportional gain, K_d is the derivative gain, and τ_f is the filter time constant. For design simplicity, the PD-controller is written in lead-lag form as

$$C(s) = \frac{C_1 s + C_0}{s + L_0} \tag{18}$$
where the controller and lead-lag parameters are related by

$$K_p = \frac{C_0}{L_0}; \qquad K_d = \frac{C_1}{L_0} - \frac{C_0}{L_0^2}; \qquad \tau_f = \frac{1}{L_0} \tag{19}$$
From Eqs. (16) and (18), the OLTF is written as

$$G(s)C(s) = \frac{Z_0}{s^2 + P_1 s + P_0} \cdot \frac{C_1 s + C_0}{s + L_0} \tag{20}$$

From Eq. (20), the actual closed-loop characteristic polynomial is

$$A_{cl} = s^3 + s^2 (P_1 + L_0) + s\,(P_0 + L_0 P_1 + Z_0 C_1) + (L_0 P_0 + Z_0 C_0) \tag{21}$$

and the desired closed-loop poles with the lead-lag compensator should be stable, i.e. lie in the left half of the complex plane. The desired closed-loop characteristic polynomial is

$$A^d_{cl} = s^3 + s^2 A^d_2 + s A^d_1 + A^d_0 \tag{22}$$

Letting A_cl = A^d_cl gives the following linear equations, one per power of s:

$$\begin{aligned} s^2 &: \; P_1 + L_0 = A^d_2 \\ s^1 &: \; P_0 + L_0 P_1 + Z_0 C_1 = A^d_1 \\ s^0 &: \; L_0 P_0 + Z_0 C_0 = A^d_0 \end{aligned} \tag{23}$$
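The linear equations in Eq. (23) can be solved one after another for L_0, C_1, and C_0, and Eq. (19) then gives the PD gains. The sketch below is a minimal illustration: the plant values are the pitch-loop numbers reported later in Eq. (30), while the desired triple pole at s = −6 and all function names are assumptions made for this example only.

```python
def design_pd_lead_lag(z0, p1, p0, desired):
    """Solve Eq. (23) for the lead-lag parameters and map them via Eq. (19).

    desired = (Ad2, Ad1, Ad0) are the coefficients of s^2, s^1, s^0 in the
    monic desired closed-loop polynomial of Eq. (22).
    """
    ad2, ad1, ad0 = desired
    l0 = ad2 - p1
    c1 = (ad1 - p0 - l0 * p1) / z0
    c0 = (ad0 - l0 * p0) / z0
    kp = c0 / l0
    kd = c1 / l0 - c0 / l0 ** 2
    tau_f = 1.0 / l0
    return {"L0": l0, "C1": c1, "C0": c0, "Kp": kp, "Kd": kd, "tau_f": tau_f}


def closed_loop_coefficients(z0, p1, p0, l0, c1, c0):
    """Coefficients (s^2, s^1, s^0) of the closed-loop polynomial in Eq. (21)."""
    return (p1 + l0, p0 + l0 * p1 + z0 * c1, l0 * p0 + z0 * c0)


if __name__ == "__main__":
    # Pitch-loop plant from Eq. (30); the desired polynomial (s + 6)^3 is hypothetical.
    Z0, P1, P0 = -4.4006, 1.5009, 0.9671
    DESIRED = (18.0, 108.0, 216.0)  # (s + 6)^3 = s^3 + 18 s^2 + 108 s + 216
    gains = design_pd_lead_lag(Z0, P1, P0, DESIRED)
    for name, value in gains.items():
        print(f"{name:>5s} = {value: .4f}")
```

Substituting the solved parameters back into Eq. (21) should reproduce the desired coefficients, which is a quick way to catch sign errors.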
4.3 Resonant (R)-Controller
Tracking sinusoidal reference signals and rejecting sinusoidal disturbance signals is often required in electromechanical (e.g. robotic) systems. The R-controller performs better than the PI-controller at tracking and rejecting sinusoidal signals. It guarantees unconditional closed-loop stability for collocated systems. It provides high gain at a specified frequency, while at other frequencies no gain or phase shift occurs. For the second-order system in Eq. (16), the resonant control structure is chosen as

$$C(s) = \frac{C_3 s^3 + C_2 s^2 + C_1 s + C_0}{s^3 + L_0 s^2 + w_0^2 s + w_0^2 L_0} \tag{24}$$

From Eqs. (16) and (24), the OLTF is written as

$$G(s)C(s) = \frac{Z_0\,(C_3 s^3 + C_2 s^2 + C_1 s + C_0)}{(s^2 + P_1 s + P_0)(s^3 + L_0 s^2 + w_0^2 s + w_0^2 L_0)} \tag{25}$$
The pole assignment technique is adopted to design the R-controller for rejecting a sinusoidal disturbance of frequency w_0. Reference tracking, disturbance rejection, and the speed of the closed-loop response are achieved by proper selection of the desired closed-loop poles. From Eq. (25), the closed-loop characteristic polynomial is

$$A_{cl} = s^5 + s^4 (P_1 + L_0) + s^3 (w_0^2 + P_1 L_0 + P_0 + Z_0 C_3) + s^2 (w_0^2 L_0 + P_1 w_0^2 + P_0 L_0 + Z_0 C_2) + s\,(w_0^2 P_1 L_0 + w_0^2 P_0 + Z_0 C_1) + (w_0^2 P_0 L_0 + Z_0 C_0) \tag{26}$$

and the desired closed-loop poles should be stable, i.e. lie in the left half of the complex plane. The desired closed-loop characteristic polynomial is

$$A^d_{cl} = s^5 + A^d_4 s^4 + A^d_3 s^3 + A^d_2 s^2 + A^d_1 s + A^d_0 \tag{27}$$

Letting A_cl = A^d_cl gives the following linear equations, one per power of s:

$$\begin{aligned} s^4 &: \; P_1 + L_0 = A^d_4 \\ s^3 &: \; w_0^2 + P_1 L_0 + P_0 + Z_0 C_3 = A^d_3 \\ s^2 &: \; w_0^2 L_0 + P_1 w_0^2 + P_0 L_0 + Z_0 C_2 = A^d_2 \\ s^1 &: \; w_0^2 P_1 L_0 + w_0^2 P_0 + Z_0 C_1 = A^d_1 \\ s^0 &: \; w_0^2 P_0 L_0 + Z_0 C_0 = A^d_0 \end{aligned} \tag{28}$$
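Equation (28) is triangular in the unknowns, so L_0 follows from the first row and each C_i from one of the remaining rows. The sketch below solves it for the steering plant of Eq. (29) with the desired polynomial (s + 6)^5 stated in Sect. 6.1. The disturbance frequency w_0 = 0.1 rad/s is an assumption taken from the sinusoidal reference used in the simulations, and the helper names are illustrative.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def design_resonant(z0, p1, p0, w0, desired):
    """Solve Eq. (28); desired = (Ad4, Ad3, Ad2, Ad1, Ad0) of the monic Eq. (27)."""
    ad4, ad3, ad2, ad1, ad0 = desired
    w2 = w0 * w0
    l0 = ad4 - p1
    c3 = (ad3 - w2 - p1 * l0 - p0) / z0
    c2 = (ad2 - w2 * l0 - p1 * w2 - p0 * l0) / z0
    c1 = (ad1 - w2 * p1 * l0 - w2 * p0) / z0
    c0 = (ad0 - w2 * p0 * l0) / z0
    return l0, [c3, c2, c1, c0]


def closed_loop_polynomial(z0, p1, p0, w0, l0, c):
    """Rebuild Eq. (26) as plant_den * controller_den + z0 * controller_num."""
    w2 = w0 * w0
    den = poly_mul([1.0, p1, p0], [1.0, l0, w2, w2 * l0])  # six coefficients
    num = [0.0, 0.0] + [z0 * ci for ci in c]               # padded to six
    return [d + n for d, n in zip(den, num)]


if __name__ == "__main__":
    # (s + 6)^5 expanded with the binomial theorem, leading 1 omitted.
    desired = [float(comb(5, k) * 6 ** k) for k in range(1, 6)]
    l0, c = design_resonant(-4.4, 1.5044, 0.0, 0.1, desired)
    print("L0 =", l0)
    print("C3..C0 =", c)
    print("check  =", closed_loop_polynomial(-4.4, 1.5044, 0.0, 0.1, l0, c))
```

The controller denominator factors as (s + L_0)(s^2 + w_0^2), so the pair of poles at ±j w_0 is what provides the high gain at the disturbance frequency.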
5 Control Structure of an AUV
5.1 Steering Subsystem Control Structure
A feedback control structure is used to control the steering motion of the vehicle. To steer the vehicle, a steering controller is designed that assures closed-loop stability. The steering subsystem control structure, shown in Fig. 2, has two state variables, the yaw angular velocity (r) and the heading angle (ψ), with the rudder deflection (δr) as control input. The control loop is controlled using linear control techniques, namely a PD-controller and an R-controller. Using the vehicle parameters presented in [12] in Eq. (7), the OLTF of the control loop is written as

$$G_\psi(s) = \frac{-4.4}{s^2 + 1.5044\, s} \tag{29}$$
Fig. 2 A control structure of steering subsystem
5.2 Diving Subsystem Control Structure
A cascade control structure is used to control the diving motion of the vehicle, and each loop serves a distinct purpose. The inner-loop controller is designed first, for pitch control, and provides a sufficient pitch angle (θ) for the desired depth in the presence of disturbances. The pitch-loop (inner-loop) controller uses the measurement of the pitch angle (θ), which is driven by the stern plane deflection (δs), and it guarantees closed-loop stability. The depth-loop (outer-loop) controller is designed to reach the exact diving position of the vehicle. Figure 3 shows the diving subsystem control structure. It has three state variables, the pitch angular velocity (q), the pitch angle (θ), and the depth position (z), with the stern plane deflection (δs) as control input. The control structure has two loops, the pitch loop and the depth loop, which are controlled using linear control techniques, namely a PD+P controller and an R+P controller. Using the vehicle parameters presented in [12] in Eqs. (13) and (14), the OLTFs of the pitch and depth control loops are written as

$$G_\theta(s) = \frac{-4.4006}{s^2 + 1.5009\, s + 0.9671} \tag{30}$$

$$G_z(s) = -\frac{1}{s} \tag{31}$$

Fig. 3 A control structure of diving subsystem
6 Simulation Results
In this section, the effectiveness of the proposed control scheme for set-point tracking and sinusoidal trajectory tracking with TV and TIV under the robustness category is discussed. A qualitative and quantitative analysis was carried out for the decoupled steering and diving subsystems of an AUV using the simulation tool MATLAB R2019a.
6.1 Steering Subsystem Control
Open-loop system analysis. Using the steering OLTF from Eq. (29), the poles of the linearized subsystem are obtained as p = (0, −1.5044), i.e. one real pole at the origin and one stable pole. The system in Eq. (29) is fully state controllable and state observable. The pole-zero (PZ), root locus (RL), open-loop step, and sinusoidal response plots of the yaw control loop for the set point u(t) and the sinusoidal reference input sin(0.1t) are shown in Fig. 4.
Closed-loop system analysis. The steering subsystem control structure consists of a lateral controller. The proposed control scheme is verified with the initial conditions of the AUV taken as [v; r; ψ]^T = [0; 0; 0]^T, linear velocities [u; v; w]^T = [1; 0; 0]^T, and A^d_cl = (s + 6)^5 chosen as the desired closed-loop characteristic polynomial. Table 1 shows the controller parameter settings. The simulation responses for set-point and sinusoidal reference tracking, with and without external disturbance, together with the control effort, are shown in Figs. 5 and 6. A smooth steering trajectory tracking is achieved within

attributes > add. Binarize the mean value on the attribute 'Mean' after creating the attribute 'High/Low affected COVID States', as we would not be able to do so on the last field. To do so, we select: Choose > filters > unsupervised > attribute > NumericToBinary. Now select High against mean_binarized = 1 and Low against mean_binarized = 0. We now perform SVM classification on the preprocessed data by clicking the Classify button. Then select: Choose > classifiers > functions > SMO. We can then choose the test options (Figs. 1, 2, 3).
3.4.2 Doing Analysis Using Weka on District-Wise COVID-19 Dataset
We keep only the necessary fields and save a new CSV file after deleting the unnecessary ones. The original file contains the fields shown in Table 2.
1. Open the Weka tool and load the above file in the Weka Explorer.
2. Once it is loaded, remove the fields SINO, state_code, state, and district_key by ticking their checkboxes and pressing the Remove button.
3. Create a new field 'Value' that holds the evaluation floor((0.4*a2) + (0.3*a3) + (0.2*a4) + (0.1*a5)), where a2 is the confirmed column, a3 is the active column, a4 is the recovered column, and a5 is the deceased column of the dataset.
4. In the Preprocess window, select: Choose > filters > unsupervised > attribute > AddExpression. Press OK and apply the changes in the Weka tool.
5. Create a copy of Value by clicking Choose and selecting: Weka > filters > unsupervised > attribute > Copy.
Table 1 State wise data analysis

| State | Confirmed | Recovered | Deaths | Active |
|---|---|---|---|---|
| Maharashtra | 74,860 | 32,329 | 2587 | 39,944 |
| Tamil Nadu | 25,872 | 14,316 | 208 | 11,348 |
| Delhi | 23,645 | 9542 | 615 | 13,488 |
| Gujarat | 18,117 | 12,212 | 1122 | 4783 |
| Rajasthan | 9652 | 6744 | 209 | 2699 |
| Uttar Pradesh | 8729 | 5176 | 229 | 3324 |
| Madhya Pradesh | 8588 | 5445 | 371 | 2772 |
| State Unassigned | 7123 | 0 | 0 | 7123 |
| West Bengal | 6508 | 2580 | 345 | 3583 |
| Bihar | 4326 | 2025 | 25 | 2276 |
| Karnataka | 4063 | 1514 | 53 | 2494 |
| Andhra Pradesh | 3971 | 2464 | 68 | 1439 |
| Telangana | 3020 | 1556 | 99 | 1365 |
| Jammu and Kashmir | 2857 | 1007 | 34 | 1816 |
| Haryana | 2953 | 1082 | 23 | 1848 |
| Odisha | 2388 | 1416 | 9 | 963 |
| Punjab | 2376 | 2029 | 47 | 300 |
| Assam | 1757 | 414 | 4 | 1336 |
| Kerala | 1495 | 651 | 12 | 832 |
| Uttarakhand | 1066 | 259 | 8 | 795 |
| Jharkhand | 726 | 296 | 5 | 425 |
| Chhattisgarh | 564 | 130 | 1 | 433 |
| Tripura | 520 | 173 | 0 | 347 |
| Himachal Pradesh | 357 | 144 | 6 | 204 |
| Chandigarh | 301 | 214 | 5 | 82 |
| Ladakh | 90 | 48 | 1 | 41 |
| Manipur | 108 | 38 | 0 | 70 |
| Puducherry | 88 | 30 | 0 | 58 |
| Goa | 126 | 57 | 0 | 69 |
| Nagaland | 58 | 0 | 0 | 58 |
| Andaman and Nicobar | 33 | 33 | 0 | 0 |
| Meghalaya | 33 | 13 | 1 | 19 |
| Arunachal Pradesh | 29 | 1 | 0 | 28 |
| Mizoram | 14 | 1 | 0 | 13 |
| Dadra and Nagar Haveli and Daman and Diu | 11 | 1 | 0 | 10 |
| Sikkim | 2 | 0 | 0 | 2 |
| Lakshadweep | 0 | 0 | 0 | 0 |
Fig. 1 Preprocessed data
Fig. 2 Classification setup
6. Create a new attribute for the zones, with the labels 'Red', 'Orange', and 'Green', by selecting Weka > filters > unsupervised > attribute > Add, and then click on the text box next to the Choose button to enter the following settings:
Fig. 3 Test options
   • attributeIndex = last
   • attributeName = ZONES
   • nominalLabels = RED-ZONE,ORANGE-ZONE,GREEN-ZONE
7. Click OK and then apply the changes to see the new attribute.
8. Now go to Weka > filters > unsupervised > attribute > MathExpression, and add the expression ifelse(A7 > 1000,3,ifelse(A7 > 100,2,1)).
9. Apply numeric-to-nominal conversion on the 'Copy of Value' field by selecting: Weka > filters > unsupervised > attribute > NumericToNominal.
10. This gives the final preprocessed data. Now apply SVM on the district-wise dataset (Fig. 4).
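The scoring and thresholding in the steps above can be reproduced outside Weka. The sketch below applies the 'Value' expression and the nested ifelse thresholds to three rows taken from Table 2. The mapping of codes 3, 2, 1 to the red, orange, and green labels is an assumption inferred from the order of the nominal labels, and the function names are illustrative.

```python
import math

# Assumed mapping from the numeric code of the MathExpression step to zone labels.
ZONE_LABELS = {3: "RED-ZONE", 2: "ORANGE-ZONE", 1: "GREEN-ZONE"}


def severity_value(confirmed, active, recovered, deceased):
    """Weighted score floor(0.4*a2 + 0.3*a3 + 0.2*a4 + 0.1*a5) from the text."""
    return math.floor(0.4 * confirmed + 0.3 * active + 0.2 * recovered + 0.1 * deceased)


def zone_code(value):
    """Python equivalent of ifelse(A7 > 1000, 3, ifelse(A7 > 100, 2, 1))."""
    if value > 1000:
        return 3
    if value > 100:
        return 2
    return 1


if __name__ == "__main__":
    # (district, confirmed, active, recovered, deceased), copied from Table 2.
    rows = [
        ("Unassigned", 5630, 5630, 0, 0),
        ("Kurnool", 724, 134, 565, 25),
        ("South Andaman", 32, 0, 32, 0),
    ]
    for name, confirmed, active, recovered, deceased in rows:
        value = severity_value(confirmed, active, recovered, deceased)
        print(f"{name:15s} value = {value:5d}  zone = {ZONE_LABELS[zone_code(value)]}")
```

Because the score weights confirmed and active cases most heavily, a district with many recoveries but few active cases can still land in the orange band.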
4 Experimental Results
4.1 Results Obtained for State-Wise Dataset
With the percentage split test option, we obtain the result shown in Fig. 5.
Table 2 District-wise COVID-19 data analysis

| State | State_Code | District_Key | District | Confirmed | Active | Recovered | Deceased |
|---|---|---|---|---|---|---|---|
| State Unassigned | UN | UN_Unassigned | Unassigned | 5630 | 5630 | 0 | 0 |
| Andaman and Nicobar Islands | AN | AN_Nicobars | Nicobars | 0 | 0 | 0 | 0 |
| Andaman and Nicobar | AN | AN_North and Middle Andaman | North and Middle Andaman | 1 | 0 | 1 | 0 |
| Andaman and Nicobar | AN | AN_South Andaman | South Andaman | 32 | 0 | 32 | 0 |
| Andhra Pradesh | AP | AP_Foreign Evacuees | Foreign Evacuees | 112 | 112 | 0 | 0 |
| Andhra Pradesh | AP | AP_Anantapur | Anantapur | 220 | 66 | 150 | 4 |
| Andhra Pradesh | AP | AP_Chittoor | Chittoor | 262 | 69 | 191 | 2 |
| Andhra Pradesh | AP | AP_East Godavari | East Godavari | 224 | 169 | 53 | 2 |
| Andhra Pradesh | AP | AP_Guntur | Guntur | 496 | 98 | 390 | 8 |
| Andhra Pradesh | AP | AP_Krishna | Krishna | 477 | 132 | 327 | 18 |
| Andhra Pradesh | AP | AP_Kurnool | Kurnool | 724 | 134 | 565 | 25 |
| Andhra Pradesh | AP | AP_Other State | Other State | 446 | 249 | 197 | 0 |
| Andhra Pradesh | AP | AP_Prakasam | Prakasam | 80 | 14 | 66 | 0 |
Fig. 4 Model development
Fig. 5 State-wise dataset results
4.2 Results Obtained for District-Wise Dataset
If we select the training set as the test option and run the model by clicking the Start button, we obtain the result shown in Fig. 6.
Fig. 6 District-wise dataset results
5 Conclusion and Future Scope
A large amount of data on the spread of the disease is available to support the analysis of the Indian dataset. Considering the population density of countries like India, the spread of COVID-19 has been significantly contained and managed in India. Likewise, the mortality rate between January 2020 and June 2020 was remarkably low in India compared with the western world. In this article, we analysed, based on the available data, how several regions of India are affected. Areas in the red zone are the most affected by COVID-19, areas in the orange zone are less affected than those in the red zone, and the green zone covers areas free of COVID-19 cases. Further work may include research on symptomatic disease detection and the development of global disease detection models to help eradicate this deadly disease. In the future, we will work with chest X-ray or CT scan images for real-time COVID-19 prediction.
Viability and Applicability of Deep Learning Approach for COVID-19 Preventive Measures Implementation
Alok Negi and Krishan Kumar
Abstract The deadly COVID-19 (SARS-CoV-2) is spreading steadily and internationally, because of which national economies have almost come to a complete halt: citizens are locked down, activity is stagnant, and governments are alarmed by the health crisis. Public healthcare organizations are in desperate need of emerging decision-making technologies to confront this virus and to give individuals quick and efficient feedback in real time to prevent it from spreading. It is therefore necessary to establish automatic mechanisms as a preventive measure to protect humanity from SARS-CoV-2. Intelligent automation tools and techniques could help educators and the medical community understand COVID-19 and speed up treatment investigations by assessing huge amounts of research data quickly. The outcome of such preventive approaches has been used to help evaluate, measure, predict, and track currently infected patients and potential future patients. In this work, we propose two deep learning models to integrate and introduce sensible preventive measures, namely face mask detection and X-ray image-based scanning for COVID-19 detection. First, a face mask detection classifier is implemented using VGG19; it identifies people in a crowd who are not wearing a face mask and obtained 99.26% accuracy with a log loss score of 0.04. Second, a COVID-19 detection technique is applied to X-ray images using an Xception deep learning model, which classifies whether an individual is a normal patient or infected with COVID-19 and achieved an overall accuracy of 91.83% with a log loss score of 0.00.
Keywords Data augmentation · Xception · Face mask detection · Transfer learning (TL) · VGG 19 · X-rays
A. Negi (B) · K. Kumar, National Institute of Technology, Uttarakhand, Srinagar (Garhwal) 246174, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_30
1 Introduction
The novel COVID-19, first reported in December 2019 in Wuhan, China, quickly spread across the world and became a global pandemic. It has a profound impact on daily life, global health, and the world economy. COVID-19 is a new disease caused by a virus from the same family as the viruses behind severe acute respiratory syndrome (SARS) and some kinds of flu or cold. Based on its phylogenetic similarities and genomic properties, the virus belongs to the genus beta-coronavirus. The human beta-coronaviruses (MERS-CoV, SARS-CoV, SARS-CoV-2) share several characteristics but also differ in certain genomic and phenotypic features that may affect their pathogenesis. When an infected person coughs, speaks, or sneezes, the virus is transferred through respiratory droplets between people who are in close contact with each other. Such droplets may be inhaled, may land in the mouth, eyes, or nose, or may be carried to these places by touching one's own face. Frequently recorded symptoms include fever, dry cough, and fatigue, and at present there is no prescribed treatment for COVID-19. Antibiotics are not effective against respiratory [1] infections like COVID-19. A total of 32,429,965 confirmed cases with 985,823 deaths had been reported to the WHO as of 26 September 2020. It is therefore clear that protective steps should be taken to prevent the virus from spreading between people. The governments of most countries are also taking other steps to protect their people. These steps include social distancing, isolation, sensitization of the eye, quarantine, and, most notably, mask usage when going outside. Leung et al. [2] showed how and why to use the face mask as a preventive measure against COVID-19.
As the infection has become a major catastrophe, deep learning techniques and tools could be used to assist decision makers, the healthcare community, and society at large in managing every phase of the crisis and its aftereffects: detection, protection, response, recovery, and acceleration of research. Deep learning has enabled many complex applications [3, 4]. Using advanced deep learning models coupled with radiological imaging can also be useful for accurate diagnosis of this illness, and it can help overcome the shortage of trained physicians in isolated regions. X-ray findings [1] in COVID-19 cases include ground glass opacities, consolidation, and bilateral and peripheral involvement. The work described in this paper makes the following contributions to public healthcare and the medical community for the prevention of the global pandemic COVID-19.
• To prove the technological viability and general applicability of a deep learning approach to enable the implementation of preventive measures for the global pandemic COVID-19.
• Implementation of a face mask detection classifier and image-based radiography (X-ray) as a preventive step for COVID-19 detection using the VGG19 and Xception advanced deep learning models.
• To present a comparison study, based on qualitative and quantitative analysis, for the best accuracy and log loss score.
The rest of the paper is structured as follows: Section 2 provides a brief overview of previous work. Section 3 describes the proposed work for COVID-19 prevention. Experimental outcomes and discussion are given in Section 4. Finally, the conclusion of the proposed approach is given in Section 5, followed by the references.
2 Related Work
Ozturk et al. [5] introduced a new model for automatic COVID-19 detection using raw chest X-ray images to provide accurate diagnostics for binary classification (COVID, no findings) and multi-class classification (COVID, no findings, pneumonia), and achieved a classification accuracy of 98.08% for the binary case and 87.02% for the multi-class case. Robson [6] presented bioinformatics studies carried out on the COVID-19 virus. Apostolopoulos et al. [7] evaluated the performance of state-of-the-art convolutional architectures, with transfer learning implemented using two datasets. The findings indicated that deep learning with X-ray imaging could extract significant biomarkers relevant to the COVID-19 virus, with an accuracy, sensitivity, and specificity of 96.78%, 98.66%, and 96.46%, respectively. Loey et al. [8] developed a hybrid design with two parts that uses deep learning to detect face masks. ResNet50 is used in the first part for feature extraction, while the second part uses support vector machine, ensemble, and decision tree algorithms for classification. They reported testing accuracies of 99.64% using SVM on the RMFD dataset, 100% on the LFW dataset, and 99.49% on the SMFD dataset. Li et al. [9] developed an HGL technique to solve the problem of head pose specification with masks, which is particularly relevant during the COVID-19 pandemic. This approach incorporates both image color distortion analysis and line representation. Principal component analysis (PCA) was applied by Sabbir et al. [10] to recognize persons with masked and unmasked faces. They found that wearing masks had an effect on the accuracy of PCA-based face recognition.
3 Proposed Work
In this work, we implemented two deep learning-based preventive measures, as shown in Fig. 1.
• Firstly, we implemented a face mask detection classifier using VGG19 to identify people in a crowd who are not wearing a mask.
• Secondly, a COVID-19 detection mechanism is implemented on X-ray images using the Xception deep learning model, which classifies whether a person is normal or a COVID-19 patient.
The proposed work uses data augmentation, dropout, normalization, and transfer learning. Data augmentation enriches the training data by creating new examples through random transformations of existing ones. In this way, the size of the training set is increased artificially, which reduces overfitting. For augmentation, the Keras ImageDataGenerator produces batches of training data from the directories and processes them. Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task.

Fig. 1 Proposed work block diagram
3.1 Face Mask Detection
We used VGG19 as the deep learning architecture, with TensorFlow as the back-end and a Python script, to create an effective network for face mask detection. Firstly, the input images in this dataset come in different sizes and resolutions, so they were resized to 224 × 224 × 3. Data augmentation is performed to view each image sample from different angles, which allows practitioners to increase the data available for training dramatically without collecting new data. Rescale (1./255), shear (0.2), zoom (0.2), and horizontal flip (True) are used as the augmentation parameters for this work. Secondly, we fine-tuned a VGG19 architecture pre-trained on ImageNet. The fully connected layers of the original VGG19 are replaced with two new dense layers. The first dense layer has 128 hidden nodes with the ReLU activation function, while the second dense layer produces the final output with two nodes and the softmax activation function. Finally, the data can be loaded for classification.
For face detection, the OpenCV-based Haar cascade classifier, developed by Paul Viola and Michael Jones [11], is used. First, we loaded the pre-trained Haar cascade frontal-face default XML classifier and read the input image in grayscale mode. Then, we detected the faces in the image. If faces are found, the coordinates of each face region are returned as Rect(x, y, w, h). With these positions, we can build a face ROI. Finally, the trained face mask classifier is applied to the detected face ROI to determine whether the person is wearing a mask.
3.2 X-Rays Image-Based COVID Detection
For COVID-19 detection from X-rays, we implemented the Xception deep learning model. Firstly, the chest X-ray images are resized to 150 × 150 × 3. Then, the fully connected layers are removed from the original Xception architecture, which is currently used as a building block of many segmentation architectures. Three dense layers are added on top of the model in place of the original layers. The first dense layer has 128 hidden nodes and the second has 64 hidden nodes, both with the ReLU activation. The third dense layer produces the final output with two nodes and the softmax activation function. We used pre-trained weights and transfer learning with dropout, augmentation, and normalization to avoid overfitting. For this experiment, rescale (1./255), zoom range (0.2), horizontal flip (True), and rotation range (0.2) are used as the augmentation parameters. The block diagram of the fine-tuned VGG19 and Xception models is shown in Fig. 2.
4 Result and Analysis
In this work, the face mask image-based VGG19 classifier and the X-ray image-based Xception model for COVID-19 are trained on Google Colab using Python scripts, with batch sizes of 32 and 16, respectively, and the Adam optimizer. ReLU and softmax are used as the activation functions. ReLU (rectified linear unit) is a non-linear operation with output max(0, x). Softmax can handle multiple classes, so it is used on the output layer to classify the inputs. The accuracy, loss, precision, recall, F1 score, and specificity are calculated by Eqs. (1)–(6), respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{log loss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}) \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Recall} = \frac{TP}{FN + TP} \tag{4}$$

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{6}$$

Fig. 2 Improved VGG19/Xception block diagram
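As a worked check of Eqs. (1)–(6), the sketch below evaluates the metrics from the confusion-matrix counts that are reported for the two validation sets in Sects. 4.1 and 4.2. The function names are illustrative, and the probability clipping inside the log loss is an added assumption to avoid log(0); it is not described in the text.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Eqs. (1) and (3)-(6) evaluated from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Eq. (2): y_true holds one-hot rows, y_prob the predicted class probabilities."""
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            p = min(max(p, eps), 1.0 - eps)  # clipping is an assumption, see lead-in
            total += y * math.log(p)
    return -total / len(y_true)


if __name__ == "__main__":
    reported = {
        "face mask (VGG19)": dict(tp=134, tn=136, fp=2, fn=0),
        "X-ray (Xception)": dict(tp=384, tn=189, fp=45, fn=6),
    }
    for name, counts in reported.items():
        metrics = classification_metrics(**counts)
        summary = ", ".join(f"{k} = {100 * v:.2f}%" for k, v in metrics.items())
        print(f"{name}: {summary}")
```

The computed percentages can be compared directly with the values quoted in the text. Specificity comes out slightly below the rounded figures given there (about 98.55% and 80.77%).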
4.1 Face Mask Detection Analysis The dataset has 1376 images of both masked and non-masked real-time faces for face mask detection. There is no validation set in this dataset, so it was split into a training set and a validation set for evaluation. The training collection contains 1104 images (552 masks and 552 without masks), while 272 images (138 masks and 134 without masks) are included in the validation set. The model is trained for 25 epochs and during training, the total parameter is 20,090,306, of which 65,922 are
Viability and Applicability of Deep Learning Approach …
trainable and 20,024,384 are non-trainable. The distribution of the dataset and the kernel density plots of image sizes are shown in Figs. 3 and 4a, b, respectively. In this experiment, the true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP) recorded for the validation set are 136, 2, 0, and 134, respectively. Based on these values, the overall accuracy, loss, precision, recall, F1 score,
Fig. 3 Dataset classes distribution
Fig. 4 Dataset distribution and kernel density plot: (a) mask images; (b) without-mask images
Fig. 5 Confusion matrix for VGG19: (a) without normalization; (b) with normalization
Table 1 Precision, recall, F1 score (in percent), and support

| Class and avg. type | Precision | Recall | F1 score | Support |
|---|---|---|---|---|
| With mask | 100 | 99 | 99 | 138 |
| Without mask | 99 | 100 | 99 | 134 |
| Macro avg. | 99 | 99 | 99 | 272 |
| Weighted avg. | 99 | 99 | 99 | 272 |
and specificity are calculated as 99.26%, 0.04, 98.53%, 100.00%, 99.26%, and 99%, respectively. The confusion matrix is shown in Fig. 5a, b. Table 1 shows the classification report for the individual classes. The model achieved a training accuracy of 99.46% with a loss of 0.02 and a validation accuracy of 99.26% with a loss of 0.04 in just 25 epochs. The accuracy and loss curves of the model per epoch are shown in Fig. 6a, b.
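The reported percentages follow directly from the four confusion-matrix counts through Eqs. (1) and (3)–(6). The short sketch below recomputes them; the function name is hypothetical, and it assumes that no denominator is zero.

```python
def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, precision, recall, F1 score, and specificity from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }


# Validation counts reported for the face mask classifier.
face_mask = binary_metrics(tn=136, fp=2, fn=0, tp=134)
for name, value in face_mask.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these counts the specificity evaluates to 98.55%, which rounds to the 99% quoted in the text.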
Fig. 6 Accuracy and loss curve for VGG19: (a) accuracy curve; (b) loss curve
Fig. 7 Sample images, face detection, and model prediction
Fig. 7 displays some random input images together with the detected faces, each marked by a bounding rectangle, and the corresponding prediction of the proposed model.
4.2 X-Ray Image-Based COVID-19 Detection
For this experiment, the X-ray images are resized to 150 × 150 × 3 in the preprocessing step. The dataset consists of 5216 training images (3875 pneumonia and 1341 normal) and 624 validation images (390 pneumonia and 234 normal), as shown in Figs. 8 and 9. The kernel density plot is shown in Fig. 10. In this experiment, the true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP) recorded for the validation set are 189, 45, 6, and 384, respectively. Based on these values, the overall accuracy, loss, precision, recall, F1 score, and specificity are calculated as 91.83%, 0.00, 89.51%, 98.46%, 93.77%, and 81.0%, respectively. The confusion matrix is shown in Fig. 11a, b. Table 2 shows the classification report for the individual classes.
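The per-class rows and the macro and weighted averages of the classification report can be derived from the same four counts. The following sketch is an illustration with hypothetical names; it assumes that pneumonia is the positive class and normal the negative class, which is consistent with the supports of 390 and 234.

```python
def class_report(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Per-class precision/recall/F1 plus macro and weighted averages."""
    # For each class: (true positives, false positives, false negatives).
    per_class = {"normal": (tn, fn, fp), "pneumonia": (tp, fp, fn)}
    rows = {}
    for name, (c_tp, c_fp, c_fn) in per_class.items():
        precision = c_tp / (c_tp + c_fp)
        recall = c_tp / (c_tp + c_fn)
        rows[name] = {
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / (precision + recall),
            "support": c_tp + c_fn,
        }
    keys = ("precision", "recall", "f1")
    total = sum(r["support"] for r in rows.values())
    macro = {k: sum(r[k] for r in rows.values()) / len(rows) for k in keys}
    weighted = {k: sum(r[k] * r["support"] for r in rows.values()) / total for k in keys}
    rows["macro avg"] = {**macro, "support": total}
    rows["weighted avg"] = {**weighted, "support": total}
    return rows


report = class_report(tn=189, fp=45, fn=6, tp=384)
for name, row in report.items():
    print(name, [round(100 * row[k]) for k in ("precision", "recall", "f1")], row["support"])
```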
Fig. 8 Sample images
The model achieved a training accuracy of 99.02% with a loss of 0.00 and a validation accuracy of 91.83% with a loss of 0.00 in 60 epochs. The accuracy and loss curves of the model per epoch are shown in Fig. 12a, b. The proposed work thus recorded a validation accuracy of 99.26% with a loss of 0.04 in just 25 epochs for face mask detection and a validation accuracy of 91.83% with a loss of 0.00 in 60 epochs for X-ray-based detection. These results are good and comparable to other models; however, for a better analysis, further experiments with more epochs and more advanced convolutional neural networks should be carried out on larger datasets.
5 Conclusion
As COVID-19 has been declared a pandemic and no particular therapeutics are currently approved for its treatment, preventive measures play a crucial role in the fight against this pandemic at the healthcare and community levels. At present, prevention is the only way to limit infection. Even slight negligence in taking protective measures will be very costly for humanity. In this work, we implemented two deep learning-based mechanisms for the prevention and detection of COVID-19. First, a face mask detection classifier is implemented using VGG19, which achieved 99.26% validation accuracy with a loss of 0.04 in just 25 epochs. Second, X-ray image-based detection is implemented using the Xception model, which achieved 91.83% validation accuracy with a loss of 0.00 in 60 epochs. Hence, both approaches aim to demonstrate the general applicability of deep learning in enabling preventive measures against the global COVID-19 pandemic. In future, face mask detection can be
Fig. 9 Distribution of dataset: (a) distribution of classes in training set; (b) distribution of classes in testing set
Fig. 10 Kernel density plot of image sizes
Fig. 11 Confusion matrix for Xception
Table 2 Precision, recall, F1 score (in percent), and support for the normal and pneumonia classes

| Class and avg. type | Precision | Recall | F1 score | Support |
|---|---|---|---|---|
| Normal | 97 | 81 | 88 | 234 |
| Pneumonia | 90 | 98 | 94 | 390 |
| Macro avg. | 93 | 90 | 91 | 624 |
| Weighted avg. | 92 | 92 | 92 | 624 |
performed on real-time camera or video streams, while COVID-19 detection can be applied to other radiography-based images, with more experiments and larger datasets, to check the technological viability of deep learning.
Fig. 12 Accuracy and loss curve for Xception
A Review on Security and Privacy Issues in Smart Metering Infrastructure and Their Solutions in Perspective of Distribution Utilities
Rakhi Yadav and Yogendra Kumar
Abstract With rapid advancements toward making the electric grid smart, smart metering infrastructure (SMI) plays a vital role in monitoring, controlling, and managing different parameters. Despite the various salient features of SMI, security is a major issue because many devices, such as smart meters, are connected to each other and communicate through a public network. This paper discusses the salient features of the smart meter along with its components and possible security vulnerabilities. We also focus on the possible attacks and their impact on electric distribution networks. In addition, this paper reviews privacy issues along with their existing solutions. A list of simulation tools for security assessment, which will be helpful for researchers who wish to work in this field, is also given. Furthermore, we provide research issues, gaps, and directions identified after a rigorous review of this research area.
Keywords Smart meter · Smart grid · SMI · Privacy · Electric distribution utilities · Threats · Assessment tools
R. Yadav (B) · Y. Kumar
Department of Electrical Engineering, Maulana Azad National Institute of Technology Bhopal, Bhopal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_31

1 Introduction
These days, technologies in the field of smart grids are growing rapidly. In the smart grid, smart components play various important roles; one of them is the smart meter. Smart meters measure energy consumption in real time at regular intervals. This real-time information helps consumers and service providers in the electric distribution network to control and balance their load. Smart meters are beneficial from the perspective of both consumers and their service providers. A smart meter takes accurate readings and sends them to the service provider so that consumers are billed precisely for their actual consumption. Furthermore, it provides detailed information about energy usage. Hence, energy consumption can be distributed or managed uniformly between peak and off-peak hours as needed. With smart meters, both
home and business users benefit. Using smart meter data, service providers can also predict consumer demand and, with the help of consumers' usage patterns, solve the load-balancing problem of electricity consumption. There are thus many advantages to using smart meters in electricity distribution networks. However, all smart meters are connected to other nodes, i.e., other smart devices, control centers, and service providers, via an open network, so security concerns such as privacy, confidentiality, and integrity arise. Different types of attacks are possible, such as modification and software and hardware interception. We have surveyed the literature on security issues in smart metering infrastructure, with a particular focus on the privacy issue in smart meters. This study provides a comparison-based survey. We have also considered some recent survey papers related to security concerns in electrical distribution networks, and we have gone through various existing works on privacy preservation in metering systems. Our work is categorized primarily on the basis of attribution and function. The abbreviations used in this study are listed in Table 1. The contributions of this paper are:
• A comprehensive study of smart meter components and possible attacks in smart metering.
• Security and privacy issues: We have reviewed security and privacy issues along with their existing solutions.
• Security and privacy assessment tools: We have listed some important available simulation tools for security and privacy assessment.
• Future scope: Based on our rigorous survey, we have identified emerging research problems together with their earlier solutions and the future research scope.

Our survey is organized as follows. Section 2 details security concerns in smart metering. Section 3 describes the components of the smart meter and possible attacks. The literature on privacy preservation is described in detail, with a comparison list, in Sect. 4. Section 5 introduces some important security and privacy assessment tools. In Sect. 6, we conclude our work and outline the future research scope.

Table 1 List of abbreviations

| Abbreviation | Description |
|---|---|
| SM | Smart meter |
| AMI | Advanced metering infrastructure |
| ABE | Attribute-based encryption |
| CyberSAGE | Cyber security argument graph evaluation |
| ECC | Elliptic curve cryptography |
| HAN | Home area network |
| AVISPA | Automated verification of Internet security protocols and applications |
| SG | Smart grid |
| IoT | Internet of Things |
| MDMS | Meter data management system |
| WSN | Wireless sensor network |
| TTP | Trusted third party |
| TPM | Trusted platform module |
| DoS | Denial of service |
2 Security Concerns in Smart Metering
2.1 Threat Modeling in Smart Metering
The threat model is shown in Fig. 1; it is conceptually adopted from Ref. [1]. The model has three levels of abstraction: types of attackers, vulnerabilities, and targets.

Fig. 1 Abstract threat model in smart metering

The Attacker
An attacker can be a single threat agent or a group of agents. The main purpose of these attackers is to deliberately harm the target systems. Such agents, e.g., hackers, legitimate consumers, and cyber-criminals, can damage the system in various ways. The attackers
might be passive and/or active, depending on their threat pattern. They need only physical or logical access along with strong intention or motivation. They also require some resources, either advanced or cheap devices, together with technical skills.

Vulnerabilities in Smart Metering
In smart metering, there are three main levels of vulnerability: the application, device, and network levels. Smart metering uses devices such as smart meters, routers, gateways, and sensors. It is vulnerable at the device level due to the lack of tamper protection and hardware design flaws. All the above-mentioned devices are connected to each other via a public network and communication protocols. The authors of Ref. [2] show that attacks on the communication protocol are possible. Different communication methods, e.g., mesh networks, Wi-Fi, IEEE 802.15.4 (ZigBee), and Z-Wave, have been observed to have security vulnerabilities [3, 4]. Automated billing and other value-added services for customers are handled at the application level. Reference [5] emphasizes inappropriate data delivery caused by cyber threats from attackers who find flaws in the software.

Targets
The targets of an attacker include consumers, utility providers, marketing, operations, etc.; the choice of target depends on the attacker's motivation or intention.
2.2 Security Attacks
Eavesdropping
Eavesdropping is a passive attack in which the attacker listens to the communication between consumers and the service provider. This type of attack is very difficult to detect. It is a consumer privacy concern that is mainly seen in wide area networks (WANs).

Denial of Service (DoS) Attacks
This type of attack is also performed in the WAN. In a DoS attack, the attacker floods the communication line with data packets to create congestion, which in turn prevents legitimate user requests from reaching their destination.

Packet Injection Attacks
This type of attack is carried out by sending false packets into the communication channel. Such attacks may produce false bills, which cause monetary loss for either consumers or the utility provider.

Malware Injection Attacks
Here, the attacker implants malware in the communication network. This type of attack is carried out in the wireless area network. It may unbalance the demand response in the electric distribution network.
Remote Connect/Disconnect
In this type of attack, a user's electricity supply may be connected or disconnected remotely.

Firmware Manipulation
The firmware of the smart meter or metering gateway is manipulated. This type of attack may affect the accounting process or produce false power consumption reports.

The attacks discussed above may harm users, the service provider, or both. Eavesdropping and DoS attacks occur most frequently in practice. Because privacy is a major concern, many researchers have worked on it extensively and continue to do so. In the next section, we discuss recent real-world cyber-attack incidents related to smart metering and how they affect consumers and service providers.
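To make the packet injection and modification threats more concrete, the sketch below shows one generic countermeasure: attaching a keyed message authentication code (HMAC) to each reading so that a forged or altered packet fails verification at the receiver. It is a minimal illustration and not a scheme taken from the surveyed papers. The pre-shared key, the field names, and the helper functions are hypothetical, and the sketch deliberately leaves out key management, replay protection, and confidentiality.

```python
import hashlib
import hmac
import json


def sign_reading(key: bytes, reading: dict) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the reading."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_reading(key: bytes, reading: dict, tag: str) -> bool:
    """Accept the reading only if the tag matches (constant-time comparison)."""
    return hmac.compare_digest(sign_reading(key, reading), tag)


key = b"hypothetical-pre-shared-key"  # assumption: shared by meter and utility
reading = {"meter_id": "SM-001", "interval": 42, "kwh": 1.37}  # made-up values
tag = sign_reading(key, reading)

forged = dict(reading, kwh=0.01)  # an injected packet with a falsified value
print(verify_reading(key, reading, tag))  # True
print(verify_reading(key, forged, tag))   # False
```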
3 Smart Meter Components and Possible Attacks
3.1 Attacks on Metering Networks
Many types of attack are possible in the smart metering network. In Ref. [2], the authors propose the puppet attack and show that it is possible in smart meter networks. This threat results in a DoS, e.g., through resource exhaustion and packet flooding. The attackers select any node as a puppet; this node then becomes a slave of the attackers and behaves according to their instructions. The puppet node receives messages from other nodes and sends bogus messages to other nodes, which unnecessarily consumes bandwidth and node energy. This attack is more harmful than the flooding attack; the authors claim that it can collapse the network. Alberto and Javier argued that, through reverse engineering, a smart meter can be broken into and/or its data stolen, and that threats can be enabled over the power line [6]. Further, after obtaining the encryption keys, an attacker may misuse them to disrupt the hardware and then gain full control of the network over a large area. Tellbach and Li discussed cyber-attacks on smart meters in a household nanogrid [7]; many cyber-attacks were launched against the smart meter so that the impact of various attacks on the in-house nanogrid could be analyzed. The real-time information of the smart meter is under threat from attacks on confidentiality, integrity, and availability. Table 2 summarizes these attacks.
Table 2 Recent real-world cyber-attack incidents related to smart metering

| Target | Cyber vulnerabilities |
|---|---|
| Ukrainian power distribution | VPN credentials stolen and authentication violated [8] |
| Renewable energy | Spoofing of meter parameters and network configurations [9] |
| Smart metering | Potential DoS and other routing attacks result from a puppet attack in wireless mesh nodes [2] |
| Smart metering | Reverse engineering helps to reveal user privacy concerns or electricity usage fraud [6] |
| Smart metering | Compromising smart meter communication can lead to monetary effects [7] |

3.2 Attack Surface
Intruders or attackers can damage the smart meter physically or hack it using malicious code. Attackers can perform a cyber-attack in the following ways:
• Direct cyber-attack on the smart meter
• Through technician equipment
• Through a compromised supply chain

Similarly, other nodes such as data collectors or control centers are also under threat. The impacts of these attacks are (i) theft of smart meter readings; (ii) theft of electricity; (iii) localized denial of power; (iv) widespread denial of power; and (v) interruption of grid.

Theft of smart meter readings
The unauthorized monitoring of smart meter data in transit or at rest in a meter or data collector falls into the category of theft of smart meter readings. By analyzing smart meter readings, anyone can easily learn the hobbies and habits of consumers, which discloses their private information. Theft of this real-time smart meter information also leads to illegal access to intellectual property, i.e., information about infrastructure configuration and system firmware. Attacks can then be spread through reverse engineering.

Theft of electricity
This occurs when a customer tries to receive power at no cost or at a reduced rate; the reverse of this attack may cause inflated electricity bills for targeted consumers. It can also be done by reconnecting an electricity connection that was disconnected due to unpaid bills.

Localized denial of power
This type of attack is carried out by remotely disconnecting the user's electricity connection. It is more serious when the smart meters are permanently disabled. This type of attack is enabled through radio or optical interfaces and by corrupting the firmware.

Widespread denial of power
This situation occurs when intruders or attackers implant a worm or virus in the data collector or control center. In that case, the worm may damage the
hundreds or millions of smart meters connected to the control center. Recovering from this situation can take months or a year, depending on the number of technicians and the availability of smart meters.
3.3 Smart Meter Components
A smart meter has five main parts: (i) control unit; (ii) metrology system; (iii) SM radio system; (iv) HAN radio system; and (v) optical interface. Interception of infrastructure configuration (affecting confidentiality), modification (affecting integrity), and blocking (affecting availability) are the possible attacks on these parts. Communication among different smart meters takes place over an open communication channel, which is also under threat. The severe communication channel attacks are categorized as follows: (i) interception, (ii) injection, and (iii) blocking.

Control unit
The control unit manages all operations of the smart meter. It is a combination of memory chips, a microcontroller, and firmware. The possible attacks are software and hardware interception and modification attacks. The impacts of these attacks are theft of power, widespread denial of power, theft of data, and localized denial of power.

Metrology system
This system calculates the electricity consumed by the load in real time. It is a combination of memory chips, microcontrollers, and firmware. The possible security concerns for this system are integrity, confidentiality, and availability. The impact can be data theft and power theft.

SM radio system
The SM radio system communicates with data collectors and other smart meters. It sends messages to and receives messages from data collectors or other smart meters; the messages carry consumption data, meter status, and alarm conditions. It also receives requests or commands for disconnecting or connecting the supply, configuration, and firmware updates. The possible impacts of attacks are localized denial of power, theft of data, widespread denial of power, and theft of power.

HAN radio system
The HAN radio system transmits the energy consumption to a smart device mounted in the consumer's premises [10]. It provides real-time information on energy consumption, tariff rates, and billing. Interception attacks are possible. The risk of modification attacks is minimal due to the limited functionality of the HAN radio system.
Optical interface
The optical interface is provided for on-site technician interactions, primarily smart meter initialization, configuration, diagnostics, and firmware updating. Different types of attack are possible, leading to theft of SM data, localized denial of power, theft of electricity, and widespread denial of power.
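The component-by-component discussion in this section can be condensed into a small lookup structure, which makes it easy to ask which parts of a meter are exposed to a given impact. The sketch below only restates the impacts named in the text; the dictionary layout and function name are illustrative assumptions, and the labels "theft of power" and "theft of electricity" are treated as the same impact.

```python
# Impacts per component as listed in Sect. 3.3 ("theft of electricity" is
# recorded as "theft of power", "theft of SM data" as "theft of data").
COMPONENT_IMPACTS = {
    "control unit": {
        "theft of power", "widespread denial of power",
        "theft of data", "localized denial of power",
    },
    "metrology system": {"theft of data", "theft of power"},
    "SM radio system": {
        "localized denial of power", "theft of data",
        "widespread denial of power", "theft of power",
    },
    # The text names interception attacks here but lists no specific impact.
    "HAN radio system": set(),
    "optical interface": {
        "theft of data", "localized denial of power",
        "theft of power", "widespread denial of power",
    },
}


def components_exposed_to(impact: str) -> list:
    """Return the components whose listed impacts include the given one."""
    return sorted(
        (name for name, impacts in COMPONENT_IMPACTS.items() if impact in impacts),
        key=str.lower,
    )


print(components_exposed_to("widespread denial of power"))
# ['control unit', 'optical interface', 'SM radio system']
```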
4 Literatures Related to Privacy-Preserving Techniques
After reviewing many papers related to privacy concerns, we found that the main cause of this type of security concern is attribution in smart metering. The service provider measures power in two ways: attributable and non-attributable. The purpose of power measurement is billing for power consumption and managing usage. Billing operations require attributable measurements, which are of two types: (1) coarse-grained measurements and (2) fine-grained measurements. Fine-grained measurements raise serious privacy concerns, whereas coarse-grained measurements raise minimal privacy concerns. Maintenance operations require fine-grained measurements to maintain service provider operations. Protecting the user's privacy during non-attributable fine-grained measurements still poses research issues, so privacy in smart metering is a crucial issue. Many research groups have worked on this issue using different techniques, which are discussed below. The categorization of privacy-preserving studies is shown in Fig. 2.

Jiang et al. [11] introduced a privacy-preserving technique based on negative surveys that works on time series data. This technique has several advantages over other privacy preservation techniques: it needs no encryption process, complex communication protocol, or trusted third party. It can also tolerate failures such as communication or equipment failure.

Sun et al. [12] introduced a privacy preservation technique for the IoT-enabled smart grid. It is a certificateless data aggregation approach that also ensures data integrity and confidentiality. It can resist impersonation attacks, replay attacks,
Fig. 2 Different types of available methods for privacy-preserving in smart metering
and modification attacks. This technique does not require expensive bilinear pairing for the data collection, encryption, and signature processes. The scheme requires only three scalar multiplication operations on ECC, two point addition operations on ECC, and one one-way hash operation. It uses batch processing for verification.

Zhang et al. [13] introduced an attribute identity-based privacy-preserving technique. It does not require bilinear pairing operations for the signcryption and verification processes, which makes it computationally efficient. The technique is useful for verifying the trustworthiness of the SM and allows the control center to analyze consumers' fine-grained energy consumption data without revealing their private information. Its drawback is that it remains unclear how to identify whether the smart meter is trusted when the system or program is updated.

Sankar et al. [14] presented another privacy-preserving technique based on a privatization mechanism. This method assesses privacy by the mutual information (MI) between the sensitive variables to be hidden and the distorted electricity consumption measurements. A Markov model is used for the distribution of the measurements, which is controlled by the state of the appliances. This technique is not suitable for real-time privacy problems because its data release mechanism works on blocks of samples. A Gaussian model is also used in this approach, which may be quite restrictive for modeling SM signals. The limitations of this approach [14] were addressed in the privacy-preserving technique based on neural networks [10]. That paper presents an extensive statistical study with real data to characterize the utility-privacy trade-offs, and the technique works on the real-time privacy problem.

Thorve et al. [15] protect the privacy of real smart meter data using differential privacy. The authors evaluate their approach using UK Power Networks' smart metering data of 5567 consumers.
In this method, K-means clustering for time series is used.

Rial et al. [16] proposed a privacy protection technique that minimizes the trusted computing base; the tamper-resistant meter need not be updated when the tariff policy changes. An improved version of the study [16] is presented in Ref. [17], which reduces the communication cost. After a rigorous review of many papers related to privacy concerns, we give a comparative analysis based on important parameters in Table 3.
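Differential privacy, mentioned above in connection with Ref. [15], is commonly realized by adding calibrated random noise to a released statistic. The following is a minimal sketch of the textbook Laplace mechanism applied to an aggregated consumption total; it is not the clustering-based method of Ref. [15]. The function names, the privacy budget `epsilon`, the bound `max_reading`, and the sample readings are all hypothetical, and the sketch assumes that every individual reading lies in the range [0, `max_reading`].

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_total(readings, epsilon: float, max_reading: float, rng: random.Random) -> float:
    """Release the sum of readings with epsilon-differential privacy."""
    # Changing one household's reading moves the sum by at most max_reading.
    sensitivity = max_reading
    return sum(readings) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
readings_kwh = [0.42, 1.10, 0.73, 0.95]  # hypothetical interval readings
print(private_total(readings_kwh, epsilon=0.5, max_reading=2.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but a noisier total, which is the utility-privacy trade-off discussed in this section.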
Table 3 Different privacy/confidentiality threats in smart metering

| References | Types of threat | Description | Security goals compromised |
|---|---|---|---|
| Werner and Lunden [18], Gong et al. [19] | Radio eavesdropping | SM data can be breached over the wireless radio through eavesdropping | Confidentiality, integrity, and authenticity |
| Yi et al. [2], Saputro and Akkaya [20] | Forwarding point compromise | Unauthorized forwarding, spoofing, and message modification in the SMI | Confidentiality, authenticity, integrity |
| Paverd et al. [21], Paverd [22] | Misuse of data | SM data breach | Privacy violation |
| Mahmoud et al. [23], Zhang et al. [24], Kim et al. [25] | Backhaul network interception | Attacker may control SMI data and devices | Confidentiality, authenticity, integrity |

5 Security and Privacy Assessment Tools
Many tools are available for security and privacy assessment in the electric distribution network. Some of them are listed below; they are helpful for researchers who wish to work on security and privacy issues in the smart metering infrastructure.
1. AVISPA [26]
2. ProVerif [27]
3. TrustFound [28]
4. Fawaz et al. [29] designed a cost model that can assist in selecting responses to malicious attacks on an AMI. They also demonstrated a realistic implementation of this model using ArcGIS for topology generation and GridLAB-D for grid simulation.
5. CyberSAGE [30]: This is a tool for cyber-physical systems. It supports the automatic generation of security argument graphs from different inputs, such as workflow information for the processes executed in the system, the physical network topology, and attacker models, which are used to argue about the level of security of the target system. On the basis of the generated graphs, CyberSAGE can calculate quantitative security assessment results with the help of numerical information.
6 Conclusions and Future Research Directions
As smart grid (SG) technology grows rapidly, smart metering infrastructure (SMI) has become a key component of the SG network. SMI combines many smart devices, such as smart meters and sensors, which are connected to each other and communicate through a public network. Hence, security is urgently needed for the successful operation of the smart grid and the electric distribution network. User privacy is a major challenge for service providers, i.e., how to collect consumers' data without revealing their personal and sensitive information. Therefore, a strategic communication framework integrated with robust security and privacy features must be designed as a priority during SMI installation. Hence, both security and privacy in SMI are considered an emerging research field.
This paper discusses the salient features of the smart meter along with its components and possible security threats. We have also focused on the possible attacks and their impact. This paper has also reviewed privacy issues along with their existing solutions, and it provides a list of simulation tools for security assessment, which are helpful for researchers who wish to work in this field. The future research directions can be summarized in the following points:
1. There is a need for more efficient privacy-preserving techniques based on trusted computing, in which consumers can control their information and maintain their privacy.
2. Existing security assessment tools, such as AVISPA and ProVerif, and recent implementation techniques need to be extended.
3. A standard common security model and standard security assessment tools are needed for comparing research works.
4. The following problem needs to be studied and solved: how to detect whether the smart meter is still trusted when the program or system is updated, because an update causes a change in the measurement values.
5. Quantum cryptography-based key distribution and management methods need to be developed for the SG.
6. A large database of privacy mechanisms needs to be assembled in order to design a recommender system that matches the user's requirements.
References 1. Stellios I, Kotzanikolaou P, Psarakis M, Alcaraz C, Lopez J (2018) A survey of iot-enabled cyberattacks: assessing attack paths to critical infrastructures and services. IEEE Comms Surv Tutorials 2. Yi P, Zhu T, Zhang Q, Wu Y, Pan L (2016) Puppet attack: a denial of service attack in advanced metering infrastructure network. J Netw Comput Appl 59:325–332 3. Morgner P, Mattejat S, Benenson Z, Muller C, Armknecht F (2017) Insecure to the touch: attacking zigbee 3.0 via touchlink commissioning. In: Proceedings of the ACM conference on security and privacy in wireless and mobile networks, pp 230–240 4. Yao J, Venkitasubramaniam P, Kishore S, Snyder LV, Blum RS, Network topology risk assessment of stealthy cyber-attacks on advanced metering infrastructure networks. In: Conference on information sciences and systems (CISS), pp 1–6 5. Tweneboah-Koduah S, Tsetse AK, Azasoo J, EndicottPopovsky B (2018) Evaluation of cybersecurity threats on smart metering system. In: Information technology—new generations. Springer International Publishing, pp 199–207 6. Alberto G, Javier V Lights off! The darkness of the smart meters 7. Tellbach D, Li Y-F (2018) Cyber-attacks on smart meters in household nanogrid: modeling, simulation and analysis. Energies 11(2) 8. Analysis of the Cyber Attack on the Ukrainian Power Grid. https://ics.sans.org/media/E-ISA CSANSUkraineDUC 9. Locus energy lgate command injection vulnerability, 2016. [Online]. Available: https://ics-cert. us-cert.gov/advisories/ICSA-16-231-01-0 10. Shateri M, Messina F, Piantanida P, Labeau F (2020) Real-time privacy-preserving data release for smart meters. IEEE Trans Smart Grid
11. Jiang H, Luo W, Zhang Z, A privacy-preserving aggregation scheme based on immunological negative surveys for smart meters. Appl Soft Comput J (2019) 12. Sun A, Wu A, Zheng X, Ren F (2019) Efficient and privacy-preserving certificateless data aggregation in Internet of things–enabled smart grid. Int J Distrib Sens Netw 15(4) 13. Zhang S, Zheng T, Wang B (2019) A privacy protection scheme for smart meter that can verify terminal’s trustworthiness. Electr Power Energy Syst 108:117–124 14. Sankar L, Rajagopalan SR, Mohajer S, Poor HV (2013) Smart meter privacy: a theoretical framework. IEEE Trans Smart Grid 4:837–846 15. Thorve S, Kotut L, Semaan M (2018) Privacy preserving smart meter data. In: Proceedings of international workshop on urban computing (UrbComp’18), ACM, USA 16. Rial A, Danezis G (2011) Privacy-preserving smart metering. In: Chen Y, Vaidya J (eds) WPES. ACM, New York, pp 49–60 17. Rial A, Danezis G, Kohlweiss M (2018) Privacy-preserving smart metering revisited. Int J Inf Secur 17:1–31 18. Werner S, Lunden J (2016) Smart load tracking and reporting for real- time metering in electric power grids. IEEE Trans Smart Grid 7(3):1723–1731 19. Gong Y, Cai Y, Guo Y, Fang Y (2016) A privacy-preserving scheme for incentive-based demand response in the smart grid. IEEE Trans Smart Grid 7:1304–1313 20. Saputro N, Akkaya K (2015) PARP-S: A secure piggybacking-based ARP for IEEE 802.11 s-based smart grid AMI networks. Comput Commun 58:16–28 21. Paverd A, Martin A, Brown I (2014) Privacy-enhanced bi-directional communication in the smart grid using trusted computing. In: IEEE international conference smart grid communications, pp 872–877 22. Paverd AJ (2015) Enhancing communication privacy using trustworthy remote entities. Ph.D. dissertation, University of Oxford 23. Mahmoud M, Saputro N, Akula P et al (2017) Privacy-preserving power injection over a hybrid ami/lte smart grid network. IEEE Internet of Things J 24. 
Zhang Y, Zhao J, Zheng D (2017) Efficient and privacy-aware power injection over AMI and smart grid slice in future 5g networks. Mob Inf Syst 25. Kim Y, Kolesnikov V, Thottan M (2016) Resilient end-to-end message protection for cyberphysical system communications. IEEE Trans Smart Grid 26. Armando A, Basin D, Boichut Y, Chevalier Y, Compagna L, Cuellar J, Drielsma PH, Heam P-C, Kouchnarenko O, Mantovani J (2005) The avispa tool for the automated validation of internet security protocols and applications. In: International conference on computer aided verification. Springer, pp 281–285 27. Blanchet B, Smyth B, Cheval V (2015) Proverif 1.90: automatic cryptographic protocol verifier, user manual and tutorial. http://prosecco.gforge.inria.fr/personal/bblanche/proverif/man ual.pdf 28. Bai G, Hao J, Wu J, Liu Y, Liang Z, Martin A (2014) Trustfound: towards a formal foundation for model checking trusted computing platforms. In: International symposium on formal methods. Springer, pp 110–126 29. Fawaz A, Berthier R, Sanders WH (2016) A response cost model for advanced metering infrastructures. IEEE Trans Smart Grid 7(2):543–553 30. Smart grid: Intergrative security assessment. [Online]. Available: http://publish.illinois.edu/int egrative-security-assessment/
Detecting Image Forgery Over Social Media Using Residual Neural Network
Bhuvanesh Singh and Dilip Kumar Sharma
Abstract Social media have become a popular channel for consuming news because of their quick proliferation and availability. However, they are also the main contributor to the spread of fake news. Fake images spread over microblogging platforms like Twitter generate misrepresentation and arouse destructive emotions in consumers. This makes the detection of fake images on social platforms an extremely critical challenge. Deep learning methods learn the latent features of images and can be used to detect fake images. In this paper, we use ResNet-50, a residual neural network, to detect fake images. The images are passed through the Error Level Analysis process before being input to the deep learning model, in order to suppress the image's main features and amplify the latent features of manipulation. The model is verified against the Twitter image dataset. The experiment shows that residual neural networks perform well in detecting fake images on social media platforms.
Keywords Error level analysis · Fake images · Residual network · Social media
B. Singh (B) · D. K. Sharma, GLA University, Mathura, India. D. K. Sharma e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_32

1 Introduction
Fake news has long been spreading in society. It was initially spread for fun or as satire. Later, it was used to propagate narratives or political propaganda. Social media platforms, owing to their availability, are the biggest contributors to the spread of fake news. A 2020 survey conducted by the Norwegian Media Authority [1] reached a similar conclusion: its results showed that social media platforms played a significant role in spreading false news about the coronavirus. Fake images play a vital role in fake news. Digitally manipulated images are propagated over social platforms. Global tech giants such as Adobe, Facebook, and Google are investing in Artificial Intelligence (AI) applications to counter the fake images and videos flooding the internet. Figure 1 shows examples of fake images: a five-headed snake (Fig. 1a) and the Statue of Liberty during Hurricane Sandy in the USA (Fig. 1b). These fake photos went viral on social media platforms.

Fig. 1 a Five-headed snake [2], b Hurricane Sandy in the USA [2]
1.1 Motivation
The use of fake images in fake news is increasing day by day. Fake images are widely used to arouse rage and polarize people's sentiments. Their impact is most damaging when it leads to grave consequences such as mob lynching, religious feuds, and wrong treatment advice to patients. Deepfakes have raised security concerns since their inception. There is a dire need for solutions that detect fake images on social media platforms, check the proliferation of tampered photos, and mitigate their impact on consumers. Sharma and Sharma [3] describe various detection methods. Hand-crafted forensic feature-based methods fail to spot fake images on social platforms, mainly because these feature-based approaches do not work well on images that have undergone multiple tampering operations. Deep learning models can play a significant role here: they can learn the intrinsic features of images and thus classify them as fake or real. This paper therefore proposes the use of a residual network, a deep learning-based approach, to spot fake images on social platforms.
1.2 Related Work
Images can be digitally altered in various ways. Copy-move, image splicing, resampling, and compression are the most commonly used techniques. Numerous software tools are available, such as GIMP, Paint.Net, Photoshop, and Pixlr. Manipulated images can be detected using conventional feature-oriented forensic techniques or using deep learning methods that learn the feature set by themselves. The concern with forensic techniques is that each technique is suitable for an individual manipulation type. For detecting copy-move tampering, forensic approaches primarily use Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) coefficients [4–7]. Other novel methods such as multiscale Weber local descriptor (WLD) histograms [8] and fractional Zernike moments (FrZMs) [9] are also used. Similarly, for spotting image splicing, color filter array (CFA) artifacts [10, 11], the discrete octonion cosine transform (DOCT) [12], and various histogram gradients [13, 14] are applied. Various experiments using handcrafted forensic techniques achieved high accuracy when a single manipulation technique was used. As soon as multiple tampering techniques such as rotation, resampling, mirroring, and compression were used along with copy-move or image splicing, accuracy suffered. Fake images typically undergo multiple rounds of tampering and alteration before being shared over social media platforms. Their quality is deliberately degraded by adding blur and noise so that the manipulation becomes hard to detect. Deep learning models have been found to perform better on images with multiple manipulations. Their advantage is that the model learns the latent features of fake images by itself from the training data.

Earlier, Rao and Ni [15] suggested a Convolutional Neural Network (CNN)-based model for the detection of fake images. Bappy et al. [16] proposed a hybrid CNN-LSTM model for spotting fake images. Rehman et al. [17] proposed LiveNet, an optimized deep learning framework that continuously picks random mini-batches from the full training set at each training iteration. Liu and Pun [18] used multiple layers of DenseNet. Xiao et al. [19] proposed C2RNet, a two-level multi-branch CNN framework. Another approach is to use multiple content types. Jin et al. [20] integrated multiple content types to detect fake news. Their solution used an RNN with an attention mechanism to fuse features of the visual, textual, and social context. Text and social context were first combined with an LSTM into a joint representation, which was then fused with image features extracted from a deep CNN. Mangal and Sharma [21] used the cosine similarity index between text embedded in images and headline text to detect fake news; their model used a CNN-LSTM framework. In this paper, we use ResNet-50 as the residual neural network model and apply ELA in pre-processing to bring out the latent features of images.
2 System Design
In this study, we take a residual neural network approach to detecting fake images on social platforms. ResNet-50 is used as the deep learning model. Error Level Analysis (ELA), proposed by Krawetz [22], exploits the fact that the JPEG resaving error is not linear, and it is used here as a pre-processing step. The architecture is illustrated in Fig. 2. Images from the dataset are first normalized, and various image augmentation techniques are applied to increase the image count. All images are then passed through ELA, and the ELA-generated images are used as input to the ResNet-50 model.

Fig. 2 Architecture of the proposed system
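The first two steps of the pipeline described above, normalization and augmentation, can be sketched in plain Python as follows. The paper does not say which augmentation techniques were used, so the horizontal flip below is only an assumed example, and the function names `normalize`, `hflip`, and `augment` are hypothetical. Images are represented as nested lists of 8-bit grey values to keep the sketch free of dependencies.

```python
def normalize(image):
    """Scale 8-bit pixel values (0..255) to floats in [0.0, 1.0]."""
    return [[v / 255.0 for v in row] for row in image]


def hflip(image):
    """Mirror the image left to right (an assumed augmentation, not from the paper)."""
    return [list(reversed(row)) for row in image]


def augment(images):
    """Return the original images plus one flipped copy of each,
    which doubles the image count."""
    return images + [hflip(img) for img in images]


# Tiny 2x3 grey-scale "image" used only for illustration.
dataset = [[[0, 51, 255],
            [102, 153, 204]]]

prepared = augment([normalize(img) for img in dataset])
```

In a real pipeline the augmented images would then be passed through ELA before being fed to the network; only the image-count increase is shown here.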
2.1 Error Level Analysis (ELA)
Error level analysis (ELA) is a method for recognizing tampered images by resaving an image at a specified quality level and then calculating how much each region changes under that compression. ELA makes it possible to distinguish regions within a picture that are at different compression levels. In a JPEG image, the compression level should be almost the same across the entire picture. If a region of the picture is at a substantially different error level, this suggests digital manipulation. Figure 3a shows a fake image posted on Facebook during Hurricane Sandy, and Fig. 3b shows the ELA image generated from it. The edges where manipulation was applied are clearly visible. This ELA image was generated during the proposed experiment.
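The principle behind ELA can be illustrated with a toy sketch. Real ELA resaves a JPEG at a known quality and compares the result with the original; here, as a deliberately simplified stand-in, uniform quantization with a hypothetical step size plays the role of JPEG compression, and images are nested lists of grey values. Regions that were already compressed at that level do not change when compressed again, whereas a pasted region that was not compressed shows a higher error level. The function names are illustrative and not taken from the paper.

```python
def lossy_resave(image, step=16):
    """Toy stand-in for JPEG re-saving: snap each value to a multiple of step."""
    return [[min(255, round(v / step) * step) for v in row] for row in image]


def error_level(image, step=16):
    """Per-pixel absolute difference between an image and its re-saved copy."""
    resaved = lossy_resave(image, step)
    return [[abs(a - b) for a, b in zip(row, rrow)]
            for row, rrow in zip(image, resaved)]


# A 3x4 image that has already been "compressed" once,
# so every value is a multiple of the step size.
original = lossy_resave([[30, 45, 70, 90],
                         [33, 50, 66, 95],
                         [35, 47, 72, 99]])

# Simulated tampering: paste uncompressed values into the top-right corner.
tampered = [row[:] for row in original]
tampered[0][2], tampered[0][3] = 37, 41

# Non-zero entries mark pixels whose error level differs from the rest.
ela_map = error_level(tampered)
```

Only the two pasted pixels produce a non-zero error in `ela_map`, which mirrors how edited regions stand out in a real ELA image.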
Fig. 3 a Fake image shared over Facebook during Hurricane Sandy [23], b ELA-generated image of the same fake image

2.2 ResNet-50
ResNet-50 is a deep convolutional residual neural network with 50 layers. ResNet is known for introducing the skip connection. The residual (skip) connection mitigates the common problem of vanishing gradients in deep neural networks. It also allows the model to learn the identity function, which ensures that higher layers perform at least as well as lower layers. Figure 4 illustrates the skip connection concept, in which the shortcut is carried forward and added to the output of a later convolution layer, before its activation function. ResNet-50 uses ReLU as the activation function.

Fig. 4 Skip connection as used in ResNet-50 [24]

In the experiment, the Adam optimizer is used to minimize the loss function, which is cross-entropy. The cross-entropy is calculated as follows [25]:

$$C(P, Q) = -\sum_{x \in X} P(x) \log Q(x)$$

Here, C(P, Q) is the cross-entropy function, P(x) is the probability of event x in P, and Q(x) is the probability of event x in Q.
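The skip connection and the cross-entropy loss described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: vectors are plain lists, the `layer` argument is a hypothetical stand-in for the convolution layers, the natural logarithm is assumed, and `eps` is an added guard against log(0).

```python
import math


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def residual_unit(x, layer):
    """Skip connection: add the input x to the layer output,
    then apply the activation."""
    fx = layer(x)
    return relu([f + xi for f, xi in zip(fx, x)])


def cross_entropy(p, q, eps=1e-12):
    """C(P, Q) = -sum over x of P(x) * log(Q(x)); eps guards against log(0)."""
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))


# Hypothetical layer that outputs all zeros: the unit then reduces to
# relu(x), which is the identity for non-negative inputs.
out = residual_unit([0.5, 2.0], lambda x: [0.0 for _ in x])

# One-hot true distribution versus a hypothetical predicted distribution.
loss = cross_entropy([1.0, 0.0], [0.8, 0.2])
```

The zero-output layer shows why a residual unit can fall back to the identity mapping: the input passes through the shortcut unchanged.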
3 Experiment Results
The model was tested against the MediaEval [2] Twitter dataset. The dataset has 413 items: 193 cases of real images, 218 cases of fake images, and two cases of altered videos. After image augmentation, the total number of images used in the experiment was 632. An overall accuracy of 85.17% was achieved. Table 1 displays the evaluation results of the experiment. As fake image detection is a binary classification problem, the evaluation parameters are calculated from a confusion matrix:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$

Here TP, FP, and FN denote true positives, false positives, and false negatives. With β = 1, F_β is the F1-score, the harmonic mean of Precision and Recall. With β = 2, it is the F2-score, which places more emphasis on Recall than on Precision. Figure 5 illustrates the training/validation loss graph captured during the experiment. Table 2 compares the results with another method. The residual network, which uses only a CNN-based framework, is observed to perform better than the multi-modal approach.

Table 1 Experiment results
| Proposed model | Dataset | Accuracy | F1-score | F2-score |
| ELA + ResNet-50 | MediaEval | 85.17% | 0.85254 | 0.8489 |

Fig. 5 Train/validation loss graph

Table 2 Comparison of accuracy with other method
| Model | Dataset | Accuracy (%) |
| Att-RNN (Jin et al. [20]) | MediaEval | 74.17 |
| ELA + ResNet-50 (proposed) | MediaEval | 85.17 |
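The evaluation metrics defined in this section can be computed from confusion-matrix counts as in the following sketch. The counts used here are hypothetical and chosen only to exercise the formulas; they are not the paper's results, and the function names are illustrative.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def f_beta(tp, fp, fn, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    p, r = precision(tp, fp), recall(tp, fn)
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical confusion-matrix counts, NOT taken from the experiment.
tp, fp, fn = 6, 2, 4

f1 = f_beta(tp, fp, fn, beta=1)  # harmonic mean of precision and recall
f2 = f_beta(tp, fp, fn, beta=2)  # weights recall more heavily
```

With these counts precision (0.75) exceeds recall (0.6), so the F2-score comes out lower than the F1-score, which reflects its stronger emphasis on recall.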
4 Conclusion
In this study, we proposed a deep learning-based approach to detecting fake images shared over social media platforms. At the pre-processing stage, in addition to regular normalization, each image was processed through ELA. The ELA images were then forwarded to the ResNet-50 DNN model, which learned to detect fake images. The experiment used fake and real images from Twitter to verify the results. An accuracy of 85.17% was achieved, which indicates that a deep learning model can perform well in detecting fake images on social media platforms. It should be noted that a deep learning model requires a massive volume of data to learn, so we can reasonably expect accuracy to improve further if more social media data is fed into the model. A few limitations were also observed. When an image is very small and has undergone multiple rounds of compression, the model finds it challenging to make a prediction. In addition, the proposed solution has not been verified against fake images generated by generative adversarial networks. This will be taken up in a further part of the study.
References
1. Stoll J (2020) Reading fake news about the Coronavirus in Norway 2020, by source. https://www.statista.com/statistics/1108710/reading-fake-news-about-the-coronavirus-in-norwayby-source/. Accessed May 2 2020
2. Boididou C, Papadopoulos S, Dang-Nguyen D-T, Boato G, Riegler M, Middleton SE, Petlund A, Kompatsiaris Y (2016) Verifying Multimedia Use at MediaEval 2016
3. Sharma S, Sharma DK (2019) Fake news detection: a long way to go. In: 4th international conference on information systems and computer networks (ISCON), pp 816–821
4. Li G, Wu Q, Tu D, Sun S (2007) A sorted neighborhood approach for detecting duplicated regions in image forgeries based on DWT and SVD. In: Proceedings of IEEE international conference on multimedia and expo (ICME'07), IEEE, Beijing, China, pp 1750–1753
5. Mahmood T, Nawaz T, Irtaza A, Ashraf R, Shah M, Mahmood MT (2016) Copy-move forgery detection technique for forensic analysis in digital images. Hindawi Publishing Corporation Mathematical Problems in Engineering 8713202, 13
6. Jwaid MF, Baraskar TN (2017) Study and analysis of copy-move & splicing image forgery detection techniques. In: International conference on I-SMAC (IoT in social, mobile, analytics, and cloud) (I-SMAC), pp 697–702
7. Alamro L, Nooraini Y (2017) Copy-move forgery detection using integrated DWT and SURF. J Telecommun, Electron Comput Eng (JTEC), pp 67–71
8. Hussain M, Qasem S, Bebis G, Muhammad G, Aboalsamh H, Mathkour H (2015) Evaluation of image forgery detection using multiscale weber local descriptors. Int J Artif Intell Tools 24(4):1540016
9. Chen B, Yu M, Su Q, Shim HJ, Shi Y (2018) Fractional Quaternion Zernike moments for robust color image copy-move forgery detection. IEEE Access, 56637–56646
10. Popescu AC, Farid H (2005) Exposing digital forgeries in color filter array interpolated images. IEEE Trans Signal Process, 3948–3959
11. Ferrara P, Bianchi T, Rosa AD, Piva A (2012) Image forgery localization via fine-grained analysis of CFA artifacts. IEEE Trans Inf Forensics Secur 7(5):1566–1577
12. Sheng H, Shen X, Lyu Y, Shi Z, Ma S (2018) Image splicing detection based on Markov features in discrete octonion cosine transform domain. IET Image Proc 12(10):1815–1823
13. Mazumdar A, Bora PK (2016) Exposing splicing forgeries in digital images through dichromatic plane histogram discrepancies. In: Proceedings of the Tenth Indian conference on computer vision, graphics and image processing, 62, pp 1–8
14. Jaiswal AK, Srivastava R (2020) A technique for image splicing detection using hybrid feature set. Multimed Tools (2020), 11837–11860
15. Rao Y, Ni J (2016) A deep learning approach to detection of splicing and copy-move forgeries in images. In: IEEE international workshop on information forensics and security
16. Bayar B, Stamminger MC (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. In: IH&MMSec, Proceedings of the 4th ACM workshop on information hiding and multimedia security, pp 5–10
17. Rehman YAU, Po LM, Liu M (2018) LiveNet: Improving features generalization for face liveness detection using convolution neural networks. Expert Syst Appl 108:159–169
18. Liu B, Pun C-M (2020) Exposing splicing forgery in realistic scenes using deep fusion network. Inf Sci 2020(526):133–150
19. Xiao B, Wei Y, Bi X, Li W, Ma J (2020) Image splicing forgery detection combining coarse to refined convolutional neural network and adaptive clustering. Inf Sci 511:172–219
20. Jin Z, Cao J, Guo H, Zhang Y, Luo J (2017) Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In: ACM on multimedia conference, p 795
21. Mangal D, Sharma DK (2020) Fake news detection with integration of embedded text cues and image features. In: 8th international conference on reliability, Infocom technologies and optimization (trends and future directions) (ICRITO), pp 68–72
22. Krawetz N (2007) A picture's worth… Hacker Factor Solutions. https://www.hackerfactor.com/papers. Accessed May 10 2020
23. Robertson A (2012) During hurricane sandy, misinformation and fact-checking clash on Twitter. https://www.theverge.com/2012/10/30/3577778/hurricane-sandy-twitter-instagrammisinformation
24. Dwivedi P Understanding and coding a ResNet in Keras. https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33
25. Durfee C (2019) A gentle introduction to cross-entropy for machine learning. https://www.aiproblog.com/index.php/2019/10/20/a-gentle-introduction-to-cross-entropy-for-machine-learning/. Accessed July 4 2020
Content-Based Video Retrieval Based on Security Using Enhanced Video Retrieval System with Region-Based Neural Network (EVRS-RNN) and K-Means Classification
B. Satheesh Kumar, K. Seetharaman, and B. Sathiyaprasad
Abstract The recent growth in video data calls for more capable video retrieval techniques that improve search quality and understand video content at a higher level. Searching for relevant secured video in a collection remains a major challenge for the digital research community, and a major problem is the external information brought in from digital resources. This research therefore proposes a secure content-based video retrieval (CBVR) framework that selects features directly from the ciphertext domain on the cloud server side. Highly secure data is stored in the cloud using an enhanced video retrieval system with region-based neural network (EVRS-RNN). Retrieval accuracy is improved using an ensemble classifier based on linear discriminant analysis (LDA), and relevant video frames are retrieved using the K-means classification technique. Simulation results indicate efficient security on the measured parameters and improved accuracy.
Keywords CBVR · Enhanced video retrieval system with region-based neural network (EVRS-RNN) · Linear discriminant analysis (LDA) · K-means classification technique · Accuracy · Video frames
B. Satheesh Kumar (B) · B. Sathiyaprasad, Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Tamil Nadu 608002, India
K. Seetharaman, Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu 608002, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_33

1 Introduction
Recently, techniques have been presented for retrieving video on the basis of its characteristic appearance. Commonly used characteristics are color, texture, shape, motion, and spatio-temporal composition, which are compared with the corresponding features of the visuals. With advances in storage capability, ubiquitous broadband web services, and cost-effective digital cameras along with video editing tools that can output well-organized video frames, researchers have been developing video search technologies for a number of years. Video production, together with video delivery technology, has brought the need for such tools to the forefront. Video retrieval continues to be one of the most exciting and fastest-growing research areas in the field of multimedia technology [1]. Despite sustained efforts in recent years, the central challenge remains bridging the semantic gap. By this, we mean that low-level features are easily measured and computed, but the starting point of the retrieval process is typically a high-level query from a human [2]. Translating or converting the query posed by a human into the low-level features seen by the computer is the problem of bridging the semantic gap. However, the semantic gap is not merely a matter of translating high-level features into low-level features. The essence of a semantic query is to understand the meaning behind the query. This can involve understanding both the intellectual and emotional sides of the human: not only the distilled logical portion of the query but also the personal preferences and emotional subtones of the query and the preferred form of the results. Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision to the video retrieval problem, that is, the problem of searching for video in large databases [3]. CBVR is used to identify an authenticated user by applying face recognition techniques. In this case, the authorized user is searched for by extracting the biometric data stored in the video database, which is then stored in the cloud by a third party.
If the user is authenticated by confirming the similarity, the user can log in to the account and retrieve the accountable work furnished by unknown users. On the other hand, the cloud faces data security problems that grow rapidly with the demands of secrecy and confidentiality. A further security issue is that the cloud obtains most of its resources from external data, so users are unable to control their own data. It has recently been shown that outsourced data is subject to several security threats, which can be external (e.g., hackers) or internal [4]. Various techniques have therefore been explored to secure CBVR methods. They are distinguished by whether the protected video frame signatures are computed locally or globally, by how encryption is combined with signature comparison, and by how the encryption module is applied together with signature extraction and examination. Depending on how the signatures are handled, the best method is one in which the signatures of the outsourced video frames are computed in advance. In this scenario, the server checks signature similarity. When techniques such as AES or 3-DES are used to encrypt the video frames, the unintelligible video data cannot be processed until the frames are decrypted, so the secured CBVR work is shared between the client and the server. An alternative is to extract the features directly from the unintelligible (encrypted) video frames. This is done with homomorphic encryption, which supports useful operations (e.g., "SUM", "DIVIDE", "SUBTRACT") on unintelligible data with the assurance that the decrypted outcomes equal those obtained on the original data [5]. Recently, the convolutional neural network (CNN) has become a compelling deep learning technique: video frames retrieved with it are more effective and more accurate than with conventional human annotation. Another compatible technique is approximate nearest neighbor (ANN) search. Knowledge-based secured video frame retrieval therefore relies on features extracted by a pre-trained CNN, and the securing technique must support both secret addition and multiplication operations. These challenges can be met by fully homomorphic encryption techniques such as lattice-based schemes; however, such models are unsuited to managing large video frame databases because of their high processing complexity. To support secure video frame retrieval, another approach is secure multi-party computation (SMC) for computing video frame similarity. It requires two entities, i.e., the cloud and the client, to interact continually, which is not practicable for regular use by mobile users [6].
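The homomorphic property described above, where an operation on encrypted data decrypts to the same result as the operation on the original data, can be illustrated with textbook RSA, which is multiplicatively homomorphic. This is only a toy sketch with tiny, insecure parameters. It is not the scheme used in this work, and it supports multiplication only, whereas fully homomorphic schemes support both addition and multiplication.

```python
# Toy textbook-RSA key (insecure, for illustration only).
p, q = 61, 53
n = p * q   # modulus, 3233
e = 17      # public exponent
d = 2753    # private exponent: (e * d) % ((p - 1) * (q - 1)) == 1


def encrypt(m):
    """Encrypt an integer 0 <= m < n with the public key."""
    return pow(m, e, n)


def decrypt(c):
    """Decrypt a ciphertext with the private key."""
    return pow(c, d, n)


a, b = 7, 3

# A server can multiply the two ciphertexts without ever seeing a or b.
c_product = (encrypt(a) * encrypt(b)) % n

# Decrypting the product of ciphertexts yields (a * b) % n.
result = decrypt(c_product)
```

The design point is that the party doing the computation needs only the ciphertexts and the public modulus, while the private exponent stays with the data owner.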
2 Review of Literature
This section discusses existing CBVR techniques and their classification techniques. The author of [7] designed a novel cross-modal CBVR method, based on deep neural networks, that can also be used for music. The network was trained with an inter-modal ranking loss under which music and videos with similar semantics end up close together in the embedding space, at the cost of losing modality-specific characteristics. This was addressed by a soft intra-modal structure loss that constrains the distances between intra-modal samples to those before embedding. The lack of standard protocols for video-music tasks was also an issue, which was addressed by using quantitative as well as qualitative protocols. Another system, IMOTION, is a sketch-based video retrieval engine that supports multiple query types. It uses a wide range of low-level image and video features for retrieval in vector space, jointly with high-level spatial and temporal features. Dedicated motion features are additionally supported to specify motion in the video sequence. For specifying queries, the system supports query-by-sketch, query-by-example, motion queries, and any combination of these [8].

Subsequent work focused on video retrieval by image querying. Because approaches that extract handcrafted features from videos have limitations related to memory, deep features, derived from a deep neural network, were used to overcome them. In particular, deep features were used to detect shots with similar key-frames, which were then represented using various aggregation strategies, so that redundant video key-frames need not be stored. Further, a two-way localization approach was used to find the best-matched area between the query and video key-frames, and this similarity was used to re-rank the initial retrieval output [9]. Beyond query-based video retrieval, the work in [10] designed a quantization-based hashing (QBH) technique that combines the benefits of quantization-based approaches with those of similarity-preserving hashing. QBH preserves the similarity property with low quantization error. The framework is applicable to both supervised and unsupervised hashing. Experiments were carried out on real-life image and video datasets; compared with conventional methods, QBH achieved higher accuracy at the same computational time [10].

An indexing approach is presented in [11], which introduces content-based lecture video retrieval and indexing. Content-based metadata was extracted automatically from the audio and video tracks of lecture videos. Several indexing features were built on this metadata for a large lecture video portal, and a user study was conducted. Human action retrieval is discussed in [12], which investigates recently developed techniques for representing actions and retrieving information in a human action retrieval system. These techniques involve a series of operations. In that study, several techniques were combined, and the performance of the resulting hybrid technique was analyzed on realistic action datasets, namely UCF YouTube, UCF Sports, and HOHA2.
2.1 Problem Statement
• When video frames are retrieved from the cloud, their content is represented by low-level, high-dimensional features that vary in color, texture, edges, or any combination of these.
• Hacking CBVR is quite different from hacking a cryptosystem, where disclosure of the secret key grants full decryption of the ciphertexts. Here, the success or failure of an attack cannot simply be measured as a binary outcome.
• Attacking video frames results in content manipulation that induces distortion.
• To improve the quality of the results and/or reduce query response times, most CBVR systems use various optimizations.
2.2 Motivation of the Work
• To protect the data in images stored on cloud infrastructure for large-scale image storage and retrieval, using EVRS-RNN (enhanced video retrieval system with region-based neural network).
• To improve retrieval accuracy using linear discriminant analysis (LDA), and to retrieve relevant images with appropriate similarity to the query image using the K-means classification technique.
Content-Based Video Retrieval Based on Security Using Enhanced …
3 Research Methodology
The overall architecture of the research methodology is illustrated by the block diagram in Fig. 1. Initially, the system takes the data from the stored database and improves the feature dimensionality on the basis of color, texture, and edges (the descriptors of the video frames). The large volume of secured data is stored in the cloud with a model based on a knowledge-based neural network. For this stored data, the accuracy of the retrieved data is improved and the relevant video frames are retrieved using a classification technique. From this, the classified output is obtained and retrieved with high security. First, the input video frame images are taken and preprocessed for noise removal. Then the features of each image are extracted using MVGG_16, and the extracted features are classified. Next, a test image is taken from the query set and its features are likewise extracted using MVGG_16. The chi-squared distance is then calculated, which yields the ranking of the test data for the input image. Finally, the similarity measures of the images from the training set and the query set are compared, and the enhanced video frames are retrieved. MVGG_16 has been implemented and its accuracy measured; as an extension of MVGG_16, we enhance the proposed research with a security improvement for video retrieval. The proposed architecture for this secured content-based video retrieval is given in Fig. 2, which represents the working architecture of the proposed methodology. First, the input dataset is trained; in that process, the video frames are preprocessed. In preprocessing, the video frames are resized and the noise is removed;
Fig. 1 General architecture diagram of the research methodology
B. Satheesh Kumar et al.
Fig. 2 Architecture proposed methodology
the video frames are thereby cleaned. The noise-removed video frames are then passed to MVGG_16 for feature extraction. The output is the set of trained video frames, which are stored and searched for a match against the database in the cloud architecture. Here the security is enhanced using EVRS-RNN (enhanced video retrieval system with region-based neural network). After this secured search, the data accuracy is calculated using LDA for the trained data, with the test video also included in the calculation. The relevant video frames are then retrieved using K-means classification. A ranking matrix is then computed to obtain the appropriate classified output, and finally the output is the retrieved video frames with high security and accuracy.
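The ranking step described above (chi-squared distance between the query features and the stored features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors stand in for MVGG_16 outputs, the function names are hypothetical, and the 0.5 scaling factor and the small `eps` term are one common convention for the chi-squared distance that the source does not specify.

```python
from typing import Dict, List, Sequence, Tuple


def chi_squared_distance(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Chi-squared distance between two non-negative feature vectors.

    Assumption: the symmetric form 0.5 * sum((a - b)^2 / (a + b)) is used;
    eps avoids division by zero when both components are 0.
    """
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def rank_frames(query: Sequence[float],
                database: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (frame name, distance) pairs sorted from most to least similar."""
    scored = [(name, chi_squared_distance(query, feat)) for name, feat in database.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 3-bin feature histograms standing in for extracted deep features.
database = {
    "frame_a": [0.2, 0.3, 0.5],
    "frame_b": [0.5, 0.3, 0.2],
    "frame_c": [0.3, 0.3, 0.4],
}
ranking = rank_frames([0.2, 0.3, 0.5], database)
```

A smaller distance means a closer match, so the first entry of `ranking` is the frame that would be retrieved first.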
3.1 Cloud Architecture
The architecture consists of a few blocks, i.e., the cloud servers (C1, C2) and (multiple) customers. The authenticated customer encrypts the trained video frames and stores the unreadable video data on the cloud servers; the servers accept content-based query video frames from the users. The encrypted data is stored in the cloud servers so that content-based image retrieval can be executed over a huge cloud database with less computation power. Two cloud servers, C1 and C2, store the unreadable encrypted sub-video frames, respectively, as shown in Fig. 3. C1 keeps its private key sk, and EVES operations are performed in C2 to carry out the computations on the nonlinear layers with high security. The cloud servers C1 and C2 share the features extracted from the model and the encrypted video frames, respectively. The users submit the similar nearest neighbor
Fig. 3 System architecture with the MVGG_16 model
(SNN) query together with the unintelligible video frames. The video frames uploaded by the users of the trained model are converted to an encrypted data format by the user and are used to detect a test video. For a trained image I, the user obtains the encrypted video frame Ia by adding a stochastic matrix Ib to it. Ia is then transferred to the cloud server C1, and Ib is transmitted to C2. For an exploratory SNN query, a customer generates the test video trapdoor in a similar pattern. Subsequently, having accepted Ia, C1 carries out the enhanced video retrieval system (EVRS) protocols with C2 to convert the unintelligible image Ib to the primary image with the key sk.
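The splitting of a frame I into an encrypted part Ia (sent to C1) and a stochastic matrix Ib (sent to C2) can be illustrated with a small additive-masking sketch. It rests on assumptions the text does not state: pixels are 8-bit values, the addition is taken modulo 256, and the function names `split_frame` and `recover_frame` are hypothetical. It does not model the key sk or the EVRS protocol between the two servers.

```python
import secrets
from typing import List, Tuple

Frame = List[List[int]]


def split_frame(frame: Frame, modulus: int = 256) -> Tuple[Frame, Frame]:
    """Mask a frame I with a random matrix Ib, giving Ia = (I + Ib) mod modulus."""
    mask = [[secrets.randbelow(modulus) for _ in row] for row in frame]
    masked = [[(pixel + noise) % modulus for pixel, noise in zip(row, mask_row)]
              for row, mask_row in zip(frame, mask)]
    return masked, mask  # Ia goes to server C1, Ib goes to server C2


def recover_frame(masked: Frame, mask: Frame, modulus: int = 256) -> Frame:
    """Undo the masking: I = (Ia - Ib) mod modulus. Needs both shares."""
    return [[(a - b) % modulus for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(masked, mask)]


frame = [[67, 12], [255, 0]]          # toy 2x2 grayscale frame
share_c1, share_c2 = split_frame(frame)
restored = recover_frame(share_c1, share_c2)
```

Under these assumptions neither share alone determines the frame, while the two together restore it exactly.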
3.2 Enhanced Video Retrieval System (EVRS) with Region-Based Neural Network (EVRS-RNN)
Autoencoders are a class of knowledge-based neural networks. These networks are unsupervised and are trained so that the output vector matches the input vector. The network takes the video frames from the cloud server C1 as input and preprocesses them to reduce the dimension; resizing the input video frames further contributes to the dimensionality reduction of the data. The proposed neural network obtains the compressed data by encoding the data in an unsupervised way. Moreover, the network is trained one layer at a time, which minimizes the computational complexity, and reconstruction is required to obtain an efficient model. The input images are reduced in the hidden layer, and the constructed output video frames are shown in Fig. 4; the data encoding (i.e., feature compression) is done by the network. For noise removal, an autoencoder is trained to be robust to noise in its input; this is called a denoising autoencoder. This type of technique has higher generalization and robustness than typical autoencoders.
Fig. 4 Procreative intruder network
To learn the secured data of the security system on the basis of its features, namely random similarity and parameter sensitivity, the neural network uses a procreative intruder network (PIN). Data encryption is the technique of converting video frame data into an unintelligible form by means of a key. At present, the security system is deployed with two encryption techniques, i.e., stream cipher and block cipher. To generate a pseudo-random sequence for retrieving the video frames, the stream cipher integrates a neural network and uses its parameters as the key. The generator takes input video frames from the video database and renders unintelligible data with characteristics analogous to the original data, as shown in Fig. 4. The discriminator outputs the real data by decrypting the encrypted data from the generator, and it endeavors to differentiate whether the input is real or fake. Through training of the network, the generator learns to produce new data elements that are not distinguishable from real data.
3.2.1 Encryption Module
Every pixel of the input image is encrypted and transformed using permutation, substitution, and impurity addition. The image is encrypted in two levels. The algorithm below shows the necessary transformations.
Algorithm Enhanced video retrieval system (EVRS)
First level of encryption (steps 1 to 8):
Step 1: Initially, the pixel value of the video frame is represented in binary, with each pixel as 8 bits [01000011].
Step 2: Separate the 8-bit pixel value into two 4-bit nibbles, [0100] and [0011].
Step 3: In substitution, the two nibbles are swapped and concatenated to form a byte [00110100].
Step 4: The least significant nibble [0011] of the primary data is EX-ORed with the most significant nibble [0100], which gives [0111].
Step 5: The EX-ORed result is shifted left by 5 bits to obtain the 9-bit number [011100000].
Step 6: The binary values from step 3 and step 5 are EX-ORed, and the outcome is [011010100] = [212].
Step 7: A computed value is added to the outcome of step 6. The value is selected randomly, here 117, so that 212 + 117 = 329.
Step 8: Steps 1 to 7 are applied to every pixel of the video frames file.
Addition of two columns:
Step 9: In this step, two new columns are added; the value 117 is substituted into the first new column and 627 into the second new column. This is done to normalize the matrix.
Second level encryption:
Step 10: In this step, another level of computation is applied to the output matrix acquired in step 9, such that the computation changes with regard to the position of the pixel.
3.2.2 Decryption Module
The receiver end trains this model on the decrypted values for the standard mapping, and the weights and biases are evaluated along with the encrypted pixel values before these are transformed into the decrypted values. The architecture of three-layer backpropagation in RNN is shown in Fig. 5, where
W_i: weight matrix of the ith layer
f: activation function
b_ij: bias of the jth neuron in the ith layer.
Fig. 5 Architecture of three-layer backpropagation in RNN
The architecture is designed with three basic layers: the input, hidden, and output layers. According to the structure model of the encrypted value, only one neuron is present in each of the input and output layers, whereas the hidden layer has 627 neurons. The neuron calculation is carried out along with the encrypted data. The decryption process is accomplished in three stages. The first stage is preprocessing, in which the position-dependent computation applied to each pixel is removed. In the second stage, the two columns that were added to the matrix during encryption are removed from the output of the first stage. In the third stage, the video frame data is accepted, and the weights stored after training are used in the network implementation.
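The first-level encryption of Sect. 3.2.1 (steps 1 to 7) can be traced in code with the paper's own worked pixel [01000011]. This is a sketch under stated assumptions: the function name `first_level_encrypt` is hypothetical, the added value 117 is the example constant from step 7, and the shift in step 5 is implemented as a left shift because that is what turns [0111] into the 9-bit value [011100000] given in the text. Steps 8 to 10 and the neural-network decryption are not covered.

```python
from typing import Dict


def first_level_encrypt(pixel: int, offset: int = 117) -> Dict[str, int]:
    """Apply steps 1-7 to one 8-bit pixel and return every intermediate value."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    high = (pixel >> 4) & 0xF          # step 2: most significant nibble
    low = pixel & 0xF                  # step 2: least significant nibble
    swapped = (low << 4) | high        # step 3: nibbles swapped into one byte
    nibble_xor = low ^ high            # step 4: XOR of the two nibbles
    shifted = nibble_xor << 5          # step 5: 9-bit value after shifting by 5
    mixed = swapped ^ shifted          # step 6: XOR of step 3 and step 5
    cipher = mixed + offset            # step 7: add the chosen constant
    return {"swapped": swapped, "nibble_xor": nibble_xor,
            "shifted": shifted, "mixed": mixed, "cipher": cipher}


trace = first_level_encrypt(0b01000011)
# Every 8-bit pixel maps to a distinct value, so the transform is reversible.
distinct_outputs = {first_level_encrypt(p)["cipher"] for p in range(256)}
```

For the example pixel the intermediate values reproduce those listed in the algorithm: 52 after the swap, 7 after the XOR, 224 after the shift, 212 after the second XOR, and 329 after adding 117.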
4 Performance Analysis
The analysis evaluates the memory usage of the secured retrieval and the overall performance of the content-based video retrieval, respectively. For secured video frame retrieval, we estimate the memory cost (the amount of memory used) and the retrieve time (the processing time used). For indexing-based content video retrieval, we estimate the precision, recall, and F1-score. For secured image retrieval, the time cost and memory cost are compared with the works in [8–10]. Table 1 shows observations for six commonly used datasets. The outcome of the video frame retrieval is estimated from videos with the same similarity, and the video frames in the database are then ranked by evaluating the pixel values of the deep feature (correlation distance) of two random videos in the datasets.

Table 1 Performance of proposed EVRS system for secure video frame search

| S. No. | Dataset | Dataset size (bytes) (k) | Memory size (MB) | Retrieve time (ms) |
|---|---|---|---|---|
| 1 | Bus | 1.700 | 1805 | 234 |
| 2 | Car | 2.423 | 2378 | 674 |
| 3 | Flight | 1.673 | 1678 | 164 |
| 4 | Crowd | 3.300 | 1437 | 183 |
| 5 | Apple | 1.845 | 2378 | 317 |
| 6 | School | 2.671 | 1673 | 287 |

Table 2 compares the memory size of the proposed architecture with that of the existing algorithms. It lists the image size in KB against the memory size in KB for each method.

Table 2 Comparison of performance in memory size of proposed system and existing algorithm (memory size in KB per method)

| S. No. | Image size (KB) | AdaBoost | Decision tree | QBVH | PRO_EVRS |
|---|---|---|---|---|---|
| 1 | 256 | 48 | 58 | 66 | 32 |
| 2 | 312 | 50 | 52 | 54 | 36 |
| 3 | 456 | 52 | 56 | 58 | 38 |
| 4 | 256 | 64 | 60 | 66 | 54 |
| 5 | 308 | 70 | 68 | 70 | 66 |
| 6 | 376 | 74 | 69 | 76 | 67 |

Figure 6 is the graphical representation of Table 2. It compares the memory utilized by the existing and proposed techniques for the secured content image in the cloud server. As the figure shows, the proposed EVRS uses less memory than the existing techniques.
Fig. 6 Comparison of memory size for proposed system and existing algorithm
Figure 7 compares the existing and proposed techniques in terms of the time taken to retrieve the secured content image from the cloud server. As the figure shows, the proposed EVRS takes less time than the existing techniques. Table 3 compares the existing and proposed techniques in terms of precision, recall, and F1-score for the query images bus, car, flight, crowd, apple, and school. Compared with the existing techniques, the proposed technique gives the best results in retrieving images on the basis of relevance. Figure 8 shows the comparison of precision, recall, and F1-score among the existing and proposed techniques for the same query images, as per Table 3.
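For reference, the three retrieval metrics reported in Table 3 can be computed from a set of retrieved frames and a set of relevant frames as sketched below. These are the standard definitions; the function name `retrieval_scores` and the frame identifiers are hypothetical, and the paper does not state how its relevant sets were built.

```python
from typing import Iterable, Tuple


def retrieval_scores(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Toy query: 4 frames retrieved, 3 frames actually relevant, 2 in common.
precision, recall, f1 = retrieval_scores(
    retrieved=["f1", "f2", "f3", "f4"],
    relevant=["f1", "f2", "f5"],
)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two values.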
Fig. 7 Comparison of time for proposed system and existing algorithm
5 Conclusion
Content-based video retrieval (CBVR) is extensively used in most multimedia applications. Storing multimedia content requires a huge video database and imposes computational constraints. To overcome these constraints, outsourcing CBVR services is very attractive. However, outsourcing to an external party raises security problems, which are a serious issue when storing multimedia content. This paper therefore proposes feature enhancement based on color, texture, and edges together with security. Huge data is stored in the cloud using the enhanced video retrieval system (EVRS) with region-based neural network (EVRS-RNN). The retrieval accuracy for this data is improved using linear discriminant analysis (LDA), and relevant images are retrieved using the K-means classification technique. The simulation results show an enhancement in CBVR with security using deep learning techniques. The parameters obtained are accuracy, F1-score, recall, memory utility, and retrieve time.
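Table 3 Overall comparison of existing and proposed techniques for precision, F1-score, and recall

| Query image | Dataset (bytes) (k) | Method | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| Bus | 1.700 | AdaBoost | 88 | 93 | 88 |
| Bus | 1.700 | Decision tree | 90 | 96 | 93 |
| Bus | 1.700 | QBVH | 93 | 95 | 94 |
| Bus | 1.700 | Pro_EVRS | 95 | 98 | 96 |
| Car | 2.423 | AdaBoost | 85 | 90 | 87 |
| Car | 2.423 | Decision tree | 89 | 93 | 90 |
| Car | 2.423 | QBVH | 93 | 87 | 92 |
| Car | 2.423 | Pro_EVRS | 96 | 96 | 95 |
| Flight | 1.673 | AdaBoost | 76 | 81 | 84 |
| Flight | 1.673 | Decision tree | 81 | 85 | 87 |
| Flight | 1.673 | QBVH | 85 | 89 | 93 |
| Flight | 1.673 | Pro_EVRS | 89 | 93 | 96 |
| Crowd | 3.300 | AdaBoost | 81 | 87 | 79 |
| Crowd | 3.300 | Decision tree | 84 | 90 | 82 |
| Crowd | 3.300 | QBVH | 87 | 93 | 86 |
| Crowd | 3.300 | Pro_EVRS | 92 | 95 | 89 |
| Apple | 1.845 | AdaBoost | 77 | 84 | 78 |
| Apple | 1.845 | Decision tree | 81 | 88 | 82 |
| Apple | 1.845 | QBVH | 84 | 91 | 84 |
| Apple | 1.845 | Pro_EVRS | 87 | 93 | 87 |
| School | 2.671 | AdaBoost | 76 | 81 | 79 |
| School | 2.671 | Decision tree | 79 | 84 | 83 |
| School | 2.671 | QBVH | 83 | 87 | 86 |
| School | 2.671 | Pro_EVRS | 86 | 90 | 89 |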
Fig. 8 Overall comparison of existing and proposed techniques for precision, F1-score, and recall
References
1. Patel BV, Meshram BB (2012) Content based retrieval systems. Int J Ubi Comp 3:13–30
2. Lin L, Chen C, Shyu M-L, Chen S-C (2011) Weighted subspace filtering and ranking algorithms for video concept retrieval. IEEE Multim 18
3. Lai C-C, Chen Y-C (2011) A user-oriented image retrieval system based on interactive genetic algorithm. IEEE Trans Instrum Measur 60
4. Xia Z et al (2017) EPCBIR: an efficient and privacy-preserving content-based image retrieval scheme in cloud computing. Inf Sci 387:195–204
5. Prathiba T, Shantha Selva Kumari R (2020) Content based video retrieval system based on multimodal feature grouping by KFCM clustering algorithm to promote human–computer interaction. J Ambient Intell Humanized Comput, 1–15
6. Tahboub K et al (2014) An HEVC compressed domain content-based video signature for copy detection and video retrieval. Imaging Multim Analytics Web Mob World, 9027
7. Hong S, Im W, Yang HS (2018) CBVMR: content-based video-music retrieval using soft intramodal structure constraint. In: Proceedings of the 2018 ACM on international conference on multimedia retrieval, pp 353–361
8. Rossetto L, Giangreco I, Schuldt H, Dupont S, Seddati O, Sezgin M, Sahillioğlu Y (2015) IMOTION - content-based video retrieval engine. In: International conference on multimedia modeling, pp 255–260
9. Wang M, Ming Y, Liu Q, Yin J (2017) Image-based video retrieval using deep feature. In: IEEE international conference on smart computing, pp 1–6
10. Song J, Gao L, Liu L, Zhu X, Sebe N (2018) Quantization-based hashing: a general framework for scalable image and video retrieval. Pattern Recogn, 175–187
11. Yang H, Meinel C (2014) Content based lecture video retrieval using speech and video text information. IEEE Trans Learn Technol 7:142–154
12. Jones S, Shao L (2013) Content-based retrieval of human actions from realistic video databases. Inf Sci 236:56–65
Approximate Bipartite Graph Matching by Modifying Cost Matrix Shri Prakash Dwivedi
Abstract Graph matching is the process of evaluating the structural similarity between two graphs. Bipartite graph matching is one of the important techniques for error-tolerant graph matching. In this paper, we present an approach to bipartite graph matching that considers edge assignment instead of node assignment during the creation of the cost matrix. We demonstrate that this technique can achieve better accuracy on some graph datasets.
Keywords Graph matching · Bipartite graph matching · Graph edit distance
S. P. Dwivedi (B), Department of Information Technology, G.B. Pant University of Agriculture and Technology, Pantnagar, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_34
1 Introduction
The graph is a versatile data structure in computer science and mathematics. Graphs consist of nodes and edges, where each edge is a pair of nodes. Because of its inherent ability to identify relationships between different objects represented as nodes, it is widely used in graph-based representation and structural pattern recognition. In many structural pattern recognition applications, objects can be represented as nodes, and the connections or relationships between these objects can be denoted by edges. Graph matching (GM) is the process of computing similarities between two graphs. When there is a one-to-one correspondence between the nodes and the edges of two graphs, we say that they are isomorphic. The mapping that achieves this correspondence is known as a graph isomorphism. Although graph and subgraph isomorphism are important theoretically, they have limited applications in real-world circumstances, where graphs often get modified due to the presence of noise during processing. For these reasons, GM is often classified into exact and inexact GM. The aim of inexact GM is to find a measure of similarity between two graphs, which can be used to specify the
extent by which two graphs are similar. Inexact GM is also known as error-tolerant GM since it can accommodate errors or noise. An extensive survey of important GM techniques is provided in [5]. A review of advances in pattern recognition techniques based on GM and related concepts is given in [12]. An interesting class of GM methods is based on the extension of kernel methods to graphs. A description of different representational languages for structured data such as graphs is provided in [13]. In [18], the authors describe the graph edit distance and kernel machines and provide various error-tolerant graph kernels. Convolution kernels on discrete structures like trees and graphs are described in [14]; they generalize the family of radial basis kernels. Diffusion kernels, which are a family of kernels for statistical learning using the geometric structures of the statistical models, are described in [16]. A novel method of GM is based on geometric graphs; it utilizes the geometric properties of the graphs to find the similarity between two graphs [8] and uses the similarity score to perform error-tolerant GM [10]. An efficient algorithm for computing frequent patterns in geometric subgraphs out of a large collection of geometric graphs is given in [15]. This algorithm can find geometric subgraphs that are translation, rotation, and scaling invariant. In [4], the authors present a similarity measure for geometric graphs and demonstrate it to be a metric. They also describe experiments for heuristic distance functions. In [1], the GM problem is formulated in a maximum likelihood estimation framework, and the expectation-maximization method is then used to find a GM between two geometric graphs. An efficient GM method for generalized geometric graphs using Monte Carlo Tree Search is described in [19]. A recent category of GM is based on the notion of Graph Edit Distance (GED) [2, 23, 25].
Since efficient polynomial-time algorithms for GM and GED are not available, various approximation and heuristic methods utilizing local search, greedy techniques, neighborhood search, centrality information, bipartite GED, etc., have been proposed [6, 7, 9, 11, 17, 21, 22, 24]. In this paper, we present an approach to inexact GM using bipartite GED. We use an alternative cost matrix consisting of edge edit operations rather than node edit operations, and it is used to compute an optimal assignment between edges. This paper is structured as follows. Section 2 provides basic definitions and motivation. Section 3 presents the modified bipartite matching, the cost matrix, and the algorithm. Section 4 contains the discussion, and lastly, Sect. 5 contains the conclusions.
2 Basic Concepts and Motivation
This section describes the basic definitions and concepts used in GM. A graph is defined as $g = (V, E, \mu, \nu)$, where $V$, $E$, $\mu$, $\nu$ are the set of vertices, the set of edges connecting the vertices, the node labeling function, and the edge labeling function, respectively. The node labeling function assigns a node label (from the node label set $L_V$) to every node, whereas the edge labeling function assigns an edge label (from the edge label set $L_E$) to every edge of the graph.
A graph $g_1$ is said to be a subgraph of $g_2$ if the vertex set $V_1$ of $g_1$ is a subset of the vertex set $V_2$ of $g_2$, the edge set $E_1$ of $g_1$ is a subset of the edge set $E_2$ of $g_2$, and the node and edge labeling functions of the two graphs agree on every node and edge of $g_1$. A graph is bipartite when its vertices can be divided into two sets such that every edge connects a vertex from one set to a vertex from the other. In other words, for every edge, the first vertex belongs to one set and the second vertex belongs to the other; no edge connects vertices from the same set. In bipartite GED, during the creation of the cost matrix, the node assignment from one graph $g_1$ to another $g_2$ is performed first, and then the implied edge assignment cost is added to the individual elements of the cost matrix. When the graph is dense, the number of edges can be far greater than the number of nodes. For example, a simple graph with $n$ nodes can have up to $C(n, 2) = n(n-1)/2$ edges. For such cases, we can define the cost matrix by first assigning the edges from $g_1$ to $g_2$ using the Linear Sum Assignment Problem (LSAP). The implied node edit operations are then added to the individual elements of the cost matrix.
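The two notions used in this section, the bipartite property and the bound of C(n, 2) edges in a simple graph, can be checked with a short sketch. It assumes nodes are numbered 0 to n − 1 and edges are given as pairs; the function names are hypothetical, and the two-coloring by breadth-first search is a standard test, not something taken from the paper.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple


def is_bipartite(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> bool:
    """Two-color the graph with BFS; it is bipartite iff no edge joins equal colors."""
    adjacency: List[List[int]] = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    color: List[Optional[int]] = [None] * num_nodes
    for start in range(num_nodes):
        if color[start] is not None:
            continue  # already colored as part of an earlier component
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def max_simple_edges(num_nodes: int) -> int:
    """Maximum number of edges in a simple undirected graph: C(n, 2)."""
    return math.comb(num_nodes, 2)


square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # even cycle
triangle = [(0, 1), (1, 2), (2, 0)]         # odd cycle
```

An even cycle such as `square` is bipartite while an odd cycle such as `triangle` is not, and `max_simple_edges` shows how quickly the edge count can outgrow the node count in dense graphs.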
3 Bipartite Graph Matching
Bipartite GM uses the concept of bipartite matching to compute the GED between two graphs. The assignment problem is the task of assigning $n$ elements of one set to $n$ elements of another set. It can be seen as a bijective (one-to-one and onto) mapping between the elements of the two sets. In the quadratic assignment problem (QAP), the cost function is quadratic, whereas in the LSAP the objective function is linear. For the LSAP, efficient polynomial-time solutions are available, but the QAP is known to be NP-hard and no efficient polynomial-time solution is available. GED computation can be formulated as an instance of the QAP, and bipartite GED reduces this QAP to an instance of the LSAP. The assignment problem formulated in terms of a bipartite graph is known as the bipartite GM problem. In this formulation, nodes from one vertex set of the bipartite graph are assigned to nodes of the other vertex set. The node labeling function in bipartite GM assigns a unique label to every vertex of both vertex sets of the graph, and the edge label gives the cost of matching the two vertices joined by that edge. Bipartite GM considers only the structure of the nodes by performing a linear sum assignment of nodes from graph $g_1$ to $g_2$ and afterwards evaluates only the implied edge edit operations, which is why it does not in general lead to globally optimal solutions. For most connected graphs, the number of edges is usually greater than the number of nodes; a complete graph has $C(n, 2)$ edges. In such cases, by first performing the linear sum assignment from the edge set $E_1$ of $g_1$ to the edge set $E_2$ of $g_2$, we can get a better overall optimization.
For this modified bipartite GM, we can use edge-based error-tolerant GM instead of the standard node-based error-tolerant GM. Let $g_1 = (V_1, E_1, \mu_1, \nu_1)$ and $g_2 = (V_2, E_2, \mu_2, \nu_2)$ be two graphs. An edge-based error-tolerant GM is a mapping $f : E_1 \cup \{\epsilon\} \to E_2 \cup \{\epsilon\}$, where $\epsilon$ refers to the empty edge. The mapping $f$ satisfies $f(e_1) \neq \epsilon \Rightarrow f(e_1) \neq f(e_2)$ for all distinct $e_1, e_2 \in E_1$, and $f^{-1}(h_1) \neq \epsilon \Rightarrow f^{-1}(h_1) \neq f^{-1}(h_2)$ for all distinct $h_1, h_2 \in E_2$. To solve the LSAP between the edges of $g_1$ and $g_2$, similar to bipartite GED, we make the edge sets $E_1$ and $E_2$ equal in size by padding both with additional empty edges.
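3.1 Modified Cost Matrix
Suppose $|E_1| = n$ and $|E_2| = m$. Then $E_1' = E_1 \cup \{\epsilon_1, \epsilon_2, \ldots, \epsilon_m\}$ and $E_2' = E_2 \cup \{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ such that $|E_1'| = |E_2'| = n + m$. Using the extended edge sets $E_1' = \{e_1, e_2, \ldots, e_n, \epsilon_1, \epsilon_2, \ldots, \epsilon_m\}$ of $g_1$ and $E_2' = \{h_1, h_2, \ldots, h_m, \epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ of $g_2$, we can create a cost matrix $C$:

$$C = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$$

where

$$M_{11} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1m} \\ c_{21} & c_{22} & \cdots & c_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nm} \end{bmatrix}, \qquad M_{12} = \begin{bmatrix} c_{1\epsilon} & \infty & \cdots & \infty \\ \infty & c_{2\epsilon} & \cdots & \infty \\ \vdots & \vdots & \ddots & \vdots \\ \infty & \infty & \cdots & c_{n\epsilon} \end{bmatrix}$$

$$M_{21} = \begin{bmatrix} c_{\epsilon 1} & \infty & \cdots & \infty \\ \infty & c_{\epsilon 2} & \cdots & \infty \\ \vdots & \vdots & \ddots & \vdots \\ \infty & \infty & \cdots & c_{\epsilon m} \end{bmatrix}, \qquad M_{22} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

Here, the entry $c_{ij}$ denotes the cost $c(e_i \to h_j)$ of the edge substitution $(e_i \to h_j)$, $c_{i\epsilon}$ represents the cost $c(e_i \to \epsilon)$ of the edge deletion $(e_i \to \epsilon)$, and $c_{\epsilon j}$ indicates the cost $c(\epsilon \to h_j)$ of the edge insertion $(\epsilon \to h_j)$. We observe that the upper left portion of the cost matrix $C$, i.e., $M_{11}$, denotes the costs of all edge substitutions, and the upper right part $M_{12}$ indicates the costs of all edge deletions. Similarly, the bottom left part $M_{21}$ represents the costs of all edge insertions, and the bottom right part $M_{22}$ of $C$ denotes the edit operations $(\epsilon \to \epsilon)$. It has 0 costs since it does not represent any edit operations. The permutation $(\varphi_1, \varphi_2, \ldots, \varphi_{n+m})$ represents a bijective assignment $\gamma = \{(e_1 \to h_{\varphi_1}), (e_2 \to h_{\varphi_2}), \ldots, (e_{m+n} \to h_{\varphi_{m+n}})\}$ from the edge set $E_1'$ of $g_1$ to $E_2'$ of $g_2$. This assignment includes edge edit operations of the form $(e_i \to h_j)$, $(e_i \to \epsilon)$, $(\epsilon \to h_j)$ and $(\epsilon \to \epsilon)$. Here, every permutation is associated with a complete edit path. Another strategy for designing a cost matrix is to find an optimal assignment of edges from one graph to another and then compute the optimal node assignment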
between two graphs. Check all the edge assignments that are consistent with the node assignment and retain them. For the remaining nodes whose cost has not already been added to the assignment matrix, add the cost of the implied node edit operations to the corresponding edge assignment costs.
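The block structure of the cost matrix in Sect. 3.1 (substitutions in the upper left, deletions on the diagonal of the upper right, insertions on the diagonal of the lower left, zeros in the lower right) can be assembled as sketched below. The function name `build_edge_cost_matrix` and the numeric costs are hypothetical; only the layout follows the text, and the implied node edit costs of Algorithm 1 are not added here.

```python
import math
from typing import List, Sequence

INF = math.inf


def build_edge_cost_matrix(substitution: Sequence[Sequence[float]],
                           deletion: Sequence[float],
                           insertion: Sequence[float]) -> List[List[float]]:
    """Assemble the (n + m) x (n + m) matrix C = [[M11, M12], [M21, M22]].

    substitution[i][j] = c(e_i -> h_j), deletion[i] = c(e_i -> eps),
    insertion[j] = c(eps -> h_j), with n = |E1| and m = |E2|.
    """
    n, m = len(deletion), len(insertion)
    if len(substitution) != n or any(len(row) != m for row in substitution):
        raise ValueError("substitution must be an n x m matrix")
    size = n + m
    cost = [[0.0] * size for _ in range(size)]    # M22 stays all zeros
    for i in range(n):
        for j in range(m):
            cost[i][j] = substitution[i][j]       # M11: edge substitutions
        for k in range(n):
            cost[i][m + k] = deletion[i] if i == k else INF   # M12: deletions
    for r in range(m):
        for j in range(m):
            cost[n + r][j] = insertion[j] if r == j else INF  # M21: insertions
    return cost


# Toy example with n = 2 edges in g1 and m = 3 edges in g2.
C = build_edge_cost_matrix(
    substitution=[[1, 2, 3], [4, 5, 6]],
    deletion=[7, 8],
    insertion=[9, 10, 11],
)
```

The infinite off-diagonal entries ensure that each edge can be deleted or inserted only through its own padded empty edge.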
3.2 Algorithm
The computation of the approximate GED between two undirected graphs $g_1$ and $g_2$ is depicted in Algorithm 1. A brief explanation of this algorithm is as follows. The input of the algorithm is two undirected graphs $g_1$ and $g_2$. The graph $g_1$ has $n$ edges and the graph $g_2$ has $m$ edges, i.e., $|E_1| = n$ and $|E_2| = m$. The output of the algorithm is the approximate GED between $g_1$ and $g_2$. Lines 1–2 of the algorithm compute the extended edge sets $E_1'$ and $E_2'$, respectively. Since the numbers of edges $n$ and $m$ of the two graphs may not be equal, we have to extend the edge sets $E_1$ and $E_2$ to make them equal in size. For this purpose, $E_1$ is appended with $m$ extra empty edges $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_m\}$ in line 1 to form the new edge set $E_1'$. Similarly, $E_2$ is appended with $n$ extra empty edges $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ in line 2 to form the new edge set $E_2'$. The cost matrix $C$ is created in line 3 as described above. It consists of four component matrices $M_{11}$, $M_{12}$, $M_{21}$ and $M_{22}$: $M_{11}$ represents the costs of all edge substitutions, $M_{12}$ the costs of edge deletions, $M_{21}$ the costs of edge insertions, and $M_{22}$ is a null matrix. The cost matrix in line 3 consists of edge edit operations only and has to be modified to account for node edit operations as well. In lines 5–7, each element of the cost matrix is updated by adding the implied node edit operations. For the upper left matrix $M_{11}$, $c_{ij} = c_{ij} + c(u_i \to v_i) + c(u_j \to v_j)$. For the upper right matrix $M_{12}$, $c_{ij} = c_{ij} + c(u_i \to \epsilon) + c(u_j \to \epsilon)$, and for the lower left matrix $M_{21}$, $c_{ij} = c_{ij} + c(\epsilon \to v_i) + c(\epsilon \to v_j)$. The optimal edge assignment for the updated cost matrix is computed in line 9. Finally, the approximate GED of the complete edit path is returned in line 10.
Algorithm 1: Approx-Bipartite-Graph-Matching ($g_1$, $g_2$)
Input: Two undirected graphs $g_1$, $g_2$, where $g_i = (V_i, E_i, \mu_i, \nu_i)$ for $i = 1, 2$
Output: Approximate GED between $g_1$ and $g_2$
1: Compute $E_1' = E_1 \cup \{\epsilon_1, \epsilon_2, \ldots, \epsilon_m\}$
2: Compute $E_2' = E_2 \cup \{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$
3: $C = (c_{ij})$
4: // Update $C$ by adding implied node edit operations to each element $(c_{ij})$
5: $C = (c_{ij}) = (c_{ij}) + c(u_i \to v_i) + c(u_j \to v_j)$ // for $M_{11}$
6: $C = (c_{ij}) = (c_{ij}) + c(u_i \to \epsilon) + c(u_j \to \epsilon)$ // for $M_{12}$
7: $C = (c_{ij}) = (c_{ij}) + c(\epsilon \to v_i) + c(\epsilon \to v_j)$ // for $M_{21}$
8: // Compute optimal edge assignment on $C$
9: $\gamma = \{(e_1 \to h_{\varphi_1}), (e_2 \to h_{\varphi_2}), \ldots, (e_{m+n} \to h_{\varphi_{m+n}})\}$
10: return GED $d_\varphi(g_1, g_2)$ of complete edit path
We observe that the extension of the edge sets in lines 1–2 takes linear time. The update of the cost matrix in lines 5–7 takes time linear in the number of matrix entries. The most expensive step of the algorithm is the optimal edge assignment performed in line 9. The naive brute-force approach to the assignment problem has to check all permutations of the two sets along with their associated costs, which takes exponential time in the worst case. Fortunately, efficient algorithms are available that perform the optimal assignment in polynomial time. One of them is the Munkres algorithm, which performs the assignment in $O(n^3)$ time for an $n \times n$ cost matrix. Therefore, the time taken by the Approx-Bipartite-Graph-Matching algorithm is $O((n + m)^3)$ due to the assignment in line 9, whereas the space complexity is $O((n + m)^2)$.
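To make the cost of the naive approach concrete, the sketch below solves a small linear sum assignment by enumerating every permutation. It is deliberately the brute-force variant mentioned above, not the Munkres algorithm, and is usable only for tiny matrices; the function name and the example costs are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple


def brute_force_assignment(cost: Sequence[Sequence[float]]) -> Tuple[Tuple[int, ...], float]:
    """Return (best permutation, minimum total cost) for a square cost matrix.

    Row i is assigned to column perm[i]. All size! permutations are checked,
    so the running time grows factorially with the matrix size.
    """
    size = len(cost)
    if any(len(row) != size for row in cost):
        raise ValueError("cost matrix must be square")
    best_perm: Tuple[int, ...] = tuple(range(size))
    best_cost = math.inf
    for perm in permutations(range(size)):
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


example_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total_cost = brute_force_assignment(example_cost)
```

For the 3 × 3 example only 6 permutations are examined, but a matrix of size n + m = 20 would already require about 2.4 × 10^18, which is why a polynomial-time solver is used in practice.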
4 Discussion
The proposed modification of the cost matrix in Algorithm 1 may be used as a trade-off between execution time and accuracy for the GM problem. For graphs having more edges than nodes, the assignment of edges between two graphs instead of nodes can lead to better accuracy. Linear sum assignment using the modified cost matrix takes O(n) more execution time than with the standard cost matrix. Suppose n is the number of nodes and m is the number of edges. For dense graphs usually m > n; therefore, to do the assignment between two graphs, the modified cost matrix based on edge edit operations can outperform the assignment using the cost matrix based on node edit operations, since it takes the edge structure of the graphs into consideration followed by the implied node edit operations, as opposed to node assignment followed by implied edge edit operations. Also, the optimal assignment with m edges followed by node operations will lead to a more accurate solution than the optimal assignment with n nodes followed by implied edge operations. Table 1 compares the average and the maximum number of nodes and edges for the various datasets of the IAM graph database repository [20]. We observe that the letter datasets (low, medium, and high) have fewer edges than nodes on average, whereas the digit, AIDS, and protein datasets have more edges than nodes on average. As expected, as the number of nodes increases across the graph datasets, the corresponding increase in edges is higher.

Table 1 Nodes versus edges for various graph dataset

| Dataset | Mean nodes | Mean edges | Max nodes | Max edges |
|---|---|---|---|---|
| Letter low | 4.7 | 3.1 | 8 | 6 |
| Letter medium | 4.7 | 3.2 | 9 | 7 |
| Letter high | 4.7 | 4.5 | 9 | 9 |
| Digit | 11.8 | 13.1 | 32 | 30 |
| AIDS | 15.7 | 16.2 | 95 | 103 |
| Protein | 32.6 | 62.1 | 126 | 149 |
We observe that when accuracy is of prime concern, we can use the modified bipartite GED, whereas when execution time is the main criterion, we can use the standard bipartite GED. Also, before applying Algorithm 1, we make sure that |E1| + |E2| > |V1| + |V2|; otherwise, we can use bipartite GED with the unmodified cost matrix.
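The precondition above can be sketched as a small guard in Python. This is only an illustration under assumed data structures (node and edge lists); the function name and the example graphs are hypothetical and not taken from the paper.

```python
def use_edge_based_cost_matrix(nodes1, edges1, nodes2, edges2):
    """Return True when |E1| + |E2| > |V1| + |V2|, i.e. when the modified
    (edge-based) cost matrix of Algorithm 1 is the suggested choice."""
    return len(edges1) + len(edges2) > len(nodes1) + len(nodes2)


# Hypothetical graphs: a 4-clique (6 edges) and a triangle (3 edges).
g1_nodes, g1_edges = [0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
g2_nodes, g2_edges = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]

if use_edge_based_cost_matrix(g1_nodes, g1_edges, g2_nodes, g2_edges):
    strategy = "modified (edge-based) cost matrix"
else:
    strategy = "standard (node-based) cost matrix"
print(strategy)
```

For sparse pairs such as two path graphs the guard returns False, and the standard node-based bipartite GED would be used instead.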
5 Conclusion In this paper, we described an approach to bipartite GM using modification to cost matrix in which assignments of edges between the two graphs are performed based on the edge edit operations, while the implied node edit operations are simply added to the corresponding elements of the cost matrix. We demonstrated that this approach could lead to better accuracy at the cost of more time during execution.
References 1. Armiti A, Gertz M (2014) Geometric graph matching and similarity: a probabilistic approach. SSDBM 2. Bunke H (1998) Error-tolerant graph matching: a formal framework and algorithms. Advances in pattern recognition, statistical techniques in pattern recognition (SPR) and Structural and syntactic pattern recognition (SSPR), LNCS, 1451 3. Bunke H, Allerman G (1983) Inexact graph matching for structural pattern recognition. Pattern Recogn Lett 1:245–253 4. Cheong O, Gudmundsson J, Kim H, Schymura D, Stehn F (2009) Measuring the similarity of geometric graphs. Exp Algorithms LNCS 5526:101–112 5. Conte D, Foggia P, Sansone C, Vento M (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recogn Artif Intell 18(3):265–298 6. Dwivedi SP (2019) Some algorithms on exact, approximate and error-tolerant graph matching. PhD Thesis, Indian Institute of Technology (BHU), Varanasi 7. Dwivedi SP, Singh RS (2017) Error-tolerant graph matching using homeomorphism. In: International conference on advances in computing, communication and informatics (ICACCI), pp 1762–1766 8. Dwivedi SP, Singh RS (2018) Error-tolerant geometric graph similarity. SPR and SSPR, lecture notes in computer science. Springer 11004, pp 337–344 9. Dwivedi SP, Singh RS (2018) Error-tolerant graph matching using node contraction. Pattern Recogn Lett 116:58–64 10. Dwivedi SP, Singh RS (2019) Error-tolerant geometric graph similarity and matching. Pattern Recogn Lett 125:625–631 11. Dwivedi SP, Singh RS (2020) Error-tolerant approximate graph matching utilizing node centrality information. Pattern Recogn Lett 133:313–319 12. Foggia P, Percannella G, Vento M (2014) Graph matching and learning in pattern recognition in the last 10 years. Int J Pattern Recogn Artif Intell 88:1450001.1–1450001.40 13. Gartner T (2008) Kernels for structured data. World Scientific 14. Haussler D (1999) Convolution kernels on discrete structures. 
Technical report UCSC-CRL99-10, University of California, Santa Cruz
15. Kuramochi M, Karypis G (2007) Discovering frequent geometric subgraphs. Inf Syst 32:1101– 1120 16. Lafferty J, Lebanon G (2005) Diffusion kernels on statistical manifolds. J Mach Learn Res 6:129–163 17. Neuhaus M, Riesen K, Bunke H (2006) Fast suboptimal algorithms for the computation of graph edit distance. Statistical techniques in pattern recognition (SPR) and Structural and syntactic pattern recognition (SSPR), LNCS, 4109. Springer, pp 163–172 18. Neuhaus M, Bunke H (2007) Bridging the gap between graph edit distance and kernel machines. World Scientific 19. Pinheiro MA, Kybic J, Fua P (2017) Geometric graph matching using Monte Carlo tree search. IEEE Trans Pattern Anal Mach Intell 39(11):2171–2185 20. Riesen K, Bunke H (2008) IAM Graph database repository for graph based pattern recognition and machine learning. Statistical techniques in pattern recognition (SPR) and Structural and syntactic pattern recognition (SSPR), LNCS, 5342. Springer, pp 287–297 21. Riesen K, Bunke H (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image Vis Comput 27(4):950–959 22. Riesen K, Bunke H (2015) Improving bipartite graph edit distance approximation using various search strategies. Pattern Recogn 48(4):1349–1363 23. Sanfeliu A, Fu KS (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern 13(3):353–363 24. Sorlin S, Solnon C (2005) Reactive Tabu search for measuring graph similarity. GbRPR LNCS 3434:172–182 25. Tsai WH, Fu KS (1979) Error-correcting isomorphisms of attributed relational graphs for pattern analysis. IEEE Trans Syst Man Cybern 9:757–768
Comparative Analysis of Texture-Based Algorithms LBP, LPQ, SIFT, and SURF Using Touchless Footprints Anshu Gupta and Deepa Raj
Abstract Person identification and verification based on footprints have not yet gained wide popularity in the field of biometrics. This paper explores various texture-based algorithms for personal recognition using human footprints and examines their major features. Footprint features are extracted using the LBP, LPQ, SIFT, and SURF feature extraction methods, followed by matching and recognition. A comparative study and analysis are then carried out to assess the performance of these methods in a footprint recognition system. To the best of our knowledge, this paper presents the first attempt to compare the results of implementing these feature extraction methods for personal recognition using touchless human footprints. Experimental results reveal that the best performance is achieved by the SIFT feature extraction method, which yields more match points and takes less time to recognize, with 97% recognition accuracy. Keywords Biometrics · Footprint recognition · Feature extraction · LBP · LPQ · SIFT and SURF
1 Introduction In today’s fast paced technology driven era, identity theft is the global budding problem. The major cause behind this fraud is the excessive and insecure usage of Internet and online computer applications. To remedy this problem of identity theft, biometrics, a special stream of computer science came into existence which led to the development of an automatic recognition system by applying statistical analysis on biological data of human characteristics such as palmprint, fingerprint, ear, voice, face, handwritten signatures, gait, or iris. It is God gifted that biometric attributes of every person, which represent their physiological or behavioral characteristics, are unique and assessable. Since, biometric identities are inherent to an individual; so, any attempt to steal, share, distribute, imitate, forge, or hack can become futile [1]. Also, the physical presence of the user is mandatory during identification process. A. Gupta (B) · D. Raj Babasaheb Bhimrao Ambedkar University, Lucknow, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 G. Sanyal et al. (eds.), International Conference on Artificial Intelligence and Sustainable Engineering, Lecture Notes in Electrical Engineering 837, https://doi.org/10.1007/978-981-16-8546-0_35
Therefore, automatic authentication and identification of individuals using biometrics are more reliable, advantageous, and capable than traditional knowledge-based and token-based systems. Among the many modalities used worldwide in biometrics for personal recognition, foot biometry has received little attention, even though human footprints bear distinctive features similar to those of fingerprints, palmprints, retina, ear, etc. The objective of this paper is to investigate various techniques to extract those unique features of a person’s footprints and use them for verification and/or identification of an individual. Among the many existing feature extraction methods, this paper performs a comparative study and analysis of four texture-based algorithms, namely local binary pattern (LBP), local phase quantization (LPQ), speeded up robust features (SURF), and scale invariant feature transform (SIFT), using touchless footprints for personal recognition. LBP is a potent tool for summarizing texture through statistical analysis by thresholding the neighboring pixels, while LPQ quantizes the Fourier transform phase in local neighborhoods. SURF acts as a feature extractor and matcher for points of interest in an image and can be used in object recognition. SIFT is a method for extracting distinctive scale- and rotation-invariant features from images that are helpful for reliable matching between different views of an object or scene. Experimental outcomes reveal that the SIFT feature extraction method shows the best performance with respect to detected feature points, matching feature points, feature matching time, and recognition accuracy in comparison with LBP, LPQ, and SURF. To the best of our knowledge, this paper presents the first attempt to compare the results of implementing the LBP, LPQ, SURF, and SIFT feature extraction methods using touchless human footprints for individual recognition.
The rest of the paper is structured as follows: Sect. 2 highlights the related research work in footprint biometry. Section 3 gives a short introduction to the background concepts of the feature extraction methods, namely LBP, LPQ, SURF, and SIFT. The proposed approach is illustrated in Sect. 4. Section 5 discusses the experimental results and analyzes and compares them to find the best performing method. The concluding remarks and future scope are shared in the last section.
2 Related Work Kennedy [2] introduced foot biometrics for the first time in 1980. He used inked footprints and extracted 38 local geometrical features, such as toe lengths, inter-toe angles, length, ball width, heel width, and foot polygon area, by proposing 6 different methods such as silhouette extraction, foot segmentation, and minutiae extraction. He suggested that a person’s foot also merits study because of its uniqueness and its properties similar to those of the human hand. Across the 6 proposed methods, his approach yielded recognition rates with FMR in the range 28.91–1.35% and FNMR in the range 29.38–2.18%.
Nakajima et al. [3] introduced the first method for footprint-based authentication for personal recognition. For robust image matching, besides using the Euclidean distance, the static footprints were normalized in both direction and position. They achieved a recognition rate of 85%. In 2008, Uhl and Wild [4] performed personal identification using eigenfeet, ballprint, and foot features and presented techniques for a rotation-invariant footprint verification system with a recognition rate of 97%. In 2011, Zheng et al. [5] proposed a gait recognition system for personal identification using the cumulative foot pressure image, which contains 2D information on the spatial and temporal changes of the ground reaction force during one gait cycle. They presented an evaluation framework and a benchmark algorithm for representing the translation-invariant cumulative foot pressure image using PCA, LDA, and LSC (locality-constrained sparse coding), inspired by convolutional deep neural networks. Their experiments showed that the LSC+LDA+NN method performed better than the other methods, with improved recognition rates. Jia et al. [6] presented a newborn footprint recognition system using four orientation features, namely robust line orientation code, ordinal code, competitive code, and BOCV, with recognition rates of up to 98%. Ambeth et al. [7] and Nagvanshi and Dubey [8] explored the various well-known approaches for footprint recognition prevalent at that time, such as the hidden Markov model (HMM), self-organizing map (SOM) algorithm, ART2 algorithm, trace transform technique, statistical method, fuzzy logic, neural network, smoothing algorithm, PCA, best basis methods or wavelet packets, linear complexity algorithms, singular value decomposition (SVD), and modified Haar energy (MHE) method. In 2012, Zheng et al.
[9] proposed a two-modality cascade fusion-based human identification system using gait images and corresponding cumulative foot pressure images, which is invariant to slight environmental changes and the users. Ambeth et al. [10] presented an effective footprint recognition system (FPRS) for temples to guard against theft and children frauds in government hospitals like abduction, baby switching, etc. A dynamic database, containing personal details and the infant footprint images, was prepared. They attached a unique bar code with each record to ensure security and faster searching. They also presented contrast enhancement techniques for enhancing the quality of foot images to give better results during feature extraction and recognition phases. Zhou et al. [11] put forward a novel method for face recognition (GLL) using Gabor filter, local binary pattern (LBP), and local phase quantization (LPQ) by utilizing texture information and the blur invariant property. Experiments performed on CMU-PIE and Yale B databases reveal the effectiveness of GLL compared to other related methods with recognition accuracy of 71.9% and 90.7%, respectively. In 2013, Khamael Abbas Al-Dulaimi [12] computed the determinant value of footprint image using 3X3 blocks, followed by thresholding. Finally, calculated Euclidean distance and used MSE to compare the results. He attained up to 65% recognition rate. Balameenakshi et al. [13] used multimodal biometrics footprints of newborns and mother fingerprints for identification and verification, respectively, using circular 2D Gabor filter, morphology, and ROI extraction. They extracted the
texture feature for footprint recognition and for fingerprint of the mother effective minutiae matching algorithm was used. Heesung et al. [14] proposed a multimodal biometric system by amalgamating footprint and gait features for human identification which automatically divides the gait cycle using footprints and employed decision level fusion to improve accuracy. Moorthy et al. [15] revealed stature estimation from footprint measurements like phalange marks, features of the toes, flatfoot condition, humps in the toe line, corns, cracks, etc., using regression analysis in Indian Tamils. In 2014, Ambeth et al. [16] proposed a touch less footprint recognition system using PCA for feature extraction and recognition and SVM for classification. The recognition accuracy achieved was 93.02%. Moorthy et al. [17] explored individualizing characteristics of footprints of Malaysian Malays viz. crease marks, phalange marks, toe features, humps in the toe line, pits, cracks, corns, etc., for personal identification. Snehlata et al. [18] proposed foot and ear modalities to develop a multimodal biometric system. Implementation of eigen image classifier for ear and modified sequential Haar transforms classifier for foot with the match score level fusion was done that resulted 0.13 recognition match score. In 2015, Ambeth et al. [19, 20] proposed a more robust automated footprint recognition system by computing various footprint parameters such as Toe-Index, BallFoot Index, Heel-Index, and Instep-Foot Index in conjunction with neural network, achieving more improved recognition accuracy of 97.43%. Khokher et al. [21] proposed PCA and ICA feature extraction methods for personal identification using footprints. In 2016, Khokher et al. [22] used scanning techniques to calculate various parameters like weight, height, foot length, and body mass index for footprint recognition. 
Experiments showed an effective correlation between height and weight, height and foot length, and height and toes. Bhushan and Saravanan [23] implemented SIFT and SURF as scale and rotation invariant feature extraction methods in palmprint verification system. Results claimed that SURF is a better method for biometric applications due to more accuracy and lesser amount of time; though it detected lesser number of features compared to SURF. Hashem et al. [24] presented a novel method for personal identification using 16 features and recognition using ANN with the recognition rates up to 95.2%. In 2017, Khokher and Singh [25] proposed Dactyloscopy technique for biometric authentication with an accuracy of 78.8%. A restraint of their research is that acquisition of all the footprint images needed to have only one standing position. The Nagvanshi and Dubey [26] explored 27 distinguishing features of human footprints using deep analytics by implementing BigML and IBM Watson analytics processes with the integration of 100 fuzzy rules for the predictive analysis and attained 97% recognition rate. Bhushan and Saravanan [27] proposed a touchless biometric technique for palmprint verification system using Gabor filter, LBP, and its variants LBP-U, LBP-RI, and LBP-RIU. They used IITD and CASIA palmprint databases for experimentation. The Gabor filter based on rotation invariant uniform local binary pattern (GFLBP-RIU) produced the best total success rate (TSR) as 99.25 and 99% in IITD and CASIA database, respectively, and took 0.88 s for palmprint matching.
Liu et al. [28] proposed deep convolutional neural network based a new minutiae descriptor for infant footprint recognition with improved performance. In 2018, Nagvanshi [29] presented a detailed review on personal identification using two different ways of foot, namely human footprints and gait behavior. In this paper, he explored morphological and statistical footprint features in detail and claimed that footprint-based method is better than gait-based system for personal recognition. Further, in [30], he also introduced a generic matcher framework for footprint biometric with the error rates confined to level ±1%. Keatsamarn [31] presented deep learning-based footprint recognition for personal identification. Foot pressure is used with the amalgamation of convolutional neural networks, and 92.69% recognition rate was observed. In 2019, Ibrahim et al. [32] developed a heuristic strategy-based feature selection algorithm by assimilation of feature extraction methods and ant colony optimization (ACO) technique. The experimental outcomes claimed 98.8–100% recognition rate. Abuqadumah et al. [33] investigated the implementation of 5 deep transfer learning models for footprint recognition, namely AlexNet, VGG16, VGG19, GoogleNet, and Inception v3. Accuracy and the computational time were the performance measures to compare test results. Experimental results demonstrated that the highest accuracy of 98.52% was achieved by Inception v3 model but it took the largest time to execute in comparison with the other four models.
3 Background 3.1 Local Binary Pattern (LBP) Timo Ojala et al. introduced LBP as an effective technique for describing texture information based on statistical analysis of local neighborhoods and for representing discriminative information. It works effectively as a texture analyzer and is used in many domains such as object identification, classification, and recognition. LBP possesses many important properties, such as gray-level invariance, computational simplicity, rotation invariance, arbitrary circular neighborhoods, uniform patterns, and robustness to illumination variations and localization errors. In this scheme, a decimal number called the LBP code or LBP label is assigned to every image pixel; it is calculated by thresholding the gray levels of the surrounding neighboring pixels against that of the pixel itself at the center. For example, considering the 3 × 3 neighborhood of a pixel, its 8-bit LBP binary code is generated by thresholding the gray-level values of the 8 surrounding pixels. Applying this strategy to every image pixel generates the LBP image, which contains local micro-texture information as LBP features. Furthermore, the 256-bin histogram of the generated LBP labels is also used to describe the distinctive texture information. Mathematically, the value of the LBP code of a pixel positioned at (x_c, y_c) is defined as below:
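$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p}$$

(for P = 8, the sum runs from p = 0 to 7). Also, the sign function is given as

$$s(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$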
where g_c is the gray-level value of the central pixel of interest positioned at (x_c, y_c), and g_p is the gray-level value of one of its P surrounding neighboring pixels located on a circle of radius R. s(x) is the sign function that governs the binary pattern of the generated LBP code. Figures 1 and 2 show an example of the calculation and functioning of the basic LBP operator. Uniform and non-uniform are the two main variations of LBP patterns. In a “uniform” LBP (LBP-U) binary pattern, there are at most two bit transitions (from 0 to 1 or from 1 to 0) when the pattern is considered as a circular bit string. Uniform patterns are used to trim down the size of LBP feature vectors. For example, 11100001, 00111000, and 00000000 are uniform LBP patterns. In the histogram of LBP-U, each uniform pattern has its own distinct bin, while a single bin is assigned to all non-uniform patterns.
Fig. 1 LBP code calculation
Fig. 2 LBP image and histogram
For a P-bit uniform LBP pattern, the total number of output bins produced is P(P − 1) + 3. Hence, for 8-bit LBP-U, the 2^8 = 256-bin histogram can be reduced to a 59-bin (8 × (8 − 1) + 3) histogram with uniform patterns as the feature vector. In texture images, it is observed that uniform patterns account for about 90% of all patterns when the (8, 1) neighborhood is considered and approximately 70% when the (16, 2) neighborhood is considered [27].
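The basic operator and the uniform-pattern bin count described in this subsection can be illustrated with a short Python sketch. It is a minimal example rather than the authors' implementation; the clockwise neighbor ordering, the function names, and the sample patch are assumptions made for illustration.

```python
def lbp_code(patch):
    """Basic 8-bit LBP code of the centre pixel of a 3 x 3 patch.

    Neighbours are visited clockwise from the top-left corner; this ordering
    is an assumption, since any fixed ordering yields a valid LBP variant.
    """
    gc = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(offsets):
        if patch[r][c] >= gc:          # s(g_p - g_c) = 1 when g_p >= g_c
            code |= 1 << p             # weight 2^p
    return code


def circular_transitions(code, bits=8):
    """Number of 0/1 transitions when the code is read as a circular bit string."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1")


def is_uniform(code, bits=8):
    """A pattern is uniform if it has at most two circular bit transitions."""
    return circular_transitions(code, bits) <= 2


def uniform_histogram_bins(bits=8):
    """One bin per uniform pattern plus a single shared bin for all others."""
    return sum(is_uniform(c, bits) for c in range(1 << bits)) + 1


sample_patch = [[6, 5, 2],
                [7, 6, 1],
                [9, 8, 7]]
code = lbp_code(sample_patch)
print(code, format(code, "08b"), is_uniform(code))
print(uniform_histogram_bins(8))
```

Enumerating all 256 codes confirms the bin count P(P − 1) + 3 quoted above for P = 8.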
3.2 Local Phase Quantization (LPQ) In 2008, LPQ was introduced by Ojansivu et al. as an effective illumination- and blur-tolerant feature descriptor for texture classification. It is based on the phase information of the discrete Fourier transform (DFT) computed in local image windows at each pixel position. The phases of four low-frequency coefficients are analyzed in an 8-dimensional space to compute an 8-bit binary LPQ code, and the histogram of these codes is used as a feature vector for classifying the texture [35]. In discrete form, the blurring effect is given as a 2D convolution ⊗ between the given image f(x) and the point spread function (PSF) p(x):

$$g(\mathbf{x}) = f(\mathbf{x}) \otimes p(\mathbf{x}) \tag{2}$$

where x is the vector of coordinates [x, y]^T and g(x) is the resulting blurred image. In the Fourier domain, the convolution becomes a pointwise product:

$$G(\mathbf{u}) = F(\mathbf{u}) \cdot P(\mathbf{u}) \tag{3}$$

where F(u), G(u), and P(u) are the discrete Fourier transforms (DFTs) of the original image f(x), the blurred image g(x), and the blur PSF p(x), respectively, and u represents the 2D frequency [u, v]^T. The amplitude spectrum and phase spectrum of (3) can be separated as

$$|G(\mathbf{u})| = |F(\mathbf{u})| \cdot |P(\mathbf{u})| \quad \text{and} \quad \angle G(\mathbf{u}) = \angle F(\mathbf{u}) + \angle P(\mathbf{u}) \tag{4}$$

By assumption, the blur PSF p(x) is an even and non-negative function, i.e., p(x) = p(−x). Consequently, its DFT P(u) is real-valued and its phase takes only two values:

$$\angle P(\mathbf{u}) = \begin{cases} 0, & \text{if } P(\mathbf{u}) \ge 0 \\ \pi, & \text{if } P(\mathbf{u}) < 0 \end{cases} \tag{5}$$

It implies that ∠G(u) = ∠F(u) for all P(u) ≥ 0. Hence, the phase angles at low frequencies with positive P(u) can be considered blur invariant. This property is used by LPQ to extract the local phase information using short-term Fourier transform
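(STFT). It is calculated for every pixel position x of the image f(x) over an M × M neighborhood N_x and is defined as

$$F(\mathbf{u}, \mathbf{x}) = \sum_{\mathbf{y} \in N_{\mathbf{x}}} f(\mathbf{x} - \mathbf{y})\, e^{-j 2\pi \mathbf{u}^{T} \mathbf{y}} \tag{6}$$

In LPQ, the local Fourier coefficients are calculated at only four 2D low frequencies u_1 = [a, 0]^T, u_2 = [0, a]^T, u_3 = [a, a]^T, u_4 = [a, −a]^T, where a is a scalar chosen to satisfy P(u_i) > 0 (see Fig. 3). As a result, at every pixel position, a vector F(x) is generated as

$$F(\mathbf{x}) = [F(\mathbf{u}_1, \mathbf{x}), F(\mathbf{u}_2, \mathbf{x}), F(\mathbf{u}_3, \mathbf{x}), F(\mathbf{u}_4, \mathbf{x})] \quad \text{and} \quad F_{\mathbf{x}} = [\mathrm{Re}\{F(\mathbf{u}_1, \mathbf{x}), \ldots, F(\mathbf{u}_4, \mathbf{x})\},\ \mathrm{Im}\{F(\mathbf{u}_1, \mathbf{x}), \ldots, F(\mathbf{u}_4, \mathbf{x})\}] \tag{7}$$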
where F_x is an 8-element vector containing the real and imaginary components of each of the four frequency constituents of F(x) [35]. Next, the resulting vector is quantized using a simple binary quantizer as below:

$$q_j = \begin{cases} 1, & \text{if } f_j \ge 0 \\ 0, & \text{if } f_j < 0 \end{cases}$$

…

$$\ldots = P_{\pi(2)} + ID_{\min} = \begin{cases} \ldots \\ P_{\pi(1)} - b, & \text{if } ID_{\min} = 0 \\ P_{\pi(1)} - 1, & \text{if } ID_{\min} < 0 \end{cases} \tag{3}$$
In this embedding process, the value of P_π(1) either decreases or remains unchanged. Thus, the original ordering of the pixels in increasing order does not change even after the pixels are modified, which ensures lossless restoration of the host image after the secret data bits are extracted.
3 Proposed Technique In the proposed technique, the input color image is taken in the YCbCr color space to separate its luminance (Y) component from its chrominance (Cb, Cr) components. In general, the luminance and chrominance components are uncorrelated with each other, so they can be processed separately. Each of the three components is processed separately for data hiding as per the I-PVO technique [14]. Initially, the first color component is taken, and then the first image block of size 4 × 4, as shown in Fig. 1, is processed for secret data embedding. To determine whether the block is smooth or complex, the standard deviation of the pixels (shown in light gray in Fig. 1) surrounding the 2 × 2 sub-block (shown in dark gray in Fig. 1) is calculated. If the calculated standard deviation of the surrounding pixels is less than a predefined threshold, then the 2 × 2 sub-block is used for embedding. The modification rules for the sub-block BLC_i are as below, if each BLC_i
Fig. 1 Input block of 4 × 4 size
contains 4 pixels, i.e., (P_1, …, P_4), which are sorted to obtain (P_π(1), …, P_π(4)).

$$ID_{\min} = P_s - P_t \tag{4}$$

$$ID_{\max} = P_u - P_v \tag{5}$$

where s = min(π(1), π(2)), t = max(π(1), π(2)), u = min(π(3), π(4)), and v = max(π(3), π(4)). Thus, the ID_min and ID_max values can be positive (+ve) or negative (−ve). According to the values of ID_min and ID_max, a secret data bit b with value either ‘0’ or ‘1’ is embedded by updating ID_min and ID_max as per the equations below:
$$ID_{\min} = \begin{cases} ID_{\min} - b, & \text{if } ID_{\min} = 1 \\ ID_{\min} - 1, & \text{if } ID_{\min} > 1 \\ ID_{\min} + b, & \text{if } ID_{\min} = 0 \\ ID_{\min} + 1, & \text{if } ID_{\min} < 0 \end{cases}$$

$$ID_{\max} = \begin{cases} \ldots \\ ID_{\max} - b, & \text{if } ID_{\max} = 0 \\ ID_{\max} - 1, & \text{if } ID_{\max} < 0 \end{cases}$$

… T will be detected. The value of T here is 50%, as this threshold gives good results. “Precision is a measure of result relevancy, denoted by P, where P ∈ [0, 1]. It is defined as the number of true positives (tp) over the number of true positives plus the number of false positives (fp),” as shown in Eq. 1.

$$\mathrm{Precision}(P) = \frac{tp}{tp + fp} \tag{1}$$
“Recall is a measure of how many truly relevant results are returned, denoted by R, where R ∈ [0, 1]. Recall (R) is defined as the number of true positives (tp) over the number of true positives plus the number of false negatives (fn),” as shown in Eq. 2.

$$\mathrm{Recall}(R) = \frac{tp}{tp + fn} \tag{2}$$
By merging the precision and recall measures, we can estimate the overall performance of each tool. For this, we use the F1 score, which is the harmonic mean of P and R, as shown in Eq. 3.

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} \tag{3}$$

Fig. 7 Confusion matrix
The system’s accuracy can be computed with the formula shown in Eq. 4.

$$\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{4}$$
Figure 8 shows the performance measures of QOSCR and DeSoCoRe [5] on a dataset of 100 Java files, obtained from the experiments performed, and Fig. 9 plots these measures for both tools. The precision, recall, F1 score, and accuracy of QOSCR are higher than those of DeSoCoRe [5]. DeSoCoRe [5] is based on N-gram methods applied to the functions of a source code. Consequently, if the variable names in a function are changed, if comments in a function are added, removed, or changed, or if the stylistic pattern of a function is altered, DeSoCoRe [5] reports the two source codes as dissimilar even though they are similar. This limitation of DeSoCoRe [5] is removed by QOSCR, so the performance of QOSCR is better than that of DeSoCoRe [5].

Fig. 8 Performance measure value

Fig. 9 Performance measure value
5 Conclusion and Future Work The purpose of this paper is to present a useful framework for source code evaluators that makes it easier for them to decide whether source code is unsatisfactory. It can be used in educational enterprises such as universities, colleges, and institutions, and in intellectual property rights (IPR) matters. Here, we have proposed the QOSCR tool, which detects similarities between source codes written in the Java language, and compared it experimentally with DeSoCoRe [5], a tool for detecting source code reuse. Results from the comparative experiments suggest that the new QOSCR tool can successfully improve source code similarity detection for source code written in Java. In future, the performance of our work will be compared with existing works other than DeSoCoRe, and work can be done to find similarity between codes in programming environments other than Java. We plan in the near future to extend its functionality to more common programming languages. As future work, we also intend to support comparison at the piece level, where a piece is treated as a method or group of methods. Acknowledgements We wish to thank all the teachers and software industries around the countries who actively contributed to this (QOSCR) tool.
References 1. Alsmadi I, AlHami I, Kazakzeh S (2014) Issues related to the detection of source code plagiarism in students assignments. Int J Softw Eng Its Appl 8(4):23–34 2. Hage J, Rademaker P, van Vugt N (2010) A comparison of plagiarism detection tools. Utrecht University, Utrecht, The Netherlands, p 28 3. Gondaliya TP, Joshi HD, Joshi H (2014) Source code plagiarism detection ‘SCPDet’: a review. Int J Comput Appl 105(17) 4. Bandara U, Wijayarathna G (2011) A machine learning based tool for source code plagiarism detection. Int J Mach Learn Comput 1(4):337 5. Flores E, Barrón-Cedeno A, Rosso P, Moreno L (2012) DeSoCoRe: detecting source code re-use across programming languages. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies: demonstration session. Association for Computational Linguistics, pp 1–4 6. Flores E, Barrón-Cedeno A, Rosso P, Moreno L (2011) Towards the detection of cross-language source code reuse. In: Natural language processing and information systems. Springer Berlin Heidelberg, pp 250–253 7. Jadalla A, Elnagar A (2008) PDE4Java: plagiarism detection engine for Java source code: a clustering approach. Int J Bus Intell Data Min 3(2):121–135
8. Haider KZ, Nawaz T, ud Din S, Javed A (2010) Efficient source code plagiarism identification based on greedy string tilling. IJCSNS 10(12):204 9. Jadon S (2016) Code clones detection using machine learning technique: support vector machine. In: 2016 International conference on computing, communication and automation (ICCCA). IEEE 10. Karnalim O (2017) An abstract method linearization for detecting source code plagiarism in object-oriented environment. In: 2017 8th IEEE international conference on software engineering and service science (ICSESS). IEEE 11. Ragkhitwetsagul C, Krinke J, Marnette B (2018) A picture is worth a thousand words: code clone detection based on image similarity. In: 2018 IEEE 12th international workshop on software clones (IWSC). IEEE 12. Pajić E, Ljubović V (2019) Improving plagiarism detection using genetic algorithm. In: 2019 42nd international convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE 13. Tukaram D (2019) Design and development of software tool for code clone search, detection, and analysis. In: 2019 3rd international conference on electronics, communication and aerospace technology (ICECA). IEEE 14. Cheers H, Lin Y, Smith SP (2020) Detecting pervasive source code plagiarism through dynamic program behaviours. In: Proceedings of the twenty-second Australasian computing education conference 15. Cheers H, Lin Y (2020) A novel graph-based program representation for Java code plagiarism detection. In: Proceedings of the 3rd international conference on software engineering and information management 16. Bowyer KW, Hall LO (1999) Experience using “MOSS” to detect cheating on programming assignments. In: 29th annual frontiers in education conference, 1999. FIE’99, vol 3. IEEE, pp 13B3-18 17. Jplag tool site. http://jplag.ipd.kit.edu. Last access on 7-08-2015
18. Bugarin A, Carreira M, Lama M, Pardo XM (2008) Plagiarism detection using software tools: a study in a computer science degree. In: 2008 European University information systems conference, Aarhus, Denmark, pp 72–1 19. Chen X, Francia B, Li M, Mckinnon B, Seker A (2004) Shared information and program plagiarism detection. IEEE Trans Inf Theory 50(7):1545–1551 20. Đurić Z, Gašević D (2012) A source code similarity system for plagiarism detection. The Comput J, bxs018 21. Gupta A, Singh S (2013) Lexical analysis for the measurement of conceptual duplicity between C Program. Ijraset 1(I), ISSN: 2321-9653 22. Ali AMET, Abdulla HMD, Snasel V (2011) Overview and comparison of plagiarism detection tools. In: DATESO, pp 161–172 23. https://en.wikipedia.org/wiki/Machine_learning. Last access on 7-08-2015 24. https://en.wikipedia.org/wiki/N-gram. Last access on 7-08-2015 25. http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm. Last access on 7-08-2015 26. https://en.wikipedia.org/wiki/Latent_semantic_analysis. Last access on 7-08-2015 27. Maurer HA, Kappe F, Zaka B (2006) Plagiarism-a survey. J UCS 12(8):1050–1084 28. Bin-Habtoor AS, Zaher MA (2012) A survey on plagiarism detection systems. Int J Comput Theory Eng 4(2):185 29. Hattingh F, Buitendag AA, Van Der Walt JS (2013) Presenting an alternative source code plagiarism detection framework for improving the teaching and learning of programming. J Inf Technol Educ 12:45–58 30. Huang L, Shi S, Huang H (2010) A new method for code similarity detection. In: 2010 IEEE international conference on progress in informatics and computing (PIC), vol 2. IEEE, pp 1015–1018 31. Ji JH, Woo G, Cho HG (2008) A plagiarism detection technique for java program using bytecode analysis. In: ICCIT’08. Third international conference on convergence and hybrid information technology, 2008, vol 1. IEEE, pp 1092–1098
500
M. Agrawal
32. Jian H, Fei L (2009) Quick similarity measurement of source code based on suffix array. In: CIS’09. International conference on computational intelligence and security, 2009, vol 2. IEEE, pp 308–311 33. Maskeri G, Karnam D, Viswanathan SA, Padmanabhuni S (2012) Version history based source code plagiarism detection in proprietary systems. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 609–612 34. Clough P (2003) Old and new challenges in automatic plagiarism detection. National Plagiarism Advisory Service. http://ir.shef.ac.uk/cloughie/index.html