Lecture Notes in Networks and Systems 665
Jennifer S. Raj Isidoros Perikos Valentina Emilia Balas Editors
Intelligent Sustainable Systems Proceedings of ICISS 2023
Lecture Notes in Networks and Systems Volume 665
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation (DCA), School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose.
Editors
Jennifer S. Raj, Gnanamani College of Engineering and Technology, Namakkal, Tamil Nadu, India
Isidoros Perikos, Department of Computer Engineering and Informatics, University of Patras, Kato Kastritsi, Greece
Valentina Emilia Balas, Aurel Vlaicu University of Arad, Arad, Romania
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-1725-9 ISBN 978-981-99-1726-6 (eBook) https://doi.org/10.1007/978-981-99-1726-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
We are honored to dedicate the proceedings of the 6th ICISS 2023 to all the participants, organizers and editors of the 6th ICISS 2023.
Preface
With deep gratification, we welcome you to the proceedings of the 6th International Conference on Intelligent Sustainable Systems (ICISS 2023), organized at SCAD College of Engineering and Technology, Tirunelveli, India, on 3–4 February 2023. The major goal of this international conference is to bring academicians, industrialists, researchers and scholars together on a common platform to share their innovative research ideas and practical solutions toward the development of intelligent sustainable systems for a more sustainable future. The conference delegates attended a wide range of technical sessions based on the different technical domains involved in the theme of the conference. The conference program included invited keynote sessions on developing a sustainable future, state-of-the-art research work presentations, and informative discussions with distinguished keynote speakers, covering a wide range of topics in information systems and sustainability research. This year, ICISS received 327 papers in different conference tracks; based on 3–4 expert reviews from the technical program committee and internal and external reviewers, 66 papers were finally selected for the conference. The conference proceedings include papers from different tracks such as Intelligent Systems, Sustainable Systems and Applications. Each paper, regardless of track, received at least three reviews from reviewers with professional expertise in the particular research domain of the paper. We are pleased to thank the conference organization committee, conference program committee and technical reviewers for working generously toward the success of the conference event. A special mention goes to the internal and external reviewers for working very hard in reviewing each and every paper received at the conference and for giving valuable suggestions to the authors for maintaining the quality of the conference.
We are truly obliged to the authors, who have contributed their innovative research results to the conference. Special thanks go to Springer Publications for their impeccable support and guidance throughout the publication process.
We hope the proceedings of ICISS 2023 will give an enjoyable and technically rewarding experience for both attendees and readers.

Dr. Jennifer S. Raj, Namakkal, India
Dr. Isidoros Perikos, Kato Kastritsi, Greece
Dr. Valentina Emilia Balas, Arad, Romania
Contents
Image Processing-Based Solution to Repel Crop-Damaging Wild Animals
W. P. S. Fernando, I. K. Madhubhashana, D. N. B. A. Gunasekara, Y. D. Gogerly, Anuradha Karunasena, and Ravi Supunya

SDN Framework for Efficient Latency-Aware Topology Discovery in ISTN
V. Deepa and B. Sivakumar

Recommendation System Based on Clustering Techniques Using Collaborative Filtering Method
G. L. Swathi Mirthika and B. Sivakumar

Compartmented Proactive Secret Sharing Scheme
Rolla Subrahmanyam, N. Rukma Rekha, and Y. V. Subba Rao

DevOps Challenges and Practices in Software Engineering
T. Pandiyavathi and B. Sivakumar

The Effects of Climate Change on Crop Yield
Daksh Patel, Breenda Das, and R. I. Minu

Finetuned-VGG16 CNN Model for Tissue Classification of Colorectal Cancer
T. E. Anju and S. Vimala

Effective Heart Disease Prediction and Classification Using Intelligent System
P. Mohana Priya and Kannan Balasubramian

A Machine Learning Approach for Aeroponic Lettuce Crop Growth Monitoring System
R. Gowtham and R. Jebakumar
A Novel Approach for Privacy Preserving Technique in IoT Fog and Cloud Environment
Ravula Arun Kumar, Gillala Rekha, and Kambalapally Vinuthna

IOT-Based Fertilizer Recommendation System Using a Hybrid Boosting Algorithm
Sri Silpa Padmanabhuni, J. Lakshmi Narayana, Konjeti Hema Lakshmi Bhavani, Vudathu Venkata Krishna sai Poojitha, Boggarapu Rupa, and Chirala Jaya

Development of a System for Controlling IoT Devices Using Gaze Tracking
María Cristina Erazo, Edwin Cocha Tobanda, and Sang Guun Yoo

Towards Human-Like Robotic Grasping for Industrial Applications Using Computer Vision
Surya Prasada Rao Borra, Rajendar Sandiri, Paparao Nalajala, Sankararao Majji, C. Saravanakumar, and K. Ramesh Chandra

An Extensive Study on Unattended Object Detection in Video Surveillance
Padmaja Grandhe, Ponnuri Bhavani Dhanush, Muskaan Mohammad, Atmakuri Nikhita Alekhya Adhi Lakshmi, and Chakka Venkata Sai Rohit Kumar

Analysis of Digital Data on Social Network TikTok During COVID-19
Andrea Maricielo Malca-Liza, Angelo Aldahir Solano-García, Adriana Margarita Turriate-Guzman, and Luis-Rolando Alarcón-Llontop

An Hybrid Edge Algorithm for Vehicle License Plate Detection
Madhurya Mozumder, Souharda Biswas, L. Vijayakumari, R. Naresh, C. N. S. Vinoth Kumar, and G. Karthika

Gesture Controlled Drone Swarm System for Violence Detection Using Machine Learning for Women Safety
S. Gunasundari, K. R. Rakhul, V. Ananth Sai Shankar, A. R. Sathiyan, Ragavendiran Balasubramanian, and Yedhu Krishnan

Healthcare System Modeling and Security Engineering
Cynthia Jayapal and Swetha Srinivasan

Hierarchical Classification of Disaster News Using Local Classifier per Parent Node
Aakalpa Aryal, Sampanna Sharma, Shrawan Kumar Thapa, Sudeep Bhandari, Aman Shakya, and Sanjeeb Prasad Panday
Working from Home During a Pandemic: The Impact COVID-19 Had on Software and Web Development
Izan Khan, Mubashir Naqvi, Jon Cathcart, Terrance Gainer, Josh Dolph, and Tauheed Khan Mohd

Stochastic Processes with Trend Stationarity in High-Clustered Growth Networks
Sergei Sidorov, Sergei Mironov, and Sophia Tikhonova

Smart Vest for Women Undergoing Menopause
R. Priyakanth, N. M. Sai Krishna, Mahesh Babu Katta, Kacham Akanksha, Jonnalagadda Hemasree, and Sehaba Banu Shaik

Analysis of Digital Data About the Preference of Audiovisual Audiences
Vanessa Andrea Tamayo-Sanchez, Adriana Margarita Turriate-Guzman, and Dalia Rosa Bravo-Guevara

Cloud Infrastructure Security Using a Hybrid AES Encryption Model
J. Ramadevi, M. S. Murali Dhar, Y. Kasiviswanadham, Sankararao Majji, and Dhiraj Kapila

Development of Deep Learning Based Models for Detecting the Significance of Non-Manual Parameters for Indian Sign Language Interpretation
P. Kola Sujatha, P. Jayanthi, M. Sandhiya, K. S. Sowbarnigaa, and G. Keshini

The Effect of Covid-19 Pandemic on the Use of Vehicle Rental Technology (Ojek) Applications
Yulius Efendy, Muhammad Heru Syaputra, Kevin Honggiarto, Ford Lumban Gaol, and Tokuro Matsuo

Video Tampering Detection in Real Time
Lakshmi Harika Palivela, D. Bala Gayathri, and R. Shanmuga Priya

Research Paradigms for Health Equity in Intelligent Mobile Healthcare Technologies: A Critical Review
Ggaliwango Marvin, Nakayiza Hellen, and Joyce Nakatumba-Nabende

YOLOv7-Based Model for Detecting Safety Helmet Wear on Construction Sites
B. P. Athidhi and P. Smitha Vas

Analysis of Digital Data on Communication Strategies in Companies During COVID-19
Nahomy Maria De Los Angeles Leon-Dextre, Katherine Valeria Salazar-Vegas, Yaritza Zarait Fernandez-Saucedo, Adriana Margarita Turriate-Guzman, and Dalia Rosa Bravo-Guevara
A Survey on Hiding Data Using Video Steganography
Sk. Sameerunnisa and Orchu Aruna

Review on Deep Learning Algorithms for Object Detection
R. Sasirekha, J. Jeyshri, A. TinaVictoria, J. Subha, and P. Kamaleswari

Wireless Hand Gesture Controlled Robot Using STM-32 Microcontroller
Mandar Godambe, Anand D. Mane, Sarvesha Shinde, and Kaustubh Mhatre

Optimal Software Based Sign Language Recognition System
Avinash Golande, Shaikh Mohammed Abuzar, Yash Patange, Aditya Mohite, and Shubham Palke

Dynamic Weighted Feature Subset Logistic Regression Model for Heart Disease Prediction
P. Sandhya Krishna, M. Sai Ramya Sri, R. Sri Lakshmi Triveni, T. Sivathmika, and R. Kanishka

Lane, Car, Traffic Sign and Collision Detection in Simulated Environment Using GTA-V
Rakhi Bharadwaj, Pratham Gajbhiye, Atharva Rathi, Atharva Sonawane, and Rucha Uplenchwar

An IoT-Enabled Smart Network Traffic Signal Assistant System for Emergency Vehicles Using Computer Vision
G. A. Senthil, R. Prabha, S. Suganthi, S. Sridevi, and N. Shanthi

Analysis of Current Technology Advancement Effects with Industrial Revolution 4.0 (IR4.0)
Irfan Haziq Asnan, Khairulazlan Othman, Siti Juleya Awang Osman, Mohd Lutfil Hadi Mohd Hamzah, and Maslin Masrom

Tackling IoT Security Challenge by Metaheuristics Tuned Extreme Learning Machine
Luka Jovanovic, Masa Gajevic, Milos Dobrojevic, Nebojsa Budimirovic, Nebojsa Bacanin, and Miodrag Zivkovic

A Hybrid Post-Quantum Cryptography Driven Key Exchange Scheme for Cloud Computing Environments
Mr. Shaik Mohammad Ilias and V. Ceronmani Sharmila

A Case Study: Disease Code (ICD-10) Classification in Turkish Medical Summary Dataset
Damla Busra Ozsonmez and Tankut Acarman
The Effects of Implementing a Library Information System on the Increase in Library Use
Jason Sim, Alfakhri Rizqulloh Wijayakusuma, Rifky Rivaldy, Ford Lumban Gaol, and Tokuro Matsuo

EasyE-Waste: A Novel Approach Toward Efficient and Sustainable E-Waste Management
Kazi Shawpnil, Sami Nayeem, Farhana Hossain, Arafat Dayan, and Md. Motaharul Islam

A Survey on Gene Classification Based on DNA Sequence
B. V. Kiranmayee, Chalumuru Suresh, K. Sneha, L. K. Srinivas Karthik, P. Niharika, and P. Sai Rohan

Creation of a Virtual Environment for Analysis of Historical Processes Related to Life of I.V. Michurin in Russia
V. Nemtinov, A. Borisenko, V. Morozov, K. Nemtinov, and Yu. Protasova

Analysis of Digital Data on Advergame Trends in Advertising
Katherine Lizet Oyola-Enciso, Cynthia Lizzeth Afaraya-Sinacay, Adriana Margarita Turriate-Guzman, and Dalia Rosa Bravo-Guevara

NFTrig: Using Blockchain Technologies for Math Education
Jordan Thompson, Ryan Benac, Kidus Olana, Talha Hassan, Andrew Sward, and Tauheed Khan Mohd

Brain Computer Interface for Stroke Psychotherapy: Intonation of Cortical High-Strung
Sankari Subbiah, G. Adiline Macriga, G. Sudha, and S. Saranya

Agricultural Image Classification Using Deep Learning Neural Networks with Transfer Learning Approach
Anant Gavali and Krishna Kumar Singh

Driving Style Classification Using Deep Learning Techniques
Apurva Ajay Mohite, S. S. Patil, and A. S. Mali

The Implications of Electronics Money on the Digital Payment Method
Aulia Sisca Rahmadiyanti, Ghina Kamilah, Salma Nurul Hanifah, Ford Lumban Gaol, and Tokuro Matsuo

Accident Detection Using Deep Learning
Rakhi Bharadwaj, Manthan Tagad, Tejas Katkade, Aniket Ukarde, and Shritej Joshi

Extracting Data from an Image Data Set Using Image Processing Methodology
D. Saravanan and K. V. S. S. N. Narasimha Murty
The Model of Personalized Machine Learning to Enhance Students’ Achievements
Mikail Rifqi Rusdi, T. S. Gregorio Enrico, Austin Ordell Salomo, Ford Lumban Gaol, and Tokuro Matsuo

Social Media Account Hacking Using Kali Linux-Based Tool BeEF
Christopher Le, Rim Nassiri, Estephanos Jebessa, Jon Cathcart, and Tauheed Khan Mohd

Benefits, Challenges, and Future Research Directions for Blockchain-Based Agri-Food Supply Chain
Madhuri S. Arade and Nitin N. Pise

Analysis of Digital Data About Digital Journalism
Luisa Andreina Manavi-Cordova, Carla Milagros Rosas-Quintana, Adriana Margarita Turriate-Guzman, and Dalia Rosa Bravo-Guevara

Visualization of MSMEs’ Contribution Towards Sustainable Economic Development Focusing on Gross Domestic Product
Md. Motahar Hossain and Nitin Pathak

Design and Development of Internet of Things-Based Bio-signal Acquisition Device
A. Paramasivam, S. Vijayalakshmi, Pittu Pavan Sai Kiran Reddy, K. S. Mohamed Thoufeek, Juliana Johari, and Th. Rupachandra Singh

UAVs in Green Health Care for Energy Efficiency and Real-Time Data Transmission
Anika Khaer, Md. Siam Hossain Sarker, Proma Hossain Progga, Saniyat Mushrat Lamim, and Md. Motaharul Islam

Radio Frequency Identification (RFID) Identification System for Small Wooden Traditional and Fishing Boats Less Than 7 Gross Tonnages
Diaz Saputra, Ford Lumban Gaol, Edi Abdurachman, Dana Indra Sensuse, and Tokuro Matsuo

Determining Costs with Fuzzy Logic: The Example of a Construction Company
Mubariz Bagirov, Rahib Imamguluyev, and Amirxan Pashayev

Password Hacking Analysis of Kali Linux Applications
Jon Cathcart and Tauheed Khan Mohd

Study on the Encryption and Decryption Capabilities of Hybrid Techniques for Images
Nongmeikapam Thoiba Singh, Rahul Dayal, Divyansh Kanwal, and Aishwarya Bhardwaj
Analysis of Power Quality Issues and Mitigation Techniques Using HACO Algorithm
Balasubbareddy Mallala, P. Venkata Prasad, and Kowstubha Palle

Analytical Determination of Base Thickness and Diffusion Length in Back Contact Solar Cell
Ajit Singh, Nitish Kumar Ojha, Sanjai Kumar, and Neeraj Tyagi

Author Index
Editors and Contributors
About the Editors

Dr. Jennifer S. Raj received the Ph.D. degree from Anna University and a Master’s degree in Communication System from SRM University, India. Currently, she is working in the Department of ECE, Gnanamani College of Technology, Namakkal, India. She is a Life Member of ISTE, India. She has served as Organizing Chair and Program Chair of several international conferences and on the program committees of several others. She is a Book Reviewer for Tata McGraw Hill Publication and has published more than fifty research articles in journals and IEEE conferences. Her interests are in wireless healthcare informatics and body area sensor networks.

Dr. Isidoros Perikos holds a Ph.D. in Artificial Intelligence from the University of Patras, Greece, and an M.Sc. in Computer Science and Technology from the Computer Engineering and Informatics Department at the University of Patras. He also has an Engineering Diploma (5-year program, M.Eng.) in Computer Engineering and Informatics from the same department. His research interests include artificial intelligence, web intelligence, natural language processing and understanding, human–computer interaction, intelligent systems and affective computing. He has published more than 100 papers in international journals and conferences.

Dr. Valentina Emilia Balas is currently Full Professor at “Aurel Vlaicu” University of Arad, Romania. She is the author of more than 300 research papers. Her research interests are in intelligent systems, fuzzy control, and soft computing. She is Editor-in-Chief of the International Journal of Advanced Intelligence Paradigms (IJAIP) and of IJCSE. Dr. Balas is a Member of EUSFLAT and ACM, a Senior Member of IEEE, a Member of TC-EC and TC-FS (IEEE CIS) and TC-SC (IEEE SMCS), and Joint Secretary of FIM.
Contributors

Edi Abdurachman, Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia
Shaikh Mohammed Abuzar, RSCOE, Pune, India
Tankut Acarman, Department of Computer Engineering, Galatasaray University, Istanbul, Turkey
G. Adiline Macriga, Department of Information Technology, Sri Sai Ram Engineering College, Chennai, India
Cynthia Lizzeth Afaraya-Sinacay, Universidad Privada del Norte, Lima, Peru
Kacham Akanksha, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India
Luis-Rolando Alarcón-Llontop, Universidad Privada del Norte, Lima, Peru
T. E. Anju, Department of Computer Science, Mother Teresa Women’s University, Kodaikanal, India
Madhuri S. Arade, School of Computer Engineering & Technology, Dr. Vishwanath MIT World Peace University, Pune, Maharashtra, India
Orchu Aruna, Vasireddy Venkatadri Institute of Technology, Namburu, Andhra Pradesh, India
Aakalpa Aryal, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
Irfan Haziq Asnan, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur Malaysia, Malaysia
B. P. Athidhi, LBS Institute of Technology for Women, Thiruvananthapuram, Kerala, India
Nebojsa Bacanin, Singidunum University, Belgrade, Serbia
Mubariz Bagirov, Western Caspian University, Baku, Azerbaijan
D. Bala Gayathri, Madras Institute of Technology, Anna University, Chennai, India
Ragavendiran Balasubramanian, Heptre Technologies, Chennai, India
Kannan Balasubramian, School of Computing, SASTRA Deemed University, Thanjavur, India
Ryan Benac, Department of Math and Computer Science, Augustana College, Rock Island, IL, USA
Sudeep Bhandari, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
Rakhi Bharadwaj, Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India
Aishwarya Bhardwaj, Department of Computer Science and Engineering, Chandigarh University, Punjab, India
Konjeti Hema Lakshmi Bhavani, Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P, India
Souharda Biswas, Department of Networking and Communications, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India
A. Borisenko, Department of Computer-Integrated Systems in Mechanical Engineering, Tambov State Technical University, Tambov, Russian Federation
Dalia Rosa Bravo-Guevara, Universidad Privada del Norte, Lima, Peru
Nebojsa Budimirovic, Singidunum University, Belgrade, Serbia
Jon Cathcart, Department of Math and Computer Science, Augustana College, Rock Island, IL, USA
K. Ramesh Chandra, Department of Electronics and Communication Engineering, Vishnu Institute of Technology, Kovvada, Andhra Pradesh, India
Edwin Cocha Tobanda, Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador
Breenda Das, Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India
Rahul Dayal, Department of Computer Science and Engineering, Chandigarh University, Punjab, India
Arafat Dayan, United International University, Dhaka, Bangladesh
V. Deepa, Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, Tamilnadu, India
Ponnuri Bhavani Dhanush, Potti Sriramulu Chalavadi Mallikarjunarao College of Engineering and Technology, Vijayawada, AP, India
M. S. Murali Dhar, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India
Milos Dobrojevic, Singidunum University, Belgrade, Serbia
Josh Dolph, Department of Math and Computer Science, Augustana College, Rock Island, Illinois, USA
Yulius Efendy, Bina Nusantara University, Alam Sutra, School of Information System, Kota Tangerang, Indonesia
T. S. Gregorio Enrico, Binus Undergraduate Program - Computer Science, Bina Nusantara University, Jakarta, Indonesia
María Cristina Erazo, Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador
Yaritza Zarait Fernandez-Saucedo, Universidad Privada del Norte, Lima, Peru
W. P. S. Fernando, Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
Terrance Gainer, Department of Math and Computer Science, Augustana College, Rock Island, Illinois, USA
Pratham Gajbhiye, Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India
Masa Gajevic, Singidunum University, Belgrade, Serbia
Ford Lumban Gaol, Computer Science Department, BINUS Graduate Program – Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia
Anant Gavali, Symbiosis Centre for Information Technology, Pune, India
Mandar Godambe, Sardar Patel Institute of Technology, Mumbai, India
Y. D. Gogerly, Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
Avinash Golande, RSCOE, Pune, India
R. Gowtham, Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, India
Padmaja Grandhe, Department of Computer Science and Engineering, Potti Sriramulu Chalavadi Mallikarjunarao College of Engineering and Technology, Vijayawada, AP, India
D. N. B. A. Gunasekara, Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
S. Gunasundari, Department of CSE, Velammal Engineering College, Chennai, India
Salma Nurul Hanifah, School of Information System, Bina Nusantara University, Alam Sutera, Indonesia
Lakshmi Harika Palivela, Madras Institute of Technology, Anna University, Chennai, India
Talha Hassan, Department of Math and Computer Science, Augustana College, Rock Island, IL, USA
Nakayiza Hellen, Makerere University, Kampala, Uganda
Jonnalagadda Hemasree, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India
Kevin Honggiarto, Bina Nusantara University, Alam Sutra, School of Information System, Kota Tangerang, Indonesia
Farhana Hossain, United International University, Dhaka, Bangladesh
Md. Motahar Hossain, Department of Business Management, University School of Business, Chandigarh University, Mohali, Punjab, India
Mr. Shaik Mohammad Ilias, Department of Computer Science and Engineering, Hindustan Institute of Technology and Science, Chennai, India
Rahib Imamguluyev, Odlar Yurdu University, Baku, Azerbaijan
Md. Motaharul Islam, Department of Computer Science and Engineering, United International University, Badda, Dhaka, Bangladesh
Chirala Jaya, Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P, India
P. Jayanthi, Department of Information Technology, MIT Campus, Anna University, Chennai, India
Cynthia Jayapal, Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India
R. Jebakumar, Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, India
Estephanos Jebessa, Department of Math and Computer Science, Augustana College, Rock Island, IL, USA
J. Jeyshri, Department of CSE, SRM Institute of Science and Technology, Chengulpattu, India
Juliana Johari, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia
Shritej Joshi, Vishwakarma Institute of Technology, Pune, India
Luka Jovanovic, Singidunum University, Belgrade, Serbia
Siti Juleya Awang Osman, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur Malaysia, Malaysia
P. Kamaleswari, Department of CSE, SRM Institute of Science and Technology, Chengulpattu, India
Ghina Kamilah, School of Information System, Bina Nusantara University, Alam Sutera, Indonesia
R. Kanishka Department of IT, Vignan’s Nirula Institute of Technology and Science for Women (VNITSW), Guntur, Andhra Pradesh, India Divyansh Kanwal Department of Computer Science and Engineering, Chandigarh University, Punjab, India Dhiraj Kapila Department of CSE, Lovely Professional University, Phagwara, India G. Karthika Department of Electronics and Communication Engineering, M.I.E.T Engineering College, Trichy, Tamil Nadu, India Anuradha Karunasena Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka Y. Kasiviswanadham SR Gudlavalleru Engineering College, Krishna, India Tejas Katkade Vishwakarma Institute of Technology, Pune, India Mahesh Babu Katta BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India G. Keshini Department of Instrumentation Engineering, MIT Campus, Anna University, Chennai, India Anika Khaer Department of Computer Science and Engineering, United International University, Badda, Dhaka, Bangladesh Tauheed Khan Mohd Department of Math and Computer Science, Augustana College, Rock Island, IL, USA Izan Khan Department of Math and Computer Science, Augustana College, Rock Island, Illinois, USA B. V. Kiranmayee Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India P. Kola Sujatha Department of Information Technology, MIT Campus, Anna University, Chennai, India N. M. Sai Krishna BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India P. Sandhya Krishna Department of IT, Vignan’s Nirula Institute of Technology and Science for Women (VNITSW), Guntur, Andhra Pradesh, India Yedhu Krishnan Heptre Technologies, Chennai, India Krishna Kumar Singh Symbiosis Centre for Information Technology, Pune, India Shrawan Kumar Thapa Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
C. N. S. Vinoth Kumar Department of Networking and Communications, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India Chakka Venkata Sai Rohit Kumar Potti Sriramulu Chalavadi Mallikarjunarao College of Engineering and Technology, Vijayawada, AP, India Ravula Arun Kumar Research Scholar Department of CSE Koneru Lakshmaiah Education Foundation, Green Fields Vaddeswaram, Guntur, Andhra Pradesh, India Sanjai Kumar Department of Science and Technology, New Delhi, India Atmakuri Nikhita Alekhya Adhi Lakshmi Potti Sriramulu Chalavadi Mallikarjunarao College of Engineering and Technology, Vijayawada, AP, India Saniyat Mushrat Lamim Department of Computer Science and Engineering, United International University, Badda, Dhaka, Bangladesh Christopher Le Department of Math and Computer Science, Augustana College, Rock Island, IL, USA Nahomy Maria De Los Angeles Leon-Dextre Universidad Privada del Norte, Lima, Peru Mohd Lutfil Hadi Mohd Hamzah Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur Malaysia, Malaysia I. K. Madhubhashana Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka Sankararao Majji Department of ECE, GRIET, Hyderabad, India Andrea Maricielo Malca-Liza Universidad Privada del Norte, Lima, Peru A. S. Mali K. E. Society’s Rajarambapu Institute of Technology, Islampur (Shivaji University, Kolhapur), Sangli, Maharashtra, India Balasubbareddy Mallala Chaitanya Bharathi Institute of Technology, Hyderabad, India Luisa Andreina Manavi-Cordova Universidad Privada del Norte, Lima, Peru Anand D. Mane Sardar Patel Institute of Technology, Mumbai, India Ggaliwango Marvin Makerere University, Kampala, Uganda Maslin Masrom Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur Malaysia, Malaysia Tokuro Matsuo Advanced Institute of Industrial Technology, Shinagawa-Ku, Tokyo, Japan; City University of Macau, Taipa, Macau; Asia University, Taichung, Taiwan
Kaustubh Mhatre Sardar Patel Institute of Technology, Mumbai, India R. I. Minu Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India Sergei Mironov Saratov State University, Saratov, Russia K. S. Mohamed Thoufeek Department of Biomedical Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Muskaan Mohammad Potti Sriramulu Chalavadi Mallikarjunarao College of Engineering and Technology, Vijayawada, AP, India P. Mohana Priya School of Computing, SASTRA Deemed University, Thanjavur, India Aditya Mohite RSCOE, Pune, India Apurva Ajay Mohite K. E. Society’s Rajarambapu Institute of Technology, Islampur (Shivaji University, Kolhapur), Sangli, Maharashtra, India V. Morozov Department of Computer-Integrated Systems in Mechanical Engineering, Tambov State Technical University, Tambov, Russian Federation Madhurya Mozumder Department of Networking and Communications, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India Joyce Nakatumba-Nabende Makerere University, Kampala, Uganda Paparao Nalajala Department of ECE, Institute of Aeronautical Engineering, Hyderabad, India Mubashir Naqvi Department of Math and Computer Science, Augustana College, Rock Island, Illinois, USA K. V. S. S. N. Narasimha Murty Faculty of Operations & IT, ICFAI Business School (IBS), Hyderabad, The ICFAI Foundation for Higher Education (IFHE), (Deemed to be university u/s 3 of the UGC Act 1956), Hyderabad, India J. Lakshmi Narayana Department of ECE, PSCMR College of Engineering and Technology, Vijayawada, A.P, India R. Naresh Department of Networking and Communications, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India Rim Nassiri Department of Math and Computer Science, Augustana College, Rock Island, IL, USA Sami Nayeem United International University, Dhaka, Bangladesh K. Nemtinov Department of Computer-Integrated Systems in Mechanical Engineering, Tambov State Technical University, Tambov, Russian Federation
V. Nemtinov Department of Computer-Integrated Systems in Mechanical Engineering, Tambov State Technical University, Tambov, Russian Federation P. Niharika Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Nitish Kumar Ojha University of Stirling, Ras Al Khaimah, UAE Kidus Olana Department of Math and Computer Science, Augustana College, Rock Island, IL, USA Khairulazlan Othman Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur Malaysia, Malaysia Katherine Lizet Oyola-Enciso Universidad Privada del Norte, Lima, Peru Damla Busra Ozsonmez Department of Computer Engineering, Galatasaray University, Istanbul, Turkey Sri Silpa Padmanabhuni Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P, India Shubham Palke RSCOE, Pune, India Kowstubha Palle Chaitanya Bharathi Institute of Technology, Hyderabad, India T. Pandiyavathi SRM Institute of Science and Technology, Chennai, India A. Paramasivam Department of Biomedical Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Amirxan Pashayev Western Caspian University, Baku, Azerbaijan Yash Patange RSCOE, Pune, India Daksh Patel Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India Nitin Pathak Department of Commerce, University School of Business, Chandigarh University, Mohali, Punjab, India S. S. Patil K. E. Society’s Rajarambapu Institute of Technology, Islampur (Shivaji University, Kolhapur), Sangli, Maharashtra, India Pittu Pavan Sai Kiran Reddy Department of Biomedical Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Nitin N. Pise School of Computer Engineering & Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India P. Venkata Prasad Chaitanya Bharathi Institute of Technology, Hyderabad, India
Sanjeeb Prasad Panday Institute of Engineering, Tribhuvan University, Lalitpur, Nepal Surya Prasada Rao Borra PVP Siddhartha Institute of Technology, Vijayawada, India R. Priyakanth BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India Proma Hossain Progga Department of Computer Science and Engineering, United International University, Badda, Dhaka, Bangladesh Yu. Protasova Department of Management, Service and Tourism, Tambov State University Named After G. R. Derzhavin, Tambov, Russian Federation Aulia Sisca Rahmadiyanti School of Information System, Bina Nusantara University, Alam Sutera, Indonesia K. R. Rakhul Department of CSE, Velammal Engineering College, Chennai, India J. Ramadevi Department of CSE, PVP Siddhartha Institute of Technology, Vijayawada, India Atharva Rathi Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Gillala Rekha Associate Professor Department of CSE Koneru Lakshmaiah Education Foundation, Hyderabad, Telangana, India N. Rukma Rekha School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India Rifky Rivaldy Bina Nusantara University, Alam Sutra, School of Information System, West Jakarta, Indonesia Carla Milagros Rosas-Quintana Universidad Privada del Norte, Lima, Peru Boggarapu Rupa Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P, India Th. Rupachandra Singh Department of Computer Science, Manipur University, Canchipur, India Mikail Rifqi Rusdi Binus Undergraduate Program - Computer Science, Bina Nusantara University, Jakarta, Indonesia P. Sai Rohan Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Katherine Valeria Salazar-Vegas Universidad Privada del Norte, Lima, Peru Austin Ordell Salomo Binus Undergraduate Program - Computer Science, Bina Nusantara University, Jakarta, Indonesia
Sk. Sameerunnisa Vasireddy Venkatadri Institute of Technology, Namburu, Andhra Pradesh, India M. Sandhiya Department of Information Technology, MIT Campus, Anna University, Chennai, India Rajendar Sandiri Department of ECE, Vardhaman College of Engineering, Hyderabad, India Diaz Saputra Student of Doctoral of Computer Science, Bina Nusantara University, Jakarta, Indonesia S. Saranya Department of Electronics and Communication Engineering, Sri Sai Ram Engineering College, Chennai, India C. Saravanakumar Department of ECE, SRM Valliammai Engineering College, Chennai, India D. Saravanan Faculty of Operations & IT, ICFAI Business School (IBS), Hyderabad, The ICFAI Foundation for Higher Education (IFHE), (Deemed to be university u/s 3 of the UGC Act 1956), Hyderabad, India Md. Siam Hossain Sarker Department of Computer Science and Engineering, United International University, Badda, Dhaka, Bangladesh R. Sasirekha Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India A. R. Sathiyan Department of Automobile, Velammal Engineering College, Chennai, India Dana Indra Sensuse Faculty of Computer Science, University of Indonesia, Jakarta, Indonesia G. A. Senthil Department of Information Technology, Agni College of Technology, Chennai, India Sehaba Banu Shaik BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India Aman Shakya Institute of Engineering, Tribhuvan University, Lalitpur, Nepal V. Ananth Sai Shankar Department of Mechanical, Velammal Engineering College, Chennai, India R. Shanmuga Priya Madras Institute of Technology, Anna University, Chennai, India N. Shanthi Department of Electrical and Electronics Engineering, Sri Sai Ram Institute of Technology, Chennai, India Sampanna Sharma Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
V. Ceronmani Sharmila Department of Information Technology, Hindustan Institute of Technology and Science, Chennai, India Kazi Shawpnil United International University, Dhaka, Bangladesh Sarvesha Shinde Sardar Patel Institute of Technology, Mumbai, India Sergei Sidorov Saratov State University, Saratov, Russia Jason Sim Bina Nusantara University, Alam Sutra, School of Information System, West Jakarta, Indonesia Ajit Singh Deen Dayal Upadhyay College, University of Delhi, New Delhi, India Nongmeikapam Thoiba Singh Department of Computer Science and Engineering, Chandigarh University, Punjab, India B. Sivakumar Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, Tamilnadu, India T. Sivathmika Department of IT, Vignan’s Nirula Institute of Technology and Science for Women (VNITSW), Guntur, Andhra Pradesh, India P. Smitha Vas LBS Institute of Technology for Women, Thiruvananthapuram, Kerala, India K. Sneha Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Angelo Aldahir Solano-García Universidad Privada del Norte, Lima, Peru Atharva Sonawane Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India K. S. Sowbarnigaa Department of Information Technology, MIT Campus, Anna University, Chennai, India M. Sai Ramya Sri Department of IT, Vignan’s Nirula Institute of Technology and Science for Women (VNITSW), Guntur, Andhra Pradesh, India S. Sridevi Department of Computer Science and Engineering, Sri Ramachandra Engineering and Technology, Chennai, India L. K. Srinivas Karthik Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Swetha Srinivasan Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India Y. V. 
Subba Rao School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India Sankari Subbiah Department of Information Technology, Sri Sai Ram Engineering College, Chennai, India
J. Subha Department of CSE, SRM Institute of Science and Technology, Chengulpattu, India Rolla Subrahmanyam School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India G. Sudha Department of Electronics and Communication Engineering, Sri Sai Ram Engineering College, Chennai, India S. Suganthi Department of Artificial Intelligence and Data Science, Sri Sai Ram Institute of Technology, Chennai, India Ravi Supunya Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka Chalumuru Suresh Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Andrew Sward Department of Math and Computer Science, Augustana College, Rock Island, IL, USA G. L. Swathi Mirthika Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, Tamilnadu, India Muhammad Heru Syaputra Bina Nusantara University, Alam Sutra, School of Information System, Kota Tangerang, Indonesia Manthan Tagad Vishwakarma Institute of Technology, Pune, India Vanessa Andrea Tamayo-Sanchez Universidad Privada del Norte, Lima, Peru Jordan Thompson Department of Math and Computer Science, Augustana College, Rock Island, IL, USA Sophia Tikhonova Saratov State University, Saratov, Russia A. TinaVictoria Department of CSE, SRM Institute of Science and Technology, Chengulpattu, India R. 
Sri Lakshmi Triveni Department of IT, Vignan’s Nirula Institute of Technology and Science for Women (VNITSW), Guntur, Andhra Pradesh, India Adriana Margarita Turriate-Guzman Universidad Privada del Norte, Lima, Peru Neeraj Tyagi Deen Dayal Upadhyay College, University of Delhi, New Delhi, India Aniket Ukarde Vishwakarma Institute of Technology, Pune, India Rucha Uplenchwar Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Vudathu Venkata Krishna sai Poojitha Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P, India
L. Vijayakumari Department of Electronics and Communication Engineering, SRM Valliammai Engineering College, Kattankulathur, Tamil Nadu, India S. Vijayalakshmi Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India S. Vimala Department of Computer Science, Mother Teresa Women’s University, Kodaikanal, India Kambalapally Vinuthna Associate Professor Department of CSE, Neil Gogte Institute of Technology, Hyderabad, Telangana, India Alfakhri Rizqulloh Wijayakusuma Bina Nusantara University, Alam Sutra, School of Information System, West Jakarta, Indonesia Sang Guun Yoo Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador Miodrag Zivkovic Singidunum University, Belgrade, Serbia
Image Processing-Based Solution to Repel Crop-Damaging Wild Animals
W. P. S. Fernando, I. K. Madhubhashana, D. N. B. A. Gunasekara, Y. D. Gogerly, Anuradha Karunasena, and Ravi Supunya
Abstract Two-thirds of Sri Lanka’s population is directly dependent on agriculture, which generates one-third of the nation’s GDP. However, crop efficiency in Sri Lanka has declined over the years due to several issues, including sub-farm maintenance, destruction caused by wild animals, and unethical farming practices. Among them, the destruction caused by wild animals has led to conflicts between animals and humans, causing the loss of both animal and human lives. A number of technical solutions have been proposed for this problem, especially in the form of animal repellents. However, such solutions have several limitations: the small number of animal groups they can identify, the short distances over which animals can be detected, and a limited understanding of harmful animal populations. This research proposes an animal-repellent methodology that considers several features of animals, such as color, coat, shape, and the noises they make, in both daytime and nighttime. The number of animals approaching the crops is also detected, and the behavior of the animals is monitored to avoid false alarms. The research applies a range of techniques, such as image processing and deep learning, to audio, visual, and image datasets collected for the mentioned animal groups. The solution demonstrated 90% accuracy for animal identification during the day and 84% accuracy for animal identification at night, whereas the accuracy of studying animal behavior patterns is 90% and animal sounds were identified with 87% accuracy.
Keywords Agriculture · Crop damage · Crop repellent system · Body shape and coat colors · Postures · Barking sounds · Deep learning · Ultrasonic frequency
W. P. S. Fernando (B) · I. K. Madhubhashana · D. N. B. A. Gunasekara · Y. D. Gogerly · A. Karunasena · R. Supunya, Faculty of Computing, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_1
1 Introduction
Agriculture fulfills one of the basic human needs, the need for food [1]. The agro-industry has therefore been at the forefront across the world since ancient times [2]. In Sri Lanka, the agriculture sector accounts for around 7.4% of the national GDP, with crop farming accounting for approximately 5.2%. Furthermore, 30% of the Sri Lankan workforce is involved in the agricultural sector. According to research studies, the world’s population will reach 8 billion by 2025, implying that the world’s population would increase by more than 2.5 billion in the next 30 years. To cater to the food requirements of such a population, the food supply should more than double by 2025 [3]. This, however, would be very challenging due to several factors, such as poor agricultural maintenance, economically unsustainable agricultural practices, and damage caused by wildlife.
Sri Lanka is a fertile country where a diverse range of crops can be grown [2]. However, cultivations in Sri Lanka are constantly destroyed by wild animals [4–7], which leads to constant conflict between animals and humans. To protect their crops, farmers take a number of measures, such as using electronic repellents, chemical repellents, gunpowder, and electric fences. Some of these measures have injured animals as well as humans. For example, some of the chemicals used may be harmful to the skin, and electric fences erected by farmers to chase away animals can cause severe injuries to humans. Under such circumstances, the use of technology to automatically detect possible threats to crops and repel wild animals would be immensely valuable.
Several studies have applied technology to repelling wild animals that may damage crops. For example, several studies have used image processing techniques in which characteristics such as shape, color, and coat markings were evaluated for detecting animals. However, the existing research has mostly considered only a particular species of animal that may harm crops, such as cows [8] or birds [9]. The objective of this research is to provide a more accurate solution for detecting animals that may harm crops by considering a number of aspects, such as the shape of the animals, their coat color, and the sounds they make. The research targets four kinds of animals that may harm crops: elephants, cows, wild boars, and peacocks. Furthermore, the number of animals approaching the crops is detected. In addition, the behavior of the animals near the crops is identified to avoid triggering false alarms in situations where the animals pose no actual threat.
2 Literature Review
Existing research has proposed several methodologies for identifying wild animals using image processing techniques. For example, Weideman and Stewart studied the extraction of identifying contours for African elephants and humpback whales using a learned appearance model [10]. Their work used a fully convolutional neural network (FCNN) to isolate contours in images of the trailing edges of humpback whale flukes and the outlines of the ears of African savanna elephants, yielding more precise boundary information. The animals were then identified by matching the extracted contours. Refining the contours in this way improved fine-grained identification accuracy, with reported figures of 80% to 85% for flukes and 78% to 84% for elephant ears.
Adami et al. developed an intelligent animal-repelling system for crop protection based on embedded edge AI [8]. In that research, animals were recognized using the YOLOv3 and Tiny-YOLOv3 deep learning algorithms. Drones were used to detect cattle, with SSD and YOLOv3 as object detectors. The accuracy of the method was around 97%.
Suju and Jose used the FLANN search algorithm to address human–wildlife conflicts in forest areas [11]. The main objective of that research is to minimize human–animal interactions by tracking the route of a moving elephant and informing relevant parties, such as railway stations and forest range officers. Image processing methods such as background reduction and foreground enhancement are used to spot elephants during monitoring. Compared to other existing methods, the results demonstrate a considerable improvement in elephant accident detection and prevention. The research suggests that noise-cancellation methods may improve it further.
Songtao Guo and co-authors studied the automatic identification of individual primates using deep learning techniques, targeting the animal's face to tell individuals apart [12]. They presented a deep convolutional neural network (CNN) approach for face detection, tracking, and recognition of wild primates and obtained an overall accuracy of 92.5%.
Emmanuel Okafor and colleagues at the University of Groningen researched deep learning for animal recognition [13]. They modified CNN architectures (AlexNet and GoogLeNet) and found that R-CNN is faster than the SSD detector when training a model, whereas SSD is slightly faster at test time. To increase recognition performance, they used a novel Wild-Animal dataset and classical image descriptors for training [13]. The dataset contains 5000 images and is processed by an automatic labeling system. Using an SVM on this data, they reached an overall accuracy of 85%.
Vidhya et al. researched smart crop protection using a deep learning approach [14]. In this study, the authors Gogoi and Philip present thorough research on developing a system that uses image processing techniques such as the SIFT algorithm to recognize an animal, together with background removal techniques for object detection.
Samayan Bhattacharya and Sk Shahnawaz researched pose recognition and estimation in wild animals using two approaches, namely agglomerative clustering and contrastive learning [15]. In that
research, they used an unsupervised learning method for animal pose estimation based on motion. They mainly used a dog dataset, together with random videos from YouTube, and applied agglomerative clustering and contrastive learning to predict postures. The proposed method segments the body parts of different species according to their movements. It then removes the background to discard content irrelevant to the animal's movement and applies edge detection. Finally, it performs agglomerative clustering followed by contrastive training to train the model.
Animal localization in camera-trap photos with complicated backdrops was the subject of research by Singh et al. in 2020 [16]. They used a deep learning model to filter out the complex backgrounds and detect the animals accurately, testing the accuracy of animal counting and localization in camera-trap photos with intricate backgrounds.
Ranparia et al. developed a machine learning-based repellent system for protecting crops against wild animal attacks [17]. In that research, video frames were captured and processed using the OpenCV library in Python before being passed to a deep learning model built on the convolutional neural network (CNN) structure. The model comprises four convolutional layers with the Rectified Linear Unit (ReLU) as the activation function and max pooling carried out by a 2 × 2 pooling layer. The training dataset was gathered from Google Images, Shutterstock, and the Kaggle database. For the performance evaluation, they trained their model for 20 epochs with a batch size of 64; it reached an accuracy of 98.54% and a validation accuracy of 73.02%.
Ozden and Severoglu studied sound recognition, specifically the binary classification of wild animal sounds, in 2019 [18]. A convolutional neural network (CNN) was used to perform the binary classification. The researchers used 20,000 samples, of which 16,000 were used for training, 2,000 for validation, and the remaining 2,000 for testing. Because no dataset was available to them at the time, they created their own labeled dataset using MATLAB. The model achieved a training accuracy of 98% and a validation accuracy of roughly 92%.
Santosh Kumar et al. researched sound-activated wildlife capturing in 2018 [19]. This project aims at capturing wildlife using microphone sound detectors, an ultrasonic sensor, and a camera. The setup is placed in an animal sanctuary, and the data obtained from it, which is captured by two microphones, is used to identify the animals and to send information about them through GSM technology.
An automated ultrasonic insect and animal repellent, comprising an ultrasonic sensor, a motion sensor, a GSM module, and an Arduino Uno board, is being tested by a local researcher as a method of repelling crop-damaging animals [20]. In this case, if a wild animal comes to harm the crops, it is reported through GSM technology, and a message is sent to the farmer, who takes steps to protect his land. It also does not retrieve data, and once wildlife has arrived, the sensor chases the animals away. It predicts whether those animals will attack the crop or not.
3 Methodology
The proposed system addresses the problem using deep learning and image processing. It is set up to act immediately whenever a camera detects wild animals, regardless of the circumstances. The system consists of both hardware and software components, which are described in the sections below. As shown in Fig. 1, when animals approach the plantation, the system first preprocesses the captured data and identifies which category the animals belong to. Their behavior patterns are then studied to ascertain whether they threaten the crops. This process runs both during the day and at night, and the sounds the animals emit are also analyzed to determine whether the animals are harmful. The system then emits an ultrasonic wave specific to the respective group of animals, so that the animals are driven away and the damage to the crops is reduced.
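The decision flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Observation`, `should_repel`, and `select_profile` are hypothetical, the profile names are placeholders rather than real ultrasonic frequencies, and the choice of which behaviors count as a threat is an assumption, since the paper does not specify it.

```python
from dataclasses import dataclass
from typing import Optional

# The four animal groups named in the paper.
TARGET_SPECIES = {"elephant", "cow", "wild boar", "peacock"}

# Assumption: the paper does not say which behaviors count as a threat.
# "moving" and "eating" are used here purely for illustration.
THREAT_BEHAVIORS = {"moving", "eating"}


@dataclass
class Observation:
    species: str    # output of the species detector
    behavior: str   # output of the behavior classifier
    count: int      # number of animals detected in the frame


def should_repel(obs: Observation) -> bool:
    """Trigger the repellent only for target species that show a threat."""
    if obs.count < 1 or obs.species not in TARGET_SPECIES:
        return False
    return obs.behavior in THREAT_BEHAVIORS


def select_profile(obs: Observation) -> Optional[str]:
    """Return a hypothetical species-specific profile name, or None."""
    if not should_repel(obs):
        return None
    return "ultrasonic-" + obs.species.replace(" ", "-")
```

Gating the repellent on behavior as well as species reflects the paper's stated goal of avoiding false alarms when animals are present but pose no threat.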
3.1 Identifying Types of Animals that Come to Damage the Crop During Daytime and Nighttime Using Image Processing
3.1.1 Dataset and Data Gathering
For these two components, more than 10,000 images of animals taken during the day and more than 2,000 images of animals taken at night, covering the 4 main animal groups, were gathered to create the datasets. The wildlife conservation department was used as the source of the images, and images of different quality levels were included. After the daytime and nighttime data were collected, the four types of animals, seen from different angles and perspectives, were identified and individually labeled. For this, the LabelImg software was used to classify them into four categories, and custom datasets were created separately for daytime and nighttime.
3.1.2 Object Detection Model
All images are automatically scaled to 256 × 256 pixels as needed. Several random adjustments, such as random rotation, scaling, cropping, and flipping, are applied to the training images. These adjustments improve the model's generalization, because they help prevent overfitting. The label map files are converted to YOLO files before the object recognition model is created. YOLOv4, a fast and accurate deep learning model for real-time object detection, was used to learn to identify these species. The preprocessed data is used to train the selected model.
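As an illustration of what converting labels to YOLO files involves, the sketch below turns one bounding box given in pixel corner coordinates into a YOLO-style annotation line (class index, then box center, width, and height normalized by the image size). It assumes the annotations were available as corner coordinates, which the paper does not state; the function name `to_yolo_line` is hypothetical.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to a YOLO-style annotation line.

    YOLO labels store "class x_center y_center width height", with all
    four box values expressed as fractions of the image size.
    """
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError("box must lie inside the image and be non-empty")
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: a box covering the central quarter of a 256 x 256 image.
line = to_yolo_line(0, 64, 64, 192, 192, 256, 256)
```

Because the coordinates are normalized, the same label remains valid after the image is resized, which is convenient when all inputs are rescaled to a fixed size.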
W. P. S. Fernando et al.
Fig. 1 Summary of system architecture
3.1.3 Preprocessing and Augmentation
Here, augmentation is the main image processing step. Resizing images to a uniform size helps deep learning algorithms produce accurate results; in this case, images are resized to 225 × 225 pixels. To expand the dataset and increase accuracy, adjustments are made using rotation, shift (height and width), brightness, cropping, zoom, channel switching, vertical flip, and horizontal flip. The purpose of data augmentation is to reduce overfitting and improve the classifier used for image processing.
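The adjustments listed above can be illustrated with a minimal sketch. It is not the authors' pipeline: it works on a toy grayscale image stored as nested lists, and the function names, the brightness range, and the flip probabilities are all assumptions made for illustration.

```python
import random

def horizontal_flip(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Reverse the row order (top-bottom flip)."""
    return img[::-1]

def adjust_brightness(img, factor):
    """Scale pixel values and clamp them to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to out_h x out_w pixels."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def augment(img, rng, size=(225, 225)):
    """Resize, then apply a random subset of the adjustments (assumed policy)."""
    out = resize_nearest(img, *size)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = vertical_flip(out)
    return adjust_brightness(out, rng.uniform(0.8, 1.2))

tiny = [[0, 50], [100, 200]]              # 2 x 2 toy image
sample = augment(tiny, random.Random(0))  # one augmented 225 x 225 variant
```

Calling `augment` several times with different random states yields several variants of one source image, which is how augmentation enlarges a dataset without new photographs.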
Image Processing-Based Solution to Repel Crop-Damaging Wild Animals
3.2 Analyzing the Behavior of Animals Who are Near or Approaching Crops
3.2.1 Dataset and Data Gathering
To detect the actions of different wild animals, more than 5000 images were collected under 4 main categories of wild animals: elephants, cows, wild boars, and peacocks. Snapshot Serengeti, Kaggle, and the Open Images Dataset were among the sources used to build the custom image dataset. Images of different quality levels, angles, and perspectives were included to improve posture detection. After collection, the images were individually labeled with the animals' actions (i.e., moving, standing, eating, and sleeping).
3.2.2 Object Detection Model
The video classification method is used to detect the postures of wild animals, which allows an animal's behavior to be detected without delay. The input is divided into frames, and each frame is categorized according to the animal poses recorded by the camera. The SSD MobileNet V2 FPNLite model is very effective because it performs on par with the other models in terms of speed and accuracy metrics. Its speed of 22 ms and mean average precision of 22.2 are acceptable for running the model in a setting with limited processing capability. Both the speed and the accuracy are higher when compared to other models, such as Faster R-CNN. For this reason, SSD MobileNet V2 FPNLite is the more effective choice.
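The frame-by-frame scheme described above can be sketched as follows. The per-frame classifier is passed in as a hypothetical callable `classify_frame`, and the sliding-window majority vote is an illustrative addition that the paper does not mention; it is one common way to suppress isolated misclassified frames.

```python
from collections import Counter, deque

POSTURES = ("moving", "standing", "eating", "sleeping")

def smooth_labels(frame_labels, window=5):
    """Majority vote over the last `window` per-frame labels."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        # most_common(1) returns the label seen most often in the window
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

def classify_video(frames, classify_frame, window=5):
    """Classify every frame, then smooth the label sequence."""
    return smooth_labels([classify_frame(f) for f in frames], window)

# Toy usage: the "frames" are already labels and the classifier is the identity.
raw = ["eating", "eating", "moving", "eating", "eating"]
result = classify_video(raw, classify_frame=lambda f: f, window=3)
```

In this toy run the single "moving" frame is outvoted by its neighbours, so the smoothed sequence reports "eating" throughout.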
3.2.3 Preprocessing and Augmentation
Several preprocessing and augmentation techniques were used to improve the custom dataset. The collected images were resized to a resolution of 320 × 320 pixels and then subjected to adjustments such as cropping, flipping, and rotation in order to improve both the quality and the quantity of the images. This produces a dataset better suited to accurate training.
3.3 Identifying Animals that Come to Damage the Crop Using Sounds they Make
3.3.1 Dataset and Data Gathering
A dataset is required to detect animals from the sounds they make. An online dataset was obtained through Google, providing more than 4000 samples for the required 4 types of animals. Some of these files were in MP3 format and others in WAV format. All files were labeled individually, after which training of the model began.
3.3.2 Object Detection Model
The TensorFlow framework was used for this task. Because the classification was to be carried out with a CNN, the audio files had to be converted into images before training. The Short-Time Fourier Transform (STFT) was used for this purpose, and all audio files were converted into spectrograms. In this way, the accuracy of the required model could be improved.
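To make the audio-to-image step concrete, the following is a minimal pure-Python STFT sketch. It is not the TensorFlow code used by the authors; the frame length, hop size, Hann window, and sample rate are assumptions chosen so that the example stays small.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram: one list of bin magnitudes per frame."""
    # Hann window reduces spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1          # non-negative frequencies only
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame (slow but dependency-free)
        spectrum = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                            for n in range(frame_len)))
                    for k in range(n_bins)]
        spectrogram.append(spectrum)
    return spectrogram

sample_rate = 8000                        # assumed sample rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = stft_magnitude(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64     # bin index converted to frequency
```

Each row of `spec` is one time slice, so the nested list can be treated as a grayscale image and passed to an image classifier such as a CNN.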
3.3.3 Preprocessing and Augmentation
To improve the accuracy of the model trained in this way, it was necessary to reduce the noise. Noise reduction yields a dataset with clear sounds, and the model was trained repeatedly on it to improve accuracy. The need to decentralize responsibility for crop protection to farmers emerges as the main theme from the analysis of intervention failures. Together, the outputs of these four components ensure the reliability of the system. By using edge detection, coat detection, frequency, and gait pattern detection mechanisms for daytime and nighttime, the system produces a cohesive and accurate result describing animal appearance and behavior in the fields.
4 Results and Discussion
4.1 Identifying Types of Animals that Come to Damage the Crop During Daytime Using Image Processing
Based on the animals' colors and coat markings, the daylight animal detection module can classify the animals. The deep learning algorithm collects the animal characteristics (such as color and coat markings), and the software library uses input from the sensors and the camera to calculate the distance between the identified animal and the crop. The deep learning model was trained using a dataset of more than 10,000 images across the animal categories. The entire dataset was preprocessed before it was fitted to the model. The convolutional neural network used to create the model was YOLO (You Only Look Once), which allows for real-time object detection. The trained model ran the repelling system on a Raspberry Pi. During the day, the Raspberry Pi uses a web camera and sensors to retrieve the input data for the animal detection model. Since animal traits might change over time, they are recorded over a period of time to select the features predicted to have the most impact (Fig. 2). Figure 3 shows that the accuracy on the training data and the validation data is almost the same, indicating that the test accuracy is comparable to the training accuracy. As a result, the prediction shown in the left diagram of Fig. 3 achieves 90% daytime accuracy and 0.069% average loss with 12,000 epochs.
4.2 Identifying Types of Animals that Come to Damage the Crop During Nighttime Using Image Processing
Based on the animals' physical types, the nighttime animal detection module can determine which animal category they belong to. The deep learning model in this component captures the animal's body shape, while a second part captures a picture of the detected animal and sends an alert via a mobile app. This component already has a dataset of recently captured animals and the crop, collected using a software library with the sensors and the camera, which helps the model train to high accuracy (Fig. 4). The deep learning model was trained using a dataset of more than 2000 photos across the animal categories. The complete dataset was preprocessed before being fitted to the model. Nighttime detection reaches an accuracy of up to 81% (Fig. 5) and an average loss of 0.15%, with the model trained on more than 2000 images for each of daytime and nighttime.
Fig. 2 Summary of the accuracy of animal detection during daytime
4.3 Analyzing the Behavior of Animals Who Are Near or Approaching Crops
In this component, the analysis of animal movement is the major factor. The video classification method is used to analyze animal behavior, which allows animal postures to be detected without much delay. It is therefore an effective and efficient way to detect the various behaviors of the animals with the limited infrastructure used. The SSD MobileNet V2 FPNLite model is highly efficient because it offers a balanced result for both speed and accuracy metrics when compared to the other models. It has a speed of 22 ms and a mean average precision of 22.2, which makes it suitable for an environment with low computational power. Compared to other models like Faster R-CNN, the speed and the accuracy both have higher values. Because of that, the SSD MobileNet V2 FPNLite model is more efficient than the others. For model training, 220,000 epochs were used to reach around 90% accuracy (Fig. 6). The loss was around 15%, which was considered low relative to the accuracy of the model (Fig. 6). Figure 7 shows the postures and the accuracy rates during the testing period using the preprocessed data.
Fig. 3 Detect animals in the daytime with animal class and accuracy
4.4 Identifying Animals that Come to Damage the Crop Using Sounds they Make
Using auditory input, the voice recognition model identifies the animal that is currently making a sound. The model receives input sounds and outputs the corresponding animal class. Because the animal voices must be transformed into digital signals, a TTL converter is required. A CNN architecture is also included. The sounds identified by the model are used to deter pest animals from damaging crops. To train the model, a collection of 3000+ action voice clips from four classes of over 1,900 clips each from the Google dataset, which includes a range of sounds connected to animal motions, is employed. The dataset was split 10:80:10, and the training set was separated into video frames and then used to train the model. The frequencies of the animals are detected accurately, resulting in 87% average accuracy and 20% average loss for the detection of the frequencies of the wild animals (Fig. 9).
5 Conclusion and Future Work
Food is essential to daily life and gives us energy. Food crops are often destroyed by wild animals. In this report, we considered the issues related to protecting crops from such animals. We worked to identify animals that damage crops during the day, animals that damage crops at night, the behavioral patterns of animals, and the sounds of animals.
Fig. 4 Summary of the accuracy of animal detection during nighttime
Without such a system, it is not known whether an animal near a crop will damage it. With the system we have developed, when an animal comes close to the crop, we can identify whether it is likely to cause damage. The system is designed using advanced hardware, deep learning techniques, databases, and software. We see it as a solution to a crisis faced by a large number of countries that depend on agriculture. It aims to drive away the animals that damage crops, give farmers higher productivity, and protect the lives of both people and animals. The economies of many countries are largely based on agriculture, yet animals that damage crops are a major threat to the expansion of agriculture and the security of the food supply. Therefore, our main goal going forward is to extend the method to different crops of different heights and to different organisms. Our long-term goal is to provide farmers with a cost-effective pest management solution that helps them get rid of pests and increase agricultural production.
Fig. 5 Detect animals in the night with animal class and accuracy
Fig. 6 Summary of the accuracy of the behavior of the wild animals in real time
Fig. 7 Detection of different actions of the wild animals
Fig. 8 Summary of the accuracy of voice detection
Fig. 9 Summary of the output of frequencies of the wild animals in real time
SDN Framework for Efficient Latency-Aware Topology Discovery in ISTN
V. Deepa and B. Sivakumar
Abstract The IEEE 802.1 Task Group on Time-Sensitive Networking is developing a set of standards for how networks behave in real time, known as Time-Sensitive Networking (TSN). Software-Defined Networking (SDN) based Integrated Satellite-Terrestrial Networks (ISTN) are an excellent choice for deploying TSN networks because they offer runtime flexibility, administrative advantages, effectiveness, and performance. For high-priority traffic to behave in real time on TSN networks, it must be scheduled carefully to meet its timing requirements. To schedule this traffic, the control plane (GEO) in an SDN-based ISTN solution needs a clear understanding of the network topology and the network latency. In this paper, we propose a topology discovery mechanism based on the IEEE 802.1AB Link Layer Discovery Protocol (abbreviated LL2P in this paper) for the Controller Network (CN) in ISTN-TSNs, which is capable of discovering latency for time-aware planning without depending on other synchronization protocols such as the Network Time Protocol or Precision Time Protocol (NTP/PTP). Our tests examine its performance and scalability in terms of the throughput consumed for link measurement and the propagation time attained.
Keywords Latency · Integrated Satellite-Terrestrial Networks (ISTN) · Time-Sensitive Networking (TSN) · Communication · Propagation time · Software-Defined Networking (SDN)
V. Deepa (B) · B. Sivakumar Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, India e-mail: [email protected] B. Sivakumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_2
1 Introduction
With the impending introduction of the 5th generation of mobile cellular networks (5G), the next generation (6G) [1] needs to provide more advanced characteristics in terms of coverage and capacity. In addition to ensuring global connectivity by linking underserved regions with poor or no Internet access, 6G should help reduce the capacity constraints of terrestrial connections, particularly in light of the rising data demand in dense networks. The shortest possible transmission delay and the least amount of jitter are the two main requirements for real-time communication. Notably, these problems can be addressed by integrating technologies like SDN, NFV, and AI. The ISTN incorporates SDN [2] by separating the controller (control plane) from the infrastructure layer (data plane) in the network, which enhances programmability, reconfigurability, and interoperability via a number of interfaces. SDN controllers make it possible for heterogeneous networks to implement collaborative procurement and inter-adaptive deployment for bespoke services.
1.1 Outline of SDN-Based ISTN System
Figure 1 depicts the proposed SDN-based ISTN system design. The structure comprises a new-generation data plane (Low Earth Orbit-LEO), the master controllers (GeoSynchronous Orbit-GEO/Low-Synchronous Orbit-LEO), and the network application layer (access points or handover) [3]. Satellites also serve as switches or gateways with SDN capabilities, routing data packets in accordance with up-to-date flow tables. In addition, a virtualization mechanism is created to place Virtual Network Functions (VNFs) and provision network slices as needed by virtualizing the components of each network segment. The GEO satellite acts as the SDN controller that guides and commands the regional MEO and LEO satellites, since it covers a wide area and has dependable satellite-to-terrestrial communications. Using the satellite network topology and the network status learned from the controller through the northbound interface, the mobility module at the application layer can offer intelligent handover between terrestrial points or satellites to maintain connectivity. To obtain the overall network state and create the service profile, the SDN controller also communicates with the other controllers of terrestrial or satellite networks through interfaces.
Fig. 1 A system design of SDN-based ISTN
1.2 Outline of IEEE 802.1Q-TSN System
Figure 2 shows a simplified view of a switch port under time-triggered scheduling according to the IEEE 802.1Qbv standard [4]. Each queue has a conceptual switch called a gate, and incoming flows enter different queues based on priority. Here, each switch has eight priority queues on its egress ports. A value of G = 1 indicates that the queue's gate is open and data may flow through it, whereas a value of G = 0 indicates that the gate is closed. Gates open and close cyclically in accordance with the defined cycle period of the Gate Control List (GCL). The purpose of TSN technology [5] is to standardize Layer 2 (data plane/infrastructure layer or LEO/MEO) characteristics so that many protocols may use the same infrastructure. Depending on its latency requirements, time-critical traffic may be scheduled for transmission using the GCL. Three essential elements must work together for TSN to function successfully [6]:
Fig. 2 An outlook of IEEE802.1Qbv-TSN system [5]
1. All devices must have the same concept of time in order for real-time communication to function. TSN uses the IEEE 1588 Precision Time Protocol (PTP) to synchronize clocks to the microsecond level. PTP distributes time from a single central time source directly across the network using Ethernet frames.
2. Every participant involved in real-time communication adheres to guidelines for selecting communication routes, dividing up available bandwidth, and organizing time slots, often using several concurrent communication channels to improve fault tolerance. All these aspects may be coordinated by a centralized controller and data plane, which also helps the network run better. The Centralized User Configuration (CUC) entity plus a Controller Network (CN) make up this implementation.
3. The rules for evaluating and routing communication packets are the same for all devices involved in real-time communication.
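The cyclic gate mechanism of IEEE 802.1Qbv outlined in this section (G = 1 open, G = 0 closed, eight egress queues, a repeating GCL) can be sketched as below. The two-entry list, its durations, and the names `GateEntry` and `open_queues` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateEntry:
    duration_us: int   # how long this entry stays active
    gate_states: int   # 8-bit mask; bit q set means gate of queue q is open (G = 1)

def open_queues(gcl, time_us):
    """Return the queues whose gates are open at time_us for a cyclic GCL."""
    cycle = sum(entry.duration_us for entry in gcl)
    offset = time_us % cycle              # position inside the current cycle
    for entry in gcl:
        if offset < entry.duration_us:
            return [q for q in range(8) if (entry.gate_states >> q) & 1]
        offset -= entry.duration_us
    raise AssertionError("unreachable: offset always falls inside the cycle")

# Hypothetical 1000 us cycle: 200 us reserved for queue 7 (time-critical traffic),
# then 800 us shared by queues 0..6.
gcl = [GateEntry(200, 0b1000_0000), GateEntry(800, 0b0111_1111)]
```

With this list, `open_queues(gcl, 100)` reports only queue 7, while `open_queues(gcl, 500)` reports queues 0 to 6, so time-critical frames never compete with other traffic during their reserved window.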
2 Literature Review
The goal of this section is to lay the groundwork for combining TSN with ISTN to create an SDN-based network. The SDN domain makes extensive use of topology discovery and address configuration methods, but the specific requirements of the TSN domain have not received the same level of attention. Today, topology discovery at higher layers is made possible by the Link Layer 2 Protocol, which was originally developed as a technique for link discovery in layer 2 networks. This section describes pertinent research on topology discovery for time-sensitive networking in the SDN setting, together with related ISTN time synchronization and delay discovery methods.
Adding SDN to the satellite network may not only compensate for its shortcomings but also align with the development trend of terrestrial networks to maximize the satellite network's performance [7]. To guarantee reliability, Fan Wu proposed a multi-level multicast rate adaptive scheme for NDN WLAN [8]. This scheme adaptively adjusts the multicast data rate over multiple transmission stages according to the state of the multicast group, thereby minimizing the total transmission time. TSN [5, 9] requires precise time management. The timing and synchronization standard IEEE 802.1AS uses the Precision Time Protocol (PTP) to determine the propagation time between nodes and the controller using sync and propagation time measurements [9]. Other time synchronization methods exist, such as ReversePTP [10]. With this technique, the controller keeps a precise clock, and each switch regularly sends the controller sync messages that include a local timeframe. A trust-based method is used to manage trusted connections between the devices and the controllers [11]. This secures the network and allows network access to continue even when a component fails.
2.1 Motivation
To improve the scheduling process in TSN networks, this paper investigates the ISTN-based implementation of TSN networks with SDN. It also explores a way to provide the CN with information about the network and the propagation times between its components. In this context, we investigated how to determine the propagation time between switches without time synchronization protocols such as NTP, and how to apply the recommended network models with the fewest modifications to the data plane (switches). The paper is structured as follows. Section 3 examines how to determine the topology and calculate delays in TSN networks. In Sect. 4, we detail the procedures used to identify the network and calculate the propagation time of the switches. Section 5 evaluates the results on TSN networks, and Sect. 6 concludes the article.
3 Proposed Method
The GEO controller acquires a full understanding of the switches and their ports using LL2P, as covered in Sect. 1.2. In SDN-based ISTN networks, LL2P-based topology identification relies on the following basic procedure. An LL2P packet containing a Chassis ID and a port number is delivered through a packet_out message to switch 1, which is interested in finding its connections. When the LL2P packet arrives at the destination switch 2, that switch sends a packet_in message to the controller. To determine whether there is a connection between switch 1 and switch 2, the controller extracts the destination's port number, MAC addresses, and Chassis ID from the packet_in message it receives. It then stores this information in its knowledge database and repeats this operation until all of the network's connections have been extracted. Network performance is directly impacted by the maximum latency, that is, the largest delay between a data node and a controller. A network manager must therefore determine the number of GEO controllers needed to satisfy a given maximum latency requirement. As shown in Fig. 3, the IEEE (802.1AB) Link Layer 2 Protocol (LL2P) has a tag-length-value (TLV) field to store the timeframe values of two LEO switches in local time. In the proposed mechanism, Timeframe X records the time when the message enters switch 1, whereas Timeframe Y holds the corresponding record for switch 2. When the GEO controller receives the LL2P packet, i.e., the packet_in message, it decapsulates it and extracts the TLVs, namely Timeframe X, Timeframe Y, the port number, and the Chassis ID. It also fetches the MAC address of switch 2 from the source of the received packet. The propagation time is then estimated as
$$P(X, Y) = \frac{\mathrm{Timeframe}_X + \mathrm{Timeframe}_Y}{2} \qquad (1)$$
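The discovery procedure above can be sketched as a small controller-side data structure. This is an illustration under assumptions, not the authors' controller code: `PacketIn` and `TopologyStore` are hypothetical names, the message fields are simplified, and the stored value just evaluates Eq. (1) as printed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PacketIn:
    """Simplified view of the fields the controller extracts from a packet_in."""
    src_chassis: str    # Chassis ID carried in the discovery packet (switch 1)
    src_port: int       # port number carried in the discovery packet
    dst_chassis: str    # switch that raised the packet_in (switch 2)
    dst_port: int       # ingress port on switch 2
    timeframe_x: float  # Timeframe X
    timeframe_y: float  # Timeframe Y

class TopologyStore:
    """Knowledge database of discovered unidirectional links."""
    def __init__(self):
        self.links = {}

    def handle_packet_in(self, msg):
        key = (msg.src_chassis, msg.src_port, msg.dst_chassis, msg.dst_port)
        # Eq. (1) as printed: mean of the two timeframe values
        self.links[key] = (msg.timeframe_x + msg.timeframe_y) / 2
        return key

    def neighbours(self, chassis):
        """Switches reachable over one discovered link from `chassis`."""
        return sorted({dst for (src, _, dst, _) in self.links if src == chassis})

store = TopologyStore()
first = store.handle_packet_in(PacketIn("s1", 1, "s2", 3, 10.0, 14.0))
store.handle_packet_in(PacketIn("s1", 2, "s3", 1, 20.0, 26.0))
```

Keying the store by both endpoints and ports keeps each direction of a link as a separate entry, which matches the per-direction discovery described in the text.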
Sending LL2P packets in both directions allows us to account for clock misalignment between switches 1 and 2 in an SDN network without synchronization, which allows the propagation time to be computed exactly. Each LL2P packet can discover one unidirectional connection d between switches 1 and 2. The controller [13] recognizes that there is a link from Switch 1 to Switch 2 when it sends an LL2P packet. To discover the connection from Switch 2 to Switch 1, the controller must send an additional LL2P packet from Switch 2 to Switch 1. These two measurements should yield the same propagation latency. However, the observed timings in the two directions can differ, since the clocks used for the timeframes in different switches are not always the same. On that account, the LL2P packets collected in both directions, with e the data rate (bits/sec) of the link used, are combined as

$$d = \frac{\bigl(\mathrm{Timeframe}_Y + (e - \mathrm{Timeframe}_X)\bigr) + \mathrm{Timeframe}_Y - (\mathrm{Timeframe}_X + e)}{2} \qquad (2)$$

Fig. 3 Basic topology of IEEE (802.1AB) link layer 2 protocol [12]
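The idea behind measuring in both directions can be shown with a short worked example. It assumes, for illustration only, that the term e acts as a constant clock offset between the two switches and that the link delay is the same in both directions; the numeric values are invented. Under those assumptions the offset enters the two one-way measurements with opposite signs and cancels in their mean.

```python
def one_way_measurement(tx_time_sender_clock, rx_time_receiver_clock):
    """Difference of two timestamps taken on two unsynchronized clocks."""
    return rx_time_receiver_clock - tx_time_sender_clock

def symmetric_delay(forward, reverse):
    """Mean of both directions; a constant clock offset cancels out."""
    return (forward + reverse) / 2

true_delay_us = 35.0    # assumed real propagation delay
offset_us = 120.0       # assumed: clock of switch 2 runs 120 us ahead of switch 1

# Switch 1 -> switch 2: the receive timestamp includes the offset.
forward = one_way_measurement(1000.0, 1000.0 + true_delay_us + offset_us)
# Switch 2 -> switch 1: the send timestamp includes the offset instead.
reverse = one_way_measurement(5000.0, 5000.0 - offset_us + true_delay_us)

estimate = symmetric_delay(forward, reverse)
```

Here the forward measurement is 155 us and the reverse measurement is -85 us, neither of which is meaningful alone, yet their mean recovers the assumed 35 us delay.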
To determine the required maximum latency, need n = 1 iterations for the network with K GEO controllers and n=1 k for process computational cost. The capacity limit on the downlink LEO/MEO nodes defines the incoming control mechanism for users. Therefore, the arrival rate λ is bounded by the LEO/MEO system capability (LMc):

$$\lambda_i \le LM_c, \quad \forall i \in \{1, 2, \ldots, n\} \qquad (3)$$
4 Implementation
To enable the proposed technique, we outline the minimum adjustments that must be made to the LEO switch data plane for link discovery in layer 2 networks. The controller initially queues all packet_in messages received from its subordinate OpenFlow switches in the data plane. Switches are set up to take LL2P packets from the controller and simply forward them to ports without changing their contents, and the original LL2P packet header does not feature a timeframe field. Notably, OpenFlow currently supports LL2P packets between switches. A new data plane feature at the ingress port is required for timeframe support in the switches. In the pipeline process, the switch looks for a match for an LL2P packet that enters via an ingress port. To do this, packet pipeline information and header fields are taken from the LL2P packet. Matches may be made against the ingress port, the metadata field, and other pipeline fields in addition to packet headers. The OpenFlow protocol has to be modified in order to express these rules. We added a new field for the timeframe to the OpenFlow protocol and a new instruction type for the time-framing process. This instruction type instructs the switch to fill the timeframe field when the first bit of the packet arrives at the switch; a second instruction decapsulates the packet, inserts the timeframe field, and forwards the packet to the output port. Additionally, the data plane switches must be modified to record the arrival time of packets, allowing them to add a precise timeframe to the packet without affecting the propagation time.
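One way to picture the added timeframe field is as an extra tag-length-value element in the discovery packet. The sketch below follows the TLV layout of IEEE 802.1AB (a 7-bit type and a 9-bit length, with type 127 reserved for organizationally specific data). The OUI bytes, the subtype, and the 64-bit nanosecond encoding are placeholders invented for this example, not values taken from the paper.

```python
import struct

ORG_SPECIFIC_TLV = 127          # IEEE 802.1AB type for organizationally specific TLVs
EXAMPLE_OUI = b"\xaa\xbb\xcc"   # placeholder OUI, hypothetical
TIMEFRAME_SUBTYPE = 0x01        # placeholder subtype, hypothetical

def encode_timeframe_tlv(timeframe_ns):
    """Build a TLV that carries one 64-bit timeframe value."""
    value = EXAMPLE_OUI + bytes([TIMEFRAME_SUBTYPE]) + struct.pack("!Q", timeframe_ns)
    header = (ORG_SPECIFIC_TLV << 9) | len(value)   # 7-bit type, 9-bit length
    return struct.pack("!H", header) + value

def decode_timeframe_tlv(tlv):
    """Extract the timeframe value, as a controller would after a packet_in."""
    (header,) = struct.unpack("!H", tlv[:2])
    tlv_type, length = header >> 9, header & 0x1FF
    value = tlv[2:2 + length]
    if (tlv_type != ORG_SPECIFIC_TLV or value[:3] != EXAMPLE_OUI
            or value[3] != TIMEFRAME_SUBTYPE):
        raise ValueError("not a timeframe TLV")
    (timeframe_ns,) = struct.unpack("!Q", value[4:12])
    return timeframe_ns

tlv = encode_timeframe_tlv(1_234_567_890)
```

An ingress switch would append such an element when the first bit of the packet arrives, and `decode_timeframe_tlv(tlv)` returns the same integer on the controller side.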
Fig. 4 Network implementation using mininet
When the first bit is received, the timeframe of switch 2 is recorded, and the controller is modified as needed so that it can extract timeframes from packet_in messages and derive the propagation time estimates used in scheduling. A dedicated Priority Code Point (PCP) should be designated for LL2P packets to guarantee that they are delivered successfully between the switches. This prevents the queue in which LL2P packets are stored from being overloaded by other traffic or affected by other traffic-related issues.
4.1 Topology Setup
As noted, setting up the network topology is challenging because it requires intelligent coordinating nodes for communication. The SDN-based ISTN structure is modeled with the Systems Tool Kit (STK) [14] software as the simulation environment, together with the SDN emulator Mininet [15], extended as required to support TSN networks. Figure 4 shows the topologies of switches and controllers, with flow-mod messages for sending and receiving packets using the LL2P process over the network links. The method measures the connectivity and validates the propagation time of the network links with minimal resources.
5 Evaluation Results and Discussions
We conduct the simulations in terms of availability and latency. We evaluated the direct relationship between the transmission capacity of the links and the data plane messages transferred between the switches. Figure 5 gives the maximum latency of the network over the runtime for topologies (a) and (b). Based on Figs. 4 and 5, scenarios are built to integrate the communication chain and ensure the networking service with the SDN model. The packet_in and packet_out messages are used to evaluate the capacity and link load of the network topology. Figure 6 shows the effect on latency when the processing time and links are loaded. It can be seen that topology (a) with a higher POX controller performs better in terms of switches and time spent in latency.
Fig. 5 a, b Topology setup
V. Deepa and B. Sivakumar
Fig. 6 Load enabled for latency performance (panel title: Latency Performance; link measure, from 0 to 1, versus time in seconds, from 0 to 80, for topology a and topology b)
In Fig. 7, we consider the propagation time of the links between the switches of topology (a) and topology (b). Propagation time is directly correlated with the length of the medium between switches. To verify the correctness of the proposed approach, we evaluated links of various lengths at a link rate of 100 Mb/s. The results show that the measured propagation latencies are within 1 microsecond of the specified delay. The experimental validation indicates that the proposed method achieves a high level of accuracy in various topologies while conserving network resources, and that its measurements closely match the formula-based values.
Fig. 7 Measure of propagation time for the topology (a) and (b) switches (propagation time in microseconds, from 0 to 40, versus link measure, from 1000 to 7000, for topology a and topology b)
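The formula-based values mentioned above can be illustrated with a minimal sketch. It assumes that the link measure is a length in meters and that the signal travels at about 2 × 10^8 m/s; neither value is stated in the text, and the function names are hypothetical.

```python
# Illustrative sketch only. ASSUMPTIONS (not from the paper): link lengths are
# in meters and the signal speed is about two thirds of the speed of light.
SIGNAL_SPEED_M_PER_S = 2.0e8


def propagation_delay_us(link_length_m, speed_m_per_s=SIGNAL_SPEED_M_PER_S):
    """Return the formula-based propagation delay of a link in microseconds."""
    return link_length_m / speed_m_per_s * 1e6


def within_tolerance(measured_us, link_length_m, tolerance_us=1.0):
    """Check that a measured delay is within the tolerance of the formula value."""
    return abs(measured_us - propagation_delay_us(link_length_m)) <= tolerance_us


if __name__ == "__main__":
    for length_m in range(1000, 8000, 1000):
        print(length_m, "m ->", propagation_delay_us(length_m), "us")
```

Under these assumptions the delay grows linearly with link length, which is the relationship that Fig. 7 is described as showing.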
SDN Framework for Efficient Latency-Aware Topology Discovery in ISTN
6 Conclusion
In real-time applications, network scheduling and path selection depend on precise information about network topology and latency characteristics, so topology discovery is an integral part of these applications. In this study, we developed a way to obtain the link topology between switches in an SDN-based TSN network and to quantify propagation latency without synchronization methods such as NTP. The only complicated part of the proposed implementation is the SDN-based ISTN data-plane timestamping. We implemented and tested the proposed solution in several topologies, assessed its accuracy, and showed that it can measure link propagation time with microsecond accuracy without adding significant cost to the network. The overhead, which is mainly influenced by the number of packets broadcast by this approach, was below 0.1–0.2% in all evaluated TSN networks. In future work, we aim to study the scalability of virtualization technologies in ISTN to simplify the network topology and improve network availability.
References
1. Lin X, Cioni S, Charbit G, Chuberre N, Hellsten S, Boutillon JF (2021) On the path to 6G: embracing the next wave of low earth orbit satellite access. IEEE Commun Mag 59(12):36–42
2. Shi Y, Cao Y, Liu J, Kato N (2019) A cross-domain SDN architecture for multi-layered space-terrestrial integrated networks. IEEE Netw 33(1):29–35
3. Liang YC, Tan J, Jia H, Zhang J, Zhao L (2021) Realizing intelligent spectrum management for integrated satellite and terrestrial networks. J Commun Inf Netw 6(1):32–43
4. Said SBH, Truong QH, Boc M (2019) SDN-based configuration solution for IEEE 802.1 time-sensitive networking (TSN). ACM SIGBED Rev 16(1):27–32
5. Li Q, Li D, Jin X, Wang Q, Zeng P (2020) A simple and efficient time-sensitive networking traffic scheduling method for industrial scenarios. Electronics 9(12):21–31
6. Rojas E, Alvarez-Horcajo J, Martinez-Yelmo I, Carral JA, Arco JM (2018) TEDP: an enhanced topology discovery service for software defined networking. IEEE Commun Lett 22(8):1540–1543
7. Wu H et al (2020) Resource management in space-air-ground integrated vehicular networks: SDN control and AI algorithm design. IEEE Wirel Commun 27(6):52–60. https://doi.org/10.1109/MWC.001.2000130
8. Wu F, Yang W, Ren J, Lyu F et al (2021) NDN-MMRA: multi-stage multicast rate adaptation in named data networking WLAN. IEEE Trans Multim 23:3250–3263. https://doi.org/10.1109/TMM.2020.3023282
9. Teener MDJ, Garner GM (2008) Overview and timing performance of IEEE 802.1AS. In: 2008 IEEE international symposium on precision clock synchronization for measurement, control and communication, pp 49–53
10. Mizrahi T, Moses Y (2014) Using reverse PTP to distribute time in software defined networks. In: 2014 IEEE international symposium on precision clock synchronization for measurement, control and communication (ISPCS), pp 112–117. https://doi.org/10.1109/ISPCS.2014.6948702
11. Anand JV (2019) Design and development of secure and sustainable software defined networks. J Ubiquit Comput Commun Technol (UCCT) 1(2):110–120
12. Mohammadi S, Colle D, Tavernier W (2022) Latency-aware topology discovery in SDN-based time-sensitive networks. In: 2022 IEEE 8th international conference on network softwarization (NetSoft), pp 145–150. https://doi.org/10.1109/NetSoft54395.2022.9844085
13. Zhao X, Yao L, Wu G (2018) ESLD: an efficient and secure link discovery scheme for software-defined networking. Int J Commun Syst 31(10):e3552
14. Systems Tool Kit (STK), Analytical and visualizing tools of complex system. https://www.agi.com/products/stk. Last accessed 26 June 2022
15. Mininet Approach. Rapid Virtual Network. http://mininet.org/. Last accessed 26 June 2022
Recommendation System Based on Clustering Techniques Using Collaborative Filtering Method
G. L. Swathi Mirthika and B. Sivakumar
Abstract Recommendation systems play a vital role in many fields today. They enable the prediction of links between independent compounds. In the health care domain, a recommendation system helps the caregiver provide medications without adverse effects by considering previous health records. To keep up with the demands of modern caregiving, there has been a shift toward more holistic approaches to treatment, for which access to care recipient data is crucial. By incorporating the recommendation system into connected health, we suggest a solution that increases the potential for greater data accessibility and utilization. The results show that clustering techniques combined with collaborative filtering provide a more efficient recommender system for the beneficiary and more accurate results than the previous methods.
Keywords Clustering · Collaborative filter · Drug interaction · Hybrid filtering · Mahalanobis · Recommendation system
1 Introduction
Recommendation algorithms are a windfall that boosts user satisfaction in the vast, muddled ocean of online shopping. By streamlining the decision-making process and increasing online sales, recommender systems are putting an end to the “tyranny of choice.” Additionally, pervasive AI technologies are making their way into e-commerce, not just addressing the issue of inappropriate recommendations, but also anticipating the next steps a client will take.
G. L. Swathi Mirthika (B) · B. Sivakumar
Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_3
The capacity to interpret non-linear data is what gives recommender systems based on deep learning an edge over their more traditional counterparts. Major advantages of using deep learning (DL) for recommendations include non-linear transformation, representation learning, sequence modeling, and adaptability. In addition, DL methods can be adapted to a wide variety of tasks. Convolutional neural networks (CNNs), for instance, work well with grid-structured data such as images, while recurrent neural networks (RNNs) excel at processing sequential data. Neural attention mechanisms are well suited to filtering the necessary data and selecting the most representative items, and autoencoders aid dimensionality reduction.
Content-based and collaborative filtering (CF) recommender systems (RSs) are two examples of classic RSs that base their suggestions on prior user actions and item characteristics. For its suggestions, CF looks for users who share the target user’s tastes and interests. Both memory-based approaches (through nearest neighbor classification) and model-based approaches (including machine learning and data mining techniques) exist inside the CF framework. A hybrid approach can combine the benefits of content-based and collaborative filtering methods. Common techniques used in recommender systems include clustering, nearest neighbor, and matrix factorization.
In recent years, however, deep learning has shown remarkable effectiveness in a variety of fields, from image identification to natural language processing, and its success has also benefitted recommendation systems. In fact, complicated deep learning systems, rather than more traditional methods, power today’s state-of-the-art recommender systems, such as those at YouTube and Amazon. There are two kinds of feedback with which a recommendation system can be built: implicit and explicit. Both are used to extract opinions about a particular product or service in different forms.
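As a minimal sketch of the memory-based (nearest neighbor) collaborative filtering mentioned above, the following example predicts a missing rating as a similarity-weighted average over the most similar users. The ratings, user names, and the choice of cosine similarity are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt

# Hypothetical explicit ratings: user -> {item: rating}. All values are made up.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Similarity-weighted average rating of the k nearest users who rated the item."""
    scored = [
        (cosine_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    top = sorted(scored, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no informative neighbor rated this item
    return sum(sim * rating for sim, rating in top) / total


if __name__ == "__main__":
    print(predict(RATINGS, "alice", "D"))
```

In this toy data, the prediction for alice leans toward the rating of bob, whose past ratings resemble hers more closely than those of carol.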
1.1 Implicit Reaction
Implicit feedback is gathered unobtrusively from user interactions and serves as a proxy for user preference. Even if you do not explicitly rate the videos you watch on YouTube, that viewing data is still used as feedback to help personalize your suggestions. Your browsing history on Amazon is likewise used to provide recommendations based on your implicit feedback. One of the main benefits of implicit feedback is that it is plentiful. For this reason, implicit feedback is now widely used in the development of online recommender systems, which can fine-tune their recommendations in real time with each click and interaction.
1.2 Explicit Reaction
Explicit feedback in the context of recommender systems is the numerical data that users deliberately provide. On Amazon, customers can rate their purchases on a scale from one to five stars. Users supply the ratings, and the scale gives Amazon a quantitative measure of customer preference. Another form of direct feedback is the “thumbs up” or “thumbs down” button that allows YouTube viewers to indicate whether or not they enjoyed a video. Explicit feedback, however, is difficult to come by. When did you last rate a product you bought online or like a video on YouTube? You have probably watched many more videos on YouTube than you have rated.
1.3 Types of Filtering
The major categories of filtering techniques are:
Content-Based: Instead of relying on comments and feedback from other users, the content-based scenario requires a substantial amount of knowledge about the characteristics of the products. This filtering technique suggests additional products based on the recipient’s past behavior or explicit input, together with the products’ characteristics.
Collaborative-Based: Collaborative filtering does not need the characteristics of the items. Every user and item is described by a feature vector, or embedding, which is created automatically for both users and items. Users and items share the same embedding space.
Hybrid Filtering Method: The system is developed using a method that combines elements of both collaborative filtering and content-based filtering. Applying this method mitigates the problems of the individual algorithms and boosts the system’s overall efficiency.
Fig. 1 displays the types of filtering.
Fig. 1 Types of filtering (content-based filtering, collaborative-based filtering, and hybrid filtering)
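The shared embedding space used in collaborative filtering can be sketched as follows. In practice the embeddings are learned from interaction data; here they are fixed, made-up two-dimensional vectors, and scoring by dot product is an assumption chosen for illustration.

```python
# Hypothetical, hand-picked embeddings in a shared two-dimensional space.
# A real system would learn these vectors from user-item interactions.
USER_EMBEDDINGS = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
ITEM_EMBEDDINGS = {
    "item_a": [1.0, 0.0],
    "item_b": [0.1, 0.9],
    "item_c": [0.5, 0.5],
}


def score(user_vec, item_vec):
    """Dot product: larger values mean the item lies closer to the user's taste."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def recommend(user, n=1):
    """Return the n items with the highest score for the given user."""
    user_vec = USER_EMBEDDINGS[user]
    ranked = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item: score(user_vec, ITEM_EMBEDDINGS[item]),
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    print(recommend("u1"), recommend("u2"))
```

Because users and items live in the same space, no item characteristics are needed: the ranking depends only on the vectors.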
2 Related Work
A health recommender system has been proposed that uses a feature selection technique based on a multi-objective genetic algorithm. The algorithm helps identify cervical cancer in women through the feature selection mechanism and also provides a risk-calculation mechanism for the classification process [1–3]. Another work describes the need for a recommendation system that integrates personal health records. Health-related concepts were extracted from online sites into a health graph-based data structure. The system was developed for use by both laypeople and physicians and underwent various tests to ensure retrieval precision and data accuracy [4]. Recommendations for diabetic patients have been made with prediction and clustering using the collaborative filtering method, where patients’ admission information and changes in medication were considered in the prediction process [5]. Personalized wearable solutions are recommended based on the health monitoring of individuals. The analysis is carried out on the unstructured medical history of each person using text mining, and proactive health monitoring is achieved to significantly benefit human life [6]. A food recommendation system based on user preferences can monitor users’ health on a regular basis and recommend food according to their current health information. In this way, users get the nourishment they need, and awareness of the need to maintain a balanced diet is raised [7]. Within the user group, the closest neighbor is identified using the Mahalanobis distance. Building the neighborhood set most similar to the target user is more efficient with the Mahalanobis distance than with standard similarity metrics, which only take into account the items rated by both users [8].
By incorporating a content-based recommendation algorithm into a collaborative filtering technique, the rating vector assigned to each user can be replaced with an interest vector, resulting in a higher degree of similarity between users. Project attributes, moreover, are stable and can accurately describe projects across dimensions; hence, the user rating matrix is replaced with the project attribute matrix when determining project similarity. The timeliness issue of the original ratings is resolved by first processing the rating matrix with a time decay function. The user’s interest vector and the project’s attributes are then used to represent the user and the project, which are clustered as appropriate. Finally, the candidate set is constructed and the Top-N recommendation is completed using the enhanced similarity measure in accordance with the predicted project ratings. This increases the precision of the recommendation and shortens the running time of the online algorithm by decreasing the size of the vector space it uses [9].
There is a great deal of variation in the interventions, study designs, and outcome measures employed, which contributes to the uncertainty surrounding the positive effects of EHR interoperability on care quality and safety. The development of standardized outcome measures for health information studies would facilitate studies of higher quality. The growing trend toward patient engagement and patients’ management of their own electronic clinical data suggests that future studies should focus on the positive and negative implications of interoperable EHR interventions and study patient perspectives [10–20].
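The Mahalanobis distance used above for neighbor selection can be sketched in plain Python. The sketch assumes that each user is described by a numeric feature vector and that the sample covariance matrix of all users is invertible; the helper names are hypothetical.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [
            sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(d)
        ]
        for a in range(d)
    ]


def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)), computed without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)
    return sqrt(sum(d * w for d, w in zip(diff, z)))


if __name__ == "__main__":
    users = [(0, 0), (2, 0), (0, 4), (2, 4)]  # made-up user feature vectors
    cov = covariance_matrix(users)
    print(mahalanobis(users[0], users[3], cov))
```

Unlike the Euclidean distance, this measure rescales each direction by the spread of the data, so a feature with a large variance does not dominate the comparison.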
3 Methods
One of the first steps in any successful analysis is data preparation, sometimes called preprocessing. Before the clustering techniques were applied, all duplicate patient records were removed, and patients who had been given no medication were dropped from the original data set. We then transformed the data and chose the variables that would best represent it. During data transformation, variables are often converted to a new data type, and new variables are created. In our case, we used a procedure that transformed categorical indicators into binary ones, for aspects such as sex, blood-sugar peak, blood-iron ratio, and the result of the A1C blood test. New variables were derived from the categorical variable indicating the diagnosis, which was classified using the first three digits of the ICD-9 disease classification codes: problems with the heart, blood sugar, digestive tract, genitourinary system, injuries, muscles and joints, cancer, breathing, and other disorders. Several sub-categories of the race variable were also created. The original race and diagnosis variables were then removed from the analysis. Further, the age ranges were replaced with the average of each range.
Figure 2 depicts the proposed method and the interdependence of the system’s components. Before the clustering algorithms are applied, the data set is processed to extract information about patients, such as their clinical history, therapies, and medications. The patient’s explicit data is then represented using a collaborative filtering scheme (user-medication-dose). Drug predictions are made according to the patient’s group membership. After all candidate medications have been evaluated, the one with the greatest prediction value is chosen as the recommendation. The clustering-based recommendation system uses both patient demographics and the medications prescribed to the patients. With the proposed RS, the medications prescribed for a given patient are represented using a dose-based collaborative filtering schema.
Fig. 2 Representation of the proposed approach
A clustering method was used to classify patients into groups based on shared features. Figure 3 displays the clustering accuracy based on the demographics of the patient. The partitional K-means technique and the density-based spatial clustering of applications with noise algorithm (listed as ISCAN in Table 1) were compared to see which one produced the most accurate patient groups. The results are given in Table 1. First, the data were standardized, and principal component analysis was used to decrease the number of dimensions in the dataset. The optimum number of clusters was then established using the Silhouette coefficient. Patients who share similar traits are grouped together.
Fig. 3 Clustering accuracy based on the demographics of the patient
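The cluster-based prediction step described above can be sketched as follows. The scoring rule (mean dose among the other patients in the same cluster), the patient identifiers, and the medication names are all assumptions made for illustration; the text does not specify how the prediction value is computed.

```python
# Hypothetical data: patient -> cluster label, and patient -> {medication: dose level}.
CLUSTER_OF = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 0}
DOSES = {
    "p1": {"med_a": 2, "med_b": 1},
    "p2": {"med_a": 3},
    "p3": {"med_b": 1, "med_c": 1},
    "p4": {"med_b": 3},
    # p5 is a new patient with no prescriptions yet
}


def predict_scores(target, cluster_of, doses):
    """Assumed rule: score each medication by its mean dose among cluster peers."""
    peers = [
        p for p, c in cluster_of.items() if c == cluster_of[target] and p != target
    ]
    if not peers:
        return {}
    totals = {}
    for peer in peers:
        for med, dose in doses.get(peer, {}).items():
            totals[med] = totals.get(med, 0) + dose
    return {med: total / len(peers) for med, total in totals.items()}


def recommend(target, cluster_of, doses):
    """Return the medication with the greatest predicted value, or None."""
    scores = predict_scores(target, cluster_of, doses)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    print(recommend("p5", CLUSTER_OF, DOSES))
```

Because the prediction relies only on cluster membership, a new patient without any prescription history can still receive a suggestion, which is how clustering on patient characteristics eases the cold-start situation.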
Table 1 Comparative analysis
Method     No. of clusters    Silhouette coefficient    Time to execute
K-means    6                  0.662                     16 min 14 s
ISCAN      150                0.602                     22 min 12 s
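The Silhouette coefficient reported in Table 1 can be computed as in the following sketch. It uses the standard definition with Euclidean distance; the sample points are made up, and `math.dist` requires Python 3.8 or later.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def silhouette_coefficient(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points.

    a is the mean distance to the other points of the same cluster and
    b is the smallest mean distance to the points of any other cluster.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The point itself contributes distance 0, hence the divisor len(own) - 1.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_coefficient(pts, [0, 0, 1, 1]))  # well-separated grouping
    print(silhouette_coefficient(pts, [0, 1, 0, 1]))  # poor grouping
```

Values close to 1 indicate compact, well-separated clusters, so candidate cluster counts can be compared by running the clustering for each count and keeping the one with the highest coefficient.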
4 Conclusion
We introduce an RS that can recommend effective treatments for diabetic patients. User metadata is taken into account to reduce the “cold start” (new user) issue, and clustering methods are used to group patients who share similar characteristics so that drug recommendations can be made to those in the same cluster. The suggested system provides a novel approach to aiding medical professionals in the management of diabetic patients by suggesting drugs that may be useful in managing the condition. Our RS recommends pharmaceuticals that have been given to patients who share the same characteristics as the intended recipient, so the advice it gives is straightforward and easy to convey. We think this system has the potential to be a useful resource for healthcare professionals by making administration and treatment more streamlined.
References
1. Kuanr M, Mohapatra P, Piri J (2021) Health recommender system for cervical cancer prognosis in women. In: 2021 6th international conference on inventive computation technologies (ICICT), pp 673–679
2. Stellefson M, Paige SR, Alber JM, Chaney BH, Chaney D, Apperson A, Mohan A (2019) Association between health literacy, electronic health literacy, disease-specific knowledge, and health-related quality of life among adults with chronic obstructive pulmonary disease: cross-sectional study. J Med Internet Res 21(6):e12165. https://doi.org/10.2196/12165. PMID: 31172962; PMCID: PMC6592488
3. Wiesner M, Pfeifer D (2014) Health recommender systems: concepts, requirements, technical basics and challenges. Int J Environ Res Public Health 11(3):2580–2607
4. Morales LF, Granda P-D, Reátegui R, Barba-Guaman L (2022) Drug recommendation system for diabetes using a collaborative filtering and clustering approach: development and performance evaluation. J Med Internet Res 24(7):e37233
5. Asthana S, Megahed A, Strong R (2017) A recommendation system for proactive health monitoring using IoT and wearable technologies. In: 2017 IEEE international conference on AI & mobile services (AIMS), pp 14–21. IEEE
6. Princy J, Senith S, Kirubaraj AA, Vijaykumar P (2021) A personalized food recommender system for women considering nutritional information. Int J Pharmaceut Res 13(2). https://doi.org/10.31838/ijpr/2021.13.02.233
7. Wasid M, Ali R (2018) An improved recommender system based on multi-criteria clustering approach. Procedia Comput Sci 131:93–101
8. Xiaojun L (2017) An improved clustering-based collaborative filtering recommendation algorithm. Cluster Comput 20:1281–1288. https://doi.org/10.1007/s10586-017-0807-6
9. Li E, Clarke J, Ashrafian H, Darzi A, Neves A (2022) The impact of electronic health record interoperability on safety and quality of care in high-income countries: systematic review. J Med Internet Res 24(9):e38144. https://www.jmir.org/2022/9/e38144. https://doi.org/10.2196/38144
10. Yu Z, Huang F, Zhao X, Xiao W, Zhang W (2021) Predicting drug–disease associations through layer attention graph convolutional network. Brief Bioinf 22(4):bbaa243. https://doi.org/10.1093/bib/bbaa243
11. Sridhar D, Fakhraei S, Getoor L (2016) A probabilistic approach for collective similarity-based drug–drug interaction prediction. Bioinformatics 32(20):3175–3182. https://doi.org/10.1093/bioinformatics/btw342
12. Wang P, Li S, Pan R (2018) Incorporating GAN for negative sampling in knowledge representation learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 32, no 1
13. Cao DS, Xiao N, Li YJ, Zeng WB, Liang YZ, Lu AP, Xu QS, Chen AF (2015) Integrating multiple evidence sources to predict adverse drug reactions based on a systems pharmacology model. CPT Pharmacomet Syst Pharmacol 4(9):498–506. https://doi.org/10.1002/psp4.12002. Epub 2015 Sep 11. PMID: 26451329; PMCID: PMC4592529
14. Fakhraei S, Huang B, Raschid L, Getoor L (2014) Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Trans Comput Biol Bioinf 11(5):775–787
15. Crescioli G, Brilli V, Lanzi C, Burgalassi A, Ieri A, Bonaiuti R, Romano E, Innocenti R, Mannaioni G, Vannacci A, Lombardi N (2020) Adverse drug reactions in SARS-CoV-2 hospitalised patients: a case-series with a focus on drug–drug interactions. Internal Emerg Med 16(3):697–710. https://doi.org/10.1007/s11739-020-02586-8
16. Jang HY, Song J, Kim JH, Lee H, Kim I-W, Moon B, Oh JM (2022) Machine learning-based quantitative prediction of drug exposure in drug-drug interactions using drug label information. npj Digital Med 5(1). https://doi.org/10.1038/s41746-022-00639-0
17. Wang Q, Mao Z, Wang B, Guo L (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Trans Knowl Data Eng 29(12):2724–2743. https://doi.org/10.1109/TKDE.2017.2754499
18. Minervini P, Demeester T, Rocktäschel T, Riedel S (2017) Adversarial sets for regularising neural link predictors. arXiv.org. https://doi.org/10.48550/arXiv.1707.07596
19. Pushpalatha A, J HS, Pradeepa J, S MB (2020) A gadget recommendation system using data science. J Inf Technol Digital World 2(4):213–216. https://doi.org/10.36548/jitdw.2020.4.004
20. Yang H, Yang CC (2016) Discovering drug-drug interactions and associated adverse drug reactions with triad prediction in heterogeneous healthcare networks. In: IEEE international conference on healthcare informatics (ICHI) 2016, pp 244–254. https://doi.org/10.1109/ICHI.2016.34
Compartmented Proactive Secret Sharing Scheme
Rolla Subrahmanyam, N. Rukma Rekha, and Y. V. Subba Rao
Abstract A secret sharing scheme starts with a secret and generates specific shares (or shadows) for participants. A threshold number of participants is sufficient to reassemble the secret. In a compartmented secret sharing scheme (CSSS), the set of participants is separated into distinct compartments, and the secret can be recovered only if the overall number of participants reaches a global threshold and the participants from each compartment reach a defined compartment threshold. In the existing CSSS, the Dealer makes the compartment number l_i public, and participants can perform neither share verification nor share renewal. Participants have less privacy because l_i is public, the Dealer may act maliciously because there is no share verification, and an attacker who has scope to learn a few shares may reconstruct the secret. This paper proposes a Compartmented Proactive Secret Sharing Scheme that addresses participant privacy by revealing a participant’s compartment number only through his share, and that also provides share renewal and verification.
Keywords Secret sharing scheme · Compartmented secret sharing · Verifiable secret sharing · Proactive secret sharing
Supported by Organization X.
R. Subrahmanyam (B) · N. R. Rekha · Y. V. Subba Rao
School of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_4
1 Introduction
A secret sharing scheme (SSS) splits a secret into individual shares that are communicated to participants. To reconstruct the secret, participants bring their shares together. In a threshold SSS, a minimum (threshold) number of shares is required to retrieve the secret. We highlight Shamir’s [1] polynomial interpolation-based scheme, Blakley’s [2] geometric scheme, and the schemes of Mignotte [3] and Asmuth and Bloom [4], which are based on the Chinese remainder theorem (CRT). More general SSSs were introduced by Ito et al. [5] and Benaloh et al. [6]. Feldman [7] and Pedersen [8] proposed verifiable secret sharing schemes (VSSS), which deal with malicious activity by the Dealer. Participants can use this technique to ensure that the Dealer is dealing honestly. Feldman’s VSSS gives each participant a share and allows them to check the consistency of the share they received. If a participant receives a corrupted share from the Dealer, they should request a consistent share from the Dealer. Because shares otherwise remain unchanged, an attacker has a long time to crack the shares and reassemble the secret. As a result, after a given amount of time, the participants’ shares must be changed without changing the actual secret, which can be accomplished by the proactive secret sharing of Herzberg et al. [9]. Proactive secret sharing is similar to Feldman’s secret sharing but additionally allows share renewal. Iftene [10], Qiong et al. [11], and Kaya and Selçuk [12] introduced VSSSs based on the CRT. The scheme of Qiong et al. [11] offers modularity and verifiability. In a VSS scheme, the modularity property makes it simple to combine many shares to extract the secret, while the verifiability property guarantees that the shares can be validated without disclosing the secret. This is accomplished with cryptographic methods, including zero-knowledge proofs, that permit verification without disclosing the secret’s value.
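The share consistency check in Feldman’s scheme can be illustrated with a toy sketch. The parameters below (p = 23, q = 11, g = 2) are far too small to be secure and are chosen only so the arithmetic can be followed by hand; the polynomial coefficients are fixed for reproducibility, whereas a real Dealer would choose them at random.

```python
# Toy parameters (insecure, for illustration only): Q divides P - 1 and
# G has multiplicative order Q modulo P.
P, Q, G = 23, 11, 2


def make_shares(coeffs, n):
    """coeffs[0] is the secret; participant i receives f(i) mod Q."""
    return {
        i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
        for i in range(1, n + 1)
    }


def make_commitments(coeffs):
    """The Dealer publishes G^a_j mod P for every polynomial coefficient a_j."""
    return [pow(G, c, P) for c in coeffs]


def verify_share(i, share, commitments):
    """Accept the share if G^share equals the product of C_j^(i^j) modulo P."""
    expected = 1
    for j, c in enumerate(commitments):
        expected = expected * pow(c, i ** j, P) % P
    return pow(G, share, P) == expected


if __name__ == "__main__":
    coeffs = [7, 3]  # secret 7, threshold 2 (fixed here; random in practice)
    shares = make_shares(coeffs, 4)
    commitments = make_commitments(coeffs)
    print(all(verify_share(i, s, commitments) for i, s in shares.items()))
```

Each participant can run the check locally with only the public commitments, so an inconsistent share from the Dealer is detected without revealing the secret.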
In the verifiable secret sharing schemes developed by Iftene [10] and Qiong et al. [11], the Dealer may be dishonest and transmit incorrect shares to participants. Authorized participants will never be able to recover the secret with those shares. In the CRT-based VSSS developed by Kaya and Selçuk [12], however, the Dealer gives shares to participants in secret, and participants use Boudot’s range proof to verify their individual shares and confirm that each share lies within the expected range [12]. The security of the scheme rests on the discrete logarithm problem.
In a hierarchical secret sharing scheme [13], participants share secrets in a hierarchical manner. This approach is meant to ensure that the secret is kept secure and that only authorized people can access it. The participants are separated into levels according to their position in the hierarchy, and a certain threshold number of participants is required at each level in order to reassemble the secret. The lowest threshold is at the highest level, while the highest threshold is at the lowest level. The secret is divided into shares, and the participants at each level receive shares of it. The threshold number of participants at a particular level must come together in order to reconstruct the secret because each participant at that level holds a distinct share. A level may ask for assistance from higher levels if it does not have enough participants to reach its threshold. The secret can then be reconstructed by the upper levels by combining their shares with those of the lower levels. One advantage of hierarchical secret sharing is that it adds an extra degree of security because the secret cannot be reconstructed without the assistance of the upper levels even if the shares of a lower level are compromised. Hierarchical secret sharing also enables the secret to be rebuilt in stages, with each level contributing its own shares, which is beneficial when it is not possible for all participants to get together at once.
The multi-SSS of He and Dawson [14] is a way of separating a secret into several shares and giving these shares to different parties. The intention of this system is to ensure that the secret can be recovered only if a threshold number of participants contribute their shares. Polynomial interpolation serves as the foundation for the scheme. Using the secret as the constant term and random coefficients for the other terms, a polynomial of degree t − 1 is produced. By evaluating the polynomial at n distinct points, n shares are formed. Each share corresponds to a point on the polynomial and contains the value of the polynomial at that point. A minimum of t participants must contribute their shares in order to reveal the secret. These shares are used to perform polynomial interpolation, which yields the original polynomial and subsequently the secret. The secret cannot be recovered if fewer than the threshold number of shares are given. This scheme provides strong protection against individual share loss or fraud. As long as the required minimum number of shares is still available, the secret can be recreated even if one participant loses their share. If a share is changed, the corresponding point will not lie on the original polynomial, and the reconstruction will not succeed. Harn [15] later improved the He-Dawson scheme to reduce the overall number of public values.
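The polynomial construction described above, with a polynomial of degree t − 1, n evaluation points, and t shares needed for recovery, is the core of Shamir-style threshold sharing and can be sketched as follows. The prime modulus and the function names are choices made for this illustration, and the sketch covers only the basic threshold mechanism, not the multi-secret features of the He-Dawson scheme.

```python
import secrets

PRIME = 2 ** 61 - 1  # a Mersenne prime, large enough for small integer secrets


def split(secret, t, n, prime=PRIME):
    """Create n shares of a random degree t - 1 polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(t - 1)]

    def f(x):
        return sum(c * pow(x, j, prime) for j, c in enumerate(coeffs)) % prime

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares, prime=PRIME):
    """Recover f(0) by Lagrange interpolation over the given shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


if __name__ == "__main__":
    shares = split(1234, t=3, n=5)
    print(reconstruct(shares[:3]), reconstruct(shares[2:]))
```

Any three of the five shares determine the degree-two polynomial and therefore the secret, while two shares are consistent with every possible secret.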
Later, SR and Bhagvati [16] introduced a multi-SSS based on CRT that is useful for resource-constrained environments. Yuan et al. [17] developed a hierarchical multi-SSS that is considered efficient because it requires few computational resources to implement. In this scheme, the elements of a sequence are related by linear homogeneous recurrence relations. The scheme uses these relations to create multiple shares of the secret, which can then be distributed to different participants. Combining a certain number of shares is the only way to rebuild the secret. The hierarchy of the scheme refers to the fact that there are several degrees of participation, with some participants having greater access to the secret or greater authority than others. As a result, there is more flexibility and control over how sensitive information is shared. The overall objective of this technique is to give multiple parties a secure and efficient way to share secrets, while also enabling hierarchical control over who has access to what information. Multipartite SSS [18] are ways to divide a secret among several participants, with the restriction that the secret can only be reconstructed when a certain group of participants gets together. These techniques are frequently employed in secure information storage and sharing systems, particularly in situations where no single participant can be trusted to keep the secret. The secret is divided into
R. Subrahmanyam et al.
several portions or shares in multipartite secret sharing, and each share is given to a different participant. The number of shares and the distribution pattern are chosen based on the desired level of security and the number of participants. In comparison to more conventional single-party SSSs, multipartite SSSs offer an additional degree of protection and are significant tools for secure information storage and sharing. A multipartite secret sharing technique based on the CRT [19] divides a secret into parts (shares) such that the original secret can be reconstructed only from an authorized subset of the shares. The goal is to represent the secret as a number in various modular systems and create a share for each system so that only the right subset of shares can be combined to yield the original number. According to the Chinese Remainder Theorem, a system of congruences with pairwise coprime moduli has exactly one solution modulo the product of the moduli. Thus, the secret can be split into several shares by reducing it modulo different moduli, and it can be retrieved by recombining those shares using CRT. Several applications, including safe data storage, secure computation, and secure communication, use multipartite secret sharing based on CRT. However, in this scheme, the dealer may be malicious. Subrahmanyam et al. [20] then introduced a multipartite verifiable SSS based on CRT to solve the malicious dealer problem. Recently, Yu et al. [21], Tentu et al. [22], Iftene [23], Farras et al. [24], Basit et al. [25], Tassa et al. [26], Xu et al. [27], and Kumar et al. [28] introduced compartmented access structures. In a compartmented secret sharing scheme (CSSS), there are many disjoint compartments $l_i$, each with $n_i$ participants [29]. The secret is divided in such a way that at least $t_i$ individuals in some (or perhaps all) compartments must collaborate to reconstruct it.
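The CRT recombination step mentioned above can be illustrated with a short sketch. It shows only how residues modulo pairwise coprime moduli determine a number; it is not the access-structure construction of [19], and the moduli, the sample secret and the function name `crt` are assumptions made for the example.

```python
from math import prod


def crt(residues, moduli):
    """Return the unique x modulo prod(moduli) with x = r_i (mod m_i).

    The moduli are assumed to be pairwise coprime.
    """
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                      # product of the other moduli
        x += r * Mi * pow(Mi, -1, m)     # Mi * Mi^{-1} is 1 mod m and 0 mod the others
    return x % M


moduli = [11, 13, 17]                    # illustrative pairwise coprime moduli
secret = 1234                            # must be smaller than 11 * 13 * 17 = 2431
shares = [secret % m for m in moduli]    # one residue per modular system
print(shares, crt(shares, moduli))
```

A real CRT-based threshold scheme additionally constrains the sizes of the moduli so that too few residues leave the secret undetermined; that part is omitted here.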
In the CSSS, the Dealer makes the compartment number $l_i$ and the number of participants in each compartment $n_i$ public. The privacy of the participants is reduced because the compartment number $l_i$ is a public value. In addition, knowledge of the number of participants in each compartment can help an attacker target and compromise a certain set of participants. If that happens, the attacker learns a few shares of the secret, which can help him reconstruct the secret. The Dealer could also act maliciously in compartmented secret sharing by sending inconsistent shares. To address all these issues, a compartmented proactive SSS is proposed in this paper with the following features:
– The compartment number $l_i$ is made private.
– Each participant has privacy because the compartment number $l_i$ can be obtained only from his corresponding share.
– Participants can verify the consistency of their shares by share verification.
– Participant shares are renewed periodically by share renewal.
The proposed scheme can be used in circumstances such as the following. Assume a company CEO (the Dealer) has a top-secret task, and the company has l compartments. The Dealer chooses the secret s and then computes a partial secret $s_i$ for every compartment using that secret. Every compartment partial secret $s_i$ is computed by the
Compartmented Proactive Secret Sharing Scheme
Dealer, who distributes shares of the partial secret secretly to all participants in that compartment i. Every participant receives a share from the Dealer and then uses that share to calculate his compartment number $l_i$. After that, a threshold number $t_i$ of participants come together to reconstruct the partial secret in each compartment. Finally, the secret can be reconstructed by a global threshold number of participants. The rest of the paper is arranged as follows: the background, compartmented SSS, is introduced in Sect. 2. Section 3 describes the proposed compartmented proactive SSS. Section 4 discusses correctness and security analysis. Section 5 concludes the paper.
2 Background In this section, CSSS is explained briefly.
2.1 CSSS A CSSS [29] divides participants into l compartments, and a secret can only be recovered if the number of participants reaches a global threshold t and if the number of participants in each compartment reaches the given compartment threshold $t_i$. The scheme consists of the following steps:
– Suppose that there is a group $P = \{p_1, p_2, \ldots, p_N\}$ of N participants. The participants are divided disjointly into l compartments, say $l_1, l_2, \ldots, l_l$. Let the number of participants in $l_i$ be $n_i$ and let $t_i$ be its threshold. We have $N = \sum_{i=1}^{l} n_i$ and $t \ge \sum_{i=1}^{l} t_i$. Note: $l_i$ (the compartment number) and $n_i$ (the number of participants in each compartment) are public.
– The Dealer selects a polynomial $g(x) = S + a_1 x + a_2 x^2 + \cdots + a_{l-1} x^{l-1} \bmod q$, where $a_i \in F_q$, the secret is $S = g(0)$, and q is prime. From this polynomial, the Dealer calculates the secret $s_i$ for each compartment $l_i$, $1 \le i \le l$.
– The compartment secret $s_i$ is distributed within each compartment $l_i$ by using the Shamir SSS.
– Then, the compartment secret $s_i$ can be obtained by at least a threshold number $t_i$ of participants from each compartment $l_i$ using Lagrange's interpolation formula.
– Finally, the threshold t participants can rebuild the polynomial $g(x)$ by using Lagrange's interpolation formula, and hence the original secret $S = g(0)$.
In this scheme, the Dealer makes the compartment number $l_i$ public, and the participants can perform neither share verification nor share renewal. Participants have less privacy because $l_i$ is public, the Dealer may act maliciously because there is no share verification, and if an attacker learns the shares then reconstruction of
the secret becomes easy because there is no share renewal. To address these problems, a compartmented proactive SSS is proposed in this paper. To construct this scheme, we use the Feldman [7] scheme for share verification and the Herzberg et al. [9] scheme for share renewal.
3 Compartmented Proactive Secret Sharing Scheme In CSSS, the compartment number is public, so every other participant knows the compartment number of a given participant. Participants therefore have no privacy in compartmented SSS. We propose a compartmented proactive SSS to address this issue. In addition, we add share verification and share renewal to the proposed scheme. The scheme is explained below. Assume that there is a group $P = \{p_1, p_2, \ldots, p_N\}$ of N participants. The participants are disjointly partitioned into l compartments $l_1, l_2, \ldots, l_l$. Let $t_i$ be the threshold and $n_i$ be the total number of participants in $l_i$. Then $N = \sum_{i=1}^{l} n_i$, and we denote $t = \sum_{i=1}^{l} t_i$ as the global threshold. Here $l_i$ (the compartment number) is private and $n_i$ is public. This scheme comprises six phases: Share distribution, Commitments, Share verification, Compartment number computation, Share renewal, and Secret reconstruction. The steps of the proposed scheme are given in Fig. 1. During the share distribution phase, the Dealer chooses a polynomial of degree $t_i - 1$ for each compartment, from which the shares of the compartment secret $s_i$ are computed, and then distributes them through secure channels to the participants in compartment $l_i$. Next, the Dealer computes a public value. In the commitment phase, the Dealer computes commitments and makes them public. In the share verification phase, each participant uses the commitments to verify whether his share is valid. In the compartment number computation phase, each participant computes his compartment number from the public value and his corresponding share. In the share renewal phase, each participant renews his share after some time interval without changing the secret. In the secret reconstruction phase, the threshold number of participants reassemble the secret. These phases are explained below in detail.
3.1 Share Distribution
– The Dealer chooses a polynomial of degree $l - 1$: $g(x) = S + a_1 x + a_2 x^2 + \cdots + a_{l-1} x^{l-1} \bmod q$, where $a_i \in F_q$, the secret is $S = g(0)$, and q is prime.
– Each participant has an identity $id_{ij}$, $1 \le i \le l$, $1 \le j \le n_i$.
– The Dealer computes the compartment secret $s_i = g(i) \bmod q$, $1 \le i \le l$.
Fig. 1 Steps of proposed scheme
– The Dealer chooses a polynomial of degree $t_i - 1$ for each compartment $l_i$, $1 \le i \le l$: $h_i(x) = s_i + c_{i,1} x + c_{i,2} x^2 + c_{i,3} x^3 + \cdots + c_{i,t_i-1} x^{t_i-1} \bmod q$, where $c_{i,1}, c_{i,2}, \ldots, c_{i,t_i-1} \in F_q$.
– The Dealer computes the shares $sh_{ij}$ for the participants in every compartment: $sh_{ij} = h_i(id_{ij}) \bmod q$, $1 \le i \le l$, $1 \le j \le n_i$.
– The Dealer computes the public value $k_{ij} = sh_{ij} + sh_{ij}^{-1} + l_i \, id_{ij} \bmod q$.
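The distribution steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the public value is read as $k_{ij} = sh_{ij} + sh_{ij}^{-1} + l_i \, id_{ij} \bmod q$ with $l_i = i$, every threshold is assumed to be at least 2, and shares equal to zero are resampled because they have no inverse modulo q (the source does not discuss this case). The names `poly_eval` and `distribute` are hypothetical.

```python
import secrets


def poly_eval(coeffs, x, q):
    """Evaluate a polynomial (constant term first) at x modulo q."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % q
    return y


def distribute(S, thresholds, ids, q):
    """thresholds[i-1] = t_i (assumed >= 2); ids[i-1] = identities in compartment i."""
    l = len(thresholds)
    # g(x) of degree l - 1 with g(0) = S
    g = [S % q] + [secrets.randbelow(q) for _ in range(l - 1)]
    h_polys, shares, public_k = {}, {}, {}
    for i in range(1, l + 1):
        s_i = poly_eval(g, i, q)                      # compartment secret s_i = g(i)
        while True:
            # h_i(x) of degree t_i - 1 with h_i(0) = s_i
            h = [s_i] + [secrets.randbelow(q) for _ in range(thresholds[i - 1] - 1)]
            sh = [poly_eval(h, idv, q) for idv in ids[i - 1]]
            if all(sh):                               # assumption: resample on a zero share
                break
        h_polys[i] = h
        for idv, s in zip(ids[i - 1], sh):
            shares[(i, idv)] = s                      # sent privately to the participant
            public_k[(i, idv)] = (s + pow(s, -1, q) + i * idv) % q   # published
    return g, h_polys, shares, public_k


q = 2**61 - 1                                         # illustrative prime modulus
g, h_polys, shares, public_k = distribute(42, [2, 3], [[1, 2, 3], [4, 5, 6, 7]], q)
print(len(shares), "shares distributed")
```

Only `public_k` is meant to be published; the polynomials and shares are returned here purely so the example can be inspected.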
3.2 Commitments
– The Dealer chooses a large prime p such that $q \mid (p - 1)$ and a generator a of the subgroup of $Z_p^*$ of order q, and computes the commitments $cm_{i0}$, $cm_{ik}$ as
$$cm_{i0} = a^{s_i} \bmod p, \quad 1 \le i \le l, \tag{1}$$
$$cm_{ik} = a^{c_{i,k}} \bmod p, \quad 1 \le i \le l,\ 1 \le k \le t_i - 1. \tag{2}$$
– The Dealer makes the values $cm_{i0}$, $cm_{ik}$ public.
3.3 Share Verification
– Participants can verify their respective share $sh_{ij}$ as
$$a^{sh_{ij}} \equiv cm_{i0}\,(cm_{i1})^{id_{ij}} \cdots (cm_{i(t_i-1)})^{id_{ij}^{\,t_i-1}} \pmod p,$$
where $1 \le i \le l$, $1 \le j \le n_i$.
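The commitment and verification steps can be tried out with toy numbers. The sketch below assumes the Feldman-style check $a^{sh_{ij}} \equiv \prod_k (cm_{ik})^{id_{ij}^k} \pmod p$ for one compartment; the parameters p = 23, q = 11, a = 2 are deliberately tiny illustrative values (2 has order 11 modulo 23), and the function names are hypothetical.

```python
P, Q, A = 23, 11, 2   # toy parameters: Q divides P - 1 and A has order Q modulo P


def commit(h, p=P, a=A):
    """Commitments to the coefficients of h_i (h[0] is the compartment secret)."""
    return [pow(a, c, p) for c in h]


def share(h, idv, q=Q):
    """Share h_i(id) mod q for a participant with identity idv."""
    return sum(c * pow(idv, k, q) for k, c in enumerate(h)) % q


def verify(sh, idv, cms, p=P, q=Q, a=A):
    """Check a^sh == prod_k cm_k^(id^k) mod p using only public commitments."""
    rhs = 1
    for k, cm in enumerate(cms):
        # exponents may be reduced mod q because every cm_k lies in the order-q subgroup
        rhs = rhs * pow(cm, pow(idv, k, q), p) % p
    return pow(a, sh, p) == rhs


h = [7, 3, 4]                      # s_i = 7, threshold t_i = 3
cms = commit(h)
print(verify(share(h, 2), 2, cms))             # consistent share
print(verify((share(h, 2) + 1) % Q, 2, cms))   # tampered share
```

In practice p and q would be large primes; the small values only make the arithmetic easy to follow by hand.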
3.4 Compartment Number Computation
• Each participant receives his share $sh_{ij}$ secretly, together with the public information $k_{ij}$.
• Each participant computes his compartment number $l_i$ as $l_i = id_{ij}^{-1}\,(k_{ij} - sh_{ij} - sh_{ij}^{-1}) \bmod q$.
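A small numeric check makes the computation concrete. The sketch assumes the reading $k_{ij} = sh_{ij} + sh_{ij}^{-1} + l_i \, id_{ij} \bmod q$ for the public value and a non-zero share; q = 101 and the sample values are illustrative, and the function names are hypothetical.

```python
def public_value(sh, comp, idv, q):
    """k = sh + sh^{-1} + comp * id (mod q), published by the Dealer."""
    return (sh + pow(sh, -1, q) + comp * idv) % q


def compartment_number(k, sh, idv, q):
    """Recover the compartment number from the public value and the private share."""
    return pow(idv, -1, q) * (k - sh - pow(sh, -1, q)) % q


q = 101
k = public_value(sh=37, comp=3, idv=5, q=q)
print(k, compartment_number(k, sh=37, idv=5, q=q))
```

Only the holder of the share can strip the masking terms from the public value, which is how the compartment number stays private.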
3.5 Share Renewal
• In every compartment, each participant chooses a polynomial of degree $t_i - 1$ with zero constant term:
$$d_{ij}(x) = c_{ij,1} x + c_{ij,2} x^2 + c_{ij,3} x^3 + \cdots + c_{ij,t_i-1} x^{t_i-1} \bmod q, \quad 1 \le i \le l,\ 1 \le j \le n_i,$$
where $d_{ij}(0) = 0$ and $c_{ij,1}, c_{ij,2}, \ldots, c_{ij,t_i-1} \in F_q$.
• Each participant computes $r_{i,k,j}$ and distributes it secretly to the other participants: $r_{i,k,j} = d_{ij}(id_{ik}) \bmod q$, $1 \le i \le l$, $1 \le j \le n_i$, $1 \le k \le n_i$.
• Each participant computes the update value $sh_{i,j}$ as
$$sh_{i,j} = \sum_{k=1}^{n_i} r_{i,j,k} \bmod q, \quad 1 \le i \le l,\ 1 \le j \le n_i.$$
• Each participant gets his new share $nsh_{ij}$ as $nsh_{ij} = sh_{ij} + sh_{i,j} \bmod q$, $1 \le i \le l$, $1 \le j \le n_i$.
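The renewal round inside a single compartment can be sketched as below. The sketch simulates all participants in one process, so the "secret distribution" of the sub-shares is only implied; the prime modulus and the names `renew` and `lagrange_at_zero` are assumptions made for illustration.

```python
import secrets


def poly_eval(coeffs, x, q):
    """Evaluate a polynomial (constant term first) at x modulo q."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % q
    return y


def lagrange_at_zero(points, q):
    """Interpolate the polynomial through `points` and return its value at 0."""
    total = 0
    for xj, yj in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xj:
                num = num * (-xm) % q
                den = den * (xj - xm) % q
        total = (total + yj * num * pow(den, -1, q)) % q
    return total


def renew(shares, t, q):
    """shares: {identity: share} for one compartment; returns the renewed shares."""
    ids = list(shares)
    # Each participant j samples d_j(x) of degree t - 1 with d_j(0) = 0.
    updates = {j: [0] + [secrets.randbelow(q) for _ in range(t - 1)] for j in ids}
    renewed = {}
    for k in ids:
        # Participant k adds the sub-shares d_j(id_k) received from every participant j.
        delta = sum(poly_eval(updates[j], k, q) for j in ids) % q
        renewed[k] = (shares[k] + delta) % q
    return renewed


q = 2**61 - 1
h = [99, 5, 7]                                     # compartment secret 99, threshold 3
old = {x: poly_eval(h, x, q) for x in range(1, 6)}
new = renew(old, t=3, q=q)
print(lagrange_at_zero([(x, new[x]) for x in (1, 3, 5)], q))   # still the compartment secret
```

Because every update polynomial vanishes at zero, the renewed shares lie on a new polynomial with the same constant term, so old and new shares cannot be mixed but the compartment secret is unchanged.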
3.6 Secret Reconstruction
– In every compartment, any $t_i$ of the $n_i$ participants can recover their compartment secret $s_i$ by using Lagrange's interpolation formula.
– Then the participants can recover the polynomial $g(x)$ from the compartment secrets using Lagrange's interpolation formula, and hence the secret $S = g(0)$.
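The two-stage reconstruction can be sketched as follows: first each compartment interpolates its own secret, then the compartment secrets, taken as points $(i, s_i)$ on $g$, are interpolated at zero. The small modulus q = 101, the fixed sample polynomials and the function names are assumptions chosen so the numbers can be checked by hand.

```python
def lagrange_at_zero(points, q):
    """Interpolate the polynomial through `points` and return its value at 0."""
    total = 0
    for xj, yj in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xj:
                num = num * (-xm) % q
                den = den * (xj - xm) % q
        total = (total + yj * num * pow(den, -1, q)) % q
    return total


def reconstruct_secret(comp_shares, thresholds, q):
    """comp_shares[i]: list of (identity, share) pairs contributed in compartment i."""
    comp_secrets = []
    for i, points in comp_shares.items():
        if len(points) < thresholds[i]:
            raise ValueError(f"compartment {i} is below its threshold")
        s_i = lagrange_at_zero(points[:thresholds[i]], q)   # stage 1: s_i = h_i(0)
        comp_secrets.append((i, s_i))
    return lagrange_at_zero(comp_secrets, q)                # stage 2: S = g(0)


q = 101
# Example data: g(x) = 42 + 7x, h_1(x) = 49 + 3x, h_2(x) = 56 + 2x + 5x^2 (all mod 101)
comp_shares = {1: [(1, 52), (3, 58)], 2: [(1, 63), (2, 80), (4, 43)]}
print(reconstruct_secret(comp_shares, {1: 2, 2: 3}, q))
```

Since $g$ has degree $l - 1$, this sketch needs the secrets of all l compartments before the second interpolation can succeed.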
4 Correctness and Security Analysis of the Scheme This section explains the correctness and security analysis of compartment number, share verification, and share renewal.
4.1 Correctness and Security Analysis for Compartment Number Each participant calculates their compartment number as $l_i = id_{ij}^{-1}\,(k_{ij} - sh_{ij} - sh_{ij}^{-1}) \bmod q$. This is correct because
$$\begin{aligned}
id_{ij}^{-1}\,(k_{ij} - sh_{ij} - sh_{ij}^{-1}) &\equiv id_{ij}^{-1}\,(sh_{ij} + sh_{ij}^{-1} + l_i \, id_{ij} - sh_{ij} - sh_{ij}^{-1}) \pmod q \\
&\equiv id_{ij}^{-1}\, l_i \, id_{ij} \pmod q \\
&\equiv l_i \pmod q.
\end{aligned}$$
An adversary who knows only $k_{ij}$ and q, but not $sh_{ij}$, cannot easily obtain $l_i$.
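4.2 Correctness and Security Analysis for Share Verification Participants verify their respective share by using $cm_{i0}$, $cm_{ik}$ as follows:
$$a^{sh_{ij}} \equiv cm_{i0}\,(cm_{i1})^{id_{ij}} \cdots (cm_{i(t_i-1)})^{id_{ij}^{\,t_i-1}} \pmod p,$$
where $1 \le i \le l$, $1 \le j \le n_i$. From Eqs. 1 and 2,
$$\begin{aligned}
cm_{i0}\,(cm_{i1})^{id_{ij}}\,(cm_{i2})^{id_{ij}^{2}} \cdots (cm_{i(t_i-1)})^{id_{ij}^{\,t_i-1}}
&\equiv a^{s_i}\,(a^{c_{i,1}})^{id_{ij}}\,(a^{c_{i,2}})^{id_{ij}^{2}} \cdots (a^{c_{i,t_i-1}})^{id_{ij}^{\,t_i-1}} \pmod p \\
&\equiv a^{\,s_i + c_{i,1} id_{ij} + c_{i,2} id_{ij}^{2} + \cdots + c_{i,t_i-1} id_{ij}^{\,t_i-1}} \pmod p \\
&\equiv a^{h_i(id_{ij})} \pmod p \\
&\equiv a^{sh_{ij}} \pmod p.
\end{aligned}$$
The values $cm_{i0}$ and $cm_{ik}$ alone do not let an adversary compute the secret, since this would require solving a discrete logarithm problem in the subgroup of order q [7].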
4.3 Correctness and Security Analysis for Share Renewal The new share $nsh_{ij}$ of each participant in compartment $l_i$ is
$$\begin{aligned}
nsh_{ij} &= sh_{ij} + sh_{i,j} \bmod q = sh_{ij} + \sum_{k=1}^{n_i} r_{i,j,k} \bmod q, \quad 1 \le i \le l,\ 1 \le j \le n_i \\
&= h_i(id_{ij}) + \sum_{k=1}^{n_i} d_{ik}(id_{ij}) \bmod q \\
&= s_i + \sum_{k=1}^{t_i-1} c_{i,k}\, id_{ij}^{\,k} + \sum_{k=1}^{t_i-1} c_{i1,k}\, id_{ij}^{\,k} + \cdots + \sum_{k=1}^{t_i-1} c_{in_i,k}\, id_{ij}^{\,k} \bmod q \\
&= s_i + \Big(c_{i,1} + \sum_{k=1}^{n_i} c_{ik,1}\Big) id_{ij} + \cdots + \Big(c_{i,t_i-1} + \sum_{k=1}^{n_i} c_{ik,t_i-1}\Big) id_{ij}^{\,t_i-1} \bmod q.
\end{aligned}$$
From each compartment $l_i$, $t_i$ or more participants can combine their renewed shares to rebuild the compartment secret $s_i$ using Lagrange's interpolation formula on the polynomial
$$s_i + \Big(c_{i,1} + \sum_{k=1}^{n_i} c_{ik,1}\Big) x + \cdots + \Big(c_{i,t_i-1} + \sum_{k=1}^{n_i} c_{ik,t_i-1}\Big) x^{t_i-1} \bmod q,$$
whose constant term is still $s_i$. Thus the compartment secret $s_i$ will be revealed. The threshold $t = t_1 + t_2 + \cdots + t_l$ participants can reconstruct the secret. A set of only $t - 1 = (t_1 - 1) + t_2 + \cdots + t_l$ participants cannot get the secret: their probability of obtaining the correct secret S is $1/q$, as $S \in F_q$ is random.
5 Conclusion This paper proposed a compartmented proactive secret sharing scheme. In this proposal, every participant verifies his own share, which is then renewed after a set period of time. A threshold number of participants is necessary to reconstruct the secret in each compartment. The compartment secret cannot be reconstructed in any compartment that contributes fewer than its threshold number of participants. The scheme improves participant privacy because each participant's compartment number can be obtained only from his own share.
References
1. Shamir A (1979) How to share a secret. Commun ACM 22(11):612–613
2. Blakley GR (1979) Safeguarding cryptographic keys. In: International workshop on managing requirements knowledge. IEEE Computer Society, pp 313–313
3. Mignotte M (1982) How to share a secret. In: Workshop on cryptography. Springer, Berlin, Heidelberg, pp 371–375
4. Asmuth C, Bloom J (1983) A modular approach to key safeguarding. IEEE Trans Inf Theory 29(2):208–210
5. Ito M, Saito A, Nishizeki T (1989) Secret sharing scheme realizing general access structure. Electron Commun Jpn (Part III: Fundamental Electronic Science) 72(9):56–64
6. Benaloh J, Leichter J (1988) Generalized secret sharing and monotone functions. In: Conference on the theory and application of cryptography. Springer, New York, NY, pp 27–35
7. Feldman P (1987) A practical scheme for non-interactive verifiable secret sharing. In: 28th annual symposium on foundations of computer science (SFCS 1987). IEEE, pp 427–438
8. Pedersen TP (1998) Non-interactive and information-theoretic secure verifiable secret sharing
9. Herzberg A, Jarecki S, Krawczyk H, Yung M (1995) Proactive secret sharing or: how to cope with perpetual leakage. In: Annual international cryptology conference. Springer, Berlin, Heidelberg, pp 339–352
10. Iftene S (2006) Secret sharing schemes with applications in security protocols. Sci Ann Cuza Univ 16:63–96
11. Qiong L, Zhifang W, Xiamu N, Shenghe S (2005) A non-interactive modular verifiable secret sharing scheme. In: Proceedings. 2005 international conference on communications, circuits and systems, vol 1. IEEE, pp 84–87
12. Kaya K, Selçuk AA (Dec 2008) A verifiable secret sharing scheme based on the Chinese remainder theorem. In: International conference on cryptology in India. Springer, Berlin, Heidelberg, pp 414–425
13. Simmons GJ (2000) How to (really) share a secret. In: Conference on the theory and application of cryptography 1990. Springer, New York, NY, pp 390–448
14. He J, Dawson E (1994) Multistage secret sharing based on one-way function. Electron Lett 30(19):1591–2
15. Harn L (1995) Multistage secret sharing based on one-way function. Electron Lett 31(4)
16. YV SR, Bhagvati C (Jul 1 2014) CRT based threshold multi secret sharing scheme. Int J Netw Secur 16(3):194–200
17. Yuan J, Yang J, Wang C, Jia X, Fu FW, Xu G (May 2022) A new efficient hierarchical multisecret sharing scheme based on linear homogeneous recurrence relations. Inf Sci 1(592):36–49
18. Farràs O, Martí-Farrè J, Padró C (2012) Ideal multipartite secret sharing schemes. J Cryptol 25(3):434–63
19. Hsu CF, Harn L (2014) Multipartite secret sharing based on CRT. Wirel Pers Commun 78(1):271–82
20. Subrahmanyam R, Rukma Rekha N, Subba Rao YV (2022) Multipartite verifiable secret sharing based on CRT. In: Computer networks and inventive communication technologies. Springer, Singapore, pp 233–245
21. Yu Y, Wang M (Nov 2011) A probabilistic secret sharing scheme for a compartmented access structure. In: International conference on information and communications security. Springer, Berlin, Heidelberg, pp 136–142
22. Tentu AN, Bhavani K, Basit A, Venkaiah VC (2021) Sequential (t, n) multi secret sharing scheme for level-ordered access structure. Int J Inf Technol 13(6):2265–2275
23. Iftene S (2005) Compartmented secret sharing based on the Chinese remainder theorem. Cryptology ePrint Archive
24. Farras O, Padró C, Xing C, Yang A (2014) Natural generalizations of threshold secret sharing. IEEE Trans Inf Theory 60(3):1652–1664
25. Basit A, Chanakya P, Venkaiah VC, Moiz SA (2020) New multi-secret sharing scheme based on superincreasing sequence for level-ordered access structure. Int J Commun Netw Distrib Syst 24(4):357–380
26. Tassa T, Dyn N (2009) Multipartite secret sharing by bivariate interpolation. J Cryptol 22(2):227–258
27. Xu G, Yuan J, Xu G, Dang Z (2021) An efficient compartmented secret sharing scheme based on linear homogeneous recurrence relations. Secur Commun Netw 20:2021
28. Kumar S (2022) Extending Boolean operations-based secret image sharing to compartmented access structure. Multimed Tools Appl 2:1–20
29. Ghodosi H, Pieprzyk J, Safavi-Naini R (1998) Secret sharing in multilevel and compartmented groups. In: Australasian conference on information security and privacy. Springer, Berlin, Heidelberg, pp 367–378
DevOps Challenges and Practices in Software Engineering T. Pandiyavathi and B. Sivakumar
Abstract DevOps methods for streamlining software development and deployment have become very challenging for big enterprises and business organizations. They depend heavily on agile software development and on the Lean model of software development to discipline the agile delivery methodology. DevOps evolved through various factors for balancing the roles of the development and operations teams. To support the growth of DevOps in industry, open-source tools and frameworks were developed and proposed to further the goals of DevOps. DevOps practices include collaborative development, continuous testing, release, deployment, monitoring, feedback and optimization. Embracing DevOps is a long, continuous journey. This paper deals with the various goals and challenges faced while employing DevOps practices in the software development model. The solutions to these problems are discussed in detail, along with a few performance metrics that need more attention when the methodology is employed in various organizations. The solution identified involves locating the congestion in the pipeline and framing the steps to remove it. Performing one change at a time is the best practice to follow in place of frequent requirement changes. Making many changes at a time leads to a ripple effect that jeopardizes the whole code. Keywords DevOps · Continuous delivery · Feedbacks · Agile · Lean
T. Pandiyavathi (B) · B. Sivakumar, SRM Institute of Science and Technology, Chennai, India, e-mail: [email protected]; B. Sivakumar, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_5

1 Introduction The evolution of DevOps practices emerged from the huge maintenance activities that cost more than the actual product development [1]. Years ago, the growing need for website availability led software reliability engineers to build sites ensuring
the availability once the code is deployed to production. In 2009, Flickr engineers John Allspaw and Paul Hammond presented their own methodology in a conference talk entitled “10+ Deploys per Day: Dev and Ops Cooperation at Flickr.” Patrick Debois then organized the “DevOps Day” in Belgium. A DevOps hashtag was also introduced, and the movement spread around the world owing to the popularity gained during each DevOps Days event. The waterfall model was the traditional software development model before other models introduced variations in all the stages. It includes the following steps:
(1) Requirements: The requirements need to be defined clearly, covering all possible needs.
(2) Design: The hardware and software needs are defined in order to perform the detailed and system design.
(3) Implementation: The functional requirements of the design are implemented in a suitably chosen language.
(4) Integration and Testing: The small units developed in the coding stage are combined and tested for compliance with the requirements.
(5) Deployment of System: The entire system is tested and deployed in the market.
(6) Maintenance: Feedback is addressed in timely loops, and changes to the architecture are made during the maintenance stage.
Practitioners employed this approach for many years without encountering major problems. When the method was implemented, it yielded quality software. The method suits systems whose requirements are frozen before development very well. However, DevOps is considered a more modern, flexible development framework because of the major bottleneck of the waterfall model, namely its difficulty in accommodating changing requirements [2, 3].
1.1 Problem with Existing Method When the requirement specification document is perfect, the waterfall model can be used. However, in many scenarios the requirements are not perfect, which may introduce problems into the software [4]. For instance, if the requirements change at a very late stage, both the customer and the developer may incur losses. Suppose the product is in the implementation stage and a new competitor, technology or technique arrives on the market with a better product, for example one that delivers its goods easily through a faster transportation method; this puts the product under development at risk. To match or outdo the competitor, the team would have to stop the implementation stage and revert to the design phase to change the requirements, which is hard because it costs both money and time. Software lifecycles employed in organizations must adhere to quality measures; otherwise the product is at risk. Quality can be measured in terms of structural quality for reusability, functional quality to ensure requirement validation, and process quality to ensure a
reliable software. Open-source software is available at no cost and contains libraries that can be shared and used to improve the software product.
1.2 Pre DevOps In a pre-DevOps environment, the people involved in the support and the development of an application are siloed. The developers may work in one place, the Quality Assurance (QA) team, which checks the quality of the application, in another, and the support staff or operations team in yet another, separate from the people responsible for the actual deployment, hosting and maintenance of the application [5]. Mistakes are therefore highly likely at every hand-over, from developers to testers to the hosting team. Moreover, because of communication barriers, the application may not run as intended at the deployment end. DevOps emerges from the agile methodology of incremental, iterative and lean development. Bringing operations into development creates DevOps [6, 7]. Development stage: It involves coding and functionality. Testing stage: This stage verifies and validates the requirements provided. Operations stage: It involves deployment and maintenance. The three main drivers for implementing DevOps are cost effectiveness, agile improvement and more value to the business. The technical enablers of DevOps include government and various organizations, which take care of the funds, rules and regulations in a proper way. The iterative workflow is depicted in Fig. 1
Fig. 1 Agile iterative workflow
1.3 The Rise of DevOps In order to increase communication among the people working in the software development team, enterprises began to use DevOps methodologies in their environment [8]. Teams become cross-functional, and generalized specialists emerge in place of people tied to a specific domain. The people who write the code, the developers, are brought together with the people who support the customers and with the deployment team that hosts the whole application along with the database, the operations part. This shift from the traditional method requires various principles and values related to mentality and adaptation to change. The primary issue is the geographical challenge; addressing it facilitates the cultural shift and creates a better place for people to work with their own beliefs. This enables a significant rise to the next level of improvement through CI/CD, that is, continuous integration [9, 10] and continuous delivery. Instead of working on the application in a linear mode, changes are incorporated through continuous delivery. Risk and value are given more importance in Agile and are monitored in parallel through continuous testing strategies. A testing methodology that grows from unit tests through integration tests to system tests keeps the system less complex when changes need to be incorporated. Testing required automation, and many testing tools have emerged that also ease the work of taking the code into production. Risks still remain in automating testing, as even a great testing tool needs at least a small amount of human intervention. The benefits of adopting DevOps include fast delivery, scalability, new innovations, automation and less complexity.
2 DevOps Practices DevOps plays a big part in the enterprise environment. Using DevOps for mobile development gives rise to Mobile DevOps. Lewis Cianci introduced the concept of mobile DevOps, which opens new ways of developing software, and many applications and online services fit well into a DevOps pipeline. The concept is simple: applying DevOps practices to mobile applications is termed Mobile DevOps, and it fits well into the pipeline.
2.1 Core Practice of DevOps The most crucial core practice of DevOps is to automate the pipeline processes in the long run. The parameters to be considered when applying this strategy to various kinds of workflows, processes and values are the speed and quality of the development
application. Speed should be directly proportional to the quality of the mobile application. In version control and configuration management processes [11], customers have a better experience only when each release satisfies the new requirements along with numerous bug fixes [12, 13]. After each set of fixes, the environment needs to be upgraded, and team effort plays a major role in this process. A learning-oriented approach is essential for keeping up with technology and technical updates. Risk, value and stress due to small runs need not fatigue the application. When time optimization and investments in better process pipelines are lacking in the enterprise, adopting DevOps becomes essential [7, 8]. During change management activities, a test environment needs to be created and checked against the database and server systems. This requires setting up the environment and documenting the details, and rework may be needed because of problems with the environment rather than with the code. Such problems cannot be identified by the developers. They are noticed by the operations people, and they increase the time and work of the development team. The only way to optimize the work progress is to bridge the gap between the developers and operations, because the communication gap between these two teams makes both development and operations consume more time.
2.2 DevOps Strategy and Goals The goal of DevOps is to enhance the overall workflow pipeline in the software development life cycle. The three main factors that strongly affect the workflow are the people, the process and the technology employed in the development model. The main goals that ensure the success of the software development life cycle are the following steps:
a. Creating a test environment
b. Running the suite
c. Release frequently
Automating these three aspects is crucial to the process. It ensures quality, better development practices, and a manageable architecture with updated and innovative technology through continuous learning by all the members of the team.
2.2.1 Creating a Test Environment
Creating the test environment facilitates better geographical balance among the developers. Any changes to the environment can then be managed easily. Since environmental drift has become common, customers and stakeholders tend to assume that automating the test environment will complicate the process framework and hence the pipeline [14, 15]. However, the process becomes far more complex when done manually, especially for bigger enterprises. The best solution is to adopt automation. Nowadays it is simple to create a test environment with Kubernetes and Docker, followed by test suites. When the environment itself lacks test automation, a massive workload falls on the developers: every change requires the tester to be involved and the developer to update the environment, which consumes a large amount of money and time.
2.2.2 Running the Suite
Each time testers run a manual test, they need to check the test suite and follow the predefined steps, which is very slow [16]. Moreover, the results are sometimes inconsistent, delaying the other processes in the pipeline. Automating the test framework improves the functionality of the application. On each commit, the automated tests run the whole test suite, so that changes that introduce new errors can be found easily and the defects can be fixed [17, 18].
2.2.3 Release Frequently
For a team, frequent release means that every time a new commit is pushed into the main code branch, the full suite of automated tests runs [19]. This approach focuses on risk management and on clearing ripple effects with automated tests well before any new version is released [20, 21]. Errors are thus tracked down quickly and managed effectively thanks to frequent release cycles [22, 23].
3 Challenges in DevOps These practices make the deployment process easier, and many companies are trying to automate their entire development and release pipeline [24–27]. However, various challenges also need to be addressed in the early stages. They are listed in Table 5.1.
Table 5.1 Challenges and solutions to DevOps

| Challenges | Solutions |
| --- | --- |
| Maturity level of teams working in DevOps and software development life cycle | 1. Proper training and feedbacks; 2. DevOps tools and practices |
| Dealing with inconsistent environment | 1. Incorporate better infrastructure; 2. Ensure continuous delivery; 3. Ensure same environment is employed |
| Challenging operability skills within developers | 1. Giving access to operational process and tools; 2. Include Kanban practices for transparency; 3. Improvising all management activities over time; 4. Organization should provide access to monitoring tools to ensure agility |
| Manual testing | 1. Automating testing framework; 2. Provide testing procedure to avoid risk and elapsed time |
| Crashing due to improper governance | 1. Proper owner assigned for the DevOps practices to track the infrastructure and maintenance activities; 2. Management of activities and flow of decisions and needs as well |
| Death spiral | 1. Build automated test harnesses; 2. Continuous testing practices should be adapted; 3. CI/CD practices |
| Existing tasks and processes | 1. Automate repetitive process and tasks; 2. Automation to be employed to new objectives and not to the old concrete bugged one |
4 Conclusion
DevOps provides faster, better product delivery through faster issue resolution, reduced complexity, and efficient use of resources, and it brings innovation into teams through learning, transparency, automation mechanisms, and better working environments. It can also be applied to machine learning approaches for better performance. Compared with other existing practices, DevOps, when handled well, leads to high productivity and satisfies the expectations of both customers and stakeholders. Providing increments to the teams and letting them work in comfort doubles the turnover through continuous development, testing, integration, delivery, deployment, and monitoring.
T. Pandiyavathi and B. Sivakumar
References
1. Fitzgerald B, Stol K-J (2014) Continuous software engineering and beyond: trends and challenges. In: RCoSE 2014 proceedings of the 1st international workshop on rapid continuous software engineering
2. Fitzgerald B, Stol KJ (2015) Continuous software engineering: a roadmap and agenda. J Syst Softw 123
3. Fitzgerald B, Stol KJ (2014) Continuous software engineering and beyond: trends and challenges. https://doi.org/10.1145/2593812.2593813
4. Rajapakse RN, Zahedi M, Babar MA, Shen H (2022) Challenges and solutions when adopting DevSecOps: a systematic review. Inf Softw Technol 141. https://doi.org/10.1016/j.infsof.2021.106700
5. Saltz J, Sutherland A (2020) Ski: a new agile framework that supports devops, continuous delivery, and lean hypothesis testing. In: Proceedings of the annual Hawaii international conference on system sciences, vol 2020, January. https://doi.org/10.24251/hicss.2020.761
6. Bertolino A, de Angelis G, Guerriero A, Miranda B, Pietrantuono R, Russo S (2020) DevOpRET: continuous reliability testing in DevOps. J Softw: Evolut Process. https://doi.org/10.1002/smr.2298
7. Angara J, Gutta S, Prasad S (2018) DevOps with continuous testing architecture and its metrics model. Adv Intell Syst Comput 709. https://doi.org/10.1007/978-981-10-8633-5_28
8. Pietrantuono R, Bertolino A, de Angelis G, Miranda B, Russo S (2019) Towards continuous software reliability testing in DevOps. https://doi.org/10.1109/AST.2019.00009
9. Fitzgerald B, Stol KJ (2017) Continuous software engineering: a roadmap and agenda. J Syst Softw 123. https://doi.org/10.1016/j.jss.2015.06.063
10. Pandiyavathi T, Sivakumar B (2021) A systematic review on continuous improvement with DevOps. Design Engineering 2021(6):2023–2032. ISSN: 0011-9342
11. Singh S, Kaur S (2018) A systematic literature review: refactoring for disclosing code smells in object oriented software. Ain Shams Eng J 9(4). https://doi.org/10.1016/j.asej.2017.03.002
12. Saca MA (2018) Refactoring improving the design of existing code. In: 2017 IEEE 37th Central America and Panama convention, CONCAPAN 2017, vol 2018, January. https://doi.org/10.1109/CONCAPAN.2017.8278488
13. Baqais AAB, Alshayeb M (2020) Automatic software refactoring: a systematic literature review. Softw Quality J 28(2). https://doi.org/10.1007/s11219-019-09477-y
14. Angara J, Prasad S, Sridevi G (2020) DevOps project management tools for sprint planning, estimation and execution maturity. Cybern Inf Technol 20(2). https://doi.org/10.2478/cait-2020-0018
15. Hapke H (2020) Building machine learning pipelines: automating model life cycles with TensorFlow
16. Bai X, Li M, Pei D, Li S, Ye S (2018) Continuous delivery of personalized assessment and feedback in agile software engineering projects. https://doi.org/10.1145/3183377.3183387
17. Bosch N, Bosch J (2020) Software logs for machine learning in a DevOps environment. https://doi.org/10.1109/SEAA51224.2020.00016
18. Karamitsos I, Albarhami S, Apostolopoulos C (2020) Applying devops practices of continuous automation for machine learning. Inf (Switzerland) 11(7). https://doi.org/10.3390/info11070363
19. Alizadeh V, Ouali MA, Kessentini M, Chater M (2019) RefBot: intelligent software refactoring bot. https://doi.org/10.1109/ASE.2019.00081
20. Dittrich Y, Nørbjerg J, Tell P, Bendix L (2018) Researching cooperation and communication in continuous software engineering. https://doi.org/10.1145/3195836.3195856
21. Zimmerer P (2018) Strategy for continuous testing in iDevOps. https://doi.org/10.1145/3183440.3183465
22. Sequential ordering of code smells and usage of heuristic algorithm. www.indjst.org
23. Pandiyavathi T, Manochandar T (2014) Detection of optimal refactoring plans for resolution of code smells. www.ijarcce.com
24. Leite L, Rocha C, Kon F, Milojicic D, Meirelles P (2019) A survey of DevOps concepts and challenges. ACM Comput Surv 52(6). https://doi.org/10.1145/3359981
25. Pandiyavathi T (2014) Usage of optimal restructuring plan in detection of code smells. Int J Comput Trends Technol 12(4). http://www.ijcttjournal.org
26. Pandiyavathi T, Manochandar T (2019) Restructuring with Moora and measuring code smells. Int J Innov Technol Explor Eng 8(12):1817–1820. https://doi.org/10.35940/ijitee.L2846.1081219
27. Shakya P, Shakya S (2020) Critical success factor of agile methodology in software industry of Nepal. J Inf Technol 2(3):135–143
The Effects of Climate Change on Crop Yield
Daksh Patel, Breenda Das, and R. I. Minu
Abstract Several fields, such as soil analysis and crop forecasting, benefit from machine learning and deep learning techniques. To achieve agricultural sustainability goals and increase crop yields, an in-depth study of the soil and crops is important. Studying agricultural regions with machine learning algorithms gives a more streamlined and ordered framework. In this study, we offer a method for developing multiple regression-based machine learning and deep learning models that estimate the yield per hectare of a crop using a simulated dataset, Projected_impacts. The article served as the impetus for our research. The models were created using a decision tree, linear regression, SVM regression, random forest, and artificial neural networks. Each model’s performance was evaluated, and the models were compared with one another. The experimental data show that the Artificial Neural Network with a bottleneck autoencoder is more accurate than the other models.
Keywords Autoencoders · ANN · Neural networks · Deep learning · Machine learning · Agriculture · Climate change
D. Patel (B) · B. Das · R. I. Minu, Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India. e-mail: [email protected]; B. Das e-mail: [email protected]; R. I. Minu e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_6
1 Introduction
The term “climate change” refers to any significant change over time in the average weather patterns of a region (or of the planet as a whole). This research focuses on the effects of atypical climate change on other parts of the world. Such changes may take tens of thousands or even millions of years to become discernible. However, the accelerated rate of climate change is mostly attributable to greenhouse gases produced by human activities such as industrialization, urbanization, deforestation, agriculture, and changes in land-use patterns. Possible consequences of climate change include higher temperatures, changing precipitation patterns, and a rise in atmospheric CO2 levels.
The greenhouse effect has three possible implications for agriculture. First, an increase in atmospheric CO2 levels can directly affect the growth rate of both agricultural plants and weeds. Second, CO2-induced climatic shifts may change temperature, precipitation, and sunshine levels, which in turn may affect plant and animal production. Lastly, rising sea levels could flood cropland and make groundwater in coastal areas more saline.
The impacts of global warming are considerable in several domains, including the economy, society, and the environment. Global warming puts at risk the Millennium Development Goals, which include ending extreme poverty and hunger, attaining universal health care, and protecting the natural resources of the globe.
Extensive research has examined the consequences of climate change from a variety of perspectives. The impacts of climate change on agricultural productivity are analyzed primarily through three basic methodologies: crop growth models, regression analysis, and simulation. Simulations rely on computer-generated models. Crop growth models are finding increasingly wide application because they respond accurately to climatic events. However, they need daily climate data in order to be calibrated in an experimental environment.
Regression analysis applied to real-world farming makes it possible to assess the effect that changes in the weather have on harvest yields. Researchers have predicted how climate change will affect agricultural output by combining crop modeling techniques with projections of how the weather will change in the future.
2 Literature Review
The impact of climate change on crop yield has been a topic of active research, and many machine learning and deep learning models have been tried in order to produce accurate yield predictions. With the advent of deep learning models, the accuracy of yield predictions under complex combinations of climatic factors has improved significantly. Most crop yield prediction algorithms based on machine learning or deep learning incorporate only meteorological data from the harvesting region, so they become less reliable when the district or state changes. These models reach reasonable accuracy, from 95% to 98%, using just two or three factors as predictors, yet there is still room for improvement in all of them. Table 1 summarizes the surveyed work.
Table 1 Literature survey

| Paper | Objective | Algorithm | Crop | Result |
| --- | --- | --- | --- | --- |
| [1] | The purpose of weather, soil, and crop management-based crop prediction models is to provide estimates of agricultural output | Multiple linear regression | Rice yield | 90–95% |
| [2] | Analysis of the relationship between temperature and yield in soybeans using a decision tree induction method | Decision tree analysis and ID3 | Soybean | The rules formed from the decision tree are helpful in predicting the conditions responsible for high or low soybean crop productivity under given climatic parameters |
| [3] | This research evaluates four regression methods using agricultural data. Author suggests a method | Support vector regression model | For any crop | Wheat yields are predicted to climb in response to rising temperatures, although it’s possible that estimates are too low |
| [4] | To evaluate various factors affecting the yield | Neural networks | Corn yield | 95% |
| [5] | Graphical, correlative, and data mining regression methods were used in the analysis to uncover growing trends | Gaussian processes | Wheat yield | Wheat yield is projected to increase with a rise in temperature; however, there is a risk of underestimating wheat yields as the temperature rises |
| [6] | To compare three models by using them on a region-specific crop | Three models used: APAR, SEBAL, Carnegie Institution Stanford model | Rice yield | Wheat, rice, and sugarcane all benefited, but cotton did not. The suggested method has the potential to make important contributions to quantifying yield fluctuations across the Indus Basin |
| [7] | To predict the yield using internal and external factors of the crop | C4.5 algorithm and decision tree | Soybean | 87% for soya beans, roughly 85% for paddy, and 76% for maize |
| [8] | An investigation into crop yield change given seasonal cycles of climate annually | Harmonic analysis of NDVI time series algorithm | For any crop | 86.5% |
| [9] | Unsupervised clustering for climatic factors impacting crop prediction | Relational cluster Beehive algorithm | Corn yield | As compared to the cluster and regression tree approach, this crop yield prediction model (CRY) is 12% more accurate |
| [10] | Clustering attempt on seasonal factors that impact crop growth | For both clustering and classifying data, the K-Means technique is a useful tool. We used a mixture of a linear regression, k-NN, and an ANN model | Wheat yield | 90–95% |
| [11] | Application of unsupervised naive Bayes algorithm on crop yield prediction | K-Nearest Neighbor (K-NN) and Naive Bayes (NB) | Rice yield | 78% |
| [12] | Random forest classical classification for analyzing whether a particular weather sequence is favorable for good yield | Random forest model | For any crop | 83.8 |
| [13] | Application of recurrent neural networks to understand sequential patterns | Simple RNN | For any crop | 92.02% |
| [14] | A lesser known algorithm, kernel ridge for regression | Kernel Ridge | For any crop | 98% |
| [15] | An ensemble of neural network models to produce results of crop yield | Neural networks ensemble | Sugarcane | 46% |
3 Proposed System
We propose a deep learning model that predicts agricultural output within a given location, not only in India but also in the rest of the world, from factors such as rain, carbon dioxide (in ppm), temperature, fertilizer, precipitation, and other climate factors. To determine which machine learning and deep learning models are best suited to making accurate predictions, we compare them using performance metrics. The architecture of the proposed system is depicted in Fig. 1.
Fig. 1 System architecture of proposed system
Fig. 2 Autoencoder architecture diagram
3.1 Model Architecture
Our proposed model is an encoder segment followed by a deep learning prediction segment. The encoder consists of six convolutional layers with batch normalization and ReLU activation, and it reduces the data size from 35 to 18 (as shown in Fig. 2). The prediction segment consists of five dense layers with Tanh and ReLU activations (as shown in Fig. 3). There are 6,073 trainable parameters.
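To make the notion of a parameter count concrete, the sketch below shows how the number of trainable parameters of a stack of dense layers follows from the layer widths: each layer contributes inputs × outputs weights plus one bias per output. The layer widths used here are hypothetical, chosen only for illustration; they are not the widths of the model described above, so the total does not reproduce the reported figure.

```python
def dense_layer_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus biases (n_outputs) of one dense layer."""
    return n_inputs * n_outputs + n_outputs


def total_dense_params(layer_widths):
    """Sum the parameters of consecutive dense layers given their widths."""
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_widths, layer_widths[1:])
    )


# Hypothetical widths: 18 encoded inputs, four hidden layers, one output neuron.
# These five transitions correspond to five dense layers.
HYPOTHETICAL_WIDTHS = [18, 64, 32, 16, 8, 1]

if __name__ == "__main__":
    print(total_dense_params(HYPOTHETICAL_WIDTHS))
```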
Fig. 3 Artificial neural network architecture diagram
3.2 Training Configuration
We used the mean_squared_error loss function to evaluate the loss against the ground truth and the Adam optimizer with a learning rate of 0.001. We kept the batch size at 16. Our training split consisted of 3420 samples and our testing split of 5246 samples. The model has a total of 13,071 trainable parameters.
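The configuration above can be illustrated with two small calculations: the number of mini-batches per epoch implied by the stated sample count and batch size, and a single Adam update for one scalar parameter. This is a minimal sketch of the standard Adam rule, not the training code used in this study; the values beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are commonly used defaults assumed here, since the text states only the learning rate.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    m and v are the running first and second moment estimates,
    t is the 1-based step counter used for bias correction.
    beta1, beta2, and eps are assumed defaults, not values from the text.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    print(batches_per_epoch(3420, 16))
    print(adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1))
```

On the first step the bias correction makes the update size close to the learning rate regardless of the gradient magnitude, which is one reason a small value such as 0.001 is a common starting point.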
3.3 Dataset
We used the “Global dataset for the projected impacts of climate change on four major crops” by Hasegawa et al. [15]. The dataset tabulates the change in crop yield per hectare under simulated changes in factors such as rain, atmosphere quality, soil quality, and fertilizers. It contains 52 features. We performed feature selection to select the top 37 features for training. The features used for training are given in Table 3; the target variable is the projected yield in tons per hectare. The features excluded from training are given in Table 2.
Table 2 Excluded features

| Sr. no. | Features | Dtype |
| --- | --- | --- |
| 1 | ID | float64 |
| 2 | Ref no. | object |
| 3 | Methods | object |
| 4 | Reference | object |
| 5 | doi | object |
| 6 | Site (Location) | object |
| 7 | Publication year | float64 |
| 8 | Local delta T | float64 |
| 9 | Annual precipitation change each study (mm) | float64 |
| 10 | Note1 (* = corrected by HW) | object |
| 11 | Note2 (* = Local temperature is estimated) | object |
| 12 | Note3 (* = Local delta Pr is estimated) | object |
| 13 | Note4 (* = Global temperature is estimated) | object |
| 14 | Seasonal precipitation change (mm) each study (local base) | float64 |
| 15 | Base precipitation (annual) (mm) (local base period) | float64 |
| 16 | Annual precipitation change (%) (relative to local base) | float64 |
| 17 | Base precipitation (seasonal) (mm) (local base period) | float64 |
4 Experiments
We trained the machine learning and deep learning models from the literature survey on the dataset and recorded the accuracy and loss values in each case. We tested Linear Regression, Support Vector Regression, Random Forest Regressor, Decision Tree, ANN, and Autoencoder ANN.
4.1 Metrics
We used the following metrics to measure the accuracy and the error of the models trained on the dataset.
Table 3 Trainable features

| Sr. no. | Feature | Dtype |
| --- | --- | --- |
| 1 | Scale | object |
| 2 | Crop | object |
| 3 | Country | object |
| 4 | Region | object |
| 5 | latitude | float64 |
| 6 | longitude | float64 |
| 7 | Current Average Temperature (dC)_area_weighted | float64 |
| 8 | Current Average Temperature_point_coordinate(dC) | float64 |
| 9 | Current Annual Precipitation (mm) _area_weighted | float64 |
| 10 | Current Annual Precipitation (mm) _point_coord | float64 |
| 11 | Future_Mid-point | float64 |
| 12 | Baseline_Mid-point | float64 |
| 13 | Time slice | object |
| 14 | Climate scenario | object |
| 15 | Scenario source | object |
| 16 | Local delta T from 2005 | float64 |
| 17 | Annual Precipitation change from 2005 (mm) | float64 |
| 18 | Global delta T from pre-industrial period | float64 |
| 19 | Global delta T from 2005 | float64 |
| 20 | Projected yield (t/ha) | float64 |
| 21 | Climate impacts (%) | float64 |
| 22 | Climate impacts relative to 2005 | float64 |
| 23 | Climate impacts per dC (%) | float64 |
| 24 | Climate impacts per decade (%) | float64 |
| 25 | CO2 | object |
| 26 | CO2 ppm | float64 |
| 27 | Fertilizer | object |
| 28 | Irrigation | object |
| 29 | Cultivar | object |
| 30 | Soil organic matter management | object |
| 31 | Planting time | object |
| 32 | Tillage | object |
| 33 | Others | object |
| 34 | Adaptation | object |
| 35 | Adaptation type | object |
4.1.1 Accuracy Metrics
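We used the R² score as the accuracy metric for all the trained models and compared those values to identify the best model. The R² score is given by

$$R^2 = 1 - \frac{\text{sum of squared residuals (SSR)}}{\text{total sum of squares (SST)}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the mean of the observed values.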
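The R² definition can be written directly as a few lines of code. The following is a minimal pure-Python sketch for illustration, not the routine used in the experiments; the function name r2_score and the sample values are assumptions made for this example.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical yields (t/ha) and predictions, for illustration only.
    observed = [3.0, 5.0, 7.0]
    predicted = [2.5, 5.0, 7.5]
    print(r2_score(observed, predicted))
```

A perfect prediction gives a score of 1, and always predicting the mean of the observed values gives a score of 0.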
4.1.2 Error Metrics
We used two metrics to measure the error of the trained models and compared those values to identify the best-performing model.
Mean Squared Error: We used this function as the training loss of the neural networks, in order to compare the two neural network architectures, ANN and autoencoder ANN. The formula for MSE is

$$\mathrm{MSE} = \frac{1}{D} \sum_{i=1}^{D} (x_i - y_i)^2,$$

where $x_i$ and $y_i$ are the paired predicted and true values and $D$ is the number of samples.
Mean Absolute Error: We used this function as a metric while training the neural networks, in order to compare the two neural network architectures, ANN and autoencoder ANN. The formula for MAE is

$$\mathrm{MAE} = \frac{1}{D} \sum_{i=1}^{D} |x_i - y_i|,$$

with the same $x_i$, $y_i$, and $D$ as for the MSE.
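Both error metrics average a per-sample discrepancy over the D samples, which the following minimal sketch makes explicit. It is a pure-Python illustration under the assumption that x holds predictions and y holds true values; the function names and sample values are chosen for this example.

```python
def mean_squared_error(x, y):
    """Average of squared differences between paired values."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mean_absolute_error(x, y):
    """Average of absolute differences between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical predictions and true values, for illustration only.
    predictions = [2.5, 5.0, 7.5]
    truths = [3.0, 5.0, 7.0]
    print(mean_squared_error(predictions, truths))
    print(mean_absolute_error(predictions, truths))
```

Because MSE squares each difference, it penalizes large errors more heavily than MAE does, which is why the two can rank models differently.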
4.2 Linear Regressor
Linear regression is a supervised learning algorithm. It assumes that the predicted value depends linearly on the features. The line that best fits the data points gives the most accurate predictions. We fit a linear regression model from the sklearn library on the dataset. The best-fit R² score obtained was 0.431.
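The idea of a best-fit line can be shown with the closed-form least-squares solution for a single feature. This is a simplified sketch for illustration: the experiments used the sklearn implementation on many features, whereas the function fit_line and the toy data below are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Toy data lying exactly on y = 2x + 1 (hypothetical values).
    slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(slope, intercept, predict_line(4.0, slope, intercept))
```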
4.3 Support Vector Regressor
Support vector regression (SVR) is also a supervised learning model. It handles nonlinear data robustly by learning a hyperplane that fits the data points within a margin of tolerance. We fit a Support Vector Regressor from the sklearn library. The best-fit R² score obtained was 0.649.
4.4 Random Forest Regressor
Random forest regressors fit a large number of decision trees on samples of the dataset. We set the number of random forest estimators to 100. The best-fit R² score obtained was 0.927.
4.5 Decision Tree Regressor
A decision tree partitions the data points by their properties along tree branches. It keeps splitting on yes/no questions until a leaf node is reached. We fit the decision tree model from the sklearn library. The best-fit R² score obtained was 0.895.
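A single yes/no split of the kind described above can be sketched as a depth-one regression tree, often called a stump. The code below is an illustrative simplification, not the sklearn algorithm used in the experiments: it tries every midpoint between sorted feature values and keeps the threshold with the lowest total squared error. The names best_split and predict_stump and the toy data are hypothetical.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(x, split):
    """Answer the yes/no question 'x <= threshold?' and return the leaf mean."""
    threshold, left_mean, right_mean = split
    return left_mean if x <= threshold else right_mean


if __name__ == "__main__":
    # Toy data with two clearly separated groups (hypothetical values).
    split = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
    print(split, predict_stump(2, split), predict_stump(11, split))
```

A full decision tree applies the same search recursively to each side of the split until a stopping rule is met.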
4.6 ANN
The artificial neural network is a simple deep learning neural network. It consists of seven dense layers, and the final output layer is a one-neuron convolution layer. We use the Tanh and ReLU activation functions for the model. We calculated the loss on mean absolute error and measured the final accuracy using the R² score. We obtained a final R² score of 0.937. The number of trainable parameters in the model is 736,769.
4.7 Autoencoder ANN
The autoencoder ANN model encodes the dataset into a compressed format, then decodes it using an artificial neural network and gives an output. We use the Tanh and ReLU activation functions for the model. We calculated the loss on mean absolute error and measured the final accuracy using the R² score. We obtained a final R² score of 0.953. The number of trainable parameters in the model is 13,071.
5 Results and Analysis
The comparison bar plots in Fig. 4 show that the ANN equipped with autoencoders achieves the highest accuracy (95.3%) on the test dataset. Even without the autoencoders, the ANN reaches a comparable accuracy of 93.7%. Based on the results of our research, the ANN equipped with autoencoders is the superior model. We attribute its high accuracy to the autoencoder, which reduces the dimensionality of the dataset and makes it efficient to predict the yield in tons per hectare. It took 0.062 milliseconds to get the output. As a result, this model may be used to notify farmers in near real time of how the crop yield will change under the varying climatic conditions of the surrounding environment. Both random forests and decision trees perform well, with respective accuracy ratings of 0.918 and 0.877. Linear regression and SVR do not fare particularly well, with accuracies of just 0.431 and 0.649, respectively. Figures 5 and 6 show the loss and mean absolute error against the number of training epochs for the autoencoder ANN, which is the best-performing model by R² score, and for the baseline ANN.
Fig. 4 Comparison graphs for all models
Fig. 5 Loss and MAE graph for autoencoders
Fig. 6 Loss and MAE graph for trainable ANN
6 Conclusion
This study was based on a dataset obtained by observing the impact of simulated climatic factors on four crops. Based on the kind of crop, four machine learning algorithms and two deep learning methods were used independently to forecast the projected yield in tons per hectare. The four machine learning prediction models were created using the well-known approaches of decision tree, linear regression, random forest, and support vector machine. Two further deep learning prediction models were constructed using the autoencoder ANN and the baseline ANN. Statistical metrics such as correlation, R² score, mean absolute error, mean squared error, and total time spent per model were used to evaluate the models. In this evaluation, the autoencoder ANN achieved the highest R² score (95.1%) and the lowest mean absolute error (0.113) among all models. Based on its performance, the autoencoder ANN may be relied upon for agricultural yield prediction and analysis in the long run. To increase agricultural yields in the future, this model might be used to provide real-time predictions to farmers via a front-end web interface that uses remote sensing to obtain the values required to predict the yield of a given crop.
References
1. Ramesh D, Vardhan BV (2013) Region specific crop yield analysis: a data mining approach. UACEE Int J Adv Comput Sci Appl-IJCSIA 3(2)
2. Ruß G (Jul 2009) Data mining of agricultural yield data: a comparison of regression models. In: Industrial conference on data mining. Springer, Berlin, Heidelberg, pp 24–37
3. Panda SS, Ames DP, Panigrahi S (2010) Application of vegetation indices for agricultural crop yield prediction using neural network techniques. Remote Sens 2(3):673–696
4. Vagh Y, Xiao J (Jul 2012) Mining temperature profile data for shire-level crop yield prediction. In: 2012 international conference on machine learning and cybernetics, vol 1. IEEE, pp 77–83
5. Bastiaanssen WG, Ali S (2003) A new crop yield forecasting model based on satellite measurements applied across the Indus Basin, Pakistan. Agric Ecosyst Environ 94(3):321–340
6. Veenadhari S, Mishra B, Singh CD (2011) Soybean productivity modelling using decision tree algorithms. Int J Comput Appl 27(7):11–15
7. Fernandes JL, Rocha JV, Lamparelli RAC (2011) Sugarcane yield estimates using time series analysis of spot vegetation images. Sci Agric 68:139–146
8. Ananthara MG, Arunkumar T, Hemavathy R (Feb 2013) CRY-an improved crop yield prediction model using bee hive clustering approach for agricultural data sets. In: 2013 international conference on pattern recognition, informatics and mobile engineering. IEEE, pp 473–478
9. Ahamed AMS, Mahmood NT, Hossain N, Kabir MT, Das K, Rahman F, Rahman RM (June 2015) Applying data mining techniques to predict annual yield of major crops and recommend planting different crops in different districts in Bangladesh. In: 2015 IEEE/ACIS 16th international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD). IEEE, pp 1–6
10. Paul M, Vishwakarma SK, Verma A (Dec 2015) Analysis of soil behaviour and prediction of crop yield using data mining approach. In: 2015 international conference on computational intelligence and communication networks (CICN). IEEE, pp 766–771
11. Kaur S, Malik K (2022) Evaluating various machine learning techniques for crop nutrients prediction. SSRN 4031986
12. Nigam A, Garg S, Agrawal A, Agrawal P (Nov 2019) Crop yield prediction using machine learning algorithms. In: 2019 fifth international conference on image information processing (ICIIP). IEEE, pp 125–130
13. Nishant PS, Venkat PS, Avinash BL, Jabber B (June 2020) Crop yield prediction based on Indian agriculture using machine learning. In: 2020 international conference for emerging technology (INCET). IEEE, pp 1–4
14. Fernandes JL, Ebecken NFF, Esquerdo JCDM (2017) Sugarcane yield prediction in Brazil using NDVI time series and neural networks ensemble. Int J Remote Sens 38(16):4631–4644
15. Hasegawa T, Wakatsuki H, Ju H, Vyas S, Nelson GC, Farrell A, Makowski D (2022) A global dataset for the projected impacts of climate change on four major crops. Sci Data 9(1):1–11
Finetuned-VGG16 CNN Model for Tissue Classification of Colorectal Cancer
T. E. Anju and S. Vimala
Abstract Classifying cancer tissues has been a difficult task ever since Computer Vision and Pattern Recognition were developed. Deep Learning, a modern, state-of-the-art method for texture categorization and localisation of cancer tissues, has replaced traditional machine learning approaches. Colorectal cancer is the third leading cause of cancer mortality globally. In this paper, a fine-tuned deep learning model is proposed for image-level, texture-based classification using a CRC dataset. To minimise overfitting and considerably increase classification accuracy, the model must be fine-tuned, especially when few training samples are available. The VGG-16 pretrained model underwent fine-tuning. The nine classes of the Kather CRC histology image collection (CRC-VAL-HE-7K, NCT-CRC-HE-100K) were used to verify these models. The outcomes demonstrated that the accuracy of histopathological image recognition was much higher than that of previous approaches.
Keywords Convolution neural network · Colorectal cancer (CRC) · Deep learning · Visual geometry group (VGG) 16 · Tumor epithelium · Densenet
T. E. Anju (B) · S. Vimala, Department of Computer Science, Mother Teresa Women’s University, Kodaikanal, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_7
1 Introduction
One in nine women may get breast cancer over their lifetime, which makes it a serious disease. According to polls, 1 in 32 women will develop breast cancer and die from it. Breast cancer was predicted to be the second-largest category of new cancer cases in 2018, accounting for almost 1 in 4 cancers discovered in women. Breast cancer is the seventeenth most important cause of death worldwide and the biggest risk factor for cancer in women. The third most prevalent malignancy, colorectal cancer (CRC), is closely correlated with our way of life. When polypectomy screening tests are conducted, the death rate of CRC patients is markedly reduced. Experts are required to examine several polyp instances during the CRC screening test in order to find problematic tissues [1–3]. Cancer researchers will use artificial intelligence (AI) to detect the disease’s symptoms and choose the best course of therapy. It will expedite diagnosis and increase its effectiveness. The information gained may lead to better treatment options, which might lower the mortality of cancer patients.
The AI discipline of deep learning (DL) was originally developed in the 1990s [4, 5]. Numerous methods for recognising patterns and images have been presented, and more recently computer vision has advanced to make use of deep convolutional neural networks (CNN) [4]. Deep CNN models based on ImageNet weights [6], such as Xception [7] and VGG-16 [8], were introduced during this period. Fine-tuning parameters of the CNN models such as the dropout layer, learning rate, and validation frequency reduces overfitting and improves classification performance in both VGG-16 and DenseNet. We investigate the efficiency of differentiating tumour epithelium from stroma using transfer learning approaches.
The remainder of this paper is structured as follows. Section 2 covers related work on optimising deep CNN models. Section 3 presents our suggested methodology, in which the learning rate, dropout layer, and validation frequency are adjusted. Section 4 presents the experimental findings. Finally, Sect. 5 provides the conclusion.
2 Literature Review Using deep learning and image processing approaches, researchers from all over the world have been working diligently to establish reliable frameworks and processes for the rapid and accurate identification of cancer. However, none of those approaches has been successful in accurately forecasting cancer. As a result, histopathology WSI for cancer diagnosis has gained popularity recently. Since the middle of the 1990s, CNNs have been utilised in medical imaging to detect various disorders [9, 19]. Since then, CNN architectures have primarily been used for a variety of tasks in the field of medical image analysis. Despite their progress in identifying medical pictures, CNNs still face a number of challenges. The first and most significant challenge is the potential for overfitting the model, which results from overly deepening of tumour classification networks, which quickly increases the training parameters. The probability of overfitting must also be reduced by using many image samples, however this is not always feasible. Hyperparameters, which are essential to a CNN’s effective functioning, is the second challenge. The learning rate is a crucial hyperparameter that has the ability to make or destroy a model. It is necessary to manually alter the model’s learning rate in line with the progress of the training in order to ensure that the model operates at its peak throughout [10]. For CRC image analysis jobs, Linder et al. [11] recommended the CRC analysis utilising conventional data mining approaches. The automatic classification method was used to distinguish the tumour stroma from the tumour epithelium in digitised tumour tissue microarrays (TMA). The support vector machine (SVM) and local binary patterns (LBP)
Finetuned-VGG16 CNN Model for Tissue Classification of Colorectal …
were optimised. These conventional classification methods yielded excellent binary results. Kather et al. [12] proposed a method for CRC detection using histopathology WSI. A variety of statistical methods were applied, including spatial analysis of tumour blood vessels and kernel density estimation. Kather et al. also released a work on automatic texture classification of CRC images in 2016. Two distinct pattern recognition tasks were addressed: tumour/stroma separation and multiclass texture separation. Conventional data mining techniques were used, and sophisticated feature extraction techniques were coupled with classification strategies. Their results reached a classification accuracy of 87.4% for eight different tissue types and a two-class separation accuracy of 98.6%. They also made a set of 5000 high-resolution CRC texture histology images publicly available. Teramoto et al. [13] used a deep convolutional neural network (DCNN) to automatically identify malignant lung cells in microscopic images. Their data comprised 306 benign and 315 malignant image patches, and a modified VGG16 model was used for classification. The specificity and sensitivity of the classification were comparable to those of a cytopathologist. Bychkov et al. [14] used machine learning and deep learning methods to predict CRC risk scores at the patient level. They argued that survival prediction should be specific to the individual patient's situation. They directly evaluated digitised images of tissue samples stained with hematoxylin and eosin (H&E). Two DL architectures for feature extraction and classification, convolutional neural networks (CNN) and recurrent neural networks (RNN), were combined in the prognostic model. The convolutional VGG-16 network was used without any modifications to extract features from TMA spot image tiles.
Their method avoids the time-consuming task of building a classifier from scratch. Considering the discussion above, choosing a network design of suitable complexity that lessens the overfitting issue is a difficult challenge. Even though choosing the ideal network is important, we also need to find the proper activation function and perform batch normalisation, regularisation, and weight reduction. In this paper, we present a revised method for identifying CRC tissues using the pretrained VGG-16 and DenseNet models. The following sections describe the simple architectural design and examine how well transfer learning techniques separate tumour epithelium from stroma. To decrease overfitting and improve classification performance in both VGG-16 and DenseNet, the learning rate, dropout layer, and validation frequency of the models are adjusted. Additionally, we made a comparison with AlexNet and GoogLeNet. The input consisted of two publicly accessible datasets of CRC images, CRC-VAL-HE-7K (7k images) and NCT-CRC-HE-100K (100k images), each with nine different tissue types. In this investigation, only pixels and class labels were used for image analysis.
T. E. Anju and S. Vimala
3 Materials and Methods
3.1 Dataset
The NCT-CRC-HE-100K collection [15] contains 100,000 non-overlapping image patches from histology images of human CRC and healthy tissue. Each image is 224 × 224 pixels, and nine tissue types were selected from the database: normal mucosa, lymphocytes, adipose tissue, debris, mucus, background, muscle, stroma, and cancer epithelium. The tissue samples comprised both original tumour slides from CRC and tumour tissue from liver metastases of CRC [18]. Tables 1 and 2 give the composition of the NCT-CRC-HE-100K and CRC-VAL-HE-7K databases, respectively, and exhibit five sample images of each CRC tissue type.
3.2 Preprocessing of Dataset
Empty, non-tissue patches were removed from the dataset to avoid unnecessary computation on the non-tissue parts of the slide, and the image contrast was then adjusted using local contrast normalisation. The RGB colour space of the low-resolution image was converted to LAB before Otsu's threshold was applied. After thresholding, binary morphological operations were used to support accurate patch extraction in small tissue areas and at tissue boundaries.
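The thresholding step can be illustrated with a minimal NumPy sketch. It assumes that a single 8-bit channel (for example a rescaled lightness channel) has already been extracted from the low-resolution image; the function name `otsu_threshold` and the toy patch are illustrative assumptions and not the authors' Matlab code.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold of an 8-bit single-channel image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)               # weight of the class with values <= t
    w1 = 1.0 - w0                      # weight of the class with values > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    denom = w0 * w1
    # Between-class variance for every candidate t; empty classes score 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0,
                           (total_mean * w0 - cum_mean) ** 2 / denom,
                           0.0)
    return int(np.argmax(sigma_b))

# Toy patch (assumed values): bright background with a darker tissue block.
patch = np.full((8, 8), 230, dtype=np.uint8)
patch[2:6, 2:6] = 90
t = otsu_threshold(patch)
tissue_mask = patch <= t               # darker pixels are treated as tissue
print(t, int(tissue_mask.sum()))
```

Treating the darker side of the threshold as tissue is an assumption that suits bright slide backgrounds; the morphological clean-up described above would then be applied to `tissue_mask`.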
3.3 Finetuned VGG 16
Convolutional neural networks (CNNs) have substantially improved cancer classification in recent years and have proven to be an effective method in computer vision. VGG16 is a strong candidate for the transfer-learning approach. After carefully examining earlier procedures, we found that the majority of transfer learning systems in biomedical imaging employed VGG techniques to attain the best prediction accuracy. To reach the highest level of accuracy, the parameters of VGG16 [8] were tuned in the proposed work. The following fine-tuning (FT) steps were applied to VGG16.
• FT1: Input layer: the input layer consists of three channels of 256 × 256 pixels, normalised from RGB patch images.
• FT2: Fine-tuning: the primitive low-level spatial features already learned are retained, and the subsequent higher convolutional layers are trained on the CRC-VAL-HE-7K and NCT-CRC-HE-100K datasets.
Table 1 Database composition (NCT-CRC-HE-100K) [15]
Class                      Number of images
Adipose (ADI)              10,407
Background (BACK)          10,566
Debris (DEB)               10,512
Lymphocytes (LYM)          11,557
Mucus (MUC)                8,896
Muscle (MUS)               13,536
Normal mucosa (NORM)       8,763
Stroma (STR)               10,446
Tumor epithelium (TUM)     14,317
Sample 1 to Sample 5: five example image patches per class (images not shown)
Table 2 Database composition (CRC-VAL-HE-7K) [15]
Class                      Number of images
Adipose (ADI)              1,338
Background (BACK)          847
Debris (DEB)               339
Lymphocytes (LYM)          634
Mucus (MUC)                1,035
Muscle (MUS)               592
Normal mucosa (NORM)       741
Stroma (STR)               421
Tumor epithelium (TUM)     1,233
Sample 1 to Sample 5: five example image patches per class (images not shown)
• FT3: Batch normalisation: to lessen overfitting caused by the initial weights, a layer that normalises the activations of the combination layer of VGG16 is included before the output layer. The validation frequency can be set from the number of images designated for training as follows:
\[ \text{Validation Frequency} = \left\lfloor \frac{\text{Number of Images}}{\text{Batch Size}} \right\rfloor \qquad (1) \]
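As a worked instance of Eq. (1), the short sketch below computes the validation frequency for the CRC-VAL-HE-7K set. It assumes the 40% training share, the mini-batch size of 64, and the 15 epochs reported later in the paper; the function name is illustrative, and rounding down is an assumption about how the brackets in Eq. (1) are meant.

```python
def validation_frequency(num_images: int, batch_size: int) -> int:
    """Eq. (1): number of iterations between two validation passes."""
    return num_images // batch_size

total_images = 7180                       # CRC-VAL-HE-7K patches
train_images = total_images * 40 // 100   # assumed 40% training share
per_epoch = validation_frequency(train_images, 64)
total_iterations = per_epoch * 15         # 15 epochs
print(train_images, per_epoch, total_iterations)  # 2872 44 660
```

Under these assumptions the model is validated once per epoch, and the total of 660 iterations agrees with the value listed for CRC-VAL-HE-7K in Table 3.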
Fig. 1 Finetuned VGG-16 architecture
• FT4: Dropout layer: dropout is a regularisation strategy that randomly removes neurons from the network during forward or backward propagation, here with a probability of 0.4 for the CRC-VAL-HE-7K dataset and 0.8 for the NCT-CRC-HE-100K dataset.
• FT5: Fully connected layer 8: the size of FC8 was reduced to 1 × 1 × 9.
• FT6: Output layer: after the models were created, a new fully connected layer was added to perform the classification.
The finetuned VGG-16 architecture is shown in Fig. 1.
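The dropout step in FT4 can be sketched in NumPy as follows. This is the common "inverted dropout" formulation, chosen here as an assumption; the paper does not state which variant the Matlab layer uses, and the function name and toy activations are illustrative.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob (inverted dropout)."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    # Rescale the kept units so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 64))                 # toy activations
y = dropout(x, drop_prob=0.4, rng=rng)  # 0.4 as used for CRC-VAL-HE-7K
dropped_fraction = float((y == 0).mean())
print(round(dropped_fraction, 2))
```

At inference time (`training=False`) the activations pass through unchanged, which is why the rescaling is done during training.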
4 Results and Discussion
The hardware setup consists of an Intel i7-85650 processor with 16 GB RAM and a Radeon RX 550X graphics card. The programming environment is Matlab R2020a on a Windows 10 PC. In this work, a number of tests were carried out using several convolutional neural network (CNN) models, including AlexNet, GoogLeNet, VGGNet, and DenseNet. First, two separate datasets of histology images were imported. The first dataset (NCT-CRC-HE-100K) includes 100,000 histological images of colorectal cancer in nine tissue classes. The second dataset (CRC-VAL-HE-7K) has 7180 image patches. The image datastore reads images efficiently in batches while a deep CNN model is trained, and the images are labelled automatically based on folder names. The image dataset was then divided into three datastores, with 40% for training, 20% for validation, and 40% for testing, such that none of them would
Table 3 Parameter and values
Fields             NCT-CRC-HE-100K    CRC-VAL-HE-7K
Classes            9                  9
Dropout            0.8                0.4
Mini-batch size    64                 64
Epoch              15                 15
Iterations         9360               660
Learning rate      1e−05              1e−05
overlap. At this point, we used the training dataset to train four distinct CNN models: AlexNet [5], GoogLeNet [16], DenseNet [17], and the proposed finetuned VGG16. The architectures comprise convolutional layers, ReLU layers, max-pooling layers, and fully connected layers. The parameters and values are shown in Table 3. We classified the 100,000 histology images of the NCT-CRC-HE-100K dataset, in 224 × 224 pixel format, into nine distinct tissue classes. The accuracy for each class is displayed through the column and row summaries of the confusion matrix, as shown in Fig. 2. Additionally, we evaluated the classification performance on a distinct, independent collection of 7180 images from different patients (CRC-VAL-HE-7K) and show the confusion matrix in Fig. 3. Tables 4 and 5 display the full findings. Tables 4 and 5 and Figs. 2 and 3 show that the proposed finetuned VGG16 performed well, with a validation accuracy of 97.9% in 6797 min on the NCT-CRC-HE-100K database and 96.8% in 594 min on the CRC-VAL-HE-7K database. Although DenseNet performs well on the NCT-CRC-HE-100K database, it underperforms on the CRC-VAL-HE-7K database because of overfitting.
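The non-overlapping 40/20/40 division described above can be sketched as a shuffled index split. The function name, the fixed seed, and the purely random (non-stratified) shuffling are assumptions made for illustration; the paper does not say how the Matlab datastores were partitioned per class.

```python
import numpy as np

def split_indices(n_items, train=0.4, val=0.2, seed=0):
    """Shuffle item indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = int(round(n_items * train))
    n_val = int(round(n_items * val))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(7180)   # CRC-VAL-HE-7K size
print(len(train_idx), len(val_idx), len(test_idx))   # 2872 1436 2872
```

Because every index appears in exactly one of the three slices of a single permutation, no image can occur in more than one subset.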
5 Conclusion
The proposed research work investigates several deep learning algorithms for the identification of CRC tissue. This article suggests refined deep learning settings to increase classification accuracy. The finetuned layers and parameters were fixed by testing on CRC histological images, and the performance of the deep learning models AlexNet, GoogLeNet, DenseNet, and our proposed finetuned VGG16 was assessed in differentiating CRC tissues accurately. According to the findings, our approach outperformed the strategies mentioned in the literature and had a high success rate. Almost 98% of the histology images in the 100K dataset were classified correctly into the nine classes. The findings of the experiment show that artificial intelligence may be used to categorise colorectal cancer (CRC) histology images in a wide range of situations. It can also support clinicians' critical judgement and help them reach the best diagnoses.
Fig. 2 Confusion matrix for NCT-CRC-HE-100K dataset. a Densenet, b Googlenet, c Alexnet, d Proposed finetuned VGG 16
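The row and column summaries of such a confusion matrix can be computed as in the sketch below. It assumes that rows hold the true classes and columns the predicted classes, so that row summaries give per-class recall and column summaries give per-class precision; the function names and the three-class toy labels are illustrative and are not data from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes on rows, predicted classes on columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def row_col_summaries(cm):
    """Return per-class recall (row summary) and precision (column summary)."""
    diag = np.diag(cm).astype(float)
    row_tot = cm.sum(axis=1)
    col_tot = cm.sum(axis=0)
    recall = np.divide(diag, row_tot, out=np.zeros_like(diag), where=row_tot > 0)
    precision = np.divide(diag, col_tot, out=np.zeros_like(diag), where=col_tot > 0)
    return recall, precision

# Toy labels for three classes (assumed values).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
recall, precision = row_col_summaries(cm)
accuracy = np.trace(cm) / cm.sum()
print(cm, recall, precision, accuracy, sep="\n")
```

The overall accuracy is the trace of the matrix divided by the total number of samples.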
Fig. 3 Confusion matrix for CRC-VAL-HE-7K dataset. a Densenet, b Googlenet, c Alexnet, d Proposed Finetuned VGG 16
Table 4 Comparison of CNN models (NCT-CRC-HE-100K)
Deep CNN algorithm   Training acc. (%)   Validation acc. (%)   Training loss   Testing acc. (%)   Testing loss   Elapsed time (min)
Alexnet              95                  95.03                 0.42            94.5               0.39           1370
GoogLeNet            95                  94.08                 0.38            94.3               0.29           5123
Densenet             97                  97.6                  0.32            95.1               0.27           9940
Proposed VGG16       98                  97.9                  0.29            95.6               0.21           6797
Table 5 Comparison of CNN models (CRC-VAL-HE-7K)
Deep CNN algorithm   Training acc. (%)   Validation acc. (%)   Training loss   Testing acc. (%)   Testing loss   Elapsed time (min)
Alexnet              91                  92.03                 0.42            94.5               0.39           502
GoogLeNet            95                  94.08                 0.38            94.3               0.29           605
Densenet             94                  94.53                 0.33            94.5               0.25           707
Proposed VGG16       97                  96.8                  0.31            95.3               0.22           594
References
1. Lohsiriwat V, Chaisomboon N, Pattana-Arun J (2020) Current colorectal cancer in Thailand. Ann Coloproctol 36(2):78
2. Phisalprapa P, Supakankunti S, Chaiyakunapruk N (2019) Cost-effectiveness and budget impact analyses of colorectal cancer screenings in a low- and middle-income country: example from Thailand. J Med Econ 22(12):1351–1361
3. Deng S et al (2020) Deep learning in digital pathology image analysis: a survey. Front Med 14(4):470–487
4. LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. In: The handbook of brain theory and neural networks 3361(10)
5. Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
6. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. IEEE
7. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
8. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint. arXiv:1409.1556
9. Shin J, Tajbakhsh N, Todd Hurst R, Kendall CB, Liang J (2016) Automating carotid intima-media thickness video interpretation with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2526–2535
10. Loshchilov I, Hutter F (2016) SGDR: stochastic gradient descent with warm restarts. arXiv preprint. arXiv:1608.03983
11. Linder N, Konsti J, Turkki R, Rahtu E, Lundin M, Nordling S, Haglund C, Ahonen T, Pietikäinen M, Lundin J (2012) Identification of tumor epithelium and stroma in tissue microarrays using texture analysis. Diagn Pathol 7(1):1–11
12. Kather JN, Marx A, Reyes-Aldasoro CC, Schad LR, Gerrit Zöllner F, Weis C-A (2015) Continuous representation of tumor microvessel density and detection of angiogenic hotspots in histological whole-slide images. Oncotarget 6(22):19163
13. Teramoto A, Yamada A, Kiriyama Y, Tsukamoto T, Yan K, Zhang L, Imaizumi K, Saito K, Fujita H (2019) Automated classification of benign and malignant cells from lung cytological images using deep convolutional neural network. Inf Med Unlocked 16:100205
14. Bychkov D, Linder N, Turkki R, Nordling S, Kovanen PE, Verrill C, Walliander M, Lundin M, Haglund C, Lundin J (2018) Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci Rep 8(1):1–11
15. Kather JN, Halama N, Marx A. https://doi.org/10.5281/zenodo.1214456
16. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
17. Smith LN, Topin N (2019) Super-convergence: very fast training of neural networks using large learning rates. In: Artificial intelligence and machine learning for multi-domain operations applications, vol 11006, pp 369–386. SPIE
18. Anju TE, Vimala S (2022) Tissue and tumor epithelium classification using fine-tuned deep CNN models. Int J Adv Comput Sci Appl (IJACSA) 13(9). https://doi.org/10.14569/IJACSA.2022.0130936
19. Pandian AP (2019) Identification and classification of cancer cells using capsule network with pathological images. J Artif Intell 1(1):37–44
Effective Heart Disease Prediction and Classification Using Intelligent System
P. Mohana Priya and Kannan Balasubramian
Abstract Cardiovascular diseases are the most common cause of death, accounting for an aggregate of 33% of deaths according to the statistical report produced by the World Health Organization. Cardiovascular diseases cover a wide range of conditions such as abnormal heart rhythms, congenital heart disease, deep vein thrombosis, heart attack, and heart failure. These diseases are diagnosed using scanning devices, and doctors analyze the reports by observing heart X-ray images. This research focuses on predicting heart failure from a patient dataset that is available in the form of numeric and nominal values. It uses two approaches, traditional supervised machine learning algorithms and the IBM Auto Artificial Intelligence service, to analyze the patient dataset and predict heart failure, with the outcome generated as a class label of yes or no. The IBM Auto Artificial Intelligence service is found to predict heart failure with an accuracy of 87%.
Keywords Cardiovascular disease · International business machines auto artificial intelligence service · Supervised machine learning algorithms · Classification · Prediction · Artificial intelligence
1 Introduction
Cardiovascular Diseases (CVD) [1] are the predominant cause of death globally, accounting for 32% of deaths each year according to the statistical findings of the World Health Organization (WHO). CVD includes the 10 most common conditions: abnormal heart rhythms, aorta disease [2], congenital heart disease [3], coronary artery disease [4], deep vein thrombosis [5], pulmonary embolism [6], heart attack [7], heart failure [8], heart muscle disease [9], and rheumatic heart disease [10].
P. Mohana Priya (B) · K. Balasubramian
School of Computing, SASTRA Deemed University, Thanjavur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_8
Heart-related risks are rooted in human behavioral factors such as tobacco use, unhealthy diet, obesity, lack of exercise, physical inactivity, and use of alcohol. It is of utmost importance to predict CVD at an early stage. Healthcare centers and hospitals deploy expensive equipment to scan [11] patients for CVD-related risk. Predicting the health risk [12] and suggesting appropriate treatment based on the type of CVD is considered the most crucial process in all healthcare centers. Such health risks are usually predicted from the scanned reports of the patient by observing heart X-ray images [13]. This research instead focuses on predicting heart failure from patient attributes that include both numerical values and categorical labels [14]. It uses the IBM Auto AI service, provided through the Smart Internz platform from the IBM project build-a-thon, together with machine learning algorithms [15]. The service and techniques are applied to the collected patient dataset in order to predict heart failure. The objective of this research work is to predict heart failure using the IBM Auto AI service. The contributions of this research work are: heart failure prediction using the IBM Auto AI service in IBM cloud; creation of a user interface with the Node-Red flow editor service to test user inputs; deployment of the Auto AI model in the cloud; comparison of the classification accuracies of several supervised machine learning algorithms; and performance analysis of all algorithms with the evaluation metrics True Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), and False Negative Rate (FNR), along with specificity, sensitivity, and accuracy. Classification accuracy is tested using the kappa statistic, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), and Root Relative Square Error (RRSE). The article is organized as follows: Sect. 2 discusses the state-of-the-art approaches for heart failure prediction and classification, Sect. 3 discusses heart failure prediction using the IBM Auto AI service, Sect. 4 deals with the experimental setup, Sect. 5 details the results and their discussion for the IBM Auto AI service, and Sect. 6 presents the conclusion and future research directions.
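The evaluation metrics named above can be made concrete with a short sketch. It uses the standard textbook definitions, which are assumed here because the paper does not spell them out; the function names, the confusion counts, and the predicted probabilities are hypothetical values chosen only for illustration.

```python
import math

def rate_metrics(tp, fp, tn, fn):
    """TPR, FPR, TNR, FNR, accuracy, and Cohen's kappa from binary counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    return {
        "TPR": tp / (tp + fn),      # sensitivity
        "FPR": fp / (fp + tn),
        "TNR": tn / (tn + fp),      # specificity
        "FNR": fn / (fn + tp),
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

def error_metrics(actual, predicted):
    """MAE, RMSE, RAE, and RRSE relative to always predicting the mean."""
    n = len(actual)
    mean_a = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / base_abs,
        "RRSE": math.sqrt(sq_err / base_sq),
    }

rates = rate_metrics(tp=40, fp=10, tn=45, fn=5)               # hypothetical
errors = error_metrics([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])    # hypothetical
print(rates)
print(errors)
```

RAE and RRSE compare the classifier's error with that of a trivial predictor that always outputs the mean label, so values below 1 indicate an improvement over that baseline.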
2 Literature Survey
This section discusses state-of-the-art related work on CVD prediction and classification. Austin et al. [16] proposed a method using machine learning algorithms to predict Heart Failure (HF) with Preserved Ejection Fraction (PEF) and HF with Reduced Ejection Fraction (REF). This method uses a tree-based algorithm for predicting heart failure. Mamun Ali et al. [1] predicted heart disease using supervised machine learning algorithms such as K-Nearest Neighbor (KNN), Decision Tree (DT), and Random Forest (RF). Feature importance scores were estimated for all algorithms except KNN and Multi-Layer Perceptron (MLP). RF was reported to reach 100% specificity and sensitivity. Jan et al. [17] proposed an ensemble-based model that draws on multiple classifiers.
In that research, the authors used five classifiers, namely support vector machine [18], artificial neural network [19], Naïve Bayes [20], regression analysis [21], and random forest [22], to predict and diagnose heart failure. Senthil et al. [23] used a neural network on heart rate time series together with clinical records for the prediction of Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Atrial Fibrillation (AFIB), and Normal Sinus Rhythm (NSR). The Hybrid Random Forest with Linear Method (HRFLM) predicts with a higher accuracy of 88.7%. Lei et al. [24] used a Knowledge Graph (KG) to classify heart disease with a ResNet-based fusion method called RKRE, attaining 86.95% in disease classification. Jyoti et al. [25] proposed data mining techniques for heart disease prediction using supervised machine learning algorithms such as Naïve Bayes, KNN, and decision list. Here, decision tree and Naïve Bayes have similar prediction accuracy and outperform other predictive models such as KNN, neural network, and clustering-based methods. Aditya et al. [26] used the data mining classification techniques NB, DT, and NN to predict heart disease. In this work, the authors used medical profile attributes of a patient such as age, sex, blood pressure, and blood sugar to predict whether the patient has heart failure. Decision tree outperformed other algorithms such as genetic and apriori algorithms with a prediction accuracy of 99.62%. Le et al. [27] used Infinite Latent Feature Selection (ILFS) to select features from the heart disease attributes. The dataset was collected from the UCI machine learning repository for heart disease prediction, and the prediction accuracy was found to be 97.87% in distinguishing all attributes. Tarawneh et al. [28] proposed a hybrid approach to predict heart disease using different machine learning algorithms such as Logistic Regression, Adaptive Boosting, Multi-Objective Evolutionary Classifier (MOEFC), Fuzzy Unordered Rule Induction (FURIA), and Genetic Fuzzy System-LogitBoost (GFS-LB). The authors compared them using performance evaluation metrics such as accuracy, specificity, sensitivity, and error rate. Beulah et al. [29] investigated ensemble classification for improving the accuracy of weak algorithms by combining multiple classifiers with bagging, boosting, stacking, and majority voting. Feature selection techniques are useful for improving the accuracy of ensemble algorithms. Chandan et al. [30] used nine machine learning algorithms, namely Gradient Boost, Extreme Gradient Boost, AdaBoost, CatBoost, Logistic Regression, Decision Tree, Random Forest, Artificial Neural Network, and Support Vector Machine, with categorical and numerical values. Here, models trained on categorical values outperform models trained on numerical values. Hasanova et al. [31] proposed a patient data management system with blockchain technology to maintain the integrity and security of patient records. Here, the Sine Cosine-based KNN (SCA-KNN) algorithm is used to find the optimal prediction, whereas traditional KNN is used for prediction accuracy. Chang et al. [32] proposed AI-based heart disease detection in which a random forest classifier performs the data analysis, with a prediction accuracy of 83%.
3 Heart Failure Prediction Using IBM Auto AI Service
3.1 Creation of IBM Services
The proposed work begins with the creation of the IBM services, namely the IBM Watson studio service, the Node-Red service, a Cloud Object Storage (COS) instance, and the Machine Learning (ML) service.
3.2 Creation of Watson Studio Service
The IBM Watson studio service is created with the region set to Dallas and the plan set to Lite, which allows free experimentation in IBM cloud. In Watson studio, projects and a deployment space need to be created. The Watson studio service is accessible through the IBM cloud pak dashboard under software and services in the resources list.
3.3 Creation of IBM Node-Red Service
In the IBM cloud dashboard, a software module is accessed to create the Node-Red app by defining the plan as Lite edition, the platform as Node.js, and the location as Dallas. Once the Node-Red app is created, the Node-Red application needs to be deployed in the IBM cloud. A Cloudant database service is associated with the created Node-Red application. It needs to be deployed as an IBM Cloud Foundry application by creating a cloud API key “kKj_RA7-SH3hMqUA2ygdDJemU_wY4iSdMxPy3v5ETGK”, and the cloud space, hostname, and region are validated before the deployment proceeds. The deployment consists of building, testing, and deploying models in the delivery pipeline, successful integration of pipelines, and creation of the Node-Red application. By invoking the Visit App URL, the Node-Red flows for the Heart Failure Prediction Model can be accessed.
3.4 Building Machine Learning Model to Predict Heart Failure
In IBM Watson Studio, a new project titled Heart Failure Prediction Model (HFPM) is created by selecting a Cloud Object Storage (COS) service. In this HFPM project, an asset is selected, a new Auto AI experiment is created from Automated Builders, and a machine learning model is associated with it. A new ML service is created and associated with the HFPM project with a configuration of 8 vCPU and 32 GB RAM. On successful
creation of an Auto AI experiment, a data source needs to be imported into it. Here, the data source imported into this HFPM project is the patient data.
3.5 Run Auto AI Experiment
The patient data is fed into the Auto AI experiment as a .csv file with the attributes Average Heart Beats (per minute), Palpitations (per day), Cholesterol, Body Mass Index (BMI), Heart failure, Age, Sex, Family History, Smoker in last 5 years, and Exercise. Of the attributes used in the Auto AI experiment, family history, heart failure, and smoker in last 5 years are categorical values, and the remaining attributes, such as Average Heart Beats, Palpitations, Cholesterol, Body Mass Index, Age, and Sex, are numerical values. Heart failure is the target attribute, used to predict whether or not the patient has heart failure. The Auto AI experiment itself selects the top ten algorithms for heart failure prediction, such as the Snap random forest classifier and the Extended Gradient Boost (XGB) classifier. The algorithms chosen by the Auto AI experiment are built into pipelines, namely P1, P2, P3, P4, P5, P6, P7, and P8, and the leaderboard algorithm with the highest prediction accuracy is selected. Here, the prediction type is chosen as binary classification. In the experiment settings, the training–testing ratio is selected, and the performance metric to optimize is selected from Accuracy, Average Precision, F1 score, Log loss, Precision, Recall, and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC).
3.6 Building Pipelines
Generic pipeline models are created by generating caches for storing all attributes. The two basic pipelines created are the channel pipeline (ch-pipeline) and the pull request pipeline (pr-pipeline). The ch-pipeline acts as a channel interface for connecting all machine learning algorithms, whereas the pr-pipeline pulls the request of the experiment. The experiment summary consists of feature transformation modules such as read dataset, split holdout data, read training data, and preprocessing, followed by model selection (Snap Random Forest Classifier and XGB classifier), hyperparameter optimization, and feature engineering. Altogether, 8 pipelines were created: 4 for the Snap Random Forest Classifier and the remaining 4 for the XGB classifier.
3.7 Save Watson ML Model
The top leaderboard pipeline with the highest prediction accuracy needs to be saved as a model for the HFPM project. The saved Watson ML model is available in the HFPM project asset details.
3.8 Deploy the Model
The saved Watson ML model is deployed by creating a new deployment space named HFPM deploy and promoting the model to that space. On successful deployment of the HFPM project, the newly deployed model is available in the deployment space, where the ML service is associated and the model is deployed online. In the deployed HFPM model, the scoring endpoint is accessed and patient data is entered as input to test the prediction. The prediction output for the stored patient dataset is collected as a JSON file. The generated scoring endpoint is https://us-south.ml.cloud.ibm.com/ml/v4/deployments/508c41c0-d254-4c39861d48e5b29184f8/predictions?version=2022-07-25. Figure 1 shows the block diagram, which is composed of modules such as the IBM Watson studio service, IBM cloud object storage, the Auto AI service, the Watson machine learning service, and the user interface. The block diagram illustrates how the modules are interconnected to provide and associate services. IBM cloud object storage creates a database for storing various cloud instances configured for the specific
Fig. 1 Block diagram of intelligence systems
project, the IBM Watson studio service is used to create the user projects that run in the IBM cloud, and the IBM Auto AI service is used to create the user projects by invoking automated builders. Figure 2 shows the flow chart of project creation in IBM cloud pak. The project creation process begins with logging into the IBM cloud user account. Next comes the creation of the IBM Watson studio and IBM Node-Red services for project creation and user interface connection. As a follow-up, the Auto AI experiment needs to be saved and deployed as the machine learning model in the cloud. During model deployment, a scoring endpoint is generated through which the web application can reach the model. In this web application, user input is entered so that the prediction results can be shown in the User Interface (UI).
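A request body for the scoring endpoint can be sketched as below. The sketch assumes the `input_data`/`fields`/`values` layout used by Watson Machine Learning online deployments; the column names in `FIELDS` and the patient values are hypothetical stand-ins for the nine input attributes, and no request is actually sent.

```python
import json

# Hypothetical column names for the nine input attributes of the dataset.
FIELDS = [
    "AVGHEARTBEATSPERMIN", "PALPITATIONSPERDAY", "CHOLESTEROL", "BMI",
    "AGE", "SEX", "FAMILYHISTORY", "SMOKERLAST5YRS", "EXERCISEMINPERWEEK",
]

def build_scoring_payload(records):
    """Arrange patient records in the assumed fields/values scoring layout."""
    values = [[record[name] for name in FIELDS] for record in records]
    return {"input_data": [{"fields": FIELDS, "values": values}]}

# Illustrative patient record (made-up values, not taken from the dataset).
patient = {
    "AVGHEARTBEATSPERMIN": 93, "PALPITATIONSPERDAY": 22, "CHOLESTEROL": 163,
    "BMI": 25, "AGE": 49, "SEX": "F", "FAMILYHISTORY": "N",
    "SMOKERLAST5YRS": "N", "EXERCISEMINPERWEEK": 110,
}
payload = build_scoring_payload([patient])
body = json.dumps(payload)
print(body)
```

In a real call, `body` would be sent as an HTTP POST to the scoring endpoint with a bearer token obtained from the API key; the Node-Red flow performs the equivalent steps in its pre-token and http request nodes.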
Fig. 2 Flow chart of working procedure of intelligent system
4 Experimental Setup This section deals with both hardware and software requirements used in this research work. The hardware requirements include DELL Inspiron and Intel Core i3 10th GEN. The software requirements include IBM, Academic Initiative Account, IBM Cloud Account, IBM Watson Studio, Auto AI, IBM Web Server, Node-Red Editor, Node-Red Web Application, and User Interface. The experimental setup is carried out in IBM Cloud. In the IBM cloud, a service named Watson studio service is created and launched as an IBM cloud pak for data in order to access the Watson service. Once Watson service is invoked, the project named “Heart Failure Prediction Model” is created followed by creating the asset from Automated builders Auto AI experiment. It analyzes both tabular and JSON file data formats that generate candidate model pipelines customized for predictive modeling. A new Auto AI experiment is created by selecting the Watson machine learning service instance named “Machine Learning – rn” with environment definition as 8 vCPU and 32 GB RAM which consumes 20 capacity units per hour for training. In the imported patient dataset, the prediction accuracy is calculated for the target label “HEART FAILURE”. Resources include Cloud Foundry Application named “Node-RED LZTVT 2022– 07-25”, cloud foundry service named “node-red-lztvt-2022—cloudant-1658724”, services and softwares named “Machine Learning-rn with Dallas region”, Watson Studio – a7 with Dallas region” and Cloud Object Storage-nq. Adding a data source page supports two distinct file formats, namely table view and JSON file format. Adding a data source provides two distinct options; one for browsing the file from the local host and the other option is to select the project from the already created project. In the configuration details of time series forecasting, the option needs to be selected as “No”, hence the project is implemented at the time of creation. 
In this heart failure prediction model, the prediction type is binary classification with Y as the positive class, and the model is optimized for accuracy and run time. In the experiment settings, the candidate prediction algorithms can be selected and the train–test split ratio can be adjusted. The experiment is run once all parameters are set. A Node-RED service is then created, and the user can open the Node-RED application through the Visit App URL. The Node-RED service is created as a Node.js application, Node.js being the language chosen.
5 Results and Discussions
Figure 3 shows the pipeline leaderboard from the experiment summary. The summary lists the rank of each pipeline, its optimized accuracy under cross-validation, its enhancements, and its build time. The optimized accuracy is 0.869 for pipeline 1, 0.869 for pipeline 2, 0.873 for pipeline 3, 0.873 for pipeline 4, 0.861 for pipeline 5, 0.869 for pipeline 6, 0.869 for
Effective Heart Disease Prediction and Classification Using Intelligent …
Fig. 3 Pipeline leaderboard
pipeline 7, and 0.873 for pipeline 8. The build times of the 8 pipelines are 1 s, 13 s, 43 s, 1 min 30 s, 1 s, 9 s, 54 s, and 1 min 17 s, respectively. Figure 4 shows a table view of the test predictions, giving the binary classification result and the confidence level distribution, with sample scores of 92%, 88%, 82%, 81%, and 50%, and Fig. 5 shows the prediction percentage for normalized values of 50%, 60%, 70%, 80%, 90%, and 100%. Figure 5 also shows the Node-RED editor flows of the Heart Failure Prediction Model. Here the JSON file is imported into the Node-RED editor; the flow is built from timestamp, pre-token, UI form, msg.payload, http request, Pre Prediction, parsing, and function nodes, among others. In the pre-token node, the existing API key must be replaced with the API key created for the ML model. In the http request node, the scoring endpoint must be replaced with that of the newly created ML model. The node flows are then deployed and accessed in the dashboard to view the user form for entering patient details. Figure 6 shows the user interface (UI) with the first record of user input, containing all nine attributes along with the submit and cancel buttons, which resulted in 92.107% prediction accuracy and a patient status of “Not at Risk”, as given in Fig. 7.
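The leaderboard figures reported above can be compared with a few lines of Python. This is only an illustrative sketch: the accuracy and build-time values are transcribed from the text, while the tie-breaking rule (prefer the shorter build time among equally accurate pipelines) and the function name `rank_pipelines` are assumptions, not the ranking logic used by Auto AI.

```python
# Values transcribed from the leaderboard figures reported in the text.
accuracy = [0.869, 0.869, 0.873, 0.873, 0.861, 0.869, 0.869, 0.873]
build_time_s = [1, 13, 43, 90, 1, 9, 54, 77]  # 1 min 30 s = 90 s, 1 min 17 s = 77 s


def rank_pipelines(accuracy, build_time_s):
    """Return 1-based pipeline numbers sorted by accuracy (descending).

    Ties are broken by build time (ascending); this rule is an assumption.
    """
    pipelines = range(1, len(accuracy) + 1)
    return sorted(pipelines, key=lambda p: (-accuracy[p - 1], build_time_s[p - 1]))


ranking = rank_pipelines(accuracy, build_time_s)
best = ranking[0]
print(ranking)
print(f"Pipeline {best}: accuracy={accuracy[best - 1]}, build time={build_time_s[best - 1]} s")
```

Under this assumed rule, the three pipelines that reach 0.873 are separated only by build time, which is why a cheap-to-build pipeline comes first.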
Fig. 4 Test prediction table view
Fig. 5 Patient dataset JSON file imported in Node-RED
Fig. 6 User input 1
Fig. 7 Prediction output
6 Conclusion and Future Work
Cardiovascular diseases are among the most common causes of death, accounting for 33% of deaths globally every year. It is of utmost importance to predict and classify whether a patient is at risk of heart failure. In this research
work, the IBM Auto AI service is used to predict and classify whether the patient has cardiovascular disease and whether the patient’s health status is at risk. IBM Auto AI is found to predict with an accuracy of 87% using a snap random forest algorithm. Future research directions include a comparative analysis of supervised and unsupervised prediction and classification algorithms, and real-time deployment as a web application in health care centers such as private and government hospitals. Smart health care centers could also deploy this web application with a chatbot that collects patient details and assists patients with suggestions related to medications and appointments.
A Machine Learning Approach for Aeroponic Lettuce Crop Growth Monitoring System
R. Gowtham and R. Jebakumar
Abstract Traditional agricultural systems struggle to supply fresh and clean food to a fast-growing population, leaving food insufficiency and food safety unresolved in many countries as the global population expands and the climate changes. Vertical farming, which involves growing crops in controlled indoor environments, is a simple and efficient way to increase agricultural production. Within vertical farming, aeroponics is an emerging, resource-saving method for cultivating crops in a limited area and in minimal time, with precise control of light, humidity, nutrients, and temperature. An aeroponic system lets the roots dangle freely in the air, while the essential nutrients are supplied to the plants at intervals as a mist: an atomization nozzle creates a fine spray of droplets of different sizes. Aeroponic crop cultivation can thus be considered an efficient, promising, economical, and convenient soil-less plant-growing system among smart farming methods. This paper proposes an automated Lettuce Crop Growth Monitoring System (LCGMS) that automates the aeroponic system using IoT and machine learning. IoT sensors capture parameters such as pH, electrical conductivity, turbidity, temperature, and PPM, and a machine learning algorithm, support vector regression, processes the IoT data to monitor the growth environment and automate it without human intervention. The proposed model produces a prediction accuracy score of 82.07%, resulting in better yield prediction.
Keywords Aeroponics · Indoor environments · Lettuce · Machine learning · Support vector regression · Vertical farming
R. Gowtham (B) · R. Jebakumar Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, India e-mail: [email protected] R. Jebakumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_9
1 Introduction
Agriculture is the greatest consumer of water on a global scale, accounting for around 70% of overall demand [1, 2]. While a person drinks an estimated 2–5 L of water on average, the daily production of food for one person takes over 5,000 L of water [3, 4]. A circular economy is a model of production and consumption that comprises sharing, leasing, reusing, repairing, refurbishing, and recycling existing materials and goods for as long as feasible. This model aims to reduce waste and preserve natural resources, and it is intended to address pressing global concerns such as climate change, loss of biodiversity, waste, and pollution [5]. Advanced technologies have enhanced humans’ ability to react effectively to the latest difficulties and risks caused by restricted resources. Vertical farming techniques such as hydroponics and aeroponics are being considered as potential alternatives to conventional agricultural practices. Concerns about security, long-term viability, and policy are of the utmost importance to Egyptian interests when it comes to water and agriculture [6]. Neural network models have been used to govern the development of hydroponically grown plants [7]. In some methods, for instance the nutrient film technique, fresh nutrient solution is continually delivered to the crops to make up for lost nutrients and water, balancing the plants’ absorption of minerals and nutrients [8]. Lettuce is an annual crop that belongs to the family Asteraceae. It is most often produced as a leafy vegetable, but it may also be grown for its stem and seeds. A large selection of lettuce crops can be purchased, including arugula, Belgian endive, chrysanthemum greens, dandelion greens, frisee, Little Gem lettuce, and mache [9].
The Internet of Things has the potential to automate hydroponic and aeroponic systems [10]. The increased concentration of dissolved oxygen in the nutrient solution of an aeroponic system likely contributes to the significantly accelerated rate of lettuce growth [11]. Several traditional techniques for estimating crop yields have been developed in recent years, including process-oriented crop simulation models and statistical models of crop production [12]. More recently, machine learning approaches have been used in agricultural research for crop classification, monitoring of crop development, and yield prediction in particular regions [13]. The foundation has been laid for future data-driven AI and robotic sustainable agriculture, which may now begin to take shape [14, 15]. Machine learning focuses on enhancing the capacity of computers to carry out a sequence of activities independently. ML algorithms have various applications in vertical farming techniques such as hydroponic and aeroponic systems. Machine learning equips computers to perform a wide variety of challenging tasks, such as regression, diagnosis, planning, and learning from accumulated data [16]. One study addressed the estimation
of soybean production from a sequence of remotely sensed images, using convolutional neural networks. In addition, a deep neural network (DNN) was used to forecast maize production for the period 2008–2016, and the findings show that the DNN gave better results than shallow neural networks [17]. Because building an aeroponic system involves very high installation costs, it is essential to predict crop cultivation outcomes accurately with machine learning algorithms before such systems are developed. Here, the Support Vector Regression (SVR) algorithm is used because it is simple to implement and makes outliers easy to remove. Based on the performance of the SVR model, the proposed model is developed; it produces high prediction accuracy with a minimal error rate and is also capable of removing outliers. The primary goals of this research work are as follows: (1) to collect data using a variety of IoT sensors installed in the lettuce crop growth environment; (2) to process the collected data using SVR models with different kernel functions, based on the input parameters pH, EC, turbidity, temperature, and PPM, and to analyze the results of these models; and (3) to determine which model is most accurate in predicting crop yields.
2 Related Work
2.1 Methods for Estimating the Outcome of a Harvest
An accurate estimate of crop yield is crucial, but obtaining one is difficult and demanding because a wide variety of interconnected environmental variables are involved. Changes in the weather affect plant development at different stages, which results in substantial fluctuations in output within a single growing season. It is also difficult to gauge agricultural yields precisely because of factors such as the regional heterogeneity of soil properties and farmer decisions, for example how often irrigation, insect and fertilizer treatments, crop rotation, and land preparation are carried out. Figure 1 shows the yield estimation methods.
Crop Growth Models
The yield of a crop is constantly influenced by its surrounding environment, and the effects of these variables change during the development of the crop. The eventual yield may be predicted with mathematical models that capture the many interactions between plant physiological systems and the environment. Many mechanistic models, but not all, are tailored to a single crop [18, 19].
Fig. 1 Yield estimation methods
Data-Driven Model
The alternative, empirical method is more user-friendly and convenient than the crop growth model. It takes crop yield data from several years and identifies the factors that have the greatest impact on, or contribute most to, the changes in yield. Such methods are simple to implement, cost-effective, and do not require an understanding of the plant’s physiology [19]. Since every method has its advantages and disadvantages, a unified framework that can describe the non-linear relationship between climate, nutrient solution parameters, soil variables, biomass, and crop production is urgently required.
2.2 Crop Varieties that Can be Grown
A wide variety of crops can be cultivated in aeroponic vertical farming, including fruits, vegetables, herbs, and flowers. Some of them are listed in Fig. 2.
2.3 Machine Learning’s Emergence in Yield Prediction
Machine learning is an emerging area of computer science that shows great promise in a variety of fields of study. It is used extensively by those who work with raw data for prediction or trend discovery. Given the sheer volume and increasing velocity of agricultural data, machine learning methods may be of considerable assistance in making sense of it. ML methods can be classified into two major groups: supervised and unsupervised.
Fig. 2 Different crop varieties
(a) In supervised machine learning, a computer is trained on labeled training data so that it can make judgments about new or unseen data. Examples of supervised techniques include the artificial neural network, the Bayesian network, the decision tree, the support vector machine, the k-nearest neighbor method, and the hidden Markov model.
(b) In unsupervised learning, machines are trained to draw conclusions from the data on their own, without being given any labels or other guidance. Unsupervised methods include, for instance, self-organizing maps, partition-based clustering, hierarchical clustering, and k-means clustering.
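The difference between the two groups can be illustrated with one representative method from each. The sketch below is a minimal pure-Python illustration, not part of the reviewed studies: the toy temperature values, the labels "low" and "high", and the function names `knn_predict` and `kmeans_1d` are all hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Supervised: classify `query` by majority vote among the k nearest labeled points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: group unlabeled values around the given initial centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Keep the previous center when a cluster receives no points.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical toy data: (temperature in degrees F, yield class) pairs.
labeled = [(60, "low"), (62, "low"), (64, "low"), (69, "high"), (70, "high"), (72, "high")]
label = knn_predict(labeled, 71)

# The same values without labels; k-means must find the groups itself.
unlabeled = [60, 62, 64, 69, 70, 72]
centers, clusters = kmeans_1d(unlabeled, centers=[60, 72])
print(label, clusters)
```

The supervised method needs the labels to answer a query, whereas the unsupervised one recovers the two groups from the values alone.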
2.4 The Efficiency of Yield-Affecting Factors
A wide range of factors, including climate and nutrient solution composition, contribute to crop productivity. A crop’s yield may vary for numerous reasons, some of which are beyond a farmer’s control (climate, soil, and nutrient solutions), while others are within the farmer’s sphere of influence (farming procedures, fertilizer, irrigation frequency, etc.). The following is a summary of previous research on this topic.
A study demonstrated that higher average temperatures throughout the growing season had a detrimental effect on winter wheat yields, whereas growing-season precipitation (GSP) had a positive effect. The influence of each variable was measured using the Cobb–Douglas production function [20]. A linear regression model was used to evaluate the impact of varying climatic factors on rice production in India’s Raipur area. The influence of these factors on plant development was examined at several phases, including seedling, tillering, 50% flowering, and maturity; different factors and developmental stages showed different correlations, both positive and negative [21]. Another study examined how changes in land use practices affected surface temperature fluctuations and the subsequent impact on rice and wheat yields, although only three of Punjab’s geoclimatic areas were included. The satellite data were classified into four main land use/land cover (LULC) classes: water, vegetation, urban, and undeveloped land. Temperature was found to rise in places where land use changed from agricultural, plain soil, and woodland to urban. The region’s Normalized Difference Vegetation Index (NDVI) was positively correlated with rice and wheat production but strongly inversely correlated with land surface temperature (LST) [22]. Another study looked at how different climatic conditions and technical developments (such as increased pesticide usage and the introduction of high-yielding cultivars) affected crop yields in different parts of Haryana, India. Pre-harvest yields were predicted using principal component analysis (PCA) across agro-climatic zones, with different parts of the state selected to represent four distinct climate zones. In the majority of the regions, the yields predicted by the constructed models agreed with the DOA wheat yield estimates [23]. Mukherjee et al.
summarized the impact of varying climates on wheat production throughout the states of Northwest India. The research covered daily air temperature, the standardized precipitation evapotranspiration index, and groundwater variability. Wheat production decreased because of an increase in the number of days with temperatures exceeding 35 °C during the maturation period. Less rain fell during the wheat growing season (November–March), leading to a shortage of irrigation water. High temperatures, acute lack of water, and reduced irrigation therefore contributed to a general decrease in yield [24]. Priya et al. investigated how well the random forest method predicted rice harvests in the kharif and rabi seasons. The R tool was used for data analysis, and the results indicated that the random forest approach is promising for predicting large-scale agricultural yields [25]. In a separate study in India’s Tamil Nadu state, researchers devised a strategy for predicting agricultural yields using data mining with association rules, and the results showed that the model predicted yields accurately [26]. Various fuzzy models, based on different partitions of the universe of discourse, and their impact on wheat production prediction have also been investigated. Using the actual production as the universe of discourse and interval-based partitioning, that study proposed a technique for wheat crop forecasting. Both the mean square error and the average forecasting error rate were found to be negligible with the suggested approach, and the resulting wheat yield forecasts were shown to be accurate and time-saving [27].
A support vector regression model was used to predict wheat crop production. Nine base learner models and two ensemble models were evaluated. Of the nine base models, SVR had the highest learning efficiency, while the ensemble models, despite significant increases in cost and complexity, did not significantly improve accuracy. Model performance improved across the board as the volume of training data increased [28]. Using a time series modeling technique, Nath et al. investigated the performance of the Box–Jenkins Autoregressive Integrated Moving Average model, ARIMA (1, 1, 0), for forecasting wheat output in India. Predictions for 10 years ahead were produced using yield data from 1949–1950 to 2016–2017 (68 years). Since the model can be applied only to stationary data, the time series was differenced to make it stationary. For the next decade (2017–2018 to 2026–2027), the ARIMA (1, 1, 0) model proved to be an accurate predictor, leading the researchers to recommend an increase in output [29, 30].
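For readers unfamiliar with the notation, the ARIMA (1, 1, 0) model mentioned above can be written out in its standard textbook form. The symbols below are assumptions introduced for illustration and do not come from the cited study: $y_t$ is the yield in year $t$, $w_t$ is the differenced series, $c$ is a constant, $\phi$ is the autoregressive coefficient, and $\varepsilon_t$ is a white-noise error.

```latex
% First-order differencing (the "I" part, d = 1) makes the series stationary:
w_t = y_t - y_{t-1}

% A first-order autoregression (p = 1) with no moving-average term (q = 0)
% is then fitted to the differenced series:
w_t = c + \phi\, w_{t-1} + \varepsilon_t

% Substituting the first equation into the second gives the model
% in terms of the original yields:
y_t = c + (1 + \phi)\, y_{t-1} - \phi\, y_{t-2} + \varepsilon_t
```

The last line shows why only the two most recent yields are needed to produce a one-step-ahead forecast.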
3 Materials and Methods
Lettuce Crop Growth Monitoring System (LCGMS)
The Lettuce Crop Growth Monitoring System (LCGMS) is proposed to automate the growth environment of lettuce cultivated in an aeroponic system. The implementation of the LCGMS model, from the collection of the dataset to the final yield prediction of the lettuce crop, is shown in Fig. 9 and explained in the following sub-sections.
3.1 Input Parameters
A machine learning model, and the SVR model in particular, needs a sufficient number of input parameters for training, validation, and testing in order to produce better accuracy. One of the most important input parameters for the growth of lettuce in aeroponic vertical farming is the nutrient solution. Parameters such as light, humidity, temperature, moisture, and CO2 are also considered as attributes alongside the nutrient solution.
Nutrient Solution
The nutrient solution is an important factor to consider. In outdoor farming, farmers need not supply the required nutrients to the plants manually, since the plants are able to absorb the essential nutrients directly from the soil. In vertical farming, in order to achieve a good yield, the plants should
be monitored at each stage. For the lettuce crop, the nutrients required for good growth are categorized into micronutrients and macronutrients, as tabulated in Tables 1 and 2.
pH of the Nutrient Solution
The pH value of the nutrient solution measures how acidic or basic the solution supplied to the vertical farming tower is. Nutrient solution management in the aeroponic tower focuses on keeping the pH within its optimum range, because a solution that is too acidic or too basic has a strong impact on the growing plants. For lettuce growth, the optimal target value is 5.8. The distribution of the pH feature is shown in Fig. 3.
Total Salt Concentration or Electrical Conductivity
The preferred EC value of the nutrient solution for lettuce growth is 1.3 dS/m. In an aeroponic vertical farming system, the essential nutrients are supplied to the plants through the nutrient solution. When the nutrient salts are mixed with water, they dissolve and the mineral salts break into positive and negative ions, which are responsible for the conduction of electricity in the solution. Figure 4 shows the graph of EC values for lettuce crop growth; its x-axis represents the distribution of EC, and the observed values in the dataset range between 0 and 4. For growing lettuce in an indoor environment, the pH and EC of the nutrient solution should be monitored and maintained within the required conditions.
Table 1 Micronutrients
Nutrient name             Chemical formula
Manganese sulfate         MnSO4·4H2O
Copper sulfate            CuSO4·5H2O
Zinc sulfate              ZnSO4·7H2O
Boric acid                H3BO3
Ammonium molybdate        (NH4)6Mo7O24·4H2O

Table 2 Macronutrients
Nutrient name             Chemical formula
Calcium nitrate           Ca(NO3)2·4H2O
Potassium nitrate         KNO3
Iron chelate              Fe-EDTA
Monopotassium phosphate   KH2PO4
Magnesium sulfate         MgSO4·4H2O
Fig. 3 Distribution of pH in the dataset
Fig. 4 Distribution of EC in the dataset
Parts Per Million (PPM)
The PPM of the nutrient solution in the aeroponic system is a measure of its strength: it gives the milligrams of nutrients found in every 1 L of water. In crop cultivation, PPM refers to the concentration of soluble minerals in the water. The desired value for lettuce growth is 100 N–150 N. Figure 5 shows the graph of PPM ranges for lettuce crop growth.
Temperature
Temperature must also be considered for good crop growth in indoor farming. The preferred temperature for growing lettuce, a green leafy vegetable, lies between 65 °F and 75 °F, with a desired value of around 70 °F. Figure 6 shows the graph of temperatures for lettuce crop growth.
Turbidity
Turbidity is the cloudiness or haziness of a fluid caused by large numbers of individual particles that are invisible to the naked eye, similar to
Fig. 5 Distribution of PPM in the dataset
Fig. 6 Distribution of temperature in the dataset
smoke in the air. Turbidity is a key measurement of water quality. The generic range of turbidity for crop growth is 100–500 nm; its distribution is shown in Fig. 7.
Fig. 7 Distribution of turbidity
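The target values listed in this sub-section lend themselves to a simple range check on incoming sensor readings. The following is a minimal sketch under stated assumptions: the PPM, temperature, and turbidity ranges are taken from the text, but the tolerance bands around the pH and EC targets, the dictionary keys, and the function names are hypothetical, since the text gives only single target values for pH and EC.

```python
# Target ranges per parameter. The pH and EC bounds are ASSUMED tolerance bands
# around the stated targets (pH 5.8, EC 1.3 dS/m); the others come from the text.
TARGETS = {
    "ph": (5.5, 6.1),            # assumed band around the 5.8 target
    "ec_ds_per_m": (1.1, 1.5),   # assumed band around the 1.3 dS/m target
    "ppm": (100, 150),
    "temperature_f": (65, 75),
    "turbidity": (100, 500),
}


def out_of_range(reading, targets=TARGETS):
    """Return the parameters of `reading` that fall outside their target range."""
    flagged = {}
    for name, (low, high) in targets.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            flagged[name] = value
    return flagged


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32) * 5 / 9


# Hypothetical sensor reading used only to exercise the check.
sample = {"ph": 6.4, "ec_ds_per_m": 1.3, "ppm": 120, "temperature_f": 78, "turbidity": 250}
alerts = out_of_range(sample)
print(alerts)
```

In a deployed system, the flagged parameters would be the natural trigger for corrective action, such as adjusting the nutrient solution.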
3.2 Correlogram of the Input Variables
The correlogram is a graphical representation of the correlations between the input variables used by the machine learning model. Each diagonal entry takes the maximum value of 1, since every variable is perfectly correlated with itself. The correlogram of our lettuce crop input dataset is shown in Fig. 8.
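The matrix behind a correlogram can be computed directly from the sensor columns. The sketch below assumes the Pearson correlation coefficient, which the text does not name explicitly; the readings and the function names `pearson` and `correlation_matrix` are hypothetical and serve only to show the structure of the result.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical readings, not taken from the paper's dataset.
data = {
    "pH": [5.6, 5.8, 6.0, 5.9, 5.7],
    "EC": [1.1, 1.3, 1.5, 1.4, 1.2],
    "temperature": [72, 70, 69, 71, 70],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The matrix is symmetric, so a correlogram usually needs to display only one triangle plus the diagonal.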
4 Proposed Model
Lettuce Crop Growth Monitoring System-Support Vector Regression Model (LCGMS-SVR)
The LCGMS-SVR model is used to predict the yield of lettuce grown in indoor aeroponic vertical farming. SVR is a supervised machine learning algorithm used for regression. In our experiment, the indoor growth environment factors pH, temperature, EC, PPM, light, and turbidity were obtained from the respective sensors. Support vector methods can handle both classification and regression on linear and non-linear datasets, with SVR being the regression form. The
Fig. 8 Correlogram of the input dataset
Fig. 9 Architecture diagram of yield prediction system
proposed machine learning model is an improved SVR methodology and consists of several modules that together perform lettuce yield prediction. The detailed architecture of the LCGMS-SVR model is shown in Fig. 9.
4.1 Data Pre-processing Data pre-processing is one of the most important steps in machine learning. To eliminate the influence of duplicate, abnormal, and missing data in the original data, the time series data must be pre-processed before it is fed into the machine learning model. Removing the outliers: Although many pre-processing techniques are available, the method adopted in this research work was outlier removal. Outliers in the dataset parameters, namely pH, turbidity, EC, temperature, and PPM, were identified and removed.
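The text does not say which rule was used to identify outliers. As an illustration only, the sketch below assumes the common interquartile range (IQR) rule with a 1.5 multiplier; the function name and the sample pH readings are hypothetical.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and k=1.5 are assumptions of this sketch, not the paper's stated method.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up pH readings with one obviously abnormal value.
ph_readings = [5.9, 6.0, 6.1, 6.0, 6.2, 5.8, 6.1, 9.7]
cleaned = remove_outliers_iqr(ph_readings)
print(cleaned)
```

In a multi-parameter dataset, a rule like this would normally be applied per column, dropping any row in which at least one parameter is flagged.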
4.2 Splitting of the Dataset The dataset consists of growth parameters such as pH, light, EC, temperature, PPM, and turbidity from the growth environment of the lettuce crop. The datasets
were collected from the respective sensors deployed in the IoT system. The entire dataset is partitioned into two parts, the training dataset and the test dataset. Our model assigns 80% of the dataset to training and the remaining 20% to testing. Model training also depends on the train–test split ratio, since more training data can yield better accuracy.
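An 80/20 partition of this kind can be sketched in a few lines. The helper below is a hypothetical illustration: the text does not say whether the rows were shuffled, so the shuffle and the fixed seed are assumptions. For time series data, a chronological split may be preferable.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Fifty placeholder rows standing in for sensor records.
rows = [{"sample_id": i} for i in range(50)]
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```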
4.3 Training Model The next step of the implementation process is training the model with the chosen machine learning algorithm. Before a specific algorithm is chosen, the data is processed by trial and error, and the model is finalized for implementation based on the algorithm's accuracy and error metrics.
Parameter Tuning in Kernel Functions The data can be processed using a number of SVR kernel functions. The SVR algorithm uses a group of mathematical functions known as kernels. A kernel transforms the input data into the form required for processing. The user can choose among different kernel functions at implementation time. The kernel functions used in our research work cover the linear and non-linear cases: linear, polynomial, radial basis function (RBF), and sigmoid.
Linear Kernel Function: The linear kernel is the most basic and most commonly used kernel function. It is mostly preferred when the dataset has many features. It is widely used in text classification, since many such classification problems are linearly separable.
Polynomial Kernel Function: The polynomial kernel is a generalized form of the linear kernel. It is less commonly used since it tends not to produce efficient and accurate results.
Radial Basis Function (RBF): After the linear kernel, the RBF is the most preferred kernel function at implementation time. It is heavily used for classifying and properly separating non-linear data. The important factor in the RBF function is the setting of the gamma value, which varies from 1 to n based on the user's needs.
Sigmoid Kernel Function: The sigmoid kernel corresponds to the activation function of neurons and is mostly used in neural networks.
The process of tuning the input parameters based on the user requirements in order to increase the performance accuracy of the model is called parameter tuning. In our
implementation, we used parameter tuning to compare the accuracies of the developed model. Once the model has been trained with the training dataset, it is tested with the test dataset. Then, based on the test score, the finalized machine learning model is used to predict the crop yield.
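The four kernels discussed in this section have standard textbook forms, sketched below in plain Python. The parameter names (gamma, degree, coef0) follow common convention and are assumptions of this sketch; the text does not list the exact parameterisation used in the experiments.

```python
import math


def dot(x, y):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # With degree=1, gamma=1 and coef0=0 this reduces to the linear kernel.
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


# Two made-up feature vectors.
a = [0.5, 1.0]
b = [1.0, 0.0]
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(a, b, gamma=gamma))
```

The loop illustrates why the gamma setting matters for the RBF kernel: as gamma grows, the similarity between two distinct points falls towards zero, so each training point influences a smaller neighbourhood.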
5 Results and Discussions The lettuce crop growth monitoring system was implemented in Python using Anaconda Navigator, a collection of applications, packages, and environments. The Python code was run on the Jupyter Notebook platform. The implementation is based on the dataset collected in real time from the aeroponic lettuce crop growth tower. The prediction parameters were pH, EC, temperature, PPM, turbidity, and light. The dataset was partitioned so as to support better prediction accuracy of the machine learning model.
Evaluation of the SVR Machine Learning Model The evaluation metric used here is the prediction accuracy score.
Linear Kernel Function and Polynomial Kernel Function The linear kernel produces the highest accuracy, about 82.07%, and the polynomial kernel function produces 29.65% for degree 8.
Radial Basis Function (RBF) Since the accuracy score depends on the gamma value, the RBF function produces different accuracy scores for different gamma values in our experiment, as listed in Table 3 and depicted in Fig. 10.
Sigmoid Kernel Function The accuracy score of the proposed model using the sigmoid kernel function is 28.48%.
Table 3 Accuracy scores of the RBF kernel based on the gamma values
Gamma values | Accuracy score (%)
0 | 29.65
1 | 52.61
2 | 48.54
3 | 45.34
20, 30, 40,…90 | 40.11
100, 200,…1000, 2000 | 39.53
Fig. 10 Accuracy analysis of RBF kernels
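The text does not define how the accuracy score of the regressor is computed. One common score for regression models, assumed here purely for illustration, is the coefficient of determination R², which can be reported as a percentage. The yield values below are made up.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up yields for illustration only.
actual_yield = [120.0, 135.0, 150.0, 160.0]
predicted_yield = [118.0, 140.0, 147.0, 158.0]
score = r2_score(actual_yield, predicted_yield)
print(f"{100 * score:.2f}%")
```

Under this assumption, a score of 100% means the predictions match the observations exactly, and a score of 0% means the model does no better than always predicting the mean.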
5.1 Comparative Analysis The proposed LCGMS-SVR model produces different accuracies for the different kernel functions. Table 4 and Fig. 11 compare the accuracy scores across the SVR kernels, with the differences shown as a bar graph. From Table 4 and Fig. 11, it is inferred that the proposed model produces the best result of 82.07%, the score obtained with the linear kernel, compared with the other kernel functions. The minimum accuracy score of 28.48% is produced by the sigmoid kernel function.
Table 4 Accuracy score of the proposed model
SVR kernels | Accuracy in percentage
Sigmoid | 28.48
RBF with Gamma value = 0 | 29.65
RBF with Gamma value = 1 | 52.61
Polynomial | 29.65
Proposed model | 82.07
Fig. 11 Prediction accuracy of the LCGMS model
5.2 Execution Time of the Model The proposed model took 0 h 2 min and 1.81 s to run. Compared with the other algorithms, the proposed model produces its prediction output in less time, i.e., within minutes.
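Run time of this kind can be measured with a wall-clock timer around the training and prediction call. The sketch below is an assumption about how such a figure could be obtained, not a description of the authors' measurement; the helper names are hypothetical and a trivial workload stands in for the model.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def format_elapsed(seconds):
    """Format a duration in the 'h min s' style used in the text."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{int(hours)} h {int(minutes)} min {secs:.2f} s"


# Placeholder workload standing in for model training and prediction.
result, elapsed = timed(sum, range(1_000_000))
print(format_elapsed(elapsed))
```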
6 Conclusion Estimating crop yields is crucial for many reasons, including helping farmers make informed decisions about which crops to plant and how much profit they can expect from them, and informing government policy decisions on imports and exports. A great deal of research has been done on various crops and estimation methods. Green leafy vegetables like lettuce, which can be cultivated more quickly with the aeroponic technique than with traditional or indoor methods, were selected for this experiment. With the advent and widespread use of machine learning algorithms, older methods are rapidly becoming obsolete. Machine learning algorithms like support vector regression have the ability to automatically collect sensor information and recognize patterns in both structured and unstructured data. Because of these features, they are well suited to research projects that need to extrapolate results into the future. In addition, they can provide a large quantity of data about many aspects of nature, which is especially useful in the agricultural sector, where data on climate, soil, and fertilizer solution management vary across space and time. Depending on the scope of the agricultural cultivation, ML
models might be a quick tool for estimating crop production and evaluating disasters across a large region. In our proposed research work, the LCGMS regression model uses prediction accuracy as the evaluation parameter for measuring the effectiveness of the algorithm. Finally, the proposed regression model can be adopted for predicting lettuce crop yield with a better accuracy score.
7 Future Scope In future work, the proposed model will be improved by including other lettuce growth parameters and by conducting more numerical experiments to produce more accurate prediction results. Depending on the application scenario, various evaluation indexes and data features can be selected in subsequent research, and different transformations can be applied. This will help farmerpreneurs increase the productivity of the lettuce crop within the stipulated time period.
References 1. Mokhtar A, El-Ssawy W, He H, Al-Anasari N, Sammen SS, Gyasi-Agyei Y, Abuarab M (2022) Using machine learning models to predict hydroponically grown lettuce yield. Front Plant Sci 13 2. Majid M, Khan JN, Shah QMA, Masoodi KZ, Afroza B, Parvaze S (2021) Evaluation of hydroponic systems for the cultivation of Lettuce (Lactuca sativa L., var. Longifolia) and comparison with protected soil-based cultivation. Agric Water Manag 245:106572 3. Kloas W, Groß R, Baganz D, Graupner J, Monsees H, Schmidt U, Rennert B (2015) A new concept for aquaponic systems to improve sustainability, increase productivity, and reduce environmental impacts. Aquac Environ Interact 7(2):179–192 4. Manju M, Karthik V, Hariharan S, Sreekar B (2017) Real time monitoring of the environmental parameters of an aquaponic system based on Internet of Things. In: 2017 third ınternational conference on science technology engineering & management (ICONSTEM), pp 943–948. IEEE 5. Wei Y, Li W, An D, Li D, Jiao Y, Wei Q (2019) Equipment and intelligent control system in aquaponics: a review. IEEE Access 7:169306–169326 6. Bakeer GAR, Hegab K, El-Behairy U, El-sawy W (2015) Effect mıcro ırrıgatıon systems, ırrıgatıon perıod and seed thıckness on barley sprout productıon. Misr J Agric Eng 32(2):589– 610 7. Mehra M, Saxena S, Sankaranarayanan S, Tom RJ, Veeramanikandan M (2018) IoT based hydroponics system using Deep Neural Networks. Comput Electron Agric 155:473–486 8. Neocleous D, Savvas D (2019) The effects of phosphorus supply limitation on photosynthesis, biomass production, nutritional quality, and mineral nutrition in lettuce grown in a recirculating nutrient solution. Sci Hortic 252:379–387 9. https://www.facebook.com/thespruceeats (2016) 16 lettuce choices to make your favorite salad even better. The Spruce Eats. https://www.thespruceeats.com/varieties-of-lettuce-4065606 10. 
Johnson DM (2014) An assessment of pre-and within-season remotely sensed variables for forecasting corn and soybean yields in the United States. Remote Sens Environ 141:116–128
116
R. Gowtham and R. Jebakumar
11. Puccinelli M, Landi M, Maggini R, Pardossi A, Incrocci L (2021) Iodine biofortification of sweet basil and lettuce grown in two hydroponic systems. Sci Hortic 276:109783 12. Araújo EM, de Lima MD, Barbosa R, Alleoni LRF (2019) Using machine learning and multielement analysis to evaluate the authenticity of organic and conventional vegetables. Food Anal Methods 12(11):2542–2554 13. Sadeghipour O, Aghaei P (2013) Improving the growth of cowpea (Vigna unguiculata L. Walp.) by magnetized water. J Biodivers Environ Sci 3(1):37–43 14. Saiz-Rubio V, Rovira-Más F (2020) From smart farming towards agriculture 5.0: a review on crop data management. Agronomy 10(2):207 15. Mei X, Pan E, Ma Y, Dai X, Huang J, Fan F, Ma J (2019) Spectral-spatial attention networks for hyperspectral image classification. Remote Sens 11(8):963 16. Kang Z, Catal C, Tekinerdogan B (2020) Machine learning applications in production lines: a systematic literature review. Comput Ind Eng 149:106773 17. You J, Li X, Low M, Lobell D, Ermon S (2017) Deep gaussian process for crop yield prediction based on remote sensing data. In: Thirty-first AAAI conference on artificial intelligence 18. Basso B, Cammarano D, Carfagna E (2013). Review of crop yield forecasting methods and early warning systems. In: Proceedings of the first meeting of the scientific advisory committee of the global strategy to ımprove agricultural and rural statistics. FAO Headquarters, Rome, pp 15–31 19. Chen Y, McVicar TR, Donohue RJ, Garg N, Waldner F, Ota N, Li L, Lawes R (2020) To blend or not to blend? A framework for nationwide landsat–MODIS data selection for crop yield prediction. Remote Sens 12(10):1653. https://doi.org/10.3390/rs12101653(Chenetal.,2020) 20. Geng X, Wang F, Ren W, Hao Z (2019) Climate change impacts on winter wheat yield in Northern China. Adv Meteorol 2019:1–12. https://doi.org/10.1155/2019/2767018(Gen getal.,2019) 21. 
Jain A, Chaudhary JL, Beck MB, Kumar LR (2019) Developing regression model to forecast the rice yield at Raipur condition. J Pharmacogn Phytochem 8:72–76 22. Atin Majumder PK, Kingra RS, Singh SP, Pateriya B (2020) Influence of land use/land cover changes on surface temperature and its effect on crop yield in different agro-climatic regions of Indian Punjab. Geocarto Int 35(6):663–686. https://doi.org/10.1080/10106049.2018.1520927 23. Jeev S, Verma P, Verma U (2018) Development of weather based wheat yield forecast models in Haryana. Int J Curr Microbiol App Sci 7(12):2973–2978. https://doi.org/10.20546/ijcmas. 2018.712.340 24. Priya P, Muthaiah U, Balamurugan M (n.d.) Predıctıng yıeld of the crop usıng machıne learnıng algorıthM. Int J Eng Sci Res Technol 7(4):1–7 25. Manjula E, Djodiltachoumy S (2017). A model for prediction of crop yield. Int J Comput Intell Inform 6(4). https://www.periyaruniversity.ac.in/ijcii/issue/Vol6No4Mar2017/M5_PID 0370.pdf (Manjula & Djodiltachoumy, 2017) 26. Garg B, Aggarwal S, Sokhal J (2018) Crop yield forecasting using fuzzy logic and regression model. Comput Electr Eng 67:383–403. https://doi.org/10.1016/j.compeleceng.2017.11.015 (Garg et al., 2018) 27. Mukherjee A, Wang S-Y, Promchote P (2019) Examination of the climate factors that reduced wheat yield in Northwest India during the 2000s. Water 11(2):343. https://doi.org/10.3390/ w11020343 (Mukherjee et al., 2019) 28. (PDF) Estimating wheat yields in Australia using climate records, satellite image time series and machine learning methods (2020). ResearchGate. https://doi.org/10.1016/j.isprsjprs.2019. 11.008 (“(PDF) Estimating wheat yields in Australia using climate records, satellite ımage time series and machine learning methods,” 2020) 29. Nath B, Bhattacharya D, Correspondence D, Bhattacharya, Dhakre D (2019) Forecasting wheat production in India: an ARIMA modelling approach. ~ 2158. J Pharmacogn Phytochem 8(1):2158–2165. 
https://www.phytojournal.com/archives/2019/vol8issue1/PartAJ/ 7-6-238-422.pdf 30. Wang SKAL, Khaki S (2019) Crop yield prediction using deep neural networks. In: Industrial and manufacturing systems engineering. Iowa State University
A Novel Approach for Privacy Preserving Technique in IoT Fog and Cloud Environment
Ravula Arun Kumar, Gillala Rekha, and Kambalapally Vinuthna
Abstract One problem with the Internet of Things (IoT) is that user data and identities could be used in ways other than those intended. Researchers have proposed different ways to lower privacy risks, but most existing solutions still have problems: they rely on heavy cryptosystems and policies applied both on sensor devices and in the cloud. To address these privacy problems, fog computing has been added at the edges of IoT networks to help with low latency, computation, and storage. This research uses deep learning models and hashing techniques to provide user validation, data confidentiality, data verifiability, and data integrity, together with data obfuscation carried out over the MQTT protocol, which further improves confidentiality by making the data obscure. In future work, the approach will be applied to AI models that can optimise the setup time for learning and training classification models.
Keywords Deep learning · Secure hash algorithm · Cypher attribute-based set encryption · Cloud · Lightweight message authentication code · Obfuscation
R. A. Kumar (B) Research Scholar, Department of CSE, Koneru Lakshmaiah Education Foundation, Green Fields Vaddeswaram, Guntur, Andhra Pradesh 522502, India e-mail: [email protected]
G. Rekha Associate Professor, Department of CSE, Koneru Lakshmaiah Education Foundation, Hyderabad, Telangana 500075, India e-mail: [email protected]
K. Vinuthna Associate Professor, Department of CSE, Neil Gogte Institute of Technology, Hyderabad, Telangana 500039, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. S. Raj et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 665, https://doi.org/10.1007/978-981-99-1726-6_10
1 Introduction Today, cloud computing is seen as an important way to use computers. Techniques like distributed computing and virtualisation can give users elastic computing resources. “Fog computing” refers to a type of computing system that does not host and run everything from a central cloud but instead runs at the edges of a network [1, 2]. Fogging is a distributed computing infrastructure in which some application services run on smart devices in the network while other, larger applications run in the cloud. In other words, some applications and resources run at the edge of the network, close to the user, instead of all applications running in the cloud. Fog computing makes it easier for end devices such as IoT devices and for cloud computing data centres to run computation, storage, and other services. But fog will never be able to replace the cloud. Instead, it adds security to the cloud environment [3, 4]. IoT devices are usually small real-world things with limited storage and processing power. This causes problems with reliability, processing performance, security, and privacy. The cloud, on the other hand, has almost unlimited storage and processing power and offers security and privacy [5]. Cloud computing provides storage and computing power at virtually unlimited scale through a shared resource pool that can be easily accessed by any IoT or fog application. The fog model is useful in advertising, computing, entertainment, and other fields. It works well for data analytics and for collecting data from many different points. Set-top boxes and access points make it easy to deliver end services. It improves QoS and shortens response time. Fogging's main goal is to place information close to the end user at the network's edge. Fogging reduces the amount of data that moves across the network, which cuts down on data traffic, cost, and latency. It also removes the overhead of centralised computing systems and keeps data closer to the end user, which makes it safer.
When it comes to security and privacy, IoT and the cloud use a number of different methods. Homomorphic encryption, which has been around for a long time, is used in IoT. One method combines fully additive homomorphic encryption with additive threshold secret sharing so that many sensed values can be aggregated safely and efficiently without revealing any individual value [8]. The Fine-Grained Privacy-Preserving Query scheme is another existing method. It protects the privacy of both the location-based service provider and the mobile user. The scheme greatly reduces the cost of computation and communication and keeps latency low [9]. In IoT, multi-cloud-based outsourced Attribute-Based Encryption (ABE) is used to solve a variety of problems that arise in multi-cloud broadcast. By making multiple clouds work together, these schemes make it much easier for resource-constrained IoT devices to do their work [10, 11]. Cloud Strike, a study of risk-driven fault injection, remains exposed to sinkhole attacks [12]. PROSAS helped in the auditing and verification of data, but the response time was delayed for the initial setup [13]. This research proposes a way to transmit data between fog nodes and IoT devices that protects privacy and provides safe access. Privacy-preserving techniques are used to cut the cost of computing, ease the load on already constrained IoT devices, and provide high security.
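The idea of aggregating sensed values without revealing any individual value can be illustrated with plain additive secret sharing. This is a minimal sketch under stated assumptions: it shows simple n-of-n additive sharing rather than the threshold or homomorphic scheme of the cited work, and the modulus, function names, and sensor values are all hypothetical.

```python
import secrets

# Hypothetical modulus for this sketch (2**31 - 1); values must be smaller than it.
PRIME = 2_147_483_647


def split_into_shares(value, parties):
    """Split value into `parties` random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def aggregate(all_shares):
    """Each party adds up the shares it holds; combining the partial sums gives the total."""
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    return sum(partial_sums) % PRIME


# Made-up sensor readings; each sensor sends one share to each of three parties.
sensor_values = [21, 23, 19]
shares_per_sensor = [split_into_shares(v, parties=3) for v in sensor_values]
total = aggregate(shares_per_sensor)
print(total)
```

Each party sees only random-looking shares, yet the combined partial sums reproduce the total of the readings. No single party can recover an individual reading unless all parties pool their shares.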
2 Literature Survey Researchers have come up with a lot of insights for keeping security and general privacy in IoT fog computing. This section discusses some of the most important scheme contributions via existing related literature as follows.
Citation | Title | Author | Journal name | Method used | Key findings | Gaps
[1] | Cloud-based Centric Authentication for Wearable Healthcare Monitoring System | Jangirala Srinivas | IEEE Transactions on Dependable and Secure Computing Sep 2020 | User Authentication scheme | Mutual Authentication; Tool-Based Automated validation of protocols | Automation of validating protocols requires learning with different combination
[2] | Cloud Strike: Chaos Engineering for Security and Resiliency in Cloud Infrastructure | Kennedy A. Torkura | IEEE Access July 2020 | Cloud Strike | Risk-driven Fault Injection; Fault injection to security mechanisms: compared security tools controls and attributes whether they are compromised after injecting fault data | Chance of sinkhole attack; Need to be worked on serverless computing
[3] | ProSAS: Proactive Security Auditing System for Clouds | Suryadipta Majumdar | IEEE Transactions on Dependable and Secure Computing 2022 | PROSAS | Policy Violations Verification Algorithm | Response time may delay
[4] | Publicly Verifiable and Efficient Fine-Grained Data Deletion Scheme in Cloud | Changsong Yang | IEEE Access June 2020 | Invertible Bloom filter | Proof of Erasability (PoE); Comparing deleted blocks and non-deleted blocks; Verifiability: Intrusion detection system | Delay in code blocks
[5] | Subversion Resistant and Consistent Attribute-Based Keyword Search for Secure Cloud Storage | Kai Zhang | IEEE Transactions on Information Forensics and Security 2022 | Attribute-based keyword search | Master key-based secret to gen secret keys for user/clients; Token generation for user access to data | Policy Updating
[6] | Threat-Specific Security Risk Evaluation in the Cloud | Armstrong Nhlabatsi | IEEE Transactions on Cloud Computing June 2022 | Threat risk evaluation | 1. Spoofing 2. Tampering 3. Repudiation 4. Information disclosure 5. DoS 6. Elevation privilege | Application Support is needed as of many properties
[7] | Towards Automated Security based Analysis and Enforcement via Cloud Computing Using Graphical Models for Security | Seongmo An | IEEE Access July 2022 | Cloud safe | Hierarchical Attack Representation Model; It compares Host DB versus Vulnerable DB | Limitation with service provider; It can work with AWS
[8] | Towards Security-Based Formation of Cloud Federations | Talal Halabi | IEEE Transactions on Cloud Computing 2020 | Goal-Question-Metric (GQM) | Security SLAs with federation and user to protect data | Still SLAs Violation
[9] | A Double Obfuscation Approach for Protecting the Privacy of IoT Location-Based Applications | Sami Albouq | IEEE Access 2020 | Double Obfuscation | Obfuscation of data as well as location | Work needs to be done on large fog as well as cloud nodes
[10] | VMGuard: A VMI Security Architecture for Intrusion Detection in Cloud Environment | Preeti Mishra | IEEE Transactions on Cloud Computing | VMGuard | Software breakpoint injection technique; Attack traces versus normal traces comparison | Detection power is low
[11] | Optimal Workload Allocation in Fog Cloud Computing Towards Balanced Delay and Power Consumption | Ruilong Deng | IEEE Internet of Things journal 2016 | Optimal workload allocations | Reduce the time it takes for data to be transmitted and the amount of bandwidth used in communication; Makes heavy use of the Hungarian method | Needs a solid mathematical foundation and a great deal of complexity to bridge the gap between bandwidth and transmission
[12] | Security and Privacy Preservation Scheme of fog and resolution framework | Pengfei | IEEE Internet of Things 2017 | Fog session key management | Support issues of security | End or critical users collect more data that may pose a privacy risk
[13] | CP_ABE with Key delegation abuse resistance | Yinghao Jiang | UOW Library 2016 | Key delegation abuse in fog | It can’t produce new private keys; It arranges in logical hierarchies to reduce the size of the cyphertext and the quantity of pairings required for decryption | New private keys misguide users and malicious users; Difficult in tracing
[14] | DECENT: Secure and fine-grained data access control with policy updating for constrained IoT devices | Qinlong Huang | World Wide Web 2017 | Hierarchical ABE | Decrypts cryptographic cyphertext and provisions it on a cloud management server if it satisfies new attributes and access policies; Reduces local/neighbour side computation to a cloud-optimal server | Difficulty in policy updating
[15] | ABE for secure fog communications | Arwa Alrawais | IEEE Access 2016 | CP-ABE | Safe connection between the cloud and fog; There is no need to download third-party certificates because each private key corresponds to an expiration date | High Module mathematical experimental pairing; Cost of computation is extremely high
[16] | A lightweight privacy preservation data aggregation scheme | Rongxing Lu | IEEE Access 2017 | Data Aggregation scheme | Response time; Bandwidth is allocated | Setup time
[17] | A model for preserving data and location privacy in fog-based IoT scenario | Jasleen Kaur | Journal of King Saud University 2022 | Encryfuscation | Obfuscation detected information and location coordinates | Most of the work focused on location coordinates
[18] | Preserving Location Privacy for Location-Based Service | Xiaojuan Chen | International Publishing Switzerland 2016 | Privacy preservation protocol | Maintaining two servers; one is for service and another one is for registration | Computation cost
[19] | Towards the improvement of the Privacy in the MQTT Protocol | Marten Fischer | IEEE 2019 Global IoT Summit | Advance one-time password | Risk of an account being compromised is drastically reduced | Implementation and Configuration
3 Different Approaches for Achieving Data Confidentiality, Integrity, and Verifiability
3.1 Approach 1 An Ensemble Approach using Deep Learning Model for User Validation, Verifiability, and Data Integrity User authentication is a crucial first step because of ever-increasing concerns about security and privacy. For user authentication, the user provides a face image in addition to a signature. To begin, the user's signature is converted into a hashed form using the Secure Hash Algorithm (SHA)
in order to generate the digital signature that is later used for the initial authentication. Deep neural network (DNN) learning is employed in the authentication process to make it more robust. The DNN's feature similarity learning capability makes it possible to detect an unauthorised user. The advantage of deep learning over other machine learning algorithms is that it can derive new features from the limited set of features present in the training dataset without human assistance. However, training requires significant computing resources, and the available sample size is small. In addition, because the network learns from noisy features, what it learns is not easy to interpret. A Support Vector Machine (SVM) is therefore combined with the DNN to improve the quality of the learning process. The Randomised Ensemble SVM integrates the SVM with Random Forests. On its own, the SVM classifier is limited in speed and in the size of data it can handle during both training and testing, and it has high algorithmic complexity and extensive memory requirements. Random forests were introduced to overcome these limitations. Because multiple trees are trained together, Random Forests are a form of ensemble learning. Each tree is trained on a randomly chosen subset of the input variables and of the training set. Random forests achieve good precision; they are also fast, scale to growing datasets, and work with little memory. The DNN's improved deep learning and the network's increased authenticity make it well suited to protecting user privacy.
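The hashing step described above can be sketched with Python's standard library. This is an illustration under assumptions: the text does not say which SHA variant is used, so SHA-256 is chosen here, and the function names and byte strings are hypothetical. A plain hash gives a fixed-length digest for comparison; it is not by itself a keyed digital signature.

```python
import hashlib
import hmac


def signature_digest(signature_bytes):
    """Return the SHA-256 hex digest of the raw signature data."""
    return hashlib.sha256(signature_bytes).hexdigest()


def matches_enrolled(signature_bytes, enrolled_digest):
    """Compare digests in constant time to avoid leaking where they differ."""
    return hmac.compare_digest(signature_digest(signature_bytes), enrolled_digest)


# Made-up enrolment and verification inputs.
enrolled = signature_digest(b"example-user-signature")
print(matches_enrolled(b"example-user-signature", enrolled))
print(matches_enrolled(b"tampered-signature", enrolled))
```

Storing only the digest means the verifier never needs to keep the raw signature data, and any change to the input produces a different digest.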
Verifiable Dynamic Access Control with User Revocation with CPABSE and VOMMACS is an access control policy that gives users more security and privacy [20]. Moreover, an intrusion detection system can be used to make the network more secure, and an improved intrusion detection method is used here. Serendipitous Particle Swarm Optimisation (PSO) is used to find and stop intrusions. In high-dimensional spaces, PSO easily gets stuck in a local optimum, and its iterative process converges slowly. Marriage in Bee Optimisation (MBO) is therefore built into the PSO; MBO works like random assignments and so allows faster convergence. Pseudo code:
Begin:
Step1: Get generated data from IoT device
Step2: Do user face verification:
  Step2.1: Create Embeddings
    Get image of person trying to send the data and verify if the person is present in the database
    Initialize the total number of faces processed
    Loop over the image paths
    Extract the person name from the image path
    Apply OpenCV's deep learning based face detector to localize faces in the image
    Ensure at least one face was found
    if confidence > 0.5:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startN, startM, endN, endM) = box.astype("int")
    Create a blob for the face ROI, then pass the blob through the face embedding model to obtain the 128-d face quantification
  Step2.2: Train with SVM and Random Forest Classifier
    Load the face embeddings
    Encode the labels
    Train the model used to accept 128-dimensional face embeddings and then generate the actual face recognition.
    svm_recognizer = SVC(kernel="linear", probability=True)