Recent Studies on Computational Intelligence: Doctoral Symposium on Computational Intelligence (DoSCI 2020) [1st ed.] 9789811584688, 9789811584695

This book gathers the latest quality research work of Ph.D. students working on the current areas presented in the Doctoral Symposium on Computational Intelligence (DoSCI 2020).


Language: English · Pages: XI, 122 [130] · Year: 2021


Table of contents :
Front Matter ....Pages i-xi
Onto-Semantic Indian Tourism Information Retrieval System (Shilpa S. Laddha, Pradip M. Jawandhiya)....Pages 1-18
An Efficient Link Prediction Model Using Supervised Machine Learning (Praveen Kumar Bhanodia, Aditya Khamparia, Babita Pandey)....Pages 19-27
Optimizing Cost and Maximizing Profit for Multi-Cloud-Based Big Data Computing by Deadline-Aware Optimize Resource Allocation (Amitkumar Manekar, G. Pradeepini)....Pages 29-38
A Comprehensive Survey on Passive Video Forgery Detection Techniques (Vinay Kumar, Abhishek Singh, Vineet Kansal, Manish Gaur)....Pages 39-57
DDOS Detection Using Machine Learning Technique (Sagar Pande, Aditya Khamparia, Deepak Gupta, Dang N. H. Thanh)....Pages 59-68
Enhancements in Performance of Reduced Order Modelling of Large-Scale Control Systems (Ankur Gupta, Amit Kumar Manocha)....Pages 69-78
Solution to Unit Commitment Problem: Modified hGADE Algorithm (Amritpal Singh, Aditya Khamparia)....Pages 79-90
In Silico Modeling and Screening Studies of PfRAMA Protein: Implications in Malaria (Supriya Srivastava, Puniti Mathur)....Pages 91-101
IBRP: An Infrastructure-Based Routing Protocol Using Static Clusters in Urban VANETs (Pavan Kumar Pandey, Vineet Kansal, Abhishek Swaroop)....Pages 103-122

Studies in Computational Intelligence 921

Ashish Khanna · Awadhesh Kumar Singh · Abhishek Swaroop (Editors)

Recent Studies on Computational Intelligence Doctoral Symposium on Computational Intelligence (DoSCI 2020)

Studies in Computational Intelligence Volume 921

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with high quality. The intent is to cover the theory, applications and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted for indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink.

More information about this series at http://www.springer.com/series/7092


Editors Ashish Khanna Department of Computer Science and Engineering Maharaja Agrasen Institute of Technology New Delhi, India

Awadhesh Kumar Singh Department of Computer Engineering NIT Kurukshetra Kurukshetra, India

Abhishek Swaroop Department of Computer Science Engineering Bhagwan Parushram Institute of Technology New Delhi, India

ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-981-15-8468-8 ISBN 978-981-15-8469-5 (eBook)
https://doi.org/10.1007/978-981-15-8469-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

We are delighted to announce that Shaheed Sukhdev College of Business Studies, New Delhi, in association with the National Institute of Technology Patna and the University of Valladolid, Spain, hosted the eagerly awaited Doctoral Symposium on Computational Intelligence (DoSCI 2020). The first edition of the symposium attracted a diverse range of engineering practitioners, academicians, scholars and industry delegates, receiving abstracts involving more than 143 authors from different parts of the world. The committee of professionals dedicated to the symposium strove to assemble high-quality technical chapters on computational intelligence. The track chosen for the symposium is highly active in the present-day research community, and a great deal of research is under way in this field and its related sub-fields. The symposium targeted out-of-the-box ideas, methodologies, applications, expositions, surveys and presentations that help advance the current state of research. More than 40 full-length papers were received, with contributions focused on theoretical work, computer simulation-based research and laboratory-scale experiments. Among these manuscripts, nine papers have been included in this Springer book after a thorough two-stage review and editing process. All submitted manuscripts were peer-reviewed by at least two independent reviewers, who were provided with a detailed review proforma. The reviewers' comments were communicated to the authors, who incorporated the suggestions in their revised manuscripts. The recommendations of both reviewers were taken into consideration when selecting a manuscript for inclusion in the proceedings. The exhaustiveness of the review process is evident from the large number of articles received, addressing a wide range of research areas.
The stringent review process ensured that each published manuscript met rigorous academic and scientific standards. It is an exhilarating experience to finally see these select contributions materialize into Recent Studies on Computational Intelligence: Doctoral Symposium on Computational Intelligence (DoSCI 2020), published by Springer.


DoSCI 2020 invited six keynote speakers, eminent researchers in the field of computer science and engineering from different parts of the world. In addition to the plenary sessions on each day of the conference, 15 concurrent technical sessions were held each day to accommodate the oral presentation of the around nine accepted papers. The keynote speakers and session chair(s) for each session were leading researchers from the thematic area of the session. An event of this magnitude, with proceedings released by Springer, has been the remarkable outcome of the untiring efforts of the entire organizing team. The success of an event invariably involves the painstaking efforts of several contributors at different stages, driven by their devotion and sincerity. Fortunately, since the beginning of its journey, DoSCI 2020 has received support and contributions from every corner, and we thank all who wished DoSCI 2020 well and contributed in any way to its success. The edited proceedings volume by Springer would not have been possible without the perseverance of all the steering, advisory and technical program committee members. The organizers of DoSCI 2020 thank all the contributing authors for their interest and exceptional articles, and for adhering to the time schedule and incorporating the review comments. We extend our heartfelt acknowledgment to the authors, peer reviewers, committee members and production staff whose diligent work gave shape to the DoSCI 2020 proceedings. We especially thank our dedicated team of peer reviewers who volunteered for the arduous and tedious work of quality-checking and critiquing the submitted manuscripts. The management, faculty, administrative and support staff of the college extended their services whenever needed, for which we remain thankful.
Lastly, we would like to thank Springer for accepting our proposal to publish the DoSCI 2020 proceedings. The help received from Mr. Aninda Bose, Senior Acquisitions Editor, throughout the process has been very useful.

New Delhi, India

Ashish Khanna
Deepak Gupta
Organizers, DoSCI 2020

Contents

Onto-Semantic Indian Tourism Information Retrieval System (Shilpa S. Laddha and Pradip M. Jawandhiya) .... 1
An Efficient Link Prediction Model Using Supervised Machine Learning (Praveen Kumar Bhanodia, Aditya Khamparia, and Babita Pandey) .... 19
Optimizing Cost and Maximizing Profit for Multi-Cloud-Based Big Data Computing by Deadline-Aware Optimize Resource Allocation (Amitkumar Manekar and G. Pradeepini) .... 29
A Comprehensive Survey on Passive Video Forgery Detection Techniques (Vinay Kumar, Abhishek Singh, Vineet Kansal, and Manish Gaur) .... 39
DDOS Detection Using Machine Learning Technique (Sagar Pande, Aditya Khamparia, Deepak Gupta, and Dang N. H. Thanh) .... 59
Enhancements in Performance of Reduced Order Modelling of Large-Scale Control Systems (Ankur Gupta and Amit Kumar Manocha) .... 69
Solution to Unit Commitment Problem: Modified hGADE Algorithm (Amritpal Singh and Aditya Khamparia) .... 79
In Silico Modeling and Screening Studies of PfRAMA Protein: Implications in Malaria (Supriya Srivastava and Puniti Mathur) .... 91
IBRP: An Infrastructure-Based Routing Protocol Using Static Clusters in Urban VANETs (Pavan Kumar Pandey, Vineet Kansal, and Abhishek Swaroop) .... 103

Editors and Contributors

About the Editors

Ashish Khanna has 16 years of expertise in teaching, entrepreneurship and research & development. He received his Ph.D. degree from NIT Kurukshetra and completed his M.Tech. and B.Tech. at GGSIPU, Delhi. He completed his postdoctoral research at the Internet of Things Lab at Inatel, Brazil, and the University of Valladolid, Spain. He has around 45 published and accepted SCI-indexed papers in IEEE Transactions, Springer, Elsevier, Wiley and other journals, with a cumulative impact factor above 100, and more than 100 research articles in top SCI/Scopus journals, conferences and edited books. He is co-author and co-editor of around 20 edited books and textbooks. His research interests include MANET, FANET, VANET, IoT, machine learning and related areas. He is the convener and organizer of the ICICC conference series. He is currently working in the CSE Department of Maharaja Agrasen Institute of Technology, Delhi, India.

Awadhesh Kumar Singh received his Bachelor of Technology (B.Tech.) degree in Computer Science from Madan Mohan Malaviya University of Technology, Gorakhpur, India, in 1988, and his M.Tech. and Ph.D. degrees in Computer Science from Jadavpur University, Kolkata, India, in 1998 and 2004, respectively. He joined the Department of Computer Engineering at the National Institute of Technology, Kurukshetra, India, in 1991, where he is presently a Professor and Head of the Department of Computer Applications. Earlier, he served as Head of the Computer Engineering Department during 2007–2009 and 2014–2016. He has published 150 papers in various journals and conference proceedings and has supervised 10 Ph.D. scholars. He has visited countries including Thailand, Italy, Japan, the UK and the USA to present his research work. His research interests include cognitive radio networks, distributed algorithms, fault tolerance and security.

Prof. (Dr.) Abhishek Swaroop completed his B.Tech. (CSE) at GBP University of Agriculture & Technology, his M.Tech. from Punjabi University Patiala


and Ph.D. from NIT Kurukshetra. He has 28 years of teaching and industrial experience. He has served in reputed educational institutions such as Jaypee Institute of Information Technology, Noida; Sharda University, Greater Noida; and Galgotias University, Greater Noida. He is actively engaged in research: one of his Ph.D. scholars has completed a Ph.D. from NIT Kurukshetra, and he is currently supervising four Ph.D. students. He has also guided 10 M.Tech. dissertations. He has authored 3 books and 5 book chapters; seven of his papers are indexed in DBLP, and six are SCI-indexed. He has been part of the organizing committees of three IEEE conferences (ICCCA-2015, ICCCA-2016, ICCCA-2017) and one Springer conference (ICICC-2018) as Technical Program Chair. He is a member of professional societies such as CSI and ACM and serves on the editorial boards of various reputed journals.

Contributors

Praveen Kumar Bhanodia School of Computer Science and Engineering, Lovely Professional University, Phagwara, India
Manish Gaur Department of Computer Science and Engineering, Centre for Advanced Studies, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
Ankur Gupta Department of Electronics and Communication Engineering, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
Deepak Gupta Maharaja Agrasen Institute of Technology, New Delhi, India
Pradip M. Jawandhiya PL Institute of Technology and Management, Buldana, India
Vineet Kansal Department of Computer Science and Engineering, Institute of Engineering and Technology Lucknow, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
Aditya Khamparia School of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India
Vinay Kumar Department of Computer Science and Engineering, Centre for Advanced Studies, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
Shilpa S. Laddha Government College of Engineering, Aurangabad, India
Amitkumar Manekar CSE Department, KLEF, Green Fields, Vaddeswaram, Andhra Pradesh, India
Amit Kumar Manocha Department of Electrical Engineering, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India


Puniti Mathur Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India
Sagar Pande School of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India
Babita Pandey Department of Computer Science and IT, Babasaheb Bhimrao Ambedkar University, Amethi, India
Pavan Kumar Pandey Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India
G. Pradeepini CSE Department, KLEF, Green Fields, Vaddeswaram, Andhra Pradesh, India
Abhishek Singh Department of Computer Science and Engineering, Institute of Engineering and Technology Lucknow, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
Amritpal Singh Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Supriya Srivastava Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India
Abhishek Swaroop Bhagwan Parashuram Institute of Technology, New Delhi, India
Dang N. H. Thanh Department of Information Technology, School of Business Information Technology, University of Economics Ho Chi Minh City, Ho Chi Minh City, Vietnam

Onto-Semantic Indian Tourism Information Retrieval System

Shilpa S. Laddha and Pradip M. Jawandhiya

Abstract Tourism is one of the world's fastest-growing sectors, and the amount of tourism information on the Web has grown enormously. Paradoxically, despite this overload of data, users often fail to find the relevant information, because the semantics of the user query are not identified when retrieving results. Motivated by these limitations, a framework called "Design and Implementation of Semantically Enhanced Information Retrieval using Ontology" is proposed. The objective of the paper is to present a semantic Indian tourism search framework to improve India's standing as a global travel destination, so that India can leverage its natural assets and thereby increase tourist arrivals and tourism revenue. The proposed method uses an ontology constructed for Indian tourism to achieve precise retrieval. The framework is evaluated against keyword-based Web search engines to assess the effectiveness of the semantic approach over commonly used methodologies, measuring performance in terms of precision and execution time as evaluation parameters. The results show a substantial improvement in information retrieval using this approach.

Keywords Information retrieval · Semantic search engine · Ontology · Tourism

1 Introduction

S. S. Laddha (B) Government College of Engineering, Aurangabad 431001, India; e-mail: [email protected]
P. M. Jawandhiya PL Institute of Technology and Management, Buldana 443001, India; e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_1

In this world of technology, life without the Web is unimaginable. The Web, which connects billions of people all around the world, is the fastest, simplest and most accessible medium of communication. The Internet is the greatest storehouse of information, and it is


readily accessible in a user-friendly way through devices such as PCs, tablets, smartphones and many more. With the rapid advance of the World Wide Web, search engines have become the primary tool of information retrieval for people looking through Web data. Typically, a user enters a few keywords; the search tool processes the query using those keywords and returns relevant URLs as the result. In this era, tourism is one of the world's fastest-growing sectors [1, 2]. Accordingly, efforts are required to improve India's standing as a global travel destination so that India can leverage its natural assets and thereby increase tourist arrivals and tourism revenue [3, 4]. The objective of this chapter is to design and implement a highly useful information retrieval framework for the travel industry of India. This onto-semantic retrieval framework is designed, implemented and evaluated on the Indian tourism domain to study the effectiveness of onto-semantic retrieval over commonly used Web search tools such as Google, Bing and Yahoo. This involves considerable challenges in making sense of the heterogeneous data on the World Wide Web and the available systems for efficient retrieval and integration of data. The commonly used keyword-based strategy [3] has various limitations in data processing, which can be addressed by the onto-semantic strategy employed in this framework. A traditional search engine typically returns results that are syntactically correct, but the result set it provides is very large. These systems are keyword-based retrieval systems working on phrase-term matching rather than the semantics of the words or tokens [5]. The need is to improve phrase-term-based Web search tools by considering the actual semantics behind the query.

Semantic information retrieval is an area of study and research that focuses on the meaning of the expressions used in a user query. Ontology plays a crucial role here in characterizing concepts along with the relationships among the terminology of a domain [4]. Since an ontology is domain-specific, it is organized around a particular area; accordingly, queries in the "Tourism" domain are interpreted differently than in some other domain such as "Education." In this chapter, search-engine effectiveness is raised through an onto-semantic similarity measure and algorithmic techniques [6]. The novel concept of a "Query Prototype" [7] is combined with a model using ontology to govern the retrieval process. Sections 3, 4 and 5 of this chapter describe the challenges, objectives and hypothesis, respectively. Sections 6 and 7 form the heart of the chapter, presenting the system architecture and the performance evaluation and analysis. Section 8 concludes and outlines the scope of future work.


2 Literature Review

Since the beginning of written language, people have been devising methods for rapidly indexing and retrieving information. Information retrieval has a variety of paradigms; it is defined as the act of storing, seeking and fetching information that matches a user's need [8]. Until the 1950s, information retrieval was mainly a library science. In 1945, Vannevar Bush introduced his vision of a future in which machines would be used to give simple access to the libraries of the world [9]. In the 1950s, the first electronic retrieval systems were built using punch cards; a lack of computing power limited the usefulness of these systems. By the 1970s, computers had enough processing capacity to handle information retrieval with near-instant results. With the growth of the Web, information retrieval became increasingly important and widely researched. Today, most people regularly use some kind of modern information retrieval system, such as Google or a custom-built system for libraries. The volume of data available on the Web makes it difficult to find pertinent information, so a suitable method to organize information becomes critically important. Keyword search is not well suited to locating the relevant information for a particular concept. In a typical keyword-based Web search engine, the query terms are matched against the terms in an inverted index built from all the document terms of a text corpus [10]. Only matching documents are fetched and shown to the end user. The study in [6] discusses the key reasons why a purely text-based search fails to discover some of the relevant documents: the ambiguity of natural language and the lack of semantic relations. Textual information retrieval depends on keywords extracted from documents, used as the building blocks of both document and query representations. However, keywords may have various synonyms.
For example, the term "train" in the tourism domain refers to a "vehicle for transportation," whereas the same term in the education domain means "to teach." Current keyword-based Web search tools match the terms in the query with the terms in the documents and return every document containing those terms, regardless of their semantics. Therefore, efforts are required to devise semantic information retrieval methods that deliver significant documents based on meaning rather than keywords. The central idea of semantic information retrieval is that the meaning of content depends on conceptual relationships to objects, rather than on the linguistic relations found in the text. In the area of tourism, Tomai et al. [11] introduced an ontology that supported decision-making in trip planning, using two separate ontologies, one for tourism information and the other for user profiles [12]. Jakkilinki et al. (2005) presented an ontology-driven intelligent tourism information system using a tourism domain ontology [13]. Lam et al. (2006) introduced an ontology-driven agent framework for the semantic Web, "OntiaiJADE," and an upper-level ontology using auxiliary information from various Web sites related and relevant to Hong Kong.


Furthermore, they built an enhanced intelligent ontology agent-driven tourist advisory system called "iJADEFreeWalker" in 2007 and presented an intelligent ontology-based mobile framework for tourist guidance in 2008 [11]. Heum Park, Aesun Yoon and Hyuk-Chul Kwon developed a task framework and task ontology based on travelers' tasks, and an intelligent tourist information service system using them [14]. Wang et al. (2008) devised an intelligent ontology and Bayesian network using a semantic approach for tourism, and advanced an ontology-driven recommendation system that integrates the heterogeneous tourism data available on the Web and recommends attractions to users based on information from more than 40 Chinese Web sites covering tourist attractions in Beijing and Shanghai [15]. Kanellopoulos (2009) presented an ontology-based system for matching travelers' requirements for Group Travel Package Tours (GPT) with a Web portal service for travelers residing in Europe; the basic information resources in this system are travel organizations, travel-establishment-related news and the group tour requirements [16]. Tune et al. (2008) exhibited an intelligent agent system using ontology for a tour and group package tourist service [17]. Chiu et al. (2009) presented a multi-agent information system, "MAIS," and a Collaborative intelligent Travel Operator System (CTOS) utilizing semantic Web technologies for effective organization of information assets and service processes [18]. Barta et al. (2009) introduced an alternative approach covering the semantic domain of tourism by integrating modularized ontologies, creating the core Domain Ontology for Travel and Tourism (cDOTT) [19].
They proposed a recommender framework that improves the effectiveness of traditional content-based recommender systems by taking ontology into account. Specific proposals for future research directions included considering contextual information, for example the weather forecast, the period of the year, the time and so on. This research attempts to pursue these future research directions and deliver the data by implementing a semantic information retrieval interface using an ontology for the Indian tourism domain.
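The inverted-index keyword matching and the "train" ambiguity discussed in this section can be illustrated with a minimal sketch. The documents below are invented for the example; they show how a purely keyword-based lookup returns both senses of "train" regardless of the user's intent.

```python
from collections import defaultdict

# Toy corpus illustrating the "train" example: keyword matching cannot
# tell the transport sense (doc 1) from the education sense (doc 2).
docs = {
    1: "train timetable for tourist travel between Delhi and Agra",
    2: "how to train teachers with modern education methods",
}

# Build an inverted index: term -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def keyword_search(query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

# Both senses of "train" come back, regardless of what the user meant.
print(keyword_search("train"))  # [1, 2]
```

Only documents whose terms literally match are returned, which is exactly the limitation semantic retrieval aims to remove.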

3 Challenges/Research Gap

Today, life without the Web, and in particular without Web search tools, is hard to imagine. Searching the net has become part of our everyday life, covering everything from finding a suitable book to tracking the latest developments in emerging technologies. Search engines have fundamentally changed the way people access and discover information, making information about practically any topic easily and instantly available. However, all these information retrieval strategies work on keyword matching: a page is returned only if the keywords match the available data; otherwise, it is rejected. These approaches return a comparatively large variety of results, and we need to browse through the pages to get the required


data. These systems are unable to provide an exact answer to a given question. Because these techniques do not consider the semantics conveyed by the query terms, they make no attempt to understand what the user intends to ask, resulting in low precision and relevancy. The basic issues [20] include:

• Fetching and displaying irrelevant results.
• A large volume of results, making it hard for the user to locate the relevant information.
• The user is not aware of the rationale used to produce the results, making it hard to interpret the results properly.
• Query execution takes time, and precision is low.

These issues are common to keyword-based search engines. The present Web is a collection of a wide variety of information, and search engines are expected to deliver information matching the user's query. Moreover, users often do not know the exact term they should search for; if the exact query term does not match, the result may not be very precise. Web search engines must not confine themselves to keywords alone: the semantics of the words should also be considered, and the matching logic ought to be fuzzy. The framework is required to create an information retrieval interface that renders precise and effective query results in relatively less time.

4 Objective

Given the vast amount of unstructured data on the Web, traditional search engines are incapable of rendering relevant, precise and efficient information. The primary goal of this research is to enhance the precision and efficiency of information retrieval semantically, using ontology, so as to satisfy the user query and achieve user satisfaction with the search results. The semantic information retrieval system is evaluated against the generally used conventional search engines, viz. Google, Yahoo and Bing, and the improvement is demonstrated in terms of the efficiency and precision of the search results of the resulting application. The results show that an information retrieval system using a domain ontology can achieve better results than keyword-based information retrieval systems.
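The chapter uses precision as an evaluation parameter. A standard way to compute precision at a cutoff k, which such an evaluation could use (the document ids below are illustrative only), is:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)

# Example: 3 of the top 5 returned results are relevant -> precision@5 = 0.6
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
print(precision_at_k(retrieved, relevant, 5))  # 0.6
```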

5 Hypothesis

This research work is an attempt to address some of the problems mentioned in the research issues. The proposed system aims at innovations in the design of an enhanced semantic information retrieval system on the Web, built for general use but applied to a specific domain, and implements a Web-based interface to accept a query from the user and


S. S. Laddha and P. M. Jawandhiya

provide the result through the ontology-based enhanced semantic retrieval system. This interface allows multiple users to remotely access the same application through a Web browser.

6 System Architecture/Methodology

One of the significant difficulties in information retrieval is precise and relevant data retrieval using domain knowledge and semantics. Semantic retrieval interprets end users' queries in a more logical manner, using an ontology corpus that plays a significant role in interpreting the relations between the terms in the user query. Consequently, in this research, an ontology with the novel concepts of "query prototype" and "query similarity" is developed to understand the user query and provide precise and relevant information in the Indian tourism domain. The algorithm derives the semantic relatedness between the terms in the ontology-based corpus using the query prototypes and a similarity measure. It is worth noting that, although the proposed model is tested on the Indian tourism domain, the developed methods are adaptable to other specific domains. In this technological era, it is a paradoxical situation that, despite the overload of data, we commonly fail to find relevant data. This is due to not considering the semantics of the user query when fetching the required results. To overcome these basic challenges, the onto-semantic information retrieval framework is structured as shown in Fig. 1 (Part 1 combined with Part 2). This framework retrieves the relevant results for the user query semantically. The working of the semantically enhanced modules is discussed in detail below.

6.1 The Basic Query Mapper

When the end user gives any query pertaining to Indian tourism, such as "tourist places in India," "places of interest in India," "India tourism," "explore tourist destinations of India," "incredible India" and so forth, the basic query mapper is invoked and the pertinent results are shown to the query seeker along with meta-data and the time taken for processing [21].

6.2 The Query Prototype Mapper

The query prototype mapper [7] is built on a novel idea introduced in this study, by which one query prototype can handle various user queries. The query prototypes contain (i) simple tokens, (ii) template tokens, (iii) ontological tokens and (iv) stopwords.

Onto-Semantic Indian Tourism Information Retrieval System

7

For example: (flight) from [from-city] to [to-city]. Query prototypes are defined in this way for the 17 services recognized for the Indian tourism domain. This module handles the query when it matches exactly one of the query prototypes defined for the identified services [7].
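The prototype idea can be sketched as a tiny matcher. This is illustrative only: the paper gives no implementation, and the token syntax, the `match_prototype` helper and the example prototype string are assumptions; round brackets are treated here as alternative word forms and square brackets as template tokens whose values are captured.

```python
import re

# Illustrative matcher for query prototypes such as
# "(flight|flights) from [from-city] to [to-city]".
def match_prototype(prototype, query):
    parts = []
    for token in prototype.split():
        if token.startswith("[") and token.endswith("]"):
            parts.append(r"(\w+)")                 # template token: capture value
        elif token.startswith("(") and token.endswith(")"):
            parts.append("(?:%s)" % token[1:-1])   # alternative word forms
        else:
            parts.append(re.escape(token))         # simple token / stopword
    pattern = r"\s+".join(parts) + r"$"
    m = re.match(pattern, query.strip(), re.IGNORECASE)
    return m.groups() if m else None

print(match_prototype("(flight|flights) from [from-city] to [to-city]",
                      "flights from Mumbai to Delhi"))  # -> ('Mumbai', 'Delhi')
```

Place names containing spaces would need a richer capture pattern; the sketch only shows how one prototype can cover many concrete queries.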

6.3 The Query Word Order Mapper

If the user query does not match exactly any of the query prototypes defined for the identified services, there is a strong probability that the sequence of words in the user query does not match the sequence of words in

Fig. 1 System architecture—part 1, part 2


Fig. 1 (continued)

the defined query prototypes; to handle such queries and to locate the service to execute, the query word order mapper is invoked.

6.4 The Spelling Correction Algorithm

Another possibility is that the user mistakenly enters a misspelled state/city name in the query. To deal with this, a list of valid Indian city and state names is kept; this module uses the list to replace the incorrectly spelled term with the nearest matching term, reframe the query and forward it to the query prototype mapper [22].
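The nearest-match replacement can be approximated with a standard edit-distance lookup; a minimal sketch, assuming a tiny stand-in list of valid place names (the real module keeps the complete Indian city/state list):

```python
from difflib import get_close_matches

# Stand-in for the stored list of valid Indian state/city names.
VALID_PLACES = ["Mumbai", "Delhi", "Jaipur", "Kolkata", "Chennai", "Maharashtra"]

def correct_place(term):
    """Replace a misspelled place name with the nearest valid name, if any."""
    match = get_close_matches(term.title(), VALID_PLACES, n=1, cutoff=0.6)
    return match[0] if match else term

print(correct_place("Mumbay"))  # -> Mumbai
```

A corrected query would then be re-sent to the query prototype mapper, as the text describes.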


6.5 The Ontological Mapper

There is a probability that, instead of an ontological token used in defining the query prototypes, the user query contains a different but related term. To match this kind of query, the ontological mapper is used, which internally makes use of an ontology constructed using WordNet. This accelerates the performance of the framework remarkably by handling practically every query related to the Indian tourism domain. This research puts forward a semantic retrieval strategy based on an ontology created with a clustering technique. A clustering algorithm is designed and implemented that creates a cluster for each ontological token, called a cluster head, defined in the query prototypes. The cluster elements are fetched using the Java WordNet Library (JWNL), the relationship is assigned, and a score is calculated for each cluster element with respect to its cluster head. This process results in the creation of an ontology stored in memory to shorten retrieval time. The ontology representation for the ontological cluster head "Train" is shown in Fig. 2. The job of the defined ontology is to characterize the relations among the terms relevant to the Indian tourism domain. When the user enters a query, it is interpreted through the related terms defined in the ontology to improve the performance of the semantic search. The specific tourism service corresponding to the user query is thus located semantically and executed.
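The in-memory cluster lookup can be sketched as follows. The cluster heads, related terms and scores below are invented for demonstration; in the paper the cluster elements and their scores come from WordNet via JWNL.

```python
# Illustrative in-memory ontology: each cluster head maps related terms to a
# relatedness score. Terms and scores here are invented for demonstration.
ONTOLOGY = {
    "train": {"railway": 0.9, "rail": 0.85, "locomotive": 0.8, "express": 0.6},
    "flight": {"airline": 0.9, "airways": 0.85, "plane": 0.8},
}

def resolve_cluster_head(term):
    """Map a query term to the best-scoring ontological cluster head."""
    term = term.lower()
    if term in ONTOLOGY:                       # the term is itself a head
        return term, 1.0
    head, score = max(((h, members.get(term, 0.0))
                       for h, members in ONTOLOGY.items()),
                      key=lambda pair: pair[1])
    return (head, score) if score > 0 else (None, 0.0)

print(resolve_cluster_head("railway"))  # -> ('train', 0.9)
```

Resolving a query term to its cluster head is what lets the mapper route, say, a "railway" query to the Train service.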

Fig. 2 Ontology for the ontological token/cluster head “Train”


6.6 The State–City Mapper

Another possibility is that the user provides just the name of a city/state. To deal with such queries, this mapper invokes the "About" service for that particular state or city, as depicted in the performance analysis section of [22].

6.7 The Keyword Mapper

Often the user enters a query that matches neither a city or state name nor any defined query prototype. If none of the previously mentioned mappers can handle the user query, the keyword mapper attempts to match the keywords appearing in the user query against a keyword list built from the result pages of earlier queries. To illustrate: if a user enters the query "about Mumbai", the system shows information about Mumbai and simultaneously, in the background, extracts every keyword from the result URLs and saves them in keyword.dat. Later, if a user gives any of the stored keywords, such as "Gate Way of India", the framework can provide the Web page with the relevant data. In this way, the framework becomes gradually smarter as it grows and processes more and more relevant queries.
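A minimal sketch of the keyword-learning idea follows. The paper stores harvested keywords in keyword.dat; for clarity this sketch keeps them in an in-memory map, and the example URL and keywords are invented.

```python
from collections import defaultdict

# keyword -> set of result URLs on which the keyword was harvested
keyword_store = defaultdict(set)

def save_keywords(keywords, url):
    """Record every keyword harvested from a result page against its URL."""
    for kw in keywords:
        keyword_store[kw.strip().lower()].add(url)

def lookup_keyword(query):
    """Return pages previously associated with the query keyword, if any."""
    return sorted(keyword_store.get(query.strip().lower(), set()))

save_keywords(["Gate Way of India", "Marine Drive"], "http://example.com/mumbai")
print(lookup_keyword("gate way of india"))  # -> ['http://example.com/mumbai']
```

Each answered query enlarges the store, which is how the system "becomes gradually smarter" over time.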

6.8 The Meta-processor

In this study, a meta-processor is designed which provides meta-data such as the title, time and brief information about the pertinent URLs for the user-requested data. Whenever the user enters a query for the first time, only the Web links are displayed, to give quick results; at the same time a thread is spawned by the meta-processor, which in the background fetches the meta-data and dumps it on the server. Processing these URLs for meta-data and titles at run time takes additional time, as it requires connections to numerous servers. At the next run of the same query, the user gets the relevant links along with the meta-data. Preparing the meta-data is thus a background procedure performed by the meta-processor, and this novel meta-processor enhances the performance of the system.
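The first-run/second-run behaviour can be sketched with a background thread. This is a sketch of the idea only: `fetch_metadata` is a placeholder for the real per-server lookups, and the `wait` helper exists only so the example runs deterministically.

```python
import threading

class MetaProcessor:
    def __init__(self):
        self.cache = {}                    # query -> enriched results
        self.lock = threading.Lock()
        self._threads = []

    def fetch_metadata(self, url):
        # Placeholder for contacting the remote server for title/meta-data.
        return {"url": url, "title": "title of " + url, "snippet": "..."}

    def process(self, query, urls):
        with self.lock:
            if query in self.cache:        # repeat query: meta-data is ready
                return self.cache[query]
        t = threading.Thread(target=self._enrich, args=(query, urls))
        self._threads.append(t)
        t.start()
        return [{"url": u} for u in urls]  # first run: bare links only

    def _enrich(self, query, urls):
        meta = [self.fetch_metadata(u) for u in urls]
        with self.lock:
            self.cache[query] = meta

    def wait(self):                        # let pending enrichment finish
        for t in self._threads:
            t.join()

mp = MetaProcessor()
first = mp.process("hotels in goa", ["http://example.com/goa"])
mp.wait()
second = mp.process("hotels in goa", ["http://example.com/goa"])
```

The first call returns links immediately; after the background enrichment completes, the same query is answered from the cache with full meta-data.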


6.9 The Template Manager

A few services, such as the "About city" service and the "Best time to visit" service, require information as a line or passage of text rather than URL links. For this, a template manager is designed in a novel way and invoked in the background. Templates are site-specific: to include a new URL, a template for that site must be added, because different sites use different structures/layouts to display and store the data. This novel template-manager approach helps in fetching such data.

In this manner, the various modules described above are invoked based on the pattern of the query. As shown in Fig. 1 (Part 1 and Part 2), the user query is first matched against the defined query prototypes. If an exact match is found, the query prototype mapper recognizes the service to be executed. If the user query does not match any defined query prototype, the query word order mapper checks for an alteration in word sequence and determines the service to invoke. If this mapper fails to invoke a service, a check is made for an incorrectly spelled city or state name; the correction is made by the spelling correction module, and the query is sent back to the query prototype mapper to recognize the service. If a matching query prototype is still not found, the query tokens are matched with the closest ontological cluster head, as explained in [23], to invoke the appropriate service. If the user enters only a city/state name, the "About" service is invoked for the respective state/city. If the user enters a very basic query related to the domain, the basic query mapper is invoked.
There is also the possibility that the user requests information for which no query prototype is defined in the framework; the framework then handles such a query by invoking the basic keyword mapper. In this manner, the recognized service is invoked and the pertinent Web links are retrieved semantically. The first step of the procedure begins when the user enters the query in the semantic search interface depicted in Fig. 3. The user enters the query in the search box, and on clicking the search button, the relevant links, along with the meta-information and the time required for processing, are displayed as the search result (Fig. 4).

7 Performance Analysis

A Web application was developed to realize the framework. The baseline results using the fundamental model are discussed in detail in [24, 25]. The advanced framework takes the user query as input through the user interface shown in Fig. 3, and the results are obtained semantically using the ontology, based on the terms in the query, as shown in Fig. 4. The framework's performance is determined by computing the accuracy and effectiveness in terms of query execution


Fig. 3 Home page semantic search interface

Fig. 4 Results with meta-information presented by semantic search interface

time. Precision is used to gauge the accuracy of the framework. The performance of the semantic search interface is assessed by preparing a wide variety of queries for each of the services identified for the Indian tourism domain, as listed in Table 1, and testing is carried out for each identified service with diversified queries. The complete testing results for the major services are presented in detail below.

Table 1 Indian tourism domain services

S. No. Service name
1. About service
2. State service
3. City state service
4. Distance service
5. Tourist places service
6. Hotel type service
7. Hotel service
8. Hotel rating service
9. Train service
10. Things to do service
11. Flight service
12. Weather service
13. India place service
14. Keyword base service
15. Bus service
16. Best time to visit service
17. How to reach service

Each query is given for processing to the onto-semantic search interface and to the conventional keyword-driven search engines Bing, Google and Yahoo, and the retrieved results are analyzed for each query. The framework's performance is determined in terms of average precision and time taken for processing, as shown in the following table.

7.1 Detailed Testing

Different users may request the same data in various ways. Based on the user request, the framework infers the intended service and then renders the result. Comparing the results returned by all the search engines, the time taken to process each query and the observed precision, the onto-semantic search engine performs far better than the commonly used conventional search engines Google, Bing and Yahoo.


7.2 Appraisal of All Services

The principal part of our assessment procedure is ascertaining the service-wise precision values and the time taken by the system to process queries. These precision values and processing times are then averaged, as shown in Table 2. The examination was done on more than 1000 queries, and the service-wise comparative analysis shown in Graphs 1 and 2 depicts that this semantic search engine achieves a striking improvement over the commonly used keyword-based search engines Google, Bing and Yahoo. This robust system ensures quick retrieval of relevant, precise and efficient results, and it has a very easy-to-understand search interface.

8 Conclusion and Future Scope

This research presents a novel onto-semantic information retrieval framework and its application to the Indian tourism domain, which fuses the novel ideas of the "Query Prototype", the "Query Word Order Mapper" and "Spelling Correction" with the "Ontological Mapper" and "Keyword Mapper". When all these advances are combined with the comfort of a keyword-driven search interface, we get one of the most easy-to-use, high-performance semantic search interfaces, overcoming the vagueness and lack of definition of the retrieval procedure. Within the scope of this research, Indian tourism is chosen as the test domain, queries are specified for this domain, and the performance is assessed. Evaluation results show that the designed strategy can easily outperform conventional search engines such as Google, Bing and Yahoo in terms of precision and query processing time. This framework, tested on the Indian tourism domain, can be applied to other domains with appropriate changes to the query prototypes and the development of a domain-specific ontology.

Table 2 Comparative average precision and average processing time analysis for all services

Service name | Id | No. of unique queries | Semantic search avg. precision | Semantic search avg. processing time | Google avg. precision | Google avg. processing time | Bing avg. precision | Yahoo avg. precision
About city service | 1 | 19 | 92.5 | 0.21 | 56 | 0.52 | 54.73 | 51.59
Distance service | 2 | 12 | 100 | 0.37 | 65.83 | 0.57 | 61.92 | 53.32
Best time to visit service | 3 | 10 | 79.86 | 0.2 | 66.4 | 0.56 | 53.11 | 38.31
How to reach service | 4 | 17 | 100 | 0.24 | 82.57 | 0.63 | 89.03 | 73.43
Things to do service | 5 | 10 | 74.07 | 0.25 | 100 | 0.8 | 91.89 | 71.85
Hotel service | 6 | 13 | 96.61 | 0.18 | 89.23 | 0.69 | 88.07 | 85.56
Hotel type service | 7 | 10 | 98.84 | 0.24 | 99.29 | 0.75 | 100 | 95.56
Hotel rating service | 8 | 10 | 92.84 | 0.18 | 88 | 0.71 | 90.96 | 84.65
Flight service | 9 | 19 | 97.74 | 0.28 | 97.44 | 0.72 | 93.77 | 90.43
Tourist places service | 10 | 11 | 100 | 0.28 | 94.95 | 0.72 | 80.69 | 66.83
Train service | 11 | 10 | 94 | 0.23 | 69 | 0.57 | 62.6 | 40.04
Weather service | 12 | 10 | 75.6 | 0.19 | 96 | 0.49 | 88.53 | 63.95
Bus service | 13 | 11 | 97.56 | 0.19 | 64.55 | 0.63 | 64.2 | 62.25
India place service | 14 | 10 | 100 | 0 | 86 | 0.86 | 82.88 | 81.95
Keyword base service | 15 | 22 | 100 | 0.11 | 84.8 | 0.79 | 83.07 | 70.12
City state service | 16 | 20 | 100 | 0.23 | 67.73 | 0.7 | 61.71 | 54.37
State service | 17 | 10 | 100 | 0.21 | 65.1 | 0.79 | 54.66 | 59.25


Graph 1 Comparative average precision analysis for all services

Graph 2 Comparative average processing time analysis of semantic and Google search engines for all services

References 1. Buhalis, D., & Law, R. (2008). Progress in information technology and tourism management: 20 years on and 10 years after the internet—The state of eTourism research. Tourism Management, 29(4), 609–623. 2. Hall, C. M. (2010).Crisis events in tourism: Subjects of crisis in tourism. Current issues in Tourism, 13(5), 401–417. 3. Hauben, J. R. (2005). Vannevar Bush and JRC Licklider: Libraries of the future 1945–1965. The Amateur computerist (p. 36). 4. Jakkilinki, R., Sharda, N., & Ahmad, I. (2005). Ontology-based intelligent tourism information systems: An overview of development methodology and applications. In Proceeding of TES. 5. Laddha S. S., & Jawandhiya P. M. (2018) Onto semantic tourism information retrieval. International Journal of Engineering & Technology (UAE), 7(4.7), 148–151. ISSN 2227-524X,


https://doi.org/10.14419/ijet.v7i4.7.20532. 6. Laddha S.S., Koli N.A., & Jawandhiya P. M. (2018). Indian tourism information retrieval system: An onto-semantic approach. Procedia Computer Science, 132, 1363–1374. ISSN 18770509, https://doi.org/10.1016/j.procs.2018.05.051. 7. Laddha S. S., & Jawandhiya P. M. (2020). Novel concept of spelling correction for semantic tourism search interface. In: Tuba M., Akashe S., Joshi A. (eds) Information and Communication Technology for Sustainable Development. Advances in Intelligent Systems and Computing, Vol. 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_2 ISBN: 978-98113-7166-0, ISSN: 2194-5357, Pages 13–21. 8. Laddha S. S., & Jawandhiya P. M. (2018) Novel concept of query-prototype and querysimilarity for semantic search. In: Deshpande A. et al. (eds) Smart Trends in Information Technology and Computer Communications. SmartCom 2017. Communications in Computer and Information Science, Vol. 876. Springer, Singapore.Online ISBN 978-981-13-1423-0 ISSN: 1865-0929. 9. Kanellopoulos, D. N. (2008). An ontology-based system for intelligent matching of travellers’ needs for Group Package Tours. International Journal of Digital Culture and Electronic Tourism, 1(1), 76–99. 10. Aslandogan, Y. A., & Clement T. Y. (1999). Techniques and systems for image and video retrieval. IEEE Transactions on Knowledge and Data Engineering, 11(1), 56–63. 11. Tomai, E., Spanaki, M., Prastacos, P., & Kavouras, M. (2005). Ontology assisted decision making–a case study in trip planning for tourism. In OTM Confederated International Conferences on the Move to Meaningful Internet Systems (pp. 1137–1146). Berlin: Springer. 12. Vinayek, P. R., Bhatia, A., & Malhotra, N. E. E. (2013). Competitiveness of Indian tourism in global scenario. ACADEMICIA: An International Multidisciplinary Research Journal, 3(1), 168–179. 13. Kathuria, M., Nagpal, C. K., & Duhan, N. (2016). 
A survey of semantic similarity measuring techniques for information retrieval. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 3435–3440). IEEE. 14. Laddha S. S., & Jawandhiya P. M. (2017). Semantic tourism information retrieval interface. In 2017 International Conference on Advances in Computing, Communications and Informatics(ICACCI), Udupi, (pp. 694–697). https://doi.org/10.1109/icacci.2017.8125922. 15. Wang, W., Zeng, G., Zhang, D., Huang, Y., Qiu, Y., & Wang, X. (2008). An intelligent ontology and Bayesian network based semantic mash up for tourism. In IEEE Congress on Services-Part I (pp. 128–135). IEEE. 16. Laddha S. S., Laddha, A. R., & Jawandhiya P. M. (2015). New paradigm to keyword search: A survey. In IEEE Xplore digital library (pp. 920–923). https://doi.org/10.1109/icgciot.2015. 7380594. IEEE Part Number: CFP15C35-USB, IEEE ISBN: 978-1-4673-7909-0. 17. Song, T. -W., & Chen, S. -P. (2008). Establishing an ontology-based intelligent agent system with analytic hierarchy process to support tour package selection service. In International Conference on Business and Information, South Korea. 18. Chiu, D. K. W, Yueh, Y. T. F., Leung, H., & Hung, P. C. K. (2009). Towards ubiquitous tourist service coordination and process integration: A collaborative travel agent system architecture with semantic web services. Information Systems Frontiers, 11, 3, 241–256. 19. Laddha S. S., & Jawandhiya P. M. (2017). An exploratory study of keyword based search results. Indian Journal of Scientific Research, 14(2), 39–45. ISSN: 2250-0138 (Online). 20. Laddha S. S., & Jawandhiya P. M. (2019) Novel concept of query-similarity and metaprocessor for semantic search. In: Bhatia S., Tiwari S., Mishra K., & Trivedi M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, Vol. 760. Springer, Singapore. https://doi.org/10.1007/978-981-13-0344-9_9. 
Online ISBN 978-981-13-0344-9. 21. Laddha S. S., & Jawandhiya P. M. (2017). Semantic search engine. Indian Journal of Science and Technology, 10(23), 01–06. https://dx.doi.org/10.17485/ijst/2017/v10i23/115568. Online ISSN : 0974-5645.


22. Park, H., Yoon, A., & Kwon, H. C. (2012). Task model and task ontology for intelligent tourist information service. International Journal of u-and e-Service, Science and Technology, 5(2), 43–58. 23. Lee, R. S. T. (Ed). (2007). Computational intelligence for agent-based systems (Vol. 72). Springer Science & Business Media. 24. Pan, B., Xiang, Z., Law, Rob, & Fesenmaier, D. R. (2011). The dynamics of search engine marketing for tourist destinations. Journal of Travel Research, 50(4), 365–377. 25. Korfhage, R. R. (1997). Information storage and retrieval.

An Efficient Link Prediction Model Using Supervised Machine Learning Praveen Kumar Bhanodia, Aditya Khamparia, and Babita Pandey

Abstract The link prediction problem is an instance of online social network analysis. Easy access and the reach of the Internet have scaled social networks exponentially. In this paper, we focus on understanding link prediction between nodes across networks. We explore certain features used in machine learning-based link prediction, quantified by exploiting the structural properties of online social networks represented through a graph or sociograph. A supervised machine learning approach is used to classify potential node pairs as possible links. The proposed model is trained and tested using standard available online social network datasets and evaluated on state-of-the-art performance parameters.

Keywords Social network · Link prediction · Node · Graph · Common neighborhood

1 Introduction

Online Social Networks (OSNs) have established an era in which human life is highly influenced by the trends and activities prevailing across these networks. The power of social networks can be understood from the fact that the majority of market and business trends are decided and set on them; even governments rely upon these mediums to implement their part

P. K. Bhanodia (B) · A. Khamparia, School of Computer Science and Engineering, Lovely Professional University, Phagwara, India, e-mail: [email protected]; A. Khamparia, e-mail: [email protected]; B. Pandey, Department of Computer Science and IT, Babasaheb Bhimrao Ambedkar University, Amethi, India, e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_2


and parcels of their policies. Social networks now offer new avenues to build friendships, associations and relationships, whether in business life or social life [1]. Facebook, Twitter, LinkedIn and Flickr are a few popular online social network sites around us which have become an integrated part of our daily life. The rapid and exponential growth of social networks has attracted the research community to explore their evolution in order to investigate and understand their nature and structure. Researchers study these networks employing various mathematical models and techniques. Online social networks are represented through graphs (Fig. 1 shows a sample social network graph representation), wherein the nodes represent users and the links represent the relationships or associations between them. Social networks are crawled by researchers for further examination; since the network information collected at a particular instance is only partially downloaded, certain links between nodes may be missing. This missing link information is an obstacle to understanding the standing structure of the network. Apart from this, crunching the network's structural attributes to approximate fresh new links between nodes is another interesting challenge to be addressed. The formal definition, according to Liben-Nowell and Kleinberg [2], is as follows: in a social network represented by G(V, E), an edge e = (u, v) ∈ E is a link between two vertices (endpoints) at a specific timestamp t; multiple links between two vertices are represented as parallel links. For times t ≤ t', let G[t, t'] denote the subgraph of G restricted to the edges with timestamps between t and t'. Following the supervised training methodology, we take a training interval [t0, t0'] and a test interval [t1, t1'], where t0' < t1. Consequently, link prediction outputs a list of links that are not present in G[t0, t0'] but are predicted to appear in G[t1, t1']. Identifying the information that contributes to determining such missing and new links is helpful in the friend recommendation mechanisms usually used in online social networks. Several algorithmic techniques described by Chen et al. [3] have

Fig. 1 Social network representation in graph


been introduced by IBM in its internal private online social network, established for its employees and workers to connect with each other digitally. The prediction or forecasting of such existing hidden links, or of new fresh links, from existing social network data is termed the link prediction problem. Applications of link prediction include domains such as bioinformatics, for finding protein–protein interactions [4]; recommendation systems for e-commerce Web sites, built by predicting potential links between nodes [5]; and security systems that detect and track hidden terrorist groups or networks [6]. To address the link prediction problem in these different scenarios, many algorithms and procedures have been proposed, the majority of them based on machine learning approaches. These applications are not limited to social networks; many other networks (information networks, Web link graphs, bioinformatics networks, communication networks, road networks, etc.) may be included for processing. Crunching a large and complex social network in a single pass, although possible, would be inefficient and complex; the task can be simplified into subtasks that are handled separately. The basic building blocks of any social network are its nodes, its edges, the degree associated with each node and the local neighborhood of each node. Potential links are approximated by exploiting the global and local topological features of the network, and the local neighborhood features are used to quantify the similarity between nodes. The features estimated using neighborhood techniques such as JC, AA and PA are then used to build a classifier model for link prediction.
This paper addresses the link prediction problem with a machine learning classifier trained on similarity features extracted from topological properties. The proposed classifier is experimentally evaluated on social networking datasets (Facebook and Wikipedia). The paper also introduces the state-of-the-art similarity measures, including Adamic/Adar, Jaccard's coefficient and preferential attachment, which are used as feature extraction techniques. The objectives of the paper are to explore:
• Online social networks and their evolution, along with their appropriate representation using graphs.
• The link prediction problem and its evolution.
• How link prediction problems can be comprehended and addressed.
• The techniques employed for predicting links that establish relationships between nodes across an online social network.
• The contribution of machine learning to link prediction between nodes in online social networks.
• Accordingly, a proposed model for effective and efficient link prediction between nodes in an online social network.


2 Link Prediction Techniques

The advent of online social networks has attracted researchers to crunch these bulging, increasingly complex networks to extract knowledge for further predictions and recommendations. Various techniques and predictive models have been introduced and proposed to analyze online social networks; these methods are classified by the way they exploit the data, and local, global and machine learning-based methods are usually employed for network data exploitation. Distinguished methods may be explored in [7].

Common Neighborhood (CN). According to Newman [8], the CN measure is deduced by counting the existing common neighbors of the two adjacent nodes between which a future link is to be predicted. It is a similarity score calculated as the size of the intersection of the neighbor sets of the two nodes; a larger overlap indicates a higher chance of a potential link establishing a relationship. With Γ(x) denoting the set of neighbors of node x and Γ(y) the set of neighbors of node y:

CN(x, y) = |Γ(x) ∩ Γ(y)|

Jaccard's Coefficient. The common neighborhood measure is not effectively distinguishable across the network, so it is further refined by normalization. According to JC, the likelihood of a link between the nodes is measured using the following expression:

JC(x, y) = |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|

Adamic/Adar. This measure identifies the similarity between two nodes as the sum, over their common neighbors, of the reciprocal of the logarithm of each common neighbor's degree (Adamic and Adar [9]). With z ranging over the common neighbors of nodes x and y:

AA(x, y) = Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log |Γ(z)|

Preferential Attachment. In the same fashion, Kunegis et al. [10] proposed another approach to scoring the similarity between nodes: the maximum number of new nodes will be attracted to the node of highest degree in the network, so the score is the product of the two nodes' degrees. The mathematical representation of the measure is:

PA(x, y) = |Γ(x)| · |Γ(y)|
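The four neighborhood measures above can be computed directly from neighbor sets; a minimal sketch on a toy graph (the node names and edges are invented for illustration):

```python
import math

# Toy undirected graph as neighbor sets; a node's degree is len(G[x]).
G = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"a", "c"}}

def cn(x, y):   # Common Neighborhood
    return len(G[x] & G[y])

def jc(x, y):   # Jaccard's Coefficient
    return len(G[x] & G[y]) / len(G[x] | G[y])

def aa(x, y):   # Adamic/Adar
    return sum(1 / math.log(len(G[z])) for z in G[x] & G[y])

def pa(x, y):   # Preferential Attachment
    return len(G[x]) * len(G[y])

print(cn("b", "d"), pa("b", "d"))  # -> 2 4
```

These scores are exactly the per-pair features the later sections feed into the classifier.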


Sørensen Index. This measure was developed and proposed to crunch network communities by Zou et al. [11]; mathematically, with k(x) denoting the degree of node x, it is referred to as:

S(x, y) = 2 · |Γ(x) ∩ Γ(y)| / (k(x) + k(y))

Hub Promoted Index (HPI). This technique was introduced for processing overlapping pairs within complex social networks. It is defined here as the ratio of twice the number of common neighbors between the nodes to the minimum of the degrees of the two nodes of the pair (x or y). Mathematically, it is represented as

S(x, y) = 2 |Γ(x) ∩ Γ(y)| / min(k(x), k(y))

Hub Depressed Index (HDI). This technique is similar to HPI; the only difference is that the denominator is the maximum of the degrees of the two nodes of the pair (x or y); mathematically,

S(x, y) = 2 |Γ(x) ∩ Γ(y)| / max(k(x), k(y))

Leicht–Holme–Newman Index (LHN). Leicht et al. proposed a technique for similarity approximation defined as the ratio of the number of common neighbors to the product of the degrees of the nodes x and y:

S(x, y) = |Γ(x) ∩ Γ(y)| / (k(x) · k(y))
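The four degree-based variants can be sketched together (toy graph as in the earlier sketches; the factor of 2 in HPI and HDI follows this chapter's definitions, which differ from the more common forms without it, while LHN is written without the factor, matching its prose definition):

```python
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def _cn_k(g, x, y):
    """Common-neighbor count and the two node degrees."""
    return len(g[x] & g[y]), len(g[x]), len(g[y])

def sorensen(g, x, y):
    cn, kx, ky = _cn_k(g, x, y)
    return 2 * cn / (kx + ky)

def hub_promoted(g, x, y):
    cn, kx, ky = _cn_k(g, x, y)
    return 2 * cn / min(kx, ky)

def hub_depressed(g, x, y):
    cn, kx, ky = _cn_k(g, x, y)
    return 2 * cn / max(kx, ky)

def lhn(g, x, y):
    cn, kx, ky = _cn_k(g, x, y)
    return cn / (kx * ky)

# For the pair (b, d): cn = 2, k(b) = k(d) = 2.
print(sorensen(graph, "b", "d"))      # 1.0
print(hub_promoted(graph, "b", "d"))  # 2.0
print(lhn(graph, "b", "d"))           # 0.5
```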

Path Distance. This is basically a global method in which the network's global structure is exploited to generate the measure over which link prediction is estimated. It is typically the measured distance between two nodes, used to identify their closeness; it is also known as the geodesic distance between the two nodes. Dijkstra's algorithm could be applied to retrieve the shortest path, but it would be inefficient for large, complex social networks.

Katz. This measure considers all paths between two nodes and gives the shortest paths the highest value. The contribution of a path decays exponentially with its length, so longer paths are assigned lower values; mathematically, it is represented as

Katz(x, y) = Σ_{l=1}^{∞} β^l · |paths⟨l⟩(x, y)|

where |paths⟨l⟩(x, y)| is the number of paths of length l between x and y


and β (0 < β < 1) is a damping factor controlling how much longer paths contribute to the score.
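A truncated Katz score can be sketched with adjacency-matrix powers; note that (A^l)[i][j] counts walks rather than simple paths, which is the usual computational shortcut. The toy graph and the cutoff max_len are illustrative assumptions:

```python
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")}

idx = {node: i for i, node in enumerate(nodes)}
n = len(nodes)
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1  # symmetric adjacency matrix

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz(x, y, beta=0.1, max_len=4):
    """Katz(x, y) ≈ Σ_{l=1..max_len} β^l · (walks of length l from x to y)."""
    score, walks = 0.0, [row[:] for row in A]  # walks holds A^l
    for l in range(1, max_len + 1):
        score += (beta ** l) * walks[idx[x]][idx[y]]
        walks = matmul(walks, A)
    return score

# b and d are not adjacent but have two length-2 walks (via a and via c):
print(katz("b", "d", beta=0.1, max_len=2))  # 0.1·0 + 0.01·2 = 0.02
```

A small β keeps the geometric series dominated by short paths, matching the intuition above.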

3 Experimental Methods and Material

The thorough literature review showed that social networks are exploited on the basis of their graph structure. Various methods have been used to compute links between nodes; these methods typically vary with the nature of the network (information networks, business networks, friendship networks and so on), so no single method can effectively address the link prediction problem. To simplify, the problem is therefore solved in two phases: in the first phase, the local structure of the network is exploited and a new resultant network with additional features is formed; in the second, this new network is processed with machine learning techniques to build a classifier for link prediction in a social network. A naïve Bayes classifier has been used for the experimental analysis.

Bayes Theorem. The theorem gives the probability of an event A given that event B has occurred; here B is designated as the evidence and A as the hypothesis. The attributes (predictors) are assumed independent, meaning the presence of one specific attribute does not affect another; this is why the method is called naive. Bayes' theorem is expressed as

P(A|B) = P(B|A) P(A) / P(B)

The Naïve Bayes Model. It is a supervised classification algorithm for binary-class and, in certain cases, multiclass classification problems, best suited to binary or categorical input values; our social network dataset indeed has binary class values for link prediction. It is also referred to as idiot Bayes or naive Bayes because the probability computation for a hypothesis is simplified: instead of computing the joint conditional probability of the attribute values P(a1, a2, a3 | h), the attributes are assumed conditionally independent given the class, so the probability is computed as P(a1|h) · P(a2|h) and so forth. A naive Bayes model is represented by a list of probabilities stored in a file: the class probabilities, and the conditional probabilities of each input value for every class present in the training dataset. Learning the model is simple: it amounts to counting the frequency of instances belonging to every class and of each input value x within each class; no additional optimization procedure is required.
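The counting-based learning just described can be sketched from scratch on a made-up binary-feature dataset (the features, labels and add-one smoothing are illustrative choices, not the chapter's exact setup):

```python
from collections import defaultdict

# Each row: (binary features, label); label 1 = link, 0 = no link.
train = [
    ((1, 1), 1), ((1, 0), 1), ((1, 1), 1),
    ((0, 0), 0), ((0, 1), 0), ((0, 0), 0),
]

def fit(rows):
    """Learning = counting: class frequencies and per-class feature-value counts."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # keyed by (class, feature index, value)
    for feats, label in rows:
        class_counts[label] += 1
        for i, v in enumerate(feats):
            feat_counts[(label, i, v)] += 1
    return class_counts, feat_counts

def predict(model, feats):
    """argmax over classes of P(c) · Π P(a_i | c), with add-one smoothing."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for c, cc in class_counts.items():
        p = cc / total  # prior P(c)
        for i, v in enumerate(feats):
            p *= (feat_counts[(c, i, v)] + 1) / (cc + 2)
        if p > best_p:
            best, best_p = c, p
    return best

model = fit(train)
print(predict(model, (1, 1)))  # 1
print(predict(model, (0, 0)))  # 0
```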


The experimental study is evaluated over a Wikipedia network whose dataset is downloaded from the SNAP Web site. The performance parameters used for analysis are precision, recall, F1 score and accuracy. The link prediction problem is a kind of binary classification problem in which a positive label designates the presence of a link between nodes and a negative label designates the absence of a potential link. Precision is determined by dividing the true positive count by the sum of the true positive and false positive counts. Sensitivity, or recall, is determined by dividing the true positive count by the sum of the true positive and false negative counts. The equations for performance evaluation are as follows:

Precision = True Positive / (True Positive + False Positive)

Recall = True Positive / (True Positive + False Negative)

F1 score = 2 · True Positive / (2 · True Positive + False Negative + False Positive)
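These measures (plus accuracy, which the study also reports) follow directly from confusion-matrix counts; the counts below are made-up numbers for illustration:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts: 90 TP, 10 FP, 5 FN, 95 TN.
p, r, f1, acc = metrics(tp=90, fp=10, fn=5, tn=95)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))  # 0.9 0.947 0.923 0.925
```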

4 Experimental Study

The dataset used for the experimental analysis is vote history data. It includes around 2794 nodes and around 103,747 votes cast among 7118 users, including existing admins (either voting or being voted on). Around half of the votes in the dataset are by existing admins, while the other half come from ordinary Wikipedia users. The dataset is downloaded from https://snap.stanford.edu/data/wiki-Vote.html. The network nodes are users, and a directed edge from node i to node j designates that user i has voted on user j. The naive Bayes network classification technique is used to create a classifier model for link prediction in a social network; the model is created using stratified tenfold cross-validation. As Table 1 shows, the classifier predicted around 90.37% of the instances correctly, leaving around 9.62% of instances incorrectly classified. The total time taken to build the model is 0.03 s, which is not much; although the selected network is small and the time may increase on real data in the future, it is reasonably fair.

Table 1 Detailed accuracy of the model classifier

Model               True positive   False positive   Precision   Recall   F-measure
AA + Naïve Bayes    1.000           0.750            0.901       1.000    0.948
JC + Naïve Bayes    0.991           0.000            1.000       0.991    0.996
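A hedged sketch of how such a labeled pair dataset could be assembled from a SNAP-style edge list ('#' comment lines, then one "FromNodeId ToNodeId" pair per line); the tiny inline graph, the undirected treatment of edges and the exhaustive negative sampling are illustrative simplifications, not the chapter's actual pipeline:

```python
# Hypothetical edge list in the SNAP format: '#' comments, then "src dst" pairs.
raw = """# FromNodeId ToNodeId
1 2
1 3
2 3
3 4
4 1
"""

adj = {}
for line in raw.strip().splitlines():
    if line.startswith("#"):
        continue
    u, v = line.split()
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)  # undirected view for the similarity feature

def jaccard(x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

# Positive examples: existing edges; negative examples: absent pairs.
nodes = sorted(adj)
positives = [(u, v, jaccard(u, v), 1)
             for u in nodes for v in adj[u] if u < v]
negatives = [(u, v, jaccard(u, v), 0)
             for i, u in enumerate(nodes) for v in nodes[i + 1:] if v not in adj[u]]
dataset = positives + negatives  # rows of (node, node, JC feature, link label)
print(len(positives), len(negatives))  # 5 1
```

Rows of this form (a similarity feature plus a binary link label) are what the naive Bayes classifier is trained on.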


When combined with Jaccard's coefficient, naive Bayes produces significantly better results, with accuracy improved to 99.12%; this classifier model is also built in negligible time. It correctly classified around 340 instances, against three incorrectly classified instances. Figures 2 and 3 represent the classification of true and false instances of the network.

Fig. 2 Graphical representation of classifier (AA + Naïve Bayes)

Fig. 3 Graphical representation of classifier (Jaccard’s coefficient + Naïve Bayes)


5 Conclusion and Future Enhancement

In this paper, online social networks are studied from the point of view of link prediction between sets of nodes in a large-scale online social network. In the process, we have introduced various local and global classical techniques which produce a measure used to identify a potential link between nodes. These dyadic structural techniques have been combined here with supervised machine learning: Adamic/Adar and Jaccard's coefficient are each combined with the naive Bayes classification technique to build a classifier. The experimental analysis shows that Jaccard's coefficient with naive Bayes produces more accurate results than the Adamic/Adar combination. Although the results may indicate some over-fitting compared to the previous approach, which is reasonably fair as well, the latter approach supersedes it in accuracy. The model was trained and tested on only one type of social network; exploiting other types of social networks may produce significant results and help generalize the model to other online social networks.

References

1. Liben-Nowell, D., & Kleinberg, J. (2007). The link prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019–1031.
2. Kautz, H., Selman, B., & Shah, M. (1997). Referral Web: Combining social networks and collaborative filtering. Communications of the ACM, 40(3), 63.
3. Chen, J., Geyer, W., Dugan, C., Muller, M., & Guy, I. (2009). Make new friends, but keep the old: Recommending people on social networking sites. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI'09 (pp. 201–210). New York: ACM. https://doi.acm.org/10.1145/1518701.1518735.
4. Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2006). Mixed membership stochastic block models for relational data with application to protein-protein interactions. In Proceedings of International Biometric Society-ENAR Annual Meetings.
5. Huang, Z., Li, X., & Chen, H. (2005). Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries.
6. Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM Workshop on Link Analysis, Counterterrorism and Security.
7. Pandey, B., Bhanodia, P. K., Khamparia, A., & Pandey, D. K. (2019). A comprehensive survey of edge prediction in social networks: Techniques, parameters and challenges. Expert Systems with Applications, Elsevier. https://doi.org/10.1016/j.eswa.2019.01.040.
8. Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks. Physical Review E.
9. Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, Elsevier, 25(3), 211.
10. Kunegis, J., Blattner, M., & Moser, C. (2013). Preferential attachment in online networks: Measurement and explanations. In Proceedings of the 5th Annual ACM Web Science Conference (WebSci'13) (pp. 205–214). New York: ACM.
11. Zhou, T., Lü, L., & Zhang, Y. C. (2009). Predicting missing links via local information. The European Physical Journal B, 71, 623. https://doi.org/10.1140/epjb/e2009-00335-8.

Optimizing Cost and Maximizing Profit for Multi-Cloud-Based Big Data Computing by Deadline-Aware Optimize Resource Allocation

Amitkumar Manekar and G. Pradeepini

Abstract Cloud computing is among the most powerful and in-demand technologies for businesses in this decade. "Data is the future oil" can be proved in many ways, as most business and corporate giants are deeply concerned about business data. To accommodate and process this data, a very expensive platform that works efficiently is required. Researchers and professionals have established and standardized several cloud computing standards, but some modifications, and major research toward big data processing on multi-cloud infrastructure, still need investigation. Reliance on a single cloud provider is challenging with respect to services such as latency, QoS and monetary cost unaffordable to application providers. We propose an effective deadline-aware resource management scheme through novel algorithms, namely job tracking, resource estimation and resource allocation. In this paper, we discuss two algorithms in detail and experiment in a multi-cloud environment: first we examine the job tracking algorithm, and then the job estimation algorithm. Utilizing multiple cloud service providers is a promising solution for an affordable class of services and QoS.

Keywords BDA · Resource allocator · Cloud computing · Optimization · Fair share · Cost optimization

A. Manekar (B) · G. Pradeepini
CSE Department, KLEF, Green Fields, Vaddeswaram, Andhra Pradesh 522502, India
e-mail: [email protected]
G. Pradeepini e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_3

1 Introduction

The last decade was a "data decade": many multi-national companies have changed their modes of operation based on data analysis. Big data and data analysis are essential and mandatory for every industry. Companies like Amazon, Google and Microsoft are ready with data processing platforms completely based on the cloud [1]; in other



sense, all social media companies are also targeting the cloud as a prominent solution, and Netflix and YouTube have already started using it [2]. Cloud computing has proved to be a very effective and reliable solution for multivariate, huge data, and researchers and professionals keep working to draw more possibilities from the existing cloud structure. One major and critical task is resource provisioning in a multi-cloud architecture; we address some of its issues by implementing a prominent algorithm in such an architecture. Cloud computing is available to each of us in three types [3]. The foremost is the public cloud platform, in which third-party providers are responsible for providing services on the public cloud; in most cases these services may be free or sold by service providers on demand, with customers paying only per usage for the CPU cycles, storage or bandwidth their applications consume [4–6]. The second is the private cloud platform, in which the entire infrastructure is privately owned by the organization and completely maintained and managed via internal resources; an organization for which maintaining the entire infrastructure is too difficult can instead own a VPC (Virtual Private Cloud), where a third-party cloud provider owns the infrastructure but it is used under the organization's premises [7–9]. The third is the hybrid cloud platform which, as the name indicates, mixes computing resources from public and private services; it is rapidly adopted by many as a cost-saving, readily available on-demand option for fast-moving digital business transformation. Cloud providers have distributed their infrastructure by expanding data centers into different geographical regions worldwide [4–6]; Google itself operates 13 data centers around the globe. Managing distributed data centers while maximizing profit is a current problem.
Ultimately, the customer is affected by the high cost and maintenance charges of these data centers. This cost is bound by four principles of applications serving big data. Numerous cost-effective, parallel and time-effective tools with their own programming paradigms are available for big data processing. The master player in every such tool and big data application is resource management, which uses the available resources and manages the trade-off between cost and result. Complexity, scale, heterogeneity and hard predictability are the key factors of these big data platforms. Complexity, which lies at the core of the architecture, covers challenges such as proper scheduling of resources, managing power, the storage system and more. Scale depends entirely on the target problem: data dimensions and parallelism under tight deadlines [10]. Heterogeneity is a technology need, covering maintainability and the evolving nature of the hardware. Hard predictability is the crunching of the three factors explained earlier together with the combined effect of hardware trade-offs.

2 Literature Survey and Gap Identification

Inacio and Dantas in 2014 specified a characterization [11] dealing with optimization problems related to large datasets and noted that scale exacerbates them. A variety of aspects affect the performance of scheduling policies, such as data volume


(storage), data variety, data velocity, security and privacy, cost, connectivity and data sharing [12, 13]. The resource manager can be organized in a two-layer architecture, as shown in Fig. 1. The job scheduler [12] is responsible for allocating resources across the various different jobs running at the same time. Figure 1 represents the locally executable resources; scale exacerbates the known management and dimensioning problems, both in relation to architecture and to resource allocation and coordination [14, 15]. The task-level scheduler, on the other hand, decides how to assign tasks to multiple task executors for each job [10, 16]. The cluster scheduler treats each job as a black box and executes a general policy and strategy. Our effort is that, by optimizing application-specific features, we finally optimize resource scheduling decisions and achieve better performance for advanced data analytics [17]. Figure 2 shows various open-source big data resource management frameworks [18]. In much of the literature, it is observed that most available big data processing frameworks are open source. Some proprietary frameworks carry license fees and require specialized high-end infrastructure; on the contrary, open-source frameworks use commodity hardware with marginal variation in requirements. Spark is a mainstream data streaming framework, favored by industry, which can be extended and ultimately used in various IoT-based application data analyses. YARN is the heart of Hadoop, handling global resource management (ResourceManager) and per-application management (ApplicationMaster) [19–21]. Regarding research gap identification and problem formulation, some observations are mentioned below.

Fig. 1 Hierarchical resource management system


Fig. 2 Classification of big data resource management frameworks

1. Apache Spark, with its fault tolerance mechanism and characterization to support data streaming, is a prominent platform.
2. Spark MLlib and Flink-ML offer a variety of machine learning algorithms and utilities to exploit distributed and scalable big data applications.
3. More focus should be placed on a few issues such as throughput, latency and machine learning.
4. For deadline-aware job tracing and scheduling, resources should be managed instead of fine-grained splitting of the pooled resources when deadlines are not met.
5. Deadlines should be achieved without wasting resources for IoT workloads in a resource-constrained environment.

3 Problem Formulation

Missing a deadline disturbs the entire data-intensive processing pipeline: it leads to underused resources, incurs cloud usage cost for both the cloud service provider and the user, and leads to poor decision making [22, 23]. To address this issue, we designed a framework that is framework-agnostic and does not rely on job repetition or pre-emption support. On the other hand, the focus in this work is on utilizing job histories and statistics to control job admission. Instead of traditional fair share resource utilization, we design a deadline-aware optimized resource allocation policy by implementing two algorithms: one for job tracking, and the other for resource estimation and resource allocation [8, 24]. Consideration of the second algorithm


is based on the decisions needed for effective deadline-aware resource allocation alone. Let us discuss the actual problem formulation. To overcome such issues, some research objectives are drawn from the intensive literature review and research gap identification. Our objectives for the formulation of the problem are listed below.

1. Design a policy for improving deadline adherence and fairness in resource allocation by using past workload data and deadline information for more diverse workloads.
2. Use the information from objective 1 to estimate the fraction of the requested resources needed to complete a job before its deadline expires, under hard and soft deadline scenarios.
3. Optimize the fair share policy for allocating resources while jobs run on fewer or more resources according to a changing workload, with strategies for repeating jobs in a fair share allocation.
4. Compare the result with the YARN resource allocator and demonstrate the improvement in finding the best possible low-cost solution, in terms of both cloud providers and users, for IoT workloads.

With the above research objectives and scenarios in mind, a framework for a fair share resource allocator has been proposed whose goal is to meet all objectives one by one. In this section, two algorithms are proposed to fulfill these objectives: the first is a job tracking algorithm, and the second is a job estimation algorithm. The proposed algorithms are constructed with a view to fair share and deadline-aware allocation, with admission control by resource negotiation. Our approach is to negotiate CPU and other commodity resources for a job's execution so as to meet deadlines in a resource-constrained environment. To execute a newly arrived job, note is taken of the execution time data of previous jobs. By analyzing each completed job, an estimate can be drawn of the minimum number of CPUs the job would have needed to meet its deadline; this can be noted as CPU_Deadline and is calculated as the job's compute time divided by its deadline (CPU_Deadline = C_time / deadline). M_cpus denotes the maximum number of CPUs that can be assigned to any job. Algorithm 1 describes the Job_Tracking algorithm, which calculates the deadline measure and estimates the minimum number of CPUs.

Algorithm: job tracking and resource allocation system based on Apache Spark. For the execution of the algorithm, an application is submitted to the Spark cluster with the desired RSA (resource amounts) per application with respect to the possible resources: computing (CPU), memory (M) and total executors (Ex). Prior knowledge of the total resource capacity of the cluster is essential. In Apache Spark, master and worker nodes are deployed on cloud virtual machines. We assume these virtual machines (VMs) are homogeneous; by extension, all virtual machines have the same computation power, i.e., the same CPU cores, storage and computational memory [25].
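The two quantities defined above can be sketched directly (the job sizes and deadlines below are hypothetical numbers, not measurements from the paper):

```python
def cpu_deadline(compute_time, deadline):
    """CPU_Deadline = C_time / deadline: the minimum average number of CPUs
    needed to finish compute_time CPU-seconds of work before the deadline."""
    return compute_time / deadline

def max_assignable_cpus(req_task, cluster_capacity):
    """M_cpus: the job's requested task parallelism capped at cluster capacity."""
    return min(req_task, cluster_capacity)

# A job with 600 CPU-seconds of work and a 120 s deadline needs >= 5 CPUs on average.
print(cpu_deadline(600, 120))     # 5.0
print(max_assignable_cpus(8, 6))  # 6
```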


Algorithm 1 Job_Tracking

1   Initiation of Asp
2   Accept Fun Job_Track(C_time, R_Task, D, N_cpuAllo, R)
3   CPU_Deadline = C_time / D
4   M_Cpus = min(Req_Task, CC)
5   ReqMinRate = CPU_Deadline / Max_CPU
6   ReqminList.add(ReqMinRate)
7   CPU_Frac_Min = min(ReqminList)
8   CPU_Frac_Max = max(ReqminList)
9   CPU_Frac_Last = N_cpuAllo / CPU_Frac_Max
10  Success_Last = Success
11  Function Ends
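The bookkeeping in Algorithm 1 can be sketched in Python. This is a hedged transcription: the state names mirror the pseudocode, Max_CPU in step 5 is interpreted as the M_Cpus computed in step 4, and the surrounding Spark integration is assumed rather than shown:

```python
class JobTracker:
    """Sketch of Algorithm 1: after each job completes, record its minimum
    required CPU rate and keep running min/max watermarks."""

    def __init__(self, cluster_capacity):
        self.cluster_capacity = cluster_capacity
        self.req_min_list = []
        self.cpu_frac_min = None
        self.cpu_frac_max = None
        self.cpu_frac_last = None
        self.success_last = None

    def job_track(self, c_time, req_task, deadline, n_cpu_alloc, success):
        cpu_deadline = c_time / deadline               # step 3: min average CPUs
        m_cpus = min(req_task, self.cluster_capacity)  # step 4: achievable parallelism
        req_min_rate = cpu_deadline / m_cpus           # step 5: fraction of it needed
        self.req_min_list.append(req_min_rate)         # step 6
        self.cpu_frac_min = min(self.req_min_list)     # step 7
        self.cpu_frac_max = max(self.req_min_list)     # step 8
        self.cpu_frac_last = n_cpu_alloc / self.cpu_frac_max  # step 9
        self.success_last = success                    # step 10
        return req_min_rate

tracker = JobTracker(cluster_capacity=16)
rate = tracker.job_track(c_time=600, req_task=8, deadline=120,
                         n_cpu_alloc=6, success=True)
print(rate)  # (600/120) / min(8, 16) = 0.625
```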

Algorithm 2, Job Estimation, based on Apache Spark: for the execution of the algorithm, an application is submitted to the Spark cluster with the desired RSA per application with respect to the possible resources: computing (CPU), memory (M) and total executors (Ex). The algorithm specifies a fair resource allocation system (FRAS) based on Apache Spark; prior knowledge of the total resource capacity of the cluster is essential. It uses the data analyzed in Algorithm 1 for each previously completed job to finish jobs that meet their deadlines with maximum parallelism; Dimopoulos et al. [18] named their algorithm Justice. In this paper, we implement it in the same way as mentioned in [26], with modifications with respect to the objectives drawn earlier in this section. To bootstrap the system, the algorithm initially admits all jobs, behaving as a fair share resource allocator first.

Algorithm 2 Job_Estimation

1  Initiation of Asp
2  Accept Fun Job_Track(C_time, R_Task, D, N_cpuAllo, R)
3  CPU_Deadline = C_time / D
4  M_Cpus = min(Req_Task, CC)
5  ReqMinRate = CPU_Deadline / Max_CPU
6  If ReqMinRate > CPU_Frac then
7    CPU_Frac = ReqMinRate
8  End If
9  Function Ends
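Algorithm 2's watermark update can likewise be sketched (same interpretation of Max_CPU as in the Algorithm 1 sketch; cpu_frac is the running fraction assumed to feed the admission decision):

```python
def job_estimate(c_time, req_task, deadline, cluster_capacity, cpu_frac):
    """Sketch of Algorithm 2: raise the running CPU-fraction watermark when a
    job's minimum required rate exceeds it; the watermark informs admission."""
    cpu_deadline = c_time / deadline              # step 3
    m_cpus = min(req_task, cluster_capacity)      # step 4
    req_min_rate = cpu_deadline / m_cpus          # step 5
    if req_min_rate > cpu_frac:                   # steps 6-8
        cpu_frac = req_min_rate
    return cpu_frac

frac = 0.0
frac = job_estimate(600, 8, 120, 16, frac)  # demanding job raises the watermark
frac = job_estimate(100, 8, 100, 16, frac)  # lighter job (rate 0.125) leaves it
print(frac)  # 0.625
```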


4 Static and Dynamic Performance Analysis

We proposed novel algorithms for resource-constrained environments with deadlines and constrained cluster capacities (numbers of CPUs). The static and dynamic performance of these algorithms will be evaluated on certain parameters: fairness, deadline satisfaction, efficient resource usage and cluster utilization. These parameters are very helpful for enhancing the static and dynamic stability of a multi-cloud environment whose resources are provided by various cloud service providers; such providers may not be situated in the same geographical area and may not perform identically. The proposed set of algorithms is statistically evaluated on the said parameters using a simulation developed in Java and Python over the different BDA analysis tools. The proposed algorithms address problems related to fairness, deadline satisfaction, efficient resource usage and cluster utilization, and facilitate appropriate selection and management; ultimately, the cost of data-intensive applications is minimized while the QoS specified by users is met. The expected results on the basis of these algorithms are discussed below.

A. Fairness Evaluation. Fair share mechanisms can violate fairness in certain conditions, such as when CPU demand exceeds availability. Fairness violations happen because future workload prediction is complicated and not anticipated by this kind of mechanism: a job with high resource demand may not get resources and be dropped from execution, and jobs waiting in the queue can miss their deadlines under heavy workload. The proposed algorithms mitigate this kind of problem with fair share resource allocation and deadline awareness in a resource-constrained environment.

B. Deadline Satisfaction. The admission control mechanism is very important for this parameter. Without admission control, admitted jobs may not meet their deadlines; unnecessary queuing of jobs and resource congestion may lead to jobs being dropped from the execution queue. The proposed algorithm is designed to achieve a larger fraction of deadline successes overall.

C. Efficient Resource Usage. For a fixed workload, resource scarcity is not a problem; the proposed algorithm gives jobs a fair chance to get extra resources and provision to expand their resource demands during execution. This is carried out conservatively, prioritizing fairness and deadline success over resource saturation.

D. Cluster Utilization. It is very complicated for a fair share resource allocator to enhance cluster utilization without implementing admission control and proper resource utilization techniques. Hence, the proposed algorithm takes care to execute more workload without making CPUs too busy, by analyzing the duration for which CPUs are idle.


5 Experimental Setup

The existing fair share resource allocator does not take into consideration the deadline of each individual job. The general assumption in this kind of resource allocation is that every job can run indefinitely and that there is no limit on the turnaround time a job's owner is willing to tolerate. The proposed algorithm extends the basic allocator by considering job deadlines in a resource-constrained environment. A trace-based simulation developed in Python and Java, with admission control at job submission, will give the desired result. We are in the phase of implementing this for different resource-constrained clusters with a variety of hardware configurations. For the entire experimental setup, nodes run on the Ubuntu 12.04 Linux system with the MapReduce Hadoop stack.

6 Results Obtained and Work in Progress

The proposed algorithm is promising in tracking the success of its allocation decisions and improving its future allocations accordingly. Every time a job completes, it updates a cluster-wide model that includes information about the duration, size, maximum parallelization, deadline and provided resources for each job. If the job was successful, the proposed algorithm becomes more optimistic, providing the jobs that follow with fewer resources in the hope that they will still meet their deadlines. Next, we compare the result of the proposed algorithm with the existing methodology in the big data analytics framework. The novelty of the proposed algorithm is that, if a job is unsuccessful, it (like Justice) provides more conservative allocations to make sure no more jobs miss their deadlines.

7 Expected Contribution to the Literature

Our research aims to satisfy deadlines and preserve fairness to enable reliable use of multi-analytics systems in resource-constrained clusters. It achieves this in a framework-agnostic way by utilizing admission control and predicting resource requirements without exploiting job repetitions. A key point of our research is its applicability without costly modifications and maintenance in existing popular open-source systems like Apache Mesos and YARN. Thus it requires minimal effort to integrate with the resource manager, without the need to adapt to API or structural changes of the processing engines.


8 Conclusion and Future Work

Modern big data analytics systems are designed for very-large-scale, fault-tolerant operation, which has brought new value to the corporate industry. In every sector, whether health care, tourism, bioinformatics, education, finance, e-commerce, social networks, sports or others, fast analysis and strong support for big data analysis are required. The advent of IoT brings the combined operation of big data processing systems to smaller, resource-constrained and shared clusters. With the advancement of cloud-enabled big data processing, works are assigned with low latency, and fair share resource allocation and deadline optimization are the challenges. We try to mitigate the problem with the proposed algorithms in a convincing way that can enable faster and more prominent BDA across the various available tools such as Hadoop and Spark. Our proposed algorithms are in the implementation phase. As future work, we are implementing this work as a lightweight API-based integration module for resource management.

References

1. Gera, P., et al. (2016). A recent study of emerging tools and technologies boosting big data analytics.
2. Shvachko, K., et al. (2010). The Hadoop distributed file system. In Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), Washington, DC, USA.
3. George, L. (2011). HBase: The definitive guide: Random access to your planet-size data. O'Reilly Media, Inc.
4. Ghemawat, S., & Dean, J. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM.
5. Malik, P., & Lakshman, A. (2010). Cassandra: A decentralized structured storage system. ACM SIGOPS Operating Systems Review.
6. Zaharia, M., et al. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, San Jose, CA.
7. Vavilapalli, V. K., et al. (2013). Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th ACM Annual Symposium on Cloud Computing, Santa Clara, California.
8. Hu, M., et al. (2015). Deadline-oriented task scheduling for MapReduce environments. In International Conference on Algorithms and Architectures for Parallel Processing (pp. 359–372). Berlin: Springer.
9. Golab, W., et al. (2018). OptEx: Deadline-aware cost optimization for Spark. Technical report, available at https://github.com/ssidhanta/OptEx/blob/master/optex_technical.pdf, 01 2018.
10. Hindman, B., et al. (2011). Mesos: A platform for fine-grained resource sharing in the data center. In NSDI (pp. 22–22).
11. Siddiqi, F. (2018). Netflix at Spark+AI Summit 2018.
12. Laney, D., et al. (2001). 3D data management: Controlling data volume, velocity, and variety.
13. Hindman, B., et al. (2011). Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, Boston, MA, USA.
14. Ghemawat, S., et al. (2004). MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (Vol. 6 of OSDI'04, pp. 10–10).
15. Pradeepini, G., et al. (2016). Cloud-based big data analytics: A review. In Proceedings—2015 International Conference on Computational Intelligence and Communication Networks, CICN IEEE 2016 (pp. 785–788).
16. Misra, V., et al. (2007). PBS: A unified priority-based scheduler. In ACM SIGMETRICS Performance Evaluation Review (Vol. 35, No. 1, pp. 203–214). ACM.
17. Zaharia, M., Das, T., Armbrust, M., et al. (2016). Apache Spark: A unified engine for big data processing. Communications of the ACM.
18. Dimopoulos, S., Krintz, C., et al. (2017). Justice: A deadline-aware, fair-share resource allocator for implementing multi-analytics. In 2017 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 233–244).
19. Jette, M. A., et al. (2003). SLURM: Simple Linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing (pp. 44–60). Berlin: Springer.
20. Pradeepini, G., et al. Experimenting cloud infrastructure for tomorrow's big data analytics. International Journal of Innovative Technology and Exploring Engineering, 8(5), 885–890.
21. Cheng, S., et al. (2016). Evolutionary computation and big data: Key challenges and future directions. In Proceedings of the Data Mining and Big Data, First International Conference, DMBD 2016, Bali, Indonesia (pp. 3–14).
22. Singer, G., et al. (2010). Towards a model for cloud computing cost estimation with reserved instances. CloudComp.
23. Xiong, N., et al. (2015). A walk into metaheuristics for engineering optimization: Principles, methods, and recent trends. International Journal of Computational Intelligence Systems, 8, 606–636.
24. YARN Fair Scheduler: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
25.

38

A. Manekar and G. Pradeepini

14. Ghemawat, S., et al. (2004). MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (Vol. 6 of OSDI’04, pp. 10–10). 15. Pradeepini, G., et al. (2016). Cloud-based big data analytics a review. In Proceedings—2015 International Conference on Computational Intelligence and Communication Networks, CICN IEEE 2016 (pp. 785–788). 16. Misra, V., et al. (2007). PBS: A unified priority-based scheduler. In ACM SIGMETRICS Performance Evaluation Review (Vol. 35. 1, pp. 203–214). ACM. 17. Zaharia, M., Das, T., & Armbrust, M., et al. (2016). Apache spark: A unified engine for big data processing. Communications of the ACM. 18. Dimopoulos, S., & Krintz, C., et al. (2017). Justice: A deadline-aware, fair-share resource allocator for implementing multi-analytics. In 2017 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 233–244). 19. Jette, M. A., et al. (2003). Slurm: Simple Linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing (pp. 44–60). Berlin: Springer. 20. Pradeepini, G., et al. Experimenting cloud infrastructure for tomorrows big data analytics. International Journal of Innovative Technology and Exploring Engineering, 8(5), 885–890. 21. Cheng, S., et al. (2016). Evolutionary computation and big data: Key challenges and future directions. In Proceedings of the Data Mining and Big Data, First International Conference, DMBD 2016, Bali, Indonesia, (pp. 3–14). 22. Singer, G., et al. (2010). Towards a model for cloud computing cost estimation with reserved instances. CloudComp. 23. Xiong, N., et al. (2015). A walk into metaheuristics for engineering optimization: Principles, methods, and recent trends. International Journal of Computational Intelligence Systems, 8, 606–636. 24. https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarnsite/FairScheduler.html. for YARN Fair Scheduler. 25. 
Chestna, T., & Imai, S., et al. Accurate resource prediction for hybrid IAAS clouds using workload-tailored elastic compute units. ser. UCC’13. 26. Pradeepini, G., et al. (2017). Opportunity and challenges for migrating big data analytics in cloud. In IOP Conference Series: Materials Science and Engineering.

A Comprehensive Survey on Passive Video Forgery Detection Techniques Vinay Kumar, Abhishek Singh, Vineet Kansal, and Manish Gaur

Abstract In recent years, video forgery detection has become a significant issue in video forensics. Unauthorized changes to video frames degrade the authenticity and integrity of the original content. With the progress of technology, video-processing tools and techniques for altering recordings are readily available to forgers. Modifications to a video are important to detect, since the video may be used in an authentication process; video authenticity therefore needs to be verified. A video can be tampered with in various ways, for example, frame insertion, deletion, duplication, copy-and-move, splicing and so on. This article presents forgery detection techniques, namely inter-frame forgery, intra-frame forgery and compression-based forgery detection, that can be used for video tamper detection. A thorough analysis of newly developed passive video forgery detection techniques is helpful for identifying open problems and new opportunities in the area of passive video forgery detection. Keywords Video forgery detection · Video tamper detection · Passive-blind video forensics · Video authentication

V. Kumar · M. Gaur Department of Computer Science and Engineering, Centre for Advanced Studies, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India e-mail: [email protected] M. Gaur e-mail: [email protected] A. Singh (B) · V. Kansal Department of Computer Science and Engineering, Institute of Engineering and Technology Lucknow, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India e-mail: [email protected] V. Kansal e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_4


1 Introduction
Video forgery in the modern era requires significant attention. The prime reason is that multimedia is the preferred choice for transmitting information because of its low encryption cost. Information within multimedia is processed through frame reading. Because this mechanism is so widely used in transmission, it is maliciously attacked by hackers and frames are altered. To this end, researchers use distinct mechanisms to perform encryption and to detect any forgery within video frames. Digital video tampering modifies or changes the contents of a video to produce a doctored or fake video [1]. Attacks that change a video can be divided into three domains: spatial, temporal and spatial–temporal. Tampering can be done using various techniques [2]. The following types of tampering are applied to videos:
• Shot-level tampering: A scene is detected in the video, and this scene is then copied to another place or manipulated in situ. This tampering is used at the temporal or spatial level.
• Frame-level tampering: Frames are first extracted from the video, and tampering is then done on these frames. The forger may remove, add or copy frames to change the contents of the video. It is a temporal tampering mechanism used to alter frames within the video.
• Block-level tampering: This is applied to blocks of a video, i.e., any specified area of the video frames. Blocks are cropped and replaced in the video. It is spatial tampering performed at the block level.
• Pixel-level tampering: Video frames are changed at the pixel level; pixels are modified, copied or replaced [3]. Spatial attacks are performed at the pixel level.
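As a toy illustration of how frame-level duplication (the second tampering type above) can be exposed, one can hash each frame's raw bytes and look for repeats. This is a naive sketch, not a method from the surveyed literature: practical detectors must tolerate re-compression and therefore compare robust features rather than exact bytes, and the frame data below is invented for illustration.

```python
import hashlib

def find_duplicate_frames(frames):
    """Return {frame_index: first_index} for frames whose raw bytes
    repeat earlier in the sequence (a naive frame-duplication check)."""
    seen = {}        # hash digest -> first index where it appeared
    duplicates = {}
    for i, frame in enumerate(frames):
        digest = hashlib.sha256(bytes(frame)).hexdigest()
        if digest in seen:
            duplicates[i] = seen[digest]
        else:
            seen[digest] = i
    return duplicates

# Toy "video": each frame is a small list of 8-bit pixel values.
video = [[10, 20, 30], [40, 50, 60], [10, 20, 30], [70, 80, 90]]
print(find_duplicate_frames(video))  # {2: 0} -- frame 2 repeats frame 0
```

Hashing makes the search linear in the number of frames, which is why exact-match checks like this are often used as a cheap first pass before more expensive feature matching.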

1.1 Video Forensic
The last decade has seen video forensics become an important field of study. As shown in Fig. 1, it is divided into three categories [4–6].

Fig. 1 Types of video forensic: source video identification, differentiating original and edited video, and forgery detection


The categories are source recognition, distinguishing computer-generated from actual video, and forgery detection. The first group emphasizes identifying the source of a digital recording, such as a mobile phone, camcorder or camera. The second aims to differentiate between real and edited videos. The third, forgery detection, is aimed at finding proof of tampering in digital video data.

1.2 Objective of Digital Video Forensic
Digital video forensics is concerned with three main tasks, as shown in Fig. 2. To tackle the challenge of digital content authentication, the video forensics area provides a set of tools and techniques known collectively as tamper or forgery detection techniques. Minute adjustments to digital video or image content can cause real societal and legal problems. Altered recordings may be used for fabricating misleading news accounts or deceiving individuals. Large numbers of users manipulate media data on social networking sites such as Yahoo, Twitter, TikTok, Instagram, Facebook and YouTube.
This paper is organized as follows: Section 2 gives details of the video forgery detection methods used to counter the above tampering methods, together with a qualitative analysis of passive video forgery detection, and Sect. 3 presents a comparative analysis of different techniques. Section 4 presents highlights and issues in video forgery detection, after which we conclude and present the future scope of this paper in Sect. 5.

Fig. 2 Objectives of video forensic: camera identification, tamper detection and hidden data recovery

2 Video Forgery Detection Mechanisms and Qualitative Analysis of Passive Techniques
Video forgery detection aims to establish the authenticity of a video and to expose the potential modifications and forgeries that the video might have undergone [7]. Undesired post-processing operations or forgeries are generally irreversible and leave some digital footprints. Video forgery detection techniques scrutinize these footprints


Fig. 3 Classification of video forgery detection: the active approach (watermarking, digital signature) and the passive approach (inter-frame, intra-frame and compression-based forgery detection)

in order to differentiate between the original and forged videos. When a video is forged, some of its fundamental properties change; detecting these changes is what video forgery detection techniques are used for. There are two fundamental approaches to video forgery detection: the active approach and the passive approach, as shown in Fig. 3.

2.1 Active Approach
Active forgery detection includes techniques such as digital watermarking and digital signatures, which are helpful for authenticating content ownership and detecting copyright violations [8]. Though the basic application of watermarking and signatures is copyright protection, they can also be used for fingerprinting, forgery detection, error concealment, etc. The active approach has several drawbacks: it requires a signature or watermark to be embedded during the acquisition phase at the time of recording, or an individual to embed it later after the acquisition phase. This restricts the application of the active approach because of the need for distinctive hardware such as specially equipped cameras. Other factors that impact the robustness of watermarks and signatures are compression, scaling, noise, etc.
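To make the active approach concrete, the following is a minimal fragile-watermark sketch in Python: watermark bits are embedded in the least significant bit of each pixel, and any later pixel change breaks verification. This is a toy scheme invented for illustration, not one of the watermarking methods in the cited literature, and the frame and watermark values are made up; it also shows the compression/noise fragility mentioned above, since any change to the LSBs destroys the mark.

```python
def embed_watermark(pixels, bits):
    """Embed watermark bits into the least significant bit of each pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract_watermark(pixels, n):
    """Read back the first n least significant bits."""
    return [p & 1 for p in pixels[:n]]

frame = [52, 55, 61, 66, 70, 61, 64, 73]   # toy 8-bit pixel row
mark  = [1, 0, 1, 1, 0, 0, 1, 0]           # toy watermark bits
marked = embed_watermark(frame, mark)

assert extract_watermark(marked, 8) == mark   # untouched frame verifies
tampered = marked[:3] + [128] + marked[4:]    # one pixel altered
print(extract_watermark(tampered, 8) == mark) # False: tamper detected
```

The check fails as soon as a single marked pixel is modified, which is exactly the property an active scheme relies on, and also why it cannot be applied to footage that was never watermarked at acquisition time.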

2.2 Passive Approach
Passive forgery detection techniques are considered an advancing route in digital security, and the approach works in contrast to the active approach. It operates without the constraint of specialized hardware, nor does it require any first-hand information about the video contents; thus, it is also called the passive-blind approach. The basic assumption made by this approach is that videos have some inherent properties or features which are consistent in original videos; when a video is forged, these patterns are altered. Passive approaches extract these features from a video and analyse them for different forgery detection purposes. Video forgery is


Fig. 4 Features used for video forgery detection: camera/sensor artefacts, coding artefacts, motion features and object features

sometimes not identified because of defects in the software system; such defects can be removed by predicting software defects early [9, 10]. Different types of descriptive features have been used by various researchers to accomplish the task of forgery detection [11–16]; Fig. 4 presents the features used for video forgery detection. Thus, to overcome the inefficiency encountered in the active approach, the passive approach to video forgery detection can be used. The passive approach proves to be better than the active one, as it works on first-hand information without the need for extra information bits and hardware requirements. It relies entirely on the available forged video data and its intrinsic features and properties, without needing the original video data. To be specific, active techniques include motion detection mechanisms, while passive techniques include static mechanisms. Forgery under static mechanisms falls under inter-frame-, intra-frame- and compression-based mechanisms.

2.2.1 Inter-frame Forgery Detection

Inter-frame forgery detection exploits the temporal similarity between the frames of a video [17]. The parity difference between frames is used as a footprint to locate anomalies within the video frames, and it is examined through even- or odd-parity check mechanisms. The parity check determines whether a transmission includes an even or an odd number of frames; if frames are sent with even parity but received with odd parity, forgery is detected. Table 1 describes the different inter-frame forgery detection techniques based on frame deletion, insertion and duplication, together with their advantages, disadvantages and accuracy results. Table 2 describes the different inter-frame forgery detection techniques based on copied-frame analysis, together with their advantages, disadvantages and accuracy results.
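The temporal-consistency idea underlying these inter-frame methods can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any paper in the tables: frames are modelled as flat lists of grayscale values, and the spike threshold (three times the median inter-frame difference) is an arbitrary choice for the sketch.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def flag_discontinuities(frames, factor=3.0):
    """Flag transitions whose difference exceeds `factor` times the median
    difference -- a crude stand-in for the consistency tests used by the
    surveyed inter-frame methods to localize insertion/deletion points."""
    diffs = [frame_difference(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    median = sorted(diffs)[len(diffs) // 2]
    return [i + 1 for i, d in enumerate(diffs) if median and d > factor * median]

# Smoothly varying frames with an abrupt foreign frame spliced in at index 3.
video = [[10, 10, 10], [12, 12, 12], [14, 14, 14],
         [200, 200, 200], [16, 16, 16], [18, 18, 18]]
print(flag_discontinuities(video))  # [3, 4]: into and out of the spliced frame
```

Real detectors replace the raw pixel difference with more robust statistics (optical-flow magnitude, histogram similarity, prediction-residual energy), but the pattern is the same: an inserted or deleted frame leaves a spike in an otherwise smooth inter-frame signal.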

2.2.2 Intra-frame Forgery Detection

Intra-frame forgery detection looks for forgery within individual video frames. These mechanisms include copy-move forgery, splicing, etc., by which the image frames within videos are altered.

Table 1 Inter-frame forgery detection techniques based on frame deletion, insertion and duplication

[18] Detection of Inter-frame Forgeries in Digital Videos (2018). Description: proposes a methodology that uses residual and optical-flow consistency to detect frame insertion, duplication and removal in videos encoded in MPEG-2 and H.264; it detects forgeries in videos exhibiting object motion. Advantages: reduces conflicting results and gives precise localization of the forgery. Limitation: performance suffers when high-illumination videos are used. Result: average detection accuracy of around 83%.

[19] Inter-frame Passive-Blind Forgery Detection for Video Shot Based on Similarity Analysis (2018). Description: a passive-blind video forensics scheme in two parts: hue-saturation-value (HSV) colour histogram comparison, and speeded-up robust features (SURF) extraction with fast library for approximate nearest neighbours (FLANN) matching as a double-check. Advantage: motion-based detection is done accurately using a tangent-based approach. Limitation: motion within the video could be detected more accurately with a noise-handling procedure. Result: accuracy of the scheme is 99.01%.

[20] Inter-frame Forgery Detection in H.264 Videos Using Motion and Brightness Gradients (2017). Description: proposes a new forensic footprint based on the variation of macro-block prediction types (VPF) in P-frames, and also estimates the size of a GOP. Advantage: gives reliable results with an improved detection rate. Limitation: not efficient when more than one compression is applied. Result: detection rate is good, and the method handles both CBR and VBR.

Table 2 Inter-frame forgery detection techniques based on copied-frame analysis

[21] Video Inter-frame Forgery Detection Approach for Surveillance and Mobile-Recorded Videos (2017). Description: proposes a hybrid mechanism that uses motion and gradient features to estimate the variation between frames; forensic artefacts are analysed using an objective methodology. Advantage: defects are detected automatically using a spike count. Limitation: unable to detect forged frames in slow-motion videos. Result: detects between a minimum of 10 and a maximum of 60 forged frames.

[22] A New Copy Move Forgery Detection Method Resistant to Object Removal of Uniform Background Forgery (2016). Description: proposes a method for detecting copy-move forgery in videos using a hybrid of the AKAZE feature and RANSAC to detect copied frames and eliminate false matches; it detects object-removal and replication forgery. Advantage: efficient and suitable for frame removal/insertion forgeries. Limitation: feature detection is slower than the ORB feature. Result: detection remains good even if the forged image has been rotated or blurred, with 98.7% accuracy.

[23] Inter-frame Video Forgery Detection and Localization Using Intrinsic Effects of Double Compression on Quantization Errors of Video Coding (2016). Description: proposes inter-frame manipulation detection and localization in MPEG-x-coded video based on traces of quantization error in P macro-block residual errors. Advantage: the picture is split into different frequency bands so that particular image and video frame blocks can be processed easily. Limitation: does not split the image to the maximum decomposition level. Result: accuracy of detecting errors is high, at 99.46%.

[24] Detection of Re-compression, Transcoding and Frame Deletion for Digital Video Authentication (2016). Description: proposes a forensic technique that identifies recompressed or transcoded videos by inspecting the video's optical flow; detection accuracy is good because it is not limited by the number of post-production compressions. Advantage: inexpensive and independent of heuristically computed thresholds. Limitation: frame addition is not considered. Result: the frame-removal detection technique achieved an average accuracy of 99.3%.

[25] Chroma Key Background Detection for Digital Video Using Statistical Correlation of Blurring Artifact (2016). Description: proposes a blurring-artefact-based technique for detecting chroma-key forgery in videos; it first extracts frames with a blurring effect and then analyses them for forged regions. Advantage: gives a better recall rate with efficiency. Limitation: background colour is not handled. Result: achieves a detection accuracy of 91.12%.


To detect such forgery, boundary colours and differences between frames are analysed, and results are expressed in terms of the bit error rate. Table 3 describes the different intra-frame forgery detection techniques used in video forgery, with their advantages, disadvantages and accuracy results.
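The core idea of copy-move detection inside a single frame can be sketched as follows. This is a naive exact-match illustration assuming a flat grayscale frame and bit-identical duplication; the surveyed methods instead match robust descriptors such as SIFT or AKAZE so that re-compression, rotation and blur do not break the match.

```python
def find_copied_blocks(frame, width, block=2):
    """Split a flat grayscale frame into non-overlapping block x block tiles
    and report tiles with identical content at different positions -- the
    core idea behind exact-match copy-move detection."""
    height = len(frame) // width
    tiles = {}     # tile content -> first (x, y) position
    matches = []
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            key = tuple(frame[(by + r) * width + bx + c]
                        for r in range(block) for c in range(block))
            if key in tiles:
                matches.append((tiles[key], (bx, by)))
            else:
                tiles[key] = (bx, by)
    return matches

# Toy 4x4 frame whose top-left 2x2 block is pasted at the bottom-right.
frame = [1, 2, 9, 9,
         3, 4, 9, 8,
         7, 6, 1, 2,
         5, 5, 3, 4]
print(find_copied_blocks(frame, width=4))  # [((0, 0), (2, 2))]
```

Feature-based variants keep the same structure (describe each region, then match descriptors across the frame) but survive the post-processing that defeats raw equality checks.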

2.2.3 Compression-Based Mechanisms

Compression-based mechanisms build on the discrete cosine transform. They replace the many distinct values within an image frame with compact coefficient vectors, and the resulting feature vector is then used to identify any malicious activity within the video frames. Results are most often expressed as the peak signal-to-noise ratio. Table 4 describes the features of the different compression-based forgery detection techniques.
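For reference, the transform at the heart of these codecs can be sketched in pure Python. This is an orthonormal 1-D DCT-II only; real codecs apply the transform in two dimensions over 8×8 blocks, so this is purely illustrative of where double-compression traces come from, and the sample block values are invented.

```python
import math

def dct_1d(signal):
    """Orthonormal DCT-II of a 1-D signal (the transform applied row- and
    column-wise within the 8x8 blocks of MPEG-style codecs)."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(signal))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

block = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of 8-bit samples
coeffs = dct_1d(block)
# Energy concentrates in the first (DC) coefficient; quantizing these
# coefficients once per encoding is what leaves double-compression traces.
print(round(coeffs[0], 2))  # 177.48 (= sum of samples / sqrt(8))
```

Because each encoding pass quantizes these coefficients to a codec-specific grid, re-encoded video carries a tell-tale periodic pattern in its coefficient histograms, which is the footprint the compression-based detectors in Table 4 look for.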

3 Comparative Analysis of Passive Video Forgery Detection Techniques
This section presents a comparative analysis of various video forgery detection techniques. Earlier surveys appraise only a few forensic recording techniques, and many noteworthy recent achievements were not examined or analysed [35–38]. We analyse the performance of copy-paste forgery detection techniques for the motion-residue-based approach [39], the object-based approach [40] and the optical-flow-based approach [41]; Fig. 5 presents the comparative outcomes of these approaches with respect to quality factors. Figure 6 analyses the performance of inter-frame forgery detection for the noise-based approach [42], the optical-flow-based approach [43] and the pixel-based approach [44], and presents a comparative overview of the findings as a function of the quality-factor percentage and of different numbers of inserted/deleted/duplicated frames. The analysis suggests that motion-based forgeries are uncommon and hard to detect. In category 1 (inter-frame forgery), the major part of the research focuses on parameters such as the mean squared error and the peak signal-to-noise ratio. In category 2 (intra-frame forgery), the noise-handling procedures accommodated in these papers allow the peak signal-to-noise ratio to improve. In category 3 (compression-based mechanisms), the video forgery detection mechanisms employed cause the frame rate to decrease, and hence the noise within frames increases. Sometimes video forgery cannot be detected because of software failure, which alters the peak signal-to-noise ratio value [45–47]. These detection mechanisms allow parameters like PSNR and MSE to be optimized, and the accuracy obtained with them is shown in Fig. 7. Generally, more than 100 videos were tested and used during the comparative analysis. All these videos show both basic and

Table 3 Intra-frame forgery detection techniques

[26] Object-Level Visual Reasoning in Videos (2018). Description: proposes a method that detects objects in videos using a cognitive methodology; it provides the facility to learn from detailed interactions, and the forged region is detected. Advantage: takes less time to predict the forged region. Limitation: inefficient with compressed videos. Result: gives a detection accuracy of 96%.

[27] MesoNet: A Compact Facial Video Forgery Detection Network (2018). Description: describes a hybrid methodology that detects facial video forgery by focusing on the mesoscopic properties of images. Advantage: the SIFT technique is used with the aid of neighbouring pixel values to process frames and images. Limitation: the exact pixel values cannot be obtained in the SIFT technique, and the exact route of frame processing is not considered. Result: useful for automatic image-frame analysis.

[28] Coarse-to-Fine Copy-Move Forgery Detection for Video Forensics (2018). Description: proposes a coarse-to-fine approach based on video optical-flow features; it detects copy-move forgery in videos with the help of the optical-flow feature. Advantage: duplicated regions are detected even with changed contrast values, and blurred regions can also be detected. Limitation: high computation time of the algorithm. Result: accuracy was found to be 98.79%.

[29] Improvement in Copy-Move Forgery Detection Using Hybrid Approach (2016). Description: proposes SIFT and SURF methodologies together with the DWT for detecting forgery in videos. The algorithms are based on colour and texture descriptors; they extract digital image features and then match them to test whether the image has been faked. Advantage: multi-dimensional and multi-directional analysis gives precise results. Limitation: cannot be applied to compressed images. Result: the outcome of error detection and the JPEG compression test is vital.

[30] A Video Forgery Detection Using Discrete Wavelet Transform and Scale Invariant Feature Transform Techniques (2016). Description: describes a SIFT- and DWT-based algorithm that first extracts features of video frames and then detects the forged region; it is mostly used for locating malicious manipulation in digital recordings (digital forgeries). Advantage: can robustly identify objects even among clutter and under partial occlusion. Limitation: not useful for real-time videos. Result: 98.21% accuracy.

Table 4 Compression-based forgery detection techniques

[31] Optimizing Video Object Detection via a Scale-Time Lattice (2018). Description: proposes a hybrid approach that uses detection, temporal propagation and across-scale refinement, from which various configurations are built. Advantage: the symmetry reduces execution time, and selected key frames are used as opposed to randomly sampled ones. Limitation: classification accuracy can be further improved by the use of the scale-time lattice. Result: the rank of accuracy is good.

[32] Frame-wise Detection of Relocated I-frames in Double Compressed H.264 Videos Based on Convolutional Neural Network (2017). Description: proposes a methodology that utilizes pre-processing and a CNN for frame-wise detection of forgery in compressed videos. Advantage: high performance for relocating I-frames in compressed videos. Limitation: does not apply the frame-wise detection result to the detection of various inter-frame forgeries. Result: average accuracy is around 96%, based on GOPs.

[33] Detection of Double-Compressed H.264/AVC Video Integrating the Features of the String of Data Bits and Skip Macroblocks (2017). Description: proposes a double-compressed H.264/AVC video-detection method; the string-of-data-bits feature of each P-frame is extracted and then incorporated with the skip macro-blocks feature. Limitation: low-bit-rate compressed videos are not handled. Result: gives better discriminative performance than existing techniques.

[34] Double Compression Detection in MPEG-4 Videos Based on Block Artifact Measurement with Variation of Prediction Footprint (2016). Description: uses block artefacts for the detection of compression-based forgery, combining the VPF with block artefacts for robust and efficient detection. Advantage: handles compressed videos efficiently. Limitation: only robust to MPEG compression and recompression; the filtering mechanism should be enhanced for better detection. Result: the accuracy of forgery detection is better than that of the existing method.


Fig. 5 Comparative outcomes of copy-paste forgery detection at different bit rates (x-axis: bit rate; y-axis: quality factor) for the motion-based, object-based and optical-flow-based approaches

Fig. 6 Effect on the quality-factor percentage (y-axis) of inserting/deleting/duplicating different numbers of frames (x-axis: 30, 60 and 100 frames) for the noise-based, optical-flow-based and pixel-correction approaches

complex lifelike scenarios, depicting both indoor and outdoor scenes. All of the forgeries were created plausibly to simulate practical forensic scenarios.
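The MSE and PSNR figures referred to throughout this comparison can be computed as follows. This is a minimal sketch for 8-bit frames, with invented example values; PSNR is defined as infinite when the two frames are identical.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB (infinite for identical frames)."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10 * math.log10(peak ** 2 / err)

original = [50, 60, 70, 80]   # toy 8-bit pixel values
forged   = [50, 62, 70, 76]
print(round(psnr(original, forged), 2))  # 41.14 dB (MSE = 5.0)
```

Higher PSNR between a cleaned (or reconstructed) frame and the original indicates less residual distortion, which is why the surveyed comparisons report it alongside MSE as a quality factor.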


Fig. 7 Cumulative accuracy of passive video forgery detection methods (x-axis: technique category; y-axis: accuracy percentage): inter-frame forgery (category 1), intra-frame forgery (category 2) and compression-based forgery (category 3), with accuracies of 98.62%, 93.42% and 96.73%

4 Major Highlights, Issues and Findings in Passive Video Forgery Detection
This survey explores the domains of video forensics and video anti-forensics, and the results cover the majority of the passive video methodologies discussed. Most techniques use the GOP structure [48–53] because it is easy to understand and has a fixed number of frames. We also consider the types of tampering a video can suffer and the various sources a passive technique can use to detect an attack. The major highlights for detecting video forgery are the following:
• In intra-frame techniques, forgery is detected by examining one frame at a time.
• In inter-frame techniques, forgery is detected by establishing the relationship between two adjacent frames.
• Several techniques detect forgery via the detection of double compression.
• Forgery can be detected by inter-frame techniques based on motion and brightness features.
• Pixel-level analysis-based techniques detect pixel similarities in forged video.
• Copy-paste forgery detection techniques look for similarities or correlations between regions.


Digital video forensics is still in its rudimentary stages. The identification of digital forgery is a very difficult activity, and the lack of a widely applicable solution exacerbates the situation [57–59]. The various issues in video forgery detection techniques identified during this survey are the following:
• A significant shortcoming is the lack of sufficient validation on realistically manipulated video. Manually producing fake videos is very time-consuming, so most authors performed research on synthetically doctored sequences [54–56].
• Video forensics detects frame manipulation via double compression; if the forger directly modifies the encoded video, the existing anti-forensic and counter-anti-forensic strategies are insufficient [60, 61].
• For better video forgery detection, a huge database of tampered videos is required [62–65].
From this survey we analysed the performance of various techniques, such as optical-flow-based, motion-based, object-based, noise-based, pixel-correction-based and copy-paste detection techniques. The major findings obtained during these analyses are the following:
• The reliability factors in video forgery detection need to be understood much better. Video forgery involves issues related to multimedia heterogeneity, editing software and video content, all of which affect reliability.
• In future, combining active and passive techniques could yield better accuracy in the quality factors of forged video.
• Integrating fields such as artificial intelligence, machine learning, signal processing, computer vision and deep learning with the discussed techniques can also produce more accurate results.

5 Conclusion and Future Scope
Nowadays, information is mostly presented through videos rather than text. Earlier, forgery commonly took place on textual information, but nowadays video forgery is common, and digital video forensics is still in its infancy. Digital video's fidelity to fact is under threat from tampering. Various video-editing tools such as Adobe Photoshop and Illustrator, Cinelerra, Lightworks, GNU Gimp, Premiere and Vegas are easily available [66–72], so video forensics is one of the major research areas for detecting forgery in videos. This paper analyses the various techniques used to detect forgery within digital videos, with their advantages, limitations and results; video counter-forensics and anti-forensics are also explored. In future, researchers and developers should address the overall lack of rigour potentially induced by the lack of standardized databases, as well as pixel-based approaches and motion


detection mechanisms such as tangent-based strategies, which can be enhanced for better encryption and decryption of video frames, along with splicing techniques, for further improvement.

References
1. Saranya, R., Saranya, S., & Cristin, R. (2017). Exposing video forgery detection using intrinsic fingerprint traces. IEEE Access, 73–76.
2. Kelly, F. (2006). Fast probabilistic inference and GPU video processing [thesis]. Trinity College (Dublin, Ireland), Department of Electronic & Electrical Engineering, p. 178.
3. Li, J., He, H., Man, H., & Desai, S. (2009). A general-purpose FPGA-based reconfigurable platform for video and image processing. In W. Yu, H. He, & N. Zhang (Eds.), Advances in neural networks—ISNN 2009. Lecture notes in computer science (Vol. 5553). Berlin: Springer.
4. Sencar, H. T., & Memon, N. (2008). Overview of state-of-the-art in digital image forensics. Statistical Science and Interdisciplinary Research, 325–347.
5. Asok, D., Himanshu, C., & Sparsh, G. (2006). Detection of forgery in digital video. In 10th World Multi Conference on Systemics, Cybernetics and Informatics (pp. 16–19). Orlando, USA.
6. Su, L., Huang, T., & Yang, J. (2014). A video forgery detection algorithm based on compressive sensing. Springer Science and Business Media New York.
7. Shanableh, T. (2013). Detection of frame deletion for digital video forensics. Digital Investigation, 10(4), 350–360.
8. Hsu, C. C., Hung, T. Y., Lin, C. W., & Hsu, C. T. (2008). Video forgery detection using correlation of noise residue. In 2008 IEEE 10th Workshop on Multimedia Signal Processing (pp. 170–174).
9. Ghosh, S., Rana, A., & Kansal, V. (2019). Evaluating the impact of sampling-based nonlinear manifold detection model on software defect prediction problem. In S. Satapathy, V. Bhateja, J. Mohanty, & S. Udgata (Eds.), Smart intelligent computing and applications. Smart Innovation, Systems and Technologies (Vol. 159, pp. 141–152).
10. Ghosh, S., Rana, A., & Kansal, V. (2017). Predicting defect of software system. In Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA-2016), Advances in Intelligent Systems and Computing (AISC), pp. 55–67.
11. Kurosawa, K., Kuroki, K., & Saitoh, N. (1999). CCD fingerprint method identification of a video camera from videotaped images. In Proceedings of IEEE International Conference on Image Processing, Kobe, Japan, pp. 537–540.
12. Lukáš, J., Fridrich, J., & Goljan, M. (2006). Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2), 205–214.
13. Goljan, M., Chen, M., Comesaña, P., & Fridrich, J. (2016). Effect of compression on sensor-fingerprint based camera identification. Electronic Imaging, 1–10.
14. Mondaini, N., Caldelli, R., Piva, A., Barni, M., & Cappellini, V. (2007). Detection of malevolent changes in digital video for forensic applications. In E. J. Delp, & P. W. Wong (Eds.), Proceedings of SPIE Conference on Security, Steganography and Watermarking of Multimedia Contents (Vol. 6505, No. 1).
15. Wang, W., & Farid, H. (2007). Exposing digital forgeries in interlaced and deinterlaced video. IEEE Transactions on Information Forensics and Security, 2(3), 438–449.
16. Wang, W., & Farid, H. (2006). Exposing digital forgeries in video by detecting double MPEG compression. In S. Voloshynovskiy, J. Dittmann, & J. J. Fridrich (Eds.), Proceedings of 8th Workshop on Multimedia and Security (MM&Sec'06) (pp. 37–47). New York: ACM Press.

A Comprehensive Survey on Passive Video Forgery …

55

17. Hsia, S. C., Hsu, W. C., & Tsai, C. L. (2015). High-efficiency TV video noise reduction through adaptive spatial–temporal frame filtering. Journal of Real-Time Image Processing, 10(3), 561–572. 18. Sitara, K., & Mehtre, B. M. (2018). Detection of inter-frame forgeries in digital videos. Forensic Science International, 289, 186–206. 19. Zhao, D. N., Wang, R. K., & Lu, Z. M. (2018). Inter-frame passive-blind forgery detection for video shot based on similarity analysis. Multimedia Tools and Applications, 77(19), 25389– 25408. 20. Kingra, S., Aggarwal, N., & Singh, R. D. (2017). Inter-frame forgery detection in H.264 videos using motion and brightness gradients. Multimedia Tools and Applications, 76(24), 25767–25786. 21. Kingra, S., Aggarwal, N., & Singh, R. D. (2017). Video inter-frame forgery detection approach for surveillance and mobile recorded videos. International Journal of Electrical & Computer Engineering, 7(2), 831–841. 22. Ulutas, G., & Muzaffer, G. (2016). A new copy move forgery detection method resistant to object removal with uniform background forgery. Mathematical Problems in Engineering, 2016. 23. Abbasi Aghamaleki, J., Behrad, A. (2016). Inter-frame video forgery detection and localization using intrinsic effects of double compression on quantization errors of video coding. Signal Processing: Image Communication, 47, 289–302. 24. Singh, R. D., & Aggarwal, N. (2016). Detection of re-compression, transcoding and framedeletion for digital video authentication. In 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences RAECS. 25. Bagiwa, M. A., Wahab, A. W. A., Idris, M. Y. I., Khan, S., & Choo, K. K. R. (2016). Chroma key background detection for digital video using statistical correlation of blurring artifact. Digital Investigation, 19, 29–43. 26. Baradel, F., Neverova, N., Wolf, C., Mille, J., & Mori, G. (2018). Object level visual reasoning in videos. 
In Lecture Notes in Computer Science (including Subseries Lecture Notes Artificial Intelligence, Lecture Notes Bioinformatics) (Vol. 11217, pp. 106–122). LNCS. 27. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A compact facial video forgery detection network. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS). 28. Jia, S., Xu, Z., Wang, H., Feng, C., & Wang, T. (2018). Coarse-to-fine copy-move forgery detection for video forensics. IEEE Access, 6(c), 25323–25335. 29. Kaur Saini, G., & Mahajan, M. (2016). Improvement in copy—Move forgery detection using hybrid approach. International Journal of Modern Education and Computer Science, 8(12), 56–63. 30. Kaur, G., & Kaur, R. (2016). A video forgery detection using discrete wavelet transform and scale invarient feature transform techniques. 5(11), 1618–1623. 31. Chen, K., Wang, J., Yang, S., Zhang, X., Xiong, Y., Loy, C. C., & Lin, D. (2018). Optimizing video object detection via a scale-time lattice. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7814–7823). 32. He, P., Jiang, X., Sun, T., Wang, S., Li, B., & Dong, Y. (2017). Frame-wise detection of relocated Iframes in double compressed H.264 videos based on convolutional neural network. Journal of Visual Communication and Image Representation, 48, 149–158. 33. Yao, H., Song, S., Qin, C., Tang, Z., & Liu, X. (2017). Detection of double-compressed H.264/AVC video incorporating the features of the string of data bits and skip macroblocks. Symmetry (Basel), 9(12), 1–17. 34. Rocha, A., Scheirer, W., Boult, T., & Goldenstein, S. (2011). Vision of the unseen: Current trends and challenges in digital image and video forensics. ACM Computing Surveys, 43(4), 26. 35. Milani, S., Fontani, M., Bestagini, P., Barni, M., Piva, A., Tagliasacchi, M., & Tubaro, S. (2012). An overview on video forensics. APSIPA Transactions on Signal and Information Processing, 1(1), 1–18.

56

V. Kumar et al.

36. Wahab, A. W. A., Bagiwa, M. A., Idris, M. Y .I., Khan, S., Razak, Z., & Ariffin, M. R. K. Passive video forgery detection techniques: a survey. In Proceedings of 10th International Conference on Information Assurance and Security, Okinawa, Japan, pp. 29–34. 37. Joshi, V., & Jain, S. (2015). Tampering detection in digital video e a review of temporal fingerprints based techniques. In Proceedings of 2nd International Conference on Computing for Sustainable Global Development, New Delhi, India, pp. 1121–1124. 38. Bestagini, P., Milani, S., Tagliasacchi, M., & Tubaro, S. (2013). Local tampering detection in video sequences. In Proceedings of 15th IEEE International Workshop on Multimedia Signal Processing. Pula, pp. 488–493. 39. Zhang, J., Su, Y., Zhang, M. (2009). Exposing digital video forgery by ghost shadow artifact. In Proceedings of 1st ACM Workshop on Multimedia in Forensics (MiFor’09) (pp. 49–54). NewYork: ACM Press. 40. Bidokhti, A., Ghaemmaghami, S.: Detection of regional copy/move forgery in MPEG videos using optical flow. In: International symposium on Artificial intelligence and signal processing (AISP), Mashhad, Iran, pp. 13–17 (2015). 41. De, A., Chadha, H., & Gupta, S. (2006). Detection of forgery in digital video. In Proceedings of 10th World Multi Conference on Systems, Cybernetics and Informatics (pp. 229–233). 42. Wang, W., Jiang, X., Wang, S., & Meng, W. (2014). Identifying video forgery process using optical flow. In Digital forensics and watermarking (pp. 244–257). Berlin: Springer. 43. Lin, G. -S., Chang, J. -F., Chuang, F. -H. (2011). Detecting frame duplication based on spatial and temporal analyses. In Proceedings of 6th IEEE International Conference on Computer Science and Education (ICCSE’11), SuperStar Virgo, Singapore, pp. 1396–1399. 44. Ghosh, S., Rana, A., & Kansal, V. (2017). Software defect prediction system based on linear and nonlinear manifold detection. 
In Proceedings of the 11th INDIACom; INDIACom-2017; IEEE Conference ID: 40353, 4th International Conference on—Computing for Sustainable Global Development (INDIACom 2107) (pp. 5714–5719). INDIACom-2017; ISSN 0973–7529; ISBN 978–93–80544–24–3. 45. Ghosh, S., Rana, A., & Kansal, V. (2018). A nonlinear manifold detection based model for software defect prediction. International Conference on Computational Intelligence and Data Science; Procedia Computer Science, 132(8), 581–594. 46. Ghosh, S., Rana, A., & Kansal, V. (2019). Statistical assessment of nonlinear manifold detection based software defect prediction techniques. International Journal of Intelligent Systems Technologies and Applications, Inderscience, Scopus Indexed, 18(6), 579–605. https://doi.org/ 10.1504/IJISTA.2019.102667. 47. Luo, W., Wu, M., & Huang, J. (2008). MPEG recompression detection based on block artifacts. In E. J. Delp, P. W. Wong, J. Dittmann, N. D. Memon, (Eds.), Proceedings of SPIE Security, Forensics, Steganography, and Watermarking of Multimedia Contents X (Vol. 6819), San Jose, CA. 48. Su, Y., Nie, W., & Zhang, C. (2011). A frame tampering detection algorithm for MPEG videos. In Proceedings of 6th IEEE Joint International Information Technology and Artificial Intelligence Conference, Vol. 2, pp. 461–464. Chongqing, China. 49. Vázquez-Padín, D., Fontani, M., Bianchi, T., Comesana, P., Piva, A., & Barni, M. (2012). Detection of video double encoding with GOP size estimation. In Proceedings on IEEE International Workshop on Information Forensics and Security, Tenerife, Spain, Vol. 151. 50. Su, Y., Zhang, J., & Liu, J. (2009). Exposing digital video forgery by detecting motioncompensated edge artifact. In Proceedings of International Conference on Computational Intelligence and Software Engineering (Vol. 1, no. 4, pp. 11–13). Wuhan, China. 51. Dong, Q., Yang, G., & Zhu, N. (2012). A MCEA based passive forensics scheme for detecting framebased video tampering. 
Digital Investigation, 9(2), 151–159. 52. Kancherla, K., & Mukkamal, S. (2012). Novel blind video forgery detection using Markov models on motion residue. Intelligent Information and Database System, 7198, 308–315. 53. Fontani, M., Bianchi, T., De Rosa, A., Piva, A., & Barni, M. (2011). A Dempster-Shafer framework for decision fusion in image forensics. In Proceedings of IEEE International Workshop on Information Forensics and Security (WIFS’11) (pp. 1–6), Iguacu Falls, SA. https://doi.org/ 10.1109/WIFS.2011.6123156.

A Comprehensive Survey on Passive Video Forgery …

57

54. Fontani, M., Bianchi, T., De Rosa, A., Piva, A., & Barni, M. (2013). A framework for decision fusion in image forensics based on Dempster-Shafer theory of Evidence. IEEE Transactions on Information Forensics and Security, 8(4), 593–607. https://doi.org/10.1109/TIFS.2013.224 8727. 55. Fontani, M., Bonchi, A., Piva, A., & Barni, M. (2014). Countering antiforensics by means of data fusion. In Proceedings of SPIE Conference on Media Watermarking, Security, and Forensics. https://doi.org/10.1117/12.2039569. 56. Stamm, M. C., & Liu, K. J. R. (2011). Anti-forensics for frame deletion/addition in mpeg video. In Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP’11) (pp. 1876–1879), Prague, Czech Republic. 57. Stamm, M. C., Lin, W. S., & Liu, K. J. R. (2012). Temporal forensics and anti-forensics for motion compensated video. IEEE Transactions on Information Forensics and Security, 7(4), 1315–1329. 58. Liu, J., & Kang, X. (2016). Anti-forensics of video frame deletion. [Online] https://www.paper. edu.cn/download/downPaper/201407-346. Accessed 9 July (2016). 59. Fan, W., Wang, K., & Cayere, F., et al. (2013). A variational approach to JPEG anti-forensics. In Proceedings of IEEE 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP’13) (pp. 3058–3062), Vancouver, Canada. 60. Bian, S., Luo, W., & Huang, J. (2013). Exposing fake bitrate video and its original bitrate. In Proceeding of IEEE International Conference on Image Processing (pp. 4492–4496). 61. CASIA Tampered Image Detection Evaluation Database. [Online]. https://forensics.idealtest. org:8080. Accessed 30 Mar (2016). 62. Tralic, D., Zupancic, I., Grgic, S., Grgic, M., CoMoFoD—New Database for Copy63. Move Forgery Detection. In: Proceedings of 55th International Symposium ELMAR, Zadar, Croatia (pp. 49–54), [Online]. https://www.vcl.fer.hr/comofod/download.html. Accessed 18 July (2016). 64. CFReDS—Computer Forensic Reference Data Sets, [Online]. 
https://www.cfreds.nist.gov/. Accessed 17 May (2016). 65. Kwatra, V., Schödl, A., Essa, I., Turk, G., & Bobick, A. F. (2003). Graph cut textures image and video synthesis using graph cuts. ACM Transactions on Graphics, 22(3), 277–286. 66. Pèrez, P., Gangnet, M., & Blake, A. (2003). Poisson image editing. ACM Transactions on Graph. (SIGGRAPH’03, 22(3), 313–318. 67. Criminisi, A., Pèrez, P., & Toyama, K. (2004). Region filling and object removal by exemplarbased image inpainting. IEEE Transactions on Image Processing, 13(9), 1200–1212. 68. Shen, Y., Lu, F., Cao, X., & Foroosh, H. (2006). Video completion for perspective camera under constrained motion. In Proceedings of 18th IEEE International Conference on Pattern Recognition (ICPR’06) (pp. 63–66). Hong Kong, China. 69. Komodakis, N., & Tziritas, G. (2007). Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Transactions on Image Processing, 16(11), 2649–2661. 70. Patwardhan, K. A., Sapiro, J., & Bertalmio, M. (2007). Video inpainting under constrained camera motion. IEEE Transactions on Image Processing, 16(2), 545–553. 71. Hays, J., & Efros, A. A. (2007). Scene completion using millions of photographs. ACM Transactions on Graph (SIGGRAPH’07), 26(3), 1–7. 72. Columbia Image Splicing Detection Evaluation Dataset. [Online]. https://www.ee.col umbia.edu/ln/dmvv/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm. Accessed 3 June (2016).

DDOS Detection Using Machine Learning Technique Sagar Pande, Aditya Khamparia, Deepak Gupta, and Dang N. H. Thanh

Abstract Numerous attacks are performed on network infrastructures, including attacks on network availability, confidentiality and integrity. The distributed denial-of-service (DDoS) attack is a persistent attack which affects the availability of the network; a Command and Control (C&C) mechanism is used to perform such attacks. Various researchers have proposed different methods based on machine learning techniques to detect these attacks. In this paper, a DDoS attack was performed using the ping of death technique and detected using a machine learning technique with the WEKA tool. The NSL-KDD dataset was used in this experiment, and the random forest algorithm was used to classify the normal and attack samples; 99.76% of the samples were correctly classified.

Keywords DDoS · Machine learning · Ping of death · Network security · Random forest · NSL-KDD

S. Pande · A. Khamparia (B) School of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India e-mail: [email protected] S. Pande e-mail: [email protected] D. Gupta Maharaja Agrasen Institute of Technology, New Delhi, India e-mail: [email protected] D. N. H. Thanh Department of Information Technology, School of Business Information Technology, University of Economics Ho Chi Minh City, Ho Chi Minh City, Vietnam e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_5


1 Introduction

With the ongoing convergence of information technology (IT), various information devices are becoming immensely complicated. Connected with one another, they continue to create and store significant digital information, introducing an era of big data. However, the probability is very high that they may expose significant data, as they transmit a lot of it through constant communication with one another. A framework becomes more vulnerable as more digital devices are connected; hackers may target it to steal information, personal data and industrial secrets and exploit them for unlawful gains [1]. Given these conditions, an attack detection system (ADS) ought to be smarter and more effective than before to fight attacks from hackers, which are continuously evolving. Confidentiality, integrity and availability can be considered the main pillars of security [2, 3]. All these pillars are discussed below.

1.1 Confidentiality

Confidentiality is also called secrecy. The motive behind secrecy is to keep sensitive information away from illegitimate users and to provide access only to legitimate users. Along with this, assurance must be given that access to the information is restricted.

1.2 Integrity

Integrity means keeping the data as it is, without any modification. Data must be received at the receiver end exactly as it was sent. To provide integrity, file permissions and user access controls can be used. A variety of techniques has been designed to provide integrity, such as checksums and encryption.
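As a minimal illustration of the checksum idea just mentioned, the sketch below (plain Python with hypothetical message contents) compares a SHA-256 digest computed by the sender with one recomputed at the receiver; any modification in transit changes the digest.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Sender computes a checksum and transmits it alongside the data.
original = b"account=1234;amount=500"
sent_checksum = digest(original)

# Receiver recomputes the checksum; a mismatch reveals tampering.
tampered = b"account=1234;amount=900"
print(digest(original) == sent_checksum)   # True: data intact
print(digest(tampered) == sent_checksum)   # False: integrity violated
```

The same pattern underlies file-integrity monitoring: store a digest of the trusted file and periodically recompute it.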

1.3 Availability

Availability can also be called accessibility. It means providing the required data whenever and wherever required, by fixing all issues as early as possible. Sometimes it is difficult to overcome the situation caused by a bottleneck condition. RAID is one of the popular techniques used for providing availability. Precautions also need to be taken in the hardware context: the hardware used must be kept at secured places. Apart from this, firewalls can be used to prevent malicious activity.


1.4 DDoS Attack Dynamics

As per the report of Kaspersky [4], growth in both the frequency and the size of DDoS attacks can be seen in 2018. One of the largest DDoS attacks was carried out on GitHub in February 2018, peaking at 1.3 Tbps of traffic [5].

1.5 DDoS Tools

Various tools are freely available for performing DDoS attacks; some of them are listed below [6]:

• HULK (HTTP Unbearable Load King)
• GoldenEye HTTP DoS tool
• Tor's Hammer
• DAVOSET
• PyLoris
• LOIC (Low Orbit Ion Cannon)
• XOIC
• OWASP DoS HTTP Post
• TFN (Tribe Flood Network)
• Trinoo

2 Related Work

Many researchers are working on the detection of DDoS attacks, which have a large impact in the area of social networking, using deep learning and machine learning techniques. Some of the recent work done in this area is discussed below.

Hariharan et al. [7] used the C5.0 machine learning algorithm and compared the obtained results with different machine learning algorithms such as the Naïve Bayes classifier and the C4.5 decision tree classifier. The authors mainly worked in offline mode. BhuvaneswariAmma N. G. et al. [8] implemented a deep intelligence technique, extracting the intelligence from radial basis functions at various abstraction levels. The experiment was carried out on the well-known NSL-KDD and UNSW-NB15 datasets, where 27 features were considered, and the authors claimed better accuracy than other existing techniques. Muhammad Aamir et al. [9] implemented a feature selection method based on a clustering approach and compared it across five different ML algorithms. Random forest (RF) and support vector machine (SVM) were used for training; RF achieved the highest accuracy of around 96%.


Dayanandam et al. [10] performed classification based on features of the packets. Their prevention technique analyzes IP addresses by verifying the IP header; these addresses are used to differentiate spoofed from normal addresses. Firewalls do not provide an efficient solution when the attack size increases. Narasimha et al. [11] used anomaly detection along with machine learning algorithms to separate normal and attack traffic. Real-time datasets were used for the experiment, and the well-known naive Bayes ML algorithm was used for classification; the results were compared with existing algorithms such as J48 and random forest (RF). J. Cui et al. [12] used cognitive-inspired computing along with an entropy technique, with a support vector machine for classification. Details were extracted from the flow table of the switch, and the obtained results were good in terms of detection accuracy. Omar E. Elejla et al. [13] implemented an algorithm for detecting DDoS attacks in IPv6 based on a classification technique. The authors compared the obtained results with five well-known machine learning algorithms and claimed that KNN obtained a good precision of around 85%. Mohamed Idhammad et al. [14] designed an entropy-based semi-supervised approach using ML techniques. The implementation consists of unsupervised and supervised components: the unsupervised component gives good accuracy with few false positives, while the supervised component further reduces the false-positive rate. Recent datasets were used for this experiment. Nathan Shone et al. [15] implemented a deep learning algorithm for classifying attacks, using an unsupervised nonsymmetric deep autoencoder (NDAE) for feature learning. The proposed algorithm was implemented on a graphics processing unit (GPU) using TensorFlow on the well-known KDD Cup 99 and NSL-KDD datasets, and the authors claimed more accurate detection results. Olivier Brun et al. [16] worked in the area of the Internet of Things (IoT) to detect DDoS attacks, implementing the random neural network (RNN), a well-known deep learning technique, for network attack detection. This deep-learning-based technique efficiently generates more promising results than existing methods.

3 Implementation of DDoS Attack Using Ping of Death

Before performing the ping of death attack, the network information needs to be gathered; to achieve this, the ipconfig command can be used. In Fig. 1, the detailed information of the network is gathered after issuing the ipconfig command. As soon as the network information is gathered, we can start performing the ping of death attack on the target IP address. Enter the following command to start the attack:


Fig. 1 DDoS attacks dynamics in 2018 [4]

ping –t –l 65500 XX.XX.XX.XX

• "ping" command transfers the data packets to the target
• "XX.XX.XX.XX" is the IP address of the target
• "–t" means sending packets repeatedly
• "–l" specifies the data packet load to be sent to the target
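For scripting the same experiment, the documented command can be assembled from Python; `build_ping_cmd` is a hypothetical helper, and 192.0.2.10 is a reserved documentation address. Actually launching the command floods the target, so the sketch only constructs it.

```python
import subprocess

def build_ping_cmd(target_ip: str, payload_bytes: int = 65500) -> list:
    # Windows ping flags as described above: -t repeats indefinitely,
    # -l sets the data payload size per packet.
    return ["ping", "-t", "-l", str(payload_bytes), target_ip]

cmd = build_ping_cmd("192.0.2.10")
print(" ".join(cmd))        # ping -t -l 65500 192.0.2.10
# subprocess.run(cmd)       # uncomment only in an isolated lab environment
```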

Figure 2 shows the packet information after performing ping of death attack; this attack will continue till the target resources are exhausted. The primary goal of this

Fig. 2 Details of the network obtained using ipconfig


Fig. 3 Packets transfer after implementing ping of death

type of DDoS attack is to utilize all the CPU memory and exhaust it. In Fig. 3, we can clearly see that before starting the attack the performance graph was linear, and as soon as the attack starts, spikes become visible. Figure 4 signifies that the CPU is being utilized as much as possible, and this will continue until the complete network is exhausted. Details of the memory consumption, CPU utilization, uptime, etc., can be seen in Figs. 3, 4 and 5.

4 DDoS Detection Using Machine Learning Algorithm

Random forest (RF), developed by Leo Breiman, is a popular machine learning technique used for classification [3]. A random forest produces different decision trees; each tree is built from an alternate bootstrap sample of the original data using a tree classification algorithm. The NSL-KDD dataset was used for this experiment [16]. The experiment was performed on a laptop with Windows 10 64-bit operating system, Intel(R) Core(TM) i5-2450M CPU @ 2.50 GHz and 8.00 GB RAM. The training set contained 22,544 instances, and the dataset consists of 42 attributes. Random forest was used to train the model; the model building time was 8.71 s and the testing time was 1.28 s. The experiment was carried out using the Weka 3.8 tool. Table 1 provides the summary of the instances after classification using random forest, Table 2 shows the performance evaluation using various parameters, and Table 3 shows the confusion matrix for the normal and attack classes.

• Accuracy: It measures the fraction of instances of both classes that are correctly identified.


Fig. 4 CPU specifications before the attack

Accuracy = (TP + TN) / (TP + FN + FP + TN)

• Precision: It is the ratio of the number of related attacks identified to the total number of related and unrelated attacks identified; also known as the positive predictive value.

Precision = TP / (TP + FP)

• Recall: It is the ratio of the number of related attacks identified to the total number of related attacks; also known as sensitivity.

Recall = TP / (TP + FN)
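The chapter's experiment uses Weka's random forest; an equivalent workflow can be sketched in Python with scikit-learn. The synthetic two-class data below is only a stand-in for NSL-KDD (whose 42-attribute records would normally be loaded instead), so the exact figures differ from those reported in the tables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for NSL-KDD: 42 features, binary normal/attack label.
X, y = make_classification(n_samples=5000, n_features=42,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
print(confusion_matrix(y_test, pred))   # rows: actual, columns: predicted
```

Swapping in the real NSL-KDD features and labels would reproduce the Weka experiment end to end.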


Fig. 5 CPU specifications after the attack

Table 1 Classification summary

Correctly classified attack instances     22,490    99.7605%
Incorrectly classified attack instances       54     0.2395%

Table 2 Performance evaluation

TP rate   FP rate   Precision   Recall   Class
0.998     0.002     0.997       0.998    Normal
0.998     0.002     0.998       0.998    Attack

Table 3 Confusion matrix

             Classified as a   Classified as b
a = normal        9689                22
b = attack          32            12,801
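The metric definitions above can be cross-checked against the counts in Table 3. Treating "attack" as the positive class gives TP = 12,801, TN = 9689, FP = 22 and FN = 32, which reproduces the 99.76% accuracy of Table 1:

```python
# Confusion-matrix counts from Table 3 (positive class = attack).
TP, TN, FP, FN = 12801, 9689, 22, 32

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)

print(f"accuracy  = {accuracy:.4%}")    # 99.7605%
print(f"precision = {precision:.3f}")   # 0.998
print(f"recall    = {recall:.3f}")      # 0.998
```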


5 Conclusion

In this paper, several ongoing detection techniques for DDoS attacks were discussed, especially those using machine learning, along with a list of freely available DDoS tools. The command-based ping of death technique was used to perform a DDoS attack. The random forest algorithm was used to train the model, which resulted in 99.76% correctly classified instances. In future work, we will try to implement deep learning techniques for the classification of the instances.

References

1. Ganorkar, S. S., Vishwakarma, S. U., & Pande, S. D. (2014). An information security scheme for cloud based environment using 3DES encryption algorithm. International Journal of Recent Development in Engineering and Technology, 2(4).
2. Pande, S., & Gadicha, A. B. (2015). Prevention mechanism on DDOS attacks by using multilevel filtering of distributed firewalls. International Journal on Recent and Innovation Trends in Computing and Communication, 3(3), 1005–1008. ISSN: 2321-8169.
3. Khamparia, A., Pande, S., Gupta, D., Khanna, A., & Sangaiah, A. K. (2020). Multi-level framework for anomaly detection in social networking. Library Hi Tech. https://doi.org/10.1108/LHT-01-2019-0023.
4. https://www.calyptix.com/top-threats/ddos-attacks-101-types-targets-motivations/.
5. https://www.foxnews.com/tech/biggest-ddos-attack-on-record-hits-github.
6. Fenil, E., & Mohan Kumar, P. (2019). Survey on DDoS defense mechanisms. John Wiley & Sons, Ltd. https://doi.org/10.1002/cpe.5114.
7. Hariharan, M., Abhishek, H. K., & Prasad, B. G. (2019). DDoS attack detection using C5.0 machine learning algorithm. I.J. Wireless and Microwave Technologies, 1, 52–59. Published online January 2019 in MECS. https://doi.org/10.5815/ijwmt.2019.01.06.
8. NG, B. A., & Selvakumar, S. (2019). Deep radial intelligence with cumulative incarnation approach for detecting denial of service attacks. Neurocomputing. https://doi.org/10.1016/j.neucom.2019.02.047.
9. Aamir, M., & Zaidi, S. M. A. (2019). Clustering based semi-supervised machine learning for DDoS attack classification. Journal of King Saud University—Computer and Information Sciences. Elsevier. https://doi.org/10.1016/j.jksuci.2019.02.003.
10. Dayanandam, G., Rao, T. V., BujjiBabu, D., & NaliniDurga, N. (2019). DDoS attacks—analysis and prevention. In H. S. Saini, et al. (Eds.), Innovations in computer science and engineering, Lecture notes in networks and systems 32. Springer Nature Singapore Pte Ltd. https://doi.org/10.1007/978-981-10-8201-6_1.
11. NarasimhaMallikarjunan, K., Bhuvaneshwaran, A., Sundarakantham, K., & Mercy Shalinie, S. (2019). DDAM: Detecting DDoS attacks using machine learning approach. In N. K. Verma & A. K. Ghosh (Eds.), Computational Intelligence: Theories, Applications and Future Directions—Volume I, Advances in Intelligent Systems and Computing, 798. https://doi.org/10.1007/978-981-13-1132-1_21.
12. Cui, J., Wang, M., Luo, Y., et al. (2019). DDoS detection and defense mechanism based on cognitive-inspired computing in SDN. Future Generation Computer Systems. https://doi.org/10.1016/j.future.2019.02.037.
13. Elejla, O. E., Belaton, B., Anbar, M., Alabsi, B., & Al-Ani, A. K. (2019). Comparison of classification algorithms on ICMPv6 based DDoS attacks detection. In R. Alfred et al. (Eds.), Computational Science and Technology, Lecture Notes in Electrical Engineering 481. Springer Nature Singapore Pte Ltd. https://doi.org/10.1007/978-981-13-2622-6_34.


14. Idhammad, M., Afdel, K., & Belouch, M. (2018). Semi-supervised machine learning approach for DDoS detection. Applied Intelligence. Springer Science+Business Media, LLC, part of Springer Nature. https://doi.org/10.1007/s10489-018-1141-2.
15. Shone, N., Ngoc, T. N., Phai, V. D., & Shi, Q. (2018). A deep learning approach to network intrusion detection. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(1).
16. Brun, O., Yin, Y., & Gelenbe, E. (2018). Deep learning with dense random neural network for detecting attacks against IoT-connected home environments. Procedia Computer Science, 134, 458–463. Elsevier.

Enhancements in Performance of Reduced Order Modelling of Large-Scale Control Systems Ankur Gupta and Amit Kumar Manocha

Abstract Enhancements in model order reduction techniques are occurring at a very fast rate, aimed at obtaining more accurate reduced approximations of large-scale systems and easing the task of studying them. In this paper, the enhancements occurring in the field of model order reduction are studied with the help of a test example. Initially developed techniques like balanced truncation are studied and compared with newly developed MOR techniques like dominant pole retention, the clustering approach and response matching. The study reveals that developments in MOR techniques are helping in designing reduced-order approximations of large-scale systems with less error between the large-scale and reduced-order systems, and that more study is required in this field to make the analysis of large-scale systems more accurate.

Keywords Balanced truncation · Clustering · Dominant pole retention · Order reduction · Response matching

1 Introduction

The design of high-order linear dynamic systems is tough to tackle due to problems in implementation and computation, and it is too tedious to be employed in reality. Model order reduction is a technique for the simplification of linear, dynamic high-order systems which are depicted by differential equations. The main

A. Gupta (B) Department of Electronics and Communication Engineering, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India e-mail: [email protected] A. K. Manocha Department of Electrical Engineering, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_6


motive of model order reduction (MOR) is to replace the high-order system with a system of comparatively lower order while keeping the initial properties intact. The purpose of carrying out this simplification is to obtain a reduced-order version of the higher-order system such that the initial and final systems are identical in terms of system response and other physical means of representation. Numerous researches have been done, and varied techniques have been suggested, for the reduction of high-order transfer functions [1, 2]. These techniques include Hankel-norm approximation [3], projection technique [4], Schur decomposition [5], continued fraction expansion approximation [6], Pade or moment approximation [7], the stability-equation method [8] and the factor division method [9]. Each technique has its own benefits and drawbacks; the most important concerns amongst the limitations are the difficult computation procedure and maintaining stability in the reduced model. Many other approaches were also developed [10–18] to address the need for model order reduction by mixed approaches and various evolutionary techniques. In this paper, the timeline pertaining to the advancements in the various techniques for reducing a high-order system to a low-order system is described in detail, and illustrative examples are used to justify the facts. The major driving force behind this study was to get a comprehensive view of the advancements so that the best technique can be used in further work, minimizing the drawbacks and highlighting the benefits amongst the numerous techniques developed to date. The first part of this paper defines the problem of model order reduction and then presents the methods with a detailed survey of each. Following it is a test example to show the comparison of different techniques by step response behaviour, and finally, the derived result is mentioned in the conclusion.

2 Defining the Problem

Consider a linear dynamic system described by the transfer function [11, 12, 19, 20]

G(s) = N(s)/D(s) = (a_m s^m + a_{m−1} s^{m−1} + · · · + a_1 s + a_0) / (b_n s^n + b_{n−1} s^{n−1} + · · · + b_1 s + b_0)   (1)

where m < n; n is the number of poles and m the number of zeros. The zeros and poles can be real, complex, or a combination of both, and complex poles, if present, occur in conjugate pairs. The reduced rth-order system is given by

G_r(s) = N_{r−1}(s)/D_r(s) = (c_{r−1} s^{r−1} + c_{r−2} s^{r−2} + · · · + c_1 s + c_0) / (d_r s^r + d_{r−1} s^{r−1} + · · · + d_1 s + d_0)   (2)

where r − 1 is the number of zeros and r the number of poles of the reduced-order model G_r(s). The zeros and poles can again be real, complex, or a combination of both. The aim of model order reduction is to decrease the order of the linear system while maintaining stability and producing a response with minimum error.

3 Description of Methods

The accuracy of the reduced-order systems obtained by different techniques is studied to assess the improvements in model order reduction methods. The techniques discussed in this paper are as follows.

3.1 Balanced Truncation

The balanced truncation method [2] is the rudimentary technique for model order reduction, as most reduction methods depend on it for putting the system into balanced form. The method transforms the state matrices A, B, C and D into a balanced realization by means of a non-singular matrix T such that

(A′, B′, C′, D′) = (T⁻¹AT, T⁻¹B, CT, D)   (3)

Truncating the weakly controllable and observable states of the balanced realization (A′, B′, C′, D′) yields the reduced-order approximation of (A, B, C, D).

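The transformation of Eq. (3) can be sketched numerically. The square-root balancing routine below is a minimal NumPy illustration, not the implementation from [2]: the Kronecker-product Lyapunov solves and the third-order example system are assumptions made here for the demo, practical only for small state dimensions.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 via the Kronecker-product identity."""
    n = A.shape[0]
    M = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    x = np.linalg.solve(M, -Q.reshape(-1, order="F"))
    return x.reshape((n, n), order="F")

def balanced_truncation(A, B, C, r):
    """Square-root balancing as in Eq. (3), keeping the first r states."""
    P = lyap(A, B @ B.T)            # controllability Gramian
    Q = lyap(A.T, C.T @ C)          # observability Gramian
    L = np.linalg.cholesky(P)
    lam, V = np.linalg.eigh(L.T @ Q @ L)
    order = np.argsort(lam)[::-1]   # largest Hankel singular values first
    lam, V = lam[order], V[:, order]
    hsv = np.sqrt(np.maximum(lam, 0.0))
    T = L @ V @ np.diag(hsv ** -0.5)     # the non-singular matrix T of Eq. (3)
    Ti = np.linalg.inv(T)
    Ab, Bb, Cb = Ti @ A @ T, Ti @ B, C @ T
    return Ab[:r, :r], Bb[:r, :], Cb[:, :r], hsv

# Illustrative stable third-order system (invented), truncated to second order:
A = np.array([[-1.0, 0.1, 0.0], [0.0, -2.0, 0.2], [0.0, 0.0, -5.0]])
B = np.array([[1.0], [1.0], [1.0]])
C = np.array([[1.0, 0.5, 0.2]])
Ar, Br, Cr, hsv = balanced_truncation(A, B, C, 2)
```

Discarding the states with the smallest Hankel singular values keeps the H∞ error within twice the sum of the discarded values, so the DC gain of the reduced model stays close to the original.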
3.2 Mukharjee (2005)

Mukharjee (2005) presented a response-matching algorithm (with dominant pole retention) to obtain a reduced-order system (ROS) from the original high-order system (OHOS) [21]. The roots of the denominator polynomial (poles) of the OHOS in Eq. (1) can be of varied form: distinct or repeated, real or complex conjugate. Accordingly, a third-order ROS is assumed in which (A) all poles are repeated, (B) one pole is distinct and the other poles are repeated, (C) there is one pair of complex poles and one real pole, or (D) all poles are real. The four forms of the three-pole ROS are given as:


G_1r(s) = (a_1 s² + b_1 s + c_1) / (s + d_1)³   (4)   when all poles are repeated

G_2r(s) = (a_2 s² + b_2 s + c_2) / ((s + d_2)(s + e_2)²)   (5)   when one pole is distinct and the other poles are repeated

G_3r(s) = (a_3 s² + b_3 s + c_3) / ((s + γ)(s + δ + jβ)(s + δ − jβ))   (6)   when one pole is real and one pair of poles is complex

G_4r(s) = (a_4 s² + b_4 s + c_4) / ((s + d_4)(s + e_4)(s + f_4))   (7)   when all poles are real

Now, minimization over the impulse and step transient responses is carried out to find the unknown parameters of the reduced-order systems given above. The unknown coefficients are determined such that the integral square error (ISE) between the transient parts of the OHOS and ROS responses is minimal. Therefore, two sets of reduced-order models are obtained: one by minimizing the ISE between the transient parts of the unit step responses of the OHOS and ROS, and the other by minimizing the ISE between their impulse responses.
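The coefficient search can be illustrated with a toy response-matching problem: fitting the single pole d of a hypothetical first-order model G_r(s) = d/(s + d) to G(s) = 2/(s² + 3s + 2) by a coarse grid search on the step-response ISE. The systems, their closed-form step responses and the grid are invented for illustration; they are not from the paper.

```python
import math

# Closed-form step responses of the illustrative systems:
#   original G(s)  = 2/(s^2 + 3s + 2)  ->  y(t)  = 1 - 2e^{-t} + e^{-2t}
#   reduced  Gr(s) = d/(s + d)         ->  yr(t) = 1 - e^{-d t}
DT, T_END = 0.01, 20.0
ts = [k * DT for k in range(int(T_END / DT) + 1)]
y = [1.0 - 2.0 * math.exp(-t) + math.exp(-2.0 * t) for t in ts]

def ise(d):
    """Rectangle-rule ISE between the two step responses."""
    return DT * sum((yi - (1.0 - math.exp(-d * t))) ** 2
                    for t, yi in zip(ts, y))

# Coarse grid search over the reduced pole d, standing in for the
# coefficient minimization Mukharjee performs on the full ROS.
grid = [0.3 + 0.01 * k for k in range(270)]
best_d = min(grid, key=ise)
```

The grid search is only a stand-in for a proper optimizer, but it shows the essential loop: simulate both transients, integrate the squared difference, and keep the parameters with the smallest ISE.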

3.3 Philip (2010)

Philip (2010) described a procedure to estimate the reduced-order denominator polynomial by dominant pole retention. Philip [8] described several algorithms to estimate the dominant poles of the OHOS given by Eq. (1), as follows.

A. Dominant pole estimation using reciprocal transformation

The transfer function G(s) of Eq. (1) has the reciprocal transformation

G̃(s) = (1/s) G(1/s) = (a_{n−1} + a_{n−2} s + · · · + a_0 s^{n−1}) / (b_n + b_{n−1} s + b_{n−2} s² + · · · + b_0 s^n)   (8)


By this transformation, the reversal of the polynomial's coefficients is achieved. The essential property of the reciprocal transformation is the inversion of the nonzero poles and zeros of the original transfer function. Now, the denominator polynomial of the reciprocal is considered:

D̃(s) = b_n + b_{n−1} s + b_{n−2} s² + · · · + b_0 s^n   (9)

Then, the approximation of one dominant pole is calculated as

r_d = b_0 / (n b_1)   (10)

The calculation follows from the classical algebra result that, in a polynomial of degree n, the negation of the sum of its roots (real parts only) corresponds to the coefficient of the degree-(n − 1) term. Thus, dividing that term by the polynomial's degree gives the average root value. Since D̃ is the reciprocal polynomial, this value approximates the inverse of the dominant root, and its reciprocal is the approximate dominant root of the original system.

B. Estimation of the dominant pole by the principal pseudo-break frequency

The next approximation of the dominant pole is the principal pseudo-break frequency of the characteristic polynomial [19]. The denominator polynomial of Eq. (1) gives the estimated dominant pole as

r_p = b_0 / √|b_1² − 2 b_2 b_0|   (11)

With the knowledge of these two pole estimates, the denominator polynomial of the reduced-order system can be determined.

C. Frequency-dependent reduced-order model

Recent MOR research has used response matching at user-specified frequencies. Following the same path, the technique is elaborated so that the reduced-order model can match the frequency response at various frequencies. The technique used for determining the real pole of least magnitude can equally be used to estimate the real pole of highest magnitude:

r_h = b_{n−1} / n   (12)

By taking the average, one more pole estimate is obtained. The three estimates above assist in improving the reduced-order approximation in approximately the high-, medium- and low-frequency regions, i.e., over the entire frequency range.
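These estimates are a few lines of code. The helper below takes the denominator coefficients [b_n, …, b_0] and returns the three heuristic pole estimates, following one plausible reading of Eqs. (10)–(12) (the source equations are partly garbled, so treat the formulas as a reconstruction rather than a verified transcription).

```python
import math

def pole_estimates(b):
    """Heuristic pole estimates from denominator coefficients [b_n, ..., b_0].

    r_d: dominant (lowest-magnitude) pole via the reciprocal transform, Eq. (10)
    r_p: dominant pole via the principal pseudo-break frequency, Eq. (11)
    r_h: highest-magnitude real pole via the coefficient average, Eq. (12)
    """
    n = len(b) - 1
    b0, b1, b2 = b[-1], b[-2], b[-3]                      # low-order coefficients
    r_d = b0 / (n * b1)                                   # Eq. (10)
    r_p = b0 / math.sqrt(abs(b1 ** 2 - 2.0 * b2 * b0))    # Eq. (11)
    r_h = b[1] / (n * b[0])                               # Eq. (12); b[0] = 1 if monic
    return r_d, r_p, r_h

# Denominator of the ninth-order test example of Eq. (18):
den = [1, 9, 66, 294, 1029, 2541, 4684, 5856, 4620, 1700]
r_d, r_p, r_h = pole_estimates(den)
```

For the monic Eq. (18) denominator, r_h reduces to the average of the pole sum, 9/9 = 1, which is exactly the real pole that appears in the third-order model G_3(s) of Eq. (21).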


3.4 Desai and Prasad (2013)

Desai and Prasad (2013) performed model order reduction with two techniques combined [22, 23]. The coefficients of the ROS denominator are obtained by the Routh approximation method so as to ensure a stable model. In this technique, the denominator of the original high-order system is first reciprocated to get

D̃_n(s) = s^n D_n(1/s)   (13)

Then, the α array is formed from the coefficients of the polynomial obtained from Eq. (13), and the values of the parameters α_1, α_2, α_3, …, α_n are obtained. The reduced (rth)-order denominator polynomial is obtained using

D̃_r(s) = α_r s D_{r−1}(s) + D_{r−2}(s) for r = 1, 2, …, with D_{−1}(s) = D_0(s) = 1   (14)

Then, the reciprocal transformation is applied again to obtain the reduced denominator of the reduced-order system as

D_r(s) = s^r D̃_r(1/s)   (15)

Using Big Bang Big Crunch (BBBC) theory, the numerator coefficients are obtained by minimizing the objective function F, defined as the integral square error (ISE) between the transient responses of the OHOS and ROS. Big Bang Big Crunch is an algorithm, similar in spirit to the genetic algorithm, that operates on the principle of the evolution of the universe.
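The reciprocal transformations of Eqs. (13) and (15) amount to reversing the coefficient list of a polynomial, which inverts its nonzero roots. A small sketch with an illustrative cubic (the polynomial is an assumption for the demo):

```python
def reciprocal(coeffs):
    """D~(s) = s^n D(1/s): reverse the coefficient list [b_n, ..., b_0]."""
    return list(reversed(coeffs))

def polyval(coeffs, x):
    """Evaluate a polynomial given high-to-low coefficients (Horner's rule)."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# (s + 1)(s + 2)(s + 5) = s^3 + 8 s^2 + 17 s + 10, an invented example:
d = [1.0, 8.0, 17.0, 10.0]
d_tilde = reciprocal(d)      # 10 s^3 + 17 s^2 + 8 s + 1
```

Each root r of D becomes 1/r in D̃, which is why the dominant (small-magnitude) poles of the original system become the large, easily handled roots of the reciprocal polynomial.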

3.5 Tiwari (2019)

Tiwari (2019) carried out model reduction by separating the OHOS into its denominator and numerator parts while preserving the stability of the system [24]. The denominator is reduced using dominant pole retention with the additional concept of clustering. In this algorithm, a quantitative analysis of the dominant poles of the OHOS is performed, and the dominancy of each pole is established using the MDI; the highest MDI value for a particular pole indicates that the pole has high controllability and observability. A cluster of dominant poles is then formed, and the cluster center is found by applying Eq. (16):

λ_c = [ ( (1/|λ_1|) + Σ_{i=2}^{k} (1/|λ_i|) ) / k ]⁻¹   (16)

where λ_c is the cluster center obtained from the k poles (λ_1, λ_2, λ_3, …, λ_k) in the cluster. The number of poles of the ROS is equal to the number of clusters. The reduced-order numerator is found using the popular Pade approximation technique, giving a rational function N(s)/D(s) of degrees m and n, respectively.
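Eq. (16), read here as the inverse-distance-measure cluster center common in this pole-clustering literature, can be computed directly; the three poles below are a hypothetical cluster chosen for illustration.

```python
def cluster_center(poles):
    """Inverse-distance-measure center of a cluster of k real poles (Eq. 16)."""
    k = len(poles)
    magnitude = (sum(1.0 / abs(p) for p in poles) / k) ** -1
    return -magnitude            # the center is itself a (stable) pole

# Hypothetical cluster of three real poles:
center = cluster_center([-1.0, -2.0, -4.0])   # -> -(3 / 1.75) = -12/7
```

Because the terms are reciprocals of the pole magnitudes, the center is pulled toward the smallest-magnitude (most dominant) pole in the cluster, which is the intent of the inverse-distance weighting.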

4 Numerical Experiments

The performance of all the MOR methods discussed in Sect. 3 is compared through numerical experiments on the basis of overshoot, integral square error (ISE), settling time and rise time between the OHOS and the ROS obtained after applying each MOR technique. The integral square error is a measure of the quality of the obtained reduced-order system:

ISE = ∫₀^∞ [y(t) − y_r(t)]² dt   (17)

where y(t) is the response of the OHOS and y_r(t) is the response of the obtained ROS.

Test Example: Consider the linear dynamic system of order nine used in [6, 9, 10, 20, 22], given by the transfer function

G(s) = (s⁴ + 35s³ + 291s² + 1093s + 1700) / (s⁹ + 9s⁸ + 66s⁷ + 294s⁶ + 1029s⁵ + 2541s⁴ + 4684s³ + 5856s² + 4620s + 1700)   (18)

Applying all the MOR techniques, the corresponding third-order transfer functions of the reduced-order models are

G_1(s) = (0.1405s² − 0.8492s + 1.881) / (s³ + 1.575s² + 3.523s + 1.717)   (19)

G_2(s) = (0.2945s² − 2.202s + 2.32) / (s³ + 2.501s² + 4.77s + 2.32)   (20)

G_3(s) = (0.5058s² − 1.985s + 3.534) / (s³ + 3s² + 5.534s + 3.534)   (21)

[Figure: step-response curves (amplitude versus time, 0–20 s) of the original system and the reduced models by balanced truncation, Mukharjee, Philip & Pal, Desai & Prasad, and Tiwari & Kaur.]

Fig. 1 Step response of original and reduced order models for test example

Table 1 Comparison between various reduced order models for test example

| Method of order reduction | ISE | Steady-state value | Rise time (s) | Overshoot (%) | Settling time (s) |
| Original | – | 1 | 2.85 | – | 8.72 |
| Balanced truncation [2] | High | 1.09 | 2.92 | – | – |
| Mukharjee [21] | 8.77 × 10⁻² | 1 | 4.67 | 0 | 12.9 |
| Philip [8] | 2.82 × 10⁻² | 1 | 2.99 | 0 | 7.6 |
| Desai [23] | 2.52 × 10⁻² | 1 | 3.43 | 1.96 | 10.6 |
| Tiwari [24] | 1.74 × 10⁻² | 1 | 2.92 | 0 | 6.91 |

G_4(s) = (0.0789s² + 0.3142s + 0.493) / (s³ + 1.3s² + 1.34s + 0.493)   (22)

G_5(s) = (−0.4439s² − 0.4901s + 2.317) / (s³ + 3s² + 4.317s + 2.317)   (23)

The step responses of all the MOR techniques are plotted in Fig. 1. A quantitative comparison among all the methods is also carried out in Table 1 on the basis of integral square error, peak overshoot, rise time and settling time, along with the achieved steady-state value.
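The comparison behind Fig. 1 and the ISE column of Table 1 can be reproduced approximately by simulating each transfer function in controllable canonical form. The fixed-step RK4 integrator below is a sketch written for this purpose; the simulation horizon and step size are choices made here, not taken from the paper.

```python
import numpy as np

def step_response(num, den, t_end=20.0, h=0.01):
    """Unit-step response of num(s)/den(s): controllable canonical form + RK4."""
    den = np.asarray(den, dtype=float)
    den = den / den[0]                       # make the denominator monic
    n = len(den) - 1
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)
    A[-1, :] = -den[::-1][:-1]               # last row: -[a0, a1, ..., a_{n-1}]
    b = np.zeros(n)
    b[-1] = 1.0
    c = np.zeros(n)
    c[:len(num)] = np.asarray(num, dtype=float)[::-1]
    f = lambda x: A @ x + b                  # unit-step input u(t) = 1
    steps = int(round(t_end / h))
    x = np.zeros(n)
    y = np.empty(steps + 1)
    y[0] = 0.0                               # strictly proper: y(0) = 0
    for k in range(steps):
        k1 = f(x); k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2); k4 = f(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        y[k + 1] = c @ x
    return y

def ise(y, y_r, h=0.01):
    """Rectangle-rule approximation of the ISE of Eq. (17)."""
    return float(np.sum((y - y_r) ** 2) * h)

# Original ninth-order system of Eq. (18) versus the reduced model G5 of Eq. (23):
y_orig = step_response([1, 35, 291, 1093, 1700],
                       [1, 9, 66, 294, 1029, 2541, 4684, 5856, 4620, 1700])
y_g5 = step_response([-0.4439, -0.4901, 2.317], [1.0, 3.0, 4.317, 2.317])
err = ise(y_orig, y_g5)
```

Both responses settle near 1 (the DC gain 1700/1700 of the original), and the resulting ISE is small, in line with the Tiwari row of Table 1; exact agreement depends on the integration settings the authors used.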


The comparative analysis shows that the initially developed balanced truncation technique gives a good approximation of the large-scale system, but the amount of error is significant. The more recently developed techniques decrease the error, both the steady-state error and the integral square error.

5 Conclusion

This paper has traced the enhancements in the area of model order reduction. The initially developed balanced truncation technique yields a highly erroneous reduced-order system, which indicated the need for more refined techniques. Mukharjee in 2005, with the help of response matching, developed a more accurate system, which stimulated interest in model order reduction, and more advanced techniques followed, as described by Philip, Desai and Tiwari. These techniques improved the agreement between the original and reduced-order systems and reduced the error between them. The present work shows that the integral square error decreases with each development in MOR techniques, but the error should be reduced further to find a closer approximation of the original higher-order system. This limitation can be removed by designing more advanced techniques that eliminate the error and improve the accuracy between the original higher-order system and the reduced-order system. Future development in model order reduction can thus yield techniques that make the reduced-order system more accurate, so that the study of large-scale systems becomes easier.

References

1. Antoulas, A. C., Sorensen, D. C., & Gugercin, S. (2006). A survey of model reduction methods for large-scale systems. Contemporary Mathematics.
2. Moore, B. C. (1981). Principal component analysis in linear systems: Controllability, observability and model reduction. IEEE Transactions on Automatic Control, AC-26(1), 17–32.
3. Villemagne, C., & Skelton, R. E. (1987). Model reduction using a projection formulation. In 26th IEEE Conference on Decision and Control (pp. 461–466).
4. Safonov, M. G., & Chiang, R. Y. (1989). A Schur method for balanced-truncation model reduction. IEEE Transactions on Automatic Control, 34(7), 729–733.
5. Shamash, Y. (1974). Continued fraction methods for the reduction of discrete-time dynamic systems. International Journal of Control, 20(2), 267–275.
6. Shamash, Y. (1975). Linear system reduction using Pade approximation to allow retention of dominant modes. International Journal of Control, 21(2), 257–272.
7. Chen, T. C., & Chang, C. Y. (1979). Reduction of transfer functions by the stability-equation method. Journal of the Franklin Institute, 308(4), 389–404.
8. Philip, B., & Pal, J. (2010). An evolutionary computation based approach for reduced order modeling of linear systems. In IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore (pp. 1–8).
9. Lucas, T. N. (1983). Factor division: A useful algorithm in model reduction. IEE Proceedings, 130(6), 362–364.
10. Sikander, A., & Prasad, R. (2015). Linear time invariant system reduction using mixed method approach. Applied Mathematical Modelling.
11. Tiwari, S. K., & Kaur, G. (2016). An improved method using factor division algorithm for reducing the order of linear dynamical system. Sadhana, 41(6), 589–595.
12. Glover, K. (1984). All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds. International Journal of Control, 39(6), 1115–1193.
13. Le Mehaute, A., & Crepy, G. (1983). Introduction to transfer and motion in fractal media: The geometry of kinetics. Solid State Ionics, 9–10(Part 1), 17–30.
14. Vishwakarma, C. B., & Prasad, R. (2009). MIMO system reduction using modified pole clustering and genetic algorithm. Modelling and Simulation in Engineering.
15. Narwal, A., & Prasad, R. (2015). A novel order reduction approach for LTI systems using cuckoo search and Routh approximation. In IEEE International Advance Computing Conference (IACC), Bangalore (pp. 564–569).
16. Narwal, A., & Prasad, R. (2016). Optimization of LTI systems using modified clustering algorithm. IETE Technical Review.
17. Sikander, A., & Prasad, R. (2017). A new technique for reduced-order modelling of linear time-invariant system. IETE Journal of Research.
18. Parmar, G., Mukherjee, S., & Prasad, R. (2007). System reduction using factor division algorithm and eigen spectrum analysis. Applied Mathematical Modelling, 31, 2542–2552.
19. Cheng, X., & Scherpen, J. (2018). Clustering approach to model order reduction of power networks with distributed controllers. Advances in Computational Mathematics.
20. Alsmadi, O., Abo-Hammour, Z., Abu-Al-Nadi, D., & Saraireh, S. (2015). Soft computing techniques for reduced order modelling: Review and application. Intelligent Automation & Soft Computing.
21. Mukherjee, S., Satakshi, & Mittal, R. C. (2005). Model order reduction using response-matching technique. Journal of the Franklin Institute, 342, 503–519.
22. Desai, S. R., & Prasad, R. (2013). A novel order diminution of LTI systems using big bang big crunch optimization and Routh approximation. Applied Mathematical Modelling, 37, 8016–8028.
23. Desai, U. B., & Pal, D. (1984). A transformation approach to stochastic model reduction. IEEE Transactions on Automatic Control, AC-29(12), 1097–1100.
24. Tiwari, S. K., & Kaur, G. (2019). Enhanced accuracy in reduced order modeling for linear stable/unstable system. International Journal of Dynamics and Control.

Solution to Unit Commitment Problem: Modified hGADE Algorithm

Amritpal Singh and Aditya Khamparia

Abstract This research paper proposes a hybrid approach, an extension of the hGADE algorithm combining differential evolution and the genetic algorithm, aimed at solving the mixed-integer optimization problem called the unit commitment (UC) scheduling problem. Ramp up and ramp down constraints have been included in the calculation of the total operating cost of power system operation. The proposed approach is easy to implement and understand. The technique has been tested on a six-unit system, taking various system and unit constraints into consideration. Hybridization of the genetic algorithm and differential evolution has produced a significant improvement in the overall results.

Keywords hGADE · Thermal · Commitment · Ramp rate · Genetic

1 Introduction

A present-day power system may consist of thermal plants only, a combination of hydro and thermal plants, or a combination of thermal, hydro and nuclear plants. As far as modeling is concerned, a nuclear plant is treated the same as a thermal plant; in fact, it is also called a thermal plant, so hydro, thermal and nuclear reduce to hydro and thermal [1]. The problems of power system operation are hierarchical, or multilevel. They start with load forecasting, which is an important problem in control systems as well as energy systems: the load has to be ascertained and forecasted well in advance. Forecasting ranges from the very short term, such as how the load will change over the next 10 minutes, to very long-term planning. If a power plant is needed in 2025, planning has to start right now, because the gestation period for a hydropower plant is 7–8 years. And even

A. Singh (B) · A. Khamparia
Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
e-mail: [email protected]

A. Khamparia
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_7


in a thermal power plant the gestation period can be some 5–6 years. The gestation period is the time required from the moment a plant is conceived and planned until a megawatt is produced. Very-long-term planning decides the initiation of new power plant installations, because the place, the site, the fuel and the sources of resources must all be decided. That is why load forecasting is needed. The major outcomes of the research are as follows:

• This research is aimed at solving the UC problem, one of the biggest concerns for power companies.
• A hybrid approach is proposed as an extension of the hGADE algorithm.
• Ramp up and ramp down constraints are included in the calculation of the total operating cost of power system operation.

2 Unit Commitment

In power system operation, the load changes from hour to hour, day to day and week to week, so there is no permanent answer to which units should be on and which should be off. For a given load, one must find out which units of the power station should be on to take up that load, and which units are not required and should be put off; this can be represented by the binary values 1 or 0 (available or not available, working or not working) [2]. Finding this binary on/off solution is the unit commitment problem. The generic unit commitment problem can be formulated mathematically as Eq. (1):

min Σ_{t=1}^{T} Σ_{i=1}^{N} f_i(x_it) · u_it   (1)

subject to the demand and reserve requirements, where

f_i        cost function of generating unit i
x_it       generation level of unit i in period t
u_it       state of unit i
D_t, R_t   demand and reserve requirement for time period t, respectively

The constraints which play an important role in unit commitment are as follows:

i. Maximum generating capacity
ii. Minimum stable generation
iii. Minimum up time
iv. Minimum down time
v. Ramp rate
vi. Load generation constraint

Notations

u(i, t)       status of unit i at time period t
u(i, t) = 1   unit i is on during time period t
u(i, t) = 0   unit i is off during time period t
x(i, t)       power produced by unit i during time period t
N             set of available units
R(t)          reserve requirement at time t

Minimum up time: Once a unit in a power system is running, it may not be shut down immediately:

If u(i, t) = 1 and t_i^up < t_i^up,min then u(i, t + 1) = 1   (2)

Minimum down time: Once a unit shuts down, it may not be restarted immediately:

If u(i, t) = 0 and t_i^down < t_i^down,min then u(i, t + 1) = 0   (3)

Maximum Ramp Rate: The electrical output of a unit cannot change by more than a definite amount over an interval of time, to avoid damaging the turbine [3]. The maximum ramp up and ramp down rate constraints are given as Eqs. (4) and (5), respectively:

x(i, t + 1) − x(i, t) ≤ P_i^up,max   (4)

x(i, t) − x(i, t + 1) ≤ P_i^down,max   (5)

Load Generation Constraint: This is the condition that the generators' electricity production equals the electricity demand:

Σ_{i=1}^{N} u(i, t) · x(i, t) = L(t)   (6)

Reserve Capacity Constraint: Sometimes production from other units must be increased to keep the frequency drop within acceptable limits after the unanticipated loss of a generating unit. The reserve capacity constraint is formulated as Eq. (7) [4]:

Σ_{i=1}^{N} u(i, t) · P_i^max ≥ L(t) + R(t)   (7)
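Constraints (4)–(7) can be checked mechanically for a candidate schedule. The function below is a simplified feasibility sketch: minimum up/down-time tracking is omitted for brevity, and the two-unit data in the demo are invented.

```python
def feasible(u, x, pmax, ramp, load, reserve):
    """Check load balance (6), reserve (7) and ramp limits (4)-(5)
    for a schedule u[i][t] in {0, 1} with dispatch levels x[i][t] in MW."""
    n_units, horizon = len(u), len(u[0])
    for t in range(horizon):
        gen = sum(u[i][t] * x[i][t] for i in range(n_units))
        if abs(gen - load[t]) > 1e-6:                       # Eq. (6)
            return False
        capacity = sum(u[i][t] * pmax[i] for i in range(n_units))
        if capacity < load[t] + reserve[t]:                 # Eq. (7)
            return False
    for i in range(n_units):
        for t in range(horizon - 1):
            if abs(x[i][t + 1] - x[i][t]) > ramp[i]:        # Eqs. (4)-(5)
                return False
    return True

# Invented two-unit, two-hour example:
pmax, ramp = [200.0, 80.0], [130.0, 60.0]
ok = feasible([[1, 1], [0, 1]], [[150.0, 160.0], [0.0, 40.0]],
              pmax, ramp, load=[150.0, 200.0], reserve=[20.0, 20.0])
```

In a full UC solver such a predicate is evaluated inside the search loop (or relaxed into penalty terms) to steer candidate schedules toward feasibility.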

3 Techniques for Solving UC Problem

Several investigators have applied various optimization techniques to the UC problem in the past [5–9]. The available solutions fall into three categories: (i) conventional techniques, (ii) non-conventional techniques and (iii) hybrid techniques.

Conventional Techniques

Conventional techniques include dynamic programming [10], branch and bound [11], tabu search, Lagrangian relaxation [12], integer programming, interior point optimization, simulated annealing, etc. Walter et al. presented the field-proven dynamic programming formulation of unit commitment; Eq. (8) is the dynamic programming recursion for unit commitment:

F_cost(M, N) = min_{L} [ P_cost(M, N) + S_cost(M − 1, L : M, N) + F_cost(M − 1, L) ]   (8)

where

F_cost(M, N)              least cost to arrive at state (M, N)
P_cost(M, N)              production cost for state (M, N)
S_cost(M − 1, L : M, N)   transition cost from state (M − 1, L) to state (M, N)

Arthur I. Cohen presented a new algorithm based on the branch and bound technique [11], which differs from earlier techniques in that it assumes no priority ordering; most early techniques were based on a priority list of units defining the order in which units start up or shut down.
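The recursion of Eq. (8) can be sketched as a forward dynamic program over unit-status combinations. The tiny two-unit instance below enumerates all states per hour; the capacities, costs and demands are invented for illustration, and the dispatch inside each state is a simple merit-order fill.

```python
from itertools import product

# Illustrative 2-unit data (not from the paper):
pmax  = [100.0, 60.0]      # capacity of each unit (MW)
pcost = [20.0, 30.0]       # running cost per MW-hour
scost = [50.0, 20.0]       # start-up cost
load  = [80.0, 130.0, 60.0]  # hourly demand (MW)

states = list(product([0, 1], repeat=2))   # all on/off combinations

def prod_cost(state, demand):
    """Cheapest dispatch of `demand` over committed units (greedy merit order)."""
    if sum(pmax[i] for i in range(len(pmax)) if state[i]) < demand:
        return float("inf")                # state cannot serve the load
    cost, rest = 0.0, demand
    for i in sorted(range(len(pmax)), key=lambda i: pcost[i]):
        if state[i]:
            take = min(rest, pmax[i])
            cost += take * pcost[i]
            rest -= take
    return cost

def trans_cost(prev, cur):
    """Start-up cost for units switched on between consecutive hours."""
    return sum(scost[i] for i in range(len(scost)) if cur[i] and not prev[i])

# Forward recursion of Eq. (8); all units assumed off before hour 1.
F = {s: prod_cost(s, load[0]) + trans_cost((0, 0), s) for s in states}
for demand in load[1:]:
    F = {s: prod_cost(s, demand) + min(F[p] + trans_cost(p, s) for p in states)
         for s in states}
best = min(F.values())
```

Real DP formulations prune the exponential state space with priority orderings or state limits; the full enumeration here is only viable because there are two units.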


Non-Conventional Techniques

Evolutionary Algorithms

Over the last few years, global optimization has received a lot of attention from researchers worldwide, perhaps because optimization plays a role in every area, from engineering to finance. Inspired by Darwin's theory of evolution, evolutionary algorithms can also be used to solve problems that humans do not really know how to solve directly.

Differential Evolution

Differential evolution (DE) works through steps identical to those of other evolutionary algorithms. DE was developed by Storn and Price in 1995 [13]. DE tends to provide near-globally optimal solutions, as it is less prone to becoming trapped in local optima, and its space complexity is quite low compared with other algorithms [14].

Genetic Algorithm

The genetic algorithm (GA) also belongs to the category of evolutionary algorithms and is widely used to find optimal solutions to complex problems [2]. The UC problem formulation using a genetic algorithm is given as Eq. (9), with the objective

Σ_{j=1}^{T} Σ_{i=1}^{N} (a_i + b_i P_ij + c_i P_ij²) · u_ij + Σ_{j=1}^{T} Σ_{i=1}^{N} ( σ_i + δ_i (1 − e^{−T_ij^off / T_i}) ) · u_ij · (1 − u_i(j−1))   (9)

subject to

Σ_{i=1}^{N} P_ij · u_ij − P_Dj = 0
T_ij^ON > MUT_i
T_ij^OFF > MDT_i

where

N               number of units
T               scheduling interval
P_ij            generation of unit i for hour j
a_i, b_i, c_i   coefficients of fuel cost
σ, δ            coefficients of start-up cost
P_Dj            demand for hour j
T_ij^ON         ON time of unit i for hour j
MUT_i           minimum up time
MDT_i           minimum down time
P_i^max         maximum generation of unit i
P_Rj            spinning reserve for hour j
u_ij            ON/OFF status of unit i at hour j
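The objective of Eq. (9) — a quadratic fuel cost plus an exponential start-up charge incurred when a unit switches on — can be evaluated for a candidate schedule as follows. All unit data in the demo are invented for illustration.

```python
import math

def schedule_cost(u, P, a, b, c, sigma, delta, tau, t_off):
    """Evaluate the objective of Eq. (9) for schedule u[i][j], dispatch P[i][j].

    sigma/delta are start-up cost coefficients, tau[i] the cooling time
    constant T_i, and t_off[i][j] the hours unit i has been off before hour j.
    """
    n_units, horizon = len(u), len(u[0])
    cost = 0.0
    for j in range(horizon):
        for i in range(n_units):
            cost += (a[i] + b[i] * P[i][j] + c[i] * P[i][j] ** 2) * u[i][j]
            if j > 0 and u[i][j] == 1 and u[i][j - 1] == 0:   # unit starts up
                cost += sigma[i] + delta[i] * (1.0 - math.exp(-t_off[i][j] / tau[i]))
    return cost

# One invented unit over three hours: on, off, then on again after one hour off.
total = schedule_cost(u=[[1, 0, 1]], P=[[50.0, 0.0, 50.0]],
                      a=[100.0], b=[2.0], c=[0.01],
                      sigma=[80.0], delta=[40.0], tau=[2.0],
                      t_off=[[0.0, 0.0, 1.0]])
```

A GA for UC would call such an evaluation inside its fitness function, typically adding penalty terms for the demand and up/down-time constraints listed under Eq. (9).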

Hybrid Techniques

Numerous optimization algorithms have been devised to address optimal power flow, for example the grey wolf optimizer [7], the dragonfly algorithm, the artificial bee colony, ant colony optimization, and so on. Himanshu Anand et al. presented a technique to solve the profit-based unit commitment (PBUC) problem [15]. Anupam Trivedi et al. [16] presented a unique approach for solving the power system optimization problem popularly known as the UC scheduling problem, naming the algorithm hybridized genetic algorithm and differential evolution (hGADE). The GA works well with binary variables while DE works well with continuous variables, and the authors took advantage of this to solve the UC problem.

4 Hybridization of Genetic Algorithm and Differential Evolution

Anupam Trivedi et al. presented the unique hGADE approach for solving the power system optimization problem popularly known as the unit commitment scheduling problem [16]. The constraints involved in UC are spinning reserve, minimum up time, minimum down time, start-up cost, hydro constraints, generator "must run" constraints, ramp rate and fuel constraints. The authors mentioned in their future work that the ramp up/down constraint was neglected and not taken into consideration when solving the unit commitment problem. This motivated us to work further and include the ramp up and down constraints in the calculation of cost; a fitness function has been designed accordingly.

5 Proposed Approach

Figure 1 describes the working of the already implemented hGADE algorithm. Ramp up and ramp down rates are considered in this research, and, in addition, new fitness functions for differential evolution [17] and the genetic algorithm have been formulated. Table 1

Table 1 Input data for six-unit system

| Unit | a ($/MW² hr) | b ($/MW hr) | c ($/hr) | P_min (MW) | P_max (MW) | Min. up (hours) | Min. down (hours) | Start-up cost | Ramp up (MW/hr) | Ramp down (MW/hr) |
| 1 | 0.00375 | 2 | 200 | 50 | 200 | 3 | 1 | 176 | 130 | 130 |
| 2 | 0.0175 | 1.75 | 257 | 20 | 80 | 2 | 2 | 187 | 130 | 130 |
| 3 | 0.0625 | 1 | 300 | 15 | 40 | 3 | 1 | 113 | 90 | 90 |
| 4 | 0.00834 | 3.25 | 400 | 10 | 35 | 3 | 2 | 267 | 60 | 60 |
| 5 | 0.025 | 3 | 515 | 10 | 30 | 2 | 1 | 180 | 60 | 60 |
| 6 | 0.05 | 3 | 515 | 12 | 25 | 3 | 1 | 113 | 40 | 40 |

Table 2 Load pattern (in MW) for 1 h interval

| Hour | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Load (MW) | 140 | 166 | 180 | 196 | 220 | 240 | 267 | 283.4 | 308 | 323 |

| Hour | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| Load (MW) | 340 | 350 | 300 | 267 | 220 | 196 | 220 | 240 | 267 | 300 |

| Hour | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
| Load (MW) | 267 | 235 | 196 | 166 | 140 | 166 | 180 | 196 | 220 | 240 |

| Hour | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 |
| Load (MW) | 267 | 283.4 | 308 | 323 | 340 | 350 | 300 | 267 | 220 | 196 | 220 |

Table 3 Parameter setting

| Parameter | Value |
| Genetic algorithm mutation rate | 0.35 |
| Genetic algorithm crossover rate | 0.6 |
| Differential evolution mutation rate | 1 |
| Differential evolution crossover rate | 0.98 |

represents the input data for the six-unit system, which consists of the cost coefficients, minimum up/down times, and start-up costs. Table 2 shows the load pattern (in MW) at 1 h intervals. Table 3 shows the mutation and crossover rates defined for this research.

Fitness Function: The ramp rate is considered in the calculation of the overall production cost of the power plant. The fitness function used for the genetic algorithm is given as Eq. (10), where F_s is the average cost per generation and F_t is the mean of F_s:

f = 1 if (1 − e/max(F_s)) · ramprate < F_t, and f = 0 otherwise   (10)

The fitness function used for the differential evolution algorithm is given as Eq. (11):

f = 1 if F_s · ramprate < F_t/e, and f = 0 otherwise   (11)

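Eqs. (10) and (11) translate directly into binary predicates; here e is Euler's number, F_s the average cost per generation and F_t the mean of the F_s history. The sample cost values and ramp rate below are purely illustrative.

```python
import math

def ga_fitness(fs_history, ramp_rate):
    """GA fitness of Eq. (10): 1 if (1 - e/max(Fs)) * ramprate < Ft, else 0."""
    ft = sum(fs_history) / len(fs_history)        # Ft = mean of Fs
    return 1 if (1.0 - math.e / max(fs_history)) * ramp_rate < ft else 0

def de_fitness(fs_current, fs_history, ramp_rate):
    """DE fitness of Eq. (11): 1 if Fs * ramprate < Ft / e, else 0."""
    ft = sum(fs_history) / len(fs_history)
    return 1 if fs_current * ramp_rate < ft / math.e else 0

# Illustrative per-generation average costs ($) and a ramp rate (MW/h):
fs = [142815.0, 142810.0]
g = ga_fitness(fs, ramp_rate=130.0)
d = de_fitness(142810.0, fs, ramp_rate=130.0)
```

Because costs here are five orders of magnitude larger than the ramp rate, the GA predicate passes while the DE predicate, which multiplies the current cost by the ramp rate, does not; the two tests thus filter candidates differently.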
Working of the proposed approach: The algorithm starts with the initial settings of the various constraints considered in this research, which include ramp up, ramp down, start-up cost and distribution of cost. The algorithm is iterative in nature, and the number of iterations can be set as required; the iteration counter is initialized to zero, and the distributed cost is included only for the first iteration. Six units are taken for the power system analysis, and these six units have to satisfy the load at minimum cost. The load is distributed at one hour intervals. The operation cost is calculated as Eq. (12):

oc = a · P_t² + b · P_t + c   (12)

where oc is the operation cost, a, b, c are the cost coefficients and P_t is the maximum power of the unit. The priority of the power units is then determined by Eq. (13):

Priority = oc / P_max   (13)
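The priority list of Eqs. (12)–(13) ranks units by their full-load average cost, with a lower priority value meaning the unit is committed first. The three units in the demo are invented for illustration.

```python
def priority_order(units):
    """Rank units by Eq. (13): full-load cost oc = a*P^2 + b*P + c over Pmax."""
    ranked = []
    for name, a, b, c, p_max in units:
        oc = a * p_max ** 2 + b * p_max + c    # Eq. (12) evaluated at P = Pmax
        ranked.append((oc / p_max, name))      # Eq. (13): lower = higher priority
    return [name for _, name in sorted(ranked)]

# Hypothetical units: (name, a, b, c, Pmax)
units = [("U1", 0.004, 2.0, 200.0, 200.0),
         ("U2", 0.025, 3.0, 500.0, 30.0),
         ("U3", 0.010, 1.0, 300.0, 80.0)]
order = priority_order(units)   # -> ["U1", "U3", "U2"]
```

Committing units in this order until the hourly load plus reserve is covered gives the initial schedule that the GA/DE search then refines.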

The priorities of all units are calculated and sorted in ascending order, meaning the unit of higher priority (the lower the number, the higher the priority) takes the load first. As per the working of hGADE, the genetic algorithm works on the binary component and the differential evolution algorithm works on the continuous component of the chromosomes (Fig. 1).

Results

The research has been carried out in MATLAB 2016b, and the optimization made a significant difference in the cost of operation. The results obtained under the conditions specified in Tables 4 and 5 are promising. The graph in Fig. 2 shows the average cost of the generators (units) over the generations (iterations). Table 4 shows the unit commitment schedule of the six units over ten generations; here, "On" indicates that the unit is active in a particular iteration and "Off" indicates that the unit is not considered in the calculation of the total operating cost. The average cost of operation was found to be $142,814.9603, and it reduced to $142,809.8944 after applying the optimization (hGADE), as shown in Table 5. Case 1 gives the total operation cost without any optimization, and Case 2 gives the result with optimization. The comparative analysis clearly shows a significant improvement in cost when the proposed approach is applied.

6 Conclusion

The research has been carried out using hybridization of the genetic and differential evolution algorithms with ramp up and down rates taken into consideration, and fitness functions have been designed accordingly. It has been observed that the hybridization of


A. Singh and A. Khamparia

Fig. 1 Working of hGADE [16]. (Flowchart: Start → initialize population → fitness evaluation of parent population → find best solution; if the stopping condition is satisfied, output the optimal solution; otherwise GA works on the binary components of the chromosomes and DE on the continuous components: perform stochastic uniform selection, DE mutation on continuous components, GA crossover on binary components, DE crossover on continuous components, GA mutation on binary components, then fitness evaluation and replacement to form the population of the next generation.)

Solution to Unit Commitment Problem: Modified hGADE Algorithm


Table 4 UC schedule

                Unit 1  Unit 2  Unit 3  Unit 4  Unit 5  Unit 6
Generation 1    Off     On      Off     Off     On      Off
Generation 2    On      On      Off     Off     On      On
Generation 3    On      Off     On      On      Off     On
Generation 4    On      On      On      Off     On      On
Generation 5    On      Off     Off     Off     On      Off
Generation 6    On      Off     Off     Off     Off     Off
Generation 7    Off     On      On      Off     On      Off
Generation 8    On      On      Off     Off     Off     Off
Generation 9    On      Off     Off     On      Off     Off
Generation 10   On      On      On      Off     Off     On

Table 5 Optimization results (comparative analysis)

Case    GA mutation rate  GA crossover rate  DE mutation rate  DE crossover rate  Cost ($)
Case 1  0.35              0.60               1                 0.98               142,814.9603
Case 2  0.35              0.60               1                 0.98               142,809.8944

Fig. 2 Average cost difference of generators/units

evolutionary algorithms with ramp rates and newly designed fitness function showed significant improvement with respect to total operation cost.


References

1. Wood, A. J., & Wollenberg, B. F. (2007). Power generation, operation & control (2nd ed.). New York: John Wiley & Sons.
2. Håberg, M. (2019). Fundamentals and recent developments in stochastic unit commitment. International Journal of Electrical Power & Energy Systems. https://doi.org/10.1016/j.ijepes.2019.01.037
3. Deka, D., & Datta, D. (2019). Optimization of unit commitment problem with ramp-rate constraint and wrap-around scheduling. Electric Power Systems Research. https://doi.org/10.1016/j.epsr.2019.105948
4. Wang, M. Q., Yang, M., Liu, Y., Han, X. S., & Wu, Q. (2019). Optimizing probabilistic spinning reserve by an umbrella contingencies constrained unit commitment. International Journal of Electrical Power & Energy Systems. https://doi.org/10.1016/j.ijepes.2019.01.034
5. Zhou, M., Wang, B., Li, T., & Watada, J. (2018). A data-driven approach for multi-objective unit commitment under hybrid uncertainties. Energy. https://doi.org/10.1016/j.energy.2018.09.008
6. Park, H., Jin, Y. G., & Park, J.-K. (2018). Stochastic security-constrained unit commitment with wind power generation based on dynamic line rating. International Journal of Electrical Power & Energy Systems. https://doi.org/10.1016/j.ijepes.2018.04.026
7. Panwar, L. K., Reddy, S. K., Verma, A., Panigrahi, B. K., & Kumar, R. (2018). Binary grey wolf optimizer for large scale unit commitment problem. Swarm and Evolutionary Computation. https://doi.org/10.1016/j.swevo.2017.08.002
8. Tovar-Ramírez, C. A., Fuerte-Esquivel, C. R., Martínez Mares, A., & Sánchez-Garduño, J. L. (2019). A generalized short-term unit commitment approach for analyzing electric power and natural gas integrated systems. Electric Power Systems Research. https://doi.org/10.1016/j.epsr.2019.03.005
9. Zhou, B., Ai, X., Fang, J., Yao, W., Zuo, W., Chen, Z., & Wen, J. (2019). Data-adaptive robust unit commitment in the hybrid AC/DC power system. Applied Energy. https://doi.org/10.1016/j.apenergy.2019.113784
10. Hobbs, W. J., Hermon, G., Warner, S., & Sheblé, G. B. (1988). An enhanced dynamic programming approach for unit commitment. IEEE Transactions on Power Systems.
11. Cohen, A. I., & Yoshimura, M. (1983). A branch-and-bound algorithm for unit commitment. IEEE Transactions on Power Apparatus and Systems.
12. Yu, X., & Zhang, X. (2014). Unit commitment using Lagrangian relaxation and particle swarm optimization. International Journal of Electrical Power & Energy Systems.
13. Price, K. V., & Storn, R. (1997). Differential evolution: A simple evolution strategy for fast optimization. Dr. Dobb's Journal, 22(4), 18–24.
14. Singh, A., & Kumar, S. (2016). Differential evolution: An overview. Advances in Intelligent Systems and Computing. https://doi.org/10.1007/978-981-10-0448-3_17
15. Anand, H., Narang, N., & Dhillon, J. S. (2018). Profit based unit commitment using hybrid optimization technique. Energy. https://doi.org/10.1016/j.energy.2018.01.138
16. Trivedi, A., Srinivasan, D., Biswas, S., & Reindl, T. (2015). Hybridizing genetic algorithm with differential evolution for solving the unit commitment scheduling problem. Swarm and Evolutionary Computation. https://doi.org/10.1016/j.swevo.2015.04.001
17. Dhaliwal, J. S., & Dhillon, J. S. (2019). Unit commitment using memetic binary differential evolution algorithm. Applied Soft Computing. https://doi.org/10.1016/j.asoc.2019.105502

In Silico Modeling and Screening Studies of Pf RAMA Protein: Implications in Malaria Supriya Srivastava and Puniti Mathur

Abstract Malaria is a major parasitic disease that affects a large human population, especially in tropical and sub-tropical countries. The treatment of malaria is becoming extremely difficult due to the emergence of drug-resistant parasites. To address this problem, many newer drug target proteins are being identified in Plasmodium falciparum, the major causal organism of malaria in humans. Rhoptry proteins participate in the intrusion of red blood cells by the merozoites of the malarial parasite. Interference with rhoptry protein function has been shown to prevent invasion of the erythrocytes by the parasite. As the crystal structure of the RAMA protein of Plasmodium falciparum (Pf RAMA) is not yet available, the three-dimensional structure of the protein was predicted using comparative modeling methods. The structural quality of the generated model was validated using Procheck, which is based on the parameters of the Ramachandran plot. The Procheck results showed 92.7% of backbone angles were in the allowed region and 0.4% in the disallowed region. This structure was studied for interaction with the entire library of compounds in the ZINC database of natural compounds. The binding site of the protein was predicted using Sitemap, and the entire library was screened against the target. A total of 189,530 compounds was used as input to HTVS for the first level of screening. The docking scores of the compounds were further calculated using the "Extra Precision" (XP) algorithm of Glide. On the basis of docking scores, 54 compounds were selected for further analysis. The binding affinity was further calculated using the MMGBSA method. The interaction studies using molecular docking and MMGBSA revealed appreciable docking scores and Gbind values. Ten compounds were selected as promising leads, with appreciable docking scores in the range of −17.891 to −5.328 kcal/mol. Our data generates evidence that the screened compounds indicate a potential binding to the target and need further evaluation. Also, the analysis of the interaction of these compounds can be exploited for better and more efficient design of novel drugs against the said target.

Keywords Molecular dynamics · Rhoptry-associated membrane antigen (RAMA) · Virtual screening · Molecular docking · Malaria

S. Srivastava · P. Mathur (B) Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida 201313, Uttar Pradesh, India
e-mail: [email protected]
S. Srivastava e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_8

1 Introduction Malaria is a major tropical parasitic disease and affects a large population in the countries located in this region [1]. According to the WHO World Malaria Report 2015, malaria has resulted in about 438,000 deaths globally [2]. Although major steps have been taken to reduce the burden of malaria, the ultimate aim of Roll Back Malaria (RBM), which is zero deaths, is yet far from achieved. Various classes of antimalarial drugs such as quinoline derivatives, artemisinin derivatives, antifolates, antimicrobials, and primaquine are known. However, due to the increasing incidence of resistance and the adverse effects of existing drugs, there is a growing need for the discovery and development of new antimalarials. Malaria is caused by Plasmodium species. These are apicomplexan parasites that contain secretory organelles such as rhoptries, micronemes, and dense granules. Rhoptries of Plasmodium are club-shaped organelles situated at the apical end of the parasite merozoites. When these merozoites attach to the surface of the human erythrocytes, the rhoptries discharge their contents onto the membrane [3]. The merozoites internalize, and the rhoptries disappear. Very little is known about rhoptry biogenesis due to the lack of biomarkers of organelle generation. Microscopic examination reveals that rhoptry organelles are formed by continuous fusion of post-Golgi vesicles [3, 4]. The rhoptry is composed of proteins and lipids. Some rhoptry proteins have been analyzed at the molecular level, while others have been identified using immunological reagents [5, 6]. In the present work, a rhoptry protein of Plasmodium falciparum, namely rhoptry-associated membrane antigen (Pf RAMA), which appears to have a role in both rhoptry biogenesis and erythrocyte invasion, has been studied. It has been suggested that RhopH3 and RAP1 show protein–protein interaction with RAMA. Recently, it has been shown that certain proteins like sortilin are involved in escorting RAMA from the Golgi apparatus to the rhoptries [7]. Considering the importance of RAMA and its crucial role in forming the apical complex in Plasmodium, the protein looks to be an interesting drug target. A threading-based model of Pf RAMA (PF3D7_0707300), prediction of the binding site, and virtual screening using biogenic compounds belonging to the ZINC database have been performed. The compounds showing promising docking scores were selected. Molecular dynamics simulation of the protein was performed separately to understand its stability.


2 Materials and Methods A. Modeling of the structure of PfRAMA The experimental conformation of Pf RAMA has not yet been deduced; therefore, a knowledge-based comparative modeling method was used for computationally predicting the 3D structure of the protein. Multiple-template modeling and threading employing the local meta-threading-server (LOMETS) [8] were used to derive the 3D structure. The method aligned the target sequence with protein-fold templates from known structures. Various scoring functions, such as secondary structure match and residue contacts, were used, and the best-fit alignment was generated using dynamic programming. After the threading alignment, the continuous fragments were excised from their respective template structures and assembled to generate a full-length model. Finally, the model with the highest score was selected. B. Protein refinement and validation Energy minimization of the modeled protein was performed using MacroModel (version 9.9, Schrödinger) with the OPLS 2005 force field and the PRCG algorithm, using 1000 steps of minimization and an energy gradient of 0.001 kcal/mol. The total energy of the structure was calculated, and the overall quality was evaluated using Procheck [9] and Errat [10]. Further, a 50 ns molecular dynamics simulation using Desmond, through the multistep MD protocols of Maestro (version 10.3, Schrödinger), was carried out for further refinement of the modeled structure. The OPLS 2005 molecular mechanics force field was used for performing the initial steps of the MD simulation. The protein was solvated in a cubic box with a dimension of 20 Å using simple point charge water molecules, some of which were then replaced with 2 Na+ counter ions for electroneutrality. A total of 5000 frames was generated, out of which 2000 frames were used to obtain the final structure of the Pf RAMA protein. The total energy of the simulated model was calculated, and the overall quality was again validated using Procheck. C. Ligand preparation A library of 276,784 biogenic compounds from the ZINC database was used for screening [11]. This library is composed of molecules of biological origin. The structures in the library were prepared for further analysis using LigPrep (version 3.5, Schrödinger) [12]. For each structure, different tautomers were produced and proper bond orders were assigned. All possible (default value 32) stereoisomers for each ligand were generated. D. Molecular docking The MD-simulated structure of the Pf RAMA protein was used to perform the molecular docking studies in order to predict protein–ligand interactions. The various steps of Schrödinger's protein preparation wizard prepared the protein before the docking calculations. Hydrogens were added automatically using the Maestro interface (version 10.3, Schrödinger), leaving no lone pairs and using an explicit all-atom model [13].


As the binding site of the protein was not known, it was predicted using Sitemap (version 3.6, Schrödinger). Molecular docking calculations for all the compounds, to determine their binding affinity for the Pf RAMA protein binding site, were performed using Glide (version 6.8, Schrödinger) [14]. After protein and ligand preparation, a receptor-grid file was generated using the receptor-grid generation program; the receptor grid was centered at the centroid of the predicted binding site. A cube of size 10 Å × 10 Å × 10 Å was defined at the center of the binding site for the binding of the docked ligand, and a second enclosing box of 12 Å × 12 Å × 12 Å was defined to accommodate all the atoms of the docked poses. The structure was studied for interaction with the entire library of biogenic compounds selected from the ZINC database. The 276,784 compounds were screened using different filters (QikProp, reactive groups, Lipinski's rule of five), and the compounds thus obtained were used as input for high-throughput virtual screening (HTVS). The screened compounds were subjected to the next level of molecular docking calculations using the "Standard Precision" (SP) algorithm. Compounds selected after SP docking were further refined by the "Extra Precision" (XP) method of Glide. On the basis of XP docking scores, 10 compounds were selected for further analysis. The binding affinity was calculated based on the molecular mechanics generalized Born surface area (MMGBSA) method using Prime (version 4.1, Schrödinger), and the interaction studies using molecular docking and MMGBSA [15] revealed appreciable docking scores and Gbind values.
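The staged HTVS → SP → XP funnel described above can be sketched generically in Python. This is only an illustration of the funnel idea: the actual work used Glide's precision modes, whereas here the docking score is a random stand-in, the ligand IDs are placeholders, and the per-stage keep-fractions roughly mirror the counts reported later in the text (189,530 → 60,657 → 1003 → 54):

```python
# Generic sketch of a multi-stage virtual-screening funnel.
# Scores and keep-fractions are illustrative stand-ins, not Glide output.
import random

random.seed(0)
library = [f"LIG{i:06d}" for i in range(1000)]  # placeholder ligand IDs

def mock_score(ligand):
    # Stand-in for a docking score; more negative means better binding.
    return -20.0 * random.random()

def funnel(ligands, stages):
    """At each stage, rescore the survivors and keep the best fraction."""
    for name, keep_frac in stages:
        ranked = sorted(ligands, key=mock_score)
        ligands = ranked[: max(1, int(len(ranked) * keep_frac))]
        print(f"{name}: {len(ligands)} ligands remain")
    return ligands

hits = funnel(library, [("HTVS", 0.32), ("SP", 0.017), ("XP", 0.054)])
```

The design point is that each stage spends more compute per ligand on a smaller set, which is why a cheap first filter (HTVS) precedes the expensive XP rescoring.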

3 Results and Discussion A. Sequence analysis The Pf RAMA protein sequence of 861 amino acids was first analyzed using BLAST against the PDB database to find structurally characterized proteins displaying significant sequence resemblance to the target protein, utilizing evolutionary information through profile–profile alignment and assessment of the likelihood that two proteins are related to each other, as shown in Fig. 1. Sequence identity and query coverage for the templates available for the Pf RAMA protein were very low. B. Modeling of the structure of the selected Pf RAMA protein The three-dimensional structure of the PfRAMA protein was constructed using LOMETS, as shown in Fig. 2. This meta-server uses nine threading programs and ensures quick generation of the resultant structure. A predominantly helical structure was generated, with many loops as connectors. C. Refinement of the structure model and minimization of energy The predicted model was further refined by performing molecular dynamics simulations for 50 ns using Desmond 2.2. To evaluate the structural deviation, the RMSD was calculated during the simulation based on the initial backbone coordinates for


Fig. 1 Graphical results of the BLAST query for PfRAMA: regions from the target database that aligned with the query sequence

Fig. 2 Structure of modeled PfRAMA protein

the protein, as represented in Fig. 3. The RMSD plot revealed that the structure was stable after 35 ns. D. Validation of the modeled structure of PfRAMA Validation of the protein structure was performed with the SAVES server. The Procheck results showed that 92.7% of residues were in allowed regions and 0.4% of


Fig. 3 50 ns molecular dynamics simulations run of Pf RAMA protein for refinement of structure: RMSD of heavy atoms and back bone atoms

residues were in the disallowed region (Fig. 4). Thus, a good-quality structure was generated, and the refined structure with minimum energy was further used to perform the molecular docking studies. E. Analysis of the predicted PfRAMA protein binding site The binding site of the PfRAMA protein was computationally predicted in order to analyze the protein's interaction with lead molecules. Five druggable binding sites were predicted using Sitemap. The binding site with a site score of 1.286 was selected for receptor-grid generation, as shown in Fig. 5.
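The backbone-RMSD measure used above to judge structural stability can be written as a short NumPy sketch. Desmond computes this internally; the sketch assumes each frame is already superposed on the reference (no Kabsch alignment is done here), and the coordinates are illustrative:

```python
# Minimal sketch of backbone RMSD between a trajectory frame and the
# reference structure, assuming the frame is already aligned.
import numpy as np

def rmsd(ref, frame):
    """RMSD between two (N, 3) coordinate arrays, in the same units
    as the input coordinates (typically Å)."""
    diff = frame - ref
    return np.sqrt((diff ** 2).sum() / len(ref))

ref = np.zeros((4, 3))           # reference backbone coordinates (toy)
frame = np.full((4, 3), 0.5)     # a later trajectory frame (toy)
print(round(rmsd(ref, frame), 3))
```

A flattening of this value over simulation time, as seen here after 35 ns, is the usual indication that the structure has equilibrated.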

Fig. 4 Validation of MD simulated structure of PfRAMA protein: Ramachandran plot which shows 92.7% residues were in allowed regions and 0.4% residues in disallowed region


Fig. 5 Different predicted binding sites of the Pf RAMA protein

F. Molecular docking calculations of the PfRAMA protein The interaction of Pf RAMA with the various ligands was studied through molecular docking calculations using Glide (version 6.8, Schrödinger). A total of 189,530 compounds was used as input for HTVS in the first level of screening. HTVS returned 60,657 compounds, which were then used in the next level of molecular docking calculations using "Standard Precision" (SP). The 1003 compounds selected after SP docking were further refined by the "Extra Precision" (XP) method of Glide docking. On the basis of the XP docking scores, 54 compounds were selected for further analysis. The 10 compounds among these showing promising leads, with appreciable docking scores in the range of −17.891 to −5.328 kcal/mol, are shown in Table 1. G. Binding affinity calculation The binding free energy was derived using MMGBSA. Table 1 shows the ten best compounds, with appreciable docking scores and Gbind values ranging from −91.547 to −80.351 kcal/mol. After analyzing the binding mode, the ligand bearing the best (lowest) docking score and Gbind value, namely ZINC08623270, was selected for further analysis. Analysis of the binding mode of the protein–ligand docked complex within the binding pocket of PfRAMA showed hydrogen-bonding (H-bond) patterns, as shown in Fig. 6a, b. A total of three hydrogen bonds were formed between the ligand and the protein: one main-chain hydrogen bond with Tyr625 and two main-chain hydrogen bonds with Asn614 (Fig. 6a, b).


Table 1 Glide energy, docking score, and MMGBSA (Gbind) score of selected ligands

S. No.  ZINC ID       Glide energy  Docking score  MMGBSA
1       ZINC08623270  −48.440       −17.891        −91.547
2       ZINC03794794  −36.907       −11.029        −90.356
3       ZINC20503905  −35.241       −7.427         −85.930
4       ZINC67870780  −43.708       −5.547         −85.490
5       ZINC67870780  −47.18        −7.561         −85.490
6       ZINC22936347  −34.835       −6.461         −84.941
7       ZINC15672677  −34.502       −6.02          −84.845
8       ZINC09435873  −43.937       −6.53          −82.966
9       ZINC09435873  −37.125       −6.095         −82.966
10      ZINC20503855  −33.255       −5.328         −80.351

Fig. 6 a Ligand interaction with Pf RAMA protein showing hydrogen bonds between the ligand and the protein, b three-dimensional structure fitting of ligand-1 with the binding site of Pf RAMA protein showing hydrogen bonds


Fig. 7 Graphs showing per-residue energy: a Evdw of the ligand, b Eele of the ligand

H. Per-residue energy contribution The amino acids present in the binding site showed significant contributions to the van der Waals (Evdw) and electrostatic (Eele) energy. Significant Evdw contributions were made by amino acids such as Lys-40, Leu-39, Gly-3, Glu-116, Asn-41, Tyr-60, and Gln-4 (Fig. 7a). Appreciable Eele contributions were made by the residues Lys-40, Lys-2, Lys-34, Lys-36, Met-1, Lys-153, and Arg-154 (Fig. 7b).

4 Conclusion In conclusion, a good quality three-dimensional structure of Plasmodium falciparum rhoptry-associated membrane antigen (Pf RAMA) protein was determined using comparative energy-based modeling method and molecular dynamics simulations. Ramachandran plot showed 92.7% of backbone angles of Pf RAMA protein in the allowed regions and 0.4% residues in disallowed region. A series of docking


studies were performed, and the binding affinity of the ligands with the protein was evaluated. Out of the ten molecules that showed appreciable docking scores and high affinity toward the binding site of the protein, ZINC08623270 was selected for further analysis. The name of the ligand is 1-(3-methylsulfanylphenyl)-3-[[5-[(4-phenylpiperazin-1-yl)methyl]quinuclidin-2-yl]methyl]urea. The interaction between the protein and this ligand was stabilized by three hydrogen bonds as well as hydrophobic and ionic interactions. Our data generates evidence that the reported compounds indicate potential binding to the target and need further experimental evaluation. It is therefore proposed that this study could serve as a basis for medicinal chemists to design better and more efficient compounds, which may qualify as novel drugs against the said target of malaria caused by Plasmodium falciparum.

References

1. Cowman, A. F., Healer, J., Marapana, D., & Marsh, K. (2016). Malaria: Biology and disease. Cell, 167, 610–624.
2. WHO. (2015). The World Malaria Report. http://www.who.int/malaria/publications/world-malaria-report-2015/report/en/. ISBN 978 92 4 156515 8.
3. Bannister, L. H., Mitchell, G. H., Butcher, G. A., & Dennis, E. D. (1986). Lamellar membranes associated with rhoptries in erythrocytic merozoites of Plasmodium knowlesi: A clue to the mechanism of invasion. Parasitology, 92(2), 291–303.
4. Jaikaria, N. S., Rozario, C., Ridley, R. G., & Perkins, M. E. (1993). Biogenesis of rhoptry organelles in Plasmodium falciparum. Molecular and Biochemical Parasitology, 57(2), 269–279.
5. Preiser, P., Kaviratne, M., Khan, S., Bannister, L., & Jarra, W. (2000). The apical organelles of malaria merozoites: Host cell selection, invasion, host immunity and immune evasion. Microbes and Infection, 2(12), 1461–1477.
6. Blackman, M. J., & Bannister, L. H. (2001). Apical organelles of Apicomplexa: Biology and isolation by subcellular fractionation. Molecular and Biochemical Parasitology, 117(1), 11–25.
7. Hallée, S., Boddey, J. A., Cowman, A. F., & Richard, D. (2018). Evidence that the Plasmodium falciparum protein sortilin potentially acts as an escorter for the trafficking of the rhoptry-associated membrane antigen to the rhoptries. mSphere, 3(1), e00551-17. https://doi.org/10.1128/mSphere.00551-17
8. Wu, S., & Zhang, Y. (2007). LOMETS: A local meta-threading-server for protein structure prediction. Nucleic Acids Research, 35(10), 3375–3382.
9. Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK: A program to check the stereochemical quality of protein structures. Journal of Applied Crystallography, 26, 283–291.
10. Colovos, C., & Yeates, T. O. (1993). Verification of protein structures: Patterns of non-bonded atomic interactions. Protein Science, 2(9), 1511–1519.
11. Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S., & Coleman, R. G. (2005). ZINC–a free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 45, 177–182.
12. Greenwood, J. R., Calkins, D., Sullivan, A. P., & Shelley, J. C. (2010). Towards the comprehensive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules in aqueous solution. Journal of Computer-Aided Molecular Design, 24(6–7), 591–604.
13. Andrec, M., Harano, Y., Jacobson, M. P., Friesner, R. A., & Levy, R. M. (2002). Complete protein structure determination using backbone residual dipolar couplings and sidechain rotamer prediction. Journal of Structural and Functional Genomics, 2(2), 103–11.


14. Friesner, R. A., Banks, J. L., et al. (2004). Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. Journal of Medicinal Chemistry, 47(7), 1739–49.
15. Jacobson, M. P., Friesner, R. A., Xiang, Z., & Honig, B. (2002). On the role of the crystal environment in determining protein side-chain conformations. Journal of Molecular Biology, 320(3), 597–608.

IBRP: An Infrastructure-Based Routing Protocol Using Static Clusters in Urban VANETs Pavan Kumar Pandey, Vineet Kansal, and Abhishek Swaroop

Abstract Vehicular ad hoc networks (VANETs) are a popular subclass of mobile ad hoc networks (MANETs). These kinds of networks do not have a centralized authority to control the network infrastructure. Data routing is one of the most significant challenges in VANETs due to their special characteristics. In this paper, an effective cluster-based routing algorithm for VANETs has been proposed. Unlike other clustering approaches, RSUs are considered fixed nodes in the VANET and are treated as cluster heads in this approach. Because the clusters are static, the overhead of creating and maintaining clusters is reduced. Multiple clusters in a large network, each headed by an RSU, make routing more efficient and reliable. Three levels of routing are defined in this approach. At the first level, the source vehicle itself is capable of sending data to the destination node. The second level needs the RSU's intervention in routing, with the RSU finding the path from the source to the destination vehicle. At the third level, an RSU broadcasts the message to the other connected RSUs to disseminate it more widely. The proposed approach is suitable for urban VANETs because of the availability of a dense network with multiple RSUs. Some applications where the approach may be useful are emergency help, broadcasting information, reporting observations to an authority or to other vehicles, and collecting nearby traffic-related information from the RSU of the respective cluster. The static analysis of the proposed approach shows that it is efficient, scalable, and able to reduce network overhead in large and dense urban VANETs.

P. K. Pandey (B) Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India e-mail: [email protected] V. Kansal Institute of Engineering and Technology, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India e-mail: [email protected] A. Swaroop Bhagwan Parashuram Institute of Technology, New Delhi, India e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_9


Keywords Vehicular ad hoc networks (VANETs) · Infrastructure-based routing · Cluster-based routing · Static cluster based · Urban environment

1 Introduction Vehicular ad hoc networks (VANETs) [1] are a prominent subclass of mobile ad hoc networks (MANETs). They provide a special kind of framework to allow communication among vehicles. In VANETs, vehicles create a huge network (millions of vehicles running on roads) without any centralized authority, with each vehicle acting as a network node and a router itself. VANETs [2] are a kind of infrastructure-less, self-organized network in which the nodes are completely mobile. Unique characteristics of VANETs such as dynamic topologies, limited bandwidth, and limited energy make them one of the most challenging network scenarios. As shown in Fig. 1, every vehicle in a VANET is well equipped [3] with wireless devices to communicate with other vehicles and road side units (RSUs). Nodes in VANETs can communicate with each other through either single-hop or multi-hop connectivity. Every vehicle and RSU is part of the VANET, just as nodes are in any network. Vehicular communication can be further categorized into two kinds of communication [4]: vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I). V2V supports communication among vehicles only, whereas V2I also supports the inclusion of other nodes in communication, such as RSUs, traffic

Fig. 1 Architecture of VANET [5]


authorities, etc. Hence, vehicles can communicate directly if they are within each other's communication range, and communication beyond that range is possible through other infrastructure nodes. VANETs are used to design intelligent transportation systems because of their useful applications [6], such as transportation safety, traffic efficiency, and traffic improvement. Transportation safety [7] includes message dissemination about several alerts and warnings, such as accident alerts, traffic situation alerts, poor road condition alerts, lane change warnings, overtaking warnings, and collision warnings. Traffic efficiency and improvement using VANETs focus on assisting and enhancing traffic flow based on the current traffic situation, providing comfortable driving by sharing dynamic traffic information with the vehicles running on the roads. These applications help to avoid road accidents, which directly reduces casualties on the road. VANETs still face a lot of challenges that are yet to be addressed. These challenges include routing challenges, security challenges, reduced signal quality, degraded signal strength, and quality of communication. Additionally, the rapid change of vehicle positions makes it more difficult to set up, implement, and deploy a vehicular communication framework. To increase cooperation between vehicles and other nodes, there must be a way to transmit a message from one node to another, which requires an effective routing procedure. Given the characteristics of VANETs, routing is the most important research challenge in VANETs; therefore, designing an effective and efficient routing protocol is a key requirement for providing reliable communication. Vehicles are not able to share messages among themselves without a well-defined routing mechanism. In order to address routing issues, several routing protocols have already been proposed for VANETs.

This paper also focuses on the problem of routing, and a new routing approach following an efficient clustering technique has been proposed. The major contributions of the present exposition are as follows. (1) A new cluster-based routing protocol for urban VANETs has been proposed. (2) A mechanism has been added in the proposed approach to avoid flooding of messages in the network. (3) Cached routes are used to apply controlled broadcasting. (4) Static performance analysis of the proposed approach shows that it is efficient and scalable. The rest of the paper is organized as follows: Sect. 2 explains routing challenges and discusses a few routing protocols; Sect. 3 captures all details of the new routing approach proposed in this paper; the performance evaluation in Sect. 4 helps to understand the effectiveness and efficiency of the designed routing approach; and Sect. 5 concludes the paper.


2 Related Work The way of transmitting a message from one node to another node is known as routing. Numerous routing protocols have been proposed for different kinds of network scenarios. Routing protocols designed for completely connected networks like MANETs cannot be used effectively and efficiently in VANETs. Therefore, different kinds of routing procedures need to be designed to support dynamic topology, frequent link breakage, and high mobility of nodes. In VANETs, several routing protocols [8] have already been proposed to achieve efficient routing between nodes. The efficiency and usability of a routing protocol can be measured by different quality of service (QoS) parameters such as end-to-end delay, round-trip delay, packet loss, jitter, and interference. Based on the routing behavior used, the numerous routing approaches [9] can be classified into different categories. As per Fig. 2, the major routing protocols can be classified into five categories according to the routing mechanism used. Each category of protocol is discussed separately below. Topology-based routing protocols [11] are based on information about the complete network topology. Each node keeps track of every other node in the network and maintains a routing table capturing the best route to every other node. This mechanism can be further classified into three categories: proactive, reactive, and hybrid routing protocols. Destination-Sequenced Distance Vector (DSDV) [12], Ad hoc On-demand Distance Vector (AODV) [13], and Zone Routing Protocol (ZRP) [14] are a few popular routing protocols from this category. Position-based routing protocols [15, 16] use the current position or location of vehicles obtained from a global positioning system (GPS) or other related technology.

Fig. 2 Routing protocols in VANET [10]

IBRP: An Infrastructure-Based Routing Protocol …


These are also known as geographical routing protocols: each node relies only on position and location information collected from different sources and forwards messages accordingly. Distance Routing Effect Algorithm for Mobility (DREAM) [17] and Greedy Perimeter Stateless Routing (GPSR) [18] are two major representatives of this category. Broadcast-based routing protocols use flooding techniques to collect routing information in VANETs. The broadcast scheme allows any node to send packets to every other node; hence, it consumes a lot of network bandwidth to provide reliable communication. This scheme is useful when information must be circulated to all neighboring nodes, such as emergency notifications. Distributed Vehicular Broadcast Protocol (DV-CAST) [19] and Hybrid Data Dissemination Protocol (HYDI) [20] are representative protocols in this category. Geocast routing protocols form a separate category based on transmission type. Geocast routing relies on a zone of relevance (ZoR), which contains a group of vehicles with similar properties. Unlike the broadcast mechanism, geocast routing sends a packet from the source to a particular ZoR only; no routing table or location information is required. Robust Vehicular Routing (ROVER) and Distributed Robust Geocast (DRG) are the best-known examples of geocast routing mechanisms. Cluster-based routing protocols [21, 22] divide a large network into several sub-networks known as clusters. One cluster head is elected for each cluster and is responsible for communication beyond the cluster. Different criteria, such as direction of movement and vehicle speed, can be used to form clusters. Location-Based Routing Algorithm with Cluster-Based Flooding (LORA-CBF) and Cluster for Open IVC Network (COIN) are well-known protocols in this category.
Cluster-based routing is one of the most popular routing categories nowadays, and many enhancements have recently been proposed in it. Our contribution also lies in this category; therefore, some recent enhancements are reviewed here. The mobility- and stability-based clustering scheme proposed in [23] is a clustering technique for designing a routing protocol. It is suitable for urban areas under the assumption that every vehicle is aware of the position and location of every neighboring vehicle. In that clustering approach, every cluster has one cluster head and two gateway nodes, and the remaining nodes are cluster members. The vehicular multi-hop algorithm for stable clustering (VMaSC) is proposed in [24]. This multi-hop clustering approach selects the cluster head based on link stability and the relative speed with respect to neighboring vehicles, and it supports a direct connection between cluster head and cluster members to offer connectivity with reduced overhead. A destination-aware context-based routing protocol [25] with a hybrid soft-computing clustering algorithm has also been proposed. Two soft-computing algorithms are discussed in that work. The first is a hybrid clustering algorithm, that is,


the combination of a geographic and a context-based clustering approach; this combination reduces control overhead and traffic overhead in the network. The second is a destination-aware routing protocol that controls the routing of packets between clusters and improves routing efficiency. The unsupervised cluster-based VANET-oriented evolving graph (CVoEG) model with an associated reliable routing scheme [26] is another clustering-based routing approach. To provide reliable communication, the existing VANET-oriented evolving graph (VoEG) is extended by introducing a clustering technique into the model, and link stability is used to mark vehicles as either cluster heads or cluster members. Additionally, a reliable routing scheme known as CEG-RAODV, based on the CVoEG model, is discussed. The moving-zone-based routing protocol (MoZo) is proposed in [27]. It defines an architecture that handles multiple moving clusters, where each cluster contains vehicles grouped by their moving patterns, and it explains in detail the formation and maintenance of moving clusters together with the design of a routing strategy through them. Apart from these approaches, numerous other clustering techniques have been proposed; clustering is thus a clear recent trend in routing enhancements for VANETs.

3 Proposed Routing Approach

A VANET is a collection of vehicles and roadside units (RSUs), where each vehicle is assumed to be equipped with an onboard unit (OBU). The OBU enables vehicles to communicate with each other and with infrastructure nodes such as RSUs. Vehicles can communicate with other vehicles and RSUs only within a certain distance. OBU-equipped vehicles are capable of maintaining and broadcasting traffic-related information to their RSUs, such as current position, speed, direction, current time, and traffic events. The infrastructure-based routing protocol (IBRP) is also based on clustering: the complete network is divided into several clusters. In this approach, each vehicle repeatedly exchanges messages with the RSUs within its range. Based on these messages, each RSU forms a cluster with all vehicles within its communication range. In every cluster, the RSU acts as cluster head and the vehicles act as cluster members; the RSU therefore maintains information about all vehicles in its cluster.


3.1 System Model

VANETs have a hierarchical architecture consisting of multiple levels. At the top of the hierarchy, a transport authority (TA) or traffic control authority (TCA) is responsible for monitoring and managing traffic conditions. The TA or TCA uses a Web server (WS) or application server (AS) to monitor vehicles and RSUs through a secure channel. RSUs sit at the third level and work as gateway routers for the vehicles at the lowest level. RSUs generally have higher computational power and a larger transmission range than the OBUs in vehicles; they are therefore expected to be placed in high-density areas such as road intersections. Based on the transmission capabilities of vehicles and RSUs, we assume that every vehicle can communicate with other vehicles within a range of up to 500 m, while vehicles in the same cluster can communicate through their RSU within a range of up to 1000 m. A further assumption concerns vehicle identity: each vehicle is assumed to have a unique identification number in the VANET, which could be the registration number itself or another unique number derived from it. The use of registration numbers allows a vehicle to initiate communication toward any other vehicle. In this approach, the road network of the urban area is represented as a graph in which each edge represents a road and each vertex represents an intersection of multiple roads; the deployed RSUs are marked as highlighted vertices. Consider a road segment R on which a source vehicle S wants to send a message to a destination vehicle D, where the speed of the vehicle is V1 at timestamp T1 and the direction of movement varies from −1 to +1. The direction is −1 when the vehicle moves away from its RSU and +1 when it moves toward it.
If the vehicle is not moving, the direction is taken as 0. The RSU can then calculate the future position of the vehicle at different timestamps by itself and send that information to other vehicles in the cluster. Each RSU receives such information (identity, speed, direction of movement, etc.) from all vehicles within its range, maintains a routing table for them by analyzing the provided details, and keeps this routing information updated periodically.
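As a rough numeric illustration of this extrapolation, the sketch below shows how an RSU might predict a vehicle's distance from its last reported data. The field names, units, and constant-speed motion model are our assumptions, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class HelloInfo:
    distance_m: float   # distance from the RSU at the last report
    speed_mps: float    # reported speed (V1)
    direction: int      # +1 toward the RSU, -1 away, 0 stationary (Sect. 3.1)
    timestamp_s: float  # report time (T1)

def predict_distance(info: HelloInfo, now_s: float) -> float:
    """Extrapolate the vehicle's distance from the RSU at time now_s,
    assuming constant speed along the road segment."""
    elapsed = now_s - info.timestamp_s
    # Moving toward the RSU (+1) shrinks the distance; away (-1) grows it.
    return max(0.0, info.distance_m - info.direction * info.speed_mps * elapsed)
```

For example, a vehicle last reported 300 m away and moving toward the RSU at 10 m/s would be predicted 200 m away 10 s later.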

3.2 Data Structure and Message Format

Beyond the logical explanation, implementation details are needed for a complete description of the algorithms. The required data structures and the formats of the different messages are therefore given here. The implementation involves two separate entities: the vehicle and the RSU.


Data structure at vehicle: data structure at vehicle V(i), for 0 < i < N, in a VANET of N vehicles.

ID(i): Unique identification number of vehicle i.
S(i): Current state of vehicle i.
V(i): Current speed of vehicle i.
D(i): Direction of movement of vehicle i, where D(i) belongs to (−1, 0, 1).
L(i): Current location of vehicle i.
R(i): Identification number of the RSU acting as cluster head.
Neighbors(i): Map of neighbor vehicles maintained on vehicle i, defined as map<ID(i), Path(i)>, where Path(i) is the list of nodes to traverse to reach V(i).
RSU(i): List of RSUs in communication range.

Data structure at RSU: data structure at RSU R(j), for 0 < j < M, in a VANET of M RSUs.

ID(j): Unique identification number of RSU j.
S(j): Current state of RSU j.
Members(j): Map of member vehicles maintained on RSU j, defined as map<ID(i), Path(i)>, where ID(i) is the identification number of a vehicle and Path(i) is the list of nodes to traverse to reach V(i).
Old_members(j): Map of vehicles that recently left RSU j, defined as map<ID(i), R(x)>, where ID(i) is the identification number of a recently departed vehicle and R(x) is the identification number of its current RSU.
RSU(j): List of RSUs directly connected to R(j).

Message format: several messages are used in this protocol and need to be defined for a better understanding of the approach.

HELLO {ID(i), V(i), D(i), L(i)}: Sent from a vehicle to an RSU while joining a cluster.
HELLO_ACK {R(j), Neighbors(i)}: Sent from the RSU to the vehicle in response to HELLO.
BYE {R(i)}: Sent by a vehicle to its RSU when leaving a cluster.
BYE_ACK {NONE}: Response from the RSU to the vehicle, in response to BYE.
MESSAGE {Vs(i), Vd(i), Path(Vd(i)), String}: Structure that carries data to be sent from one node to another.
Here, Vs(i) is the source vehicle identification number, Vd(i) is the destination vehicle identification number, Path(Vd(i)) is the path traversed so far toward the destination vehicle, and String is the data to be sent. Some of these data structures, such as the neighbor list maintained on vehicles and the member list maintained on RSUs, are searched frequently; a map is therefore used for them to reduce the complexity of the search operation. The initialization of all data structures for both entities is given below.

IBRP: An Infrastructure-Based Routing Protocol …

111

Initialization of vehicle V(i):
for (i = 1 to N)
    set S(i) = IN;
    set ID(i);
    fetch nearby RSUs;
    set list RSUs(i);
    set R(i) = none;
    set neighbors(i) = none;
    wait on queue of V(i);
end for

Initialization of RSU R(j):
for (j = 1 to M)
    set S(j) = IN;
    set ID(j);
    fetch all RSUs connected directly;
    set list RSU(j);
    set members(j) = none;
    set old_members(j) = none;
    wait on queue of R(j);
end for
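The same per-entity state can be mirrored in a few lines of Python. This is an illustrative sketch only: the field names follow the definitions above, while the concrete types and defaults are our assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Vehicle:
    vid: str                                        # ID(i)
    state: str = "IN"                               # S(i), initial state
    speed: float = 0.0                              # V(i)
    direction: int = 0                              # D(i) in {-1, 0, 1}
    location: Optional[Tuple[float, float]] = None  # L(i)
    rsu: Optional[str] = None                       # R(i), current cluster head
    neighbors: Dict[str, List[str]] = field(default_factory=dict)  # ID -> path
    rsus_in_range: List[str] = field(default_factory=list)         # RSU(i)

@dataclass
class RSU:
    rid: str                                        # ID(j)
    state: str = "IN"                               # S(j)
    members: Dict[str, List[str]] = field(default_factory=dict)    # ID -> path
    old_members: Dict[str, str] = field(default_factory=dict)      # ID -> new RSU
    connected_rsus: List[str] = field(default_factory=list)        # RSU(j)
```

Constructing `Vehicle("V1")` or `RSU("R1")` reproduces the initialization above: state IN, no cluster head, and empty neighbor/member maps.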

3.3 Cluster Formation

The cluster formation process starts as soon as a vehicle starts and is ready to communicate with nearby RSUs. Once the OBU is ready with the vehicle details, including movement details, it sends a "HELLO" message to all nearby RSUs. The HELLO message carries the identity of the vehicle (Id), its current location (Lt), its speed (Vt), its direction of movement (Dt), and the current timestamp (t). The RSU receives the message, analyzes the details, and updates its routing information for that vehicle. Based on the provided traffic information, the RSU fetches all neighbors of the newly joined vehicle and publishes the neighbor list back to the vehicle in a "HELLO_ACK" message, the designated response to "HELLO". The vehicle sends the "HELLO" message periodically until it receives a response from an RSU or crosses an intersection point. After crossing an intersection there is a high probability that the traffic parameters change, for example the direction after turning or the speed on the new road. Therefore, the OBU prepares a new set of data and starts sending a new "HELLO" message to the selected RSUs after crossing the intersection. Following this communication pattern, each RSU holds routing information for all vehicles within its range, and each vehicle holds the information of all directly reachable neighbors. The RSU acts as the master node here, as it maintains the routing information of all vehicles within its range and is responsible for keeping it updated. Cluster state transition: in the proposed algorithm, every vehicle is at any moment in one of five states: initial (IN), start election (SE), wait response (WR), cluster member (CM), or isolated member (IM).


Initial (IN): The state of a vehicle before it joins any cluster. Every new vehicle stays in this state for a certain period of time.
Start election (SE): In this state, a vehicle tries to join the relevant cluster. After the initial timer expires, the vehicle changes from IN to SE and starts sending HELLO messages to all neighbors. For an RSU, this state means it is ready to process HELLO/BYE requests and respond accordingly.
Wait response (WR): After sending a HELLO message, a vehicle changes from SE to WR and waits for a response from the RSU in order to become a member of that cluster. If no response is received within a certain period of time, the vehicle moves back to SE.
Cluster member (CM): After a successful exchange of HELLO and HELLO_ACK with the RSU, the vehicle changes to CM because it is now part of the cluster. If the vehicle needs to change cluster, its state changes back to SE after exchanging BYE and BYE_ACK messages.
Cluster head (CH): This state applies to RSUs only. After responding with HELLO_ACK to a HELLO request, the RSU moves from IN to CH. After cleaning up the complete cluster, the RSU changes its state from CH back to IN.
Isolated member (IM): Vehicles that are not part of any cluster are marked IM. A vehicle moves to IM when it has completed a trip or is changing cluster.
To illustrate the state transitions properly, several events are also defined.
INIT_T: A timer of 30 s that lets the OBU settle initially and get ready to exchange messages.
HELLO: The "HELLO" message triggered from vehicle to RSU during cluster formation.
HELLO_ACK: The response to "HELLO" from RSU to vehicle. It confirms that the cluster is formed properly and that the responding RSU is the cluster head.
BYE: Sent by a vehicle to its RSU when leaving the cluster.
BYE_ACK: Triggered by the RSU in response to "BYE" for a graceful exit between vehicle and RSU.
WAIT_T: A timer of 20 s; the time to wait for a response to "HELLO" and "BYE" messages.
DROP_T: A timer of 20 s used to decide whether a request has been dropped by the RSU.
START: Triggered whenever the OBU powers on as the vehicle starts.
STOP: Analogously to "START", triggered whenever the OBU powers off as the vehicle stops.
The state diagrams present the transition flow based on these events and help to understand the complete behavior of vehicles and RSUs. In this approach, the state diagrams for vehicles and RSUs are captured separately to explain their roles properly.
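Read off the description above, the vehicle's life cycle can be sketched as a small table-driven state machine. The transition pairs below are our reading of the text (illustrative only); unknown pairs simply leave the state unchanged:

```python
# Vehicle states: IN, SE, WR, CM, IM; events as defined in the text.
VEHICLE_TRANSITIONS = {
    ("IN", "INIT_T"): "SE",     # 30 s initial timer expires, start joining
    ("SE", "HELLO"): "WR",      # HELLO sent, wait for the RSU's response
    ("WR", "HELLO_ACK"): "CM",  # RSU accepted, now a cluster member
    ("WR", "WAIT_T"): "SE",     # no response within 20 s, retry
    ("CM", "BYE_ACK"): "SE",    # leaving the cluster, look for a new one
    ("CM", "STOP"): "IM",       # trip finished, OBU powered off
    ("IM", "START"): "IN",      # OBU powered on again
}

def next_state(state: str, event: str) -> str:
    """Apply one event to a vehicle state; unknown pairs are ignored."""
    return VEHICLE_TRANSITIONS.get((state, event), state)
```

A normal join then runs IN → SE → WR → CM, and a completed trip ends in IM.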


We first discuss the state diagram of the vehicle, presented in Fig. 3, where the vehicle starts in the IN state. The first event, "INIT_T", occurs on expiry of the corresponding timer, which runs for a fixed period of 30 s to stabilize the vehicle and let it gather proper data for joining a cluster. This event changes the state from IN to SE. A vehicle in SE starts communicating with nearby RSUs to join the correct cluster: it sends "HELLO" messages to the RSUs and moves to the WR state. The "HELLO_ACK" event occurs when the RSU responds to the "HELLO" message; a vehicle in WR that receives HELLO_ACK changes its state to CM. If no response is received within 20 s, the "WAIT_T" event is triggered and the vehicle changes back to SE. If the vehicle's movement pattern forces it to leave the cluster, it must inform its RSU; the "BYE" and "BYE_ACK" events are defined for leaving or changing clusters. The BYE event is the vehicle's notification to the RSU before leaving a cluster, but the vehicle stays in CM until the BYE_ACK event is triggered, which happens when the RSU properly responds to the BYE message. After receiving BYE_ACK, the state changes back to SE. The last event, "STOP", is initiated when the OBU finds the vehicle shutting down after completing the current trip, moving the state from CM to IM. From IM, the vehicle's state changes back to IN on the "START" event, triggered when the OBU finds the vehicle started. We now turn to the state diagram of the RSU, presented in Fig. 4. The RSU also starts in the IN state, ready to receive messages from vehicles. First, the HELLO event occurs when the RSU receives a "HELLO" message from a vehicle.
The RSU then changes its state from IN to SE and starts analyzing the data received from the vehicle. If the RSU finds that the vehicle belongs to its cluster, it responds with "HELLO_ACK" and changes to the CH state. For subsequent HELLO and BYE messages, the RSU remains in CH, temporarily changing from CH to SE to process each further request. If the RSU does not find a request suitable to respond to for any reason, the DROP_T event occurs and the RSU moves back to CH. After responding to such requests with HELLO_ACK or BYE_ACK, the RSU returns to CH to process other requests. When a BYE_ACK or STOP is received from the last vehicle in a cluster, the RSU changes its state from CH back to IN.

Fig. 3 State diagram of vehicles

Fig. 4 State diagram of RSU

(1) Clustering procedures: Cluster formation starts as soon as the OBU is ready. An OBU-equipped vehicle is expected to prepare and send traffic-related information to the RSU. For a detailed understanding of the approach, the algorithms and pseudocode are given below. Algorithms and pseudocode: the step-by-step procedure of cluster formation is captured together with the pseudocode of the respective algorithms, giving detailed insight into the idea proposed here.

3.3.1 Vehicle Side

1. The vehicle prepares and sends a HELLO message to the RSU with all relevant information.
2. Start a timer T for a period of 20 s.
3. Receive HELLO_ACK from the RSU and update the routing table information.
4. If the timer T expires,
5. then repeat steps 1–3 again.

3.3.2 RSU Side

1. Receive the HELLO request and check the communication range of the vehicle.
2. If the vehicle is within the communication range of the RSU,
3. then add an entry for the vehicle at the RSU,
4. prepare HELLO_ACK and send it back to the vehicle.
5. Otherwise, drop the HELLO message.


For sending HELLO message to RSU:
if V(i) starts OR signal strength gets weaker
    calculate and set values for L(i), V(i) and D(i);
    encode HELLO message with all the above data;
    for (j = 1 to size of RSU(i))
        send HELLO to R(j);
        start wait timer;
        set S(i) = WR;
    end for
end if

On receiving HELLO from vehicle:
if HELLO received in the queue of R(j) from V(i)
    set S(j) = SE;
    start timer for 20 secs;
    parse L(i), V(i), D(i);
    check range for vehicle V(i);
    if (vehicle in range)
        add vehicle V(i) in members(j);
        process HELLO_ACK to vehicle;
    else if (vehicle not in range OR timer gets expired)
        drop HELLO message;
    end if
end if

For sending HELLO_ACK message to vehicle:
if vehicle V(i) found in the range of R(j)
    add vehicle V(i) in members(j);
    prepare HELLO_ACK data;
    set R(j);
    fetch neighbor list for V(i);
    add list in HELLO_ACK;
    send HELLO_ACK to V(i);
    set S(j) = CH;
else
    drop HELLO message;
end if

On receiving HELLO_ACK from RSU:
if HELLO_ACK received from R(j)
    stop wait timer;
    set S(i) = CM;
    parse HELLO_ACK;
    set neighbors(i) of V(i);
    set R(i) = R(j);
end if
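The HELLO/HELLO_ACK exchange above can be condensed into two functions. This is a toy sketch: the dict-based messages, the range constant, and the path representation are our assumptions, not the paper's message encoding:

```python
from typing import Optional

RANGE_M = 1000  # assumed RSU communication range (Sect. 3.1)

def on_hello(rsu: dict, hello: dict) -> Optional[dict]:
    """RSU side: admit the vehicle if it is in range, else drop the HELLO."""
    if hello["distance_m"] > RANGE_M:
        return None                              # out of range: drop
    rsu["members"][hello["vid"]] = [rsu["rid"]]  # path to the vehicle via RSU
    rsu["state"] = "CH"                          # RSU now acts as cluster head
    neighbors = [v for v in rsu["members"] if v != hello["vid"]]
    return {"rsu": rsu["rid"], "neighbors": neighbors}   # HELLO_ACK payload

def on_hello_ack(vehicle: dict, ack: dict) -> None:
    """Vehicle side: record the cluster head and neighbor list, become CM."""
    vehicle["rsu"] = ack["rsu"]
    vehicle["neighbors"] = ack["neighbors"]
    vehicle["state"] = "CM"
```

A HELLO from a vehicle 400 m away is accepted and answered; one from 2000 m away is silently dropped, matching the range check in the pseudocode.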

3.4 Routing

This section covers the procedure for sending a message from source to destination by taking advantage of the proposed clustering approach. The complete network is now divided into several clusters, each controlled by an RSU, and every vehicle holds a list of neighbors to which it can send messages directly. Two kinds of communication are therefore supported: one where the source and destination lie in the communication range of the same RSU, and one where they belong to the communication ranges of different RSUs.

(1) Intra-cluster routing: Intra-cluster routing applies when the source and destination vehicles belong to the same cluster and communication happens within it. The source node first checks its list of directly reachable neighbors. If the destination belongs to that list, the source forwards the message [Si, NONE, M, Di] to the destination directly, where Si is the unique identifier of the source vehicle, "NONE" indicates that the destination is directly reachable from the source, Di is the unique identifier of the destination vehicle, and M is the information to transmit. If the destination does not belong to the neighbor list, the source forwards the message [Si, NONE, M, Di] to its RSU. The RSU checks its routing table, finds the next node toward the destination Di, and forwards the message to that node after adding its RSU ID to the list of hops, [Si, Ri, M, Di]. Following the same mechanism, each RSU forwards the message to further nodes after adding its identifier, until the message reaches the destination. The list of traversed nodes is saved by the destination and can be used to back-trace an immediate reply instead of preparing a new route; the destination keeps this path record for a certain time interval, after which the route data is removed.

(2) Inter-cluster routing: Inter-cluster routing specifies the way of communication between vehicles belonging to different clusters. In this case, the RSU does not find the destination node in its routing table after receiving a message from the source node. The RSU then first checks its list of vehicles that were associated with it earlier.
If the destination is not in that list either, the RSU broadcasts the message to all directly reachable RSUs after adding its address to the message, [Si, Ri, M, Di]. Each next RSU checks its own routing table and its list of earlier associated vehicles; if the destination was associated with it earlier, the new RSU is tracked, otherwise the message is broadcast again after adding the identity of the current RSU. If the destination vehicle is not found after broadcasting up to two levels, the RSUs drop the message to avoid further network overhead. When a vehicle changes cluster, the old RSU keeps the information of the new RSU for up to 60 min, assuming the vehicle will stay associated with the new RSU for about an hour. This helps to increase the message delivery percentage with reduced network overhead. Therefore, while processing a message, an RSU first checks the vehicles in its cluster and then the list of vehicles it maintained earlier; if the destination vehicle belongs to that list, the RSU forwards the message to the new RSU, which increases the probability of reaching the destination near its newly elected RSU.

End-to-end routing algorithm and pseudocode: the complete end-to-end routing algorithm, covering both intra-cluster and inter-cluster routing, is given below in the form of pseudocode, with the implementation of the algorithm taken into consideration.

On sending message (M) from source vehicle (Vs) to destination vehicle (Vd):
/***** Vehicle Side *****/
if message M is valid message
    for x = 1 to size of neighbors(Vs)
        if (neighbors(Vs)[x]) equal to Vd
            extract path(Vd) from neighbors(Vs);
            set path(Vd) to MESSAGE;
            set Vs, Vd, M to MESSAGE;
            dispatch MESSAGE to send;
        else
            set R(Vs) as destination in MESSAGE;
            set Vs, M to MESSAGE;
            set Vs in path(Vd);
            dispatch MESSAGE to send;
        end if
    end for
end if

/***** RSU Side *****/
if R(j) receives valid message from Vs
    parse MESSAGE and fetch Vs, Vd, path(Vd) and M;
    fetch broadcast counter from MESSAGE;
    for x = 1 to size of members(R(j))
        if members(R(j))[x] equal to Vd
            fetch path(Vd) from members(R(j))[x];
            add R(j) in path(Vd);
            set Vs, Vd, path(Vd), and M in MESSAGE;
            send MESSAGE to next node towards path(Vd);
        else if old_members(R(j))[x] equal to Vd
            get RSU[x] from old_members(R(j))[x];
            add R(j) in path(Vd);
            set Vs, Vd, path(Vd), and M in MESSAGE;
            send MESSAGE to RSU[x];
        else if number of broadcasts < 2
            fetch RSU(j);
            increment broadcast counter in MESSAGE;
            set Vs, Vd, path(Vd), and M in MESSAGE;
            for y = 1 to size of RSU(j)
                send MESSAGE to RSU(j)[y];
            end for
        else
            drop MESSAGE;
        end if
    end for
end if
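Stripped of message formatting, the decision logic of this pseudocode amounts to the following lookup chain. The dict structures and return tags are our illustrative assumptions:

```python
MAX_BROADCAST_LEVELS = 2  # two-level broadcast limit from the text

def route_at_vehicle(vehicle: dict, dest: str, payload: str) -> dict:
    """Source side: deliver directly if dest is a neighbor, else via the RSU."""
    if dest in vehicle["neighbors"]:
        return {"to": dest, "path": vehicle["neighbors"][dest], "data": payload}
    return {"to": vehicle["rsu"], "dest": dest, "data": payload, "hops": 0}

def route_at_rsu(rsu: dict, msg: dict):
    """RSU side: member table, then old-member table, then bounded broadcast."""
    dest = msg["dest"]
    if dest in rsu["members"]:
        return ("deliver", dest)
    if dest in rsu["old_members"]:
        return ("forward", rsu["old_members"][dest])  # vehicle changed cluster
    if msg["hops"] < MAX_BROADCAST_LEVELS:
        msg["hops"] += 1
        return ("broadcast", rsu["connected_rsus"])
    return ("drop", None)  # avoid flooding beyond two levels
```

After two unsuccessful broadcast levels the message is dropped, which is exactly the flooding-avoidance rule stated above.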


3.5 Cluster Maintenance

The main maintenance challenge addressed in this approach is a vehicle changing cluster. The cluster size stays the same over time, since the RSUs are at fixed positions in the VANET; the clusters are therefore almost static, and only the cluster members change from time to time. Based on the details received from the vehicle's OBU, the RSU uses a simple formula to check whether the vehicle is still in range, and the vehicle uses the same formula to check whether a cluster change is required:

Assumed distance = current distance − (direction × expected distance to be traversed in the next 10 min)

where, consistent with Sect. 3.1, the direction is +1 when the vehicle moves toward the RSU and −1 when it moves away. The assumed distance should lie within the communication range. When changing cluster, the vehicle first asks for a new RSU and sends BYE to the current RSU after joining the new cluster. The BYE message to the old RSU carries the information of the new RSU, so the old RSU retains the new RSU's identity for a certain time period; this is used to send messages to that vehicle via the new RSU directly, without broadcasting to other RSUs.

Algorithms and pseudocode: the step-by-step procedure of the cluster transition is captured with the pseudocode of the respective algorithms, giving insight into the handoff technique used in this approach.

Vehicle Side

1. The vehicle sends a BYE message to the RSU with the relevant information.
2. Start a timer T for a period of 20 s.
3. Receive the response from the RSU and clean up the cluster information.
4. If the timer T expires,
5. then repeat steps 1–3 again.

RSU Side

1. Receive the BYE request and check the communication range of the vehicle.
2. If the vehicle is beyond the communication range of the RSU,
3. then remove the vehicle's entry from the RSU,
4. prepare BYE_ACK and send it to the vehicle.
5. Otherwise, drop the BYE message.
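The range check can be written directly from the formula above. This sketch treats the 10-minute horizon and the 1000 m range as parameters (our assumptions), with the Sect. 3.1 direction convention (+1 toward the RSU, −1 away):

```python
def needs_cluster_change(current_m: float, speed_mps: float, direction: int,
                         horizon_s: float = 600.0,
                         range_m: float = 1000.0) -> bool:
    """True if the vehicle is expected to leave the RSU's range and
    should therefore send BYE and join a new cluster."""
    expected_m = speed_mps * horizon_s            # distance covered in 10 min
    assumed_m = current_m - direction * expected_m
    return assumed_m > range_m
```

A vehicle 500 m out and driving away at 10 m/s is assumed 6500 m out after 10 min and must hand off; the same vehicle driving toward the RSU stays in the cluster.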


For sending BYE message to RSU:
if signal strength gets weaker for vehicle V(i)
    prepare BYE message;
    set R(i) in the message;
    start wait timer;
    set S(i) = WR;
    send BYE to RSU R(i);
end if

On receiving BYE from vehicle:
if BYE received on queue
    parse BYE message;
    set old_members(j);
    set S(j) = SE;
    start drop timer;
end if

For sending BYE_ACK message to vehicle:
if BYE processed successfully from V(i)
    prepare BYE_ACK data;
    stop drop timer;
    set S(j) = CH;
    send BYE_ACK to V(i);
end if

On receiving BYE_ACK from RSU:
if BYE_ACK received from R(j)
    stop wait timer;
    parse BYE_ACK;
    reset entry of R(j) from R(i);
end if

4 Performance Analysis

The performance of routing protocols for VANETs is measured in terms of message overhead, message delivery time, and probability of packet loss. As far as message overhead and message delivery time are concerned, three cases are possible:

1. The destination is in the neighbor list: the source sends the message directly to the destination, so one message is required and the message delivery time is T (where T is the maximum one-hop message propagation delay).
2. The destination is not in the neighbor list but is in the same cluster: the source forwards the request to its current RSU, which in turn forwards the message to the destination. Thus, two messages are required and the message delivery time is 2T.
3. The destination is neither in the neighbor list of the node nor in the member list of the RSU. The following subcases are possible:

• The destination was previously associated with the RSU: the next RSU for the destination is known to the current RSU, so the message is forwarded to the next RSU, which in turn forwards it to the destination. The message overhead is three messages (source → RSU, RSU → next RSU, next RSU → destination) and the message delivery time is 3T.
• The destination was not previously associated with the RSU: the current RSU broadcasts the message to all n neighboring RSUs. These RSUs check their respective member lists, and whichever finds the destination forwards the message to it. In this case, n + 2 messages are required (source → RSU, RSU → all n neighboring RSUs, next RSU → destination) and the message delivery time is 3T. However, if no RSU contains the destination as a member but one has the information about its next RSU, the message is forwarded to that next RSU, which in turn forwards it to the destination; then n + 3 messages are required and the message delivery time is 4T.

If none of these cases is satisfied, the message is not delivered. However, for the applications considered it is highly probable that the destination is near the source, so it is highly unlikely that the destination is not covered even by the RSUs two hops away from the current RSU. Thus, the probability of message loss is very low.
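This case analysis can be summarized as a small lookup of (messages required, delivery time in multiples of T), where n is the number of neighboring RSUs. The case labels are ours:

```python
def routing_cost(case: str, n: int = 0):
    """Return (messages required, delivery time in units of T) per case."""
    table = {
        "neighbor": (1, 1),                   # source -> destination
        "same_cluster": (2, 2),               # source -> RSU -> destination
        "known_next_rsu": (3, 3),             # via the previously noted next RSU
        "broadcast_hit": (n + 2, 3),          # broadcast to n RSUs, one delivers
        "broadcast_then_forward": (n + 3, 4), # broadcast, then one more forward
    }
    return table[case]
```

With four neighboring RSUs, for instance, the broadcast cases cost 6 and 7 messages respectively, while delivery time stays bounded by 4T.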

5 Conclusion

In the present exposition, an effective and efficient clustering-based routing protocol for urban VANETs has been presented. To divide a large network into several clusters, infrastructure nodes (RSUs) are used as cluster heads. This gives the approach an advantage over other clustering approaches by supporting clusters that are static in size and range, which makes the proposed approach simple, precise, and scalable. To reduce the network load of broadcasting packets, cached routes and controlled broadcasting are used: messages are broadcast up to two levels only, which avoids flooding. Given its assumptions, the approach is recommended for well-connected areas such as urban areas. The static performance analysis demonstrates the scalability and efficiency of the proposed approach. The dynamic performance evaluation of IBRP and making it secure are left as future work.

References

1. Basagni, S., Conti, M., & Giordano, S. (2013). Mobile ad hoc networking: Cutting edge directions (2nd ed.). Wiley-IEEE Press.
2. Moridi, E., & Hamid, B. (2017). RMRPTS: A reliable multi-level routing protocol with Tabu search in VANET. Telecommunication Systems, 65(1), 127–137.


3. Kasana, R., & Sushil, K. (2015). Multimetric next hop vehicle selection for geocasting in vehicular ad hoc networks. In International Conference on Computational Intelligence & Communication Technology (CICT) (pp. 400–405). IEEE.
4. Dua, A., Kumar, N., & Bawa, S. (2014). A systematic review on routing protocols for vehicular ad hoc networks. Vehicular Communications, 1(1), 33–52.
5. Ahmad, I., Noor, R. M., Ahmedy, I., Shah, S. A. A., Yaqoob, I., Ahmed, E., & Imran, M. (2018). VANET–LTE based heterogeneous vehicular clustering for driving assistance and route planning applications. Computer Networks, 145, 128–140.
6. Fekair, M., Lakas, A., & Korichi, A. (2016). CBQoS-VANET: Cluster-based artificial bee colony algorithm for QoS routing protocol in VANET. In International Conference on Selected Topics in Mobile & Wireless Networking (MoWNeT) (pp. 1–8).
7. Singh, S., & Agrawal, S. (2014). VANET routing protocols: Issues and challenges. In Proceedings of RAECS-2014, UIET Panjab University Chandigarh (pp. 205–210).
8. Sharma, Y. M., & Mukherjee, S. (2012). A contemporary proportional exploration of numerous routing protocols in VANET. International Journal of Computer Applications (0975–8887).
9. Singh, S., & Agrawal, S. (2014). VANET routing protocols: Issues and challenges. In Proceedings of 2014 RAECS, UIET Panjab University Chandigarh (pp. 205–210). IEEE.
10. Altayeb, M., & Mahgoub, I. (2013). A survey of vehicular ad hoc networks routing protocols. International Journal of Innovation and Applied Studies, 3(3), 829–846.
11. Singh, S., & Agrawal, S. (2014). VANET routing protocols: Issues and challenges. In Proceedings of IEEE Recent Advances in Engineering and Computational Sciences (RAECS) (pp. 1–5).
12. Dhankhar, S., & Agrawal, S. (2014). VANETs: A survey on routing protocols and issues. International Journal of Innovative Research in Science, Engineering and Technology, 3(6), 13427–13435.
13. Perkins, C., Belding-Royer, E., & Das, S. (1997). Ad hoc on-demand distance vector (AODV) routing. In Proceedings of 2nd IEEE WMCSA (pp. 90–100).
14. Haas, Z. J. (1997). The zone routing protocol.
15. Kumar, S., & Verma, A. K. (2015). Position based routing protocols in VANET: A survey. Wireless Personal Communications, 83(4), 2747–2772.
16. Liu, J., Wan, J., Wang, Q., Deng, P., Zhou, K., & Qiao, Y. (2016). A survey on position-based routing for vehicular ad hoc networks. Telecommunication Systems, 62(1), 15–30.
17. Basagni, S., Chlamtac, I., Syrotiuk, V., & Woodward, B. (1998). A distance routing effect algorithm for mobility (DREAM). In Proceedings of ACM International Conference on Mobile Computing and Networking, Dallas, TX (pp. 76–84).
18. Karp, B., & Kung, H. (2000). Greedy perimeter stateless routing for wireless networks. In Proceedings of ACM International Conference on Mobile Computing and Networking (MobiCom 2000), Boston, MA (pp. 243–254).
19. Tonguz, O. K., Wisitpongphan, N., & Bai, F. DV-CAST: A distributed vehicular broadcast protocol for vehicular ad hoc networks. IEEE Wireless Communications, 1.
20. Maia, G., André, L. L., Aquino, D., Viana, A. C., Boukerche, A., & Loureiro, A. A. F. (2010). HyDi: A hybrid data dissemination protocol for highway scenarios in vehicular ad hoc networks. In DIVANet@MSWiM (pp. 47–56).
21. Luo, Y., Zhang, W., & Hu, Y. (2010). A new cluster based routing protocol for VANET. In Proceedings of the 2nd International Conference on Networks Security, Wireless Communications and Trusted Computing, Wuhan, Hubei, China (pp. 176–180). IEEE.
22. Zhang, Z., Boukerche, A., & Pazzi, R. (2011). A novel multi-hop clustering scheme for vehicular ad-hoc networks. In Proceedings of the 9th ACM International Symposium on Mobility Management and Wireless Access (pp. 19–26).
23. Ren, M., Khoukhi, L., Labiod, H., Zhang, J., & Vèque, V. (2017). A mobility-based scheme for dynamic clustering in vehicular ad-hoc networks (VANETs). Vehicular Communications, 9, 233–241.
24. Ucar, S., Ergen, S. C., & Ozkasap, O. (2015). Multihop-cluster-based IEEE 802.11p and LTE hybrid architecture for VANET safety message dissemination. IEEE Transactions on Vehicular Technology, 65(4).


25. Aravindan, K., Suresh, C., & Dhas, G. (2018). Destination-aware context-based routing protocol with hybrid soft computing cluster algorithm for VANET. Journal of Soft Computing, 1–9.
26. Khan, Z., Fan, P., Fang, S., & Abbas, F. (2018). An unsupervised cluster-based VANET-oriented evolving graph (CVoEG) model and associated reliable routing scheme. IEEE Transactions on Intelligent Transportation Systems.
27. Lin, D., Kang, J., Squicciarini, A., et al. (2017). MoZo: A moving zone based routing protocol using pure V2V communication in VANETs. IEEE Transactions on Mobile Computing, 16(5), 1357–1370.