Strategic System Assurance and Business Analytics [1st ed.] 9789811536465, 9789811536472

This book systematically examines and quantifies industrial problems by assessing the complexity and safety of large systems.


English Pages XIX, 602 [601] Year 2020


Table of contents :
Front Matter ....Pages i-xix
Investing Data with Machine Learning Using Python (Anish Gupta, Manish Gupta, Prateek Chaturvedi)....Pages 1-9
Probabilistic Assessment of Complex System with Two Subsystems in Series Arrangement with Multi-types Failure and Two Types of Repair Using Copula (Pratap Kumar, Kabiru H. Ibrahim, M. I. Abubakar, V. V. Singh)....Pages 11-26
Recognition and Optimizing Process Lagging Indicators During Software Development (Amrita Wadhawan, Shalu Gupta)....Pages 27-36
Risk and Safety Analysis of Warships Operational Lifetime Defects (Uday Kumar, Ajit Kumar Verma, Piyush Pratim Das)....Pages 37-44
Facebook Data Breach: A Systematic Review of Its Consequences on Consumers’ Behaviour Towards Advertising (Emmanuel Elioth Lulandala)....Pages 45-68
Fuzzy DEMATEL Approach to Identify the Factors Influencing Efficiency of Indian Retail Websites (Loveleen Gaur, Vernika Agarwal, Kumari Anshu)....Pages 69-84
Application of Statistical Techniques in Project Monitoring and Control (Shalu Gupta, Deepshikha Bharti)....Pages 85-97
Sensor-Based Smart Agriculture in India (Pradnya Vishwas Chitrao, Rajiv Divekar, Sanchari Debgupta)....Pages 99-107
Integrating UTAUT with Trust and Perceived Benefits to Explain User Adoption of Mobile Payments (Rishi Manrai, Kriti Priya Gupta)....Pages 109-121
Improving the Quality of Customer Service of Financial Institutions: The Implementation of Challenger Banks (Alexey V. Bataev, Dmitriy G. Rodionov)....Pages 123-138
Corporate Image Building in the Indian IT Sector: Conceptualizing Through HR Practices (Renu Rana, Shikha Kapoor)....Pages 139-153
Vague Reliability Analysis of Standby Systems with an (N + 1) Units (Kapil Kumar Bansal, Mukesh K. Sharma, Dhowmya Bhatt)....Pages 155-161
Experimental Study for MRR and TWR on Machining of Inconel 718 using ZNC EDM (Rajesh Kumar, Balbir Singh)....Pages 163-173
Blind Quantitative Steganalysis Using CNN–Long Short-Term Memory Architecture (Anuradha Singhal, Punam Bedi)....Pages 175-186
Performance of Autoregressive Tree Model in Forecasting Cancer Patients (Sukhpal Kaur, Madhuchanda Rakshit)....Pages 187-200
Developing a Usage Space Dimension Model to Investigate Influence of Intention to Use on Actual Usage of Mobile Phones (Geeta Kumar, P. K. Kapur)....Pages 201-229
Examining the Relationship Between Customer-Oriented Success Factors, Customer Satisfaction, and Repurchase Intention for Mobile Commerce (Abhishek Tandon, Himanshu Sharma, Anu Gupta Aggarwal)....Pages 231-243
Smart Industrial Packaging and Sorting System (Sameer Tripathi, Samraddh Shukla, Shivam Attrey, Amit Agrawal, Vikas Singh Bhadoria)....Pages 245-254
Credentials Safety and System Security Pay-off and Trade-off: Comfort Level Security Assurance Framework (Habib ur Rehman, Mohammed Nazir, Khurram Mustafa)....Pages 255-274
A Study of Microfinance on Sustainable Development (Shital Jhunjhunwala, Prasanna Vaidya)....Pages 275-285
Development of Decision Support System for a Paper Making Unit of a Paper Plant Using Genetic Algorithm Technique (Rajeev Khanduja, Mridul Sharma)....Pages 287-302
The Challenge of Big Data in Official Statistics in India (Pushpendra Kumar Verma, Preety)....Pages 303-314
CUDA Accelerated HAPO (C-HAPO) Algorithm for Fast Responses in Vehicular Ad Hoc Networks (Vinita Jindal, Punam Bedi)....Pages 315-324
A Survey of Portfolio Optimization with Emphasis on Investments Made by Housewives in Popular Portfolios (Sunita Sharma, Renu Tuli)....Pages 325-333
Single Vacation Policy for Discrete-Time Retrial Queue with Two Types of Customers (Geetika Malik, Shweta Upadhyaya)....Pages 335-349
Intuitionistic Fuzzy Hybrid Multi-criteria Decision-Making Approach with TOPSIS Method Using Entropy Measure for Weighting Criteria (Talat Parveen, H. D. Arora, Mansaf Alam)....Pages 351-364
Dynamic Analysis of Prey–Predator Model with Harvesting Prey Under the Effect of Pollution and Disease in Prey Species (Naina Arya, Sumit Kaur Bhatia, Sudipa Chauhan, Puneet Sharma)....Pages 365-380
Univariate and Multivariate Process Capability Indices—Measures of Process Performance—A Case Study (Vivek Tyagi, Lalit Kumar)....Pages 381-392
Improving Customer Satisfaction Through Reduction in Post Release Defect Density (PRDD) (Hemanta Chandra Bhatt, Amit Thakur)....Pages 393-405
An Architecture for Data Unification in E-commerce using Graph (Sonal Tuteja, Rajeev Kumar)....Pages 407-417
Road Accidents in EU, USA and India: A critical analysis of Data Collection Framework (Alok Nikhil Jha, Geetam Tiwari, Niladri Chatterjee)....Pages 419-443
Improving the QFD Methodology (Yury S. Klochkov, Albina Gazizulina, Maria Ostapenko)....Pages 445-460
Software Defect Prediction Based on Selected Features Using Neural Network and Decision Tree (Prarna Mehta, Abhishek Tandon, Neha)....Pages 461-475
Testing-Effort Dependent Software Reliability Assessment Integrating Change Point, Imperfect Debugging and FRF (Rajat Arora, Anu Gupta Aggarwal, Rubina Mittal)....Pages 477-489
Assessing the Severity of Software Bug Using Neural Network (Ritu Bibyan, Sameer Anand, Ajay Jaiswal)....Pages 491-502
Optimal Refill Policy for New Product and Take-Back Quantity of Used Product with Deteriorating Items Under Inflation and Lead Time (S. R. Singh, Karuna Rana)....Pages 503-515
Blockchain-Based Social Network Infrastructure (Charu Virmani, Tanu Choudhary)....Pages 517-528
Performance Analysis of an Active–Standby Embedded Cluster System with Software Rejuvenation, Fault Coverage and Reboot (Preeti Singh, Madhu Jain, Rachana Gupta)....Pages 529-541
Analysis of Jeffcott Rotor and Rotor with Disk Using XLrotor (H. N. Suresh, N. Madhusudan, D. Sarvana Bavan, B. S. Murgayya)....Pages 543-556
Foodie: An Automated Food Ordering System (Chetanya Batra, Gandharv Pathak, Siddharth Gupta, Saurabh Singh, Chetna Gupta, Varun Gupta)....Pages 557-562
Prioritization of Different Types of Software Vulnerabilities Using Structural Equation Modelling (Swati Narang, P. K. Kapur, D. Damodaran)....Pages 563-578
Fog-Based Internet of Things Security Issues (Omar H. Alhazmi)....Pages 579-587
Estimation of Fatigue Life of PLCC Solder Joints Under Vibration Loading (Rohit Khatri, Diana Denice, Manoj Kumar)....Pages 589-602
Correction to: A Study of Microfinance on Sustainable Development (Shital Jhunjhunwala, Prasanna Vaidya)....Pages C1-C2

Asset Analytics Performance and Safety Management Series Editors: Ajit Kumar Verma · P. K. Kapur · Uday Kumar

P. K. Kapur Ompal Singh Sunil Kumar Khatri Ajit Kumar Verma   Editors

Strategic System Assurance and Business Analytics

Asset Analytics Performance and Safety Management

Series Editors Ajit Kumar Verma, Western Norway University of Applied Sciences, Haugesund, Rogaland Fylke, Norway P. K. Kapur, Centre for Interdisciplinary Research, Amity University, Noida, India Uday Kumar, Division of Operation and Maintenance Engineering, Luleå University of Technology, Luleå, Sweden

The main aim of this book series is to provide a floor for researchers, industries, asset managers, government policy makers and infrastructure operators to cooperate and collaborate among themselves to improve the performance and safety of the assets with maximum return on assets and improved utilization for the benefit of society and the environment. Assets can be defined as any resource that will create value to the business. Assets include physical (railway, road, buildings, industrial etc.), human, and intangible assets (software, data etc.). The scope of the book series will be but not limited to:

• Optimization, modelling and analysis of assets
• Application of RAMS to the system of systems
• Interdisciplinary and multidisciplinary research to deal with sustainability issues
• Application of advanced analytics for improvement of systems
• Application of computational intelligence, IT and software systems for decisions
• Interdisciplinary approach to performance management
• Integrated approach to system efficiency and effectiveness
• Life cycle management of the assets
• Integrated risk, hazard, vulnerability analysis and assurance management
• Adaptability of the systems to the usage and environment
• Integration of data-information-knowledge for decision support
• Production rate enhancement with best practices
• Optimization of renewable and non-renewable energy resources.

More information about this series at http://www.springer.com/series/15776

P. K. Kapur Ompal Singh Sunil Kumar Khatri Ajit Kumar Verma •



Editors

Strategic System Assurance and Business Analytics




Editors P. K. Kapur Amity Center for Interdisciplinary Research Amity University Noida, India Sunil Kumar Khatri Amity University Tashkent, Uzbekistan

Ompal Singh Department of Operational Research University of Delhi New Delhi, Delhi, India Ajit Kumar Verma Western Norway University of Applied Sciences Haugesund, Norway

ISSN 2522-5162 ISSN 2522-5170 (electronic) Asset Analytics ISBN 978-981-15-3646-5 ISBN 978-981-15-3647-2 (eBook) https://doi.org/10.1007/978-981-15-3647-2 © Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This volume deals with the Strategic System Assurance and Business Analytics, ever-growing areas of research from the recent past. The goal of system assurance is to verify that a product conformed to all the desired set of system requirements for safety, security, reliability, maintainability, and availability. Business analytics, on the other hand, provides a data-centric decision-making viewpoint to analyze business performance by utilizing statistical and quantitative methods, information technology, management science, and data mining. Thus, system assurance and business analytics guide a business firm in proper planning, future prediction, and effectual decision making. This volume provides secure, simple access to key performance indicators and metrics that enable clients to be proactive in improving their business operations. This book furnishes a key solution for the ongoing and impending managerial decision problems by applying various analytical techniques. It provides a platform for young researchers to derive maximum utility in the area of analytics by subscribing to the idea of managing system performance and business operations. By concentrating on the system assurance, business performance, and decision support system, this volume will facilitate the management in optimal decision making. This book strategically examines and quantifies the industrial problems by assessing the complexity and safety of large systems. Focusing on exploring the real-world business issues, this volume contains papers on system performance management, software engineering, quality, information technology, multi-criteria decision making, system security and assurance, applied statistics, soft computing techniques, and business operations. We are grateful to various authors who have made significant contributions and could contribute their research work to this volume. We are also grateful to several reviewers for their comments and suggestions, which helped in improving the quality of the chapters. We hope that this volume makes significant contributions in the field of Strategic System Assurance and Business Analytics.

Noida, India New Delhi, India Tashkent, Uzbekistan Haugesund, Norway

P. K. Kapur Ompal Singh Sunil Kumar Khatri Ajit Kumar Verma

Acknowledgements

A great many people have helped in numerous ways with the successful publication of this book. We begin by expressing our deep and sincere gratitude to various researchers who have provided their insightful research work. We sincerely thank all the researchers for their distinguished contributions in compiling this book. We were also fortunate to have excellent reviewers who provided invaluable feedback and recommendations that refined the quality of chapters. We would also like to appreciate the untiring efforts and dedication of Saurabh Panwar and Vivek Kumar, Ph.D. scholars, Department of Operational Research, University of Delhi, Delhi, India, in assisting us for the successful and timely completion of this book. We are also grateful to all other professionals who helped in the publishing process. We owe the greatest gratitude to all the professionals and readers who have carefully proofread the chapters for errors and inaccuracy. As always, we wish to convey our profound gratitude to our family and friends for their unconditional support and encouragement. Lastly, we apologize for any omissions. Prof. (Dr.) P. K. Kapur Dr. Ompal Singh Prof. Sunil Kumar Khatri Prof. Ajit Kumar Verma


Contents

1 Investing Data with Machine Learning Using Python . . . . . . . . . . 1
Anish Gupta, Manish Gupta, and Prateek Chaturvedi

2 Probabilistic Assessment of Complex System with Two Subsystems in Series Arrangement with Multi-types Failure and Two Types of Repair Using Copula . . . . . . . . . . 11
Pratap Kumar, Kabiru H. Ibrahim, M. I. Abubakar, and V. V. Singh

3 Recognition and Optimizing Process Lagging Indicators During Software Development . . . . . . . . . . 27
Amrita Wadhawan and Shalu Gupta

4 Risk and Safety Analysis of Warships Operational Lifetime Defects . . . . . . . . . . 37
Uday Kumar, Ajit Kumar Verma, and Piyush Pratim Das

5 Facebook Data Breach: A Systematic Review of Its Consequences on Consumers’ Behaviour Towards Advertising . . . . . . . . . . 45
Emmanuel Elioth Lulandala

6 Fuzzy DEMATEL Approach to Identify the Factors Influencing Efficiency of Indian Retail Websites . . . . . . . . . . 69
Loveleen Gaur, Vernika Agarwal, and Kumari Anshu

7 Application of Statistical Techniques in Project Monitoring and Control . . . . . . . . . . 85
Shalu Gupta and Deepshikha Bharti

8 Sensor-Based Smart Agriculture in India . . . . . . . . . . 99
Pradnya Vishwas Chitrao, Rajiv Divekar, and Sanchari Debgupta

9 Integrating UTAUT with Trust and Perceived Benefits to Explain User Adoption of Mobile Payments . . . . . . . . . . 109
Rishi Manrai and Kriti Priya Gupta


10 Improving the Quality of Customer Service of Financial Institutions: The Implementation of Challenger Banks . . . . . . . . . . 123 Alexey V. Bataev and Dmitriy G. Rodionov 11 Corporate Image Building in the Indian IT Sector: Conceptualizing Through HR Practices . . . . . . . . . . . . . . . . . . . . . 139 Renu Rana and Shikha Kapoor 12 Vague Reliability Analysis of Standby Systems with an (N + 1) Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Kapil Kumar Bansal, Mukesh K. Sharma, and Dhowmya Bhatt 13 Experimental Study for MRR and TWR on Machining of Inconel 718 using ZNC EDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Rajesh Kumar and Balbir Singh 14 Blind Quantitative Steganalysis Using CNN–Long Short-Term Memory Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Anuradha Singhal and Punam Bedi 15 Performance of Autoregressive Tree Model in Forecasting Cancer Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Sukhpal Kaur and Madhuchanda Rakshit 16 Developing a Usage Space Dimension Model to Investigate Influence of Intention to Use on Actual Usage of Mobile Phones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 Geeta Kumar and P. K. Kapur 17 Examining the Relationship Between Customer-Oriented Success Factors, Customer Satisfaction, and Repurchase Intention for Mobile Commerce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Abhishek Tandon, Himanshu Sharma, and Anu Gupta Aggarwal 18 Smart Industrial Packaging and Sorting System . . . . . . . . . . . . . . . 245 Sameer Tripathi, Samraddh Shukla, Shivam Attrey, Amit Agrawal, and Vikas Singh Bhadoria 19 Credentials Safety and System Security Pay-off and Trade-off: Comfort Level Security Assurance Framework . . . . . . . . . . . . . . . . 255 Habib ur Rehman, Mohammed Nazir, and Khurram Mustafa 20 A Study of Microfinance on Sustainable Development . . . . . . . . . . 275 Shital Jhunjhunwala and Prasanna Vaidya 21 Development of Decision Support System for a Paper Making Unit of a Paper Plant Using Genetic Algorithm Technique . . . . . . . 287 Rajeev Khanduja and Mridul Sharma


22 The Challenge of Big Data in Official Statistics in India . . . . . . . . . 303 Pushpendra Kumar Verma and Preety 23 CUDA Accelerated HAPO (C-HAPO) Algorithm for Fast Responses in Vehicular Ad Hoc Networks . . . . . . . . . . . . . . . . . . . 315 Vinita Jindal and Punam Bedi 24 A Survey of Portfolio Optimization with Emphasis on Investments Made by Housewives in Popular Portfolios . . . . . . . . . 325 Sunita Sharma and Renu Tuli 25 Single Vacation Policy for Discrete-Time Retrial Queue with Two Types of Customers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335 Geetika Malik and Shweta Upadhyaya 26 Intuitionistic Fuzzy Hybrid Multi-criteria Decision-Making Approach with TOPSIS Method Using Entropy Measure for Weighting Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 Talat Parveen, H. D. Arora, and Mansaf Alam 27 Dynamic Analysis of Prey–Predator Model with Harvesting Prey Under the Effect of Pollution and Disease in Prey Species . . . . . . . 365 Naina Arya, Sumit Kaur Bhatia, Sudipa Chauhan, and Puneet Sharma 28 Univariate and Multivariate Process Capability Indices—Measures of Process Performance—A Case Study . . . . . . 381 Vivek Tyagi and Lalit Kumar 29 Improving Customer Satisfaction Through Reduction in Post Release Defect Density (PRDD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Hemanta Chandra Bhatt and Amit Thakur 30 An Architecture for Data Unification in E-commerce using Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 Sonal Tuteja and Rajeev Kumar 31 Road Accidents in EU, USA and India: A critical analysis of Data Collection Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419 Alok Nikhil Jha, Geetam Tiwari, and Niladri Chatterjee 32 Improving the QFD Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Yury S. Klochkov, Albina Gazizulina, and Maria Ostapenko 33 Software Defect Prediction Based on Selected Features Using Neural Network and Decision Tree . . . . . . . . . . . . . . . . . . . . 461 Prarna Mehta, Abhishek Tandon, and Neha 34 Testing-Effort Dependent Software Reliability Assessment Integrating Change Point, Imperfect Debugging and FRF . . . . . . . 477 Rajat Arora, Anu Gupta Aggarwal, and Rubina Mittal


35 Assessing the Severity of Software Bug Using Neural Network . . . . 491 Ritu Bibyan, Sameer Anand, and Ajay Jaiswal 36 Optimal Refill Policy for New Product and Take-Back Quantity of Used Product with Deteriorating Items Under Inflation and Lead Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 S. R. Singh and Karuna Rana 37 Blockchain-Based Social Network Infrastructure . . . . . . . . . . . . . . 517 Charu Virmani and Tanu Choudhary 38 Performance Analysis of an Active–Standby Embedded Cluster System with Software Rejuvenation, Fault Coverage and Reboot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529 Preeti Singh, Madhu Jain, and Rachana Gupta 39 Analysis of Jeffcott Rotor and Rotor with Disk Using XLrotor . . . 543 H. N. Suresh, N. Madhusudan, D. Sarvana Bavan, and B. S. Murgayya 40 Foodie: An Automated Food Ordering System . . . . . . . . . . . . . . . . 557 Chetanya Batra, Gandharv Pathak, Siddharth Gupta, Saurabh Singh, Chetna Gupta, and Varun Gupta 41 Prioritization of Different Types of Software Vulnerabilities Using Structural Equation Modelling . . . . . . . . . . . . . . . . . . . . . . . 563 Swati Narang, P. K. Kapur, and D. Damodaran 42 Fog-Based Internet of Things Security Issues . . . . . . . . . . . . . . . . . 579 Omar H. Alhazmi 43 Estimation of Fatigue Life of PLCC Solder Joints Under Vibration Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589 Rohit Khatri, Diana Denice, and Manoj Kumar

Editors and Contributors

About the Editors P. K. Kapur is Director of Amity Center for Interdisciplinary Research, Amity University, Noida, and former Dean of the Faculty of Mathematical Sciences, University of Delhi, India. He has been the President of the SREQOM (Regd.) since 2000 and is former President of ORSI. He is the Editor-in-Chief of the International Journal of System Assurance Engineering and Management, Springer (India). He has published over 375 papers in respected international journals and proceedings and has coauthored two books on software reliability with OR applications. Ompal Singh is an Associate Professor at the Department of Operational Research, University of Delhi, India. He obtained his Ph.D. in Software Reliability from the University of Delhi in 2004, and has more than 15 years of experience in teaching, research, and consultation in the area of data handling, data analysis, and modeling in the various fields of operational research. He has published more than 100 research papers in respected international journals and proceedings. Sunil Kumar Khatri is Director, Amity University, Tashkent Campus, Tashkent, Uzbekistan and is a Fellow of Institution of Electronics and Telecommunication Engineers (IETE). He has edited seven books and seven special issues of international journals, as well as more than 100 papers in respected international and national journals and proceedings. He holds various patents, and has been involved in a number of international training projects to his credit. He is an Editor of International Journal of Systems Assurance Engineering and Management, Springer, and is on the editorial boards of several journals in the USA, Egypt, Hong Kong, Singapore and India. Ajit Kumar Verma is a Professor (Technical Safety) of Engineering, Western Norway University of Applied Sciences, Haugesund, Norway. He was previously a Senior Professor at the Department of Electrical Engineering at IIT Bombay, India.


He has published over 250 papers in various journals and conferences. He is a senior member of the IEEE and a life Fellow of the IETE and Editor-in-Chief of IJSAEM and the Journal of Life Cycle Reliability and Safety Engineering.

Contributors M. I. Abubakar Department of Mathematics, Kano University of Science and Technology, Kano, Nigeria Vernika Agarwal Amity University, Noida, Uttar Pradesh, India Anu Gupta Aggarwal Department of Operational Research, University of Delhi, New Delhi, Delhi, India Amit Agrawal Department of Electrical and Electronics Engineering, ABES Engineering College, Ghaziabad, India Mansaf Alam Department of Computer Science, Jamia Millia Islamia, New Delhi, India Omar H. Alhazmi Department of Computer Science, Taibah University, Medina, Saudi Arabia Sameer Anand S.S. College of Business Studies, University of Delhi, New Delhi, India Kumari Anshu Amity University, Noida, Uttar Pradesh, India H. D. Arora Department of Applied Mathematics, Amity Institute of Applied Sciences, Amity University, Noida, Uttar Pradesh, India Rajat Arora Department of Operational Research, University of Delhi, New Delhi, Delhi, India Naina Arya Department of Mathematics, Amity Institute of Applied Sciences, Amity University, Noida, Uttar Pradesh, India Shivam Attrey Department of Electrical and Electronics Engineering, ABES Engineering College, Ghaziabad, India Kapil Kumar Bansal Department of Mathematics, SRM IST Delhi NCR Campus, Ghaziabad, India Alexey V. Bataev Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia Chetanya Batra Jaypee Institute of Information Technology, Noida, India


Punam Bedi Department of Computer Science, University of Delhi, Delhi, India Vikas Singh Bhadoria Department of Electrical and Electronics Engineering, ABES Engineering College, Ghaziabad, India Deepshikha Bharti Quality Assurance, Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India Sumit Kaur Bhatia Department of Mathematics, Amity Institute of Applied Sciences, Amity University, Noida, Uttar Pradesh, India Dhowmya Bhatt Department of IT, SRM IST Delhi NCR Campus, Ghaziabad, India Hemanta Chandra Bhatt Hughes Systique Private Ltd., Gurugram, India Ritu Bibyan Department of Operational Research, University of Delhi, New Delhi, India Niladri Chatterjee Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, Delhi, India Prateek Chaturvedi Amity University Greater Noida Campus, Greater Noida, India Sudipa Chauhan Department of Mathematics, Amity Institute of Applied Sciences, Amity University, Noida, Uttar Pradesh, India Pradnya Vishwas Chitrao Department of General Management, Symbiosis Institute of Management Studies (SIMS), Pune, India Tanu Choudhary Department of Computer Science and Engineering, Manav Rachna International Institute of Research and Studies, Faridabad, Haryana, India D. Damodaran Center for Reliability, Chennai, India Piyush Pratim Das Luleå University of Technology, Luleå, Sweden Sanchari Debgupta Department of General Management, Symbiosis Institute of Management Studies (SIMS), Pune, India Diana Denice Control Instrumentation Division, Bhabha Atomic Research Centre, Mumbai, India Rajiv Divekar Department of General Management, Symbiosis Institute of Management Studies (SIMS), Pune, India Loveleen Gaur Amity University, Noida, Uttar Pradesh, India Albina Gazizulina Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia


Anish Gupta Amity University Greater Noida Campus, Greater Noida, India Chetna Gupta Jaypee Institute of Information Technology, Noida, India Kriti Priya Gupta Symbiosis Center for Management Studies—NOIDA, Constituent of Symbiosis International (Deemed University), Pune, India Manish Gupta Buddha Institute of Technology, Gorakhpur, India Rachana Gupta Department of Mathematics, Hindu Girls College, Sonipat, Haryana, India Shalu Gupta Quality Assurance, Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India Siddharth Gupta Jaypee Institute of Information Technology, Noida, India Varun Gupta Amity School of Engineering and Technology, Amity University, Noida, India Kabiru H. Ibrahim Department of Mathematics, Kano University of Science and Technology, Kano, Nigeria Madhu Jain Department of Mathematics, Indian Institute of Technology, Roorkee, Uttarakhand, India Ajay Jaiswal S.S. College of Business Studies, University of Delhi, New Delhi, India Alok Nikhil Jha TRIPP, Department of Civil Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Shital Jhunjhunwala Department of Commerce, Delhi School of Economics, University of Delhi, New Delhi, India Vinita Jindal Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, New Delhi, India Shikha Kapoor Amity International Business School, Amity University, Noida, Uttar Pradesh, India P. K. Kapur Amity Center of Interdisciplinary Research, Amity University, Noida, Uttar Pradesh, India Sukhpal Kaur Guru Kashi University, Bathinda, Punjab, India Rajeev Khanduja Jawaharlal Nehru Sundernagar, Himachal Pradesh, India

Government

Engineering

College,

Rohit Khatri Homi Bhabha National Institute, Mumbai, India Yury S. Klochkov Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia


Geeta Kumar Amity International Business School, Amity University, Noida, Uttar Pradesh, India Lalit Kumar Department of Statistics, Meerut College, CCS University, Meerut, India Manoj Kumar Control Instrumentation Division, Bhabha Atomic Research Centre, Mumbai, India Pratap Kumar Department of Mathematics, GCET, Greater Noida, India Rajeev Kumar Jawaharlal Nehru University, New Delhi, India Rajesh Kumar Greater Noida Institute of Technology, Greater Noida, India Uday Kumar Luleå University of Technology, Luleå, Sweden Emmanuel Elioth Lulandala Department of Commerce, Delhi School of Economics, University of Delhi, New Delhi, India N. Madhusudan Diagnostic Engineers, Bengaluru, India Geetika Malik Department of Mathematics, Amity Institute of Applied Science, Amity University, Noida, India Rishi Manrai Symbiosis Center for Management Studies—NOIDA, Constituent of Symbiosis International (Deemed University), Pune, India Prarna Mehta Department of Operational Research, University of Delhi, New Delhi, Delhi, India Rubina Mittal Keshav Mahavidyalaya, University of Delhi, New Delhi, Delhi, India B. S. Murgayya Dayananda Sagar College of Engineering, Bengaluru, India Khurram Mustafa Department of Computer Science, Jamia Millia Islamia, New Delhi, India Swati Narang Amity Institute of Information Technology, AUUP, Noida, India Mohammed Nazir Department of Computer Science, Jamia Millia Islamia, New Delhi, India Neha Department of Operational Research, University of Delhi, New Delhi, Delhi, India Maria Ostapenko Tyumen Industrial University, Tyumen, Russia Talat Parveen Department of Applied Mathematics, Amity Institute of Applied Sciences, Amity University, Noida, Uttar Pradesh, India Gandharv Pathak Jaypee Institute of Information Technology, Noida, India


Preety Assistant Professor, Faculty of Management, S.V. Subharti University, Meerut, Uttar Pradesh, India Madhuchanda Rakshit Guru Kashi University, Bathinda, Punjab, India Karuna Rana Department of Mathematics, CCS University, Meerut, India Renu Rana Amity International Business School, Amity University, Noida, Uttar Pradesh, India Habib ur Rehman DXC Technology, Noida, India Dmitriy G. Rodionov Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia D. Sarvana Bavan Dayananda Sagar University, Bengaluru, India Himanshu Sharma Department of Operational Research, University of Delhi, New Delhi, Delhi, India Mridul Sharma Jawaharlal Nehru Sundernagar, Himachal Pradesh, India

Government

Engineering

College,

Mukesh K. Sharma Department of Mathematics, Ch. Charan Singh University, Meerut, India Puneet Sharma Department of Mathematics, Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, India Sunita Sharma Department of Mathematics, Kalindi College, Delhi University, New Delhi, India Samraddh Shukla Department of Electrical and Electronics Engineering, ABES Engineering College, Ghaziabad, India Anuradha Singhal Department of Computer Science, University of Delhi, Delhi, India Balbir Singh Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India Preeti Singh Department of Mathematics, Hindu Girls College, Sonipat, Haryana, India S. R. Singh Department of Mathematics, CCS University, Meerut, India Saurabh Singh Jaypee Institute of Information Technology, Noida, India V. V. Singh Department of Mathematics, Yusuf Maitama Sule University, Kano, Nigeria H. N. Suresh Dayananda Sagar College of Engineering, Bengaluru, India


Abhishek Tandon Shaheed Sukhdev College of Business Studies, University of Delhi, New Delhi, Delhi, India Amit Thakur Hughes Systique Private Ltd., Gurugram, India Geetam Tiwari TRIPP, Department of Civil Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Sameer Tripathi Department of Electrical and Electronics Engineering, ABES Engineering College, Ghaziabad, India Renu Tuli Department of Applied Sciences, The NorthCap University, Gurugram, Haryana, India Sonal Tuteja Jawaharlal Nehru University, New Delhi, India Vivek Tyagi Department of Statistics, NAS College, CCS University, Meerut, India Shweta Upadhyaya Department of Mathematics, Amity Institute of Applied Science, Amity University, Noida, India Prasanna Vaidya Department of Commerce, Delhi School of Economics, University of Delhi, New Delhi, India Ajit Kumar Verma Western Norway University of Applied Sciences, Haugesund, Norway Pushpendra Kumar Verma Associate Professor, Department of Computer Science, IIMT University, Meerut, Uttar Pradesh, India Charu Virmani Department of Computer Science and Engineering, Manav Rachna International Institute of Research and Studies, Faridabad, Haryana, India Amrita Wadhawan Quality Assurance, Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India

Chapter 1

Investing Data with Machine Learning Using Python Anish Gupta, Manish Gupta, and Prateek Chaturvedi

1.1 Introduction Machine learning plays a vital role in everything from self-driven cars, Google Assistant, and Siri to news recommendation systems and trading. In the investing world, machine learning is at an inflection point. It has been incorporated into various mainstream tools, including recommendation engines for delivering news, sentiment analysis, and stock screeners [1]. Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed. [2]

Investment management is now incorporating various new strategies and approaches. In today’s scenario, the industry is debating active versus passive investing, and machine learning investment algorithms are playing a vital role. All these implementations and uses of machine learning affect managers, investors, and vendors. Managers are able to evaluate their long-term business strategies while considering scalability [3]. As artificial intelligence and machine learning capabilities advance, firms and vendors that position themselves properly will ultimately benefit from greater returns via scalable programs and fewer expenses, because the process will be automated. Vendors will be required to position themselves in the right manner so that connectivity and data management capabilities can succeed [4]. A. Gupta (B) · P. Chaturvedi Amity University Greater Noida Campus, Greater Noida, India e-mail: [email protected] P. Chaturvedi e-mail: [email protected] M. Gupta Buddha Institute of Technology, Gorakhpur, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_1


1.2 Supervised and Unsupervised Machine Learning In supervised learning, we have input and output variables, and we use an algorithm to learn the mapping function from the input to the output. If the input variable is X and the output variable is Y, then the mapping function is Y = f(X). The goal of this algorithm is to predict the output variable from the input variables. Unsupervised learning is more closely aligned with what some call true artificial intelligence—the idea that human interaction will not be required to make a computer learn to identify complex processes and patterns [5]. Although unsupervised learning is prohibitively complex for some simpler enterprise use cases, it also opens the possibility of solving problems that human beings would not normally tackle. Some examples of unsupervised machine learning algorithms include k-means clustering, principal and independent component analysis, and association rules. The goal of unsupervised learning is to model the distribution and underlying structure in the data in order to learn more about it. We have no corresponding output variables, only input variables [6].
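
To make the distinction concrete, here is a minimal sketch that is not taken from the chapter's own scripts; it assumes NumPy and scikit-learn and uses synthetic data, with SVC standing in for the supervised case and KMeans for the unsupervised case.

    # Minimal illustrative sketch (assumes NumPy and scikit-learn; synthetic data).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                # input variables
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # output variable (labels)

    # Supervised: labelled pairs (X, y) are used to learn the mapping Y = f(X).
    clf = SVC(kernel="linear").fit(X, y)
    print("supervised predictions:", clf.predict(X[:5]))

    # Unsupervised: only X is given; the algorithm finds structure (clusters).
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("cluster labels:", km.labels_[:5])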

1.3 Work Done To explain this paper, the whole work has been divided into three phases: 1. Data extraction, 2. Data analysis, and 3. Prediction.

1.3.1 Data Extraction Data extraction is a tedious job for researchers. In this paper, we extract data from different sources and then compile them together using various logical conditions to create the data files used for the final analysis and prediction [7]. We add data from several sources to our data pool, including Yahoo Finance, Google Finance, Quandl, and the S&P 500 Index. The starting data come from Yahoo Finance. These data were in HTML form, and after parsing them, we extract the meaningful data. These data points form the basic building blocks of our paper. All our analysis and predictions are based on the calculations made on these data points.


After gathering these data points from the Yahoo Finance files, we downloaded S&P 500 Index data for the same period. We continue to add features to our data pool with the assurance that our data is getting more accurate, as that is our end goal: to have a data set with many features, each contributing to the predictions performed later. Here are some of the data points that we chose to work with in this paper:

    def Key_Stats(gather=["Total Debt/Equity",
                          "Trailing P/E",
                          "Price/Sales",
                          "Price/Book",
                          "Profit Margin"]):
        ...

We use Quandl to fetch the stock prices for these stocks, and we integrate that into our data too. We fetch the stock prices up to last year and increase the length of our dataset, so that more accurate data is available to the machine that is going to perform the predictions. Thus, after many iterations, we create the final dataset.
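
As a rough, hedged illustration of this extraction step (not the authors' actual code), the sketch below assumes the quandl package, locally saved Yahoo Finance key-statistics pages, and standard-library regular expressions; the parsing pattern and the dataset code "WIKI/<ticker>" are illustrative placeholders rather than the chapter's real sources.

    # Hedged sketch of the extraction step; patterns and dataset codes are illustrative.
    import re
    import quandl            # assumes the quandl package is installed and configured

    FEATURES = ["Total Debt/Equity", "Trailing P/E", "Price/Sales",
                "Price/Book", "Profit Margin"]

    def parse_feature(html_text, feature):
        """Pull one numeric key statistic out of a saved Yahoo Finance HTML page."""
        # Naive pattern: the feature label followed by the first number after it;
        # real pages need a proper HTML parser and per-feature handling.
        match = re.search(re.escape(feature) + r"[^0-9\-]*(-?[0-9.]+)", html_text)
        return float(match.group(1)) if match else None

    def fetch_prices(ticker, start="2017-01-01"):
        """Fetch historical prices for one ticker (the WIKI code is illustrative)."""
        return quandl.get("WIKI/" + ticker, start_date=start)["Adj. Close"]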

1.3.2 Data Analysis In this phase, we perform analysis on the data we have collected. We analyze the data of different companies. The data available to us is in the form of features or data points, and based on these data points, or on calculations derived from them, we decide whether a company is an over-performer or an under-performer [8]. If a company is an over-performer, which we decide based on these key features, then we may want to buy its stock. But if a company is an under-performer, then we do not want to buy that company’s stock. We group our companies into two categories—under-performer (0) and over-performer (1). We could have divided the companies into four groups: significant under-performer, under-performer, over-performer, and significant over-performer. But we do not want that; we just want the significant over-performer companies, buying whose stocks might make us a lot of money. So, we put the significant over-performers in one category, i.e., the stocks that we want to buy (1), and the other three in a different category, i.e., the stocks that we do not want to buy (0). Please note that we gather the data from different resources and the data is fairly accurate, but all of these sources are free, so there might be some inconsistency in the data set, though rest assured to a minimal degree.
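
One plausible way to assign these two labels is sketched below; the chapter does not spell out the exact rule, so the assumption that "significant over-performer" means beating the S&P 500 return by a margin, the column names, and the 15% threshold are all illustrative.

    # Illustrative labelling rule; the criterion and threshold are assumptions.
    import pandas as pd

    def label_status(df, outperformance=0.15):
        """Label a company 1 (significant over-performer, buy) or 0 (do not buy)."""
        df = df.copy()
        beat_market = df["stock_return"] > df["sp500_return"] + outperformance
        df["STATUS"] = beat_market.astype(int)
        return df

    example = pd.DataFrame({
        "ticker": ["AAA", "BBB", "CCC"],
        "stock_return": [0.40, 0.05, -0.10],
        "sp500_return": [0.10, 0.10, 0.10],
    })
    print(label_status(example))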

After the end of this phase, we are ready to perform predictions based on this analysis. Now, let us look at a snapshot of the graphs plotted for all the companies (Fig. 1.1).


Fig. 1.1 Starting level of analysis

The first graph was plotted during the earlier stages; we can see in the graph that some companies are plotted red, while the rest are plotted green. But as we moved forward, making progress and raising our standards, we include fewer and fewer companies in the buying list, i.e., the over-performer list (Fig. 1.2). In the second graph, we see a lot of red area; that is because we raised our standards and have fewer companies in the over-performing class.

1.3.3 Predictions

1.3.3.1 Prediction 1

This is the last phase of the paper. In this phase, we perform predictions on our dataset. The algorithm that we use for our predictions is the Support Vector Machine (SVM) algorithm [8]. SVM is a supervised machine learning algorithm. In this algorithm, we must first train the machine, and after training, the machine performs predictions [4]. To train the machine, we use a training set, and then, to measure the accuracy of our predictions, we use the testing set. The first step in training the machine is the fitting of (x, y). For training our machine, we put all the values of our data points (FEATURES) in NumPy array form in x, and in y we add the values of only one data point, i.e., “STATUS”, with over-performer being 1 and under-performer being 0.

Fig. 1.2 Mid-level of analysis

We train our machine on the training set and then perform predictions. Now, let us look at the snapshot of the code, showing its libraries, features and key variables (Snapshot 1.1). After training our machine on the training dataset, we perform predictions on the testing dataset. Then, we find the accuracy of our predictions and the invest list, i.e., the list of stocks that we might want to invest in in the future. We perform these predictions after many iterations of filtering the data, so as to ensure our predictions are accurate. Each time we run our script, we achieve a different accuracy and a different invest list, so our final iteration is running the script in a loop 4–5 times and then selecting the companies common to each iteration in the loop to create our final invest list (Snapshot 1.2). In the snapshot of the output of our prediction script, we print the length of the data and the accuracy achieved in each iteration, and then there is a list of selected companies, i.e., the final invest list.
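
The following sketch mirrors this training, testing, and repeated-run procedure using scikit-learn; the helper names, the DataFrame layout (a ticker column, the FEATURES columns, and the STATUS label), and the 25% test split are our assumptions, not the authors' exact script.

    # Sketch of the SVM phase; helper names and data layout are assumptions.
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    def run_once(df, feature_cols):
        X = df[feature_cols].values           # FEATURES as a NumPy array
        y = df["STATUS"].values               # 1 = over-performer, 0 = under-performer
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
        clf = SVC(kernel="linear")
        clf.fit(X_train, y_train)             # train the machine
        accuracy = clf.score(X_test, y_test)  # accuracy on the testing set
        invest = set(df.loc[clf.predict(X) == 1, "ticker"])  # stocks flagged "buy"
        return accuracy, invest

    def final_invest_list(df, feature_cols, runs=5):
        common = None
        for _ in range(runs):                 # run the script in a loop 4-5 times
            accuracy, picks = run_once(df, feature_cols)
            print("accuracy:", accuracy)
            common = picks if common is None else common & picks
        return sorted(common)                 # companies common to every iteration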

1.3.3.2 Prediction 2

For the second prediction, we use the k-means algorithm, an unsupervised machine learning algorithm, on our dataset. K-means is a very powerful machine learning algorithm. In this algorithm, we fit the values of our features and then let the machine divide them into clusters.

Snapshot 1.1 Libraries, features and key variables

We can decide the number of clusters into which we want to divide our data. Here, we decide the number of clusters by the elbow method, which we apply by calculating the within-cluster sum of squares (WCSS) and plotting it against the range (1, 16) (Graph 1.1).
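
A minimal sketch of that elbow computation is given below, assuming scikit-learn and matplotlib and using KMeans inertia_, which is scikit-learn's name for the within-cluster sum of squares; X is the feature matrix built earlier.

    # Elbow-method sketch: WCSS (KMeans inertia_) for k = 1, ..., 15.
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    def elbow_plot(X, k_range=range(1, 16)):
        wcss = []
        for k in k_range:
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            wcss.append(km.inertia_)          # within-cluster sum of squares
        plt.plot(list(k_range), wcss, marker="o")
        plt.xlabel("Number of clusters k")
        plt.ylabel("WCSS")
        plt.show()
        return wcss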


Snapshot 1.2 Output of prediction script

Here is a snapshot of different clusters that came through after running the scripts (Fig. 1.3).


Graph 1.1 Elbow plot

Fig. 1.3 K-means cluster

1.4 Conclusion Here we use two algorithms for the predictions performed on our dataset. The first algorithm is the Support Vector Machine (SVM) algorithm, and the second one is the k-means clustering algorithm. SVM is a supervised machine learning algorithm, whereas k-means clustering is an unsupervised machine learning algorithm. We pass the same data through both algorithms and get different predictions. We are then concerned with the accuracy and validity of both predictions. When we use SVM on our dataset, we train the machine and, after training, the machine performs predictions; but when we use k-means, we put the data in front of the machine, which then classifies the data into different clusters and then performs the prediction. We may choose the number of clusters, and we may also choose how we decide to train our machine in SVM.

References

1. Zito T, Wilbert N, Wiskott L, Berkes P (2008) Modular toolkit for data processing (MDP): a python data processing framework. Front Neuroinform 2
2. en.wikipedia.org
3. www.ssctech.com
4. Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York
5. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874
6. machinelearningmastery.com
7. van Rossum G (1996) Python language reference. Electronic hypertext document. Available from http://www.python.org
8. Luo B, Zhang Y, Pan Y-H (2005) Face recognition based on wavelet transform and SVM. In: 2005 IEEE international conference on information acquisition
9. Jain A, Gairola R, Jain S, Arora A (2018) Thwarting spam on Facebook, Chap. 4. IGI Global

Chapter 2

Probabilistic Assessment of Complex System with Two Subsystems in Series Arrangement with Multi-types Failure and Two Types of Repair Using Copula Pratap Kumar, Kabiru H. Ibrahim, M. I. Abubakar, and V. V. Singh

2.1 Introduction In the modern technological world, newly designed complex systems are becoming more advanced, more critical, and simultaneously more complex day by day. Timely inspections and maintainability are mandatory for a dynamical system to keep it adequately performable and to reduce failure effects. Many researchers have developed different advanced mathematical models to address the problems of failures and to maintain proper performance by computing performance-based reliability measures. Redundancy is one of the techniques to enhance the reliability and availability of physical systems. Together with redundancies, configurations of type k-out-of-n: F and k-out-of-n: G are mostly implemented in designing various complex mechanical, electrical, and electronic critical systems, and they are equally essential to enhance the reliability of the system. Researchers have studied the reliability characteristics of different types of systems using mathematical modeling and have analyzed the various measures using different techniques. Researchers, including Dhillon and Anude [1], Goel and Gupta [2], Jain and Gopal [3], and Chung [4], studied various types of complex repairable systems having the k-out-of-n: G form of configuration with several kinds of failure and P. Kumar Department of Mathematics, GCET, Greater Noida, India e-mail: [email protected] K. H. Ibrahim · M. I. Abubakar Department of Mathematics, Kano University of Science and Technology, Kano, Nigeria e-mail: [email protected] M. I. Abubakar e-mail: [email protected] V. V. Singh (B) Department of Mathematics, Yusuf Maitama Sule University, Kano, Nigeria e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_2


single repair. When we come to review the real-life circumstances, it is observed that there are numerous situations where multiple repairs are required to recover the failed systems quickly. As soon as such type of opportunity occurs the systems is repaired by means of copula [5] (joint probability distribution). The performance of the repairable system is profoundly affected due to deliberate failure and reboot delay of maintenance. The authors Kumar and Singh [6] have analyzed the performance of the repairable system with intentional failure and reboot delay. Yusuf [7] have evaluated the availability of a repairable system under deterioration under imperfect maintenance repairs. Yusuf [8] have analyzed the reliability of two units active parallel system connected to an external supporting device using Kolmogorov’s forward differential equation method. The configuration of type k-out-of-n: G and r-out-of-m: F has been analyzed by Gulati et al. [9] with particular attention with 2-out-of-2: F and 1-out-of-2: G using a Gumbel–Hougaard family copula. Authors Ram et al. [10] studied a system with standby unit under waiting repair strategy and have examined the system for heterogeneous possibilities of failure and repair rates. Kadyan et al. [11] studied customary reliability measures of a two units system failing warranty, beyond warranty and degradation mode under preventive maintenance policy. Park [12] has analyzed a system with multicomponent with imperfect repair and warranty cost analysis. Ram and Kumar [13] have analyzed the performance of a complex repairable system under consideration of 1-out-of-2: G scheme with perfect switching using the systematic approaches to evaluate the reliability measures of an order under mixed configuration. Researchers Singh and Ram [14] have analyzed a system consisting two subsystem with multi-state failure and two types of repair using the k-out-of-n: G; operation policy and have highlighted on 2-out-of-3: G; a structure for computations as individual cases. The researchers Singh and Rawal [15] deliberate the reliability features of a repairable system, composing the two subsystems with controller and human failure. Yen et al. [16] computed the reliability and sensitivity of the controllable repair system with warm standbys and working breakdown. Researchers El-Damcese and Sodany [17] premeditated reliability measures, including a sensitivity assessment of the k-out-of-n: G; standby repairable system with common cause failure using the Markov model. The authors Singh et al. [18] studied reliability measures (availability, reliability, MTTF, and cost analysis) of a system consisting of two subsystems at super-priority, priority, and ordinary unit via preemptive resume repair policy. Singh and Ayagi [19] examined the performance of a complex repairable system involving two subsystems in the series configuration through copula distribution. Recently, Singh and Ayagi [20] have also analyzed a complex repairable system with three independent units with several types of failure and two types of repair by copula repair approach. Rawal et al. [21] have studied reliability indices of internet data center with various types of failures together with server failure, cooling failure and catastrophic failure with two types of repair employing copula approach. Andrews and Moss [22] have studied the reliability measures especially concerning to the risk assessment analysis in reliability context. Singh et al. 
[23] have studied a complex system with two redundant unit in parallel configuration together with the main unit and priority repair to first failed redundant unit.


However, researchers have considered different models and examined the performance and availability of complex systems to predict better performance. Most of the authors have treated complex repairable systems as allowing only a single repair between two adjacent transition states. In the present paper, several reliability measures of a complex repairable system comprising two subsystems with a k-out-of-n: G configuration and two types of repair have been studied. The designed structure of the system comprises two subsystems in a series arrangement. Subsystem-1 has four non-identical units arranged in parallel and working under a 2-out-of-4: G policy, and subsystem-2 has a single unit linked in series with subsystem-1. In the state S0, the system is in the perfect state without any failure effect. After the failure of any one unit in subsystem-1, the system approaches the states S1, S3, S7, and S9 (minor partial failure/degraded states); similarly, due to the failure of two units in subsystem-1, the system approaches the states S2, S4, S8, and S10 (major partial failure/danger states). The system is in a complete failed state at the states S5, S6, S11, S12, and S13. The system also reaches a complete failed state due to a controller failure. The supplementary variable technique has been implemented to analyze the system, with evaluation for individual cases.
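
As a simplified numerical illustration of this configuration only (independent units, constant failure rates, no repair, and the four units treated as identical, which is a much weaker setting than the copula-repair model analyzed in this chapter), the short sketch below computes the static reliability of a 2-out-of-4: G subsystem in series with a single unit; the failure-rate values are arbitrary.

    # Simplified static-reliability sketch: a 2-out-of-4: G subsystem in series
    # with one unit; independent exponential lifetimes, no repair, arbitrary rates.
    from math import comb, exp

    def r_2_out_of_4(p):
        """P(at least 2 of 4 units work), each unit working with probability p."""
        return sum(comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(2, 5))

    def system_reliability(t, lam_unit=0.01, lam_sub2=0.005):
        p1 = exp(-lam_unit * t)        # a unit of subsystem-1 survives to time t
        p2 = exp(-lam_sub2 * t)        # subsystem-2 survives to time t
        return r_2_out_of_4(p1) * p2   # the two subsystems are in series

    for t in (10, 50, 100):
        print(t, round(system_reliability(t), 4))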

2.1.1 State Description

S0: S0 is the fault-free state in which both subsystems are in good working condition.
S1: The state S1 is a degraded state with a minor partial failure in subsystem-1 due to the failure of its first unit.
S2: The state S2 is a degraded state with a major partial failure in subsystem-1.
S3: The state S3 represents a minor partial failure state due to the failure of the third unit of subsystem-1.
S4: The state S4 represents a degraded state with a major partial failure. The failed system is under repair, and the elapsed repair time lies in (x, t).
S5: The state S5 represents a complete failed state, owing to a major partial failure in subsystem-1 together with the failure of subsystem-2; the system is under repair using copula.
S6: The state S6 is a complete failed state, as three units in subsystem-1 have failed. The failed system is under repair using copula with repair rate μ0(x).
S7: The state S7 represents a minor partial failure due to the failure of the fourth unit of subsystem-1.
S8: The state S8 represents a degraded state due to a major partial failure in subsystem-1. Subsystem-1 is under repair with repair rate ϕ(x), and the elapsed repair time lies in (x, t).
S9: The state S9 represents a minor partial failure caused by the failure of the second unit of subsystem-1.
S10: The state S10 is a degraded state with a major partial failure in the system, as the second and third units of subsystem-1 have failed. Subsystem-1 is under usual general repair, and the elapsed repair time lies in (x, t).
S11: This state represents a complete failed state due to the failure of three units in subsystem-1. The system is under repair using copula as per the repair policy.
S12: The state S12 represents a complete failed state due to the failure of three units of subsystem-1 (the second, third, and fourth units), and hence of subsystem-1, even though subsystem-2 is in good condition.
S13: The state S13 represents complete failure due to controller failure, which automatically stops the operation of the system.

2.1.2 Assumptions

The following assumptions are presumed throughout the study of the mathematical model:
1. The state S0 is a perfect state in which both subsystems and the controller are in good working condition.
2. The system works effectively as long as the units of subsystem-1 satisfy the operating policy and subsystem-2 is in fault-free mode.
3. The system fails due to the failure of subsystem-2 even though subsystem-1 is in fully good condition.
4. The system fails due to the failure of the controller of subsystem-1.
5. The minor/major partially failed states of subsystem-1 are repaired using general distribution repair, but the completely failed states of subsystem-1 and subsystem-2 are restored using the Gumbel–Hougaard family copula distribution.
6. As soon as the system gets repaired, it starts working with full efficiency, with no impairment caused by the repair.
7. The overhauled system works as new, and restoration does not affect the performance of the repaired system.
8. A controller failure is treated as a complete failure in order to protect the other units.
9. All failure rates of subsystem-1 and subsystem-2 are constant and follow a negative exponential distribution.


2.1.3 Notations

t: Time variable on the timescale
s: A variable for the Laplace transform of the time variable t in all expressions
λ1/λ2/λ3/λ4: Failure rates of the units of subsystem-1
λ5: The failure rate of the unit of subsystem-2
λc: The failure rate of the controller
ϕ1(x), ϕ2(x), ϕ3(x), ϕ4(x): Repair rates for the units I, II, III, and IV of subsystem-1
μ0(x), μ0(y): Repair rates for complete failed states in subsystem-1 and subsystem-2
Pi(x, t): The probability of the state Si at instant t, for i = 1, 2, 3, 4, …, 13, where x is the repair variable and t is the time variable, respectively
P̄(s): A notation for the Laplace transform of the state probability P(t)
Ep(t): Expected profit by operation of the system in the time interval [0, t)
K1, K2: Revenue generation and maintenance cost per unit time in the interval [0, t), respectively
Sϕ(x): Standard notation function with repair rate ϕ(x), defined as $S_\phi(x)=\phi(x)\,e^{-\int_0^x \phi(x)dx}$
L[Sϕ(x)]: Laplace transform of the function Sϕ(x), i.e., $L[S_\phi(x)]=\int_0^\infty e^{-sx}S_\phi(x)dx=\bar{S}_\phi(s)$
μ0(x) = Cθ(u1(x), u2(x)): The expression of the joint probability distribution (from a failed state Si to the perfect state S0), given by the Gumbel–Hougaard family copula as $C_\theta(u_1(x),u_2(x))=\exp\left[x^\theta+\{\log\phi(x)\}^\theta\right]^{1/\theta}$, where u1 = ϕ(x) and u2 = e^x, and θ is a parameter with 1 ≤ θ ≤ ∞. For x = 1, ϕ(x) = 1, and θ = 1, the expression reduces to μ0(x) = e = 2.7183
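The copula-coupled repair rate in the last entry can be checked numerically. The short Python sketch below is ours (not part of the chapter); the function name gh_coupled_repair_rate is an illustrative choice, and it simply evaluates the Gumbel–Hougaard expression for the particular values used later in Sect. 2.3 (x = 1, ϕ(x) = 1, θ = 1), where it reduces to e ≈ 2.7183.

```python
import math

def gh_coupled_repair_rate(x: float, phi_of_x: float, theta: float) -> float:
    """Gumbel-Hougaard coupled repair rate: mu_0(x) = exp([x^theta + (log phi(x))^theta]^(1/theta))."""
    if theta < 1:
        raise ValueError("theta must satisfy 1 <= theta < infinity")
    inner = x ** theta + math.log(phi_of_x) ** theta
    return math.exp(inner ** (1.0 / theta))

# Particular case used in Sect. 2.3: x = 1, phi(x) = 1, theta = 1  ->  mu_0(x) = e
print(gh_coupled_repair_rate(1.0, 1.0, 1.0))   # ~2.718281828
```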

2.1.4 System Configuration and State Transition Diagram See Fig. 2.1.

2.2 Mathematical Formulation of the Model
From probability considerations, the following set of differential equations (2.1)–(2.14) for the various states is associated with the present mathematical model.

Fig. 2.1 State transition diagram of the model

$\left[\frac{\partial}{\partial t}+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5+\lambda_c\right]P_0(t)=\int_0^\infty\phi_1(x)P_1(x,t)dx+\int_0^\infty\phi_2(x)P_9(x,t)dx+\int_0^\infty\phi_3(x)P_3(x,t)dx+\int_0^\infty\phi_4(x)P_7(x,t)dx+\int_0^\infty\mu_0(x)P_5(x,t)dx+\int_0^\infty\mu_0(x)P_6(x,t)dx+\int_0^\infty\mu_0(x)P_{11}(x,t)dx+\int_0^\infty\mu_0(x)P_{12}(x,t)dx+\int_0^\infty\mu_0(x)P_{13}(x,t)dx$   (2.1)

$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_2+\lambda_5+\lambda_c+\phi_1(x)\right]P_1(x,t)=0$   (2.2)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_3+\lambda_c+\phi_2(x)\right]P_2(x,t)=0$   (2.3)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_4+\lambda_5+\lambda_c+\phi_3(x)\right]P_3(x,t)=0$   (2.4)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_1+\lambda_c+\phi_4(x)\right]P_4(x,t)=0$   (2.5)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\mu_0(x)\right]P_5(x,t)=0$   (2.6)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\mu_0(x)\right]P_6(x,t)=0$   (2.7)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_2+\lambda_5+\lambda_c+\phi_4(x)\right]P_7(x,t)=0$   (2.8)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_1+\lambda_c+\phi_2(x)\right]P_8(x,t)=0$   (2.9)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_3+\lambda_5+\lambda_c+\phi_2(x)\right]P_9(x,t)=0$   (2.10)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\lambda_4+\lambda_c+\phi_3(x)\right]P_{10}(x,t)=0$   (2.11)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\mu_0(x)\right]P_{11}(x,t)=0$   (2.12)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\mu_0(x)\right]P_{12}(x,t)=0$   (2.13)
$\left[\frac{\partial}{\partial t}+\frac{\partial}{\partial x}+\mu_0(x)\right]P_{13}(x,t)=0$   (2.14)

Boundary Conditions
$P_1(0,t)=\lambda_1 P_0(t)$   (2.15)
$P_2(0,t)=\lambda_2\lambda_1 P_0(t)$   (2.16)
$P_3(0,t)=\lambda_3 P_0(t)$   (2.17)
$P_4(0,t)=\lambda_4\lambda_3 P_0(t)$   (2.18)
$P_5(0,t)=\lambda_3 P_2(0,t)+\lambda_5\left(P_1(0,t)+P_3(0,t)+P_7(0,t)+P_9(0,t)+P_0(t)\right)$   (2.19)
$P_6(0,t)=\lambda_1 P_4(0,t)$   (2.20)
$P_7(0,t)=\lambda_4 P_0(t)$   (2.21)
$P_8(0,t)=\lambda_2 P_7(0,t)$   (2.22)
$P_9(0,t)=\lambda_2 P_0(t)$   (2.23)
$P_{10}(0,t)=\lambda_3 P_9(0,t)$   (2.24)
$P_{11}(0,t)=\lambda_4 P_{10}(0,t)$   (2.25)
$P_{12}(0,t)=\lambda_1 P_8(0,t)$   (2.26)
$P_{13}(0,t)=\lambda_c\left(P_0(t)+P_1(0,t)+P_2(0,t)+P_3(0,t)+P_4(0,t)+P_7(0,t)+P_8(0,t)+P_9(0,t)+P_{10}(0,t)\right)$   (2.27)


Initial Conditions
P0(0) = 1, and the other state probabilities are zero at t = 0, i.e., Pi(x, 0) = 0 for i = 1, 2, …, 13

(2.28)

2.2.1 Solution of the Model
By Laplace transformation of Eqs. (2.1)–(2.27), with the help of the initial condition (2.28), one can obtain

$[s+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5+\lambda_c]\bar{P}_0(s)=1+\int_0^\infty\phi_1(x)\bar{P}_1(x,s)dx+\int_0^\infty\phi_2(x)\bar{P}_9(x,s)dx+\int_0^\infty\phi_3(x)\bar{P}_3(x,s)dx+\int_0^\infty\phi_4(x)\bar{P}_7(x,s)dx+\int_0^\infty\mu_0(x)\bar{P}_5(x,s)dx+\int_0^\infty\mu_0(x)\bar{P}_6(x,s)dx+\int_0^\infty\mu_0(x)\bar{P}_{11}(x,s)dx+\int_0^\infty\mu_0(x)\bar{P}_{12}(x,s)dx+\int_0^\infty\mu_0(x)\bar{P}_{13}(x,s)dx$   (2.29)

$\left[s+\frac{\partial}{\partial x}+\lambda_2+\lambda_5+\lambda_c+\phi_1(x)\right]\bar{P}_1(x,s)=0$   (2.30)
$\left[s+\frac{\partial}{\partial x}+\lambda_3+\lambda_c+\phi_2(x)\right]\bar{P}_2(x,s)=0$   (2.31)
$\left[s+\frac{\partial}{\partial x}+\lambda_4+\lambda_5+\lambda_c+\phi_3(x)\right]\bar{P}_3(x,s)=0$   (2.32)
$\left[s+\frac{\partial}{\partial x}+\lambda_1+\lambda_c+\phi_4(x)\right]\bar{P}_4(x,s)=0$   (2.33)
$\left[s+\frac{\partial}{\partial x}+\mu_0(x)\right]\bar{P}_5(x,s)=0$   (2.34)
$\left[s+\frac{\partial}{\partial x}+\mu_0(x)\right]\bar{P}_6(x,s)=0$   (2.35)
$\left[s+\frac{\partial}{\partial x}+\lambda_2+\lambda_5+\lambda_c+\phi_4(x)\right]\bar{P}_7(x,s)=0$   (2.36)
$\left[s+\frac{\partial}{\partial x}+\lambda_1+\lambda_c+\phi_2(x)\right]\bar{P}_8(x,s)=0$   (2.37)
$\left[s+\frac{\partial}{\partial x}+\lambda_3+\lambda_5+\lambda_c+\phi_2(x)\right]\bar{P}_9(x,s)=0$   (2.38)
$\left[s+\frac{\partial}{\partial x}+\lambda_4+\lambda_c+\phi_3(x)\right]\bar{P}_{10}(x,s)=0$   (2.39)
$\left[s+\frac{\partial}{\partial x}+\mu_0(x)\right]\bar{P}_{11}(x,s)=0$   (2.40)
$\left[s+\frac{\partial}{\partial x}+\mu_0(x)\right]\bar{P}_{12}(x,s)=0$   (2.41)
$\left[s+\frac{\partial}{\partial x}+\mu_0(x)\right]\bar{P}_{13}(x,s)=0$   (2.42)

$\bar{P}_1(0,s)=\lambda_1\bar{P}_0(s)$   (2.43)
$\bar{P}_2(0,s)=\lambda_1\lambda_2\bar{P}_0(s)$   (2.44)
$\bar{P}_3(0,s)=\lambda_3\bar{P}_0(s)$   (2.45)
$\bar{P}_4(0,s)=\lambda_4\bar{P}_3(0,s)$   (2.46)
$\bar{P}_5(0,s)=\lambda_3\bar{P}_2(0,s)+\lambda_5\left(\bar{P}_1(0,s)+\bar{P}_3(0,s)+\bar{P}_4(0,s)+\bar{P}_7(0,s)+\bar{P}_0(s)\right)$   (2.47)
$\bar{P}_6(0,s)=\lambda_1\lambda_2\bar{P}_3(0,s)$   (2.48)
$\bar{P}_7(0,s)=\lambda_4\bar{P}_0(s)$   (2.49)
$\bar{P}_8(0,s)=\lambda_2\bar{P}_7(0,s)$   (2.50)
$\bar{P}_9(0,s)=\lambda_2\bar{P}_0(s)$   (2.51)
$\bar{P}_{10}(0,s)=\lambda_3\bar{P}_9(0,s)$   (2.52)
$\bar{P}_{11}(0,s)=\lambda_4\bar{P}_{10}(0,s)$   (2.53)
$\bar{P}_{12}(0,s)=\lambda_4\bar{P}_8(0,s)$   (2.54)
$\bar{P}_{13}(0,s)=\lambda_c\left(\bar{P}_0(s)+\bar{P}_1(0,s)+\bar{P}_2(0,s)+\bar{P}_3(0,s)+\bar{P}_4(0,s)+\bar{P}_7(0,s)+\bar{P}_8(0,s)+\bar{P}_9(0,s)+\bar{P}_{10}(0,s)\right)$   (2.55)


Solving (2.29)–(2.42) with the help of Eqs. (2.43)–(2.55), one may get

$\bar{P}_0(s)=\frac{1}{D(s)}$   (2.56)
$\bar{P}_1(s)=\frac{\lambda_1}{D(s)}\,\frac{1-\bar{S}_{\phi_1}(s+\lambda_2+\lambda_5+\lambda_c)}{s+\lambda_2+\lambda_5+\lambda_c}$   (2.57)
$\bar{P}_2(s)=\frac{\lambda_1\lambda_2}{D(s)}\,\frac{1-\bar{S}_{\phi_2}(s+\lambda_3+\lambda_c)}{s+\lambda_3+\lambda_c}$   (2.58)
$\bar{P}_3(s)=\frac{\lambda_3}{D(s)}\,\frac{1-\bar{S}_{\phi_4}(s+\lambda_4+\lambda_5+\lambda_c)}{s+\lambda_4+\lambda_5+\lambda_c}$   (2.59)
$\bar{P}_4(s)=\frac{\lambda_3\lambda_4}{D(s)}\,\frac{1-\bar{S}_{\phi_4}(s+\lambda_1+\lambda_c)}{s+\lambda_1+\lambda_c}$   (2.60)
$\bar{P}_5(s)=\frac{\lambda_1\lambda_2\lambda_3+\lambda_5(1+\lambda_1+\lambda_2+\lambda_3+\lambda_4)}{D(s)}\,\frac{1-\bar{S}_{\mu_0}(s)}{s}$   (2.61)
$\bar{P}_6(s)=\frac{\lambda_1\lambda_2\lambda_3}{D(s)}\,\frac{1-\bar{S}_{\mu_0}(s)}{s}$   (2.62)
$\bar{P}_7(s)=\frac{\lambda_4}{D(s)}\,\frac{1-\bar{S}_{\phi_2}(s+\lambda_1+\lambda_c)}{s+\lambda_1+\lambda_c}$   (2.63)
$\bar{P}_8(s)=\frac{\lambda_2\lambda_4}{D(s)}\,\frac{1-\bar{S}_{\phi_4}(s+\lambda_1+\lambda_c)}{s+\lambda_1+\lambda_c}$   (2.64)
$\bar{P}_9(s)=\frac{\lambda_2}{D(s)}\,\frac{1-\bar{S}_{\phi_4}(s+\lambda_3+\lambda_5+\lambda_c)}{s+\lambda_3+\lambda_5+\lambda_c}$   (2.65)
$\bar{P}_{10}(s)=\frac{\lambda_1\lambda_2}{D(s)}\,\frac{1-\bar{S}_{\mu_0}(s+\lambda_4+\lambda_c)}{s+\lambda_4+\lambda_c}$   (2.66)
$\bar{P}_{11}(s)=\frac{\lambda_2\lambda_3\lambda_4}{D(s)}\,\frac{1-\bar{S}_{\mu_0}(s)}{s}$   (2.67)
$\bar{P}_{12}(s)=\frac{\lambda_1\lambda_2\lambda_4}{D(s)}\,\frac{1-\bar{S}_{\mu_0}(s)}{s}$   (2.68)
$\bar{P}_{13}(s)=\frac{\lambda_c\left(1+(\lambda_1+\lambda_4)(1+\lambda_2)+(\lambda_2+\lambda_3)(1+\lambda_4)\right)}{D(s)}\,\frac{1-\bar{S}_{\mu_0}(s)}{s}$   (2.69)

where

$D(s)=(s+\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5+\lambda_c)-\Bigg[\frac{\lambda_1\phi_1}{s+\lambda_2+\lambda_5+\phi_1}+\frac{\lambda_1\lambda_2\phi_2}{s+\lambda_3+\lambda_5+\phi_2}+\frac{\lambda_3\phi_3}{s+\lambda_4+\lambda_5+\phi_3}+\frac{\lambda_3\lambda_4\phi_4}{s+\lambda_2+\lambda_5+\phi_4}+\frac{\lambda_1\lambda_2\lambda_3\mu_0}{s+\mu_0}+\frac{\lambda_2\lambda_3\lambda_4\mu_0}{s+\mu_0}+\frac{\lambda_1\lambda_2\lambda_4\mu_0}{s+\mu_0}+\frac{\lambda_c\left(1+(\lambda_1+\lambda_4)(1+\lambda_2)+(\lambda_2+\lambda_3)(1+\lambda_4)\right)\mu_0}{s+\mu_0}+\frac{\left(\lambda_1\lambda_2\lambda_3+\lambda_5(1+\lambda_1+\lambda_2+\lambda_3+\lambda_4)\right)\mu_0}{s+\mu_0}\Bigg]$

Let $\bar{P}_{up}(s)$ and $\bar{P}_{down}(s)$ be the sums of the Laplace transforms of the state probabilities in which the system is in operative and inoperative mode, respectively. Then

$\bar{P}_{up}(s)=\sum_{i=0}^{10}\bar{P}_i(s)$ and $\bar{P}_{down}(s)=1-\bar{P}_{up}(s)$

$\bar{P}_{up}(s)=\frac{1}{D(s)}\Bigg[1+\frac{\lambda_1\phi_1}{s+\lambda_2+\lambda_5+\phi_1}+\frac{\lambda_1\lambda_2\phi_2}{s+\lambda_3+\lambda_4+\phi_2}+\frac{\lambda_3\phi_3}{s+\lambda_4+\lambda_5+\phi_3}+\frac{\lambda_3\lambda_4\phi_4}{s+\lambda_1+\lambda_2+\phi_4}+\frac{\lambda_2\phi_2}{s+\lambda_1+\lambda_5+\phi_2}+\frac{\lambda_2\lambda_3\phi_3}{s+\lambda_1+\lambda_3+\phi_2}+\frac{\lambda_4\phi_4}{s+\lambda_3+\lambda_5+\phi_2}+\frac{\lambda_2\lambda_3\phi_3}{s+\lambda_4+\lambda_1+\phi_3}\Bigg]$   (2.70)

2.3 Analytical Study of the Model for the Particular Case

1. Availability Analysis: Setting

$\bar{S}_{\mu_0}(s)=\bar{S}_{\exp[x^\theta+\{\log\phi(x)\}^\theta]^{1/\theta}}(s)=\frac{\exp[x^\theta+\{\log\phi(x)\}^\theta]^{1/\theta}}{s+\exp[x^\theta+\{\log\phi(x)\}^\theta]^{1/\theta}},\qquad \bar{S}_\phi(s)=\frac{\phi}{s+\phi}$

Taking the failure rates of the subsystems as λ1 = 0.15, λ2 = 0.17, λ3 = 0.16, λ4 = 0.13, λ5 = 0.20, and λc = 0.20, the repair rates ϕi(x) = 1 for i = 1, …, 4, and θ = 1, ϕ = 1, x = 1, so that $\bar{S}_{\mu_0}(s)=\frac{\mu_0}{s+\mu_0}$, in (2.70), and then taking the inverse Laplace transform of the reduced expression, one obtains the availability of the system as

$P_{up}(t)=0.0224256233e^{-1.330000t}-0.00350027e^{-1.34000000t}+0.15907838e^{-3.441975785t}-0.0100423068e^{-1.73286801t}-0.0011861126e^{-1.54368553t}-0.025719682e^{-1.398594739t}-0.0006419320e^{-1.175669504t}-0.000744080523e^{-1.163634338t}-0.099193988e^{-1.100862622t}+1.094756122e^{-0.130909470t}$   (2.71)

For different values of the time variable t in expression (2.71), one can identify the future behavior of the system availability.

2. Reliability Analysis: Setting all repairs ϕi(x), i = 1, 2, 3, 4, μ0(x), and μ0(y) in Eq. (2.70) to zero, taking the failure rates as λ1 = 0.15, λ2 = 0.17, λ3 = 0.15, λ4 = 0.13, λ5 = 0.20, and λc = 0.20, and then taking the inverse Laplace transform of the resulting expression, one obtains the reliability function of the system as represented in Eq. (2.72):

$R(t)=0.333333333e^{-0.5100000t}+0.2700000e^{-0.34000000t}+0.22848485e^{-0.3300000t}+0.377777778e^{-0.54000000t}-0.209595956e^{-0.990000000t}$   (2.72)

For different values of the time t in (2.72), the reliability R(t) and the corresponding variation in reliability can be studied over time. One can conclude that the values of reliability are smaller than the values of availability for the same values of time t.

3. Mean Time to System Failure (MTSF) Analysis: If $\bar{P}_{up}(s)$ is the sum of the Laplace transforms of the state probabilities in which the system is in operational condition, then treating all repairs as zero in the resulting expression, one can compute the MTSF corresponding to the failure rates:

$\mathrm{MTSF}=\lim_{s\to 0}\bar{P}_{up}(s)=\frac{1}{\lambda_1+\lambda_2+\lambda_3+\lambda_4+\lambda_5}\left[1+\frac{\lambda_1}{\lambda_2+\lambda_5}+\frac{\lambda_1\lambda_2}{\lambda_3+\lambda_4}+\frac{\lambda_3}{\lambda_4+\lambda_5}+\frac{\lambda_3\lambda_4}{\lambda_1+\lambda_2}+\frac{\lambda_4}{\lambda_1+\lambda_5}+\frac{\lambda_2\lambda_4}{\lambda_1+\lambda_3}+\frac{\lambda_2}{\lambda_3+\lambda_5}+\frac{\lambda_2\lambda_3}{\lambda_1+\lambda_4}\right]$   (2.73)

Taking the failure rates λ1 = 0.15, λ2 = 0.17, λ3 = 0.16, λ4 = 0.13, λ5 = 0.20, and λc = 0.20, and varying λ1, λ2, λ3, λ4, λ5, and λc one by one, respectively, while keeping the other failure rates fixed in (2.73), one can obtain the variation of the mean time to system failure (MTSF) corresponding to the failure rates.
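As a quick numerical cross-check of the closed-form results above, the following sketch (ours, not from the chapter) tabulates the availability expression (2.71) and the reliability expression (2.72) for a few values of t, and evaluates the MTSF formula (2.73) for the stated failure rates; the coefficient lists are transcribed directly from the expressions above.

```python
import math

# (coefficient, exponent) pairs transcribed from Eq. (2.71) and Eq. (2.72)
P_UP_TERMS = [(0.0224256233, -1.33), (-0.00350027, -1.34), (0.15907838, -3.441975785),
              (-0.0100423068, -1.73286801), (-0.0011861126, -1.54368553),
              (-0.025719682, -1.398594739), (-0.0006419320, -1.175669504),
              (-0.000744080523, -1.163634338), (-0.099193988, -1.100862622),
              (1.094756122, -0.130909470)]
R_TERMS = [(0.333333333, -0.51), (0.27, -0.34), (0.22848485, -0.33),
           (0.377777778, -0.54), (-0.209595956, -0.99)]

def eval_exp_sum(terms, t):
    """Evaluate a sum of c * exp(a * t) terms at time t."""
    return sum(c * math.exp(a * t) for c, a in terms)

def mtsf(l1, l2, l3, l4, l5):
    """Eq. (2.73): MTSF as a function of the failure rates (all repairs set to zero)."""
    bracket = (1 + l1 / (l2 + l5) + l1 * l2 / (l3 + l4) + l3 / (l4 + l5)
               + l3 * l4 / (l1 + l2) + l4 / (l1 + l5) + l2 * l4 / (l1 + l3)
               + l2 / (l3 + l5) + l2 * l3 / (l1 + l4))
    return bracket / (l1 + l2 + l3 + l4 + l5)

for t in range(0, 11, 2):
    print(t, round(eval_exp_sum(P_UP_TERMS, t), 4), round(eval_exp_sum(R_TERMS, t), 4))
print("MTSF:", round(mtsf(0.15, 0.17, 0.16, 0.13, 0.20), 4))
```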


4. Sensitivity Analysis of MTSF: The sensitivity of the mean time to system failure (MTSF) can be studied through the partial derivatives of the MTSF with respect to the failure rates λi, i = 1, 2, 3, 4, and 5. Retaining the set of parametric values of the failure rates after partial differentiation of the MTSF, and then varying λ1 = 0.15, λ2 = 0.17, λ3 = 0.16, λ4 = 0.16, λ5 = 0.20, and λc = 0.20 in the resulting expression, one can compute the sensitivity of the MTSF corresponding to variation in the failure rates. The sensitivity of the MTSF with respect to the failure rate λi is given as $\frac{\partial(\mathrm{MTSF})}{\partial\lambda_i}$ for i = 1, 2, 3, 4, 5.

5. Cost/Profit Analysis: If the maintenance facility is always available, then the expected profit from operation of the system in the interval 0 ≤ t ≤ ∞ can be calculated by the formula $E_p(t)=K_1\int_0^t P_{up}(t)dt-K_2 t$, where K1 and K2 are the revenue generation and service cost per unit time, respectively. For the same set of parameters of the failure and repair rates in (2.70), one obtains the expression

$E_p(t)=K_1\left[-0.01686137e^{-1.33000000t}+0.0026119608e^{-1.3400000t}-0.046217172e^{-3.44197578t}+0.0579519429e^{-1.7328680t}+0.000768364e^{-1.543685532t}+0.0183896603e^{-1.398594739t}+0.00054601e^{-1.175669504t}+0.000639445e^{-1.163634338t}+0.0901056918e^{-1.100862622t}-8.362696127e^{-0.130909470t}+8.2210\right]-K_2 t$   (2.74)

Fixing K1 = 1 and K2 = 0.6, 0.5, 0.4, 0.3, 0.2, and 0.1 in (2.74), the expected profit Ep(t) with respect to the time t can be studied. One can conclude from (2.74) that when the service cost K2 is reduced, the expected profit increases. It can also be seen that when the repair follows only a general distribution, the profit remains low, whereas when the repair follows the two types of repair, the net profit increases with variation in the time variable.
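To see how the profit expression is exercised in practice, the sketch below (ours) evaluates E_p(t) = K1 ∫₀ᵗ Pup(u) du − K2 t by integrating each exponential term of (2.71) exactly, for the listed service costs K2; the availability coefficients are repeated so that the sketch is self-contained.

```python
import math

# Availability terms (coefficient, exponent) transcribed from Eq. (2.71)
P_UP_TERMS = [(0.0224256233, -1.33), (-0.00350027, -1.34), (0.15907838, -3.441975785),
              (-0.0100423068, -1.73286801), (-0.0011861126, -1.54368553),
              (-0.025719682, -1.398594739), (-0.0006419320, -1.175669504),
              (-0.000744080523, -1.163634338), (-0.099193988, -1.100862622),
              (1.094756122, -0.130909470)]

def expected_profit(t, k1=1.0, k2=0.6):
    """E_p(t) = K1 * integral_0^t P_up(u) du - K2 * t, each c*exp(a*u) term integrated in closed form."""
    integral = sum(c / a * (math.exp(a * t) - 1.0) for c, a in P_UP_TERMS)
    return k1 * integral - k2 * t

for k2 in (0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
    print(k2, [round(expected_profit(t, 1.0, k2), 3) for t in (1, 2, 3, 4, 5)])
```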

2.4 Result Analysis and Conclusions
To assess the performance of the designed system on the basis of traditional reliability measures for fixed values of the failure and repair rates, the expressions for the availability (2.71) and reliability (2.72) of the system have been developed for the failure rates λ1 = 0.15, λ2 = 0.17, λ3 = 0.16, λ4 = 0.16, λ5 = 0.20, and λc = 0.20. From the expressions for availability (2.71) and reliability (2.72), one can examine that the availability decreases with respect to time. The availability of the system decreases with time, approaches a steady state, and then tends to zero after a sufficiently long interval of time. Hence, one can safely forecast the future behavior of the complex repairable system at any time for a given set of parametric values. The reliability of the system also decreases as time passes. One can check that the decrease in the values of reliability is greater than that of availability for the given parametric values of the failure rates. The behavior of the mean time to system failure (MTSF) with respect to variation of the failure rates can be obtained from expression (2.73). The change in the values of MTSF is directly associated with system reliability. The computation of MTSF for different values of the failure rates λ1, λ2, λ3, λ4, λ5, and λc, presented graphically, may help in planning and designing a reliable system. Finally, by fixing the revenue per unit time K1 = 1 and varying the service cost K2 over values less than K1, the profit can be computed. One can check from (2.74) that when the service cost is reduced, the profit increases. Overall, for a smaller value of the service cost, the expected profit is high in contrast to a high service cost.


Chapter 3

Recognition and Optimizing Process Lagging Indicators During Software Development Amrita Wadhawan and Shalu Gupta

3.1 Introduction [1]
At present, the software industry has become very competitive, and each day new software or mobile applications are launched in the market. The expectations of the customers/end users have risen, and the availability of many products in the market has raised the benchmark of product quality. These situations create a lot of pressure on project managers to meet project deadlines while maintaining product quality and staying within budget. The only way to meet these expectations is to use a quantitative statistical approach and capture metrics data. For this, recognition and optimization of process lagging indicators during software development become very crucial. Lagging indicators are "output" metrics identified in the projects. They are easily computed and are improved through leading indicators. A leading indicator is a subprocess measurement of any SDLC phase. The difference between the two is that a leading indicator can control change, whereas a lagging indicator can only measure it. This paper is categorized as follows: Section 3.2 describes major challenges faced during software development. Section 3.3 specifies the equilibrium between project constraints. Section 3.4 specifies recognition of relevant process leading and lagging indicators. Section 3.5 specifies monitoring of process lagging and leading indicators. Section 3.6 specifies optimization of process lagging and leading indicators. Section 3.7 describes the conclusion and major benefits achieved in software development.

A. Wadhawan (B) · S. Gupta Quality Assurance, Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India e-mail: [email protected] S. Gupta e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_3


3.2 Major Challenges Faced in Software Development [2]
Customer satisfaction mainly comes from a quality product delivered within the scheduled time and budget. In order to achieve this, some major challenges are faced, as mentioned below:
1. The project managers are provided with stringent resources within which they have to complete the project efficiently and effectively. This results in resistance from project managers about maintaining project data for statistical analysis.
2. Periodic extraction of project data related to SDLC documents like review reports, defect reports, change requests and project plans from various stakeholders is a cumbersome job.
3. Identification of project-specific leading and lagging indicators for quantitative statistical management is also a major challenge for the project managers.
4. Consistently monitoring the projects using statistical techniques is also a challenge, as there is a risk of it losing its priority along with other project activities and product delivery.

3.3 Equilibrium Between Project Constraints [2, 3]
In every project, there are some major constraints like time, cost and quality, as shown in Fig. 3.1. This triangle is an important tool for prioritizing and decision making. Most of the time, due to customer pressure and other factors, the project manager focuses on a single constraint and trades off the other constraints. Ignoring the other constraints affects the project delivery accordingly. The project manager has to ensure equilibrium between the three constraints. Monitoring and controlling all three constraints may not be practically possible, but monitoring at least two project constraints leads to successful project delivery.

Fig. 3.1 Project constraints


3.3.1 Selection of Triple Constraints [4] The selection of triple constraints is also a challenging task for the project manager. The project manager has to clearly understand the requirement and priority of the customer. Based on that selection of two critical constraints is to be finalized.

3.3.1.1 Time

Time leads to timely delivery of the product to the customer. A faster delivery may be achieved by adding more resources in the project. But that will increase the costs. Similarly, we can maintain the timeliness at the given cost, but then quality may be compromised. The project manager may choose different lagging indicators related to time such as schedule variance, cycle time.

3.3.1.2 Cost

Cost and resources are the foundation of all projects. Many times, projects are fixed-cost projects. The project manager has to ensure that scope creep does not go beyond limits and has to control unnecessary expenses. Different lagging indicators related to cost may be decided, such as cost variance, effort variance and requirement stability index.

3.3.1.3 Quality

Customer satisfaction mainly may be achieved through the quality of the product. Variation control is the heart of quality control. The quality of the product may be measured in terms of defect leakage, defect density, review effectiveness, accuracy, etc.
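For concreteness, the sketch below (ours) shows how such time-, cost- and quality-related lagging indicators might be computed from raw project data; the formulas are commonly used conventions rather than definitions given in this chapter, so an organization's own definitions may differ.

```python
def schedule_variance_pct(planned_days: float, actual_days: float) -> float:
    """Schedule variance as a percentage of the planned duration."""
    return (actual_days - planned_days) / planned_days * 100.0

def effort_variance_pct(planned_effort: float, actual_effort: float) -> float:
    """Effort (cost) variance as a percentage of the planned effort."""
    return (actual_effort - planned_effort) / planned_effort * 100.0

def defect_density(defects_found: int, size_kloc: float) -> float:
    """Defects per KLOC (or per function point, depending on the size measure used)."""
    return defects_found / size_kloc

def defect_leakage_pct(post_release_defects: int, pre_release_defects: int) -> float:
    """Share of all defects that escaped to the customer."""
    total = post_release_defects + pre_release_defects
    return post_release_defects / total * 100.0 if total else 0.0

# Illustrative figures for a small release (hypothetical data)
print(schedule_variance_pct(30, 36))    # 20.0  -> delivered 20% late
print(effort_variance_pct(400, 430))    # 7.5   -> 7.5% effort overrun
print(defect_density(18, 12.5))         # 1.44  defects per KLOC
print(defect_leakage_pct(3, 45))        # 6.25  -> 6.25% of defects leaked past testing
```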

3.4 Recognition of Relevant Process Lagging and Leading Indicators [3, 5]
In the organization, first and foremost the quality policy is devised by the top management. Based on the quality policy, business goals are identified. These business goals are translated into organization quantitative process performance objectives (Org. QPPO). Further, lagging indicators are identified based on the Org. QPPO (Fig. 3.2).

Fig. 3.2 Flowchart of quality process (Quality Policy → Business Goal → Organizational Quantitative Process Performance Objective → Lagging Indicator → Sub Processes → Leading Indicators)

The project manager has a major challenge to recognize and identify the project lagging indicators. This identification is based on various factors such as
• Customer requirement
• Project agreement
• Type of project
• Phase of project
• Organization business goal.

3.4.1 Customer Requirement
The expectation of the customer from the final product needs to be understood. For example, if the product is of a mission-critical type, product quality is of utmost importance. In this scenario, the project manager should identify quality-related lagging indicators.

3.4.2 Project Agreement Project agreement interpretation should be clear to project manager. Sometimes, penalty clauses for delay in project are defined in the agreement. This indicates that time-related lagging indicators should be identified.


3.4.3 Type of Project Type of project also affects the identification of relevant lagging indicators. Domain of the projects may be of different types such as embedded, health informatics, eLearning.

3.4.4 Phase of Project The project lagging indicators are based on the phase of projects such as development or maintenance. In maintenance projects, delivery time is shorter so accordingly appropriate lagging indicators are identified.

3.4.5 Organization Business Goal Project lagging indicators are selected in line with the organizational business goal. If the organization business goal is to reduce the effort variance and improve the quality of the products, then project manager should monitor and control his project at least on the same parameters.

3.4.6 Recognition of Leading Indicators
After identification of lagging indicators, relevant subprocesses are identified. For example, if defect leakage is identified as a lagging indicator, the subprocesses that may control the defect leakage are
• Requirement process
• Requirement review process
• Design process
• Design review process
• Coding process
• Coding review process
• Unit testing process
• Unit testing review process.

Using these subprocesses, the applicable leading indicators are identified to control the lagging indicator (Table 3.1).


Table 3.1 Translation of business goal

Business goal: Improve the quality of deliverables
Org. QPPO: To achieve 10% reduction in defect leakage and SD ≤ 0.1
Lagging indicator (Y factor): Defect leakage
Subprocess: Requirement process; Requirement review process; Design process; Design review; Coding process; Code review
Leading indicators (X factors): Requirement effort; Req. review effort; Design effort; Design review effort; Coding effort; Code review effort

3.5 Monitoring of Process Lagging and Leading Indicators [3, 6]
The project manager starts the monitoring of identified project lagging and leading indicators. For this, project data is collected periodically from different sources. Collected data is analyzed and monitored using control charts. Control charts are the quality tool which shows the mean and dispersion of data. It is a reflector of what is happening in the project. The pictorial representation of the current scenario of the project is useful to the project manager in understanding and analyzing the situation, specifically when the data is large. It gives him insight early on in the project so that he can set realistic project goals, i.e., lagging indicators, and a roadmap to achieve those goals. The said roadmap is the set of relevant leading indicators or control knobs which can be fine tuned to the optimum limit depending on the project constraints. These charts also indicate the presence of outliers, if any, in the data. The outliers are analyzed using root cause analysis. This root cause analysis will help in improving the process in terms of mean and dispersion of data. This further improves the lagging indicators. Figure 3.3 shows the control chart with an outlier. Figure 3.4 shows the control chart with no outlier after taking the corrective actions.

Fig. 3.3 Control chart with outlier

Fig. 3.4 Control chart with no outlier

Due to paucity of time and other activities, project managers majorly focus on a single lagging indicator. But focussing on a single parameter sometimes will not produce an effective outcome as desired by the client. This also unbalances the equilibrium between project constraints. For example, if the quality of the project is strictly adhered to, then this may make it a costly end product and the project deadline may also be exceeded. Similarly, if the time aspect is solely controlled, then such a product may not be of good quality. Both these situations will not satisfy the customer. Hence, to maintain the balance between the project constraints such as time, cost and quality and to produce an effective outcome of the project, monitoring of more than one lagging indicator is required. Relevant data for the selected lagging indicators needs to be maintained simultaneously.

The project manager sets the quantitative goal of the identified lagging indicators in his project based on the organizational business goal. For monitoring the goal, project data is collected and analyzed. This gives the project manager the present values of the lagging and leading indicators. After that, regression equations are used to find the probability of achieving the project goal. This is done by improving the leading indicators. This process is done iteratively, and improvement in the lagging indicator is monitored throughout the life cycle of the project.
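The control-chart monitoring described in this section can be prototyped in a few lines. The sketch below (ours, not the authors' tooling) builds an individuals (XmR-style) chart using the standard 2.66 × mean-moving-range control limits and flags outliers for root cause analysis.

```python
def individuals_control_chart(values):
    """Return (mean, lcl, ucl, outlier_indices) for an individuals (X) chart."""
    n = len(values)
    mean = sum(values) / n
    moving_ranges = [abs(values[i] - values[i - 1]) for i in range(1, n)]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    ucl = mean + 2.66 * mr_bar   # 2.66 is the standard XmR constant for individual values
    lcl = mean - 2.66 * mr_bar
    outliers = [i for i, v in enumerate(values) if v < lcl or v > ucl]
    return mean, lcl, ucl, outliers

# Hypothetical review effort (person-hours) per work package, with one unusual spike
review_effort = [4.1, 3.8, 4.4, 4.0, 4.2, 3.9, 4.3, 4.1, 4.0, 4.2, 9.5, 4.1, 3.9]
mean, lcl, ucl, outliers = individuals_control_chart(review_effort)
print(round(mean, 2), round(lcl, 2), round(ucl, 2), outliers)   # the 9.5 spike (index 10) is flagged
```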

3.6 Optimization of Process Lagging and Leading Indicators [6] Continuous monitoring of lagging and leading indicators leads to the optimization of these factors. To optimize these factors, the project manager takes some corrective actions such as


• Planning of reviews in all SDLC phases such as requirement, design, coding and testing
• Inclusion of domain experts in technical reviews
• Use of checklists during reviews
• Detailed effort estimation and micro-level schedule preparation
• Following the defined processes in the organization
• Providing trainings to the resources at the right time
• Use of automated tools for project management and software development activities
• Detailed impact analysis for change requests
• Adequate manpower allocation.

These actions improve the leading indicators, which further improves the lagging indicators. Figure 3.5 shows the improvement in lagging and leading indicators through monitoring of two project constraints and application of the above-mentioned corrective actions.

Fig. 3.5 Improvement in lagging and leading indicators


3.7 Conclusion
Project leading and lagging indicators facilitate true prediction about the project outcome. The role of the project manager is crucial in the context of identification of relevant lagging and leading indicators based on the different factors. The main focus of the paper is to identify adequate lagging and leading factors, monitor these factors periodically and optimize these factors by taking corrective actions. These lagging and leading indicators help to predict whether the defined quantitative project objective will be met. If the predicted likelihood of meeting the desired objectives is low, then some action may be taken to control the project by improving the process or using some tools. Optimizing the project through leading and lagging indicators also indicates the maturity of the organization. Secondly, the paper also explains the significance of project constraints. The balance between these constraints is essential to achieve a predictable outcome of the project. It is also essential to monitor all factors of a project statistically. This will lead to successful project execution and delivery.

Acknowledgements This work was carried out under the CMMI processes implementation. The authors wish to thank Smt. Priti Razdan, Associate Director, C-DAC Noida, for supporting this work and giving valuable feedback.

References 1. Software Engineering Institute (2010) CMMI® for development, version 1.3. Technical report 2. Margarido IL, Faria JP, Vieira M, Vidal RM (2013) Challenges in implementing CMMI high maturity: lessons learnt and recommendations 3. Venkatesh J, Cherurveettil P, Thenmozhi S, Balasubramanie P (2012) Analyse usage of process performance models to predict customer satisfaction. Int J Comput Appl 47(12) 4. Project Management Institute (2013) A guide to the project management body of knowledge, 5th edn. Project Management Institute Inc. 5. Mishra UK, Sudhan KH, Gupta S (2009) Establishing process performance baselines and models for statistical control of software projects 6. Eckerson WW (2009) Performance management strategies. How to create and deploy effective metrics. TDWI best practices report 2009 7. Fabijan A, Dmitriev P, Olsson HH, Bosch J (2017) The benefits of controlled experimentation at scale. In: SEAA’17, Vienna, Austria, 30 Aug–1 Sept 2017 8. Bellamy LJ, Sol VM (2012) A literature review on safety performance indicators supporting the control of major hazards. National Institute for Public Health and Environment

Amrita Wadhawan is working as Senior Technical Officer in C-DAC Noida. She is a Post Graduate in Physics and M. Tech. (IT). She has 23 years of experience in Academics and Quality Assurance. Currently, she is associated with the Quality Assurance Group. Her areas of interest include Software Quality Management, Software Engineering, Organizational Behavior.

Shalu Gupta is working as Principal Technical Officer in CDAC Noida. She is certified Project Management Professional. She has done Masters in Computer Science and sixteen years of experience in software development. She has worked in the field of NMS, SNMP, Optical comm., DSLAM, OCR and Quality Assurance. She has worked in various companies like C-DoT, Wipro Technology and Flextronics Software Systems. Currently, she is associated with the Quality Assurance Group. She has published ten international and national research papers. Her area of interest includes Software Quality Assurance, Software Metrics, Quality Management and Testing.

Chapter 4

Risk and Safety Analysis of Warships Operational Lifetime Defects Uday Kumar, Ajit Kumar Verma, and Piyush Pratim Das

4.1 Introduction
The shipbuilding industry is as old as mankind itself. However, warship building differs significantly from commercial shipbuilding in very critical aspects [1]. The best practices of commercial shipbuilding processes are of limited use in warship building because the evolutionary nature of warship design often leaves gaps in the freezing of design issues prior to production. Two main issues which drive this gap are the ambition of the owner (often the Ministry of Defense/Navy) to incorporate the latest in technology and the lack of a profit motive/commercial usage of the machinery. The owner in warship building emphasizes technological advantage and availability as opposed to stable technical procedures and commercial exploitation paradigms. Additionally, the quality and quantity of the equipment fit are highly diverse in a warship vis-à-vis a commercial liner, as are the operating principle and the end use of the ship. All this makes building a warship a very different and interesting project which requires a different set of standards and models incorporating the differing risk acceptance and end motives of warship design and operation. Indian warship design and operation is a relatively new area of industry which started with the Leanders built under license but has matured to indigenous design and operation. Since Indian warships operate in mostly tropical waters, the behavior of the ships and equipment is unique to local conditions. Maintenance and downtime

U. Kumar · P. P. Das (B) Luleå University of Technology, Luleå, Sweden e-mail: [email protected] U. Kumar e-mail: [email protected] A. K. Verma Western Norway University of Applied Sciences, Haugesund, Norway e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_4


paradigms are under evolution at every level of design and maintenance to cope with localized conditions. While every aspect of the design process is important, a validation of the design vis-à-vis the operational availability of the systems is a crucial feedback loop that is required to isolate weak areas of design based on empirical data. This loop has been a weak area due to:
1. Lack of sufficient usage data to form a coherent hypothesis (due to the sheer variety and roles of equipment being used on a warship).
2. Change in design and mission requirements of new designs which vary significantly from the ships already in use.
3. Incorporation of advanced state-of-the-art technologies which do not have sufficient maturity due to their very recent invention.
4. Lack of a viable model to practically use the findings from empirical data in the design considerations.

In this paper, we intend to study the failure and availability field data of a class of warships which has a significant level of maturity in design and operation in the Indian Navy, i.e., the Destroyer class. The aim is to derive a methodology of prioritizing those equipment and systems which, despite available design and maintenance optimizations, have a high risk of failure and thus are areas that need further refinement at the design stage. Although individual equipment and systems are well studied and their reliability and risk profiles are well known, the paper attempts to map out the performance in the field and use it as a metric to isolate those high-priority systems which require priority attention.

4.2 Warship Design Cycle

4.2.1 Warship Design Production and Operation Lifecycle
The most commonly followed standard for system lifecycles is ISO/IEC/IEEE 15288:2015, which is a generic LCA tool with Concept, Development, Production, Utilization/Support, and Retirement as the different stages of the lifecycle. As applied to Indian warships, the building and operational lifecycle follows the template below, with minor variations:

Precontract: Concept refinement. The ship's strategic requirement, tactical capabilities, and staff requirements are conceptualized
Design: Contract award to shipyards; basic design; functional design; production design
Construction: Steel cutting; keel laying; block/modular construction; installation of propulsion and power systems; installation of sensors and armaments; sea trials; delivery/commissioning
Operation: Operational deployments; refits/self-maintenance period/assisted maintenance period; midlife upgrades
Decommissioning: Ship formally at long notice, followed by decommissioning and handing over for scrapping/other purpose

The precontract and design phases are the most critical, as the outline and requirements are finalized with ample bandwidth for the inclusion of specific equipment at a later stage to include the latest and the best. The functional and production design phase is where the equipment/systems are deliberated upon and finalized, the groundwork is converted into concrete systems, and the ordering of the equipment/systems by the shipyards takes place. This is the area of interest of our paper.

4.2.2 Warship Design Production and Operation Lifecycle—Risks and Mitigation Regime
The uncertainties faced during the functional design phase:
1. Comparative risk and reliability data unavailability on systems/equipment vis-à-vis maritime usage
2. Benchmarking of cost and effectiveness matrix
3. Inclusion of nascent technologies
4. Future-proofing requirements
5. COTS standards compatibility with warship requirements
6. Lifecycle maintenance and upgrade costs
7. Long-term utility and flexibility of systems
8. Modularity and ease of replacement
9. Vendor reliability and quality
10. Availability of in-house support and ToT for high end systems over the lifecycle of the systems
11. Obsolescence management during and after the production phase.

These uncertainties are a part of the overall risk of shipbuilding and are translated to risks during the operational phase of the ship where these systems are exploited.


Various risk mitigation strategies are in place to deal with the above-mentioned uncertainties, including:
1. Performance and feedback on similar systems fitted on Indian and foreign warships
2. User evaluation trials
3. Feedback from trials and monitoring agencies
4. Feedback on previous versions of technology from OEMs/vendors
5. Parallel commercial usage feedback on emerging technologies
6. Performance of equipment and systems in parallel navies.

Although qualitative and heuristic feedback is easier to obtain, it is subject to individual opinions and agendas, and quantitative data-based feedback is only as good as the data available from the field and the quality of that data. Rarely does the vendor/OEM emphasize the negative characteristics and risks of proprietary systems. With close to a thousand major and minor systems being envisaged on board any modern warship (apart from hull and structural superstructure), the designers are faced with a formidable task of separating the critical systems from the noncritical systems, leading to an inefficient use of resources in the selection of systems with adequate risk profiles. This is a lacuna that can be substantially addressed by a data-driven feedback approach, using data from parallel ship platforms to isolate those systems with high risk, which require attention and are a weak area of design vis-à-vis operational availability.

4.2.3 Warship–Systems Approach
A typical warship is divided into three types of systems:
1. Hull
2. Propulsion and engineering systems
3. Electrical (including sensors and navigation/communication).

All three systems are important to meet the 'Float, Move and Flight or Fight' requirements of the ship, and all three systems are challenging for warship design. Especially since a warship is designed to operate over a 20–25 year lifecycle, the core systems must be designed to be future proof and resilient. However, in terms of technical maturity, the Hull systems tend to be far more mature than the electrical systems. Electrical systems are at the greatest threat of obsolescence and are likely to undergo paradigm shifts even during the design stage. Warships are not designed in isolation [2]. Apart from a few pilot technology demonstrators, most warships are an upgrade (also called a 'follow on') of an existing design with minor hull variations and upgradation of the propulsion and sensor package [3]. This provides us with a convenient benchmark for carrying out a comparative study toward mitigating design risks for a new warship of the same 'Class.' An analysis of operational data collected over a substantial lifecycle of operation of an existing platform is a rich source of knowledge of the on-ground performance of the systems, considering all limitations and conditions that the systems are exploited under. Although the systems envisaged on a new ship may differ in form and function, the overall operational envelope is an iterative extrapolation of the existing systems.

4.3 Case Study

4.3.1 Methodology of Study
For carrying out the study relating operational lifetime defects to warship equipment design, the following rationale was used:
1. Selection of the platform. The study is focused on an already mature design of warships that is in use and is likely to be used as a basis for future extrapolations. A series of available classes of ships was scrutinized and the following classes were critically analyzed:
(a) Destroyer class
(b) Frigate class
(c) Corvettes
(d) Auxiliaries and tankers.

The study revealed that out of all these, only the destroyers and frigates are the ones whose design and usage are based on an extrapolated design. For example, the P 15 class led to the P 15 A class and is further leading to the P 15 B class. The study is particularly suited to such a class, which is based on incremental design improvements and equipment upgrades. Additionally, such ships have matured, and data is available on usage and risk/failure profiles of systems over a large period of operational usage.
2. Available operational data collection and validation of data. Availability of field data is a big and decisive factor in building up any lifecycle study, and the same was collected from the available maintenance database over a 15-year period with almost 4500 data points.
3. Data analysis of the collected material. The inventory and fit definition of a typical warship consists of approximately 900 systems and equipment. Considering a twenty-year period of operational life, the number of defects and repairs effected is close to approximately 15,000 defects. Classification of these systems is broadly into three different categories:
(a) Hull (structural)
(b) Engineering (propulsion and air conditioning)
(c) Electrical (power and sensors).


The relative functional risk categorization of the systems is divided into a hierarchical structure of:
1. Float (high)
2. Move (medium)
3. Fight or flight (low).

These risks are broad but interlinked. For example, failure of an AC system can lead to failure of sensors, which can lead to failure of a communication system. The different failure modes and their interconnecting resultant failures are highly diversified and can lead to a huge number of situations which are too numerous to be of any practical value. The idea was to isolate and rank systems which are high risk (Risk = Frequency × Cost of Consequence). While risk ranking of components has been widely studied [4], systems risk ranking in a warship is unique due to the following reasons:
1. Regimes of operation vary widely
2. Environment of operation is highly corrosive as compared to commercial machinery
3. There is a tendency to overlook certain scheduled maintenance routines due to operational requirements
4. Quite a substantial amount of technology is very new and is yet to stabilize enough to incorporate CBM/effective maintenance procedures, due to a dearth of historical data
5. Long periods of idling followed by intense activity and operations
6. Unavailability of high-skill experts and exact spares at the desired location of operation.

The above-mentioned factors subject the systems of a warship to unwanted and unforeseen stresses that make the systems behave differently than in the commercial domain. To empirically find out the general behavioral patterns of failure, it is critical that we employ the simplest methods and gradually work toward a more complex approach which closely matches the real-world problem scenario. Out of all available risk ranking methods, Machinery Risk Analysis (MRA), developed under the Inspection Capabilities for Enhanced Ship Safety (INCASS) FP7 EU-funded research project, which aims to tackle the issue of ship inspection and identification of high-risk ships and is elaborated by [5, 6], was found to be most correlated with the problem of risk ranking we face in warship system classification. The following methodology of MRA was used to rank the systems:
1. Consideration of ship's main systems/sub-systems/components
2. Consideration of failure types
3. Input consideration of failure rates per component and failure cause
4. Output representation to the user of PoF (%) per component/sub-system/main system.


Followed by:
1. Re-validation with experts
2. Mapping out the vulnerabilities and lacunae existing
3. Conclusions and feedback from the data to be used for risk ranking of systems during design phase.
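As a concrete illustration of the MRA input/output step (failure rates in, PoF per component and sub-system out), the sketch below is ours; the exponential life model and the numbers are assumptions for illustration, not INCASS data.

```python
import math

def pof(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of failure over a mission, assuming an exponential life model."""
    return 1.0 - math.exp(-failure_rate_per_hour * mission_hours)

def subsystem_pof(component_rates, mission_hours):
    """Series roll-up: the sub-system fails if any component fails (independence assumed)."""
    survival = 1.0
    for rate in component_rates:
        survival *= 1.0 - pof(rate, mission_hours)
    return 1.0 - survival

# Hypothetical pump train: pump, motor and controller failure rates (per hour)
rates = [2.0e-5, 1.5e-5, 3.0e-5]
print([round(pof(r, 720) * 100, 2) for r in rates])    # per-component PoF (%) over a 30-day patrol
print(round(subsystem_pof(rates, 720) * 100, 2))       # sub-system PoF (%)
```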

4.3.2 Risk Calculation
Risk is classically defined [7, 8] by answering the questions:
• What can go wrong?
• How likely is it?
• What are the consequences?

Out of approximately 62 risk methodologies available, the most widely used is probabilistic risk assessment (PRA), which defines risk essentially as:

Risk = failure probability × cost of damage (consequences) related to the failure

Probability of failure. In this study, failure is defined as functional unavailability of the equipment. This is especially true for a warship, as critical equipment is required to be functioning at optimal efficiency throughout its mission times. Most, if not all, equipment has inbuilt redundancies to cater to any internal failures. Downtime of the equipment is a fair measure of failure, and the same has been used to derive the probability of failure from the available data.

Cost of damage (consequences) related to the failure. The stated mission of a warship is to float, move and fight. The cost of a failure affects these three mission objectives. While the ranking and weightage of the relative importance of these three are not rigorously defined, a simple measure of 1, 2, 3 has been used to rank the 'consequences' of failure of a particular system/equipment.

4.3.3 The Analysis
The study was carried out with a database of 4500 defects of 600 separate systems/equipment which are critical to the float, move, fight capability of the ship. The 'cost' of a defect was based on two factors:
1. Downtime-based probability of failure (PoF)
2. Cost ranking (Float = 3, Move = 2, Fight = 1).

Thus, the overall rank of an equipment/system was calculated as:

Overall Risk Ranking = Downtime (PoF) × Cost Ranking

Ranking was carried out to establish a Pareto chart to map out the equipment which contributes to 80% of the cost of the ship's operation. The financial 'cost' was not considered significant, as the only 'cost' a warship is concerned with is 'availability at any cost.' Commercial considerations, unlike in civilian operational shipping, are not a significant factor while a warship is in action.
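The ranking rule above is simple enough to reproduce directly. The sketch below (ours, with made-up PoF values, not the study's database) scores each system as downtime-based PoF × cost rank and keeps the top contributors up to the 80% Pareto cut-off used in the study.

```python
# Each record: (system name, downtime-based PoF in [0, 1], cost rank: Float = 3, Move = 2, Fight = 1)
DEFECT_RECORDS = [
    ("Firemain system", 0.35, 3),
    ("Sanitary system", 0.30, 3),
    ("Switchboard distribution", 0.28, 3),
    ("Chilled water plant", 0.22, 2),
    ("Navigation radar", 0.15, 1),
    ("Gun fire control", 0.10, 1),
]

def pareto_top_contributors(records, cutoff=0.80):
    """Rank systems by PoF x cost and return those making up the first `cutoff` share of total risk."""
    scored = sorted(((name, pof * cost) for name, pof, cost in records),
                    key=lambda item: item[1], reverse=True)
    total = sum(score for _, score in scored)
    selected, running = [], 0.0
    for name, score in scored:
        if running >= cutoff * total:
            break
        selected.append((name, round(score, 3)))
        running += score
    return selected

print(pareto_top_contributors(DEFECT_RECORDS))
```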

4.4 The Conclusion
Out of 600 equipment and systems, 120 contributed to approximately 80% of the 'cost.' The study was aimed at finding out those critical equipment/systems which require attention at the design/operation stage in order to allocate resources. The study localized the firemain system, sanitary system, and switchboard distribution systems as the most critical systems which contribute to the main cost of a warship in operation.

4.5 Way Ahead
The study only outlines the highly critical systems for further study and their effect during the design process to limit the 'cost' over the lifecycle of the warship. The following way ahead is planned:
1. Re-validation with experts
2. Mapping out the vulnerabilities and lacunae existing
3. Conclusions and feedback from the data to be used for risk ranking of systems during design phase.

References 1. Shanks AJ (2008) Technology management in warship acquisition. In: INEC, Apr 2008, pp 1–11 2. Raper RG (1970) Designing warships for a cost-effective life, vol 185. Parsons Memorial Lecture 3. Xie L, Wei RX, Jiang TJ, Zhang P (2009) Generalized PLS regression forecast modeling of warship equipment maintenance cost. In: 16th annual conference proceedings, 2009 international conference on management science and engineering, ICMSE 2009, pp 607–612 4. Tixier J, Dusserre G, Salvi O, Gaston D (2002) Review of sixty-two risk analysis methodologies of industrial plants. J Loss Prev Process Ind 15(4):291–303 5. Dikis K, Lazakis I, Taheri A, Theotokatos G (2015) Risk and reliability analysis tool development for ship machinery maintenance. In: 5th international symposium on ship operations, management and economics, Athens, Greece, 28–29 May 2015, pp 1–10 6. Baliwangi L, Arima H, Ishida K, Artana KB (2006) Optimizing ship machinery maintenance scheduling through risk analysis and life cycle cost analysis. In: 25th international conference on offshore mechanics and arctic engineering, OMAE 2006 7. Tronskar JP, Franklin RB, Zhang L (2004) Application of risk and reliability methods for developing equipment maintenance strategies. In: 5th annual plant reliability & maintenance conference, Nov 2004 8. Verma AK, Srividya A, Karanki DR (2010) Basic reliability mathematics. Reliab Saf Eng 15–70

Chapter 5

Facebook Data Breach: A Systematic Review of Its Consequences on Consumers’ Behaviour Towards Advertising Emmanuel Elioth Lulandala

5.1 Introduction The advancement in digitalisation and computerisation of information has enabled marketers to collect massive consumers’ personal data [1–3]. As a result, marketers hold unprecedented large amount of information than any other time in history, and consumers are increasingly losing control on their information, which heightens the risk of compromise in consumers’ privacy. Researchers have revealed that many companies compromise privacy by using, selling or sharing consumers’ data with other third-part companies [4, 5, 6]. For instance, in 2018, data of 562,455 Indian Facebook users was breached [7]. Such collection, processing, use and even sharing of users’ information with third parties without prior consent are clear breach of privacy [8, 9, 10]. Privacy breach is manifested in many ways including cyber crimes, identity theft and criminal targeting of users [11]. Consumers’ privacy concerns are exacerbated by data breach scandals from e-commerce companies to social networking sites (SNSs) like Facebook [7, 5]. Consequently, protection of consumers’ privacy has become a global concern of governments and businesses. In addressing it, European Union (EU) parliament passed EU General Data Protection Regulations (EU GDPR) to protect EU citizens’ personal data. EU GDPR requires consent of the consumers for data processing, anonymisation of data collection, notification of data breach, hiring GDPR compliance officer and ensures safety of cross-border transfer of data [2]. Likewise, the government of India not only has ruled privacy as a fundamental right but also proposed data protection framework which is based on best practices from EU, UK, Canada and USA [12]. Both EU GDPR and India draft of Personal Data Protection Bill 2018 propose huge fines and jail terms for privacy violations. Also following the most recent Facebook data breach, the USA contemplates stricter regulations to protect personal data in SNS [13]. On the other hand, marketers have E. E. Lulandala (B) Department of Commerce, Delhi School of Economics, University of Delhi, New Delhi, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_5




been designing privacy tools and improving transparency of privacy policies, in order to give consumers control over their information [4, 14]. It is important to realise that collection of personal information enables marketers to personalise products and services and improve consumers purchase experience through targeting and retargeting of ads [15, 16]. However, some companies share such consumer information with third parties which make it more vulnerable to data breaches [17, 10]. Data breach vulnerability is the potential for misuse of information by firm’s rivals or other third parties [16]. In this case, information is accessed and used by third parties without users’ consent, awareness or control of flow of information. In addition, due to increased globalisation and digitisation of information, data breach incidences have been increasing. A cyber security breaches survey revealed that about 50% of UK businesses [18] and 85% of midsized companies in the USA experienced breach or attack [17]. The most recent data breach incidence which has drawn author’s interest is with the SNS giant, Facebook. Facebook is strategically important in marketing communication due to the fact that it is the market leader in social media advertising with 1.4 billion daily active users and 2.13 billion monthly users; it also achieved significant growth in its revenue to $40.65 billion as of end of December 2017 and accounting for 60% of SNS advertising revenue [19, 20]. In fact, Facebook is an important player in SNS advertising, and hence, the study of Facebook data breach is warranted on both theoretical and empirical basis.

5.1.1 Applications as a Tool for Data Breach

The use of applications, commonly known as "apps", is on the rise globally, and they are now easily accessed through SNSs [21]. Facebook has more than 550,000 applications, which are used by 70% of users to play games, chat and share interests [22, 23]. Although applications are allowed in order to enhance users' experience, they have become a tool for data breach. Studies show that the potential for data breach is high when a firm allows the use of applications developed by other companies on its Website or platform [24, 21]. Similarly, the Wall Street Journal reported in 2010 that most applications on Facebook transmit identifiable information (ID) to others. The Facebook ID is a unique number assigned to every user, and it enables app developers to access users' names, profile details and friend lists regardless of their privacy settings [23]. Applications have been transmitting IDs to advertising agents and data tracking firms, which link their Internet databases with data extracted from Facebook in order to track customers' activities online. In addition, [22] found that the Facebook ID is also transmitted when consumers click on ads. This happens despite Facebook's policy prohibiting the transmission of users' information [25]. Inadequate controls and the lack of close monitoring of the information accessed and shared by applications led to the largest data breach ever in an SNS, the Facebook–Cambridge Analytica scandal.

5.1.2 Facebook–Cambridge Analytica Privacy Breach

The Cambridge Analytica data mining project started in 2014 after the firm entered into a commercial data-sharing agreement with Global Science Research (GSR), owned by Cambridge University researcher Aleksandr Kogan [24]. Facebook data was mined through a personality test application known as "this is your digital life", developed by Kogan for research. Facebook users were paid up to $5 to take the test and gave consent to share information for academic purposes. However, due to Facebook's technological design, personal data was collected not only from survey participants but also from all profiles in their friend lists. Consequently, the application collected personal information from 87 million Facebook profiles around the globe [26, 24]. Of these 87 million profiles, 70 million were from the USA, more than 1 million each from the Philippines, the UK and Indonesia, 310,000 from Australia and 562,455 from India [7, 22]. The leaked personal data included names, gender, dates of birth, age, posts, likes, statuses, location, photos, relationship status and friend lists. The collected data was later matched with the personality test results in order to model the users [26, 24]. Moreover, the leakage did not stop even after uninstalling the application because, according to cyber security experts, it could only be stopped by deleting the cookies on the device used to access Facebook [27]. The collected information was used to develop a software model that could predict and influence US Facebook users during the 2016 general election. Facebook users were targeted with ads related to issues important to them, with the aim of influencing their political views.

5.1.3 Consequences of Privacy Failure

Privacy failure has consequences for users' trust, thus affecting the effectiveness of marketing communication. As Greg Walden (Chairman of the US Congress Committee on Energy and Commerce) described during the congressional hearing of Facebook co-founder Mark Zuckerberg, "Users trust Facebook with great deal of information about their lives based on the belief they can easily navigate and control private settings and trust that their personal information is in good hands. If the company fails to keep its promises on how personal data will be in use, that's a breach of trust, must have consequences" [28]. The safety of personal data is critical in maintaining trust and developing a positive response from users. In order to protect trust, firms have been hiding privacy breach information; for instance, Facebook knew about the data breach in 2015 but failed to notify its users until a whistleblower exposed it in 2018 [13]. As a result, after the information became public, Facebook lost $130 billion in market value and some key advertising clients. In order to understand the effects of privacy breach on consumer behaviour towards advertising on Facebook, the next section highlights prior research on privacy.

5.1.4 Research Problem and Objectives

Information systems and marketing researchers have investigated the consequences of privacy concerns on purchase intention and buying behaviour [29, 30, 31, 32, 33], while a few researchers have specifically investigated how data breach incidents influence the market value and reputation of the firm [17, 16]. Research based on the event study approach has shown that data breach leads to a significant depreciation of a firm's market value in both the long term and the short term [34, 17, 16]. It has also been argued that, due to negative publicity and potential backlash from customers, marketers should treat breaches as service failures rather than as breakdowns in information systems [17]. Facebook's failure to notify its users about the breach indicates that it treats data breach as a failure in the information system rather than as a service failure. In a study grounded in gossip theory, [16] associated the information vulnerability resulting from privacy failure with negative performance effects, as it evokes feelings of violation and betrayal in consumers. Another study showed that data breach risk is contextual, depending on the industry, and that the risk is high in information technology investments [21]; Facebook is in the IT industry and is therefore associated with high risk. Moreover, privacy scholarship has dedicated effort to exploring other areas linked to personalised advertising [33], consumer protection [35], purchasing behaviour [31, 36], legal and ethical issues [37] and SNSs [38]. There is a conspicuous inadequacy of studies on the effects of data breach on marketing communication; thus, research on data breach in the marketing context is necessary. Despite what is known about data breach and its consequences, it is not yet clear how it influences consumers' response to marketing communication in SNSs [39, 16, 40]. The extant privacy literature has focused mainly on the effects of data breach on the market value and reputation of firms [34, 17]. Despite the call of scholars like [17] to study the impact of privacy failure on marketing communication, there is a conspicuous inadequacy of literature addressing the effects of data breach on consumers' response to advertisements in SNSs. In this regard, the intriguing question that remains unanswered is how perceived data breach affects consumer behaviour towards ads on Facebook. More specifically, answers are sought for these questions:
• What is the relationship between perceived data breach and ad acceptance?
• How does perceived data breach influence ad engagement?
• How is perceived data breach associated with ad avoidance?
In light of these questions, the objectives of this paper are twofold: first, to analyse privacy with respect to Facebook advertising and, second, to investigate the consequences of data breach on consumers' behaviour towards Facebook ads. The contribution to knowledge of this paper is the proposed model of the impact of perceived data privacy breach on consumer ad behaviour on Facebook. The remaining part of this paper is organised as follows: section two presents the methodology, and section three focuses on the literature review and theoretical development, in which constructs and concepts are discussed. In section four, the research model and hypotheses for the key questions of the study are formulated, followed by section
five, discussion and implications. The conclusion, limitations and scope for future study are discussed in Sect. 5.6. Acknowledgements are given in the last section, followed by the reference list of this paper.

5.2 Methodology

In order to capture what the existing literature informs us about the focal questions of this study, the author reviewed about 84 papers over two months, July and August 2018. Broadly, the literature came from a range of sources, including journals of marketing, advertising, information systems, information technology and management, service research, computers in human behaviour and applied social psychology, as well as online newspaper databases. Research articles were obtained through online searches in Google Scholar, ProQuest and ResearchGate. The online search was conducted in the first week and also concurrently while reading papers, by following their reference lists. The search started by breaking down the focal questions of the study into specific search words. The key search words included privacy, privacy failure, privacy concerns, SNS advertising, Facebook, data breach and consumer behaviour. The initial search was broad and produced more than 5000 results in Google Scholar; however, the majority of papers were from other disciplines, i.e. information systems, finance and law. The search was further narrowed by targeting papers related to marketing. Marketing search results were mainly from e-commerce. An attempt was made to specifically search for papers related to privacy in SNSs; very few were obtained, and those few focused on the impact of privacy concerns on information disclosure. To get wider insights, the search was broadened to include e-commerce privacy-related studies. The majority of e-commerce papers addressed privacy and information disclosure, purchase intention and buying behaviour. Furthermore, the selection of papers was based on the screening criteria that an article be related to the key research questions, peer reviewed, less than 10 years old and conducted in an e-commerce Website or SNS context. Out of 131 searched papers, 84 were found useful and 47 were rejected for failing to address the research questions in an online advertising context. In addition, the majority (82) of accepted articles were empirical papers, and a few (2) were meta-analysis review articles and reports. A thorough reading of at least two papers per day was done for one month. To keep on track, notes were taken during reading and organised in a matrix developed in MS Excel. At least five relevant quotes were gathered for each paper. By using the filter function in MS Excel, papers were categorised on the basis of topics covered, and topical themes were created. The themes included privacy perspectives, data breach, informational privacy, privacy concerns, trust and theories. These are discussed in Sect. 5.3, literature review and theoretical development. Eventually, the research model and hypotheses were developed in Sect. 5.4 based on the reviewed papers. All articles were lawfully obtained through the Delhi School of Economics' e-Library access and have been cited accordingly.
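
For illustration only, the screening and categorisation step just described could also be reproduced programmatically rather than in an MS Excel matrix. The short sketch below is an assumption-laden example, not the author's actual workflow: it presumes the 131 candidate papers were logged in a CSV inventory with hypothetical columns (title, year, peer_reviewed, context, theme, relevant), applies the same four inclusion criteria and counts the accepted papers per topical theme.

```python
# A minimal sketch (an assumption, not the author's actual MS Excel workflow):
# applying the stated screening criteria to a hypothetical CSV inventory of the
# 131 candidate papers and counting accepted papers per topical theme.
import pandas as pd

def screen_papers(csv_path: str) -> pd.Series:
    """Apply the inclusion criteria, then count accepted papers per theme."""
    # Hypothetical columns: title, year, peer_reviewed, context, theme, relevant
    papers = pd.read_csv(csv_path)
    review_year = 2018  # the review was carried out in July and August 2018
    accepted = papers[
        papers["relevant"]                                # related to the key research questions
        & papers["peer_reviewed"]                         # peer-reviewed outlets only
        & (review_year - papers["year"] < 10)             # less than 10 years old
        & papers["context"].isin(["e-commerce", "SNS"])   # e-commerce Website or SNS context
    ]
    return accepted.groupby("theme").size().sort_values(ascending=False)

if __name__ == "__main__":
    print(screen_papers("papers_inventory.csv"))  # hypothetical file name
```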

5.3 Literature Review and Theoretical Development

5.3.1 The Concept of Privacy

Privacy is an amorphous and multidisciplinary concept [41]. Scholars have studied it for more than 100 years, yet they have not reached a universal articulation, which has resulted in discipline-specific definitions [11, 37, 42, 16, 3], as discussed below.

5.3.1.1 Legal and Psychological Perspectives

One of the early legal contributions was given in 1890 by Warren and Brandeis, who defined privacy as the "right to be let alone". This right is both legal and moral [43]. The legality and morality of privacy are currently the driving force behind strict data privacy regulations and laws to protect people from the misuse of online personal data. Contrary to the legal perspective, in psychology privacy is a state of mind, an emotion, a feeling or a desire to be alone [42, 44]. Therefore, privacy is intrinsic in nature, and it is not sensible for someone else to claim that a person's privacy has been violated. In 1967, the psychologist Westin theorised that privacy is a short-lived decision of an individual to withdraw from society by choosing to be in any of four states, i.e. solitude, intimacy, reserve or anonymity. Psychological perspectives on privacy have been instrumental in understanding the privacy concerns, trust, intrusion and emotional violation of SNS users with respect to privacy breaches and how these impact consumers' buying behaviour [45, 15].

5.3.1.2 Economic and Information Systems Perspectives

Unlike psychologists and lawyers, economists view privacy as a resource that needs to be managed to ensure market efficiency [46]. In this sense, privacy is a property, a value that can be used to earn supernormal profits in the marketplace. In information economics, privacy calculus theory proposes that individuals are rational and therefore compare the costs and benefits of privacy [4]. This implies that individuals forgo some privacy as long as doing so is beneficial. This view is highly relevant in the SNS context; researchers have revealed that users are willing to disclose information when there are incentives such as online discounts, offers and bonuses [4, 41]. They are willing to trade off privacy for economic benefits. Information systems scholars associate the concept of privacy with the control of information. As [47] emphasised, privacy is attained only when others' access to information is limited. On the same note, in 1975, Altman noted that privacy is achieved when there is discriminant control of access to personal information, meaning that individuals have the power to control who can access their information according to their preferences. Privacy
has also been defined in terms of restrictions on the flow of information in a particular context [48]. In this case, privacy is context-specific; there is no question of whether the information, by its nature, is private or public, as it entirely depends on a person's control of the flow of that information in different contexts. The economic and information systems perspectives supplement each other: privacy is a resource because of discriminant control of the flow of information, which creates scarcity of information, making it a valuable resource.

5.3.1.3 Sociological and Marketing Perspectives

On the other hand, sociologists define privacy as the collection and use of information in the context of power and influence among individuals, groups and society [46]. The philosophers Fried and Rachels (as cited in [37]) view privacy as the foundation for stable relationships: privacy nourishes intimacy and trust, enabling people to enjoy a diversity of relationships. This view highlights not only social relationships but also the need to control who can access one's information. In the context of SNSs, trust is critical in facilitating self-disclosure [37]; therefore, more trust in an SNS leads to more disclosure of information. Moreover, in the marketing context, privacy is defined in terms of the access, use and dissemination of consumer information for marketing purposes [49]. In this case, users decide what information is accessed by whom, shared with whom and to what extent; therefore, a breach of consumer privacy depends on two key issues: first, whether consumers can control the access, use and dissemination of information and, second, whether they are aware of it [50]. In the online context, marketers collect a huge amount of information about their consumers, i.e. demographics, shopping details, preferences and tastes, and even very private information. Any secondary use of such details without prior consent is a violation of privacy and highly objectionable to consumers [3, 51]. Advancements in technology have empowered marketers to collect, use and disseminate information for marketing purposes without users' consent; this reflects an unbalanced power relationship between marketers and consumers.

5.3.1.4 Operationalisation of the Concept of Privacy

The concept is still fuzzy and amorphous, since it remains unclear whether privacy is a right, a feeling, a state of mind, a relationship, a property, information control or access control. Despite its complexity, researchers are not precluded from studying it [48]. Therefore, we have adopted the psychological, information systems and marketing perspectives as applied by many researchers [5, 11, 37, 42, 44, 45, 49, 50]. Privacy is thus conceptualised as a state of mind in which SNS users have awareness of, and exert control over, the access, use and dissemination of information shared in SNSs for marketing purposes. To develop a model of how breach of privacy affects consumer behaviour, the next section discusses data breach, social contract theory and gossip theory.

5.3.2 Theoretical Perspectives of Data Breach

5.3.2.1 Data Breach

Data breach, also known as privacy failure, refers to a compromise of data security that leads to the unauthorised disclosure, access, transfer, destruction, copying or viewing of protected information by untrusted third parties [16, 52, 53, 46]. It takes different forms, from the physical loss of digital devices to more complex hacking or malware attacks on computer systems. Data breach has recently gained the attention of marketing scholars in different areas: its impact on reputation and share value [52], customer loyalty [54], consumer attitudes [55], service failures and firms' performance [17]. Generally, researchers have largely omitted consumer implications from their frameworks for studying data breach [16]. To address this gap in the privacy literature, the current paper focuses on the impact of data breach on users' behaviour towards SNS ads, particularly on Facebook. To achieve this, insights are drawn from social contract theory in the next part.

5.3.2.2 Social Contract Theory

Social contract theory traces back to the Greek philosopher Socrates and was later developed by Thomas Hobbes, John Locke and Jean-Jacques Rousseau in a political and societal context. The theory explains that the relationship between a person and society is based on agreed principles or laws that bind the society together and ensure its existence, and that people have a moral and political obligation to obey them. According to Socrates, the agreement is implicit, and it depends on people's choice [56, 17]. The theory implies that a person has the liberty either to leave or to stay in the society. Remaining within the society means agreeing with its laws and accepting punishment in case of their violation. This theory has been applied to study consumer behaviour in the online context by many scholars [56, 17, 31, 57]. In SNSs, both the platforms and the users enter into a virtual agreement with implicit and explicit terms and conditions. In Socrates' view, both users and SNS platforms have obligations to comply with. Users enter into a virtual contract with Facebook by voluntarily registering and creating a personal profile, and they are entitled to receive communication services in exchange for personal information [58]. Facebook has the moral and legal obligation to protect users' information and provide services to the users. Any breach of users' information is a violation of this psychological agreement, which is consequential in terms of eroding users' trust and their behaviour towards the SNS [59, 17]. Furthermore, scholars affirm that users respond to violations of virtual contracts in three ways: cognitively, by losing trust in future transactions; emotionally, by feeling hurt and violated; and behaviourally, by reducing their willingness to buy, spreading negative word of mouth and generally avoiding the service provider [60, 17, 61]. Again, drawing insights from Socrates' social contract theory, users have the option to withdraw their membership of these SNSs. A survey conducted
by Ponemon indicates that 31% of online consumers discontinued their relationship with the breached company. However, [14] argued that due to long experience in SNSs, users are reluctant to leave since they are not willing to lose their online network of friends and communities [31]. He reported that users worry that clicking ads online makes personal information vulnerable and therefore hesitate to accept ads or sometimes avoid them. This implies that breach of information also affects the engagement of SNS users with ads and increases the perception of vulnerability.

5.3.2.3 Gossip Theory

Gossip theory has been extensively applied to explain human psychological and behavioural responses when faced with vulnerability [16]. Gossip is defined as the unwarranted evaluative communication or transfer of information about an absent, vulnerable third party [62, 63]. Gossip is common in society, and about 67% of all communication in society is based on gossip topics [64]. People are experts in gossip, know its impact and often avoid becoming a gossip target. Individuals react with a series of emotional and behavioural responses when they learn that they are the target of gossip. The emotional responses include a feeling of betrayal, negative affect and violation [65]; individuals feel that their right to privacy has been violated. This results in low trust and heightened privacy concerns [16]. Thus, in the SNS context, a data breach is the gossip because it involves the unpermitted transfer of information. The theory implies that data breach results in strong emotional violation, which subsequently affects trust and consumer behaviour. Emotional violation is the negative affect that people experience as a result of betrayal, breach of trust or the violation of their rights [16]. Furthermore, gossip theory identifies transparency and control as factors that reduce the negative influence of gossip [65]. Transparency means that the gossip target is fully aware of the details of the information being transmitted and the potential harm that is likely to occur; in this way, the target can develop means of self-protection. Reflecting on the Facebook data breach, Facebook was required to notify its users about the breach in order to reduce emotional violation. Control is the degree to which the target controls the flow of information; in a gossip context, the target has little control, and it is this perceived lack of control that aggravates the negative effects of gossip [16]. In Facebook's case, once the breach has taken place, users can no longer control the flow of information, and their emotional violation increases.

5.3.3 Social Networking Sites Advertising and Informational Privacy

Privacy issues have recently attracted the interest of researchers in SNS advertising. An online survey conducted among SNS users in the USA found that personalised ad messages influence the effectiveness of ads and reduce the likelihood of ad avoidance; however, the study indicated that highly personalised ads raise privacy concern
among users and ultimately increase ad avoidance [29]. This implies that relevant ads that attract users' attention are less likely to be avoided and more likely to persuade users to spread the ad and engage with it. The author also found that privacy concern not only plays a mediating role between perceived ad relevance and ad avoidance but also positively influences ad avoidance. This is consistent with other scholars [8, 37, 38, 1, 32], who have indicated that online users worry that advertisers collect their personal information and use it for marketing purposes without their consent; some users resolve not to click on ads and simply ignore them because they have no alternative. Reference [32] studied privacy awareness among university Facebook users and revealed that Facebook users in South Africa were not much aware of privacy tools. Users trust Facebook as an honest platform and share sensitive personal information without recognising the risk of misuse. Regarding privacy settings, they found that users' information is publicly available and can be accessed and misused easily because privacy tools are not used. Thus, it is difficult to achieve full information privacy. The concept of information privacy was discussed long before the emergence of today's information technologies. In 1986, Mason predicted that the increased usage of information technologies would cause major problems related to information privacy, accuracy, property and accessibility [11]. Information privacy refers to the claim of individuals, groups or organisations to decide who, when, how and to what extent information about them can be communicated to others, or the ability to control information about oneself [11, 47]. In the online context, [1], drawing insights from social contract theory, established that collection, control and awareness are the most important dimensions of informational privacy. According to social contract theory, micro-social contract norms must be consented to by well-informed parties and justified by the rights of exit and voice [1, 31, 56]. In the informational privacy context, collection is perceived as fair only when the consumer has control over, and is aware of, the intended purpose of collecting personal information. This means that users feel that the potential privacy risk is high when someone collects and uses information without their consent and awareness [35]. The theory is also based on the principle of procedural justice, which emphasises control. The principle suggests that procedures are perceived as fair only when one can exercise control over them; this is particularly important in SNSs, in which users assume high risk by sharing personally identifiable information. Despite its importance in building trust, privacy awareness remains a hurdle in the majority of SNSs. The extent to which users are informed about privacy practices in SNSs like Facebook is referred to as privacy awareness [1]. Moreover, [32] found that awareness is key as far as privacy is concerned. Privacy concerns increase as consumers become aware of marketers' tracking of information without their consent [66]. In another study on the dimensions of privacy concern, [3] suggested that control, awareness and the usage of information for purposes other than those originally intended are the underlying dimensions of users' privacy concerns.
Similarly, when investigating the privacy controversy associated with the Facebook news feed format introduced in September 2006, [67] established that users' privacy concerns increase due to perceived loss of control and compromised information access. The Facebook news feed format culls
new information from users' profiles and broadcasts it to the network of friends in the form of news headlines on the initial pages. Therefore, information is more accessible than before. The product received a massive backlash from users, as it was perceived as compromising users' control over, and access to, personal information. Reference [67] also noted that about 55% of users were less willing to disclose personal information, and [41] have likewise shown that awareness concerns have a significant relationship with self-disclosure. Users who are more aware of privacy are less likely to disclose sensitive personal information on SNSs. Therefore, we argue that privacy awareness plays a moderating role in the relationship between data breach and users' online behaviour. Privacy awareness is critical in shaping the psychology of consumers in terms of privacy concerns and trust.

5.3.4 Privacy Concerns and Online Consumer Behaviour

Concern refers to anxiety or worry [68, 47]. In the information context, privacy concern refers to individuals' worry or subjective opinion about the fairness of information practices [50]. The worry stems from the fact that marketers collect a great deal of information online (from surfing behaviour to credit cards to SNSs), which can potentially be misused. Industry and government studies in the USA indicated that privacy concern is a barrier to the growth of e-marketing [58, 36]. A survey conducted by Miyazaki and Fernandez [36] revealed that the security of personal information, financial information and online fraudulent behaviour predict online consumer behaviour and perceived risks. Furthermore, contradictory findings exist with respect to the role of experience in privacy concern: some studies indicate that privacy concerns are very high for consumers with longer online experience, while other studies report the opposite [36]. The contradiction calls for further studies; however, it is clear that experience moderates the influence of privacy concerns on behaviour. The implication is that experience plays a significant role in determining Facebook users' behaviour towards ads. Retargeting of ads is another cause of privacy concern among online users. Ad retargeting is defined as exposing consumers to ads whose content they had previously searched for online [69]. Despite its benefits, such as matching users' goals and interests, increasing ad effectiveness by delivering the right message at the right time to the right person, and fostering positive attitudes and high purchase intention, privacy is compromised [70, 71]. As a result, users perceive retargeting as a privacy invasion. In an experimental study, [40] showed that scepticism towards retargeted ads on Facebook increases for adolescents with high privacy concerns, which in turn lowers their purchase intention and increases ad avoidance. Their study was based on reactance theory, which explains that individuals desire freedom and autonomy in making choices and therefore react whenever they feel that their freedom to think and act as they choose is compromised [40]. As advertisers track users' information in SNSs without users' consent, retargeting of ads can be perceived as a threat to autonomy and freedom, leading users to retaliate by avoiding ads. In addition, socio-demographic factors are also significant determinants of privacy concerns. In a study conducted in European member states to understand perceived Internet
privacy concerns, [45] found that age, gender and level of education significantly determine users' privacy concerns. They noted that young and old users worry less about privacy, due to a lack of awareness of privacy protection techniques as well as an inadequate understanding of SNSs. Education was found to positively influence perceived privacy concerns. This implies that socio-demographic factors also moderate how users' perceived privacy concerns determine their behaviour towards ads on Facebook.

5.3.5 Trust and Online Consumer Behaviour

Trust beliefs refer to the extent to which users maintain that marketers are dependable in upholding fair information practices with respect to personal data safety [72, 1]. Users' trust in online vendors and advertisers affects their privacy concerns and behaviour. The role of trust was investigated by Chellappa and Sin [15] in their study on personalisation versus privacy. They defined personalisation as tailoring users' buying experience with their personal and preference information. Ad retargeting, as studied by Cho and Cheon [69], is a form of personalisation. Marketers collect personal and preference information of users from SNSs and combine it with other offline databases to provide personalisation benefits, i.e. the convenient consumption of personalised services. Personalisation depends not only on the information collection and processing capabilities of the firm but also on consumers' willingness to share personal and preference information. Trust plays a critical role in determining willingness to share information and use personalised services [15, 73]. Online users are sceptical of the advertising industry due to the potential risk of information misuse [40]. The trust-risk model maintains that in potentially risky contexts, trust directs users' behaviour [74]. The presence of online trust-building factors such as a simple and clear privacy policy, privacy tools and transparency about the collection and use of information gives users confidence that their information is safe and that fair information practices are upheld [75]. Among other services, advertising is the major revenue-generating activity of SNSs like Facebook. Facebook has been using users' information to target and retarget ads and improve users' online buying experience. It can be argued that users' trust in Facebook and advertisers influences both their privacy concerns and their subsequent behaviour towards Facebook ads.

5.3.6 Consumers' Online Data Protection

The increasing online fraudulent appropriation of personal and financial information is detrimental to consumer behaviour. Some of the threats faced by online users include device hacking, spyware that tracks online behaviour and the placement of cookies [76]. Research in this area has focused on information privacy protection in the e-commerce context as opposed to SNSs. Reference [35] studied the protection of consumers against online privacy violations and theft; they claimed that companies, employees and external thieves
are involved in compromising data safety. The threat to data security is high when data is stored electronically online. Social contract theory provides for a reciprocal arrangement between participants, each with expectations to be met [56, 17]. In SNSs, users provide personal information in return for improved SNS services. Users choose to use SNSs because they believe that the benefits outweigh the risks and costs of providing information. However, when there is an apparent risk of the information being misused, they tend to protect themselves. According to protection motivation theory, people tend to protect themselves when they perceive that a risk is likely to occur and is severe, or when protective behaviour will reduce the risk [3]. On e-commerce Websites, consumers protect themselves by checking the safety of online forms, browsing anonymously, reading privacy policies and rejecting or deleting cookies. In addition, they refuse to share personal information online, avoid buying online, and some ask companies not to share their information with third parties [57, 35]. Research has confirmed that consumers respond to these privacy threats by adopting protective behaviours such as fabrication (falsifying information, misrepresentation), protection through privacy tools, and withholding, by refusing to purchase or register or by seeking advice from others [50, 30, 57, 40]. Within the social networking environment, users tend to untag, delete comments, ignore and sometimes block ads, or unregister from the networking Website [30]. According to the power-responsibility equilibrium (PRE) framework, governments and powerful marketers have the responsibility to protect consumers' privacy through their policies and regulations, failing which a retaliatory response from individual customers is expected [30, 40]. In line with the PRE framework, users are expected either to protect themselves by avoiding ads or to accept and engage with ads, depending on their information sensitivity. Reference [14] conducted an experimental study based on the information processing theory of motivation to understand how online information privacy concerns are overcome. This theory is based on the premise that people form expectations through information processing in terms of behaviour and outcomes. Mitigation of privacy concerns is associated with positive valence, which leads to a higher motivational score. In addition, financial incentives and convenience were found to significantly increase people's motivation to register on Websites. Based on this study, individuals protect their privacy only after taking into account the outcomes of such actions. Consistent with privacy calculus theory, individuals are ready to trade off privacy for other benefits. Also, individuals tend to behave inconsistently with their privacy concerns with respect to information disclosure in online synchronous social interactions [50]. In the SNS context, even though privacy concerns may negatively affect behaviour towards ads, users consider the consequences of their actions, implying that some will still accept and engage with ads while others will avoid them, depending on incentives, motives, convenience and experience. On the basis of this discussion, a research model and its hypotheses are developed in the next section.

5.4 Research Model and Hypotheses

The conceptual model addressing the influence of perceived data breach on consumer behaviour towards ads on Facebook is presented in Fig. 5.1; it was developed based on a review of previous scholarly work on privacy and consumer behaviour in the online context. It was built from social contract theory and gossip theory, both discussed in the previous section. The constructs of the model were selected based on their significance, as cited in the extant literature on privacy and consumer behaviour. The model in Fig. 5.1 suggests that perceived data breach has a direct influence on privacy concerns, trust, emotional violation, ad acceptance, ad engagement and ad avoidance. It also proposes that the mediating variables (privacy concern, trust and emotional violation) are interrelated. Furthermore, it proposes that privacy concerns, trust and emotional violation mediate the influence of perceived data breach on the three ad behaviour constructs: ad acceptance, ad engagement and ad avoidance. It is worth noting that the model treats the following moderating variables as controls: personalisation, SNS experience, financial incentives, transparency, age, gender, education, individual privacy sensitivity, users' control of information and the nature of personal information. In the next part, all the constructs of the model are explained, followed by the hypotheses.

Fig. 5.1 A research model for the impact of perceived data privacy breach on consumer behaviour towards Facebook advertising

5.4.1 Perceived Data Breach

Perceived data breach is a construct that measures individuals' attitudes towards data security compromises in SNSs. Facebook collects a huge amount of personal information from its users, and this has increased its vulnerability to data breaches, which affect not only those who do not use privacy settings but also those using the strictest privacy settings [23]. A survey conducted by the Ponemon Institute in 2007 among American consumers reported that 84% of consumers were worried and more concerned about their privacy online. As a result, firms face customer backlash that leads to negative publicity and a depreciated market value [17]. Reference [38] reported that data breaches have heightened the fear of information vulnerability among SNS users when using Facebook ads. According to social contract theory, an information breach can be decoded as a breach or violation of trust [59] between users and Facebook, resulting in an erosion of trust [17]. Moreover, the Ponemon Institute's survey found that 57% of consumers lost their trust and confidence in breached companies. Following the 2018 Facebook data breach, its Chief Operating Officer, Sheryl Sandberg, admitted that the breach posed a serious risk to the trust bestowed on Facebook by its users and that it is the company's responsibility to ensure trust is restored [26]. Generally, trust diminishes as consumer information vulnerability increases due to marketers' data practices. In addition, the current study is informed by gossip theory, which suggests that people are emotionally hurt when their personal data or information is transmitted or used by others without their awareness and control; this heightens feelings of emotional violation and privacy concerns [16]. The literature propounds that data breach is related to ad acceptance, privacy concerns, emotional violation, trust, ad avoidance and ad engagement. We therefore put forward the following hypotheses:
H1a. Perceived data breach has a negative impact on ad acceptance
H1b. Perceived data breach has a positive impact on privacy concerns
H1c. Perceived data breach has a negative impact on trust
H1d. Perceived data breach has a positive impact on emotional violation
H1e. Perceived data breach has a positive impact on ad avoidance
H1f. Perceived data breach has a negative impact on ad engagement.

5.4.2 Privacy Concerns

Privacy concern connotes users' subjective evaluation of their worries over marketers' information practices, and it is expressed in the form of perceptions and attitudinal beliefs about privacy. Privacy concerns have been shown to have a negative influence on consumers' responses [16, 3]. Also, studies on the behavioural effects of online privacy [31, 36] revealed that privacy concern has negative effects on consumers' purchase intention, disclosure of information and willingness to engage in e-commerce. A survey conducted in the USA reported that Americans have become more concerned about privacy and believe that it is under serious threat. Despite the efforts of companies to mitigate privacy concerns through the disclosure of privacy policies or the use of privacy seals connoting fair information practices, surveys show that most consumers admit that the policies are not easy to comprehend and few read them [31].

The trust and privacy concern constructs work together to produce behavioural effects [10]. Some scholars have used trust as a mediator of the effect of privacy concerns on behavioural constructs, particularly purchase intention [4, 10]. At the same time, some scholars have used trust as an antecedent of privacy concerns, while others have conceptualised privacy as affecting behaviour directly [11, 70, 71]. In the context of data breaches in SNSs, trust plays a determinant role in predicting users' reactions to advertisements and is therefore conceptualised as related to both privacy concerns and the ad behaviour constructs. As explained in other studies, privacy concern influences consumer behaviour towards ads, i.e. ad acceptance, ad engagement and ad avoidance. Therefore, the current study suggests the following hypotheses:
H2a. Privacy concern has a negative influence on ad acceptance behaviour
H2b. Privacy concern has a negative influence on ad behavioural engagement
H2c. Privacy concern has a positive influence on ad avoidance behaviour
H2d. Privacy concern is negatively related to trust.

5.4.3 Trust

Trust is a construct that measures users' confidence in the reliability of Facebook in protecting their personal information [72, 1]. When faced with a risky and uncertain online environment, individuals rely on trust beliefs to direct their behaviour. Indeed, trust in the SNS is important in boosting online interactions [39] and information sharing [77], encouraging acceptance of and engagement with ads in SNSs. Reference [40], studying the processing of retargeted Facebook ads among adolescents, found that adolescents have low trust and high privacy concern when receiving retargeted ads (tailored to their preferences). This implies that trust is linked with privacy concerns and that the degree of personalisation influences how trust affects users' behaviour towards ads. According to a study on consumer adoption of SNSs in the Netherlands [78], perceived trust positively affects the intention to use SNS services. Moreover, in the e-shopping context, trust and interactivity are related [79, 80], such that more trust leads to more engaging interactions. Likewise, [81] reported that higher trust increases click-through on e-commerce Websites. However, [82] gave a different perspective; it reported that 49% of respondents rated SNS ads as "bad" and 10% perceived them as untrustworthy. This suggests that users with low trust are likely to exhibit ad avoidance behaviour due to privacy concern; this is in line with [16], who argued that trust leads to positive marketing outcomes, including ad acceptance and willingness to share information, provided that privacy is salient. As discussed in the literature, previous studies suggest that trust influences consumer ad behaviour. Therefore, this study advances the following hypotheses:
H3a. Trust has a positive influence on ad acceptance behaviour
H3b. Trust has a positive influence on ad behavioural engagement
H3c. Trust has a negative influence on ad avoidance behaviour.

5.4.4 Emotional Violation

Emotional violation is a construct that measures the negative feelings of consumers that result from a breach of personal information [6]. In order to understand this construct, the model is informed by gossip theory, which suggests that people tend to respond negatively when they learn that they are the target of gossip. This may include a series of negative psychological reactions: they get emotionally hurt, feel violated and experience a strong sense of betrayal [83, 16]; in a business context, this leads to reduced trust and heightened emotional violation. In the SNS context, the breach of users' sensitive information and its use by third parties to target ads infringe the right to privacy and attract a series of negative psychological and behavioural responses, including unfollowing, skipping ads or even deregistering from the SNS platform. However, emotional violation decreases when users' trust is high [16]; in this case, consumers feel less vulnerable and can show a positive behavioural response. Furthermore, [16] found that trust and emotional violation mediate the effect of information vulnerability on information disclosure, switching behaviour and negative word of mouth in the online context. On the basis of the reviewed literature, it is proposed that users' feelings of emotional violation due to data breach are related to their behavioural reactions towards SNS ads. We therefore propose the following hypotheses:
H4a. Consumers' emotional violation has a negative influence on ad acceptance behaviour
H4b. Consumers' emotional violation has a negative influence on ad behavioural engagement
H4c. Consumers' emotional violation has a positive influence on ad avoidance behaviour
H4d. Emotional violation has a negative influence on consumers' trust.

5.4.5 Ad Behaviour

In the model, consumer ad behaviour consists of three behavioural constructs: ad acceptance, ad engagement and ad avoidance. Ad acceptance refers to the actual use of ads in SNSs; this involves clicking on ads to watch, listen or read [84]. Ad engagement refers to consumers' involvement with and attention to the ad, and it is expressed in several ways: sharing positive sentiments or experiences, asking questions, using the advertised product or service as a reference in their posts, interacting with the marketer, and sharing and tagging the ad with friends [82, 85]. Ad avoidance refers to actions aimed at circumventing or eliminating ads in SNSs [86]; this can be done by skipping, ignoring or even blocking ads from appearing in one's SNS profile.
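
Although this paper stops at proposing the model, the hypothesised paths in Fig. 5.1 can be written down directly as a path model for later empirical testing. The sketch below is only an illustration under stated assumptions, not part of the original study: it presumes that each construct has been measured with survey items and scored as a single composite column in a dataset, it uses the open-source semopy package, and all column and file names are hypothetical; the moderating variables held as controls are omitted for brevity.

```python
# A tentative sketch (not part of the original study): the hypothesised paths of
# Fig. 5.1 expressed as a path model with the open-source semopy package.
# Each construct is assumed to have been scored as a single composite column in
# the survey dataset; every column and file name below is hypothetical.
import pandas as pd
from semopy import Model

# H1b, H1d: privacy concern and emotional violation regressed on perceived data breach.
# H1c, H2d, H4d: trust regressed on the breach construct and both mediators.
# H1a/H1e/H1f and H2a-H2c, H3a-H3c, H4a-H4c: each ad behaviour outcome regressed
# on perceived data breach and the three mediators.
MODEL_DESC = """
privacy_concern ~ data_breach
emotional_violation ~ data_breach
trust ~ data_breach + privacy_concern + emotional_violation
ad_acceptance ~ data_breach + privacy_concern + trust + emotional_violation
ad_engagement ~ data_breach + privacy_concern + trust + emotional_violation
ad_avoidance ~ data_breach + privacy_concern + trust + emotional_violation
"""

def fit_model(csv_path: str) -> pd.DataFrame:
    """Fit the hypothesised path model to composite construct scores in a CSV file."""
    data = pd.read_csv(csv_path)   # hypothetical survey data
    model = Model(MODEL_DESC)
    model.fit(data)                # maximum-likelihood estimation by default
    return model.inspect()         # parameter estimates, standard errors, p-values

if __name__ == "__main__":
    print(fit_model("facebook_ads_survey.csv"))  # hypothetical file name
```

Fitting such a specification to survey data would yield sign and significance estimates for H1a-H4d in a single step; a full latent-variable model with measurement items for each construct would be the natural refinement.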

5.5 Discussion and Implications

SNSs, particularly Facebook, have become an integral part of life for many people. Their ubiquitous nature allows people to share information around the world through posts, comments, shares, statuses, private messages and likes. This has enabled Facebook to collect massive data about its users and has transformed it from a communication company into a more complex media and data company whose business model depends entirely on users' personal data. Data is a valuable resource for SNSs and online marketers. As a result of holding massive amounts of user data, Facebook has become more vulnerable to data breaches. The objectives of this paper were twofold: to analyse privacy with respect to Facebook advertising and to investigate the consequences of data breach on consumers' behaviour towards Facebook ads. Privacy on Facebook is in a state of partiality; there have been improvements, but technically privacy has not yet been achieved. Legally, users have the right to be left alone; however, it was found that Facebook tracks users' online activities (browsing history) both when logged in and when logged off, without users' consent. Furthermore, through the use of unique Facebook ID numbers, users can still be tracked online regardless of whether they use privacy settings or deregister from the platform. Psychologically, the reviewed literature has revealed that privacy concerns are growing among SNS users. Consumers are increasingly worried about engaging with Facebook ads for fear of their information being stolen. However, consumers' psychology was found to be influenced by transparency, the availability of privacy tools, SNS experience, motives and the ability to control information. This explains why consumers use Facebook despite their high privacy concerns. We also found that Facebook has made significant efforts to enable users to control the flow of information in order to improve privacy. These efforts include a clearer privacy policy and less complex privacy settings. However, the challenges identified in the literature include online tracking that bypasses privacy settings and lengthy policies that are hard to read. We also found that Facebook has deliberately concealed data breach incidents from its users; for instance, it did not notify users about the 2015 breach until it was exposed by a whistleblower in 2018. This is a clear violation of general data regulations, which require Facebook to notify users about collection, use and even data breach incidents as they occur. Furthermore, the current privacy settings enable Facebook to collect information by default after registration. This is a clear violation of privacy because, at any point in time, users are not aware of what information is being collected and for what purpose. Users do not have control over their own personal data, resulting in higher perceptions of data breach. The theoretical contribution of this paper is the proposed model (Fig. 5.1) of how perceived data breach influences users' behaviour towards ads, together with the hypotheses explained in Sect. 5.4. From the systematic review of the literature, researchers call for studies to address the effect of perceived data breach on consumers' response to marketing communication; the model of this study fills this gap in knowledge by focusing on Facebook ads. The model proposes that perceived data breach affects
consumers' behaviour towards Facebook ads directly and through mediating variables. Directly, perceived data breach influences behaviour negatively by discouraging acceptance of and engagement with ads, and it positively influences ad avoidance. Users will tend to avoid ads when they become aware of a data breach. Moreover, perceived data breach has a psychological influence on consumers: it increases privacy concerns and emotional violation and at the same time reduces users' trust in Facebook. The resultant effect on consumers' psychology ultimately affects their behaviour. Privacy concerns and emotional violation are proposed to have a negative relationship with ad acceptance and engagement while being positively related to ad avoidance. The proposed model treats as controls other moderating variables such as Facebook experience, financial incentives, transparency, age, gender, education, individual privacy sensitivity, users' control of information and the nature of personal information. The findings of this paper have four major empirical implications. First, governments should enact robust personal data protection regulations that prohibit the online tracking of consumers; Facebook and other SNSs should not have access to information beyond what consumers have shared on the platform. Secondly, Facebook and other SNSs need to develop privacy settings that give consumers the freedom to opt in or out when asked for consent to collect information. The current collection-by-default settings deprive users of privacy; most often they are unaware of what information is collected and for what purpose. Privacy can only be achieved when users have the final say over their personal data. Thirdly, in order to reduce psychological concerns, Facebook and other SNSs need to build trust among users by increasing transparency, i.e. notifying users of any privacy compromises encountered, what information may have been accessed and the extent of the damage. Moreover, they need to develop a user-friendly privacy policy and settings in order to build trust among users. Fourthly, Facebook needs to take privacy breaches seriously as service failures; since its business model depends on consumers' data, data protection needs to be a priority. The negative influence of data breach on behaviour towards ads can reduce the effectiveness of Facebook as an advertising medium.

5.6 Conclusion, Limitations and Scope for Further Research

The findings of this study suggest that the protection of consumers' privacy on Facebook and other SNSs is in deficit. This privacy deficit aggravates perceived data breach, which in turn adversely influences users' behaviour towards Facebook ads. The current study recommends a legal framework by governments to protect citizens from online tracking and ensure the safety of their personal data. In addition, it calls for Facebook and other SNSs to treat any data breach seriously as a service failure and to build trust with users by ensuring transparency and improved privacy settings that give users control of their information.

Although this paper contributes to the literature by proposing a model of the effect of perceived data breach on consumer ad behaviour, it has a few limitations that provide scope for further research. Firstly, this work is based on a review of the literature to propose a model, and therefore the propositions given cannot be generalised; this provides an opportunity for empirical studies to test the model. Secondly, the review is based on 84 articles; scholars can embark on a more comprehensive review to gain further insights. Thirdly, the model assumes that other moderating variables such as Facebook experience, motives, personalisation, gender and education are controlled for; however, the literature shows that they significantly moderate the influence of privacy concerns and trust on consumer behaviour [16, 37, 87]. Scholars are invited to build on the proposed model to investigate further the influence of moderating variables on consumers' behaviour towards ads. Fourthly, the mediating variables used may not be exhaustive; further research is recommended to identify other mediating variables influenced by perceived data breach. Fifthly, the proposed model includes only psychological constructs as mediators because we wanted to understand how perceived data breach affects attitudes, emotions and cognitive trust with respect to Facebook advertising. It will be desirable to investigate how other constructs affect consumer behaviour towards advertising on Facebook.

Acknowledgements The author is deeply grateful to Prof. Kavita Sharma and Matli Toti (Ph.D. Scholar, University of Delhi) for enormously valuable comments and suggestions.

References

1. Malhotra NK, Kim SS, Agarwal J (2004) Internet users' information privacy concerns (IUIPC): the construct, the scale, and a causal model. Inf Syst Res 15(4):336–355. https://doi.org/10.1287/isre.1040.0032
2. PWC (2018) An overview of the changing data privacy landscape in India. Retrieved from https://www.pwc.in
3. Sheehan KB, Hoy MG (2000) Dimensions of privacy concern among online consumers. J Public Policy Market 19(1):62–73
4. Dienlin T, Metzger MJ (2016) An extended privacy calculus model for SNSs: analyzing self-disclosure and self-withdrawal in a representative U.S. sample. J Comput-Mediated Commun 21:368–383. https://doi.org/10.1111/jcc4.12163
5. Foxman ER, Kilcoyne P (1993) Marketing practice and consumer privacy: ethical issues. J Public Policy Market 12:106–119
6. Gregoire Y, Robert JF (2008) Customer betrayal and retaliation: when your best customers become your worst enemies. J Acad Mark Sci 36(2):247–261
7. Agarwal S (2010) Bureau. Economic Times. Retrieved from https://economictimes.indiatimes.com/
8. Baek TH, Morimoto M (2012) Stay away from me. J Advertising 41(1):59–76. https://doi.org/10.2307/23208321
9. Bergstrom A (2015) Online privacy concerns: a broad approach to understanding the concerns of different groups for different uses. Comput Hum Behav 53(1):419–426. https://doi.org/10.1016/j.chb.2015.07.025
10. Smith HJ, Dinev T, Xu H (2011) Information privacy research: an interdisciplinary review. MIS Q 35(4):989–1015
11. Belanger F, Crossler RE (2011) Privacy in the digital age: a review of information privacy research in information systems. Manage Inf Syst Res Centre Univ Minn 35(4):1017–1041


12. Study finds Personal Data Protection Draft Ambiguous (2018) Economic Times. Retrieved from https://economictimes.indiatimes.com/ 13. Reints R (2018) Taken a quiz lately? Your Facebook data may have been exposed. Retrieved from http://fortune.com/ 14. Hann IH, Hui KL, Lee SYT, Png IPL (2007) Overcoming online information privacy concerns: an information-processing theory approach. J Manage Inf Syst. 24(2):13–42 15. Chellappa RK, Sin RG (2005) Personalization versus privacy: an empirical examination of the online consumer’s dilemma. Inf Technol Manage 6:181–202 16. Martin KD, Borah A, Palmatier R (2017) Data privacy: effects on customer and firm performance. J Market 81:36–58. http://dx.doi.org/10.1509/jm.15.0497 17. Malhotra A, Malhotra CK (2011) Evaluating customer information breaches as service failures: an event study approach. J Serv Res 14(1):44–59. https://doi.org/10.1177/1094670510383409 18. Klahr R, Shah JN, Sheriffs P, Rossington T, Button M, Pestell G (2017) Cyber security breaches survey 2017. MORI Social Research Institute and Wang Institute for Criminal Justice Studies, University of Portsmouth. Retrieved from https://www.ipsos.com/ 19. Facebook (2018) Facebook reports fourth quarter and full year 2017 results. Retrieved from: https://s21.q4cdn.com/ 20. Roettgers J (2018) Facebook says it’s cutting down on viral videos as 2017 Revenue tops $40 Billion. Retrieved from http://variety.com/ 21. Sen R, Borle S (2015) Estimating the contextual risk of data breach: an empirical approach. J Manage Inf Syst 32(2):314–341. doi https://doi.org/10.1080/07421222.2015.1063315 22. Badshah N. (2018) Facebook to contact 87 million users affected by data breach. The Guardian, 12 June 2018. Retrieved from: https://www.theguardian.com 23. Steel E, Fowler GA (2010) Facebook in privacy breach. Wall Street J. Retrieved from http:// www.wsj.com/ 24. Sallyann N (2018) The Facebook data leak: what happened and what’s next. Retrieved from http://www.euronews.com/t 25. Baker SM, Gentry JW, Rittenburg TL (2005) Building understanding of the domain of consumer vulnerability J Macromark 25(2):128–139. https://doi.org/10.1177/0276146705280622 26. Baty E (2018) Here’s why you may be seeing a warning on your facebook newsfeed today 09 Apr 2018. Retrieved from https://www.cosmopolitan.com 27. Give me another chance: Zuckerberg on leading Facebook. Economic Times, 08 May, 2018. Retrieved from//economictimes.indiatimes.com/ 28. Wagner K (2018) Congress doesn’t know how Facebook works and other things we learned from Mark Zuckerberg’s testimony. Retrieved from https://www.recode.net/ 29. Jung AR (2017) The influence of perceived ad relevance on social media advertising: an empirical examination of a mediating role of privacy concern. Comput Hum Behav 70:303–309. http://dx.doi.org/10.1016/j.chb.2017.01.008 30. Lwin M, Wirtz J, Williams JD (2007) Consumer online privacy concerns and responses: a power–responsibility equilibrium perspective. J Acad Market Sci 35:572–585. https://doi.org/ 10.1007/s11747-006-0003-3 31. Tsai JY, Egelman S, Cranor L, Acquisti A (2011) The effect of online privacy information on purchasing behavior: an experimental study. Inf Syst Res 22(2):254–268. doi 10.1287.l090.0260 32. Nyoni P, Velempini M (2017) Privacy and user awareness on Facebook. South Afr J Sci 114(5/6/2017):2017–0103. http://dx.doi.org/10.17159/sajs.2018/20170103 33. Tucker CE (2014) Social networks, personalized advertising, and privacy controls. J Market Res 51(5):546–562. http://dx.doi.org/10.1509/jmr.10.0355 34. 
AcquistiA, Friedman A, Telang R (2006) Is there a cost to privacy breaches? an event study security and assurance. In: Twenty seventh international conference on information systems, Milwaukee, 2006 35. Milne GR, Rohm AJ, Bahl S (2004) Consumers’ protection of online privacy and identity. J Consum Aff 38(2):217–232. Accessed from http://www.jstor.org/stable/23860547


36. Miyazaki AD, Fernandez A (2001) Consumer perceptions of privacy and security risks for online shopping. J Consum Aff 35(1):27–44 37. Caudill EM, Murphy PE (2000) Consumer online privacy: legal and ethical issues. J Public Policy Market 19(1):7–19 38. Dey K, Mondal P (2018) Social networking websites and privacy concern: a user study. Asian J Inf Sci Technol 8(1):33–38 39. Dwyer C, Hiltz SR, Passerine K (2007) Trust and privacy concern within social networking sites: a comparison of facebook and myspace. In: Americas conference on information systems proceedings, 2007. Retrieved from http://aisel.aisnet.org/amcis2007/339 40. Zarouali B, Ponnet K, Walrave M, Poelsh K (2017) Do you like cookies? Adolescents’ sceptical processing of retargeted Facebook-ads and the moderating role of privacy concern and a textual debriefing. Comput Hum Behav 69:157–165. https://doi.org/10.1016/j.chb.2016.11.050 41. Zlatolas LN, Welzer T, Hericko M, Holbl M (2015) Privacy antecedents for SNS self-disclosure: the case of Facebook. Comput Hum Behav 45:158–167. https://doi.org/10.1016/j.chb.2014. 12.012 42. Dinev T, Xu H, Smith JH, Hart P (2013) Information privacy and correlates: an empirical attempt to bridge and distinguish privacy-related concepts. Eur J Inf Syst 22:295–316 43. Clarke R (1999) Internet privacy concerns confirm the case for intervention. Commun ACM 42(2) 44. Wacks R (2010) Privacy: a very short introduction. Oxford Press, New York. https://doi.org/10.1093.003.0001 45. Cecere G, LeGuel F, Souli N (2012) Perceived internet privacy concerns on social network in Europe. Munich Personal RePEc Archive. Retrieved from https://mpra.ub.unimuenchen.de 46. Waldo J, Lin H, Millett L (2007) Engaging privacy and information technology in a digital age. https://doi.org/10.17226/11896 47. Westin AF (1967) Privacy and freedom. Athenaum, New 48. Nissenbaum H (2010) Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, Palo Alto 49. Nill A, Aalberts RJ (2014) Legal and ethical challenges of online behavioral targeting in advertising. J Curr Issues Res Advertising 35:126–146 50. Jiang Z, Heng CS, Choi BCF (2013) Privacy concerns and privacy-protective behavior in synchronous online social interactions. Inf Syst Res 24(3):579–595 51. Wang P, Petrison LA (1993) Direct marketing activities and personal privacy. J Direct Mark 7:7–19 52. Ponemon Institute (2017) The impact of data breaches on reputation and share value, a study of marketers, it practitioners and consumers in the united kingdom. Retrieved from///E:/PhD/Papers/inbox/ponemon_data_breach_impact_study_uk.pdf 53. SO/IEC 27040 (2015) Information technology security techniques storage security. Retrieved from https://www.iso.org 54. Gemalt D (2017) Data breaches and customer loyalty. Retrieved from https://techday.com 55. Ablon E, Lillian H, Heaton J, Lavery Y, Romanosky S (2016) Consumer attitudes toward data breach notifications and loss of personal information. Retrieved from https://www.rand.org/ 56. Friend C, Social contract theory. Internet Encyclopaedia of Philosophy, Retrieved from https:// www.iep.utm.edu/soc-cont/ 57. Youn S (2009) Determinants of online privacy concern and its influence on privacy protection behaviors among young adolescents. J Consum Aff 43(3) 58. Culnan MJ, Armstrong PK (1999) Information privacy concerns, procedural fairness, and impersonal trust: an empirical investigation. Organ Sci 10(1):104–115 59. 
Hoffman DL, Novak TP, Peralta MA (2013) Information privacy in the market space: implications for the commercial uses of anonymity on the web. Inf Soc 15(2):129–139 60. Lewicki RJ, Bunker BB (1996) Developing and maintaining trust in work relationship in trust in organizations. In: Kramer RM, Tyler TR (eds) Frontiers of theory and research. Thousand Oaks


61. Wang S, Huff L (2007) Explaining a buyer’s response to a seller’s violation of trust. Eur J Market 41(9–10):1033–1052 62. Feinberg F, Willer R, Stellar J, Keltner D (2012) The virtues of gossip: reputational information sharing as prosocial behavior. J Person Soc Psychol 102(5):1015–1030 63. Foster EK (2004) Research on gossip: taxonomy, methods, and future directions. Rev General Psychol 8(2):78–99 64. Dunbar RIM (2004) Gossip in evolutionary perspective. Rev Gener Psychol 8(2):100–10 65. Baumeister RF, Zhang L, Vohs KD (2004) Gossip as cultural learning. Rev Gener Psychol 8(2):111–121 66. Cespedes FV, Smith HJ (1993) Database marketing: new rules for policy and practice. Sloan Manage Rev 34:8–12 67. Hoadley CM, Xu H, Lee JJ, Rosson MB, Privacy as information access and illusory control: the case of the Facebook News Feed privacy outcry. Electron Comm Res Appl https://doi.org/ 10.1016/j.elerap.2009.05.00 68. Robbin A (2001) The loss of personal privacy and its consequences for social research. J Gov Inf 28(5):493–527 69. Cho CH, Cheon HJ (2004) Why do people avoid advertising on the internet? J Advertising 33(4):89–97. http://dx.doi.org/10.1080/ 00913367.2004.10639175 70. Goldfarb A (2013) What is different about online advertising? Rev Ind Organ 44(2):115–129. http://dx.doi.org/10.1007/s11151-013-9399-3 71. Kalyanaraman S, Sundar SS (2006) The psychological appeal of personalized content in web portals: does customization affect attitudes and behaviour? J Commun 56(1):110–132. http:// dx.doi.org/10.1111/j.1460- 2466.2006.00006.x 72. Gefen DE, Karahanna DW (2003) Trust and online shopping: an integrated model. MIS Quart 27(1) 73. Friedman B, Kahn P, Howe DC (2000) Trust online. Commun ACM 43(12):34–40 74. Sirdeshmukh DJ, Singh BS (2002) Consumer trust, value, and loyalty in relational exchanges. J Market 66:15–37 75. Shneiderman B (2000) Designing trust into online experiences. Commun ACM 43(12):34–40 76. Cohen A (2001) Internet insecurity. Time 02 July 2001 77. Coppola N, Hiltz SR, Rotter N (2004) Building trust in virtual teams. IEEE Trans Prof Commun 47(2):95–104 78. Lorenzo-Romero C, Constantinides E, Alarco´n-del-Amo M, Consumer adoption of social networking sites: implications for theory and practice. J Res Interact Market 5(2/3):170–188. https://doi.org/10.1108/17505931111187794 79. Dennis C, Merrilees B, Jayawardhena A, Wright LT (2009) E-consumer behaviour. Eur J Mark 43(9/10/2009):1121–1139. https://doi.org/10.1108/03090560910976393 80. Fotis JN (2015) The use of social media and its impacts on consumer behaviour: the context of holiday travel. Ph.D. thesis, Bournemouth University 81. Aguirre E, Mahr D, Grewel D, Ruyter KD, Wetzels M (2015) Unravelling the personalization paradox: the effect of information collection and trust-building strategies on online advertisement effectiveness. J Retail 91:34–59 82. Myspace.com (2007) Social network advertising: making friends. new media age. London 83. Beersma B, Kleef GAV (2012) Why people gossip: an empirical analysis of social motives, antecedents, and consequences. J Appl Soc Psychol 42(11):2640–70 84. Dehghani M, Niaki MK, Ramezani I, Sali R (2016) Evaluating the influence of YouTube advertising for attraction of young customers. Comput Hum Behav 59:165–172. http://dx.doi. org/10.1016/j.chb.2016.01.037 85. Boerman SC, Kruikemeier S (2016) Consumer responses to promoted tweets sent by brands and political parties. Comput Hum Behav 65:285–294


86. Kelly L, Kerr G, Drennan J (2010) Avoidance of advertising in social networking sites. J Interact Advertising 10(2):16–27. https://doi.org/10.1080/15252019.2010.10722167 87. Dewar K (2017) The value exchange: generating trust in the digital world. Bus Inf Rev 34(2):96–100. https://doi.org/10.1177/026638211771

Chapter 6

Fuzzy DEMATEL Approach to Identify the Factors Influencing Efficiency of Indian Retail Websites
Loveleen Gaur, Vernika Agarwal, and Kumari Anshu

6.1 Introduction
In today's business set-up, when the world is moving into the online environment, Websites have emerged as the first point of contact for consumers. An effective Website has become indispensable for being effectual and fruitful in this Internet-dominated world [1]. The prime goal of a well-thought-out Web design is to generate a remarkable buyer experience. This experience must be incomparable and should make visitors return to the vendor's site on several occasions. Experience is considered 'the key battleground for today's global competition' [2]. Customers' opinions formed during the Web experience are key to the success of an online retailer's Website. In the world of online shopping, buyers are conscious not only of performance, for example, how easy it is to navigate the site and find the required piece of information, but also of their involvement and the experience created in the whole process, for example, whether they sense a feeling of comfort and accomplishment throughout. Hoffman and Novak [3], in their flow theory of the Website shopping experience, advocate that consumers navigating through a retailer's site are affected by the Website's content richness. This may impact customers' focus and flow, which consequently determine the consumer experience. Nielsen [4] developed the usability

L. Gaur · V. Agarwal · K. Anshu (B) Amity University, Noida, Uttar Pradesh, India e-mail: [email protected] L. Gaur e-mail: [email protected] V. Agarwal e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_6


engineering concept, which identified five usability attributes: learnability, efficiency, memorability, errors (a low error rate) and satisfaction. Research indicates that Nielsen's usability attributes can be applied as guiding principles for effective Website design. In business, sustained success comes from two groups: new customers and repeat customers. Since it is well known that it always costs more to attract new customers than to preserve current ones, customer retention is more critical than customer attraction. The basic way to retain customers is to provide customer satisfaction, and delighting customers goes a step beyond satisfying them. Pleased and happy consumers are more effective advocates for a business than any paid commercial or advertisement, both for converting prospects into customers and for keeping existing customers engaged [5]. While an organization's or business's success depends on several factors, the focus of this research is on Website features, that is, the user interface. This interface acts as a 'window' of the business that users interact with, and its design significantly affects users' perception of and attitude towards the organization. Here, with the help of the fuzzy DEMATEL tool, we attempt to resolve this indistinct state and identify the factors influencing the efficiency of Indian retail Websites. The various factors are studied to understand the aspects that influence Website efficiency, and the fuzzy DEMATEL tool is applied across different e-retailers to rank the crucial factors that help them get established in the market.

6.2 Objective
1. To identify the factors influencing the efficiency of Indian retail Websites.
2. To evaluate the relationships between the identified factors that can lead to enhanced online user experience.

6.3 Methodology
For the present study, we have tried to include many indices at different points of time so as to arrive at a list of essential Website features. The DEMATEL approach is used to understand the relationships between these Website features. A questionnaire was prepared to gather relevant data from experts in this field, and the linguistic judgments of these experts/decision-makers were taken on the list of Website features. A total of five experts were considered for the present analysis. The focus area of the present research is Delhi-NCR.


6.3.1 Fuzzy DEMATEL Approach
The growing Indian retail sector, specifically online retail, is pressuring the industry to look for innovative ways to attract customer attention. To respond to these changing market dynamics, the focus of online retailers has shifted towards improving their online portals. Thus, the present study attempts to provide online retailers with a framework for identifying the crucial factors which can impact the buying behaviour of customers. These metrics are identified through the literature review and detailed discussions with the decision-makers (DMs). The DEMATEL methodology is utilized to segment these factors into cause and effect groups and helps us in understanding the relationships between the variables. Most multi-criteria decision-making approaches are hierarchical in nature and provide a ranking of the alternatives based on a criteria set; DEMATEL, in contrast, is a multi-criteria technique that helps in understanding the relationships between the criteria themselves. Since, in the present study, we want to understand the relationships between the factors which influence Website efficiency, we choose the DEMATEL approach. The present study combines fuzzy set theory with DEMATEL so as to include the linguistic assessments of the DMs. The basic steps of the fuzzy DEMATEL approach are as follows [6]:

Step 1 Identifying the decision body and shortlisting factors. In this step, the decision-making body is formed, which assists in identifying the goal of the study, followed by shortlisting of the factors.

Step 2 Identifying the fuzzy linguistic scale. The scale for evaluation of the metrics and the corresponding fuzzy numbers are identified in this step.

Step 3 Determining the assessments of the decision-making body. In this step, the relationships between the shortlisted metrics are determined to generate the initial direct matrix (IDM).

Step 4 Defuzzification of the IDM. The linguistic assessments of the DMs are defuzzified into crisp values using the converting fuzzy data into crisp scores (CFCS) method [7]. Let $Z_{ij}^{l} = (a_{ij}^{l}, b_{ij}^{l}, c_{ij}^{l})$ be the effect of metric $i$ on metric $j$ for the $l$th DM, where $Z_{ij}^{l}$ is a triangular fuzzy number (TFN) and $a_{ij}^{l}, b_{ij}^{l}, c_{ij}^{l}$ are its left, middle and right values. The steps of the CFCS method are as follows:

Step 4.1 Normalization:
$$xa_{ij}^{l} = \frac{a_{ij}^{l} - \min a_{ij}^{l}}{\max c_{ij}^{l} - \min a_{ij}^{l}} \tag{6.1}$$
$$xb_{ij}^{l} = \frac{b_{ij}^{l} - \min a_{ij}^{l}}{\max c_{ij}^{l} - \min a_{ij}^{l}} \tag{6.2}$$
$$xc_{ij}^{l} = \frac{c_{ij}^{l} - \min a_{ij}^{l}}{\max c_{ij}^{l} - \min a_{ij}^{l}} \tag{6.3}$$

Step 4.2 Determine the left ($lxa$) and right ($lxc$) normalized values:
$$lxa_{ij}^{l} = \frac{xb_{ij}^{l}}{1 + xb_{ij}^{l} - xa_{ij}^{l}} \tag{6.4}$$
$$lxc_{ij}^{l} = \frac{xc_{ij}^{l}}{1 + xc_{ij}^{l} - xb_{ij}^{l}} \tag{6.5}$$

Step 4.3 Calculate the total normalized crisp value:
$$x_{ij}^{l} = \frac{lxa_{ij}^{l}\,\bigl(1 - lxa_{ij}^{l}\bigr) + lxc_{ij}^{l}\, lxc_{ij}^{l}}{1 - lxa_{ij}^{l} + lxc_{ij}^{l}} \tag{6.6}$$

Step 4.4 Calculate the crisp value:
$$k_{ij}^{l} = \min a_{ij}^{l} + x_{ij}^{l}\,\bigl(\max c_{ij}^{l} - \min a_{ij}^{l}\bigr) \tag{6.7}$$

Step 4.5 Aggregate the crisp values of the $l$ DMs:
$$k_{ij} = \frac{1}{l}\left(k_{ij}^{1} + k_{ij}^{2} + \cdots + k_{ij}^{l}\right) \tag{6.8}$$

Step 5 Determining the normalized direct relation matrix. With $K = [k_{ij}]_{n \times n}$, the direct relation matrix is normalized as $A = K/s$, where
$$s = \max\left(\max_{1 \le i \le n} \sum_{j=1}^{n} k_{ij},\; \max_{1 \le j \le n} \sum_{i=1}^{n} k_{ij}\right) \tag{6.9}$$

Step 6 Calculate the total relation matrix $T$:
$$T = [t_{ij}]_{n \times n} = \lim_{m \to \infty} \left(A + A^{2} + A^{3} + \cdots + A^{m}\right) = A\,(I - A)^{-1} \tag{6.10}$$

Step 7 Construct the causal diagram. Let $R_i$ and $C_j$ be the $n \times 1$ and $1 \times n$ matrices of the row sums and column sums of $T$, respectively. $R_i$ shows the overall effect that factor $i$ exerts on the other factors, while $C_j$ shows the overall effect that factor $j$ receives. Thus, $(R_i + C_i)$ specifies the significance level (prominence) of the $i$th factor, while $(R_i - C_i)$ specifies its net cause or effect role. The DMs sometimes fix a minimum level, called the threshold level, to screen out the factors that have a negligible effect on the others. In such cases, only the factors whose values are higher than the threshold limit are mapped into the dataset $(R_i + C_i, R_i - C_i)$ in the form of a causal diagram.
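As an illustration of Steps 4–7, the short Python sketch below implements the CFCS defuzzification and the DEMATEL computations with NumPy. It is a minimal sketch under stated assumptions, not the authors' actual implementation: the function and variable names are ours, and the min/max in Eqs. (6.1)–(6.3), whose scope the notation leaves implicit, are taken here over all assessments for simplicity.

```python
import numpy as np

def cfcs_defuzzify(tfn_matrices):
    """CFCS defuzzification (Eqs. 6.1-6.8).

    tfn_matrices: array of shape (L, n, n, 3) with the (a, b, c) triangular
    fuzzy assessments of L decision-makers.  Returns the aggregated crisp
    initial direct-relation matrix of shape (n, n).
    """
    a, b, c = tfn_matrices[..., 0], tfn_matrices[..., 1], tfn_matrices[..., 2]
    delta = c.max() - a.min()                       # max c - min a (taken globally here)
    xa, xb, xc = (a - a.min()) / delta, (b - a.min()) / delta, (c - a.min()) / delta
    lxa = xb / (1 + xb - xa)                        # left normalized value, Eq. 6.4
    lxc = xc / (1 + xc - xb)                        # right normalized value, Eq. 6.5
    x = (lxa * (1 - lxa) + lxc * lxc) / (1 - lxa + lxc)   # total normalized crisp value, Eq. 6.6
    k = a.min() + x * delta                         # crisp value per DM, Eq. 6.7
    return k.mean(axis=0)                           # average over the DMs, Eq. 6.8

def dematel(K):
    """Steps 5-7: normalize, total relation matrix, prominence and net effect."""
    s = max(K.sum(axis=1).max(), K.sum(axis=0).max())   # Eq. 6.9
    A = K / s                                           # normalized direct-relation matrix
    T = A @ np.linalg.inv(np.eye(len(K)) - A)           # T = A (I - A)^-1, Eq. 6.10
    R, C = T.sum(axis=1), T.sum(axis=0)                 # row sums and column sums
    return T, R + C, R - C                              # total matrix, prominence, cause/effect index
```

With the five-term scale used in Sect. 6.4, the global minimum left value is 0 and the global maximum right value is 1, so the normalization step above is well defined.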

6.4 Numerical Interpretation
In order to respond to the changing market dynamics and attract customer attention, online retailers need to understand the crucial factors which can impact the buying behaviour of customers. In this context, the foremost steps are to shortlist the essential factors and then segment them into cause and effect groups. The company set up a decision body comprising a general manager, two senior managers, a quality manager and a marketing manager to carry out the necessary segmentation. The decision body followed the DEMATEL methodology for analysing the factors under consideration. First, the list of factors given in Table 6.1 was identified based on the literature review and discussion with these decision-makers (DMs). The DMs also decided to use a fuzzy five-point linguistic scale for the assessment of these metrics. The scale is denoted using five linguistic terms: 'very high influence (VH)' with a triangular fuzzy number (TFN) of (0.75, 1.0, 1.0), 'high influence (H)' with TFN (0.5, 0.75, 1.0), 'low influence (L)' with TFN (0.25, 0.5, 0.75), 'very low influence (VL)' with TFN (0, 0.25, 0.5) and 'no influence (No)' with TFN (0, 0, 0.25).

Table 6.1 List of factors (notation, factor, description and references)

FM1 – Aesthetics. Aesthetics relates to attention, emotion and understanding. An aesthetic interface appeals to consumers' attentiveness and gets them involved and engrossed in an action. It also bears a clear, exclusive appearance of the interface. Classical aesthetics denotes uniformity in design and concepts like 'cleanliness', enjoyable, 'symmetrical' and 'aesthetic'. Expressive aesthetics signifies design traits like creativity, special effects, originality, sophistication and fascination. (Jennings [8], Sautter et al. [9], Lavie and Tractinsky [10])

FM2 – Compatibility. Nowadays, people use mobiles and tablets for online purchasing. Businesses are investing more time and resources to recognize the multichannel customer experience. Today's shoppers expect to buy products online on any device. Thus, crafting an amalgamated customer experience is exceptionally vital for businesses. They are required to offer Web, mobile and tablet touch points that line up with staple brand characteristics backing up corporate goals. The colours and images should be analogous throughout social media channels in order to generate an integrated experience for the customer and also to stay competitive. (Bilgihan et al. [11])

FM3 – Load time. The time elapsed between when a customer requests a fresh web page and when the web page is fully accessed by them. Fast web pages exhibit their content incrementally, as it is loaded by the browser. This helps give the user the information they requested as soon as it is available. (Manhas [12])

FM4 – Navigation. Navigation systems are key constituents that assist customers in identifying the location and content to find among accessible resources on the Web. A well-crafted navigation system is one of the important factors regulating the accomplishment of a Website. (Farnum [13], Hughes et al. [14], George [15], Maloney and Bracke [16])

FM5 – Content. Website content refers to the features, utilities, information and merchandise presented on a Website, not including features of Web look or design. Website content elements are significant in assisting consumers in making judgments when buying on the Internet. (Huizingh [17], Adel and Palvia [18], [19], Ranganathan and Ganapathy [20])

FM6 – Security. Security conveys the competence of a firm to defend and protect its customers from online deceit through security procedures. It comprises both managerial and technical methods: a tab on who can access the data within the organization and the purpose of this access, while the technical processes include elements such as encryption in the handling of the data, data storage on secure servers and the usage of passwords. (Mandić [21])

FM7 – Integration with social media. Much of the information on social media about any merchandise, brand or business is consumer generated and communicated through social networks, blogs, online groups, customer forums, etc. These networks are not in the marketers' control, and many of the facts and figures exchanged relate to customer experiences from consuming merchandise or services and user remarks such as product appraisals, endorsements to other consumers, comments about enhancements and even guidance for usage. (Constantinides et al. [22])

FM8 – Accessibility. This is a significant constituent in the designing of Websites. The World Wide Web Consortium (W3C) has defined the term as 'individuals having incapacities can perceive, apprehend, traverse, and interact with the web'. They may include individuals with optical, vocal, bodily and neural incapacities. So, Web accessibility is of grave importance in B2C online Websites for users of all ages and especially those with disabilities. An accessible Website can make use of assistive proficiency such as screen readers, speech recognition, alternate pointing devices, alternative keyboards and adapted Website displays. (Lazar and Sears [23], Sohaib and Kang [24], Lazar et al. [25])

FM9 – Ease of use. The magnitude to which a shopper considers that using a Website would be uncomplicated. Online buying has been enthusiastically accepted as it has made the process simpler. Ease of use translates into the degree of non-complexity and promotes the extent to which the Internet is perceived as uncomplicated. (Davis [26], Limayem et al. [27])

FM10 – Multimedia. This indicates any element of a Website which is dissimilar from transcribed, written content. It includes elements such as images, graphics and infographics, videos and interactive content. It helps in breaking up the script such that it permits customers to take it in at a manageable speed. It also offers the customer something other than textual content to engage with. Multimedia makes for a great user experience and also assists in reducing the bounce rate, as it tempts users to remain on the web page for an extended duration. (Bilgihan [28], https://www.webpagefx.com/multimedia.htm)

Following this, in the next step, the relationships between these factors were identified by these DMs using the fuzzy linguistic scale. Table 6.2 gives the initial direct matrix (IDM) assessment of the identified factors as given by the general manager. Once these initial relationships were identified by all the DMs, the assessment data were aggregated using the CFCS method, Eqs. (6.1)–(6.8). Table 6.3 shows the aggregated IDM.
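To make the scale concrete, a decision-maker's linguistic matrix can be encoded and converted into triangular fuzzy numbers before defuzzification. The fragment below is only a sketch: the two 3 × 3 matrices are invented for illustration (the study itself uses 10 × 10 assessments from its DMs), and `cfcs_defuzzify` refers to the earlier sketch of Eqs. (6.1)–(6.8).

```python
import numpy as np

# Five-term linguistic scale and its triangular fuzzy numbers (Sect. 6.4)
SCALE = {
    "No": (0.0, 0.0, 0.25),   # no influence
    "VL": (0.0, 0.25, 0.5),   # very low influence
    "L":  (0.25, 0.5, 0.75),  # low influence
    "H":  (0.5, 0.75, 1.0),   # high influence
    "VH": (0.75, 1.0, 1.0),   # very high influence
}

def to_tfn_matrix(linguistic_rows):
    """Convert one DM's n x n matrix of linguistic labels into TFNs, shape (n, n, 3)."""
    return np.array([[SCALE[label] for label in row] for row in linguistic_rows])

# Illustrative 3-factor assessments from two hypothetical DMs
dm1 = [["No", "H", "VH"], ["L", "No", "H"], ["VL", "H", "No"]]
dm2 = [["No", "VH", "H"], ["L", "No", "VH"], ["No", "H", "No"]]
tfns = np.stack([to_tfn_matrix(dm1), to_tfn_matrix(dm2)])   # shape (2, 3, 3, 3)
# crisp_idm = cfcs_defuzzify(tfns)   # would yield the aggregated IDM, as in Table 6.3
```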

Table 6.2 Linguistic assessments by the general manager (10 × 10 matrix of pairwise influence ratings of the factors FM1–FM10, expressed on the five-term No/VL/L/H/VH scale)

Table 6.3 Integrated IDM (10 × 10 aggregated crisp initial direct-relation matrix for the factors FM1–FM10, obtained from the DMs' assessments via the CFCS method)


Further, the normalized direct relation (NDR) matrix given in Table 6.4 was determined from the integrated IDM. The total relation matrix shown in Table 6.5 was then obtained by applying step 6. Then, applying the formulas given in step 7, we get the causal diagram given in Fig. 6.1 by mapping the dataset (Ri + Ci, Ri − Ci) as given in Table 6.6. The influence of one metric over another can be gauged by identifying a threshold level in the total relation matrix, as discussed in step 7. On analysing the causal diagram given in Fig. 6.1, the metrics can be visually distinguished into the cause and effect groups: aesthetics, navigation, security, integration with social media and ease of use come under the cause group, while compatibility, load time, content, accessibility and multimedia fall under the effect group.
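Given the Ri and Ci values reported in Table 6.6, the cause–effect split described here reduces to the sign of Ri − Ci; the following fragment (with names of our choosing) reproduces the grouping.

```python
# Ri (row sums) and Ci (column sums) of the total relation matrix, as in Table 6.6
R = {"FM1": 8.24, "FM2": 7.26, "FM3": 6.86, "FM4": 8.46, "FM5": 7.98,
     "FM6": 5.96, "FM7": 7.67, "FM8": 8.64, "FM9": 7.90, "FM10": 8.36}
C = {"FM1": 8.16, "FM2": 7.42, "FM3": 7.50, "FM4": 8.13, "FM5": 7.99,
     "FM6": 5.51, "FM7": 7.64, "FM8": 8.76, "FM9": 7.63, "FM10": 8.60}

cause  = [f for f in R if R[f] - C[f] > 0]   # net influencers
effect = [f for f in R if R[f] - C[f] <= 0]  # net receivers
print("cause:", cause)    # FM1, FM4, FM6, FM7, FM9
print("effect:", effect)  # FM2, FM3, FM5, FM8, FM10
```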

6.5 Conclusion
Owing to the growing market of horizontal e-commerce platforms, the market for e-commerce players has increased manifold. This has led to a shift in the focus of e-retailers towards improving their online portals. In this direction, the first and foremost step is to understand the factors that affect the buying behaviour of Indian consumers, which is the motivation behind the present work. The present study focuses on developing a quantitative assessment framework in order to identify and evaluate the relationships between the factors that can lead to an enhanced online user experience, using fuzzy set theory and the DEMATEL technique. The Website constitutes the most imperative interface for online shoppers, providing the first-hand impression of the vendor on the Internet [1]. Online shopping is the interface between the buyer and the retailer through the machine. It is thus essential that the retailer's Website appeals to the customers and hence is aesthetic. Website aesthetics should also be supported by a system that is easy to use and easy to navigate. The purpose of the retailer's Website should be to generate an experience that is unparalleled and that will reinforce the customers' intent to revisit the site more often. At times, customers are apprehensive about the actions to be taken if anything goes wrong; hence, retailers' Websites where buyers are able to connect to different social media sites are better able to attract customers [29]. This finally fosters trust and loyalty among the buyers, thus enhancing their customer experience. Attributes such as aesthetics, navigation and ease of use are very important for the customer [30]. If these factors are not properly taken care of by the e-retailers, customers may switch to other retailers. The paper aims to identify and evaluate the contextual relationships between the identified factors, which will aid the e-retailer in enhancing online user experience. The paper broadly classifies the elements into cause and effect groups and presents a visual graph for the decision-making body. This will further help the organization to focus on a smaller set of factors. Outcomes of this model can be used further in other decision-making approaches in order to get an in-depth understanding of the problem.

Table 6.4 Normalized direct relation matrix (10 × 10 matrix for the factors FM1–FM10, obtained by scaling the integrated IDM by the factor s of Eq. 6.9)

Table 6.5 Total relation matrix (10 × 10 matrix T for the factors FM1–FM10, computed from the normalized direct relation matrix via Eq. 6.10)


Fig. 6.1 Causal diagram

Table 6.6 Identification of cause and effect segments

        Ri      Ci      Ri + Ci   Ri − Ci
FM1     8.24    8.16    16.41      0.08
FM2     7.26    7.42    14.69     −0.15
FM3     6.86    7.50    14.37     −0.63
FM4     8.46    8.13    16.60      0.33
FM5     7.98    7.99    15.98     −0.01
FM6     5.96    5.51    11.48      0.45
FM7     7.67    7.64    15.31      0.03
FM8     8.64    8.76    17.41     −0.12
FM9     7.90    7.63    15.53      0.27
FM10    8.36    8.60    16.96     −0.24

References 1. Gaur L, Anshu K (2018) Consumer preference analysis for websites using e-TailQ and AHP. Int J Eng Technol 14–20 2. Weinman J (2015) The cloud and the economics of the user and customer experience. IEEE Cloud Comput 2(6):74–78 3. Hoffman DL, Novak TP (1996) Marketing and hypermedia computer mediated environments: conceptual foundations. J Mark 60(3):50–68 4. Nielsen J (1993) Usability engineering. Academic Press, San Diego 5. Heimlich J (1999) Evaluating the content of web sites. Environmental education and training partnership resource library, Ohio State University Extension 6. Agarwal V, Govindan K, Darbari JD, Jha PC (2016) An optimization model for sustainable solutions towards implementation of reverse logistics under collaborative framework. Int J Syst Assur Eng Manag 7(4):480–487


7. Opricovic S, Tzeng GH (2003) Defuzzification within a multicriteria decision model. Int J Uncertainty Fuzziness Knowl-Based Syst 11(05):635–652 8. Jennings M (2000) Theory and models for creating engaging and immersive commerce Websites. In: Proceedings of SIGCPR 2000. ACM Press, pp 77–85 9. Sautter P, Hyman MR, Lukosius V (2004) E-Tail atmospherics: a critique of the literature and model extension. J Electron Commer Res 5(1):14–24 10. Lavie T, Tractinsky N (2004) Assessing dimensions of perceived visual aesthetics of web sites. Int J Hum Comput Stud 60(3):269–298 11. Bilgihan A, Kandampully JA, Tingting Z (2016) Towards a unified customer experience in online shopping environments: antecedents and outcomes. Int J Qual Serv Sci 8 12. Manhas J (2013) A study of factors affecting websites page loading speed for efficient web performance. Int J Comput Sci Eng 1(3):32–35 13. Farnum C (2002) Information architecture: five things information managers need to know. Inf Manag J 36(5):33–39 14. Hughes J, McAvinia C, King T (2004) What really makes students like a website? What are the implications for designing web-based language learning sites? ReCALL 16(1):85–102 15. George C (2005) Usability testing and design of a library website: an iterative approach. Acad Res Lib 21(3):167–180 16. Maloney K, Bracke P (2004) Beyond information architecture: a systems integration approach to web-site design. Inf Technol Lib 23(4):145–152 17. Huizingh EKRE (2000) The content and design of web sites: an empirical study. Inf Manag 37:123–134 18. Adel AM, Palvia PC (2002) Developing and validating an instrument for measuring userperceived web quality. Inf Manag 39:467–476 19. Lohse GL, Spiller P (1999) Internet retail store design: how the user interface influences traffic and sales. J Comput Mediated Commun. Available at http://jcmc.indiana.edu/vol5/issue2/lohse. htm 20. Ranganathan C, Ganapathy S (2002) Key dimensions of business-to-consumer web sites. Inf Manag 39:457–465 21. Mandi´c M (1989) Privatnost i sigurnost u elektroniˇckom poslovanju, Tržište br. 2, Zagreb 22. Constantinides E, Lorenzo C, Gómez-Borja MA, Geurts P (2008) Effects of cultural background on internet buying behaviour: towards a virtual global village? In: Psaila G, Wagner R (eds) E-commerce and web technologies. EC-Web 2008. Lecture notes in computer science, vol 5183. Springer, Berlin 23. Lazar J, Sears A (2006) Design of e-business web sites. In: Handbook of human factors and ergonomics. Wiley, New York, pp 1344–1363 24. Sohaib O, Kang K (2013) The importance of web accessibility in business to-consumer (B2C) Websites’. In: 22nd Australasian software engineering conference, pp 1–11 25. Lazar J, Alfreda D, Greenidge K (2004) Improving web accessibility: a study of webmaster perceptions. Comput Hum Behav 20:269–288 26. Davis F (1989) Perceived usefulness, perceived ease of use and user acceptance of information technology. MIS Q 13:319–340 27. Limayem M, Khalifa M, Frini A (2000) What makes consumers buy from Internet? A longitudinal study of online shopping. IEEE Trans Syst Man Cybern 30:421–432 28. Bilgihan S (2016) 4 reasons multimedia is a crucial element for websites. https://www. webpagefx.com/multimedia.htm 29. Anshu K, Gaur L, Agarwal V (2019) Evaluating satisfaction level of Grocery E-Retailers using intuitionistic fuzzy TOPSIS and ECCSI model. In: 2017 international conference on infocom technologies and unmanned systems (Trends and future directions) (ICTUS), pp 276–284 30. 
Liu CA (2000) Exploring the factors associated with web site success in the context of electronic commerce. Inf Manag 38:23–33

Chapter 7

Application of Statistical Techniques in Project Monitoring and Control
Shalu Gupta and Deepshikha Bharti

7.1 Introduction
These days, software development is becoming complex and customer expectations are increasing. Customers are setting a high benchmark for product quality, and product quality assessment is essential to meet their expectations. Project monitoring at each phase is crucial and is achievable by applying statistical techniques. Metrics data generated from projects also helps to optimize the processes defined in the organization, and process improvement is accomplished by effectively using statistical project monitoring techniques. This paper is organized as follows: Section 7.2 describes the challenges faced during project execution. Section 7.3 explains the life cycle of metrics data collection and analysis. Section 7.4 explains the creation of the process performance baseline (PPB) and model. Section 7.5 explains the application of statistical techniques in project monitoring and control. Section 7.6 describes how to measure improvements in projects. Section 7.7 presents the conclusion and the major benefits achieved using statistical techniques.

7.2 Major Challenges in Project Execution and Data Collection [1] Project monitoring and control is one of the biggest challenges in project execution. To control the projects and track the progress, the statistical techniques are applied. S. Gupta (B) · D. Bharti Quality Assurance, Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India e-mail: [email protected] D. Bharti e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_7


Data is collected from the projects to carry out the data analysis and derive the process performance models. During this process, the major challenges faced are summarized below:
1. Unrealistic expectations of clients and stakeholders and vague estimations lead to schedule and cost overruns.
2. Identification of appropriate metrics for a project at the initiation stage, and data collection as per the identified metrics, is a major challenge for project managers.
3. Fast delivery cycles and frequent changes in the software industry force project managers to shorten delivery times; owing to the shorter delivery time, maintaining the project data is often ignored.
4. When the main product is customized for different customers, data collection for these customizations is often ignored, which leads to incomplete data.

7.3 Life Cycle of Metric Data Collection and Analysis
Throughout project execution, project data is collected from different projects. In order to follow a process improvement approach, the decision points have to be framed before diving into data analysis, and the data must be checked for integrity and completeness. Usually, the best way to approach data analysis is to adopt a hypothesis-driven approach; outlier analysis and data normality testing are also part of the analysis. Creating a clear hypothesis ensures that the data analysis is focused, reliable and time-efficient. Once begun, the focus must remain on collecting and analyzing data that is directly related to proving or disproving the hypothesis. Based on the hypothesis results, data is organized into different groups/buckets, after which data normalization is completed. Process performance baselines are prepared for the process (Y factor) and subprocesses (X factors). Using the normalized data, the process performance model, or regression equation, is prepared; the regression equation reflects the relationship between the Y and X factors. Regression equations are used by project managers to apply statistical techniques in the project, and these techniques work as predictive tools to monitor the project objective or quantitative goal. The life cycle of metric data collection and analysis repeats periodically throughout project execution (Fig. 7.1).

7.3.1 Project Data Collection Method
Project data should be collected for each phase of the development life cycle. The phase-wise metrics data is collected in a data sheet as shown in Fig. 7.2. Data from each SDLC phase of the project is captured in the data sheet module-wise. Mainly, the data sheet captures the


Fig. 7.1 Life cycle of metrics data collection and analysis (data collection from Projects 1 to N → analysis of data → process performance baseline → process performance model → application of statistical techniques → collection of the next cycle of data)

• Module-wise planned effort and actual effort for the requirement, design, coding, unit testing, system testing and user acceptance phases.
• Review effort and defects in every phase.
• Module-wise planned start date and end date.
• System testing defects.
• System testing bug-fixing efforts.

7.4 Creation of Process Performance Baseline and Models [2]
Process performance baselines and models are derived from data collected from the software development processes. Process performance baselines are derived from process execution in the projects. A baseline basically consists of the mean and ±3 standard deviations, to show the variation of process execution. The appropriate process performance baselines are then used as an input source for the formation of the process performance model for a new project. Process performance baselines (PPBs) and process performance models (PPMs) are used in project control for monitoring the project using different statistical techniques.

7.4.1 Process Performance Baseline Creation [3]
Key activities to acquire the process performance baseline involve:
1. Collection of project data in the data sheet.
2. Review of project data for integrity and completeness.


Fig. 7.2 Metrics data sheet

3. Removal of extreme outliers falling beyond the three-sigma limits.
4. The remaining data points, having only common-cause variation, are used for establishing organizational baselines.
5. A normality test is conducted to test the normality of the data based on the p value; if the p value is >0.05, the data is said to be normal (Fig. 7.3).
6. Project data baselines are established by calculating the mean and standard deviation (Fig. 7.4).


Fig. 7.3 Normality test

Fig. 7.4 Process performance baseline

7. A control chart is generated by calculating the upper control limit (UCL) and lower control limit (LCL) from the derived mean and standard deviation:
UCL = μ + 3 × σ, LCL = μ − 3 × σ
If all the data points lie within the UCL–LCL range, the process is stable (Fig. 7.5).
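A hedged Python sketch of steps 5–7 is given below: it runs a Shapiro–Wilk normality check (one possible choice of normality test) and derives the three-sigma control limits; the data values and names are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical phase-wise productivity data collected from projects
data = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 13.4, 11.7, 12.2, 12.9])

# Step 5: normality test (data treated as normal when p > 0.05)
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")

# Steps 6-7: baseline (mean, standard deviation) and three-sigma control limits
mu, sigma = data.mean(), data.std(ddof=1)
ucl, lcl = mu + 3 * sigma, mu - 3 * sigma
print(f"Baseline mean={mu:.2f}, sigma={sigma:.2f}, UCL={ucl:.2f}, LCL={lcl:.2f}")

# The process is treated as stable when every point lies within [LCL, UCL]
stable = np.all((data >= lcl) & (data <= ucl))
print("Process stable:", stable)
```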


Fig. 7.5 Control chart

7.4.2 Process Performance Model Creation
After establishing the process performance baseline, the process performance model (PPM) is created using the identified critical subprocesses. Regression analysis is conducted for the identified subprocesses to generate the model. During model generation, the following parameters should be satisfied:
1. R-Sq >50%, indicating that the model fits the data.
2. R-Sq (adj) >50%, which accounts for the number of predictors and reflects the suitability of the model.
3. p-value <0.05 for the predictors.
4. The regression equation should be logically acceptable.
5. The regression equation gives the relationship between the subprocesses (independent factors) and the process (dependent factor). The dependent factor, i.e., the Y factor, is improved by controlling the independent factors, i.e., the X factors (Fig. 7.6).
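A process performance model of this kind can be fitted, for instance, by ordinary least squares. The following sketch uses statsmodels on invented subprocess data and reports the R-Sq, adjusted R-Sq and p-values referred to above; it is an illustrative example, not the organization's actual model.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical subprocess data (X1: review effort %, X2: coding effort %) and
# process outcome (Y: delivered defect density)
X = np.array([[10, 35], [14, 30], [8, 40], [12, 33], [16, 28], [9, 38], [13, 31], [11, 36]])
y = np.array([0.42, 0.31, 0.55, 0.38, 0.25, 0.50, 0.33, 0.44])

model = sm.OLS(y, sm.add_constant(X)).fit()   # regression equation Y = f(X1, X2)
print(model.summary())                        # R-Sq, adjusted R-Sq, coefficient p-values
print("R-Sq:", round(model.rsquared, 2), "Adj R-Sq:", round(model.rsquared_adj, 2))
```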


Fig. 7.6 Sample process performance model

7.5 Application of Statistical Techniques in Project Monitoring and Control
Project progress tracking and monitoring are required throughout the life cycle of the project. A statistical approach based on historical project data stored in a project database is applied to achieve the project goal. This approach is supported by periodic data collection and statistical analysis of the collected data at any time during the project. Emphasis is placed upon achieving results that are easy to use and interpret, rather than on sophisticated statistical methods. The process performance baseline and models are used as statistical tools with which project managers set the quantitative goal of the project. The monitoring of the goal is carried out using what-if analysis. During the analysis, the probability of meeting the goal is calculated; if the probability of meeting the goal is less than 50%, then different causal analysis and resolution techniques are applied.

7.5.1 What-if Analysis [4]
The quantitative goal of the project is set based on the organization's process performance baseline, and the process performance model is used to control the subprocesses. The probability of achieving the project goal is calculated; this probability tells us how realistic it is to achieve the project goal and how much improvement needs to be carried out in the subprocesses to achieve it. This is done using what-if analysis: the subprocess values are increased or decreased to control the main process, or Y factor, and at each stage the probability of meeting the project's quantitative goal is measured (Fig. 7.7).
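One simple way to carry out such a what-if check is to treat the model prediction error as normally distributed and compute the probability of meeting the goal for candidate subprocess (X) values. Everything in the sketch below, including the coefficients, the residual standard deviation and the goal, is assumed for illustration.

```python
from scipy.stats import norm

# Hypothetical PPM: defect density = 0.9 - 0.03*review_effort - 0.004*coding_effort,
# with an assumed residual standard deviation of 0.05
def predict(review_effort, coding_effort):
    return 0.9 - 0.03 * review_effort - 0.004 * coding_effort

goal, resid_sd = 0.35, 0.05          # quantitative goal: defect density <= 0.35

def prob_of_meeting_goal(review_effort, coding_effort):
    mean = predict(review_effort, coding_effort)
    return norm.cdf(goal, loc=mean, scale=resid_sd)   # P(outcome <= goal)

# What-if: raise review effort from 10% to 14% and watch the probability change
for review in (10, 12, 14):
    print(review, round(prob_of_meeting_goal(review, 33), 2))
```

With these assumed numbers, only the 14% review-effort scenario pushes the probability of meeting the goal above 50%, which is the kind of trade-off the what-if analysis is meant to expose.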

7.5.2 Causal Analysis and Resolution (CAR)
Based on the what-if analysis outcome, causal analysis is performed to identify the root causes and improve the subprocesses. The analysis is performed using different techniques, such as 5-why analysis, ANOVA and Pareto analysis. Based on the analysis, a resolution is identified to improve the subprocesses.


Fig. 7.7 Sample what-if analysis

Fig. 7.8 5-why analysis sample

7.5.2.1 5-Why Technique
This technique is used for root cause analysis. The analysis is carried out by repeatedly asking 'why' questions to determine the relationships between different parameters and find out the actual means of improving the independent factors (Fig. 7.8).

7.5.2.2 ANOVA
Root cause is identified through the ANOVA test.


Fig. 7.9 ANOVA

Analysis of variance (ANOVA) is a statistical method used to test the differences between two or more means.
One-Way ANOVA: One-way analysis of variance evaluates and compares three or more groups defined by a single factor variable (Fig. 7.9).
Two-Way ANOVA: Two-way analysis of variance evaluates and compares groups defined by two factor variables.
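A one-way ANOVA can be run, for example, with SciPy; the three groups below are invented defect-density samples used purely for illustration.

```python
from scipy import stats

# Hypothetical defect densities for three module groups
group_a = [0.41, 0.38, 0.45, 0.40]
group_b = [0.52, 0.49, 0.55, 0.50]
group_c = [0.39, 0.42, 0.37, 0.44]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would indicate that at least one group mean differs from the others
```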

7.5.2.3 Pareto Chart
The Pareto chart is a quality tool used to identify where efforts should be focused to achieve the greatest improvement. According to the Pareto principle, 80% of the problems can be resolved or improved by making 20% of the changes in the processes/techniques/tools. It is based on the frequency of defects/problems/issues (Fig. 7.10).
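The 80/20 reading of a Pareto chart can be reproduced from a simple frequency table; the defect categories and counts below are invented for the sketch.

```python
# Hypothetical defect counts by category
defects = {"UI": 42, "logic": 35, "interface": 12, "performance": 7, "docs": 4}

total = sum(defects.values())
cumulative = 0.0
for category, count in sorted(defects.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += 100.0 * count / total
    print(f"{category:12s} {count:3d}  cumulative {cumulative:5.1f}%")
# The few categories that reach ~80% cumulative share are the ones to address first
```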

7.6 How to Measure Improvements in Projects [5]
Hypothesis testing is executed to measure and statistically validate the improvements in projects achieved by applying different statistical techniques. Two-sample t-tests (for means) and F-tests (for variances) are performed.


Fig. 7.10 Pareto chart

7.6.1 Two-Sample T-Test
The t-test is a hypothesis test that compares the process performance baseline means of two data cycles. If the p value of the test is greater than 0.05, the means of the two cycles' project data are not statistically different. As the p value obtained here is greater than 0.05, the data in Phase 1 and Phase 2 are not significantly different; so, while there are visible improvements, statistically there is no change between the two data sets (Fig. 7.11).
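A two-sample t-test of this kind can be performed as follows; the Phase 1 and Phase 2 samples are invented so that, as in the discussion above, the improvement is visible but not statistically significant.

```python
from scipy import stats

phase1 = [0.44, 0.40, 0.47, 0.42, 0.45, 0.41]   # cycle-1 baseline data (hypothetical)
phase2 = [0.41, 0.39, 0.43, 0.40, 0.42, 0.38]   # cycle-2 data after improvements

t_stat, p_value = stats.ttest_ind(phase1, phase2)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# Here p > 0.05: the means are not statistically different despite the visible improvement
```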

Fig. 7.11 T-test


Fig. 7.12 F-test

7.6.2 Two-Sample F-Test
The F-test is a hypothesis test that compares the process performance baseline standard deviations of two data cycles. If the p value of the test is less than 0.05, the standard deviations of the two cycles' project data are statistically different (Fig. 7.12). Further, the process capability index is computed to measure the capability of the process performance. Process capability is measured using Cp and Cpk analysis.
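The variance comparison and the Cp/Cpk indices can be sketched as follows. SciPy does not ship a dedicated two-sample F-test, so the p-value is computed from the F distribution directly; the data and the specification limits are assumed for illustration.

```python
import numpy as np
from scipy import stats

phase1 = np.array([0.44, 0.40, 0.47, 0.42, 0.45, 0.41])
phase2 = np.array([0.41, 0.39, 0.43, 0.40, 0.42, 0.38])

# Two-sample F-test on the variances (two-sided p-value from the F distribution)
f = phase1.var(ddof=1) / phase2.var(ddof=1)
p = 2 * min(stats.f.cdf(f, len(phase1) - 1, len(phase2) - 1),
            stats.f.sf(f, len(phase1) - 1, len(phase2) - 1))
print(f"F = {f:.2f}, p = {p:.3f}")   # p < 0.05 would mean the spreads differ

# Process capability against assumed specification limits
usl, lsl = 0.50, 0.30
mu, sigma = phase2.mean(), phase2.std(ddof=1)
cp = (usl - lsl) / (6 * sigma)
cpk = min(usl - mu, mu - lsl) / (3 * sigma)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```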

7.7 Conclusion
Project management involves constantly taking decisive steps based on the progress of the project. The objective of project management is to have the capability to reliably predict cost, schedule and quality outcomes. Statistical methods provide the potential to greatly enhance management information for the purpose of management control. Statistical analysis techniques, i.e., the systematic analysis of quantitative project data, help to extract important information and let patterns and useful trends for the decision-making process emerge. Adoption of statistical processing gives accurate prediction of project progress and product quality. Causal analysis of project data helps


in identifying the root cause of issues and finding out how the improvement in the subprocesses shall be carried out. Further, the improvements are statistically validated using hypothesis testing. This has proved to be effective in appraising software projects.
Acknowledgements This work was carried out under the CMMI process implementation. The authors want to acknowledge Smt. Priti Razdan, Associate Director, and Smt. R. T. Sundari, C-DAC Noida, for supporting this work and for their valuable feedback.

References
1. Margarido IL, Faria JP, Vieira M, Vidal RM, Challenges in implementing CMMI high maturity: lessons learnt and recommendations
2. Stoddard RW, Ben Linders SEI, Sapp EM, Robins W, Exactly what are process performance models in the CMMI. Air Logistics Center, 12 June 07
3. Silveira M, Supporting the CMMI metrics framework thru level 5. EDS Electronic Data Systems do Brasil Ltd.; Sreenivasan S, Sundaram S, Process performance model for predicting delivered defect density in a software scrum project. Research Scholar at Anna University, Chennai
4. Wang X, Ren A, Liu X, Researching on quantitative project management plan and implementation method
5. Venkatesh J, Cherurveettil P, Thenmozhi S, Balasubramanie P (2012) Analyse usage of process performance models to predict customer satisfaction, 47(12)
6. India, vol 18(5), Ver. IV (Sep–Oct 2016), pp 60–73. e-ISSN: 2278-0661, p-ISSN: 2278-8727

Ms. Shalu Gupta is working as Principal Technical Officer in C-DAC Noida. She is certified Project Management Professional. She has done Masters in Computer Science and sixteen years of experience in software development. She has worked in the field of NMS, SNMP, Optical comm., DSLAM, OCR and Quality Assurance. She has worked in various companies like CDoT, Wipro Technology and Flextronics Software Systems. Currently, she is associated with the Quality Assurance Group. She has published ten international and national research papers. Her area of interest includes Software Quality Assurance, Software Metrics, Quality Management and Testing.


Ms. Deepshikha Bharti is working as a Project Engineer (IT) in C-DAC, Noida. She has done Masters in Electronics and Communication. She has four years of work experience in Quality Assurance. She has worked in the field of ISO9001, ISO27001, and Quality Assurance and has published nine research papers in a National and International conference. Her area of interest includes Software Quality Assurance, Software Metrics.

Chapter 8

Sensor-Based Smart Agriculture in India
Pradnya Vishwas Chitrao, Rajiv Divekar, and Sanchari Debgupta

8.1 Introduction
The chapter talks about various factors affecting agriculture in India. It examines how drones, or sensor-based technology, can help in overcoming the adverse effects of these factors. It also discusses the government's role in the use of drones for improving soil health and helping farmers to monitor their crops and ultimately get a good yield.

8.1.1 Drone Technology India even today is dependent to a great extent on its agriculture. Various factors like climate change, irregular rainfall, water scarcity, unavailability of electricity and many others cause many problems. Today remote technology can answer some of the problems caused by climate factors. One can use sensor-based wireless network in the form of drones for agriculture. Only a few years ago, drones were associated with warfare. Today, unmanned aerial vehicles or UAVs (nicknamed drones) are much talked about utility value in different fields like journalism, corporate, film making and social work. They are a form of robots and an outcome of the new Internet of Things era. Farmers consistently require precise and latest data related to the status of their crops and the environmental status of their farm. Aircrafts have been deployed right from the beginning of the twentieth century. Since then, agricultural experts have progressively utilized sensorbased instruments for determining crop health from the sky. UAVs are evidence of refined sensor technology and are useful at the micro-level or for small-scale farms. P. V. Chitrao (B) · R. Divekar · S. Debgupta Department of General Management, Symbiosis Institute of Management Studies (SIMS), Pune, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_8


8.2 Literature Review Drones have no onboard pilot and are self-propelled. They are variously known as unmanned aerial vehicles (UAVs), or unmanned aerial/aircraft systems (UASs; to cover elements that are ground based to the system) and sometimes as remotely piloted aircraft systems (RPAS). There can be two types of drones: The first are devices capable of autonomous flight called drones; a ground-based operator controls the second type and is called remotely piloted vehicles (RPVs). Drones were initially created for military purposes during the Second World War [4]. Recently, their usage has increased, thanks to the miniaturization and price reduction of sensory apparatus like cameras and GPS, which made feasible by the sector of smartphone [1]. Today, we can get different kinds of drones. They can be classified on the basis of variety, size, their life span and what other devices can be fitted on them [12]. The simplest classification is on the basis of whether they have devices that are fixed wing or rotary wing. Today, however, there are new types that have the feature of fixed as well as rotary wings. Drones’ capabilities depend on the number of instruments that can be mounted on them. Usually, equipment like video and still cameras (either thermal that are passively reflected or devices that have infrared radiation or convert thermal energy into electromagnetic energy), devices that have audio monitoring, loudspeakers, sprayers that spray liquid herbicides, electromechanical devices that measure acceleration forces, devices that emit light and satellite navigation system used to determine the position of objects on ground level. In reality, the effectiveness of drones is measured by how heavy they are and how much cargo they can carry—normally big sized drones can hold heavy equipment, while small ones can carry only very light devices. The twentieth century witnessed drones being used basically for operations conducted by armed forces, especially around the 1990s just after the first Gulf War took place [11]. But in the present century, drones are extensively being utilized for civil applications. In fact, the twenty-first century can be termed as “the drone age” [1]. Police forces in different parts of the world utilize drones for enforcing law in situations where they have to control crowds, observe people whom they suspect at night and patrol international borders [4]. Drones are also used for satellite farming [6, 9], for monitoring and managing fire (Merino et al. 2012) and for delivering medicines to regions inaccessible due to various reasons [5]. Drones are also used in some countries for commercial product delivery [2, 3]. The law monitors the way that drones are used in most countries. As per UK laws, data collected through overt monitoring (with CCTV) should be given to those being filmed when they ask for it. On the other hand, covert systems need to be first permitted under the Regulation of Investigatory Powers Act 2000 [4]. The US laws regarding small drones are more or less like the UK norms [16]. If large drones are required to be used, it is essential to get a certificate of authorization which is costly and tough to acquire [13]. The USA has also declared illegal the use of thermal image cameras for filming people without a warrant as per the Fourth Amendment


[4]. South Africa has formulated draft regulations regarding how drones are to be used. In India, the Ministry of Defense delayed the Ministry of Environment's plans to use drones at Kaziranga National Park [10]. In May 2016, the Director General of Civil Aviation (DGCA) issued norms for securing a unique identification number (UIN) and for securing permission to use a remotely piloted aircraft for civil purposes. The DGCA stated that this was because of the increase in the use of UAS for non-defense purposes such as assessing damage to property and life in areas hit by natural disasters, surveys, supervising infrastructure, commercial photography, and aerial mapping and inspection of ground areas. In March 2016, the Indian Government made it compulsory for people coming to India to declare drones in customs forms; after declaring them, users have to pay tax on the item. The rule was enforced from 1 April 2016. Drones are unmanned, so there is no possibility of a pilot getting injured in a crash; in that respect, drones are safer than piloted aircraft [7] (Rango et al. 2006). They are also less dangerous for people on the ground during a crash since they are not as big; as a result, they are capable of causing less damage when they crash than large manned aircraft [7]. Many drones have inbuilt safety features that enable them to abandon a pre-planned mission and come back if there are any problems. At the same time, because they are pilotless, there is a greater possibility of drones crashing than piloted aircraft [4, 8] and consequently injuring those on the ground. It is yet to be established whether drones are more of an advantage than a disadvantage when it comes to the question of secure operation. Agriculture plays a pivotal role in the supply of food. But various factors like climate change, increasing food requirements and inadequate water resources have created a need for modified agricultural methods. Technological progress in wireless sensor networks has reduced the size of the mechanisms, which in turn has facilitated their increased use in the field of agriculture. The trend thus is increasingly toward data-driven agriculture. Technology is increasingly being used to come up with innovations like drones or unmanned vehicles that are meant to increase the production of food with less labor. Drones enable farmers to have three kinds of elaborate views:
• First, they enable farmers to detect various problems, from issues with the application of controlled amounts of water to plants at needed intervals, to differences in soil quality, and even pest and fungal infections that are invisible to the human eye.
• Second, they are capable of capturing multi-spectral images, since they record data in both the visible and non-visible bands of the electromagnetic spectrum, and combine them to make a picture of the crop that highlights differences between healthy and infected plants. The naked eye cannot see these differences.
• Third, they are capable of surveying a crop every hour. They can combine these hourly, daily or weekly surveys to create a sequence of displays or visuals that depict variations in the crop and show areas of potential problems or opportunities for more efficient supervision of the crop.


8.2.1 Drones in Indian Agriculture In July 2016, the Indian Government launched a collaborative research project that uses drones in the farming sector for judging soil health and for settling compensation for losses on account of floods. The project aims to use drones, develop a locally assembled basic version for assessing whether the soil meets the range of ecosystem functions appropriate to its environment, and integrate it with optical and radar sensors for agricultural use on a very large scale in the coming years. The project is called "SENSAGRI: Sensor-based Smart Agriculture." The Indian Council of Agricultural Research (ICAR) has partnered with six institutes in order to execute this project, sponsored by the Indian Government, the Department of Electronics and Information Technology (DEITY), the Ministry of Communication and Information Technology, the Information Technology Research Academy (ITRA) and ICSR. Drones will shortly be monitoring farms in India, collecting the necessary geospatial data with the help of sophisticated sensing devices and cameras, and sending this data immediately. This machinery and equipment, developed from the application of scientific knowledge, will be utilized basically at the local or regional level in the farming sector in order to understand land and crop health. In other words, drones give the extent, type and severity of damage, send out warnings in advance and help decide the compensation to be paid as per crop insurance plans. An unmanned aerial vehicle applied to farming is basically a cost-effective aerial camera platform equipped with an autopilot (a system used to control the trajectory of the aircraft without constant "hands-on" control by a human operator), a satellite navigation system (GPS) to determine the ground position of an object, and sensors for gathering the required information, such as a regular compact camera or other sensors that are sensitive to visible or near-visible light. While a normal camera can give only some details about plant growth, coverage and miscellaneous features, sensors that measure reflected energy within several specific sections (bands) of the electromagnetic spectrum (multi-spectral sensors) are far more valuable and give the full value of sensor-based technology. They enable one to view things that one cannot view with the naked eye, such as the level of moisture in the soil, how healthy the plant is, its stress factors and the stage of its fruits. Today, small businesses (like the San Diego-based SlantRange) and also agricultural equipment manufacturing giants like John Deere are using information provided by aerial images, ranging from those visible to the naked eye to infrared images, to get an idea of the height of plants, the number of plants in the field, how healthy the crop is, its nutritional deficiencies and the presence of weed or pathogen infestation. Many farmers in India are already lagging behind because they cannot afford the new technology, which means that such drones will probably only be used by very


large farms. At the same time, drones are becoming cheaper. For Indian farmers today, drones are useful as cheap aerial cameras. Crop imaging by drones works out cheaper than crop imaging done by a human-piloted aircraft, which comes at a price of around $1000 an hour; farmers can now buy drones themselves for less than $1000 each. Drones give farmers the capability to determine more accurately the water content of soil, to detect irrigation and pest issues quickly and precisely, and to get an overall judgement of the health of the farm on a daily basis, or even on an hourly basis if so required. If farmers get comprehensive, precise data, they are in a position to reduce water usage and also the level of chemical content in our environment and in the food available to us. In short, this military technology can be developed into a green-tech tool.

8.3 Research Objective The researchers plan:
• To study the current state of sensor-based technologies in the Indian scenario.
• To find out the challenges of sensor-based agriculture techniques in India.
• To find out the various implementation measures in order to reduce cost.
• To examine the role of government in bringing about the use of sensor-based technologies in agriculture.

8.4 Research Methodology On account of a lack of funding or sponsorship, primary research was not possible; the study was therefore conducted as secondary research. At a later stage, the researchers would ideally like to conduct primary research in collaboration with another educational-cum-research institute in the field of agriculture, especially if research funds are allotted for the project.

8.5 Findings 8.5.1 Drones for Farmers “Drones” for most farmers denote basically a low-cost aerial camera. For them, drones function either as a small fixed-wing aircraft, such as an airplane that is capable of flight using wings that generate lift from the aircraft's forward airspeed and the


shape of the wings, or, more commonly, as a quadrotor helicopter (quadcopter) or other multi-bladed small helicopter. These aircraft are fitted with an autopilot, a system that controls the trajectory of the aircraft without continuous "hands-on" control by a human operator; they use the Global Positioning System (GPS) and a standard point-and-shoot camera with a small sensor and a fixed lens that is controlled by the autopilot. A set of computer programs can then be used at ground level to stitch the photos taken from this higher vantage point into a high-resolution image comparable to a satellite image. This is different from a conventional small flying machine that is controlled remotely by an operator on the ground using a handheld radio transmitter: in a typical drone manufactured for farmers, the software system executes all the flying, from automatic take-off to returning to the ground. The embedded software chalks out the flight path in a manner that ensures optimum coverage of the fields; it also monitors the camera in order to get optimal images that can be analyzed later. Agricultural drones are today being used like any other consumer devices. Farmers today want to use less water and fewer chemicals to kill pests and finally come up with better agricultural products. Higher quality information can enable the reduction of water usage as well as of chemical pesticides, which in turn reduces the toxins in our surroundings and our diet. From this perspective, what began as knowledge for defense purposes may actually result in solutions that are environment friendly.

8.5.2 Sensor-Based Technology Used for Agriculture in Some Experiments A study was conducted by SSAI (Sree Sai Aerotech Innovations Pvt. Ltd [14]), a company in Chennai, to understand how efficient drones are in yielding aerial imagery of plant health. The company selected one hectare of a paddy farm belonging to one of its key team members. The experiment clearly showed that aerial imagery can provide a range of inputs for the average landholding size of an Indian farmer, i.e., about one hectare, with just a single image. Since the images were acquired at the pre-harvest stage of the cropping cycle, they can add value to a farmer's knowledge of both the technology and a complete cropping cycle. Each plant can be seen separately from the air on account of the contrast in the image between the color of the ground and the chlorophyll color of the plant. Both formal and informal waterways could also be mapped along with the gradient of the field to optimize irrigation to plants. One can also check in the images whether there is weed growth within the interplant gap at the beginning and middle of the cropping season, as it is clearly visible in the aerial image. The aerial imagery also helped determine whether the crops were ready for harvesting. In another survey, a wireless sensor network (a group of spatially dispersed, dedicated sensors that monitor and record the physical conditions of the environment and organize the collected data at a central location) was employed for a groundnut crop field to find


out any relation between crop growth and the weather environment. The technology was utilized to procure information at the local level. The procedure was carried out in a semi-arid region of India, where precipitation is below potential evapotranspiration but not as low as in a desert climate. The technology was indeed useful for getting the necessary information [15]. Drone and robotic technology started off as a means to take over certain farm tasks and to promote "precision agriculture." Initially, just having an aerial view of one's farm added value, as farmers could pinpoint irrigation leaks, leaf color variation or pests. But gradually, farmers started complaining that by the time the drone images told them about certain issues, it was too late to rectify the situation. Farmers started demanding "actionable intelligence." So drone manufacturers have started manufacturing drones that give crop health maps showing areas of potential yield loss with the help of the Normalized Difference Vegetation Index (NDVI). Some drone manufacturers now also provide solutions to the problems detected.
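Since the crop health maps mentioned above are built on the NDVI, a short illustration may help. The index is computed per pixel as NDVI = (NIR − Red)/(NIR + Red), where NIR and Red are the reflectances in the near-infrared and red bands. The following is a minimal Python sketch; the band values are hypothetical and the code is not taken from any of the cited projects.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Guard against division by zero for pixels with no reflectance in either band.
    return np.where(denom == 0, 0.0, (nir - red) / denom)

# Hypothetical reflectance patches: the first two columns are healthy, the last is stressed.
nir_band = [[0.60, 0.55, 0.20],
            [0.58, 0.50, 0.18]]
red_band = [[0.10, 0.12, 0.15],
            [0.11, 0.13, 0.16]]

print(np.round(ndvi(nir_band, red_band), 2))
# Values close to +1 indicate dense healthy vegetation; values near 0 suggest bare soil or stress.
```

A crop health map is essentially this index rendered over the whole field, with low-NDVI zones flagged as areas of potential yield loss.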

8.6 Recommendations The researchers recommend the following: The WSN system must function independently and should last for a long period of time. The collection of information should enable the system to respond promptly to constantly changing events. It is necessary for the system to operate for a long period of time with low maintenance expenditure. Also, the actual users of the mechanism do not have technical knowledge, so it must not be difficult to use. The main issue with WSN design is the amount of energy or power used, so the design should require less energy. Analyzing the immense amount of information in a WSN is one of the major challenges, because the information obtained is unstructured. There is a need to pre-process the information recorded by the sensor nodes themselves, so that whatever information is sent to the server is structured as well as easy to analyze (a simple sketch of such node-level aggregation is given below). Also, in some applications, the information elicited is of no use. To reduce repetition, techniques like integrating multiple data sources are used, which results in only the important data being sent. Finally, drone manufacturers today should move from just providing pictures toward analysis of the data, and should also recommend solutions. As a stand-alone product, drones are viewed as an unnecessary cost and have little demand from poor farmers, who deal with a variety of productivity constraints in addition to weather risks.
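The node-level pre-processing recommended above can be as simple as summarizing each batch of raw readings before transmission. The sketch below is only illustrative; the field names and threshold are assumptions and not part of any specific WSN or SENSAGRI design.

```python
from statistics import mean

def summarize_batch(readings, spread_threshold=0.05):
    """Condense raw sensor samples into a small, structured record for the server."""
    summary = {
        "mean": round(mean(readings), 3),
        "min": min(readings),
        "max": max(readings),
        "count": len(readings),
    }
    # Flag the batch only when the spread is large enough to warrant a closer look,
    # so routine, redundant readings do not need to be transmitted individually.
    summary["alert"] = (summary["max"] - summary["min"]) > spread_threshold
    return summary

# Hypothetical soil-moisture samples collected during one reporting interval.
print(summarize_batch([0.31, 0.33, 0.30, 0.32, 0.34]))
```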


8.7 Significance of the Study The benefits of the study proposed by the researchers are immense. Sequential data coming from such technology can be used to predict crop yields and will thereby enable farmers to plan their finances; it will also help the government to declare relief compensation or grants on a pre-emptive basis. This technology will be very useful for nations like India, where crop failures are frequent and random, and are tough to manage on account of their extensive magnitude. Often, the government becomes aware of the problem too late; by the time relief measures are put into action, the issue becomes unmanageable and affects many people.

References 1. Anderson C (2012) Here come the drones! August issue: Wired Magazine 2. Arthur C (2014) Amazon seeks US permission to test Prime Air delivery drones. The Guardian, 11 July. http://www.theguardian.com/technology/2014/jul/11/amazon-prime-airdelivery-drones. Accessed 10 Sept 2015 3. Domino’s Pizza (2014) Introducing the Domino’s DomiCopter! https://www.youtube.com/ watch?v=on4DRTUvst0. Accessed 3 Sept 2014 4. Finn RL, Wright D (2012) Unmanned aircraft systems: surveillance, ethics and privacy in civil applications. Comput Law Secur Rev 28:184–194 5. Hickey S (2014) Humanitarian drones to deliver medical supplies to roadless areas. The Guardian, 30 Mar 6. Hunt ER, Hively WD, Fujikawa SJ, Linden DS, Daughtry CST, McCarty GW (2010) Acquisition of NIR-green-blue digital photographs from unmanned aircraft for crop monitoring. Remote Sens 2:290–305 7. Jones GP, Pearlstine LG, Percival HF (2006) An assessment of small unmanned aerial vehicles for wildlife research. Wildl Soc Bull 34:750–758 8. Lee H-T, Meyn LA, Kim S (2013) Probabilistic safety assessment of unmanned aerial system operations. J Guid Control Dyn 36:610–617 9. Lelong CCD, Burger P, Jubelin G, Roux B, Labbe S, Baret F (2008) Assessment of unmanned aerial vehicles imagery for quantitative monitoring of wheat crop in small plots. Sensors 8:3557–3585 10. Naveen T (2014) Drones to keep eye on Panna tigers grounded by Wildlife Institute of India. The Times of India. http://timesofindia.indiatimes.com/city/bhopal/Drones-to-keep-eye-onPanna-tigersgrounded-by-Wildlife-Institute-of-India/articleshow/34441902.cms. Accessed 12 Sept 2014 11. Nonami K (2007) Prospect and recent research and development for civil use autonomous unmanned aircraft as UAV and MAV. J Syst Des Dyn 1:120–128 12. Paneque-Galvez J, McCall MK, Napoletano BM, Wich SA, Koh LP (2014) Small drones for community-based forest monitoring: an assessment of their feasibility and potential in tropical areas. Forests 5:1481–1507 13. Rango A, Laliberte AS (2010) Impact of flight regulations on effective use of unmanned aircraft systems for natural resources applications. J Appl Remote Sens 4:043539-1–043539-12. https:// doi.org/10.1117/1.3474649 14. Sree Sai Aerotech Innovations Pvt. Ltd (2016) Case study: drone based precision agriculture


15. Tripathy A, Adinarayana J, Vijayalakshmi K, Desai SMU, Ninomiya S, Hirafuji M, Kiura T (2014) Knowledge discovery and leaf spot dynamics of groundnut crop through wireless sensor network and data mining techniques. Comput Electron Agric 107:104–114 16. Watts AC, Perry JH, Smith SE, Burgess MA, Wilkinson BE, Szantoi Z, Ifju PG, Percival HF (2010) Small unmanned aircraft systems for low-altitude aerial surveys. J Wildl Manag 74:1614–1619

Chapter 9

Integrating UTAUT with Trust and Perceived Benefits to Explain User Adoption of Mobile Payments Rishi Manrai and Kriti Priya Gupta

9.1 Introduction and Literature Review Mobile technology has brought a revolution in the global banking system [1]. Along with convenience, it also helps banks to reach out to their large pool of unbanked or underbanked customers, especially in the urban markets. This not only helps governments in financial inclusion, but also supports governments of developing nations, like India, in dispersing wages, salaries and other benefits to the accounts of the right people. Various countries around the globe have followed different models for financial inclusion because of country-specific regulatory and financial infrastructure and customer requirements. Most of them have been successful in achieving it, but others are still struggling to achieve financial inclusion. It is quite certain, however, that mobile payments will definitely be a significant tool in achieving it. A report by Business Insider Intelligence [2] claims that the usage of mobile payments is expected to reach $503 billion by 2020. According to another survey conducted by Statista [3], the total global revenue from the mobile payment market will reach $930 billion by the end of 2018. Besides bringing in financial inclusion, the mobile payment industry brings in a range of facilities like business account transfers, customer-to-customer transfers, bill payments, proximity payments at the point of sale, and remote payments to buy goods and services. In emerging markets, mobile payments are growing faster than traditional payment mechanisms. China had, surprisingly, surpassed the US market as the largest global mobile payment market in 2015, with a total of $225 billion in transactions [4].


Looking into the importance of mobile payment technology, it becomes very important to understand the factors which promote the behavioral intention (BI) to use it and the factors which retard its growth. Previous studies have tried to find out various parameters which affect the adoption of mobile payments, like reduced effort, improvement in performance, and benefits. On the contrary, factors like risk and lack of trust in the service provider or the service itself have been shown to affect the behavioral intention to adopt mobile payments negatively. Several technology adoption models, like UTAUT and DoI, have been employed in the past to examine the role of customers, perceptions, intentions, and other factors influencing the citizen's behavior toward mobile banking/mobile payment. Using extended models like UTAUT, researchers like Alam (2014) [5] concluded that social influence (SI), performance expectancy (PE), effort expectancy (EE), facilitating conditions (FC), and perceived financial cost significantly affect individual intention to adopt mobile banking. Some other significant studies in the domain of technology adoption were done by combining the constructs of UTAUT with trust. Their results were significant and delineated a positive relationship of PE, EE, FC, and trust with behavioral intention to adopt mobile payments [6]. Trust, as per various studies, has been reported as a significant determinant in the adoption of new technology, which includes transactional assurance, protection of information, escrow services, etc. While adopting a new technology, users with a higher tendency to trust will adopt the service easily and with confidence [7]. Despite several attempts to explain technology adoption in the past by extending models and assimilating them with other constructs, there is considerable room for a systematic exploration of significant factors that apply to a customer context which is voluntary in nature. In this research, an attempt is made to better understand a well-established theory in the domain of technology adoption, i.e., the unified theory of acceptance and use of technology (UTAUT), by integrating it with perceived benefits as well as trust, which have emerged as significant factors in previous studies [8].

9.2 Hypothesis and Model Development 9.2.1 Unified Theory of Acceptance and Use of Technology (UTAUT) The UTAUT is a comprehensive model proposed by Venkatesh et al. [9] which is an amalgamation of various important theories and models of the past. This theory proposes that among the various factors affecting the adoption of a new technology, effort expectancy, performance expectancy, facilitating conditions and social influence play very important roles. The term effort expectancy (EE) is described as the degree to which a technology brings comfort to a user [9]. Since there is a range of facilities


provided by mobile payments, it certainly reduces the user's effort, thus supporting its adoption [10]. Thus, the hypothesis:
H1: Effort expectancy (EE) has a positive and significant impact on behavioral intention to adopt mobile payments.
The term performance expectancy (PE) means the degree to which a technology would help a user to enhance his/her job performance [9]. Being one of the crucial factors for a citizen to accept online banking [11], mobile banking [12, 13], and mobile payments [10], an increase in PE tends to increase the BI to adopt mobile payments. Thus, the hypothesis:
H2: Performance expectancy (PE) has a positive and significant impact on behavioral intention to adopt mobile payments.
The next critical factor of the study is facilitating conditions (FC), which refers to the extent to which the user believes that an organizational and technical structure exists to support the use of the payment system [9]. Usually, the usage of mobile banking and mobile payment technologies requires a specific skill set, resources, and technical infrastructure [14, 15, 17]. Thus, the hypothesis:
H3: Facilitating conditions (FC) have a positive and significant impact on behavioral intention to adopt mobile payments.
The term social influence (SI) means the degree to which a citizen perceives that people important to him/her believe he/she should use the new system [9]. In other words, the information and encouragement provided by his/her friends, family members, colleagues, superiors, etc., who use the technology could play a crucial role in raising the customers' awareness as well as their intention toward adopting the technology [14–17]. Thus, the hypothesis:
H4: Social influence (SI) has a positive and significant impact on behavioral intention to adopt mobile payments.

9.2.2 Perceived Benefits (PBN) According to Lee [18], there are two main types of perceived benefits (PBN) in the context of online banking, which can be considered as direct and indirect advantages. Direct advantages refer to immediate and real benefits such as a wider range of monetary benefits, faster operational speed, and increased information transparency. Indirect advantages refer to less tangible benefits, e.g., accessibility, 24 × 7 service, and more investment opportunities and services [19]. Lee [18] also found that the customer's intention to use mobile payments is positively affected by perceived benefits. Akturan [20] found a significant positive influence of PBN on the behavioral intention to use mobile banking. Accordingly, the present study proposes the following hypothesis:
H5: Perceived benefits have a positive impact on behavioral intention to adopt mobile payments.


9.2.3 Trust According to research conducted by Erdem et al. [21], trust means that a service/service provider is dependable and has the required know-how to carry out transactions. This construct is also referred to as perceived credibility. For instance, Wang [22] has defined perceived credibility as "the degree to which a potential customer believes that the service will be free of security and privacy threats." Lack of perceived credibility, in the case of mobile banking adoption, leads to consumers' worries that their personal information and/or money might get transferred to someone else's account without their knowledge [23]. Past studies suggest that trust has two important aspects, i.e., trust in the service and trust in the service provider. Past studies also indicate that people do not adopt mobile banking and mobile payments due to security issues and a lack of perceived credibility and trust [24–26]. Thus, the hypotheses:
H6: Trust in service has a positive impact on behavioral intention to adopt mobile payments.
H7: Trust in service provider has a positive impact on behavioral intention to adopt mobile payments.
Based on these hypotheses, the research suggests the conceptual framework shown in Fig. 9.1.

Fig. 9.1 Conceptual framework: trust in service, trust in service provider, facilitating conditions, effort expectancy, performance expectancy, social influence, and perceived benefits are modeled as determinants of behavioral intention to adopt mobile payments.

9.3 Research Methodology To achieve the objectives in the study, both exploratory and descriptive research designs were used.


9.3.1 Research Design and Research Method A structured questionnaire was formed and used to collect data. The questionnaire was first pilot tested, and after checking the reliability and validity of the items, it was used for the final data collection. The data from the pilot test were not used in the final phase of data collection, to prevent skewing of the data. Exploratory factor analysis (EFA) was then used to derive significant factors, and multiple regression analysis (MRA) was used to confirm the relationship of the dependent variable with the independent variables.
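As a rough illustration of the regression step described here, the following Python sketch regresses behavioral intention on the seven construct scores. It is not the authors' code; the file name, column names, and choice of statsmodels are assumptions made for the example.

```python
import pandas as pd
import statsmodels.api as sm

# Assumed layout: one row per respondent, one column per construct score plus BI.
df = pd.read_csv("survey_scores.csv")
predictors = ["PE", "EE", "FC", "SI", "PBN", "TS", "TSP"]

X = sm.add_constant(df[predictors])   # add an intercept term
y = df["BI"]

model = sm.OLS(y, X).fit()            # multiple regression of BI on the seven constructs
print(model.summary())                # coefficients, t-statistics, and R-squared for H1-H7
```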

9.3.2 Survey Location and Target Respondents For testing the theoretical constructs, a survey was done in New Delhi, the capital city of the country. The study targeted four hundred respondents (out of which the researchers could get 341 correct responses): small businessmen, including fruit and vegetable sellers and small shopkeepers, and migrant laborers, including taxi drivers, auto-rickshaw drivers, manual rickshaw drivers, and domestic helpers, working in various parts of New Delhi. These respondents were basically unbanked or underbanked customers with common characteristics like lower family incomes, poor educational background, unwillingness to take financial risks, a greater likelihood of using alternative financial service providers, and so on [27]. The data were collected by using convenience sampling during March–April 2018. Out of the collected data, 341 responses were found complete and usable for the analysis.

9.3.3 Questionnaire Design Measurement items for performance expectancy, effort expectancy, social influence, facilitating conditions, and behavioral intention were adapted from [28, 29]. Items for perceived credibility were adapted from [12, 16, 24, 26]. Items for trust were adapted from Belanger et al. [29]. Each item was measured on a seven-point Likert scale, ranging from 1 (totally disagree) to 7 (totally agree). The questionnaire also included two demographic questions on age and gender (Table 9.1).


Table 9.1 Constructs and items of questionnaire used for primary data collection

1. Performance expectancy [28]
   PE1: I find mobile payment services valuable in my daily life
   PE2: Using mobile payment services increases my probability of achieving tasks that are important to me
   PE3: Using mobile payment services helps me achieve results quickly
   PE4: Using mobile payments increases my efficiency

2. Effort expectancy
   EE1: My interface with mobile payment service is clear and understandable
   EE2: I find mobile payment service usage stress-free
   EE3: It is easy for me to learn and become skillful at using mobile payment service
   EE4: Learning the use of mobile payment is easy for me

3. Facilitating conditions
   FC1: I have the required resources to use mobile payment services
   FC2: I have the required knowledge to use mobile payment services
   FC3: Mobile banking/payment services are well-matched with other technologies I use
   FC4: I am helped by others at times I have difficulties using mobile payment bank services

4. Social influence
   SI1: People important to me believe that I should use mobile payment services
   SI2: People who influence my behavior believe that I should use mobile payment services
   SI3: People whose opinions are valuable to me prefer that I should use mobile payment services

5. Perceived benefits [16, 18]
   PBN1: Mobile payment services will give me greater control over my banking and financial transactions
   PBN2: Using mobile payment services will be easy for me because I will not have to go to a bank
   PBN3: I think mobile payment services can help me in avoiding many unnecessary hassles
   PBN4: Using mobile payment will help me economize my time in performing banking and financial transactions

6. Trust in service [29]
   TS1: I do not find any risk of financial losses using mobile payment services
   TS2: I do not find any risk of personal information theft while using mobile payment services
   TS3: My personal mobile phone information is secure during my usage of mobile payment services

7. Trust in service provider [29]
   TSP1: My mobile payment service provider is famous for its suitability
   TSP2: The services delivered by the payment service provider are of great quality
   TSP3: My mobile payment service provider bank is a secure institution

8. Behavioral intention [29]
   BI1: I intend to use mobile payment services in the future
   BI2: I will be motivated to use mobile payment services in my daily life

9.4 Results of the Study 9.4.1 Sample Profile A total of 341 valid responses were received out of 400 circulated questionnaires, which accounted for a response rate of about 85%. Out of the total respondents, 55.45% were small businessmen and 44.54% were migrant laborers. As far as the demographic characteristics are concerned, 73.52% of the respondents were males and 26.48% were females. The age varied from 18 to 66 years, and the average age was 34 years.

Table 9.2 KMO and Bartlett's test

Kaiser–Meyer–Olkin measure of sampling adequacy: 0.885
Bartlett's test of sphericity: Approx. Chi-Square = 14,095.496; df = 325; Sig. = 0.000

9.4.2 Validity and Reliability The items of the survey questionnaire were based on an extensive literature review to maintain content validity. The questionnaire was then pilot tested on academicians, policymakers, potential customers, and people already using mobile payments/Internet banking. Further, the construct validity was assessed by using factor analysis. Table 9.2 shows the results of the Kaiser–Meyer–Olkin (KMO) measure and Bartlett's test of sphericity. The results of Bartlett's test were found to be significant, indicating that factor analysis was feasible. Also, the KMO measure was large enough to support the adequacy of the data for factor analysis.
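For readers who want to reproduce these diagnostics outside SPSS, the factor_analyzer package in Python offers equivalent tests. The sketch below is a hedged illustration; the input file and column layout are assumptions.

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Assumed: item-level Likert responses, one column per questionnaire item used in the EFA.
items = pd.read_csv("item_responses.csv")

chi_square, p_value = calculate_bartlett_sphericity(items)
kmo_per_item, kmo_overall = calculate_kmo(items)

print(f"Bartlett's test: chi-square = {chi_square:.3f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_overall:.3f}")  # values above 0.8 indicate good sampling adequacy
```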

9.4.3 Model Testing Table 9.3 shows the variance as well as the eigenvalues of the seven factors obtained through exploratory factor analysis. The Statistical Package for the Social Sciences (SPSS 22) was used to conduct the exploratory factor analysis. The EFA extracted seven significant factors which together explain about 92% of the variance. The first three factors, i.e., PBN, EE, and PE, explain about 45% of the variance, which reflects their significance in the model. Table 9.4 shows the factor loadings of all the factors obtained through EFA. All the factor loadings obtained are greater than 0.5, which indicates their high correlation with the corresponding factors [30].
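A comparable extraction and rotation can be sketched in Python with the factor_analyzer package; the results will approximate, not exactly reproduce, the SPSS output shown in Tables 9.3 and 9.4. The item names and input file are assumptions.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("item_responses.csv")        # assumed item-level responses

fa = FactorAnalyzer(n_factors=7, rotation="varimax", method="principal")
fa.fit(items)

# Rotated loadings, analogous to Table 9.4 (loadings above 0.5 are usually retained).
loadings = pd.DataFrame(fa.loadings_, index=items.columns).round(3)
print(loadings)

# Variance explained per factor and cumulatively, analogous to Table 9.3.
ss_loadings, proportion, cumulative = fa.get_factor_variance()
print(pd.DataFrame({"SS loadings": ss_loadings,
                    "Proportion": proportion,
                    "Cumulative": cumulative}).round(3))
```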

Table 9.3 Eigenvalues and variance

Factor | Eigenvalue | Percent of variance | Cumulative percent of variance
1      | 11.998     | 15.178              | 15.178
2      | 4.051      | 15.095              | 30.273
3      | 2.595      | 14.692              | 44.965
4      | 1.803      | 13.483              | 58.448
5      | 1.546      | 11.769              | 70.217
6      | 1.029      | 11.057              | 81.274
7      | 1.001      | 10.708              | 91.982


Table 9.4 Factor loadings

Component 1 (Perceived benefits): PBN1 = 0.916, PBN2 = 0.940, PBN3 = 0.923, PBN4 = 0.926
Component 2 (Effort expectancy): EE1 = 0.844, EE2 = 0.848, EE3 = 0.844, EE4 = 0.863
Component 3 (Performance expectancy): PE1 = 0.773, PE2 = 0.860, PE3 = 0.866, PE4 = 0.833
Component 4 (Facilitating conditions): FC1 = 0.790, FC2 = 0.818, FC3 = 0.804, FC4 = 0.809
Component 5 (Social influence): SI1 = 0.948, SI2 = 0.953, SI3 = 0.958
Component 6 (Trust in service provider): TSP2 = 0.828, TSP3 = 0.831, TSP4 = 0.843
Component 7 (Trust in service): TS1 = 0.925, TS2 = 0.955, TS3 = 0.928

PBN and TS, as seen from the table, achieved the highest factor loadings. Table 9.5 shows the number of items per construct and the respective Cronbach's alpha values. All values obtained are above the critical value of 0.7, which establishes the reliability of the model constructs. Table 9.6 shows the results of the correlation matrix. There is a strong correlation between EE and FC. A positive and strong relation is found between EE and FC on the one hand and SI and PBN on the other. A negative but weak relationship exists between PE and EE, FC, SI as well as PBN.


Table 9.5 Reliability of model constructs

Construct | No. of items | Cronbach's alpha
Effort expectancy | 4 | 0.975
Performance expectancy | 4 | 0.952
Facilitating conditions | 4 | 0.979
Social influence | 3 | 0.984
Perceived benefits | 4 | 0.980
Trust in service | 3 | 0.958
Trust in service provider | 3 | 0.946
Behavioral intentions | 2 | 0.970
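The alpha values in Table 9.5 can be checked with the standard formula alpha = k/(k − 1) · (1 − sum of item variances / variance of the summed scale). The sketch below is a minimal illustration with assumed column names and input file.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of items belonging to one construct."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()      # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the summed scale
    return (k / (k - 1)) * (1 - item_var / total_var)

responses = pd.read_csv("item_responses.csv")       # assumed item-level responses
for name, cols in {"Effort expectancy": ["EE1", "EE2", "EE3", "EE4"],
                   "Trust in service": ["TS1", "TS2", "TS3"]}.items():
    print(name, round(cronbach_alpha(responses[cols]), 3))   # compare against Table 9.5
```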

Table 9.6 Correlation table

    | PE | EE       | FC       | SI       | TS       | TSP     | PBN
PE  | 1  | −0.253** | −0.375** | −0.430** | −0.140** | 0.376** | −0.330**
EE  |    | 1        | 0.594**  | 0.664**  | 0.229**  | −0.079  | 0.659**
FC  |    |          | 1        | 0.692**  | 0.321**  | −0.127* | 0.606**
SI  |    |          |          | 1        | 0.257**  | −0.098  | 0.647**
TS  |    |          |          |          | 1        | 0.111*  | 0.446**
TSP |    |          |          |          |          | 1       | −0.020
PBN |    |          |          |          |          |         | 1

* signifies the p < 0.05 level of significance and ** represents the p < 0.01 level of significance



however, Family and Household Coordination was loading along with the factors of Expansion, Personal Information, Non-Personal Information, and E-Commerce. The analysis further reveals that, overall, within intention to use, while inter-use does not have any significant influence on actual usage, intra-use has a positive influence on actual usage. This means that the intention to use the mobile for relating with the world outside (inter) does not influence actual usage. However, personal use and personal choices (intra-uses) toward managing, sourcing, and using information for enhanced productivity, efficiency and convenience, for leisure, and for mobile-enabled shopping have a significant influence in determining actual usage. The study indicates that there is a need to develop a new model to study current and future complex mobile uses. The proposed model is depicted in Fig. 16.6. The study further examined the effects of gender and age. The results indicate that there exist gender and age differences in the influence of intention to use on actual usage, but only for intra-intention to use, which was stronger for men than for women and was strongest in the age group of 31–40 years. The analysis also reveals that age had a stronger influence on intra-intention to use than gender. This is also explained by the fact that the millennial generation across genders has been exposed to digital technologies at much younger ages, and therefore their adoption of and intention to use tasks, features, and services are much stronger than among professionals above 40 years of age.

Fig. 16.6 Proposed new model


16.6 Limitations, Implications, and Future Research The study has used responses only from NCR Delhi. Future studies can include more diverse samples that could be more representative. Besides, since only data from self-reported usage has been used, there is a possibility of a difference between reported and observed usage. It is also to be noted that this study cannot be generalized, and the findings should be taken as indicative in the context of the specific realities of the period of study. The theoretical implications of this study provide an insight into and an overview of the age and gender patterns in mobile usage, and there is scope for further in-depth research to bring a greater understanding of the causes and consequences. The new approach of deploying usage space dimensions for studying the influence of intention to use on actual usage of mobile phones at a practical level has been verified by this study. It provides a better, easier understanding of mobile usage that is not limited to technical specifications, features and applications, but gives an analysis of how mobiles are adopted and deployed for navigating everyday life. Understanding the tasks, services, and application usage in real time, along with age and gender usage patterns among corporate users, is the first important step, and presents an opportunity for mobile service providers, manufacturers, and application developers toward designing applications and systems that are tailor-made for these users. It also helps give a picture of the potential adoption of new mobile features and applications. From an organizational perspective, in the current digital workplace scenario, the reported mobile usage data provides organizations with insights for rethinking and realigning mobile use among their employees toward organizational coordination and task performance, and toward harmonizing the professional and the personal in everyday living across age and gender. It also provides pointers for organizations on how to build and create a digital culture to keep pace with the contemporary digitally integrated workforce. The future scope for research points to a qualitative in-depth study to seek answers to the complex interplay of adoption, rejection, adjustment, prioritization, and the use of creative options among users. This will help in understanding how users adapt less suitable technology among the features and functions in the mobile and make it suitable to their specific needs in everyday life. Such studies would be useful for the mobile industry as well as for academic purposes.


Appendix: Reported Usage Across Usage Space Dimensions


References 1. Taipale S, Fortunati L (2014) Capturing methodological trends in mobile communication studies. Inf Commun Soc 17(5):627–642 2. Jacobson RB, Mortensen RP, Cialdini CR (2011) Bodies obliged and unbound: differentiated response tendencies for injunctive and descriptive social norms. J Pers Soc Psychol 100(3):433– 448 3. Campbell SW, Ling R, Bayer JB (2014) The structural transformation of mobile communication. In: Oliver MB, Raney AA (eds) Media and social life. Routledge, New York 4. Ling R (2004) The mobile connection: the cell phone’s impact on society. Morgan Kaufmann, San Francisco 5. Ling R (2012) Taken for grantedness: the embedding of mobile communication into society. MIT Press, Cambridge 6. Ling R (2010) New tech, new ties: how mobile communication is reshaping social cohesion. MIT Press, Cambridge 7. Farman J (2012) Mobile interface theory. Routledge, New York 8. Ito M, Okabe D, Anderson K (2010) Portable objects in three global cities: the personalization of urban places. In: Ling R, Campbell SW (eds) The reconstruction of space and time: mobile communication practices. Transaction Publishers, New Brunswick, pp 67–87 9. Humphreys L (2005) Cellphones in public: social interactions in a wireless era. New Media Soc 7(6):810–833 10. Barkhuus L, Polichar VE (2011) Empowerment through seamfulness: smart phones in everyday life. Pers Ubiquitous Comput 15(6):629–639 11. Hislop D, Axtell C (2011) Mobile phones during work and non-work time: a case study of mobile, non-managerial workers. Inf Organ 21:41–56 12. Townsend K, Batchelor L (2005) Managing mobile phones: a work/non-work collision in small business. New Technol Work Employ 20(3):259–267 13. Sarker S, Sarker S, Xiao X, Ahuja M (2012) Managing employees’ use of mobile technologies to minimize work-life balance impacts. MIS Q Exec 11(4):143–157 14. Lanaj K, Johnson RE, Barnes CM (2014) Beginning the workday yet already depleted? Consequences of late-night smartphone use and sleep. Organ Behav Hum Decis Process 124(1):11–23 15. Kumar G, Prakash N (2016) A cross sectional study on mobile phone perceptions, usage and impact among urban women and men. Int J Interdiscip Multidiscip Stud 3(4):80–92 16. Palackal A, Nyaga Mbatia P, Dzorgbo D-B, Duque RB, Ynalvez MA, Shrum WM (2011) Are mobile phones changing social networks? A longitudinal study of core networks in Kerala. New Media Soc 13(3):391–410 17. Grover S, Basu D, Chakraborty K (2010) Pattern of Internet use among professionals in India: critical look at a surprising survey result. Ind Psychiatry J 19(2):94 18. Chen JV, Yen DC, Chen K (2009) The acceptance and diffusion of the innovative smart phone use: a case study of a delivery service company in logistics. Inf Manag 46(4):241–248 19. Ahn H, Wijaya ME, Esmero BC (2014) A systemic smartphone usage pattern analysis: focusing on smartphone addiction issue. Int J Multimed Ubiquitous Eng 9(6):9–14 20. Falaki H, Mahajan R, Kandula S, Lymberopoulos D, Govindan R, Estrin D (2010) Diversity in smartphone usage. In: Proceedings of the 8th international conference on mobile systems, applications, and services, MobiSys’10, pp 179–194 21. Kang J-M, Seo S-S, Hong JW-K (2011) Usage pattern analysis of smartphones. In: Network operations and management symposium (APNOMS), pp 1–8 22. Froehlich J (2007) My experience: a system for in situ tracing and capturing of user feedback on mobile phones. In: 5th international conference on mobile systems, applications and services. ACM 23. 
Venkatesh V, Morris MG, Davis GB, Davis FD (2003) User acceptance of information technology: toward a unified view. MIS Q 27(3):425–478


24. Singh T, Sharma A, Singh N (2015) Digital library acceptance model and its social construction: conceptualization and development. J Web Librariansh 9(4):162–181 25. Zhan Y, Wang P, Xia S (2011) Exploring the drivers for ICT adoption in government organization in China. In: 2011 fourth international conference on business intelligence and financial engineering (BIFE), pp 220–223 26. Anderson JE, Schwager PH (2004) SME adoption of wireless LAN technology: applying the UTAUT model. In: Proceedings of the 7th annual conference of the southern association for information systems, vol 7, pp 39–43 27. Birch A, Irvine V (2009) Preservice teachers’ acceptance of ICT integration in the classroom: applying the UTAUT model. EMI Educ Media Int 46(4):295–315 28. Venkatesh V, Sykes TA, Zhang X (2011) Just what the doctor ordered: a revised UTAUT for EMR system adoption and use by doctors. In: 44th Hawaii international conference on system sciences, pp 1–10 29. Ifinedo P (2012) Technology acceptance by health professionals in Canada: an analysis with a modified UTAUT model. In: 45th Hawaii international conference on system science (HICSS), pp 2937–2946 30. Ahmad MI (2014) Unified theory of acceptance and use of technology. In: Fourth international conference on ICT in our lives 2014, pp 1–4 31. Ling R, Yttri B (1999) Nobody sits at home and waits for the telephone to ring: micro and hyper coordination through the use of the mobile phone. Kjeller 32. Han Sze Tjong S, Weber I, Sternberg J (2003) Mobile youth culture: shaping telephone use in Australia and Singapore. In: ANZCA03 conference 33. Tamminen S, Oulasvirta A, Toiskallio K, Kankainen A (2004) Understanding mobile contexts. Pers Ubiquitous Comput 8(2):135–143 34. Katz JE, Sugiyama S (2005) Mobile phones as fashion statements: the co-creation of mobile communication’s public meaning. In: Ling R, Pedersen P (eds) Mobile communications: renegotiation of the social sphere. Springer, Surrey, pp 63–81 35. Campbell S, Russo T (2003) The social construction of mobile telephony: an application of the social influence model to perceptions and uses of mobile phones within personal communication networks. Commun Monogr 70(4):317–34 36. Marcus A, Chen E (2002) Designing the PDA of the future. Interactions 9(1):34–44 37. Marcus A, Chen E (2002) Development of a future wireless information device. In: Re: wireless, mobile device design seminar, pp 1–7 38. Marcus A (2005) Tutorial: mobile user-interface design for work, home, play and on the way. In: CHI-SA 39. Marcus A (2005) Tutorial: cross-cultural user-interface design for work, home, play and on the way. Paper presented at the CHI-SA 40. van Biljon JA (2006) A model for representing the motivational and cultural factors that influence mobile phone usage variety 41. Davis FD (1989) Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q 13(3):319–340 42. Institute of Human Development (2013) Delhi human development report 43. Ericsson Consumer Lab (2014) Performance shapes smartphone behavior. Understanding mobile broadband user expectations in India 44. Singh P (2012) Smartphone: the emerging gadget of choice for the urban Indian: delivering consumer clarity. Nielsen Featured Insights 45. Campbell SW, Park YJ (2008) Social implications of mobile telephony: the rise of personal communication society. Sociol Compass J Br Sociol Assoc 2(2):371–387 46. Green N (2003) Outwardly mobile: young people and mobile technologies. 
In: Katz JE (ed) Machines that become us: the social context of personal communication technology. Transaction Publishers, New Brunswick, pp 201–218 47. Skog B (2002) 16 mobiles and the Norwegian teen: identity, gender and class. In: Katz JE, Aakhus M (eds) Perpetual contact: mobile communication, private talk, public performance. Cambridge University Press, Cambridge, pp 255–273


48. Hair JF, Anderson RE, Tatham RL, Black WC (1998) Multivariate data analysis, 5th edn. Prentice-Hall, Englewood Cliffs 49. Tabachnick BG, Fidell LS (2007) Using multivariate statistics, 5th edn. Allyn and Bacon, Boston 50. Kline RB (2004) Beyond significance testing: reforming data analysis methods in behavioral research. American Psychological Association 51. Joreskog KG, Sorbom D (2002) Lisrel VIII: structural equation modeling with the SIMPLIS command language, 5th print. Scientific Software International, Lincolnwood 52. Hu L-T, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model Multidiscip J 6(1):1–55 53. Bentler PM, Bonett DG (1980) Significance tests and goodness-of-fit in the analysis of covariance structures. Psychol Bull 88:588–600 54. Bollen K (1986) Sample size and Bentler and Bonett’s nonnormed fit index. Psychom Soc 51(3):375–377 55. Browne MW, Cudeck R (1993) Alternative ways of assessing model fit. In: Bollen KA, Long JS (eds) Testing structural equation models. Sage, Newbury Park, pp 136–162 56. Steiger JH (2007) Understanding the limitations of global fit assessment in structural equation modeling. Pers Individ Differ 42(5):893–898 57. Busch T (1995) Gender differences in self-efficacy and attitudes toward computers. J Educ Comput 12(12):147–163 58. Aronsson G, Dallner M, Aborg C (1994) Winners and losers from computerization: a study of the psychosocial work conditions and health of Swedish state employees. J Hum Comput Interact 6(1):17–36 59. Henwood F (1993) Establishing gender perspectives in information technology: problems, issues, and opportunities. In: Green E, Owen J, Pain D (eds) Gendered by design? Information technology and office systems. Taylor and Francis, London 60. Faulkner W (2001) The technology question in feminism: a view from feminist technology studies. Womens Stud Int Forum 24(1):79–95 61. Jin R, Punpanich W (2011) Influence of gender difference in reference group on smartphone users’ purchasing decision-making process. In: Seminar 1st June, 2011, course: BUSM08 degree project in international marketing and brand management 62. Borges AP, Joia LA (2015) Paradoxes perception and smartphone use by Brazilian executives: is this genderless? J High Technol Manag Res 26(2):205–218 63. Kumar G, Prakash N (2018) Investigating the role of gender as a moderator in influencing the intention to use and actual use of mobile telephony. Eur J Soc Sci (EJSS) 1(1)

Chapter 17

Examining the Relationship Between Customer-Oriented Success Factors, Customer Satisfaction, and Repurchase Intention for Mobile Commerce Abhishek Tandon, Himanshu Sharma, and Anu Gupta Aggarwal

17.1 Introduction The advancements in the digital world have popularized the concept of electronic commerce (e-commerce) among marketers [1]. This has led offline stores to go online, forcing them to create their own websites. Having a website of their own is considered to influence the image of the firm, which consequently impacts its market share [2]. Moreover, these websites enable customers to browse through their desired goods/services and gain further insights about them, get information about offers and incentives, compare prices with competitor sites, and use a smooth and trustworthy transaction interface [3]. With this liberty, retailers have been drawn into fierce competition to succeed in the marketplace. This has also resulted in hedonic and impulse buying behavior among browsers, and there is a need to create an eye-catching and well-designed website so as to attract them and increase the traffic intensity of the site [4]. However, electronic commerce has the demerit of requiring a non-portable computer. This has been rectified by the telecom industry with the introduction of mobile commerce (m-commerce), which gives customers the opportunity to shop or perform a transaction anytime and anywhere by making use of an Internet-enabled phone or smartphone [5]. Though m-commerce is still in its infancy, extant researchers are exploring this


area and their efforts have been complemented by improved Internet facilities provided by telecom industries all over the world, connecting people worldwide [6]. M-commerce is a process of conducting business online using a wireless network. There are various advantages of m-commerce over its computer-based counterpart, namely instantaneity, ubiquity, localization, and personalization. With the popularity of electronic gadgets such as smartphones, notebooks, and tablets, along with the availability of good Internet connectivity and speed, this concept is clearly overtaking its decade-old predecessor. Moreover, m-commerce makes it possible for customers to have access to various services including mobile ticketing, mobile banking, mobile marketing, and other location-based support [7]. This exceptional success, especially in a developing country like India, may be attributed to an increasing trend in the number of smartphone users in the world. In India, retail mobile commerce revenue increased from 7.78 billion US dollars in 2015 to 15.27 billion US dollars in 2016, and is expected to reach 63.53 billion US dollars by 2020 [8], displaying one of the largest increments in the number of mobile phone users over the years. Thus, in order to increase their customer base, which leads to improved profitability, e-vendors should introduce their own applications (apps) along with their websites to maintain their status in the industry [9]. Companies interested in studying the m-commerce system focus on purchase/delivery, alternative evaluation, post-purchase evaluation, information search, and identification. As m-commerce gains an abundant market, recent researchers have been motivated to discuss success factors related to m-commerce applications. Previous literary works have divided the success factors into two categories: success factors from the firm's perspective (called the critical success factors) and success factors from the viewpoint of customers (called the customer-oriented success factors) [10]. While some pay attention to information system (IS) success, others concentrate on the customer satisfaction side. These studies show that despite the similarity between electronic commerce and m-commerce in some aspects, they differ in various other features. It was observed that the customer-oriented success criteria defined for electronic commerce can be extended to define the customer-oriented success factors for mobile commerce by adding two criteria: mobility and personalization. The mobility criterion takes care of the difference between computer websites and mobile apps, whereas the personalization criterion covers the advancements in technology made over the recent past [11]. In this study, we add another factor, namely online customer feedback, which represents the effect of social communication between online users and explains the impact of online buzz on the buying behavior of customers. This paper considers the latent variables system quality (SQ), content quality (CQ), use (US), trust (TR), support (SU), mobility (MO), personalization (PE), and online customer feedback (OCF) as key determinants of m-commerce application success (MAS). Customers who experience fun while using the application and perceive its use to be hassle-free are indicators of satisfaction toward that application. Customer satisfaction (CS) is the affective and cognitive judgment toward a product/service.
With the advancements in Internet technology, online retailers face fierce pressure to provide customers a dynamic, exciting, and emotionally rewarding experience comparable to their offline shopping experiences [3]. Multi-channel retailing has also enabled customers to obtain detailed information about products/services and to compare offers in no time [12]. Therefore, to achieve satisfaction, it is crucial to determine the factors that influence customers' online shopping behavior. Repurchase intention (RI) is the keenness of customers to keep purchasing through the same m-commerce application; in other words, it is the consumer's decision to purchase again from the same firm, keeping other exogenous conditions constant [13].

This paper proposes a conceptual framework to examine the relationship between m-commerce application success (MAS), customer satisfaction (CS), and repurchase intention (RI). Here, MAS is considered a second-order construct, whereas CS and RI are treated as multiple- and single-item constructs, respectively. We use a structural equation modeling (SEM)-based approach to determine the path relationships among these latent variables. The analysis is performed on survey-generated data obtained from 530 respondents. The remainder of the paper is organized as follows: research hypotheses are developed in Sect. 17.2, Sect. 17.3 discusses the methodology of the study, data analysis is presented in Sect. 17.4, and the discussion and conclusions are presented in Sect. 17.5.

17.2 Related Work and Hypotheses Development

17.2.1 M-Commerce Application Success (MAS)

The 1990s saw the introduction of m-commerce as an extended competitor of e-commerce. The most distinguishing quality of m-commerce is the ability to access desired information from anywhere and at any time, which has made it popular among today's busy generation. The shift from computer-based Internet access to shopping on wireless Internet-enabled devices has changed customers' purchasing behavior, and this underlines the importance of studying m-commerce success factors for companies and users and the alternatives available to them. Success is a multi-dimensional variable measurable at the individual, group, technical, and organizational levels using various economic, financial, or perceptual criteria [10]. Studies on success factors for mobile commerce applications are still new and offer a wide research perspective in this domain.

Researchers have long been interested in studying IS success, and the work of DeLone and McLean [14] proved to be a major breakthrough. They identified the factors that lead to IS success, namely system quality, information quality, use, and user satisfaction, and their system success model was considered at the individual, system, and organizational levels. Although the topic has been examined extensively and significant contributions have flourished in this field, less emphasis has been placed on integrating the endogenous variable (customer satisfaction) with the exogenous variables. According to DeLone and McLean, e-commerce systems treat use and user satisfaction as endogenous factors. Later, DeLone and McLean [15] extended their earlier model by introducing the construct 'net benefits,' which emphasized the role of SQ in developing successful IS. The updated D&M model is consistent with recent changes in e-commerce systems, enabling customers and suppliers to make buying and selling decisions and carry out transactions over the Web. Subsequently, Wang [17] attempted to extend the Molla and Licker [16] model, which focused on CS as a driver of e-commerce system success. In his study, some of the criteria presented by Molla and Licker [16] were replaced by SQ, information quality, and user satisfaction, and SU and TR were combined into SQ. The Molla and Licker model itself had modified earlier work by capturing constructs overlooked by the previous model; a few changes were made by renaming the variables e-commerce system quality, CQ, and customer e-commerce satisfaction (CES). These changes were made to reflect the importance of customer satisfaction, which drives modern marketing and management policies [18].

Kabir and Akhtar Hasin [11] studied the variables that lead to successful operation of m-commerce systems, keeping the customers' perspective in the foreground. They provided a multicriteria decision-making framework to evaluate the preference of each criterion for customers purchasing online through a wireless device. Their study was motivated by the Molla and Licker model of e-commerce success based on customer-oriented factors, adding two factors, mobility and personalization, to capture the mobile commerce context and to incorporate technological advancements. This study further extends previous models by incorporating 'online customer feedback.' Thus, we posit that

H1: MAS is a multi-dimensional construct consisting of system quality, content quality, trust, use, support, mobility, personalization, and online customer feedback.

17.2.2 Customer Satisfaction (CS)

Customer satisfaction is defined as the customer's fulfillment response relative to his or her preconceived notion of a product/service. Satisfaction in the online context generally refers to the customer's contentment relative to prior shopping experience [19]. With the Internet, customers are able to obtain more transparent information about products and services and to compare offers across online market platforms [20]. The very act of retrieving information and carrying out various activities conveys the need for contentment of both providers and users, and this extends beyond the interpersonal elements of services and their informational purpose. Gebauer and Shaw [10] divided online customer satisfaction into two types: satisfaction with the core product and service, and satisfaction with the system and process used to deliver these products and services. In light of this discussion, we may argue that online customer satisfaction is equivalent to MAS. Thus, we posit that

H2: MAS has a significant positive effect on CS.

17.2.3 Repurchase Intention (RI)

In the online context, repurchase intention is defined as making repeat purchases through the same application over time. It is considered one of the two behavioral components of loyalty. Previous literature treats customer satisfaction as a mediator between the antecedents of satisfaction and the behavioral components of loyalty [13]. If a firm considers customer satisfaction its major objective, it will succeed in enhancing customer behavioral intentions in the long run. Previous research agrees that customers' relative degree of satisfaction or dissatisfaction with purchased products/services influences their behavioral consequences [21], and positive sentiments about a firm's offerings play a significant role in the repeat purchase decision [18]. Merely providing a well-finished application with correct information about products/services, a reliable transaction interface, and the ability for purchasers to post reviews about their browsing experience and the products/services obtained may not, on its own, lead the customer to return to the application; the return depends on the overall pre- and post-purchase experience with that application. Thus, we posit that

H3: CS has a significant positive effect on RI.

H4: MAS has a significant positive effect on RI.

17.3 Methodology

Here, we propose a model to examine the relationship between MAS, CS, and RI by establishing a hypothesized path model, as shown in Fig. 17.1. The objective is to test the significance of the assertions made in the previous section. To test these hypotheses, a self-administered questionnaire was developed comprising 26 items; the statements of the constructs used in the questionnaire are provided in the Appendix. In order to obtain precise replies and reduce vagueness, a pilot study was performed on 60 respondents consisting of practitioners and academic experts who handle m-commerce applications. The pilot group suggested a few changes to the language of the questionnaire. After integrating the proposed changes, the updated questionnaire, consisting of questions on a 5-point Likert scale, was prepared. Around 1500 questionnaires were e-mailed, of which 580 responses were registered; these responses were filtered and 530 useful responses were retained for analysis. The population mainly consisted of service-class people, businesspeople, and students residing in northern India.

Fig. 17.1 Hypothesized model

Moreover, the population seems apt as it consists of respondents who are active in online shopping and make frequent transactions through a particular m-commerce application. The survey was carried out from June 1 to August 31, 2018.

This paper considers MAS and CS as multi-item constructs, while RI is taken as a single-item construct. Empirical results signify that there is no clear priority between multi- and single-item measures, and a combination of the two may yield a more relevant analysis [22]. A single-item measure is useful for evaluating a variable denoting an overall effect, such as overall satisfaction or overall repurchase intention. Single-item scales also have practical advantages: they are less time-consuming to complete and break the monotony respondents experience while filling in questionnaires, although they have the demerit of difficulty in assigning scores to the variable. However, when a construct built up of various items cannot be captured in a single item, as with MAS and CS here, a multi-item scale should be preferred [23].

17.4 Data Analysis Summary

17.4.1 Demographic Profile of the Respondents

The basic characteristics of the survey respondents are presented in Table 17.1.


Table 17.1 Demographic characteristics

Measure                                       Variables       Usable responses   Percentage
Gender                                        Male            310                58.49
                                              Female          220                41.51
Age                                           Below 18        50                 09.43
                                              18–30           280                52.83
                                              31–45           160                30.19
                                              Above 45        40                 07.55
Education qualification                       Undergraduate   40                 07.55
                                              Graduate        140                26.42
                                              Postgraduate    270                50.94
                                              Other           80                 15.09
Nature of consumer                            Student         260                49.06
                                              Self employed   100                18.87
                                              Service         170                32.07
Annual household income (in lakhs)            Below 4         160                30.19
                                              4–8             210                39.62
                                              8–12            110                20.76
                                              Above 12        50                 09.43
Hours spent on online shopping (in a month)   Below 6         155                29.25
                                              6–12            325                61.32
                                              Above 12        50                 09.43

17.4.2 Validity and Reliability of Measures

In the conceptual model, MAS and CS are considered multi-item constructs, whereas RI is the only single-item construct. Moreover, MAS is treated as a second-order construct manifested by eight factors, which in turn are made up of 23 attributes, whereas CS is made up of two attributes. This is supported by applying exploratory factor analysis (EFA) over the 25 attributes altogether. For the EFA, principal component analysis with varimax rotation and eigenvalues above 1 was used, and attributes with factor loadings above 0.4 were retained. A Kaiser-Meyer-Olkin (KMO) value of 0.731 and a significant Bartlett's test of sphericity support the use of factor analysis. With Cronbach alpha values above 0.7 and composite reliability (CR) values above 0.7, the reliability of the constructs was validated; the constructs' discriminant validity was observed with average variance extracted (AVE) values above 0.4. The results are shown in Table 17.2.

This study assumes MAS to be a second-order construct comprising system quality, content quality, use, trust, support, mobility, personalization, and online customer feedback. A second-order confirmatory factor analysis (CFA) was run to determine the significance of these attributes; Table 17.3 shows the results of the CFA.


Table 17.2 Validity and reliability results Variables

Items

Loadings

CR

AVE

CA

System quality

SQ1

0.563

0.700

0.495

0.720

SQ2

0.694

0.727

SQ3

0.625

0.717

SQ4

0.674

0.725

CQ1

0.620

CQ2

0.844

0.717

CQ3

0.687

0.714

Content quality

Use Support

0.437 0.777

U2

0.775

S1

0.541

S2

0.495

0.726

S3

0.758

0.716

0.719 0.848

T2

0.819

M1

0.819

M2

0.759

PER1

0.799

PER2

0.853

PER3

0.795

Online customer feedback

OCF1

0.824

OCF2

0.797

Customer satisfaction

CS1

0.682

CS2

0.833

Personalization

0.721

CQ4

S4

Mobility

0.553

U1

T1

Trust

0.768

0.724 0.905

0.705

0.723 0.726

0.737

0.521

0.718

0.716 0.923

0.743

0.728 0.716

0.835

0.639

0.719

0.891

0.699

0.727

0.715 0.723 0.747 0.874

0.681

0.744

0.856

0.651

0.742

0.739 0.748

AVE average variance extracted, CA Cronbach alpha, CR composite reliability

The data was found to fit the model acceptably, with goodness-of-fit index values CFI = 0.889, TLI = 0.871, and RMSEA = 0.07, which are within the threshold values suggested by previous researchers. This supports the acceptance of H1: MAS is a multi-dimensional construct consisting of these eight first-order factors.
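The reliability statistics discussed above can be recomputed directly from the standardized loadings and the raw item scores. The following is a minimal Python sketch (assuming NumPy) using the common textbook formulas for composite reliability, AVE, and Cronbach's alpha; it is illustrative only, and small differences from the values in Table 17.2 may arise from rounding or from the defaults of the statistical package used in the study.

import numpy as np

def composite_reliability(loadings):
    # CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    # with error variance = 1 - loading^2 for standardized loadings
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return num / (num + (1.0 - lam ** 2).sum())

def average_variance_extracted(loadings):
    # AVE = mean of the squared standardized loadings
    lam = np.asarray(loadings, dtype=float)
    return (lam ** 2).mean()

def cronbach_alpha(item_scores):
    # item_scores: respondents x items matrix of Likert responses
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances / total_variance)

# Standardized loadings reported for the system quality items in Table 17.2
sq_loadings = [0.563, 0.694, 0.625, 0.674]
print(round(composite_reliability(sq_loadings), 3))
print(round(average_variance_extracted(sq_loadings), 3))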

17.4.3 SEM Results

In order to validate the hypothesized model, we applied the SEM approach through SPSS AMOS 22. The goodness-of-fit index values were CFI = 0.889, TLI = 0.891, and RMSEA = 0.066.


Table 17.3 Second-order CFA results

First order   Path   Second order   Standardized beta   p value
SQ            →      MAS            0.689               ***
CQ            →      MAS            0.635               ***
US            →      MAS            0.537               ***
TR            →      MAS            0.745               ***
SP            →      MAS            0.762               ***
MO            →      MAS            0.651               ***
PER           →      MAS            0.803               ***
OCF           →      MAS            0.874               ***

***Significant at 5% level of significance

Table 17.4 Hypothesis testing results

Independent variable   Dependent variable   Standardized beta   p-value   Result
MAS                    CS                   0.487               ***       H2 accepted
CS                     RI                   0.366               ***       H3 accepted
MAS                    RI                   0.006               0.221     H4 rejected

***Significant at 5% level of significance

These goodness-of-fit values justified the appropriateness of the structural model. Moreover, the hypothesis testing results, shown in Table 17.4, provide support for Hypotheses 2 and 3, whereas H4 is not accepted.
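The path estimates above were obtained in SPSS AMOS 22. For readers without access to AMOS, the same hypothesized structure can be written down in open-source tooling; the sketch below assumes the Python package semopy and a CSV file of item responses (both assumptions, not materials from the study), and it illustrates the model specification only rather than reproducing the reported estimates.

import pandas as pd
from semopy import Model  # assumed open-source SEM package; the study itself used SPSS AMOS 22

# Hypothesized structure: MAS as a second-order construct (H1),
# MAS -> CS (H2), CS -> RI (H3), MAS -> RI (H4)
MODEL_DESC = """
SQ  =~ SQ1 + SQ2 + SQ3 + SQ4
CQ  =~ CQ1 + CQ2 + CQ3 + CQ4
US  =~ US1 + US2
SP  =~ SP1 + SP2 + SP3 + SP4
TR  =~ TR1 + TR2
MO  =~ MO1 + MO2
PER =~ PER1 + PER2 + PER3
OCF =~ OCF1 + OCF2
CS  =~ CS1 + CS2
MAS =~ SQ + CQ + US + SP + TR + MO + PER + OCF
CS ~ MAS
RI ~ CS + MAS
"""

df = pd.read_csv("survey_responses.csv")  # hypothetical file: one column per scale item, incl. RI
model = Model(MODEL_DESC)
model.fit(df)
print(model.inspect())  # parameter estimates and p-values for the hypothesized paths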

17.5 Discussions and Conclusions

This study considered MAS to be a second-order construct, and the second-order CFA results validate that assertion, leading to the acceptance of H1, which treats MAS as a multi-item construct. At the first order, we assembled the factors SQ, CQ, TR, US, MO, SP, PER, and OCF. SQ helps in identifying whether the online customer is satisfied with the application's performance, taking into account both hardware and software factors; potential competitors in mobile commerce are just a click away, and any dissatisfaction caused by a mobile application's failure affects the firm considerably. The items found suitable under this head were online response time, 24-h availability, page loading speed, and visual appearance. The way information is presented on the website is termed content quality. The quality of content and its ability to meet the needs and expectations of customers affect the success of the organization and determine customer conversion and retention; the CQ attributes taken up in this study are up-to-dateness, understandability, timeliness, and preciseness. The success of an e-commerce website


depends on the extent to which the website is useful to the customer; usefulness consists of two components, information and transaction. Trust is an important criterion for mobile commerce success, since customers' feelings of insecurity may affect their disposition toward privacy and security issues; trust helps in customer conversion and retention toward a mobile commerce site and mainly covers security and privacy. The support and services provided by the operators are highly valued during all phases of a transaction; this criterion includes tracking order status, payment alternatives, frequently asked questions (FAQ), and account maintenance on the mobile commerce application's website. Mobility is the property by which services can be availed from anywhere and at any time at the discretion of the user; thus, m-commerce enables customers to use services and make transactions anywhere, and the two major attributes under this criterion are device and application. Personalization means providing products/services according to the customer's taste; since mobile devices have significant constraints in battery capacity, size, and configuration, personalization is needed to increase their usability, and the attributes under this criterion are location, time, and individual preferences. The Internet has enabled consumers to share their shopping experience, which in this study is termed 'online customer feedback'; viewers can access the feedback provided by previous purchasers, which strongly influences their buying behavior, and the two major attributes considered under this criterion are online reviews and star ratings.

The support for H2 validates the positive relationship between MAS and CS. As discussed by previous researchers, the firm's application plays a major role in the affective and cognitive responses of consumers. If their experience matches their expectations, they show satisfaction toward that application, whereas the reverse may hold otherwise. It has also been observed that dissatisfied customers pose a greater risk to the health of the organization, as they spread a negative image of it. Thus, marketers should provide an application that makes the payment process smooth and safe, provides precise and correct information about their offerings, lets customers give feedback in the form of text as well as ratings to express their shopping experience, and allows customers to build their own cart according to their individual preferences.

From the acceptance of H3 and the rejection of H4, we may conclude that CS affects RI whereas MAS has no direct relationship with RI. This is plausible, as mere application success may not induce the customer to return and make a repurchase; repurchase also depends on other exogenous variables such as product satisfaction, delivery or shipping experience, and packaging, which may be taken as additional determinants of overall satisfaction with a firm. Only when the customer is satisfied overall with the services of the m-commerce application will he or she show the intention to repurchase. Nevertheless, marketers must track visitors and their shopping patterns so as to retain as many customers as possible, because maintaining the account of an existing customer is cost-efficient, whereas creating a new customer base is a costly affair.
In this paper, we proposed a path model to determine the interrelationships between MAS, CS, and RI. Hypotheses were set to test the significance of the associations among these constructs, and the model was validated through survey-generated data. The analysis was performed using a structural equation modeling approach in SPSS AMOS 22. Results show that CS has a significant direct relationship with RI, whereas no relationship was found between MAS and RI. Future studies may include loyalty constructs within the present model.

Acknowledgements This research work was supported by grants provided by the Indian Council of Social Science Research, Delhi, India (File No.: 02/76/2017-18/RP/Major).

Appendix

The measurement scale

Items   Statements
SQ1     MC provides a time effective application
SQ2     MC provides an application which is always available
SQ3     MC application takes less time to reload/refresh and to migrate from various search results
SQ4     MC provides an attractive application
CQ1     MC application informs me about the product/service and their arrivals/departures
CQ2     The data is well displayed on the mobile commerce application
CQ3     MC provides the information about the products/services in no time
CQ4     MC provides exact information about the products/services without any discrepancies
US1     MC provides an interface with significant content about products/services
US2     MC provides an interface with smooth payment process
SP1     MC provides an application that tracks the delivery status of their purchased product/service
SP2     MC provides various payment options
SP3     MC provides an application that efficiently handles queries of the customer
SP4     MC allows me to create an account on their application to avail various incentives and exclusive offers
TR1     MC assures the customer that there is no breach of their personal information while making a transaction
TR2     MC keeps customer's personal information confidential and protects their data from being included into their database which may be used for any other purpose apart from the purpose it was meant for
MO1     MC application is in accordance with my smart phone's configuration
MO2     MC application provides an interface on my mobile phone through which I can have access to the product/services I look forward to have
PER1    I can have access to the MC application from anywhere
PER2    I can have access to the MC application at my suitable time
PER3    MC provides me an opportunity to build my own shopping cart
OCF1    MC application provides an interface that posts linguistic comments provided by previous purchasers and browsers
OCF2    MC application provides an interface that posts star ratings provided by previous purchasers and browsers
CS1     Overall I am satisfied with the firm's application
CS2     The quality of product meets my expectations
RI      I will again make a purchase using this application

References

1. Sharma H, Aggarwal AG (2019) Finding determinants of e-commerce success: a PLS-SEM approach. J Adv Manag Res
2. Aggarwal AG, Aakash (2018) Multi-criteria-based prioritisation of B2C e-commerce website. Int J Soc Syst Sci 10(3):201–222
3. Tandon A, Sharma H, Aggarwal AG (2019) Assessing travel websites based on service quality attributes under intuitionistic environment. Int J Knowl-Based Organ (IJKBO) 9(1):66–75
4. Aggarwal AG (2018) A multi-attribute online advertising budget allocation under uncertain preferences. J Eng Educ 14
5. Sharma H, Aggarwal AG, Suri G (2019) Success factors of m-commerce: a customer perspective. In: M-commerce: experiencing the phygital retail. Apple Academic Press, pp 71–100
6. Min Q, Ji S, Qu G (2008) Mobile commerce user acceptance study in China: a revised UTAUT model. Tsinghua Sci Technol 13(3):257–264
7. Nilashi M et al (2015) The role of security, design and content factors on customer trust in mobile commerce. J Retail Consum Serv 26:57–69
8. Statista (2018) Cited Aug 2018. Available from: https://www.statista.com/statistics/266119/india-retail-mcommerce-sales/
9. Lee H-M, Chen T (2014) Perceived quality as a key antecedent in continuance intention on mobile commerce. Int J Electron Commer Stud 5(2):123–142
10. Gebauer J, Shaw MJ (2004) Success factors and impacts of mobile business applications: results from a mobile e-procurement study. Int J Electron Commer 8(3):19–41
11. Kabir G, Akhtar Hasin A (2011) Evaluation of customer oriented success factors in mobile commerce using fuzzy AHP. J Ind Eng Manag 4(2):361–386
12. Kassim N, Asiah Abdullah N (2010) The effect of perceived service quality dimensions on customer satisfaction, trust, and loyalty in e-commerce settings: a cross cultural analysis. Asia Pac J Mark Logist 22(3):351–371
13. Lin H-H, Wang Y-S (2006) An examination of the determinants of customer loyalty in mobile commerce contexts. Inf Manag 43(3):271–282
14. DeLone WH, McLean ER (1992) Information systems success: the quest for the dependent variable. Inf Syst Res 3(1):60–95
15. DeLone WH, McLean ER (2003) The DeLone and McLean model of information systems success: a ten-year update. J Manag Inf Syst 19(4):9–30
16. Molla A, Licker PS (2001) E-commerce systems success: an attempt to extend and respecify the DeLone and MacLean model of IS success. J Electron Commer Res 2(4):131–141
17. Wang YS (2008) Assessing e-commerce systems success: a respecification and validation of the DeLone and McLean model of IS success. Inf Syst J 18(5):529–557
18. Lee WO, Wong LS (2016) Determinants of mobile commerce customer loyalty in Malaysia. Procedia-Soc Behav Sci 224:60–67
19. San Martín S, López-Catalán B, Ramón-Jerónimo MA (2012) Factors determining firms' perceived performance of mobile commerce. Ind Manage Data Syst 112(6):946–963
20. Jimenez N, San-Martin S, Azuela JI (2016) Trust and satisfaction: the keys to client loyalty in mobile commerce. Acad Rev Latinoam Adm 29(4):486–510
21. Dužević I, Delić M, Knežević B (2016) Customer satisfaction and loyalty factors of mobile commerce among young retail customers in Croatia. Gestão Soc 10(27):1476
22. Gardner DG et al (1998) Single-item versus multiple-item measurement scales: an empirical comparison. Educ Psychol Measur 58(6):898–915
23. Hoeppner BB et al (2011) Comparative utility of a single-item versus multiple-item measure of self-efficacy in predicting relapse among young adults. J Subst Abuse Treat 41(3):305–312

Chapter 18

Smart Industrial Packaging and Sorting System

Sameer Tripathi, Samraddh Shukla, Shivam Attrey, Amit Agrawal, and Vikas Singh Bhadoria

S. Tripathi · S. Shukla · S. Attrey · A. Agrawal · V. S. Bhadoria (B) Department of Electrical and Electronics Engineering, ABES Engineering College, Ghaziabad, India e-mail: [email protected] S. Tripathi e-mail: [email protected] S. Shukla e-mail: [email protected] S. Attrey e-mail: [email protected] A. Agrawal e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_18

18.1 Introduction

The advancement of automation and industrial assembly techniques has given rise to its own drawbacks, which themselves require engineering solutions. Nations with higher manufacturing rates are regarded as developed, while those with lower manufacturing rates are termed third-world nations. Like every other industry, the automation industry rests on the ever-growing need for improvement and for cutting cost out of processes and operations. The increasing need for automation systems makes strategic evaluation of the value chain and improvement of market awareness necessary [1]. Industrial automation primarily focuses on developing low-cost technology with low maintenance needs and long durability, and on making the system user friendly. Industrial automation and robotics play a key role in the expansion of industries. In the 1980s, robots mimicked human actions and were used to perform tasks such as machine tending, transferring materials from one location to another, painting, and welding; these tasks did not require high accuracy or precision. With the onset of the 1990s, however, industrial robots became increasingly crucial in applications requiring greater precision and accuracy. Autonomous robots, sensors, and other actuators were used to achieve this precision, and robots were built to perform one task at a time by taking sensory information as input.

Real-time monitoring of small objects in a fast-flowing stream of packages has opened new horizons for the industrial sorting process. This paper is about exactly that: a smart packaging system which helps to sort, identify, control, and automatically manage each and every product that passes through the conveyor belt. In order to operate collectively and establish coordination between the plurality of sensors, modules, and motors, a controller, an Arduino Mega 2560, is embedded into the system as shown in Fig. 18.1. Apart from the microcontroller, a programmable logic controller (PLC: Delta PLC 14ssp2, having eight digital inputs and six digital outputs) is used to control the conveyor motors and the proximity sensors that sense the object on the production line; in addition, real-time monitoring using a supervisory control and data acquisition (SCADA) system has been introduced to meet industry standards of monitoring and control. To overcome the industrial packaging challenges, a barcode sensor assisted by a load cell and a decision-making device (the Arduino Mega 2560 and the PLC 14ssp2) is used for the identification and sorting of CLD(s) or RSC(s) along a retracting conveyor; in addition, a GSM module is placed in the feedback path to the microcontroller, so that any deviation from the default parameters is sensed automatically and the area of problem occurrence (AOC) is easily identified [2].

Since 1976, India has witnessed exponential growth in SMEs, SSIs, and other cottage industries. These are low-capital, small-sized industries with limited inventory, workforce, and land space.

Fig. 18.1 Module connection of the proposed packaging system

This creates a demand for low-cost, effective automation solutions for packaging in these enterprises. This paper addresses that need: a rotating-disc lid-closing system mounted on a retracting conveyor is discussed, thereby reducing the human dependency of packaging lines in small enterprises.

18.2 Conventional System

In presently available systems, existing technologies are employed according to the financial constraints and the type of industry. Examples of such technologies are robotic systems, hand-based (human-dependent) sorting, colour-based systems, and pneumatic-based systems.

18.2.1 Mechanical Arms

The automated arms are driven by servo motors whose angle of rotation is controlled by the microcontroller. Depending on the structure of the mechanical arm, different rotation angles are assigned to the servo motors to carry out different tasks. The robot arm is realized using aluminium sections; mechanical arms remain costly and complex because of the intricacy of their design and fabrication process. Two types of sections are used: one provides support for the servo motors, and the other is used for the extensions and interconnections of the mechanical arm [3]. The infrared sensor detects the CLD or box and generates a trigger signal for the microcontroller, whose function is to control the arm movement according to the height of the box. The movement of the servo motor is controlled such that each CLD is dropped into a separate box placed in a cascaded arrangement. A fixed duration of 0.5 s is required by the mechanical arm for a single movement. The process of grabbing a container and dropping it into the right crate requires eight movement stages: moving the mechanical arm from its initial position, selecting and picking a case, moving to the right bin, leaving the container in the crate, and returning to the initial position. Picking the case and dropping it properly takes seven stages, and returning to the default position takes one more stage; thus, the time required for picking and dropping a container, including recognizing its height, is around five seconds. The automatic arm consists of four motors: one manages the rotational movement of the base, the second controls the movement of the elbow joint, the third manages the wrist motion, and the fourth controls the grip, that is, hold and release. A switch-like mechanism is used for opening and closing the gripper, so a single motor is sufficient for gripper control. The fingers come closer to pick and hold the case and move apart when dropping it into the crate. Two finger positions are implemented using a single servo motor: one closed and one open.
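The roughly five-second cycle quoted above follows from the stage count and the fixed per-movement duration; the small Python check below makes the arithmetic explicit, with the height-recognition allowance treated as an assumption, since the chapter only gives the overall total.

# Cycle-time check for the conventional pick-and-drop arm (Sect. 18.2.1)
STEP_TIME_S = 0.5            # fixed duration of one arm movement
PICK_AND_DROP_STEPS = 7      # stages to pick the case and drop it in the crate
RETURN_STEPS = 1             # stage to return to the default position

movement_time = (PICK_AND_DROP_STEPS + RETURN_STEPS) * STEP_TIME_S   # 8 x 0.5 = 4.0 s
height_recognition_time = 1.0  # assumed allowance; only the ~5 s total is quoted

print(f"approximate cycle time: {movement_time + height_recognition_time:.1f} s")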


Fig. 18.2 Conventional method of sorting and packaging by human

18.2.2 Hand-Based Sorting System

The oldest methodology is hand-based sorting, in which humans are directly involved, as shown in Fig. 18.2 [4]. The objects are placed on the conveyor by a mechanical arm or any other known system; a dedicated section is then set up where workers sort the products according to specification and place them on another conveyor for the packaging process. This approach is time consuming as well as imprecise, and it is therefore largely obsolete today.

18.2.3 Shape-Based Sorting System

Sorting by shape is the most common method of sorting products and does not require any human intervention. In this process, the products are collected in a hopper or at the end of the conveyor and are then passed through mechanical restrictions, so that larger shapes are sorted out at the start and smaller ones at the end. This technique has its own limitations, however, in terms of the number of differently shaped items and of objects that share the same shape and size but are different products.


18.3 Proposed System Modelling with Flow Chart

In this paper, we use a split conveyor with a gap between its two sections. The split conveyor allows a stepper motor with a table to be set up for packing the box, and it also provides room to place a proximity sensor so that the position of the package can be ensured. For positioning, the proximity sensor is interfaced with the PLC so that conveyor operations (start and stop) can be performed easily and the other sensors can work effectively.

Fig. 18.3 Block diagram and flow chart of model

As shown in Fig. 18.3, when the process starts, the proximity sensor senses the object and signals the PLC to start the conveyor. The objects then pass through guider strips that align them before they reach the IR sensor, which signals the PLC to stop the box below the hopper. The guider strips align the object toward the centre of the conveyor belt so that it does not get stuck in any other part of the system. The box then passes below the hopper so that the items can fall into it. The conveyor starts again and places the box on the rotating disc, after which the stepper motor controls the motion of the disc so that the box can be closed, on the microcontroller's command, with the help of the mechanical arms. After the box is closed, the IR sensor signals the microcontroller to lower the taping mechanism so that the box is sealed and the items remain stable inside without falling out. A servo-controlled mechanical arm then places a barcode on the box, and at the same time the weight of the box is measured on the conveyor. The box next passes in front of a barcode sensor, which is coupled to the microcontroller through RS232 and helps in validating the box. If the barcode is valid, the box continues on the conveyor and is sorted according to weight and barcode; if it is not valid, the mechanical push arm pushes the box to the inspection box and signals the GSM module to send a warning message to the supervisor. The whole system is monitored in real time using the SCADA system, so that it can be observed from distant locations.
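The decision logic described above (stop under the hopper, close and tape the box, label it, then sort by barcode validity and weight, with a GSM alert on failure) can be summarized as a control loop. The sketch below is a Python pseudocode-style illustration; all hardware helper objects and functions are hypothetical placeholders for the PLC/Arduino interfacing, not an actual driver API.

def process_box(hw):
    # One packaging/sorting cycle for a single box.
    # `hw` is a hypothetical hardware-abstraction object exposing the
    # conveyor, sensors, actuators and GSM modem used in the prototype.
    hw.conveyor.start()                      # proximity sensor has detected a box
    hw.conveyor.stop_at(hw.ir_sensor)        # align the box below the hopper
    hw.hopper.dispense_items()
    hw.conveyor.move_to(hw.rotating_disc)
    hw.stepper.rotate_disc()                 # close the lid with the mechanical arms
    hw.taping_unit.seal()
    hw.label_arm.apply_barcode()

    weight = hw.load_cell.read_grams()
    barcode = hw.barcode_scanner.read()      # via RS232 to the microcontroller

    if is_valid_barcode(barcode):
        bin_id = sort_bin(barcode, weight)   # sort by barcode and weight class
        hw.conveyor.route_to(bin_id)
    else:
        hw.push_arm.push_to_inspection()
        hw.gsm.send_sms("Invalid barcode detected on packaging line")  # alert supervisor

def is_valid_barcode(code):
    # placeholder validity rule; the prototype checks the scanned code itself
    return code is not None and code.isdigit()

def sort_bin(code, weight_g):
    # placeholder sorting rule: light boxes to bin 1, heavy boxes to bin 2
    return 1 if weight_g < 500 else 2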

18.4 Product Specification for Proposed Model

• IR proximity sensor: The framework comprises two proximity sensors, used to detect the presence of an article and the height of the boxes.
• Robotic arms: There are four robotic arms driven by servo motors. One is used for applying the barcode, another is used to separate defective items, and a set of two is used for closing the box.
• Conveyor belt: Two conveyor belts are used, mounted on one setup with a gap of 0.35 inches so that the packaging mechanism can work; the gap also provides space to place the sensors.
• PLC: A PLC is basically an easy-to-use, microprocessor-based microcomputer, comprising hardware and software, designed to control the operation of industrial equipment and processes. An essential advantage of the PLC is that it can easily be programmed and reprogrammed. Some leading PLC manufacturers are ABB, Allen Bradley, Siemens, GE Fanuc, and Mitsubishi. To program the PLC, we use the Rx logix software, a simulation tool for developing logic for controller applications.
• Load cell: A strain gauge working on the Wheatstone bridge principle is used, and an HX711 IC converts the resistance change into weight. It also allows the weight of boxes to be measured on the running conveyor.
• Barcode sensor: A barcode sensor reads the barcode printed on the product; the sensor head reads the gaps between the lines and converts them into machine language. It is connected to the microcontroller through RS232 or USB.
• GSM module: A GSM 900A module is used so that all 4G SIMs can be used for the error recognition system.
• Arduino Mega: A microcontroller used to interface all the sensors, read analogue signals, and program the sensors. It has 16 analogue input pins and 54 digital pins, which help in connecting a large number of sensors and peripheral devices [5].
• Stepper motor: Used to control the plate so that packing is possible. A ULN2003 driver is used to operate the stepper in steps of 90° [6].


• SCADA: The basic SCADA architecture begins with programmable logic controllers (PLCs) or remote terminal units (RTUs). SCADA systems are crucial for industrial organizations since they help to maintain efficiency, process data for smarter decisions and communicate system issues to help mitigate downtime.

18.5 Technical Aspect and Interfacing in the Model

In this system, a plurality of components is used: the PLC–SCADA pair, servo-motor-driven robotic arms, a 12 V DC motor, and sensors, along with two DC power supply levels, 5 V and 24 V. The following interfacing is used in our prototype:

1. AC mains to PLC: The PLC takes 230 V AC as input power and gives out 24 V DC.
2. SMPS: It provides DC voltages at different levels ranging from 5 to 24 V, taking 24 V DC as input.
3. IR proximity sensor to PLC: This sensor is used for position control.
4. Arduino to robotic arms: Two different servo-motor-controlled robotic arms are interfaced with the Arduino and change their position depending on the trigger.
5. Arduino with GSM: A GSM 900A is used to send the error message to the authorized person.
6. Barcode sensor with Arduino: It scans the barcode printed on the object to sort it; the Proteus design for the barcode sensor is shown in Fig. 18.4 [7].
7. Arduino with load cell: It measures the weight of the object on the conveyor; the Proteus design is shown in Fig. 18.5 [7].
8. Servo motors with Arduino: They control the mechanical arms; the Proteus design for the mechanically controlled structure is shown in Fig. 18.6 [7].
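As an illustration of the barcode validation step over the RS232 link, the sketch below reads a scanned code from a serial port (using the pyserial package) and verifies an EAN-13 check digit. The port name, baud rate, and the assumption that the scanner emits EAN-13 codes are illustrative choices, not values given in the chapter.

import serial  # pyserial

def ean13_is_valid(code: str) -> bool:
    # Verify the EAN-13 check digit (weights 1 and 3 on alternating digits).
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    checksum = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - checksum % 10) % 10 == digits[12]

# assumed port and baud rate for the scanner's RS232-to-USB link
with serial.Serial("/dev/ttyUSB0", 9600, timeout=2) as port:
    scanned = port.readline().decode("ascii", errors="ignore").strip()
    print(scanned, "valid" if ean13_is_valid(scanned) else "invalid -> inspection box")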

18.6 Result

Replacing robotic pick-and-place, as well as the human factor, with automatic product dispensing inside the regular slotted container (RSC) is a novelty and an avant-garde step in the field of packaging engineering. The use of retracting conveyors is the fundamental factor behind the systematic product arrangement inside the corrugated box, which is then packaged using a revolving disc type lid-closing system that can reduce the packaging time (as shown in Fig. 18.7) compared with hand-based packaging. The barcode sensor helps to reduce man-made errors and machine malfunctioning, as shown in Fig. 18.8. The data for barcode-based sensing is an extrapolated experimental reading, whereas the data for non-barcode sensing and sorting is research data from (i) the Coca-Cola Bottling Facility, Hebei, China, and


Fig. 18.4 Barcode sensor design developed by Proteus software

Fig. 18.5 Proteus developed design for load cell


Fig. 18.6 Proteus developed design for mechanical arm

Fig. 18.7 Decrease in time using rotational disc method

(ii) the PepsiCo Frito-Lay Chips Production Facility, Arizona, USA, where major breakdowns in productivity and packaging were observed. In addition, the load cell helps to sort CLDs according to their weight and barcode, and in case of any deviation, the GSM module acts as feedback for the process, thereby reducing the time spent searching for errors on the production line so that debugging can be done swiftly and more precisely.


Fig. 18.8 Experimental reading of error reduction using barcode

18.7 Conclusion

The results show that the setup can decrease human effort, succeed in automating small-scale manufacturing industries, and reduce human error by using a barcode sensor for sorting products. This helps in keeping productivity on the higher side with fewer errors. The rotating plate helps to decrease the packaging time in small enterprises, enabling faster production. The study verifies that using the barcode sensor reduced the error rate by 5%, and the packaging time also decreased with the rotating disc type lid-closing system.

References

1. Manjunatha VG (2015) Postal automation system for mail sorting. Int J Emerg Technol Adv Eng 5(3). ISSN 2250-2459
2. Aruna YV, Beena S (2015) Automatic conveyor system with in-process sorting mechanism using PLC and HMI system. Int J Eng Res Appl 5(11):37–42. ISSN 2248-9622
3. Gu J, Goetschalckx M, McGinnis LF (2007) Research on warehouse operation: a comprehensive review. Eur J Oper Res 177:1–21
4. https://blog.springfieldelectric.com/sick-image-based-barcode-readers/
5. Akram H, Suzali S (1995) A new high speed digital motor controller for increasing the packaging speed of a Tiromat machine. In: IEEE industry application society annual meeting, Oct 1995
6. Ali MS, Ali MSR (2017) Automatic multi machine operation with product sorting and packaging by their colour and dimension with speed control of motors. In: 2017 international conference on advances in electrical technology for green energy (ICAETGT), 23 Sept 2017
7. Proteus Design Suite (2018) Proteus 8 Professional

Chapter 19

Credentials Safety and System Security Pay-off and Trade-off: Comfort Level Security Assurance Framework

Habib ur Rehman, Mohammed Nazir, and Khurram Mustafa

H. Rehman (B) DXC Technology, Noida, India e-mail: [email protected] M. Nazir · K. Mustafa Department of Computer Science, Jamia Millia Islamia, New Delhi, India e-mail: [email protected] K. Mustafa e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_19

19.1 Introduction

Every critical application essentially needs security, privacy, and trust to ensure its reliability; any compromise of these aspects often creates subtle loopholes for unauthorized access and hacking. Security is concerned with protecting the system from the environment, whereas safety is concerned with protecting the environment from the system [1]. Safety requires 'malevolent and unauthorized actions to be considered during hazard' [1]. System security largely depends upon the safety of credentials: if credentials are expressed over a compromised system [2], there is a high possibility of a man-in-the-middle attack. The defensible argument of the safety case is endangered, and likewise the assurance case loses the degree of confidence that system properties are secure against attack [3]. Users disclose credentials [4] as evidence to show their authenticity as benign users of the application [5]. However, this evidence raises safety concerns [6] and threatens the security aspects of the application [7]; if the credentials fall into the possession of an adversary, the application can be completely hijacked. There is no such thing as absolute security: security always comes at considerable cost, and the lack of it costs much more. Traditionally, there is no provision for a user to express credentials partially and thereby form partial trust with an application. However, partial credentials [4] can allow a user to perform less critical functionality while keeping the unexpressed credentials in a safe zone.


'Security assurance provides confidence in the security-related properties and functionalities, as well as the operation and administration procedures, of a developed solution' [9]. We need security for every mission-critical application [8] that directly affects people's finances, personal lives, or national safety. Mission-critical applications have various distinguishable functionalities [10]. This paper introduces a novel framework of 'comfort level security' assurance that utilizes the variation in users' decisions about the usage of critical functionality. The goal is to attain a comfortable level of credential safety and security assurance, which is built on two guidelines:

1. Effectiveness in 'assuring the right functionality,' corresponding to the transition of user control from the full application to the right functionality.
2. 'Assuring the rights' with minimum expression of user credentials, so as to safeguard the unexpressed credentials.

Comfort level security (CLS) is a manifestation of these efforts as well as a realization of their effectiveness. It can balance credential safety with application security by means of intelligent coordination between the user and the system. Finally, the assurance case of the comfort level security framework (CLSF) is evaluated in terms of safety, security, and privacy analyses.

19.2 Background

Credentials often comprise a user-id with a text or graphic password [5] or multi-factor authentication support [11]. These generally form the encrypted digital identity used to initiate a client–server linkage environment. Research on credential security has given considerable attention to encryption at the client, the server [11], and the intermediate network [4]. Yet even strongly encrypted credentials cannot defend against unauthorized access, because cryptographic security also depends on the available power of computation [12]. Humans are the weakest link, where safety lapses are frequent [5], and the user who initiates the linkage environment still lacks a safety measure against the possible loss of credentials [4]. Security and safety are viewed as dependent on each other in the sense that each is concerned with mitigating the effects of a particular kind of failure [1], and the two disciplines use similar techniques to identify potential failure modes and assess their impact on the overall system. An unsafe credential might cause the complete security failure of a critical application. Traditionally, access policies assumed that credentials are safe if they are hard for an attacker to guess and easy for a user to remember [5]; there is no other mechanism for credential safety. An intruder typically traps the credentials when the user expresses them over a compromised system [13]. A financial application deserves stronger safety measures [7] for its credentials because of the severity of the damage a compromise might produce; a banking application [9], for example, is more critical than an online game. It is also not necessary that all functionalities of a security-critical application require the same level of credential safety [7]. Application security is an active area of research owing to the growing number of attack incidents [14], but it is largely incomplete without credential safety. The major threat [6]


on credential safety is observed when an application runs over a compromised system [2, 13]. In current practice, credential safety is solely the user's responsibility [5]; it neither considers end-user intelligence nor accommodates variation in the usage of an application's different functionalities. However, not all functionalities require the same level of security [9], and functionality usage may be situational and vary from person to person. There are situations in which an end user can better judge the possible risk of the login device, e.g., the presence of eavesdropping [15], the lack of firewall protection, or login from a cybercafé or a similar venue. Credential safety and application security can be achieved up to a comfortable level through balanced coordination between the user and the different functionalities of an application, namely when the 'right credentials' are expressed by a user for the 'right functionality' of an application. This is the central theme of the CLSF. Apart from safety and security, it also serves two purposes. Firstly, the user is not compelled to carry all credentials when he or she only needs to perform less critical functionality; for example, a user could check an account balance from an ATM even in the absence of the debit/credit card. Secondly, full credential validation is not imposed on the application when the user requests only non-critical functionality [7]. The unexpressed credentials remain safe even when the application is logged into over a compromised system [2]. In Sect. 19.3, we discuss the effective means for an application to perform the right functionality. Sections 19.4 and 19.5 cover how CLS efficiently maps the right credentials to the right functionality of an application. In Sects. 19.6 and 19.7, we explore how the CLSF achieves the requisite safety and security and preserves the privacy of credentials.

19.3 Right Functionality

Security assurance intends to provide a level of assurance rather than a true measure of how secure the application is [16]. An application may have multiple functionalities [9] that should each sustain a level of assurance. We therefore explain the concept using a typical banking application, subdivided into the different functionalities illustrated in Fig. 19.1. It is obvious that the usage and criticality of the 'password change' functionality F6 are not equivalent to those of the 'balance check' functionality F1.

19.3.1 Usage Variation

The threat severity [6] is not the same for all functionalities [9], and a user does not perform the critical functionality in every login instance [7]. A normal functionality should therefore not demand the credentials required for a critical functionality, because the security hazards are more severe for the critical one. Hence, unnecessarily expressing the full strength of credentials to execute a normal


Fig. 19.1 Functionality subdivision of a typical banking application

functionality may create greater safety hazards [6] for the critical functionality and its credentials. It is highly desirable to deploy dynamic safety controls that provide more flexibility to the end user. The level of assurance can then vary with just-enough credentials, especially when there is a high possibility of exposure to risk over a compromised system. According to a mobile banking user survey, checking the account balance or recent transactions is the most frequent user activity, occurring almost three times more often than the critical money transfer activity [18, 19], as depicted in Fig. 19.2. In general, critical activities [7] such as money transfer [9] or password change are performed less frequently [17]. A system should protect critical functionalities exceptionally well [7], and they therefore require more stringent safety measures than other, more frequent functionalities such as account balance checking. The proposed CLS framework addresses this concern by involving the user: it varies the level of security assurance according to the credential safety requirement and the usage variation.


Fig. 19.2 Frequency of mobile banking activities [17]

19.3.2 User Claim

A user's claim for an appropriate functionality depends mainly on three factors: meeting the current requirement, its mapping to the credential, and the risk involved. Addressing this variation of usage and criticality in terms of safety and security is the direction of this research. A user can perform a risk assessment of the client system, especially for the case of a compromised system; such scenarios may include login from a cybercafé, working outside office hours, avoiding critical functionality [7] during non-official hours, or working from a personal computer that lacks firewall protection. A user can determine the extent of credentials required to obtain the functionality, and the system can accordingly place reliance on the assured identity. The relevant considerations are:

1. The user's objectives in availing the service (e.g., a service with the safety objective of minimum expression of credentials);
2. The application's appetite for accepting risk, considering the impacts and the assessment of the user's profile; and
3. The functionality's approach to mitigating residual risk (whether the risk can be mitigated or is comfortable to accept).

A higher level of confidence also perceives a higher risk. The framework applies profile-based risk assessment [20] for the distinct functionalities of an application. The appropriate level comprises user comfort, which depends on the consequences of credential misuse and the likelihood of its occurrence.

19.3.3 Theoretical Basis Claim

To authenticate a user claim, credentials are the most acceptable basis as available evidence. They represent a 'degree of belief' graded in terms of the number of eliminated defeaters (i.e., the reasons for doubt) [21]. Initially, every intruder is considered a defeater with 'zero confidence.' Confidence is built by eliminating


the reasons to doubt the validity of the claims (that the intruder is not an attacker), the evidence (of association), and the inference rules (for the premises of control). As reasons for doubt are eliminated, confidence grows (eliminative induction). The functionality separation attains the defense-in-depth principle of security [1]. The concepts of eliminative induction and defeasible reasoning help in developing sound and complete arguments [22] for the credential used to access a selected functionality of an application. The system doubts every benign user or adversary and considers them defeaters unless it receives the evidence [23] that eliminates their title of defeater. The sub-constituents Pw1|e1, Pw2|e2, . . ., Pw5|e5 of the same credential Pw = Pw1, Pw2, . . ., Pw5 can enable multiple behaviors under the variation e1, e2, . . ., e5 of its constituents, and the unexpressed constituents remain safe on a compromised system. Figure 19.3a gives an overview of how this provides greater control over credential safety along with arbitrary levels of security assurance: 1|5 (some doubt eliminated) represents more confidence than 0|5 (no doubt eliminated), whereas 5|5 (all doubts eliminated) gives complete confidence [22] and full access from low to critical functionality. If credential separation is not applied to the application, access to all functionalities is clubbed together in the traditional way, as illustrated in Fig. 19.3b.
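A minimal Python sketch of the idea behind Fig. 19.3a follows: the more credential constituents a user expresses (k of 5 doubts eliminated), the larger the set of functionalities unlocked. The tier boundaries and the intermediate functionality labels are illustrative assumptions, not values prescribed by the framework.

# Functionalities of the banking example, ordered from low to high criticality (Fig. 19.1);
# only F1 (balance check) and F6 (password change) are named explicitly in the text.
FUNCTIONALITIES = ["F1", "F2", "F3", "F4", "F5", "F6"]

# Illustrative access tiers: number of expressed credential constituents (0..5)
# -> count of functionalities unlocked. The exact mapping is an assumption.
ACCESS_TIERS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6}

def accessible_functionalities(expressed_constituents):
    # Unexpressed constituents stay safe; only the functionalities justified by
    # the eliminated doubts (k|5 in Fig. 19.3a) are unlocked.
    k = max(0, min(5, expressed_constituents))
    return FUNCTIONALITIES[:ACCESS_TIERS[k]]

print(accessible_functionalities(1))   # 1|5 -> low-criticality access only
print(accessible_functionalities(5))   # 5|5 -> full access, low to critical functionality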

Fig. 19.3 Comfort level security a comparison


19.3.4 Proof of Concept

Multiple credentials can be a combination of multi-factor, text, or graphical passwords, etc. We explain our concept with the zoom feature of a digital map, as depicted in Fig. 19.4, which allows easy generation of multiple credentials without the overhead of remembering them separately. The zooming control sequentially selects one quadrant, thereby eliminating the other three quadrants as defeaters at each zoom level, until the coordinates are reached that express the required high-entropy credential evidence. Using Google Maps and JavaScript, we implemented a prototype in which six functionalities of a banking application are ranked from low to high criticality and mapped to the zoom feature. The selected points form a polygon using the Google Encoded Polyline Algorithm Format [24]. The order of selection and the zoom levels provide anonymity, allowing a user to manage various credentials instead of memorizing them separately. This supports easy elimination of defeaters according to the functionality actually required, i.e., it hides the credential exposure and seeks the comfort level of security assurance.
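To make the zoom-based credential concrete, the sketch below encodes a user's sequence of quadrant selections across zoom levels into a token that can be hashed and compared on the server side. The quadrant numbering, the path depth, the salt, and the choice of SHA-256 are assumptions for illustration; the actual prototype used Google Maps, JavaScript, and the Encoded Polyline format.

import hashlib

def quadrant_path_credential(selections, user_salt: bytes) -> str:
    # Derive a credential from a sequence of quadrant choices (0-3), one per zoom level.
    # Each selection eliminates the other three quadrants (the 'defeaters');
    # a longer path yields higher-entropy evidence for more critical functionality.
    if not all(q in (0, 1, 2, 3) for q in selections):
        raise ValueError("quadrant selections must be in 0..3")
    path = "".join(str(q) for q in selections)          # e.g. '2031' for four zoom levels
    return hashlib.sha256(user_salt + path.encode()).hexdigest()

# a short path for a low-criticality claim, a longer one for a critical claim
low = quadrant_path_credential([2, 0], b"demo-salt")
high = quadrant_path_credential([2, 0, 3, 1, 1, 2], b"demo-salt")
print(low[:16], high[:16])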

Fig. 19.4 Functionality coordinates mapping
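As a rough illustration of this idea, the sketch below encodes an ordered sequence of quadrant selections across zoom levels into derived credentials. It is written in Python rather than the JavaScript of the prototype, and the quadrant labels, separator and hashing step are our own assumptions, not the authors' implementation.

```python
# Hypothetical sketch: turning a path of map-quadrant choices into credentials.
import hashlib

QUADRANTS = {"NW": 0, "NE": 1, "SW": 2, "SE": 3}

def encode_selection_path(path):
    """Encode an ordered list of quadrant choices (one per zoom level) into a token.

    Each zoom level keeps one quadrant and eliminates the other three (the
    'defeaters'), so the path carries both positional and ordering entropy.
    """
    digits = [str(QUADRANTS[q]) for q in path]           # e.g. ['1', '3', '0']
    raw = "|".join(digits)                               # '1|3|0'
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

# Prefixes of the same path can stand for different comfort levels: a short
# prefix unlocks low-criticality functionality, the full path unlocks the most
# critical one, and the unexpressed suffix stays safe on a compromised client.
path = ["NE", "SE", "NW", "SW", "NE"]
credentials = {f"CL{k}": encode_selection_path(path[:k]) for k in range(1, len(path) + 1)}
```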


In this prototype, we limit ourselves to the perceptual level in generating the credentials. However, biometric aspects such as finger touch, and cognitive aspects such as speed-dependent zooming or the finger sequence, can provide further anonymity for managing safe credentials. With this flexibility a user can claim a specific functionality. The criticality of a functionality is application dependent, and its usage is user dependent. The frequency with which a functionality is used sets the usage index in the user profile, a statistic the application can track. We conducted a survey of 34 online banking users with a question on the usage index of the six fundamental functionalities illustrated in Fig. 19.1. Table 19.1 summarizes the results of this survey. The average variation of safety and criticality is illustrated in Fig. 19.5. The safety index is a function of the unexpressed credentials of the average user, and the assurance case is the ratio (in percent) of the usage index to the safety index. Such an assurance case for individual functionalities is unattainable in traditional access control methods.

Table 19.1 Assurance case in functionality subdivision

Functionality | Criticality index | Unexpressed credentials | Safety index | Usage index | Assurance case
F1 | 0.05 | 0.75 | 0.4285 | 0.35 | 0.167
F2 | 0.15 | 0.50 | 0.2857 | 0.25 | 0.33
F3 | 0.2 | 0.25 | 0.1428 | 0.2 | 0.22
F4 | 0.25 | 0.25 | 0.1428 | 0.15 | 0.22
F5 | 0.35 | 0.0 | 0.0 | 0.05 | 0.0
F6 | 0.35 | 0.0 | 0.0 | 0.05 | 0.0

Fig. 19.5 Safety and criticality indexes of functionalities


Table 19.2 Traditional access for clubbed functionalities

Functionality | Criticality index | Unexpressed credentials | Safety index | Usage index | Assurance case
F1–F6 | 1 | 0.0 | 0.0 | 1 | 0.0

Traditional access control methods provide full access $A_{Full}$ to the entire application, covering all functionalities from $F_1$ to $F_6$, as depicted in Table 19.2:

$$A_{Full} := \begin{cases} 0, & \text{Credentials} = \text{no expression} \\ F_1 F_2 F_3 F_4 F_5 F_6, & \text{Credentials} = \text{full expression} \end{cases}$$

If, instead, the application is disintegrated into different functionalities, we can map comfort-based access $\text{Comfort}_{access} \subset (F_1 \le F_2 \le F_3 \le \ldots \le F_6)$ against the user claim $x$ for functionalities $F_i(x) \subset (C_1 < C_2 < C_3 < \ldots < C_6)$. This affords both credential safety and application security by correctly mapping the 'right credentials' to the 'right functionality.'
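To make the contrast concrete, the following sketch (our own illustration, with an assumed criticality ranking) maps the number of expressed credential constituents to the functionalities a claim may reach, against the all-or-nothing behaviour of $A_{Full}$.

```python
# Illustrative only: partial expression of credentials versus clubbed access.
FUNCTIONALITY_CRITICALITY = {          # assumed ranking F1 (low) .. F6 (high)
    "F1": 1, "F2": 2, "F3": 3, "F4": 4, "F5": 5, "F6": 5,
}

def comfort_access(expressed: int, total: int = 5):
    """Functionalities reachable when `expressed` of `total` constituents are given."""
    confidence = expressed / total                       # e.g. 3|5 -> 0.6
    allowed_level = round(confidence * max(FUNCTIONALITY_CRITICALITY.values()))
    return sorted(f for f, c in FUNCTIONALITY_CRITICALITY.items() if c <= allowed_level)

def traditional_access(expressed: int, total: int = 5):
    """All-or-nothing A_Full: everything on full expression, otherwise nothing."""
    return sorted(FUNCTIONALITY_CRITICALITY) if expressed == total else []

print(comfort_access(3))       # ['F1', 'F2', 'F3'] - partial expression, partial access
print(traditional_access(3))   # [] - the clubbed, traditional scheme grants nothing
```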

19.3.5 Functionality Right to Right Functionality Mapping

An assurance case is a structured argument that system properties are desirable: reliable, safe, or secure against attack [3]. Assurance cases are widely used to justify safety and, more recently, security [3]. An application requires high-security assurance for its critical functionalities [7] and safety for its digital identity. A high-level assurance case helps its stakeholders (the service provider and prospective users) understand how the safety efforts, taken together, allow the system to be used safely [3]. The end user is the stakeholder who actually chooses the credential to facilitate the functionality right and to develop comfort level security assurance, which in turn helps the system take better security decisions while executing the right functionality. The proposed CLSF is based on partitioning the user's credentials as evidences $E_i$ so that their reintegration pattern conveys additional information to the access control mechanism and provides specific 'functionality rights' to the 'right functionality.' The right credential expresses the functionality rights to the right functionality of an application, as illustrated in Fig. 19.6:

$$E_i : \pi^{s}_{a,b};\ \pi^{s}_{c,d};\ \ldots;\ \pi^{s}_{m,n};\ \pi^{s}_{y,z};\ \ldots$$

$$Apps : F_1;\ F_2;\ \ldots;\ F_i;\ \ldots;\ F_n;\ \ldots$$


Fig. 19.6 CLSF facilitates a user to express “functionality rights” for the “right functionality”

19.4 Comfort Level Security

The environmental interaction of an application requires subjective assessment by a designer, regulator or evaluator. In order to evaluate validity, we need an assurance case framework [3] that establishes mutual understanding between the application and the end user. The proposed CLSF uses the design by contract (DBC) [32] methodology, i.e., it is based on the assumption that 'the service provider and the prospective users of a modular application establish a contract between content and comfort in order to correctly use the expected range of functionality.' Security best practices provide implicit assurance [3]. To obtain an explicit assurance case, some changes are needed in the application, along with setup rules that the user can assess.

19.4.1 Comfort Level

Comfort level (CL) defines the degree of sensitivity for accessing data and resources [4]. CL addresses the content sensitivity, resource criticality and the user's action, e.g., the consequence of changing the password. It is rated and mapped onto layered trust levels [25] as defined in Table 19.3. Full access has a slightly disturbing effect from the confidentiality point of view; hence, its dispersal is also desirable in terms of content sensitivity and CL.

19.4.2 Credentials Hierarchy

Credentials can be an efficient means for users to express the comfort level. Specifically, multi-factor authentication depends on information that can be dispersed across various orthogonal factors.

Table 19.3 Comfort level dispersal

Content sensitivity and comfort level:

virtual sensitivity functionality_secret
  Properties
    provided_virtual_trust_level ⇒ (classifier (vt_secret));
end functionality_secret

virtual sensitivity functionality_unclassified
  Properties
    provided_virtual_trust_level ⇒ (classifier (vt_unclassified));
end functionality_unclassified

virtual comfort vs_secret
  Properties
    cls::comfort_level ⇒ 5;
end vs_secret;

virtual comfort vs_unclassified
  Properties
    cls::comfort_level ⇒ 1;
end vs_unclassified;

Such factors may include knowledge (something the user knows, e.g., a password), possession (something the user has, e.g., a hardware token) or inherence (something the user is, e.g., biometric information such as a fingerprint or retina scan) [26], summarized in Table 19.4. These factors can serve to compute the CL and demonstrate the user's decision on the functionality request. A user can express the functionality right by putting knowledge and possession together in a seamless manner; hence, the application captures the user's intent at login.

19.5 Comfort Level Security Framework

CLSF is a function of the gradual expression of credentials, i.e., of the user's claims for specific functionality. A user can personate or impersonate an entity (e.g., mobile phones, PDAs, set-top boxes, laptops). A relation is set up between the application and the user's conscience for the various functionalities [9, 32] of an application.


Table 19.4 Various authentication factors

Factors | Knowledge and possession
Password | The length and sequence of the sub-elements of the password form the magnitude and direction of the knowledge factor
Swipe card | The user possesses the swipe card, which also carries printed knowledge for authentication
Biometric | Possession of physical characteristics, e.g., fingerprints, is direct possession of physical attributes, and which particular finger provides what level of authentication is the ingredient of the knowledge factor
Behavior biometric | Behavioral attributes are captured knowingly or unknowingly
Geographical location | The GPRS-based position on the digital map is the possession attribute of the user's location
Timestamp of login | The event of login at an instance reveals time and location

Access to the different functionalities is obtained using a few factors of the MFA through argumentation over valid credentials.

Zone of trust: The comfort level provides different zones of trust across the entire application and facilitates more flexible, dynamic and granular control. It relatively enhances the flexibility and the focus on usage variation, as shown for an example in Fig. 19.7. An argument states a claim that, 'if true, serves to eliminate the associated defeater.' Refinement of the argument eventually develops trust through the associated evidence. The potential number of defeaters can be incredibly large for a safety-sensitive system. A convincing argument formulates low-level defeaters that can be eliminated by appropriate evidence [22]. Failure to identify a relevant defeater leads to an inflated sense of insecurity: it may count as a threat to the account for the higher-level defeater (doubting the claim, i.e., an attack instant) and hence deny access to critical functionalities.

Fig. 19.7 CLSF different zones of trust


If it is difficult to eliminate some lowest-level defeaters (a break of entropy or decryption), then the associated doubt leads to mistrust for all functionalities of an application. This relatively enhances the application's flexibility and its focus on rapid detection of compromise. The argumentation is based on a hierarchy of claims [22]; an example is shown in Fig. 19.7. Our interest is to know 'how a level of user confidence and a level of security influence the assurance case claims.'

$CL_1$: The first level presents minimal assurance through asserted credentials. It is used for functionalities that can bear minimum risk under erroneous authentication, e.g., a self-registered credential or a MAC address satisfying a device authentication requirement, a traditional text password, or single-factor authentication.

$CL_2$: The second level presents higher assurance and is requested for functionality associated with moderate risk, i.e., proving confidence one level above $CL_1$, e.g., 2FA.

$CL_3$: The third level serves a higher confidence beyond $CL_2$. It is required for functionalities that might cause substantial risk under erroneous authentication. It develops a higher level of trust by means of expressing multiple credentials (as factors of authentication).

$CL_4$: This level is required for the most critical functionalities [7], where erroneous authentication might pose a pre-eminent threat to the application. Hence, it deserves more reliable proof because of the associated risk, e.g., tamper-resistant hardware devices (for secret or private cryptographic keys) or biometric proof. The scheme should be scalable to higher levels; however, for lower-confidence access, it is not mandatory to present all multi-factor authenticators.

Object specified by CLS: This defines the set of increasing restrictions [4] that apply to the content sensitivity of a user request for access rights. Typically, the multi-factor authentication scheme controls the end-user request: what they know, what they have, who they are, where they are (the access location), or the time of day (timestamp), etc. These factors, collectively or selectively, can be used to define content sensitivity and a trust level, e.g., what they know (a password) being the lowest level of trust.

Rules definition: Accessing higher sensitivity levels requires additional controls, while accessing lower sensitivity levels allows reduced controls. Such a scenario is depicted in Fig. 19.7, taking trust level 3 as the default:

(a) Allow access to data sensitivity levels 1 and 2, which are associated with fewer consequences (reduced control).
(b) Default controls are associated with a particular trust level; they provide access to data and resources at the same data sensitivity level, which, if tampered with, results in moderate consequences.
(c) Allow access to the higher data sensitivity levels 4 and 5, which are associated with high consequences (additional control).

Table 19.5 illustrates a mapping example of trust, sensitivity and comfort level of security assurance. The different comfort levels ($CL_1, CL_2, CL_3, CL_4, \ldots$) are mapped onto the different functionalities ($F_1, F_2, F_3, F_4, F_5, \ldots$) of an application, as illustrated in Fig. 19.7.
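A minimal sketch of rules (a)–(c), under our own naming and the five-level scale of Fig. 19.7 and Table 19.5, is given below; it only illustrates the decision between reduced, default and additional control for a given session trust level.

```python
# Illustrative decision rule, not the authors' implementation.
def control_for(trust_level: int, sensitivity: int) -> str:
    """Control regime for accessing data of a given sensitivity at a given trust level."""
    if sensitivity < trust_level:
        return "reduced control"      # rule (a): lower sensitivity, fewer consequences
    if sensitivity == trust_level:
        return "default control"      # rule (b): same level, moderate consequences
    if sensitivity <= 5:
        return "additional control"   # rule (c): higher sensitivity, step-up required
    return "deny"

# Taking trust level 3 as the default, as in the scenario above:
for s in range(1, 6):
    print(f"sensitivity {s}: {control_for(3, s)}")
```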

Table 19.5 CLS mapping with trust level and sensitivity levels

CL | Control ratio | Trust level | Sensitivity level
CL1 | ¼ | Very low | Very low
CL2 | ½ | Low | Low
CL3 | ¾ | Moderate | Moderate
CL4 | 1 | Full | High

Traditionally, an application behaves as a single access point, e.g., read and write are clubbed into a single write access. The multiple functionalities [7] form the separation patterns [3, 27], and the sensitivity [4] of these functionalities assures the layered security [28]. 'Comfort level access' is the central theme for claiming these functionalities; here, the level of security assurance varies for critical operations [7]. A combination of CL and trust level defines the set of rules to access a resource. Default controls are used when a particular CL accesses a resource at the same sensitivity level [4]. We can associate a functionality $F_i$ with a trust level $\eta_i$, e.g., binding the program objects under security assurance. The layered security is imposed at run-time over the functionality according to the expressed credentials (ec). It permits $F_i$ to acquire read access to an object $a$ only if $a \rightarrow \eta_i$ and write access to an object $b$ only if $b \rightarrow \eta_i$. In sequential notation, $F_i \rightarrow \eta_i$ can read objects if $a_1, a_2, a_3, \ldots, a_m \rightarrow \eta_i$ and can write objects if $b_1, b_2, b_3, \ldots, b_m \rightarrow \eta_i$. The comfort level access of an object is validated prior to executing any functionality. This makes the functionality sensitive enough to prevent unauthorized access and hence provides object-level sensitivity.
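Read under the strict binding stated above (an object is accessible only from the trust level it is bound to), the run-time check could look like the following sketch; the data structures and equality rule are our own reading, and a dominance rule (object level at or below $\eta_i$) would be an equally plausible interpretation.

```python
# Sketch only: object-level gating validated before a functionality executes.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    trust_level: int          # the eta level the object is bound to

def can_read(eta_i: int, obj: Obj) -> bool:
    return obj.trust_level == eta_i      # a -> eta_i

def can_write(eta_i: int, obj: Obj) -> bool:
    return obj.trust_level == eta_i      # b -> eta_i

def execute(eta_i: int, reads, writes):
    """Run F_i only if every object it touches is bound to its trust level."""
    ok = all(can_read(eta_i, a) for a in reads) and all(can_write(eta_i, b) for b in writes)
    return "executed" if ok else "denied"

print(execute(3, reads=[Obj("a1", 3)], writes=[Obj("b1", 3)]))   # executed
print(execute(3, reads=[Obj("a2", 4)], writes=[]))               # denied
```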

19.6 Security Assurance

A critical application needs high security, privacy and safety to assure its digital identity. 'Security labels the degree to which nasty loss is prevented, detected and reacted to,' whereas 'safety labels the degree to which accidental loss is prevented, detected and reacted to' [30, 31]. Security assurance is the intention of reasoning, argument and supporting evidence for justifying the claims. We cannot assure that a system is completely free from malicious programs. Hiding the credential exposure and seeking the security assurance serve as the basis for assessing the validity of an assurance case. We next compare the CLS for partial expression of credentials with the traditional full credential expression. The assurance case of CLSF is evaluated on the basis of a user-adversary model for the safety, security and privacy analyses [6].


19.6.1 Security Analysis

The traditional way of adopting a multi-factor access control solution can prevent an adversary from gaining access but does not achieve layered security: the enforcement of multiple factors is validated collectively at the servers. Here, credential safety together with system security becomes a great challenge. The novelty of CLSF lies in the selective usage of an orthogonal possession vector, in addition to the usage of a knowledge vector, as a mechanism to provide multiple levels of security assurance. The magnitude of knowledge and possession in a specific direction gives rise to a user-controlled knowledge vector and possession vector. The partitioning of the user's credentials into the syntactic, semantic and cognitive domains does not strictly follow the interpretation of conventional multi-factor authentication. To perform a critical functionality [7], a user needs to express the required knowledge and possession vectors, while selective vectors can allow a user to perform less critical functionality. Hence, the reintegration pattern of the constituents conveys additional information to express the 'functionality rights' for the 'right functionality.'

19.6.2 Security Analysis in Syntactic Domain

The syntactic domain of a formal specification populates the magnitude and direction of the knowledge and possession vectors [29]. The primarily used encryption password $Pw$ acts as the combination of these vectors. The breakup of this password gives each part its specific encryption base (for additional protection of these possession vectors), i.e., $Pw_1|e_1, Pw_2|e_2, \ldots, Pw_5|e_5$. Hence, $Pw = Pw_1|Pw_2|\ldots|Pw_5$ can have breakups, each of which can be seen as a knowledge-cum-possession vector with its own magnitude and direction. Here, the separator '|' indicates the concatenation of sub-elements. For security, one may encrypt the content $pw_1, pw_2, \ldots, pw_5$ using $e_1, e_2, \ldots, e_5$, and so on for each part $pw_i$. Partitioning the credentials is preferable for safe usage of the available evidence at the edge.
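A hypothetical sketch of this partitioning is shown below: the split rule, the per-part salts and the key-derivation step are illustrative assumptions, not the chapter's implementation.

```python
# Sketch: split a password Pw into constituents Pw1..Pw5 and protect each part
# under its own base e_i, so unexpressed parts stay safe on a compromised client.
import hashlib, os

def partition(password: str, parts: int = 5):
    """Split the password into at most `parts` roughly equal constituents."""
    step = max(1, -(-len(password) // parts))             # ceiling division
    return [password[i:i + step] for i in range(0, len(password), step)][:parts]

def protect(constituents):
    """Derive an independent verifier for each constituent Pw_i with its base e_i."""
    protected = []
    for pw_i in constituents:
        e_i = os.urandom(16)                              # per-part encryption base / salt
        digest = hashlib.pbkdf2_hmac("sha256", pw_i.encode(), e_i, 100_000)
        protected.append((e_i, digest))
    return protected

# Only the constituents actually expressed in a session need to be revealed.
vault = protect(partition("correct-horse-battery-staple"))
```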

19.6.3 Security Analysis in Semantic Domain

The knowledge and possession vectors arranged in a definite order form the secret pattern. In the semantic domain, $\langle ST \rangle$ is an ordered pair of secret tokens $\langle \pi^{s}_{i,j}, \pi^{s}_{k,l} \rangle$. The user can explicitly determine the sequence $(S_0; S_1; \ldots; S_n)$. The selected sequence generates the tuple $\langle ips \rangle : ip_0, ip_1, \ldots, ip_i, \ldots, ip_n$, which is compared with the stored secret pattern at the server. These $\langle SP \rangle$s are combinations of $\langle ST \rangle$s and are particularly vital for the security measure [6]. Keeping them confidential provides an additional layer of security [28, 29] by obfuscation.
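A hedged sketch of the server-side check implied here, with made-up token values: the expressed tokens must match the enrolled secret pattern in both content and order.

```python
# Illustration only: order-sensitive verification of an expressed token sequence.
import hmac, hashlib

def fingerprint(tokens):
    """Collapse an ordered token sequence into one comparable digest."""
    h = hashlib.sha256()
    for t in tokens:
        h.update(t.encode())
        h.update(b"\x00")              # keeps ('ab','c') distinct from ('a','bc')
    return h.digest()

STORED_PATTERN = fingerprint(["pw1", "pw3", "pw2"])       # enrolled <SP> at the server

def verify(expressed_tokens) -> bool:
    # constant-time comparison avoids leaking where a mismatch occurs
    return hmac.compare_digest(fingerprint(expressed_tokens), STORED_PATTERN)

print(verify(["pw1", "pw3", "pw2"]))   # True  - right tokens, right order
print(verify(["pw2", "pw3", "pw1"]))   # False - same tokens, wrong order
```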


19.6.4 Security Analysis in Cognitive Domain

$\langle ST \rangle$s are easy for a human to remember and represent the knowledge and possession vectors. Different patterns [3] of information can be generated with slight variations of the knowledge and possession vectors. Corresponding to the transition from full exposure of the credentials to partial exposure, the selective functionality of an application becomes available. Hence, this provides human-controlled encryption that makes decryption difficult [29]. These constituent secret tokens are easily recallable and encryptable through human memory. The syntactic, semantic and cognitive [15] domains of the credentials present an unambiguous notation of the security specification [6].

19.7 Safety and Privacy Analysis

Credential safety derives multi-layer access through multi-factor control and/or control by multiple uses of the same factor. To protect critical data or resources, we can distribute their access (in addition to applying encryption). The combination of these distributions is based on a few measures that form the various patterns of comfort levels depicted in Fig. 19.3a. The aim is to safeguard the credentials that remain unexpressed on a compromised system.

19.7.1 Precaution Measures

• First measure: The declared secret patterns are sufficient to identify an individual action. This is the expressiveness of the various semantic actions of a user, who spreads the secret patterns [3] and forms the sequence vectors; e.g., in multi-factor authentication (MFA) the server accepts the credentials only if each of the factors passes the authentication, and all factors are separately verified [11].

• Second measure: The sequence vector $\{S_0; S_1; \ldots; S_n\}$ is correct and sufficient up to level $n$. The key generation requires an orderly expression of the required combination of knowledge and possession vectors; its output vector is the weighted sum of the correct sequence.

• Third measure: The initial enrollment of possession factors is secure. This measure considers the initial enrollment of all possession factors, where an individual has to enroll in a controlled environment (with the designated authorities). Untrusted enrollment might lead to severe consequences, especially when the possession vector is the only deciding factor for setting a specific trust level.

Based on the above measures, privacy is also preserved.

19.7.2 Privacy Analysis

We aim to define a secure way of choosing human secrets that eliminates guessing attacks [5]; moreover, for every token of information exchanged, how does it become meaningless to an adversary? We evaluate the attack-vector steps to check the privacy of the credentials as follows.

• Step 1. The user is the initiator and the adversary Ǝ is the observer. The oracle $\Pi^{s}_{i,j}$ models the exposure of the session key of an instance; the query is valid to embrace the role only if it actually holds a session key. If every token that $\Pi^{s}_{i,j}$ sends out is delivered to $\Pi^{t}_{j,i}$, with the response to this message being returned to $\Pi^{s}_{i,j}$ as its own next message, and the same is observed by $\Pi^{t}_{j,i}$, this leads to a matching conversation of $\Pi^{t}_{j,i}$ with $\Pi^{s}_{i,j}$.

• Step 2. The adversary Ǝ is the initiator. If every token received by $\Pi^{s}_{j,i}$ was formerly generated by $\Pi^{s}_{i,j}$, and each message that $\Pi^{s}_{i,j}$ sends out is delivered to $\Pi^{s}_{j,i}$ with the corresponding message returned by $\Pi^{t}_{j,i}$, then we say that the conversation of $\Pi^{s}_{i,j}$ is hacked by $\Pi^{t}_{j,i}$: $\text{No-matching}^{E}(k) = \phi$ under precise observation. This second condition is simply implied by the first one.

• Step 3. The adversary Ǝ is the initiator. An oracle has a matching conversation if the first observed conversation $K$ matches the second conversation $K'$. For a perfect match of these conversations, a sufficient number of moves $R = 2\eta - 1$ is needed (an $R$-move protocol $\pi$). Run the protocol in the presence of an adversary Ǝ and consider two oracles, $\Pi^{s}_{i,j}$ and $\Pi^{s}_{j,i}$, engrossed in conversations $K$ and $K'$, respectively. A triple $(\tau_m, \upsilon_m, \omega_m)$ encodes that some oracle was asked $\upsilon_m$ and responded with $\omega_m$ at time $\tau_m$. $K'$ is a matching conversation with $K$ if there exist $\tau_0 < \tau_1 < \ldots < \tau_R$ and $\upsilon_1, \omega_1, \ldots, \upsilon_\eta, \omega_\eta$ such that $K$ is prefixed by
$$(\tau_0, \omega_0, \upsilon_1), (\tau_2, \omega_1, \upsilon_2), (\tau_4, \omega_2, \upsilon_3), \ldots, (\tau_{2\eta-4}, \omega_{\eta-2}, \upsilon_{\eta-1}), (\tau_{2\eta-2}, \omega_{\eta-1}, \upsilon_\eta)$$
and $K'$ is prefixed by
$$(\tau_1, \upsilon_1, \omega_1), (\tau_3, \upsilon_2, \omega_2), (\tau_5, \upsilon_3, \omega_3), \ldots, (\tau_{2\eta-3}, \upsilon_{\eta-1}, \omega_{\eta-1}).$$
Conversely, $K$ is a matching conversation with $K'$ if there exist $\tau_0 < \tau_1 < \ldots < \tau_R$ and $\upsilon_1, \omega_1, \ldots, \upsilon_\eta, \omega_\eta$ such that $K'$ is prefixed by
$$(\tau_1, \upsilon_1, \omega_1), (\tau_3, \upsilon_2, \omega_2), (\tau_5, \upsilon_3, \omega_3), \ldots, (\tau_{2\eta-3}, \upsilon_{\eta-1}, \omega_{\eta-1}), (\tau_{2\eta-1}, \upsilon_\eta, *)$$
and $K$ is prefixed by
$$(\tau_0, \omega_0, \upsilon_1), (\tau_2, \omega_1, \upsilon_2), (\tau_4, \omega_2, \upsilon_3), \ldots, (\tau_{2\eta-2}, \omega_{\eta-1}, \upsilon_\eta).$$

Apropos of this, an unordered or incomplete sequence breaks the conversation, making it harder for an adversary to log in to the server (irrespective of the traditional condition that all factors are mandatorily evaluated at the server). Hence, the privacy of all unexposed factors is preserved up to the level of their entropy; the ordering of the exposed factors further enhances the entropy by the factor of its mismatch. A user is allowed to log in to the system only to the extent of the supported factors; hence, the privacy of unsupported factors is preserved.

19.8 Conclusion

Security and safety are open-ended, subjective problems. They become more severe with growing uncertainty about the knowledge and capabilities of attackers, and credential safety becomes worse on a compromised system; hence, it becomes a great challenge for system security. Criticality is system dependent and usage is user dependent. The article argues that frequent use of common functionality should not threaten the occasionally used critical functionality. Moreover, wisely applying functionality separation can prevent security breaches across the functionalities. The article introduces a novel framework of 'comfort level security assurance,' which can balance credential safety with application security by means of intelligent coordination between a user and the system. Hiding the credential exposure and seeking the security assurance is the basis of CLSF. It reduces the intruder's effect up to a comfort level of application security and credential safety, which is demonstrated via various functionalities of a banking application. The safety, security and privacy analysis of CLSF also confirms the reduction of the intruder's effect up to a comfort level of the application's functionalities.


References 1. Bloomfield R, Netkachova K, Stroud R (2013) Security-informed safety: if it’s not secure, it’s not safe. In: International workshop on software engineering for resilient systems, pp 17–32. Springer, Berlin, Heidelberg 2. Zhao X, Borders K, Prakash A (2005) Towards protecting sensitive files in a compromised system. In: Third IEEE international security in storage workshop, SISW’05, pp 8–26 3. Alexander R, Hawkins R, Kelly T (2011) Security assurance cases: motivation and the state of the art. High Integrity Systems Engineering Department of Computer Science University of York Deramore Lane York YO10 5GH 4. Ardagna CA, De Capitani di Vimercati S, Foresti S, Paraboschi S, Samarati P (2012) Minimising disclosure of client information in credential-based interactions. Int J Inf Priv, Secur Integr 2(1):2–3, 205–233 5. Shay R, Komanduri S, Kelley PG, Leon PG, Mazurek ML, Bauer L, Cranor LF (2010) Encountering stronger password requirements: user attitudes and behaviors. In: Proceedings of the sixth symposium on usable privacy and security, p 2. ACM 6. Pereira D, Hirata C, Pagliares R, Nadjm-Tehrani S (2017) Towards combined safety and security constraints analysis. In: International conference on computer safety, reliability, and security, pp 70–80. Springer, Cham 7. Delange J, Nam MY, Feiler P, Klieber W (2016) An architecture-centric process for MILS development. In: 2nd international workshop on MILS: architecture and assurance for secure systems. MILS workshop 2016, Prague. http://mils-workshop-2016.euromils.eu/#description 8. Rao HR, Upadhyaya S (2009) Information assurance, security and privacy services, vol 4. Emerald Group Publishing 9. Beznosov K, Kruchten P (2004) Towards agile security assurance. In: Proceedings of the 2004 workshop on new security paradigms, pp 47–54. ACM 10. Rehman H, Nazir M, Mustafa K (2018) Comfort level security–a multi-factor authentication framework. Int J Appl Eng Res 13(17):13166–13177 11. Adham M, Azodi A, Desmedt Y, Karaolis I (2013) How to attack two-factor authentication internet banking. In: International conference on financial cryptography and data security, pp 322–328. Springer, Berlin, Heidelberg 12. Bernstein DJ (2009) Introduction to post-quantum cryptography. In: Post-quantum cryptography, pp 1–14. Springer, Berlin, Heidelberg 13. Strunk JD, Goodson GR, Scheinholtz ML, Soules CA, Ganger GR (2000) Self-securing storage: protecting data in compromised system. In: Proceedings of the 4th conference on symposium on operating system design and implementation, vol 4, p 12. USENIX Association 14. Rehman H, Nazir M, Mustafa K (2017) Security of web application—state of the art: research theories and industrial practices. In: Information, communication and computing technology, ICICCT 2017. Communications in computer and information science, vol 750, pp 168–180. Springer, Singapore. https://doi.org/10.1007/978-981-10-6544-6_17 15. Jøsang A, Rosenberger C, Miralabé L, Klevjer H, Daveau J, Taugbøl P (2015) Local user-centric identity management. Journal of trust management 2(1):1 16. Cheng BC, Chen H, Tseng RY (2007a) A theoretical security model for access control and security assurance. In: Third international symposium on information assurance and security, 2007. IAS 2007, pp 137–142. IEEE 17. CMFS: Consumers and Mobile Financial Services Report (2016) Board of Governors of the federal reserve system. https://www.federalreserve.gov/econresdata/consumers-and-mobilefinancial-services-report-201603.pdf 18. 
Burhouse S, Chu K, Goodstein R, Northwood J, Osaki Y, Sharma D (2014, October) National survey of unbanked and under banked households. Fed Depos Insur Corp. https://www.fdic. gov/householdsurvey/2013report.pdf 19. Burhouse S, Homer M, Osaki Y, Bachman M (2014) Assessing the economic inclusion potential of mobile financial services. Fed Depos Insur Corp


20. Cheng PC, Rohatgi P, Keser C, Karger PA, Wagner GM, Reninger AS (2007b) Fuzzy multilevel security: an experiment on quantified risk-adaptive access control. In: IEEE symposium on security and privacy, 2007. SP’07, pp 222–230. IEEE 21. Weinstock CB, Howard FL, Goodenough JB (2007) Arguing security: creating security assurance cases. Technical report, Software Engineering Institute, Carnegie Mellon University 22. Weinstock CB, Goodenough JB, & Klein AZ (2013) Measuring assurance case confidence using Baconian probabilities. In: 1st international workshop on assurance cases for software-intensive systems (ASSURE), pp 7–11. IEEE 23. Goodenough J, Weinstock CB, Klein AZ (2012) Toward a theory of assurance case confidence. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 24. Google. Encoded Polyline Algorithm Format (2013). https://developers.google.com/maps/ documentation/utilities/polylinealgorithm. Accessed July 27, 2019 25. Payne BD, Sailer R, Cáceres R, Perez R, Lee W (2007) A layered approach to simplified access control in virtualized systems. ACM SIGOPS Oper Syst Rev 41(4):12–19 26. Bhargav-Spantzel A, Squicciarini AC, Modi S, Young M, Bertino E, Elliott SJ (2007) Privacy preserving multi-factor authentication with biometrics. J Comput Secur 15(5):529–560 27. Pavlich-Mariscal JA, Demurjian SA, Michel LD (2005, October) A framework for composable security definition, assurance, and enforcement. In: International conference on model driven engineering languages and systems. Springer, Berlin, Heidelberg, pp 353–354 28. Alves-Foss J, Taylor C, Oman P (2004) A multi-layered approach to security in high assurance systems. In: Proceedings of the 37th annual Hawaii international conference, system sciences, p 10. IEEE 29. Rehman H, Khan U, Nazir M, Mustafa K (2018) Strengthening the Bitcoin safety: a graded span based key partitioning mechanism. Int J Inf Technol 1–7. https://rdcu.be/bah3Z 30. Pietre-Cambacedes L, Chaudet C (2009) Disentangling the relations between safety and security. In: Proceedings of the 9th WSEAS international conference on applied informatics and communications. World Scientific and Engineering Academy and Society (WSEAS)on AIC’09, pp 156–161 31. Bellare M, Rogaway P (1993) Entity authentication and key distribution. In: Annual international cryptology conference. Springer, Berlin, Heidelberg, pp 232–249 32. Rubio-Medrano CE, Ahn GJ, Sohr K (2014) Achieving security assurance with assertionbased application construction. In: 2014 international conference on collaborative computing: networking, applications and worksharing (collaboratecom), pp 503–510. IEEE

Chapter 20

A Study of Microfinance on Sustainable Development Shital Jhunjhunwala and Prasanna Vaidya

20.1 Introduction

In 2015, the United Nations Organization (UNO) announced the Sustainable Development Goals (SDGs), which have been adopted especially by South Asian countries to address sustainability issues such as poverty, gender inequality, education, health and the economy. Despite economic and technological breakthroughs, South Asian countries account for 36% of the world's poor [1], and 736 million people live on less than $1.90/day [2]. The concept of credit became a relevant and useful programme that took a major leap in the 1970s when Yunus's Grameen Bank in Bangladesh began to institutionalize it; he was later awarded the Nobel Peace Prize in 2006 for his microcredit movement's contribution to the modern concept of microfinance [3]. The idea of microfinance was introduced to help and uplift the living standard of poor people and to empower women. Microfinance is the arrangement of minimal credits to poor people, to enable them to participate in new productive business activities and to extend existing ones [4]. It allows them to join self-employment projects that contribute to generating income and improve their social status and standard of living. Over time, however, microfinance has come to include a broader range of services. Microfinance is a developmental tool for providing financial services, namely credit, saving/deposit, insurance, leasing, money transfer and repayment services. It also includes social intermediation services, namely training and education, organizational support and skill development, to people who are deprived of access to conventional financial services [5–7].



In contrast, a microfinance institution, with the insight to develop borrowers' capacity, imposes a small percentage of interest on the credit amount or loans. In this regard, microfinance is a form of strategy that has brought significant impact and boosts the ability of poorer individuals, primarily aiming to alleviate poverty [8]. It encompasses both social and economic factors and has proven to be a useful tool in aiding the Sustainable Development Goals. It has further contributed substantially to improving the level of education, housing, gender equality and health care, and to overcoming hunger and environmental degradation [9, 10]. Today, microfinance has created an opportunity to invest mainly in developing regions such as Latin America, Africa and South Asia. This process is facilitated by international institutions like the United Nations and the World Bank, and by national institutions like commercial banks, development banks, MFIs, finance companies, NGOs, etc. [11], which also dedicate large amounts of funding and research projects to microfinance. However, the success rate of microfinance differs among countries and across the globe. Despite the long history of South Asian countries, microfinance has both success stories and failure stories, and some are still debatable. The relationship between microfinance and the achievement of its sustainable goals remains questionable [11]. This article provides some new empirical evidence on achieving the Sustainable Development Goals. The study further endeavours to fill the gap in identifying the impact of microfinance on its various socio-economic elements. The study's objective is to examine the impact of microfinance on poverty reduction, women's economic empowerment, gender equality, children's enrolment in school (especially girls) and health status. The study manifests the developmental impact of microfinance in India (Katihar) and Nepal (Morang). The next section provides relevant reviews of the literature, Sect. 20.3 describes the research methodology, Sect. 20.4 presents the research analysis and empirical results, and the final section exhibits the conclusion.

20.2 Literature Survey

A systematic review provides an opportunity to address and analyse questions based on the evidence of impact in the field of research so far. Microfinance has been recognized as an essential tool for reducing poverty and for socio-economic well-being. It helps to diversify income, smooth household consumption and empower households to cope with economic shocks and fluctuations [5–7]. Microfinance plays a crucial role in poverty reduction and socio-economic development in sub-Saharan African countries [12]. In Bangladesh, India, Sri Lanka and Uganda, clients of MFIs have access to multiple institutions for their credit and savings needs [13–16]. Credit borrowed for productive purposes was more essential for poverty reduction in rural areas than in urban areas and was a significant positive outcome of microfinance institutions [17]. Microfinance is a successful tool for reducing poverty and empowering rural women, which also leads to social and economic changes in rural India; one study analysed the impact of microfinance on enabling SHG clients in psychological and managerial skills, their attitudes, and social and economic aspects in Kanyakumari district [18].


Microfinance and self-help groups are viable in alleviating poverty, empowering women, creating awareness and ensuring the sustainability of the environment, thereby finally achieving sustainable development [19]. An economically active woman with her own independent savings and an increasing income share within the household gains continuous economic power, and this makes her empowered and inclined to challenge the overarching norms that confine her ability to make decisions [20, 21]. Further, credit programmes lead to women taking household decisions and gaining in the schooling of children, greater access to financial and economic resources, social networks, bargaining power and greater freedom of mobility [22, 23]. Microfinance has a huge potential for adding to women's financial, social and political strengthening. Access to savings and credit by women instigates their interest in economic activities, which eventually improves employment opportunities for women. This economic commitment may increase their role in financial decision-making in the family, as well as change gender roles and increase their status within families and communities. Savings and microfinance credits lead to increased income and resources and to control over these earnings and resources. The status of women within the community is also enhanced through a blend of women's increased economic activity and control over income, access to information, improved aptitudes and encouraging groups of people [21]. Microfinance has expedited a positive effect on the life of the poorer when compared with those who do not have access to these microfinance services. It has accelerated a positive effect on income, resource building and access to schools and medical facilities in the area [24]. Studies of the Grameen Bank and two different MFIs in Bangladesh showed a small but significant and positive impact on children attending school [25]; they also acknowledged improved health and nutrition practices and a positive effect on the education of the children. Again, a study of FOCCAS explains that Zimbabwe, India, Honduras and Bangladesh show the same result [6, 26]. Microfinance performance, microfinance customers, outstanding credit portfolio and savings are on an increasing trend, while the level of poverty stays unaltered across rural families. As a measure of impact, with the rise in client duration, cooking-fuel status is yet to be improved. The contribution of other incomes in deciding the total income of households is more significant in comparison with agricultural and farm-related income, and the enrolment of girl children in private school has significantly increased, independent of the degree of earning of families [10]. Microfinance builds the self-confidence of the poor by meeting their emergency prerequisites, ensuring convenient need-based credit and making the poor capable of saving. People do not consider microfinance a supportive tool for health issues, yet a small portion of people take loans for health facilities [27]. On the other hand, some researchers argue that these relations are still debatable: other factors such as GDP, international openness and the inflation rate also contribute to poverty reduction [11]. In sub-Saharan Africa, studies found mixed evidence, positive on health and housing but with a negative impact on the educational attainment of children [28].


Therefore, this study examines the following hypotheses:

• H1: Microfinance leads to sustainable development by reducing poverty and economically empowering women.
• H2: Microfinance leads to sustainable development by increasing children's enrolment in school (especially girls), gender equality and improvement in health status.
• H3: There is a significant relationship in the developmental impact of microfinance between Katihar and Morang.

20.3 Methodology

This study is a survey based on a descriptive research design for defining the socio-economic conditions and impact of microfinance. The study mainly relies on primary sources of data and also considers secondary sources to review the available literature. The sample comprises 60 respondents each drawn from the Katihar district of India and the Morang district of Nepal; the 120 respondents were selected using stratified and judgmental sampling techniques. Structured questions, some dichotomous questions and a Likert scale (1–5 points) were used to collect information from the respondents. The quantitative statistical methods include percentage, mean, standard deviation and the t-test to measure and describe the relationships and to compare the variables.
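As a sketch of the comparison procedure (the Likert responses below are synthetic, used only to mirror the layout of Table 20.2), an independent-samples t-test between the two districts for one factor can be run as follows.

```python
# Illustration only: independent-samples t-test for one socio-economic factor,
# two groups of 60 respondents each (df = 118, as reported in Table 20.2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
morang = rng.integers(1, 6, size=60)    # 1 = strongly agree ... 5 = strongly disagree
katihar = rng.integers(1, 6, size=60)

t_stat, p_value = stats.ttest_ind(morang, katihar, equal_var=True)
df = len(morang) + len(katihar) - 2

print(f"mean (Morang) = {morang.mean():.2f}, sd = {morang.std(ddof=1):.3f}")
print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}")
# If p > 0.05, the null hypothesis of equal means is not rejected, i.e., no
# significant difference between the districts on that factor.
```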

20.4 Analysis and Results

The empirical study includes the variables for achieving the Sustainable Development Goals and reports the descriptive statistical measures of percentage, mean, standard deviation and the t-test for both countries. It records whether respondents strongly agree, agree, neither agree nor disagree, disagree or strongly disagree with the questionnaire items.

20.4.1 Analysis Based on Percentage

1. Poverty Reduction

The change in poverty is illustrated in Fig. 20.1. The analysis shows that, in Morang district, 10% of respondents strongly agree and 38.3% agree, i.e., 48.3% in total agree positively on poverty reduction, whereas 31.7% neither agree nor disagree and 20% disagree.


Fig. 20.1 Poverty reduction

Similarly, in Katihar district, 6.7% strongly agree and 36.7% agree, i.e., 43.4% in total agree positively on poverty reduction, whereas 33.3% neither agree nor disagree and 23.3% disagree. Comparatively, the clients of Morang district respond more positively on poverty reduction than those of Katihar district by 4.9%, although respondents disagree more, by 3.3%, in Katihar than in Morang. Thus, microfinance supports and has a positive impact on poverty reduction.

2. Women Economic Empowerment

Figure 20.2 shows women's participation contributing to economic empowerment. The respondents of Morang district show that 13.3% strongly agree and 43.3% agree, i.e., 56.6% in total agree positively, whereas 36.7% neither agree nor disagree and a few disagree. On the other hand, in Katihar district 10% strongly agree and 36.7% agree, i.e., 46.7% in total agree positively, but 43.3% neither agree nor disagree and 10% disagree. Comparing the results, Morang district responds more positively than Katihar district by 9.9%, although respondents disagree more, by 3.3%, in Katihar than in Morang. Thus, microfinance strongly supports and has a positive impact on women's economic empowerment.

3. Gender Equality

The change in gender equality is shown in Fig. 20.3. The respondents of Morang district indicate that 6.7% strongly agree and 45% agree, i.e., 51.7% in total agree positively, whereas 35% neither agree nor disagree and 13.3% disagree. Similarly, Katihar district exhibits 3.3% strongly agree and 41.7% agree, i.e., 45% in total agree positively, whereas 36.7% neither agree nor disagree and 18.3% disagree. Analysed comparatively, Morang district has a higher share of agreeing respondents than Katihar district, by 6.7%. Thus, microfinance strongly supports and has a positive impact on gender equality.


Fig. 20.2 Women economic empowerment

Fig. 20.3 Gender equality


Fig. 20.4 Increase in children enrolment in school, girls

4. Increase in Children Enrolment in School, Especially Girls

Children's enrolment in school, especially of girls, is shown in Fig. 20.4. The respondents of Morang district indicate that 13.3% strongly agree and 43.3% agree, i.e., 56.6% in total agree positively on increasing children's enrolment in school, especially girls, whereas 36.7% neither agree nor disagree and 6.7% disagree. Similarly, in Katihar district 11.7% strongly agree and 38.3% agree, i.e., 50% in total agree positively, whereas 36.7% neither agree nor disagree and 13.3% disagree. In comparison, Morang district has a higher share of agreeing respondents than Katihar district, by 6.6%, while the share of disagreeing respondents is higher by 6.6% in Katihar than in Morang. Thus, microfinance actively supports and has a positive impact on increasing children's enrolment in school, especially girls.

5. Improvement in Health Status

The health condition of respondents is explained with the help of Fig. 20.5. Morang district reflects that 11.7% strongly agree and 36.7% agree, i.e., 48.4% in total agree positively, but 46.6% neither agree nor disagree and a few disagree. The evidence from Katihar district shows 6.7% strongly agree and 50.0% agree, i.e., 56.7% in total agree positively on improvement in health status, whereas 38.3% neither agree nor disagree and a few disagree. On the other hand, the corresponding result shows that a higher share of respondents in Katihar district than in Morang district, by 8.3%, report improvement in health status, while the respondents disagree equally in the two districts. Thus, microfinance strongly supports improvement in health status and has a positive impact on it.


Fig. 20.5 Improvement in health status

20.4.2 Analysis Based on Mean and Standard Deviation

The analysis through mean and standard deviation explains the socio-economic factors on a five-point scale with the help of Table 20.1. The scale ranges from 1 to 5, where a score closer to 1 implies strongly agree and a score closer to 5 represents strongly disagree. The table shows that the average means for Morang district on poverty reduction, women's economic empowerment, gender equality and increase in children's enrolment in school (especially girls) are lower than the average means for Katihar district, which provides evidence that respondents respond more positively in Morang than in Katihar; thus, these factors clearly attract higher agreement in Morang than in Katihar district. Moreover, the mean for improvement in health status is lower in Katihar district, indicating higher agreement there than in Morang district. On the other hand, in Morang district, increase in children's enrolment in school (especially girls) and women's economic empowerment have lower standard deviations, indicating low variability, whereas poverty reduction and gender equality show slightly higher standard deviations.

Table 20.1 Mean and standard deviation

Factors | Morang (Nepal) Mean | Morang (Nepal) Std. deviation | Katihar (India) Mean | Katihar (India) Std. deviation
Poverty reduction | 2.62 | 0.926 | 2.73 | 0.899
Women economic empowerment | 2.37 | 0.802 | 2.53 | 0.812
Gender equality | 2.55 | 0.811 | 2.70 | 0.808
Increase in children enrolment in school, girls | 2.37 | 0.802 | 2.52 | 0.873
Improvement in health status | 2.45 | 0.768 | 2.42 | 0.696


Table 20.2 Socio-economic factors t-test

Socio-economic factors | t | df | Sig.
Poverty reduction | −0.701 | 118 | 0.484
Women economic empowerment | −1.131 | 118 | 0.260
Gender equality | −1.014 | 118 | 0.313
Increase in children enrolment in school, girls | −0.980 | 118 | 0.329
Improvement in health status | 0.249 | 118 | 0.804

The bold mean values in Table 20.1 indicate the district that agrees more strongly on the corresponding factor.

20.4.3 Comparing the Socio-economic Factors

The socio-economic factors explain the relationship in the developmental impact of microfinance between the Katihar and Morang districts. The difference in outcomes between the districts is analysed using the independent-samples t-test, as reported in Table 20.2. It shows that poverty reduction, women's economic empowerment, gender equality, increase in children's enrolment in school (especially girls) and improvement in health status are all non-significant at the 5% level, so we do not reject the null hypothesis at the P < 0.05 level of significance. This indicates that there is no significant difference in the developmental impact of microfinance between the Katihar and Morang districts, because both countries have a homogeneous socio-economic environment and a similar geographic location, which also affect the overall performance of microfinance.

20.5 Conclusion

Sustainable development became a global agenda when Yunus propounded the microcredit movement in the 1970s, which contributed to the modern concept of microfinance. The study provides evidence that microfinance leads towards the Sustainable Development Goals: many respondents agree positively and some strongly agree. This increases the likelihood of reducing poverty, economically empowering women, improving gender equality and increasing children's enrolment in school, especially girls. However, a few respondents in Katihar disagree, because many people there invest in a business with limited knowledge, which results in an inability to pay the interest and principal amounts; dowry is also a reason for credit in Katihar. In Morang, by contrast, few people do business, and those who do are aware of where and how to invest. Although people in Morang are cautious, they nevertheless take loans mostly from commercial and development banks rather than from other micro-credit institutions, paying higher interest rates; they do so because of trustworthiness concerns and are willing to bear the higher cost.


The study further contributes a comparative analysis: in Morang district, microfinance shows lower means and is more effective in poverty reduction, women's economic empowerment, gender equality and increasing children's enrolment in school, especially girls. On the other hand, in Katihar district the improvement in health status is more effective, because people are aware of health hazards and of proper government health services. However, the study indicates that there exists no significant difference in the developmental impact of microfinance between the two countries. Therefore, this article uncovers new dimensions based on the contributions and limitations of the study, and it recommends that further research focus on microfinance services and the access to outreach or other financial institutions.

References 1. United Nations Economic and Social Commission for Asia and the Pacific (2018) Achieving the sustainable development goals in South Asia: key policy priorities and implementation challenges. United Nations publication (UNESCAP)—SRO-SSWA—SDG report. Available: http://www.unescap.org/ 2. Worldbank (2018) Ending global poverty. Available: https://www.worldbank.org/ or http:// pubdocs.worldbank.org 3. Feiner SF, Barker DK (2006) Microcredit and women’s poverty. Dollar Sense 268:10–11 4. Kimotha M (2005) National microfinance policy framework and expected impact on the microfinance market in Nigeria. In: CBN proceedings of seminar in microfinance policy. Regulatory and supervisory framework for Nigeria 5. Ledgerwood J (1998) Microfinance handbook: an institutional and financial perspective. The World Bank 6. Littlefield E, Morduch J, Hashemi S (2003) Is microfinance an effective strategy to reach the millennium development goals? Focus Note 24(2003):1–11 7. Robinson MS (2001) The microfinance revolution: sustainable finance for the poor. The World Bank. Available: https://openknowledge.worldbank.org/ 8. Ebimobowei A, Sophia JM, Wisdom S (2012) An analysis of microfinance and poverty reduction in the Bayelsa state of Nigeria. Kuwait Chapter Arab J Bus Manag Rev 33(842):1–20 9. Government of India Planning Commission (2013) Annual report. Available: http:// planningcommission.gov.in/ 10. Paudel NP (2013) Socio-economic impact of microfinance in Nepal. J Manag Dev Stud 25(1):59–81 11. Miled KBH, Rejeb JEB (2015) Microfinance and poverty reduction: a review and synthesis of empirical evidence. Procedia-Soc Behav Sci 195:705–712 12. Van Rooyen C, Stewart R, De Wet T (2012) The impact of microfinance in sub-saharan Africa: a systematic review of the evidence. World Dev 40(11):2249–2262 13. Rutherford S (2006) Grameen II: the first five years 2001–2005: a grounded view of Grameen new initiative. MicroSave 14. Srinivasan G (2008) Review of post-tsunami Micro Finance in Sri Lanka, Colombo 15. Krishnaswamy K (2007) Over-borrowing and competition: are credit bureaus the solution? Microfinance Insights 34–36


16. Wright GA, Rippey P (2003) The competitive environment in Uganda: implications for microfinance institutions and their clients. MicroSave/FSDU/Imp-Act, Kampala 17. Imai KS, Arun T, Annim SK (2010) Microfinance and household poverty reduction: new evidence from India. World Dev 38(12):1760–1774 18. Pillai T, Nadarajan S (2010) Impact of microfinance-an empirical study on the attitude of SHG leaders in Kanyakumari district-Tamilnadu. Int J Enterp Innov Manag Stud 1(3):89–95 19. Rajendran K, Raya RP (2010) Impact of microfinance-an empirical study on the attitude of SHG leaders in Vellore district. Glob J Financ Manag 2(1):59–68 20. Swain RB, Wallentin FY (2009) Does microfinance empower women? Evidence from self-help groups in India. Int Rev Appl Econ 23(5):541–556 21. Malhotra A, Schuler SR, Boender C (2002) Measuring women’s empowerment as a variable in international development. In: Background paper prepared for the World Bank workshop on poverty and gender: new perspectives, vol 28 22. Mahmud S (2003) Actually how empowering is microcredit? Development and change 34(4):577–605. Available: https://onlinelibrary.wiley.com/doi/s/ 23. Sharma PR (2012) An overview of microfinance service practices in Nepal. J Nepal Bus Stud 7(1):1–16 24. Asemelash (2002) The impact of microfinance in Ethiopia: the case of DCSI in Ganta Afeshum Woreda of Eastern Tigray M.A. Thesis, Department of RLDS AAU 25. Pitt MM, Khandker SR (1998) The impact of group-based credit programs on poor households in Bangladesh: does the gender of participants matter? J Polit Econ 106(5):958–996 26. Wright G (2000) Microfinance systems: designing quality financial services for the poor. Zed Books 27. Gopalan SS (2007) Micro-finance and its contributions to health care access (a study of selfhelp groups (SHGS) in Kerala). Health and Population Department of Kerala on Health and Population, Kerala, pp 134–149 28. Stewart R, Rooyen C, de Wet T (2010) What is the impact of microfinance on poor people? A systematic review of evidence from sub-saharan Africa. Technical report. EPPI-Centre, Social Science Research Unit, University of London, London. Available: http://eppi.ioe.ac.uk/cms/

Chapter 21

Development of Decision Support System for a Paper Making Unit of a Paper Plant Using Genetic Algorithm Technique Rajeev Khanduja and Mridul Sharma

21.1 Introduction

In the present world, high productivity is a must for any industrial system to compete in the international market. Process industries are complex in configuration and process large amounts of bulk product continuously, i.e., only a few products in large volume. In various process industries such as sugar, paper and fertilizer plants, the goal of high productivity with minimum parallel units, minimum losses and minimum failure cost is very difficult to achieve. Reliable operation is therefore essential in these industries, and it can be obtained by taking corrective maintenance actions at the proper place and at the proper time. In this paper, a decision support system (DSS) using a genetic algorithm (GA) is proposed for the paper making unit. A paper plant comprises various engineering units such as chipping, digesting, washing, bleaching, screening, stock preparation and paper making, connected in a combined configuration. From the point of view of paper quality, the paper making unit is very important. In this plant, chips from the storage silos are fed through a belt conveyor and shuttle conveyor to the digesters (large pressure-cooking vessels) to convert them into pulp. After that, the pulp passes through various subsystems called the knotter, decker, opener and washers. Chlorine gas is passed through the washed pulp for bleaching to obtain white pulp. Oversize and odd-shaped particles of white pulp are screened out after passing through the screen and cleaner. The stock is then prepared by adding various chemicals and fillers to enhance the paper properties, with proper mixing done in the tank. Using a medium-consistency pump, the pulp is spread evenly over an endless belt running between the breast and couch rolls of the paper making machine.


Fig. 21.1 Schematic flow diagram of paper making unit

The pulp then flows over a wire mat which moves over the rollers. Water is sucked from the pulp by vacuum pumps arranged in parallel, and paper is produced in the form of sheets. The wet paper is then sent to presses and dryers to iron out any irregularities and obtain smooth paper. The dried paper sheets are finally wound into rolls. The process flow diagram of the paper making unit is shown in Fig. 21.1.

21.2 Literature Review

In the literature, several techniques have been used to analyze the reliability and availability of systems, including Petri nets (PN), Markov modeling, Monte Carlo simulation, fault tree analysis (FTA) and reliability block diagrams (RBD) [1–12]. The literature on chemical system reliability published over 25 years has been reviewed, covering the system reliability of various chemical industries and stressing system reliability, availability and maintainability in chemical process plants, where the chance of failure is high. Singh [13] used Markov modeling to evaluate reliability parameters for a biogas plant. Kumar et al. [14–16] evaluated the performance of a paper plant using Markov modeling, assuming the failure and repair rates of the paper plant systems to be constant. Gupta et al. [17] evaluated the reliability of a butter manufacturing system in a dairy plant by forming difference–differential equations using the Markov birth–death approach; these equations were solved by the Runge–Kutta method to calculate the long-run availability through a recursive method, and the mean time between failures was obtained using a numerical technique. Kumar et al. [18] discussed the availability of the carbon dioxide cooling system of a fertilizer plant.


Ravikanth et al. [19] designed decision support system software for cost analysis in a thermal plant, considering different types of costs such as the cost of coal consumption, oil consumption, operation and maintenance. Gupta et al. [20] dealt with performance matrices and a decision support system for the water feed system of a thermal power plant, with models developed on the basis of the Markov approach; performance matrices were formed for different combinations of failure and repair rates for all subsystems, which helps to decide repair priorities for proper execution of the maintenance plan and thus improve the performance of the plant concerned. Khanduja et al. [21] evaluated the availability of the bleaching system of a paper plant and developed various performance levels of the screening system in a paper plant. Michael et al. [22] discussed the availability evaluation of the transmission system of the Goa Electricity Department using a Markov model, for which an algorithm was developed in MATLAB to solve the Laplace transition matrix involved in the stochastic modeling. Deb [23] described various optimization approaches for engineering problems in his book. Tewari et al. [24, 25] explained the decision support system of the refining unit of a sugar plant; they evaluated the availability of the refining unit with standby subsystems and also discussed the mathematical modeling and behavioral analysis of the refining system using genetic algorithm. Juang et al. [26] optimized the availability level of a parallel–series system, which helps to find the component mean time between failures (MTBF) and mean time to repair (MTTR) in the most economical manner using genetic algorithm. Khanduja et al. [27, 28] described the mathematical modeling and performance optimization of the screening unit and the paper making system of a paper plant using GA. Tewari et al. [29] developed models to optimize the performance of the crystallization unit of a sugar plant and the stock preparation unit of a paper plant using genetic algorithm, which provide the optimum availability of both units.

In this research paper, a mathematical model has been developed to evaluate the various availability levels of the paper making unit of a paper plant. Genetic algorithm is then applied to optimize the availability, giving the optimum availability of the paper making unit. With the help of genetic algorithm, a decision support system is developed to decide the repair priorities of the subsystems of the paper making unit; such a decision support system helps in taking maintenance/repair decisions in time. The findings of this paper, discussed with the plant executives, are highly beneficial for future maintenance planning and for improving the performance of the unit concerned.

21.3 Assumptions

The assumptions used in the probabilistic model are:
(i) Failure/repair rates are constant over time and statistically independent.
(ii) A repaired unit is, performance-wise, as good as new for a specified duration.
(iii) Sufficient repair facilities are provided, so there is no waiting time to start repairs.
(iv) Standby units (if any) are of the same nature and capacity as the active units.
(v) System failure/repair follows the exponential distribution.
(vi) Service includes repair or replacement.
(vii) The system may work at a reduced capacity.
(viii) There are no simultaneous failures among units.

21.4 Notations

The following notations are associated with the transition diagram of the paper making unit:
G1, G2, G3, G4: good working states of the wire mat, synthetic belt, rollers and vacuum pumps, respectively.
g1, g2, g3, g4: failed states of the wire mat, synthetic belt, rollers and vacuum pumps, respectively.
λ1, λ2, λ3, λ4: mean constant failure rates of G1, G2, G3, G4.
μ1, μ2, μ3, μ4: mean constant repair rates of g1, g2, g3, g4.
λ5: failure rate of the steam supply.
μ5: repair rate of the steam supply.
P0(t): probability that the system is working at full capacity at time t.
Pi(t): probability that the system is in the ith state at time t.
P′i(t): first-order derivative of the state probability Pi(t).

21.5 Paper Making Unit

The paper making unit of a paper plant consists of four main subsystems arranged in series, described as follows:
I. Subsystem G1: a wire mat, arranged in series, on top of which the pulp is deposited; water is then sucked from the pulp by the vacuum pumps. Failure of this subsystem results in complete failure of the unit.
II. Subsystem G2: a synthetic belt that supports the fiber running through the presser and dryer sections. Failure of the synthetic belt results in complete failure of the unit.
III. Subsystem G3: a number of rollers arranged in series that support the wire mat and synthetic belt so that they run smoothly. Failure of any one of these rollers causes complete failure of the unit.
IV. Subsystem G4: six vacuum pumps arranged in parallel, used for sucking water from the pulp through the wire mat. Four pumps operate at a time, while the remaining two pumps are on standby. Complete failure of the paper making unit takes place only when more than two pumps fail at a time.

The paper making unit is associated with a common steam supply, and a failure in this supply needs emergency attention. Failure of the steam supply affects the subsystems, causing further delay in the operation of the unit. λ5 and μ5 are the failure and repair rates for this special failure.

21.6 Mathematical Modeling

The mathematical model of the paper making unit has been developed using the Markov birth–death process. The difference–differential equations are formed on the basis of a probabilistic approach using the transition diagram, and these equations are then solved recursively to determine the steady-state availability. The paper making unit has 26 states: state 0 represents full-capacity working with no standby in use; states 4 and 8 represent full-capacity working with standby; states 1–3, 5–7, 9–12 and 0SF–12SF indicate that the unit is in a complete failed state, as shown in Fig. 21.2.

Fig. 21.2 Transition diagram of paper making unit

The following differential equations associated with the transition diagram of the paper making unit are formed:

P′0(t) + (λ1 + λ2 + λ3 + λ4 + λ5) P0(t) = μ1 P1(t) + μ2 P2(t) + μ3 P3(t) + μ4 P4(t) + μ5 [P0SF(t) + P1SF(t) + P2SF(t) + P3SF(t)] + μ5 P12SF(t)   (21.1)

P′4(t) + (λ1 + λ2 + λ3 + λ4 + λ5) P4(t) + μ4 P4(t) = μ1 P5(t) + μ2 P6(t) + μ3 P7(t) + μ4 P8(t) + λ4 P0(t) + μ5 [P4SF(t) + P5SF(t) + P6SF(t) + P7SF(t)]   (21.2)

P′8(t) + (λ1 + λ2 + λ3 + λ4 + λ5) P8(t) + μ4 P8(t) = μ1 P9(t) + μ2 P10(t) + μ3 P11(t) + μ4 P12(t) + λ4 P4(t) + μ5 [P8SF(t) + P9SF(t) + P10SF(t) + P11SF(t)]   (21.3)

P′i(t) + (λ5 + μ1) Pi(t) = λ1 Pj(t), where i = 1, 5, 9 and j = 0, 4, 8 respectively   (21.4)

P′i(t) + (λ5 + μ2) Pi(t) = λ2 Pj(t), where i = 2, 6, 10 and j = 0, 4, 8   (21.5)

P′i(t) + (λ5 + μ3) Pi(t) = λ3 Pj(t), where i = 3, 7, 11 and j = 0, 4, 8   (21.6)

P′iSF(t) + μ5 PiSF(t) = λ5 Pi(t), where i = 0, 1, 2, …, 12   (21.7)

The equation for state 12 follows the same pattern as (21.4)–(21.6): P′12(t) + (λ5 + μ4) P12(t) = λ4 P8(t).

With initial conditions at time t = 0: Pi(t) = 1 for i = 0 and Pi(t) = 0 for i ≠ 0.

Solution of Equations: Steady-State Behavior

The steady-state behavior of the paper making unit can be analyzed by setting P′ = 0 as t → ∞ in Eqs. (21.1)–(21.7). Solving these equations recursively, we get:

P1 = C1 P0, P2 = C2 P0, P3 = C3 P0,
P4 = L P0, P5 = C1 L P0, P6 = C2 L P0, P7 = C3 L P0,
P8 = N L P0, P9 = C1 N L P0, P10 = C2 N L P0, P11 = C3 N L P0, P12 = M N L P0,
P0SF = B P0, P1SF = B C1 P0, P2SF = B C2 P0, P3SF = B C3 P0,
P4SF = B L P0, P5SF = B C1 L P0, P6SF = B C2 L P0, P7SF = B C3 L P0,
P8SF = B N L P0, P9SF = B C1 N L P0, P10SF = B C2 N L P0, P11SF = B C3 N L P0, P12SF = B M N L P0,

where
C1 = λ1/(λ5 + μ1), C2 = λ2/(λ5 + μ2), C3 = λ3/(λ5 + μ3), B = λ5/μ5,
M = λ4/(λ5 + μ4), N = λ4/{λ4 + μ4 − (M · μ4)}, L = λ4/{λ4 + μ4 − (N · μ4)}.

Using the normalizing condition, i.e., the sum of all the state probabilities is equal to one,
Σ(i = 0 to 12) Pi + Σ(j = 0 to 12) PjSF = 1,
and substituting the above state probabilities, the probability of full-capacity working (P0) is obtained:

P0 [(1 + C1 + C2 + C3) + L (1 + C1 + C2 + C3) + N L (1 + C1 + C2 + C3 + M) + B (1 + C1 + C2 + C3) + B L (1 + C1 + C2 + C3) + B N L (1 + C1 + C2 + C3 + M)] = 1
P0 (Z1 + L Z1 + N L Z2 + B Z1 + B L Z1 + B N L Z2) = 1
P0 [Z1 (1 + L + B + B L) + Z2 (N L + B N L)] = 1
P0 (Z1 Y1 + Z2 Y2) = 1
P0 = 1/(Z1 Y1 + Z2 Y2)   (21.8)

where Z1 = 1 + C1 + C2 + C3, Z2 = 1 + C1 + C2 + C3 + M, Y1 = 1 + L + B + (B · L), Y2 = (N · L) + (B · N · L).

The steady-state availability (Av) of the paper making unit is given by adding all the full working and reduced capacity state probabilities:

Av = P0 + P4 + P8 = P0 + L P0 + N L P0
Availability (Av) = [1 + L + N L]/[(Z1 Y1) + (Z2 Y2)]   (21.9)
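As a quick numerical illustration of Eq. (21.9) (a minimal Python sketch under the notation above, not the authors' MATLAB program), the steady-state availability can be computed directly from the constants C1–C3, B, M, N and L; the function name availability and the illustrative rate values in the example call are assumptions for demonstration only.

def availability(lam, mu):
    # Steady-state availability of the paper making unit from Eq. (21.9).
    # lam = (λ1, λ2, λ3, λ4, λ5) failure rates; mu = (μ1, μ2, μ3, μ4, μ5) repair rates.
    l1, l2, l3, l4, l5 = lam
    m1, m2, m3, m4, m5 = mu
    C1 = l1 / (l5 + m1)
    C2 = l2 / (l5 + m2)
    C3 = l3 / (l5 + m3)
    B = l5 / m5
    M = l4 / (l5 + m4)
    N = l4 / (l4 + m4 - M * m4)
    L = l4 / (l4 + m4 - N * m4)
    Z1 = 1 + C1 + C2 + C3
    Z2 = Z1 + M
    Y1 = 1 + L + B + B * L
    Y2 = N * L + B * N * L
    return (1 + L + N * L) / (Z1 * Y1 + Z2 * Y2)

# Illustrative call with mid-range values from Table 21.1 (values assumed, not taken from the paper):
Av = availability((0.003, 0.003, 0.004, 0.06, 0.06), (0.3, 0.3, 0.4, 0.3, 0.3))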


21.7 Development of Decision Support System (DSS) of Paper Making Unit of Paper Plant Using Genetic Algorithm (GA)

A decision support system (DSS) is a class of computerized information system that supports decision-making activities. In effect, it is a computer program that helps plant managers take appropriate decisions by providing data, tools and models. In such a program, various tools and techniques can be mixed and mapped to solve different engineering problems. For the problem taken up in this paper, the DSS for the paper making unit of a paper plant is developed for the maintenance strategy, and its objective is to decide the relative maintenance priorities.

The DSS of the paper making unit is developed using GA, a computerized search and optimization algorithm based on the mechanics of natural genetics and selection. Genetic algorithm is a search and optimization technique for complex engineering optimization problems and is chosen here for finding the optimum solution. It mimics the principles of genetics and natural selection to constitute search and optimization procedures; Darwin's theory of natural selection and survival of the fittest is used as the basis for the algorithm. A computer program for the genetic algorithm has been developed using the MATLAB software. MATLAB is a software package for technical computing and visualization with hundreds of built-in functions for computation, graphics and animation; it integrates computation, visualization and programming in an easy way, where problems and solutions are expressed in familiar mathematical notation. Figure 21.3 shows the different steps of a genetic algorithm used to solve the optimization problem. The action of the genetic algorithm to optimize the availability of the paper making unit of the paper plant in the present problem is stated as follows (an illustrative sketch of this loop follows Fig. 21.3):
1. Identify the objective function with all constraints.
2. Prepare an M-file for the GA execution using MATLAB programming.
3. Call the prepared M-file in the GA tool.
4. Select the fitness function value, i.e., availability.
5. Randomly specify the initial population size.
6. Select the parents from the mating pool by various methods (selection).
7. Perform crossover to produce offspring.
8. Apply mutation to have more diversity in the solution.
9. Place the child strings in the new population, and compute the fitness of each individual.
10. Replace the old population with the new population.
11. Repeat steps 5 to 10 until the optimum value of the fitness function is reached.


Fig. 21.3 Flowchart for typical genetic algorithm technique (start → initial population generation → fitness calculation and generation evaluation → termination check; if not terminated: reproduction, crossover and mutation for the next generation; otherwise print result and stop)
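The following is a minimal, hedged Python sketch of the loop in Fig. 21.3, written here instead of the authors' MATLAB implementation; the function run_ga, the tournament selection, blend (arithmetic) crossover and uniform mutation operators, and the use of the availability() function sketched in Sect. 21.6 are illustrative assumptions, with parameter bounds taken from Table 21.1.

import random

BOUNDS = [  # (min, max) for λ1..λ5 followed by μ1..μ5, from Table 21.1
    (0.001, 0.005), (0.001, 0.005), (0.002, 0.006), (0.02, 0.10), (0.02, 0.10),
    (0.10, 0.50), (0.10, 0.50), (0.20, 0.60), (0.10, 0.50), (0.10, 0.50),
]

def fitness(x):
    # Fitness = steady-state availability of Eq. (21.9); availability() as sketched in Sect. 21.6.
    return availability(tuple(x[:5]), tuple(x[5:]))

def run_ga(pop_size=80, generations=80, pc=0.85, pm=0.015, seed=1):
    rng = random.Random(seed)
    # Initial population: random rate vectors within the Table 21.1 bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # Tournament selection of two parents from the mating pool
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            c1, c2 = p1[:], p2[:]
            if rng.random() < pc:  # blend (arithmetic) crossover
                a = rng.random()
                c1 = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
                c2 = [a * v + (1 - a) * u for u, v in zip(p1, p2)]
            for child in (c1, c2):  # uniform mutation within the parameter bounds
                for k, (lo, hi) in enumerate(BOUNDS):
                    if rng.random() < pm:
                        child[k] = rng.uniform(lo, hi)
                    child[k] = min(hi, max(lo, child[k]))
                new_pop.append(child)
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best individual found so far
    return best, fitness(best)

Under these assumptions, run_ga(pop_size=80, generations=80) returns the best rate vector found and its availability; the simple elitism and tournament selection are common design choices and not necessarily those used in the authors' MATLAB program.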

21.8 Analysis and Discussion

The effect of the failure and repair parameters of each subsystem of the paper making unit is central to the performance evaluation of the paper plant. Here, genetic algorithm is used to coordinate simultaneously the failure and repair parameters of all the subsystems of the paper making unit so as to reach the optimum availability level. The ranges of the failure and repair rates of all the subsystems were selected from the maintenance record sheets of the paper making unit of the paper plant and from discussions with the plant personnel. For the performance analysis of the paper making unit, ten subsystem parameters are selected, i.e., five failure rates and five repair rates. The genetic algorithm procedure is as follows. To solve the given problem, the chromosomes are coded as real-valued structures, and binary crossover coding is used. Unlike unsigned fixed-point integer coding, the system parameters are mapped to a specified interval [Xmin, Xmax], where Xmin and Xmax are the minimum and maximum values of the failure and repair rates. To construct a multi-parameter coding, each coded parameter has its own Xmin and Xmax, as given in Table 21.1. Genetic algorithm processes populations of strings.

Table 21.1 Ranges of failure and repair rates of the paper making unit parameters

Parameter   λ1      μ1     λ2      μ2     λ3      μ3     λ4     μ4     λ5     μ5
Minimum     0.001   0.10   0.001   0.10   0.002   0.20   0.02   0.10   0.02   0.10
Maximum     0.005   0.50   0.005   0.50   0.006   0.60   0.10   0.50   0.10   0.50

For the present research work, a population is constructed as an array of individuals. By applying the genetic operators to the entire population at each generation, a new population is created and stored at different locations, making it a non-overlapping population, which simplifies the birth of the offspring and the replacement of the parents. The system parameters are adjusted automatically so as to keep the system availability at an optimum level of performance. The availability of the system is therefore used as the performance index, which is a function of the variable system parameters (λi, μi); the maximum value of the performance index (Av) then corresponds to the optimum set of failure and repair rate values. This performance index is the fitness function of the genetic algorithm and is used to assess the viability of each chromosome. All the system parameters, i.e., the failure and repair rates, are optimized; the failure and repair rates are determined simultaneously for the optimal availability of the paper making unit. The effect of the population size and of the maximum number of generations on the availability of the unit concerned is shown in Figs. 21.4 and 21.5. To get more precise results, trial sets are chosen for the genetic algorithm and system parameters, as shown in Tables 21.2 and 21.3.

Fig. 21.4 Effect of number of generations on fitness (availability of the paper making unit plotted against the number of generations)

Fig. 21.5 Effect of population size on fitness (availability of the paper making unit plotted against the population size)

Table 21.2 Effect of number of generations on availability of the paper making unit using genetic algorithm (mutation probability = 0.015, population size = 80, crossover probability = 0.85)

No. of gen.  Availability  λ1      μ1      λ2      μ2      λ3      μ3      λ4      μ4      λ5      μ5
20           0.9060        0.0021  0.4043  0.0043  0.1897  0.0058  0.24    0.0976  0.1     0.0996  0.1355
30           0.9197        0.0045  0.1543  0.0038  0.3510  0.006   0.2354  0.0895  0.1086  0.0612  0.1236
40           0.9220        0.0039  0.2982  0.0024  0.1171  0.0058  0.4697  0.0866  0.1125  0.0958  0.2741
50           0.9231        0.0031  0.1126  0.0048  0.1718  0.0055  0.2490  0.0993  0.1031  0.0857  0.2314
60           0.9242        0.0037  0.2069  0.0047  0.1     0.0055  0.2855  0.0969  0.1140  0.0726  0.1129
70           0.9324        0.0043  0.4406  0.0042  0.3698  0.0037  0.4022  0.0913  0.10    0.0982  0.2871
80           0.93694       0.0046  0.2213  0.0031  0.1256  0.0058  0.3718  0.0989  0.1056  0.0930  0.1236
90           0.93389       0.0045  0.1694  0.0046  0.10    0.0048  0.2065  0.0873  0.1057  0.0863  0.1622
100          0.93373       0.0012  0.2397  0.0037  0.1450  0.0051  0.2610  0.0929  0.1078  0.0967  0.1383


The availability (performance) level of the paper making unit is determined using the designed values of the system parameters. The genetic algorithm is applied to optimize the availability of the paper making unit, and the corresponding values of the failure and repair parameters are obtained. This has been done in two ways: (1) first, the maximum number of generations is varied while keeping the population size, crossover probability and mutation probability constant; (2) second, the population size is varied while keeping the maximum number of generations, crossover probability and mutation probability constant. From the set of values obtained, the maximum availability is selected, and the corresponding values of the failure and repair parameters are thereby obtained. Graphs of availability versus number of generations (Fig. 21.4) and availability versus population size (Fig. 21.5) are drawn. The simulation for the maximum number of generations is performed by varying it from 20 to 100. At 80 generations, the optimum availability of the paper making unit is achieved, i.e., 93.694% as shown in Table 21.2, and the corresponding optimum values of the failure and repair rates are λ1 = 0.0046, μ1 = 0.2213, λ2 = 0.0031, μ2 = 0.1256, λ3 = 0.0058, μ3 = 0.3718, λ4 = 0.0989, μ4 = 0.1056, λ5 = 0.0930, μ5 = 0.1236. Similarly, the simulation has also been carried out by varying the population size from 20 to 100. At a population size of 70, the optimum value of the unit's performance (optimum unit availability) is 93.858%, and the corresponding combination of failure and repair rates is λ1 = 0.0046, μ1 = 0.1486, λ2 = 0.0046, μ2 = 0.1769, λ3 = 0.00201, μ3 = 0.3026, λ4 = 0.10, μ4 = 0.1117, λ5 = 0.0867, μ5 = 0.1046, as shown in Table 21.3. The effect of the number of generations and of the population size on the availability (fitness) of the paper making unit is also shown in Figs. 21.4 and 21.5. It is clearly indicated that, to achieve the optimum availability, the corresponding values of the repair and failure rates of all the subsystems of the unit should be properly maintained. The failure rates can be reduced through good design, reliable machines, a proper maintenance plan, the provision of standby subsystems, etc. Further, the repair rates can be improved by employing more repair workers and by providing better repair facilities; the maintenance workers should be properly trained so that repairs are done in time. These findings were discussed with the concerned paper plant executives, and such results are found very useful for performance enhancement and hence for deciding the repair priorities of the various subsystems of the paper making unit in a paper plant.
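A hypothetical sweep mirroring the two experiments above can be run with the run_ga() sketch from Sect. 21.7; the step size of 10 and the fixed GA settings (crossover probability 0.85, mutation probability 0.015) follow Tables 21.2 and 21.3, while the loop itself is only an illustration.

# Vary the number of generations at a fixed population size (cf. Table 21.2) ...
for gens in range(20, 101, 10):
    _, av = run_ga(pop_size=80, generations=gens, pc=0.85, pm=0.015)
    print(f"generations={gens:3d}  availability={av:.5f}")

# ... and vary the population size at a fixed number of generations (cf. Table 21.3).
for pop in range(20, 101, 10):
    _, av = run_ga(pop_size=pop, generations=80, pc=0.85, pm=0.015)
    print(f"population={pop:3d}  availability={av:.5f}")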

Table 21.3 Effect of population size on availability of the paper making unit using genetic algorithm (mutation probability = 0.015, number of generations = 80, crossover probability = 0.85)

Pop. size  Availability  λ1      μ1      λ2      μ2      λ3      μ3      λ4      μ4      λ5      μ5
20         0.9048        0.0044  0.10    0.0039  0.1011  0.0048  0.3185  0.0748  0.1846  0.0482  0.1040
30         0.9181        0.0034  0.1304  0.0039  0.3018  0.0042  0.4039  0.0973  0.1195  0.0426  0.1639
40         0.9182        0.0047  0.1274  0.0049  0.1567  0.0022  0.2257  0.10    0.10    0.0803  0.3861
50         0.9247        0.0045  0.2707  0.005   0.2882  0.0038  0.4149  0.0963  0.10    0.0958  0.3917
60         0.9245        0.0048  0.1006  0.005   0.4704  0.0042  0.3867  0.10    0.1059  0.0519  0.3062
70         0.93858       0.0046  0.1486  0.0046  0.1769  0.0020  0.3026  0.10    0.1117  0.0867  0.1046
80         0.9377        0.0038  0.1015  0.0034  0.1205  0.0053  0.2123  0.10    0.1062  0.0913  0.1784
90         0.9348        0.0049  0.10    0.0048  0.1250  0.0056  0.230   0.10    0.10    0.099   0.4516
100        0.93454       0.0046  0.2224  0.0039  0.1356  0.0049  0.2769  0.097   0.10    0.0769  0.1148

21.9 Conclusions

After developing the mathematical model, the performance optimization of the paper making unit of a paper plant has been carried out using the genetic algorithm technique, by selecting the best combination of failure and repair parameters to obtain the optimum availability level of the unit concerned. The behavior of the system performance with respect to the genetic algorithm parameters, namely the population size and the maximum number of generations, has also been analyzed with the help of the availability tables and plots. The findings of this paper have been discussed with the plant executives. The DSS developed in this way is of immense help to maintenance engineers in using failure and repair data together with availability models to support maintenance decision making, i.e., to decide the repair priorities for a particular system. These results are found to be very useful to the maintenance managers of a paper plant for advanced maintenance planning and control and hence for taking timely corrective actions, so that the desired goals of maximum production, high availability and hence high profitability can be achieved.

References

1. Singer D (1990) A fuzzy set approach to fault tree and reliability analysis. Fuzzy Sets Syst 34:145–155
2. Bradley ML, Dawson R (1998) The cost of unreliability: a case study. J Qual Maint Eng 4:212–218
3. Modarres M, Kaminsky (1999) Reliability engineering and risk analysis. Marcel Dekker Inc., New York
4. Bing L, Meilin Z, Kai X (2000) A practical engineering method for fuzzy reliability analysis of mechanical structures. Reliab Eng Syst Saf 67:311–315
5. Parveen SG, Yadav PO, Singh N, Chinnam RB (2003) A fuzzy logic based approach to reliability improvement estimation during product development. Reliab Eng Syst Saf 80:63–74
6. Cochran JK, Murugan A, Krishnamurthy V (2000) Generic Markov models for availability estimation and failure characterization in petroleum refineries. Comput Oper Res 28:1–12
7. Arhur N (2004) Dairy processing site performance improvement using reliability centered maintenance. In: Proceedings of IEEE annual reliability and maintainability symposium, pp 521–527
8. Gandhi OP, Sehgal R, Angra S (2003) Failure cause identification of tribo-mechanical system. Reliab Eng Syst Saf 65:259–270
9. Adamyan A, Dravid H (2004) System failure analysis through counters of Petri nets. Qual Reliab Eng Int 20:317–335
10. Panja SC, Ray PK (2007) Reliability analysis of track circuit of Indian railway signaling system. Int J Reliab Saf 1:428–445
11. Bhamare SS, Yaday OP, Rathore A (2008) Evolution of reliability engineering discipline over the last six decades: a comprehensive review. Int J Reliab Saf 1:377–410
12. Dhillon BS, Singh C (1981) Engineering reliability-new techniques and applications. Wiley, New York
13. Singh J (1989) Reliability analysis of a bio gas plant having two dissimilar units. Microelectron Reliab 29:779–781
14. Kumar D, Singh IP, Singh J (1988) Reliability analysis of the feeding system in the paper industry. Microelectron Reliab 28:213–215
15. Kumar D, Singh J, Pandey PC (1989) Availability analysis of the washing system in the paper industry. Microelectron Reliab 29:775–778
16. Kumar D, Singh J, Pandey PC (1993) Operational behavior and profit function for a bleaching and screening system in the paper industry. Microelectron Reliab 33:1101–1105
17. Gupta P, Lal A, Sharma R, Singh J (2005) Numerical analysis of reliability and availability of the series processes in butter oil processing plant. Int J Qual Reliab Manag 22:303–316
18. Kumar S, Tewari PC, Sharma R (2007) Simulated availability of CO2 cooling system in a fertilizer plant. Ind Eng J (Indian Institution of Industrial Engineering, Mumbai) 36:19–23
19. Ravikanth D, Subba Rao C, Rajagopal K (2008) A case study on decision support system for cost analysis in a thermal plant. Ind Eng J 1:19–25
20. Gupta S, Kumar A, Sharma R, Tewari PC (2008) A performance modeling and decision support system for a feed water unit of a thermal power plant. S Afr J Ind Eng 19:125–134
21. Khanduja R, Tewari PC, Kumar D (2008) Availability analysis of bleaching system of paper plant. J Ind Eng, Udyog Pragati, N.I.T.I.E. Mumbai (India) 32:24–29
22. Michael S, Mariappan V, Amonkar Uday J, Telang AD (2009) Availability analysis of transmission system using Markov model. Int J Indian Cult Bus Manag 2:551–570
23. Deb K (1995) Optimization for engineering design: algorithms and examples. Prentice Hall of India, New Delhi
24. Tewari PC, Kumar D, Mehta NP (2000) Decision support system of refining system of sugar plant. J Inst Eng (India) 84:41–44
25. Tewari PC, Joshi D, Sreenivasa Rao M (2005) Mathematical modeling and behavioral analysis of a refining system using genetic algorithm. In: Proceedings of national conference on competitive manufacturing technology and management for global marketing, Chennai, pp 131–134
26. Juang Y-S, Lin S-S, Kao H-P (2008) A knowledge management system for series-parallel availability optimization and design. Expert Syst Appl 34:181–193
27. Khanduja R, Tewari PC, Chauhan RS (2009) Performance analysis of screening unit in a paper plant using genetic algorithm. J Ind Syst Eng 3:140–151
28. Khanduja R, Tewari PC, Chauhan RS, Kumar D (2010) Mathematical modeling and performance optimization for paper making system of a paper plant. Jordan J Mech Ind Eng 4:487–494
29. Tewari PC, Khanduja R, Gupta M (2012) Performance optimization for the crystallization unit of a sugar plant using genetic algorithm technique. J Ind Eng Int, Tehran 8:01–06
30. Dhillon BS, Rayapati SN (1998) Chemical system reliability: a review. IEEE Trans Reliab 37:199–208
31. Khanduja R, Tewari PC, Kumar D (2008) Development of performance evaluation system for screening unit of paper plant. Int J Appl Eng Res 3:451–460
32. Goldberg DE (2001) Genetic algorithm in search, optimization and machine learning. Pearson Education Asia
33. Khanduja R, Tewari PC (2013) Performance modeling and optimization for the stock preparation unit of a paper plant using genetic algorithm. Int J Qual Reliab Manag 30:480–494

Chapter 22

The Challenge of Big Data in Official Statistics in India

Pushpendra Kumar Verma and Preety

22.1 Introduction

In India, the Department of Science and Technology (DST) called for the preparation of a "Strategic Road Map for Big Data Analytics" in 2014. This marked the first explicit and strategic government measure in this regard. While India has not adopted a comprehensive strategy or policy to leverage big data for development, it is useful to collate and analyze academic research, news, press reports, etc. For the purposes of this report, we have analyzed academic articles, government reports, reports from consultancies and industry, and news reports that provide information on the latest developments with respect to big data and development. This literature has been examined in terms of each SDG in order to map the discourse surrounding each goal [1].

Big data is a new term that has arisen because of the large amount of data produced by industries that use the Internet. The concept was first articulated by the industry analyst Doug Laney, who proposed that big data consists of three important parts, namely volume, velocity and variety [2]. The world is changing: previously, understanding of a past chain of events was considered knowledge; now, knowledge increasingly means the capability to forecast and influence the future, such as the capability to diminish bad future outcomes and enhance good ones. In one of its developing forms, this capability rests on big data. There is no strict definition of big data. The United States National Science Foundation issued an action plan titled "Cyberinfrastructure Vision for Twenty-First Century Discovery," in which one of the action points is to take advantage of big data in every effort to develop knowledge, including the exact sciences and the social sciences. This was the starting point of the evolution and application of big data.

The term "big data" literally refers to an abundant volume of data (terabytes or petabytes). There are several explanations of "big data." Big data must meet the "3Vs": volume (quantity); variety (variety of forms: documents, sound recordings, pictures, video, etc.); and velocity (data that changes fast because it comes from multiple sources). Big data may also be required to meet a fourth "V" by adding "value," an important factor of big data, namely the process of finding the meaning behind a group of data. Meanwhile, the global hardware and software manufacturer IBM adds the "veracity" category to refer to the certainty of the accuracy of the available data. Broadly speaking, big data spreads through the electronic space (including the Internet) from various complex sources, and its processing is carried out in stages (collection, organization, storage and analysis) and requires resources that are not cheap. Because of the scale and complexity of this procedure, big data is grouped into five categories, namely: (1) data sources; (2) content format; (3) data stores; (4) data staging; (5) data processing. Social media big data is categorized as a data source because it is data or information that can be shared or exchanged between individuals or groups (communities) via a URL. Social media data itself can be classified into two types: historic data sets (data previously accumulated and stored) and real-time data.

22.1.1 Big Data

Big data is a term that describes data of very large size or volume. The data in question can be structured or unstructured. Big data is now widely used in large-scale business; its availability can facilitate the analysis of a business so that the business can perform better in the future. To better apprehend and understand "big data," we must recognize that big data means many things to different people, and it is crucial to understand it more deeply before it can be put to use. Toward this end, we look at how big data is altering the paradigms of social science research (and therefore the lenses through which we perceive the world) and follow this with how big data tools and techniques are being used to make smart policy and business choices [2].

22.1.2 Boyd and Crawford's Explanation

The shift of the world paradigm from traditional computing to increasingly sophisticated computing, together with significant improvements in integrated devices and sensors, has made possible the vision of living in an intelligent environment, better known as a "smart environment." Several applications of the smart environment have been introduced recently, including smart transportation, smart homes, smart health and smart cities, driven by rapid urban population growth and urbanization. At present, urban performance depends not only on physical infrastructure but also on the availability and quality of knowledge about communication and social infrastructure. One of the main factors supporting the creation of smart cities is Internet of Things (IoT) technology, in which everything is connected to the network. The big data phenomenon is characterized by the volume, speed and variety of data being created at an increasingly sophisticated level. Big data offers cities the potential to obtain important knowledge from large amounts of data collected from various sources. Big data also brings new changes to science, because it can be used to analyze, describe and predict communication patterns, behavior and even social and non-social issues such as crime, the spread of disease and so on.

Rumata (2016) argues that the opportunities of big data include: (1) big data can give birth to new knowledge paradigms and open up multidisciplinary research across computer science, statistics and social science, designated "computational social science"; (2) big data answers the limitations of conventional data collection methods, both qualitative and quantitative, so social researchers can enrich the value of their survey findings (or other conventional data collection methods) with the analysis of big data; (3) big data offers solutions for cultural sociologists, especially when analyzing macro- and micro-level relations between social actors and elements of culture that have been difficult to study with conventional methods; (4) big data produces a new branch of science, namely the "digital humanities," which can be used to explore human involvement in the development of technology, media and computational methods.

At the same time, big data has challenges that social researchers must note. Big data tends to cause "apophenia," a condition in which researchers see a pattern or relationship in something that does not really exist (Boyd and Crawford 2011). Big data is still heatedly debated among scientists, with at least three topics of debate. First, there are challenges related to the definition of the term big data itself: if the definition emphasizes quantity, then how much data is needed to address social issues? The second challenge concerns methodology and theory; it is argued that big data offers a "trade-off" in social science research. Big data shows patterns of online communication or social interaction across geographical, cultural and social boundaries. Data in computing systems is encoded, whereas discourse data is stated in the form of a collection of symbols that are measured and/or qualified. In its development, data has taken a new digital form, as information converted into digital form can be reused, combined and analyzed to show patterns and trends for informing behavioral decisions [2]. In this way, big data opens opportunities for reading into aspects of the world for which data has never been collected, or which were not even previously considered data.
Based on observations by Andrea De Mauro, existing big data definitions can be divided into four groups, namely: attributes of data, technology needs, overcoming of thresholds and social impact. The first group, attributes of data, emphasizes the characteristics of big data. Doug Laney put forward the "3V" dimensions of growing data, which include volume (quantity), velocity (speed) and variety (variation); these were then expanded by expert opinion with other features of big data, such as veracity, value, complexity and unstructuredness. Volume refers to the large quantity of data being manipulated and analyzed to obtain the desired results. Velocity is the speed at which data moves in big data formed in real time. Variety is the characteristic representing the types of data stored, analyzed and used. Veracity concerns whether data consistency is good enough, given that big data may contain uncertainty and error, and value concerns the quality of the stored data and its further use [3, 4]. The second group, technology needs, emphasizes the technological requirements behind processing large amounts of data; according to Microsoft, big data describes the process by which "serious computing power is applied to a collection of very large and often very complicated information." The next group, overcoming of thresholds, considers big data in terms of crossing a threshold: Dumbill suggests that big data arises when the processing capacity of conventional database systems is exceeded and alternative approaches are needed to process it. The last group, social impact, looks at the influence of big data on society. Boyd and Crawford define big data as a "cultural, technological and scientific" phenomenon based on three elements: technology (i.e., the maximization of computational power and algorithmic accuracy); analysis (i.e., identifying patterns in big data sets); and mythology (i.e., the belief that large data sets offer a form of superior intelligence carrying an aura of truth, accuracy and objectivity). The quality of the captured data can vary and influence the accuracy of the analysis. Big data as a cyber-physical form uses the 6C system, which includes: connection (sensors and networks), cloud (computing and data on demand), cyber (models and memory), content/context (meaning and correlation), community (sharing and collaboration) and customization (personalization and value). In the process of its delivery, big data passes through at least seven stages, namely: collect (data is collected from data sources and distributed over several nodes), process (the system then uses highly parallel computation to perform fast calculations on the data at each node), manage (the big data being processed is often heterogeneous and needs to be understood and defined), measure (the company measures the degree to which the data can be integrated with customer behavior), consume (the resulting data is used), store (the data is stored) and govern (data governance covering policy and data surveillance from a business perspective).

22.2 Objectives

The objective of the Indian statistical system shall be to provide, within its decentralized structure, trustworthy, timely and credible social and economic statistics, to aid decision making within and outside the Government, to motivate research and to promote informed debate relating to the conditions affecting people's lives.


22.3 Challenges

Need to synchronize different data sources: as data sets become bigger and more diverse, combining them into one platform is a big challenge. If this is ignored, it creates gaps and leads to wrong messages and insights [2].

Absence of professionals who understand big data analysis: with the exponential increase in data, the demand for big data scientists and analysts is growing. It is important for companies to employ data scientists with diverse skills, because the work of a data scientist is multidisciplinary [3].

Getting important insights through big data analysis: it is important for companies that only the relevant departments have access to this information. Real-time data can help researchers and companies to provide the latest insights for decision making.

Getting volume data into a big data platform: companies need to handle large amounts of data every day.

Uncertain data management landscape: with the emergence of big data, new technologies and new companies, the big challenge faced by companies in big data analysis is to find out which technology is most suitable for them without introducing new problems and potential risks of data storage and quality. Popular data storage options such as data lakes/warehouses are generally used to collect and store large amounts of unstructured and structured data in its original format. Missing data, inconsistent data, logical conflicts and duplicate data all produce data quality challenges [1].

Data security and privacy: big data also involves potential risks concerning data privacy and security. Big data tools used for analysis and storage draw on different data sources, which ultimately makes the data high-risk and vulnerable. Increasing the amount of data also increases privacy and security issues.

22.4 Recommendations on the Use of Big Data in the Field of Official Statistics

The World Working Group of the United Nations, from its creation in 2014, put forward a strategy for a global program on the use of big data in Official Statistics [5]. It proposes the practical use of these data sources while, in parallel, stimulating training in the area and the exchange of experiences; it also includes the promotion of public confidence in the use of data from the private sector. One of the priorities established was the use of these data for the indicators of the United Nations 2030 Agenda for Sustainable Development [6]. Different work teams were assigned to specific tasks, covering mobile phone data, satellite images and data from social networks, in order to develop practical applications through pilot projects. Work was carried out to strengthen ties between the private sector and other communities, including the development of interim agreements for accessing data with the suppliers of this information. At the same time, the group promoted the benefits of the use of big data and the participation of developing countries in the pilot projects. Reference guidelines were also established for training, which was needed in the offices of Official Statistics, together with guidelines on the methodological and quality issues involved across several sectors. The members of this Working Group participated in several pilot projects and continued to develop an inventory of such projects [7, 8]. Among the progress made by the group, several promotional products on the use of these data were produced, and a set of principles related to access to big data sources was established. Preliminary classifications of these sources were drawn up, as was the definition of a quality framework for this type of data. Projects involving cell phone data and social network data were carried out, and initiatives for the use of satellite images and spatial data were also supported in order to respond to the proposed objectives.

In 2015, a global survey on big data in Official Statistics was conducted with the aim of assessing the progress made so far by the different statistical offices with respect to the use of these data. The national statistical offices were asked about their strategy and practical experience in this area. The questionnaire contained questions about the management, promotion and communication of the use of big data; access; privacy; technical and training needs regarding the data; and the needs of statistical offices with respect to their use. Specific questions were also included about the different big data projects for those offices that had implemented them [4]. A total of 93 countries answered the survey, and about half of the countries reported carrying out big data projects. According to the offices consulted, the main expected benefits turned out to be "faster and more timely statistics," "reduction of the burden on the respondent" and "transformation of the statistical production process," followed by "new products and services" and "decrease of costs." Of the 115 projects mentioned, 42 concerned cell phone data and 31 data extracted from the Web. Most of the offices established relationships with public institutes and academic and research organizations, with fewer links to companies that provide data. In those projects that reached production, the big data complemented an existing data source. These projects were analyzed through traditional statistical methods, which suggests either that the requirements for the analysis of big data were not that high or that the offices still did not have sufficient training for this kind of information. The technical competencies most frequently mentioned by the Official Statistics offices as needing to be acquired were "expert in methodology with microdata," "specialist in data science" and "specialist in mathematical models," while capabilities related to information technology were mentioned less often. It is clear that new advanced techniques are needed for the analysis, but personnel in the area have not yet been hired or trained. The conclusions of this survey establish the need to encourage and provide more training in the area and to promote the generation of pilot projects involving developing countries. As a result, the World Working Group established training as one of its main priorities.

The first International Conference on Big Data in Official Statistics, held in 2014 in Beijing, China, aimed to promote the practical use of these sources, to find solutions to their challenges, to stimulate capacity building and to share experiences in this regard.


The second International Conference, held in 2015 in Abu Dhabi, United Arab Emirates, showed the progress of the different projects, which included mobile telephony data, social networks and satellite data. Progress was made in articulating certain needs: training, access to data, the establishment of links with other entities, quality and adequate methodology. Progress was also made on how to communicate the importance of using big data more effectively. At the third Global Conference, held in Dublin, Ireland, in 2016, a further step was taken, focusing specifically on three aspects: (a) data belonging to other bodies and the establishment of successful associations with data providers, (b) strategies for building capacity for the use of big data in the process of generating statistics and (c) the use of these data in the compilation of the indicators for the Sustainable Development Goals of the 2030 Agenda.

On September 25, 2015, this agenda was adopted unanimously by all the member countries of the United Nations; it constitutes a universal plan of action for people, the planet and prosperity and contains 17 Sustainable Development Goals (SDGs) and 169 related targets to be achieved by the year 2030. It refers to the needs of all people across the planet and also demands their contribution to achieving these objectives. To ensure compliance with this agenda, the construction of a solid set of SDG indicators is necessary, which requires intensive methodological and technical work and represents a challenge even for the most advanced National Statistical Systems. The Interagency Group of Experts (IAEG) of the United Nations is composed of specialists in each of the issues that concern the SDGs and their targets and is responsible for developing the methodology of each of the 232 indicators, which will serve to evaluate and monitor the fulfillment of the goals until 2030. The IAEG is then in charge of publishing the metadata of each of the indicators, compiling the measurements made by the countries, and carrying out the comparison between them and the aggregation at the regional and global level. In this way, it is established who will be responsible for measuring the indicators. This requires that the Official Statistics offices be modernized by increasing their capacity. Big data offers a more flexible, effective and efficient way to meet the new requirements and challenges, including the monitoring of the Sustainable Development Goal indicators; this includes the incorporation of new data sources, in particular big data, which has so far been little used in the production of Official Statistics.

At the fourth Global Conference, held in Bogotá, Colombia, in November 2017, it was established that new advances were necessary so that the work done to date could be used by the statistical community; this implies that data sources, be they big data, administrative records or traditional sources of Official Statistics, are treated jointly from a multi-sectoral approach. The challenge is not the use of big data sources in itself, but achieving collaboration in the use of data that is reliable, and this implies close collaboration with the private sector, the academic community and civil society. The topics of the conferences addressed issues such as how statistical offices, technology companies and the companies that own data can collaborate so that they all benefit and results are achieved in a changing world where data is the most important source for the creation of welfare and development. They also focused on the experience gained in collaborative projects in relation to coverage, the inclusion (and exclusion) of participants, activities, management and financing. Other topics of interest concerned how sensitive information can be shared in a federated cloud of information given the regulatory frameworks of privacy and confidentiality, and how to use new tools and services while at the same time adapting the profiles of the new posts of work required for these tasks in the offices of Official Statistics [10].

22.5 Advances in the Use of Big Data in the Field of Official Statistics in India

In India, the need for statistical data has increased manifold over the years due to population growth, the boom in entrepreneurial activity, and delicensing and deregulation. The Government (at different levels) is the producer of official data and is also the biggest user of statistics [6]. No office in the Government that produces official data does so for its own consumption alone, and no official user of statistics produces all the data that it requires. Although the production of Official Statistics lies in the respective domain of the concerned organizations, they need to be responsive to user needs in respect of quality and timeliness.

An implicit requirement for the production of Official Statistics is access to data that is sustained over time. Traditionally, this access has been achieved thanks to statistical legislation, in particular the laws on public statistics at both the national and European levels, which recognize the right of statistical offices to obtain the data and the obligation of citizens (whether natural or legal persons) to provide such data in time and form [2]. Most big data is generated, stored, consumed and processed by private corporations as a result of their business activity, being data (i) containing information about third parties (usually their customers), (ii) not ready for statistical exploitation without prior processing and (iii) generated at a large volume and speed. Sustained institutional access therefore presents a challenge from different angles. In the first place, there is the legislation governing the sector of corporate activity (which regulates the use of the information generated by the economic activity), statistical legislation (which grants the public statistical administration rights over data) and the legislation on the protection of personal data (which generically protects sensitive information of citizens). Second, the structure of these core big data-producing corporations is not homogeneous, and it is likely that while some have an adequate information system in place to share this information, others may need investments for it. In some economic activities, the core of the business is not the treatment of these data; rather, the data are generated as a consequence of another main activity. This complicates the equal treatment that all must receive before the Public Administration if all of them must provide such data. Third, the operability of the data access process becomes complicated when the volume, speed of generation and variety of structure are so high. It is certainly not the same to administer an interview through a paper, electronic or telephone questionnaire as to access information volumes of the order of terabytes. If, in addition, these data must be combined with official data to improve estimation processes, the situation becomes even more complex. Finally, in some cases it is necessary to invest in a computing infrastructure to extract the information and preprocess it for subsequent statistical exploitation, as well as in qualified personnel to carry this out. Recent recommendations on access to private-sector data for Official Statistics (currently under development), while recognizing the right of statistical offices to the data free of charge, also pay careful attention to the costs and effort required of private corporations, promoting equity between both sides [6, 9].

Core statistics, in most cases, require a flow of statistics among distinct Central Ministries, state-level establishments and sub-state-level organizations. For this purpose, the producers of these statistics at the center and in the States/UTs are dependent on each other. No office can afford to stop producing certain data simply because it does not use them itself, or to bypass the requirements of user departments in respect of quality and timeliness [11]. Big data can also be used by the Indian government in processing statistical data; in fact, there are already studies that demonstrate the extent to which big data contributes to the processing of statistical data by the Government of India.

22.6 Discussion Important big data is being produced through Government divisions in India already. Department of Science and Technology, Government of India, has added plans to receipt big data lookup beforehand in the Indian context, which encompass financial aid for companies taking up such tasks. However, nonstop exertion shall be wished for a lengthy period of time in the past than some success testimonies of giant records studies, and their penalties are visible. One of the primary principles related to the use of big data in Official Statistics is complement and enhance the estimates of the methods currently used, presenting faster and larger timely statistics, and in many cases, decreasing the burden on the steeply priced. But there are difficulties that want to be confronted and for which you want to find out to come up with a reply in the years to come. The big data sources are available to be used but are not designed by the Official Statistics offices, and therefore, their structure must be revised and understood before its use for statistical analysis. There are other potential challenges, such as the presence of lost data, which is not alien to the sources of big data, for example, those produced by the failure of a sensor, power cuts, or the fall of servers. The processing time can be extended too much need to provide frequent publication statistics. In many cases, this problem can be evaluated through comparisons of characteristics between the covered


population and the target population, which is difficult since such characteristics are rarely available in the big data sources. New capacities are also required, and the description of the necessary posts should be redefined: a high level of specialization is needed in the treatment of these data sources so that they become usable for statistical analysis. The term "data scientist" is used to describe a person with these skills combined with knowledge of statistical methodology [10, 11]. When talking about big data, it is inevitable to mention the technologies involved in storage and processing; the technological framework needed to deal with this type of data is clearly more complex than the regular information systems of a statistical office. This means that, where the production model involves processing the data within the offices, Official Statistics needs to invest in the infrastructure necessary to store and process these volumes of data. In either case, the official statistician must have the capabilities needed to implement the processes, whether new or more traditional, in distributed computing environments, and will face the design and execution of algorithms on computer clusters. The key issue now is to verify whether the existing quality framework remains valid with the new big data sources, especially given the use of new methodology, not only with regard to accuracy but also with regard to other quality dimensions such as punctuality, timeliness, comparability and response burden. In the same line, it is imperative that the professional profile of the official statistician combines statistical and computing training in order to meet the demands that the new production processes will impose. In short, Official Statistics must overcome a number of challenges to incorporate the new big data sources into its production. These efforts are part of the ongoing process of modernization and industrialization of public statistical offices. In this context, Eurostat, together with its Statistical System partners, developed a strategic response to these modernization challenges: the 2020 Vision. This strategic vision, adopted by the Statistical Committee in May 2014, identifies five key areas of action (users, quality, new data sources, efficiency of the production process, and dissemination). For its implementation, a set of European projects has been launched, among them one dedicated to big data for Official Statistics, focused on carrying out pilot projects with specific big data sources [6, 7].

22.7 Conclusions
The incorporation of big data represents a great opportunity for Official Statistics but also a great challenge. The recommendations on the use of big data in this area focus on several aspects and have become more specific over the years, but they can be summarized in the following points: (a) access to data that belong to other agencies, (b) establishment of a successful association and collaboration with the data providers, (c) development of practical experience through pilot projects, and (d) construction of appropriate methodology for the use of big data in the


case of the generation of Official Statistics. INDEC has taken important first steps in this area along the lines of these recommendations, through the signing in 2017 of a thematic innovation cooperation agreement with the CBS, which can foster the generation of pilot projects. The discussion held within the framework of the signing of that agreement on the implementation of big data in Official Statistics, with presentations by specialists from the public and private sectors in which big data generators took part, certainly contributes to establishing a collaborative partnership with data providers and to the possibility of accessing their data through agreements between different organizations. Even so, the challenge of using big data in Official Statistics in Argentina is still immense. It demands intense methodological and technical work and must address topics such as the training of personnel in the necessary methodologies and the creation of specific posts for incorporating big data sources into official production, from a multisectoral approach in which big data complements the use of administrative and traditional data sources. Summarizing the lessons from present-day applications of big data around the world and identifying the initiatives embraced by the governments of affluent nations, we highlight the great potential that big data holds for the central and state Governments in India. We also recommend steps that can be taken in India towards policy formulation, enactment of legal frameworks, infrastructure upgradation and talent pool creation, so that the country can reap the benefits of big data systems and technology. If modernization and industrialization are already a necessity before the traditional production model is exhausted, the inclusion of big data in the production process is a necessity in the face of the risk that statistical offices lose relevance in the production of statistics. This is a change of culture in Public Statistics that requires strengthening collaboration with the private sector and academia, and a carefully coordinated design of a strategy for the access to and reuse of these data by the Public Administrations.

References
1. Abdulkadri A, Evans A, Ash T (2016) An assessment of big data for official statistics in the Caribbean: challenges and opportunities. Retrieved from http://repositorio.cepal.org/bitstream/handle/11362/39853/S1501378_en.pdf;jsessionid=3727E954D3F58C63721F89230EBBE91E?sequence=1
2. Hackl P (2016) Big data: what can official statistics expect? Stat J IAOS 32:43–52
3. Boettcher I (2015) Automatic data collection on the Internet (web scraping). Statistics Austria. Retrieved from http://www.stat.go.jp/english/info/meetings/og2015/pdf/t1s2p6_pap.pdf
4. Caserta J (2016) Big data at a turning point. The internet of things: a huge opportunity, wrapped in risk. Big Data Quarterly, Spring 2016, pp 4–5. Unisphere Media, a Division of Information Today, Inc.
5. Taylor L, Schroeder R (2014) Is bigger better? The emergence of big data as a tool for international development policy. Geo J 1–16. ISSN: 0343-2521. https://doi.org/10.1007/s10708-014-9603-5
6. EMC Education Services (2015) Data science and big data analytics: discovering, analyzing, visualizing and presenting data. Wiley, Indianapolis


7. Florescu D et al (2014) Will 'big data' transform official statistics? Retrieved from http://www.q2014.at/fileadmin/user_upload/ESTAT-Q2014-BigDataOS-v1a.pdf; Global Pulse (2012) Big data for development: challenges & opportunities. Retrieved from http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopmentUNGlobalPulseJune2012.pdf
8. Russom P (2013) Managing big data. TDWI best practices report, 4th Quarter 2013. Retrieved from https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper2/tdwi-managing-bigdata106702.pdf; Struijs P, Daas P (2014) Quality approaches to big data in official statistics. Paper presented at the European conference on quality in official statistics (Q2014), Vienna, Austria. Retrieved from https://unstats.un.org/unsd/trade/events/2014/beijing/documents/other/Statistics%20Netherlands%20%20Quality%20Approaches%20to%20Big%20Data.pdf
9. Schintler LA, Kulkarni R (2014) Big data for policy analysis: the good, the bad, and the ugly. Rev Policy Res 31(4):343–348. ISSN: 1541-1338. https://doi.org/10.1111/ropr.12079
10. Salsburg D (2001) The lady tasting tea: how statistics revolutionized science in the twentieth century. Macmillan
11. Tam S-M, Clarke F (2014) Big data, official statistics and some initiatives by the Australian Bureau of Statistics. Paper presented at the international conference on big data for official statistics, Beijing, China, 28–30 Oct 2014. Retrieved from https://unstats.un.org/unsd/trade/events/2014/Beijing/documents/other/Australia%20Bureau%20of%20Statistics%20%20Some%20initiatives%20on%20Big%20Data%20-%2023%20July%202014.pdf; UN Global Pulse (2014)

Chapter 23

CUDA Accelerated HAPO (C-HAPO) Algorithm for Fast Responses in Vehicular Ad Hoc Networks Vinita Jindal and Punam Bedi

23.1 Introduction
In a vehicular ad hoc network (VANET), the vehicles act as nodes and the road segments between these nodes act as edges. Routing is a major research issue in VANETs, since it involves reducing travel time on the roads by avoiding congestion en route. Congestion increases commuters' travel time and causes considerable strain, leading to health hazards such as skin diseases, allergies, asthma, visual impairments and lung-related diseases for the people of the city. No one wants to spend extra time on the roads waiting to complete a journey. If congestion en route can be detected in advance, people can avoid the congested routes and plan their journey accordingly [1]. Earlier studies show that commuters behave like ants, which dislike congestion; like ants, they can avoid en route congestion by taking alternative routes [2]. Dorigo et al. presented the ant colony optimization (ACO) algorithm, inspired by the way ants search for food [3, 4]. Ants lay a trail of pheromone, a chemical substance, along their path. In ACO, ants communicate through pheromone, and this is used to construct paths between the various nodes of a network. Artificial ants communicate by mimicking real ant behaviour and are able to find significantly better solutions to various complex real-time problems. For any given problem, the pheromone value can be adapted to find an optimal solution. Modified ant colony optimization (MACO) is a variant of ACO proposed to decrease the total travel time by replacing attraction with repulsion in the pheromone value [5].
V. Jindal (B) Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, New Delhi, India e-mail: [email protected] P. Bedi Department of Computer Science, University of Delhi, New Delhi, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_23


The particle swarm optimization (PSO) algorithm is motivated by the behaviour of schools of fish and flocks of birds [6, 7]. The MACO algorithm suffers from the local optima problem and from slow and/or premature convergence. Using the PSO algorithm together with MACO can avoid local optima and attain the global optimum with faster convergence. The hybrid ant particle optimization (HAPO) algorithm combines the MACO and PSO algorithms for handling congestion en route and decreases the travel time of the journey [8]. MACO was proposed under the assumption that all roads are in working condition, which may not hold in real time; to provide a realistic simulation, this assumption was removed in HAPO. The main aim of HAPO was to reduce the overall journey time, but the algorithm uses a single CPU for its computations, and hence the computation time increases exponentially with an increase in the number of vehicles. Thus, to provide quicker calculations and further reduce the total travel time, in this paper we propose the CUDA accelerated HAPO (C-HAPO) algorithm, in which computations are performed simultaneously by multiple GPU cores on the NVIDIA architecture to deliver fast results. The C-HAPO approach includes separate modules for route construction, pheromone evaporation, pheromone deposition and backtracking out of a congested path. To the best of our knowledge, earlier researchers have provided GPU implementations of route construction only, in variants of ACO; in the present work we have parallelized all possible modules of the C-HAPO algorithm. The experimental results verify that the C-HAPO algorithm significantly reduces the total computation time as the number of vehicles increases. The parallel algorithm has been implemented on an NVIDIA GeForce 710M GPU using the CUDA 7.5 toolkit, and the road network of northwest Delhi was used for the experiments. The obtained results were compared with those of the existing non-parallel and parallel implementations of the Dijkstra, ACO and MACO algorithms and with the non-parallel HAPO algorithm. It was found that the proposed C-HAPO algorithm achieves a noteworthy reduction in travel time during peak hours. The rest of the paper is organized as follows: Sect. 23.2 describes the literature survey, followed by a description of the C-HAPO algorithm in Sect. 23.3. Section 23.4 presents the experimental setup and the obtained results. Lastly, the conclusion is given in Sect. 23.5.

23.2 Literature Survey
Ant colony optimization (ACO) was proposed in the early 1990s. In the networking domain, ACO is among the most frequently applied algorithms; it is used for creating self-organizing methods employed in various routing-related problems [4, 9]. Some of the well-known problems that make use of ant colonies are the traveling salesman


problem and the quadratic assignment problem [10]. Other domains using the ACO technique for optimization also exist in the literature [11, 12]. The idea used by real ants can be applied in a traffic environment for finding an optimal path for vehicles while allowing communication between them through pheromone. In the traffic environment, pheromone is treated as an attribute of roads or lanes and is updated by every vehicle during its journey on that road or lane. It helps vehicles obtain the current traffic status on the roads and hence allows indirect communication between them. The degree of congestion on every lane can also be obtained from the pheromone deposition value, in order to help vehicles find an uncongested route. Many researchers have parallelized the standard ACO using GPU techniques in their work [13–15]. Many approaches for traffic environments using ACO have also been given by various researchers [2, 16, 17]. Turky et al. presented a genetic approach for both pedestrian crossing and traffic light control [18]. Other researchers have used ACO with link travel time prediction to find routes that save journey time [19]. Jindal et al. presented the MACO algorithm for decreasing the total travel time of the whole trip [5]. The MACO algorithm is a modification of the classical ACO algorithm, which suffers from the local optima problem and slow and/or early convergence. To avoid local optima, many researchers have used the PSO algorithm in their work. The PSO algorithm, proposed by Kennedy et al., also provides fast convergence and is used for continuous optimization applications [20, 21]. Earlier researchers have used the ACO algorithm for local optimization and the PSO algorithm for global optimization [22, 23]. Combinations of the PSO and ACO algorithms have been developed to solve an extensive variety of optimization problems [10, 24]; one well-known application of this combination is the TSP. To the best of our knowledge, however, this combination has not been used in real-time traffic simulation. The MACO algorithm is a modification of the ACO algorithm whose parameters are chosen through random experiments; further, it faces the problem of local optima and premature or slow convergence. A mechanism offering both local and global optimization is needed to overcome these problems. Thus, the HAPO algorithm was proposed, in which the PSO algorithm is used to optimize the parameters of the MACO algorithm dynamically in VANETs [8]. In this paper, we propose a parallelization of all the phases of the C-HAPO algorithm. This decreases the time required for the computations and helps the driver to respond faster. The C-HAPO algorithm is based on the existing HAPO algorithm, whose major shortcoming was its use of a single CPU: in a vehicular environment with a large number of vehicles present, it takes a considerable amount of time for computations, so the driver's reaction is delayed, resulting in either backtracking from a congested route or selecting a longer route. In the proposed work, we use parallel processing on the GPU for all phases of the C-HAPO algorithm to overcome this problem.
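To make the idea of pheromone as a road attribute concrete, the following is a minimal illustrative sketch, not the MACO/HAPO update rule itself (whose parameters are tuned dynamically); the deposit and evaporation constants and the road identifiers are placeholders:

```python
# Minimal sketch: pheromone kept as a per-road attribute, written by passing
# vehicles and read back as a congestion indicator. Constants are illustrative.
DEPOSIT = 1.0        # amount a vehicle adds to the road it traverses
EVAPORATION = 0.1    # fraction of pheromone that evaporates each time slot

pheromone = {("A", "B"): 0.0, ("B", "C"): 0.0, ("A", "C"): 0.0}

def vehicle_passes(road):
    """Each vehicle updates the attribute of the road/lane it is using."""
    pheromone[road] += DEPOSIT

def evaporate_all():
    """Called once per slot so that stale traffic information fades away."""
    for road in pheromone:
        pheromone[road] *= (1.0 - EVAPORATION)

def congestion_degree(road):
    """Higher accumulated pheromone means more vehicles recently used this road."""
    return pheromone[road]
```

Reading congestion_degree for the outgoing roads is what provides the indirect, pheromone-mediated communication between vehicles described above.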


The proposed approach differs from the earlier algorithms mainly in four aspects. First, real-time road conditions are considered, i.e., some roads may be obstructed for some time due to an accident, an emergency or ongoing construction work; the obstruction is removed after some time when conditions return to normal. Second, while choosing the optimal path, traversing all the nodes between source and destination is not required: a greedy best-first search can be used to find the best neighbour for reaching the destination (a serial sketch of this choice is given below). Third, the pheromone value is updated dynamically using information provided by the PSO algorithm, ensuring hassle-free traffic flow with faster convergence. Fourth, to provide faster computations, all phases of the C-HAPO algorithm run in parallel on the GPU. The efficiency of the C-HAPO algorithm was tested on the real northwest Delhi map; the algorithm is discussed in the next section. In Fig. 23.2, we present the CUDA_CHAPO_KERNEL function with six device functions, namely Calculate_Distance, Adj_Nodes, Update_Pheromone, Evaluate_Route, Increase_Pheromone and Evaporate_Pheromone. All of these device functions are called in parallel by the GPU kernel for fast execution. Initially, the kernel assigns a random pheromone value to each edge, and a thread is attached to each node in the network. The threshold and pheromone values are used for choosing the next best node. The device function Calculate_Distance computes the distances between the various nodes from their coordinates using the standard distance formula. The adjacent nodes are then computed using the device function Adj_Nodes. Next, the Update_Pheromone function updates the pheromone value. Evaluate_Route is another device function that calculates the overall travel time of the journey. The device functions Increase_Pheromone and Evaporate_Pheromone update the pheromone accumulation values for each parallel thread according to the present conditions.
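The following is a serial, CPU-side sketch of the next-node choice just described: a greedy best-first step over the unblocked neighbours that prefers roads whose pheromone (congestion) lies below a threshold. The data structures, the tie-breaking rule and the threshold are illustrative assumptions; in C-HAPO this logic runs per thread on the GPU using the device functions named above.

```python
import math

def next_node(current, goal, neighbours, coords, pheromone, blocked, threshold):
    """Pick the best unblocked neighbour of `current`: prefer roads whose
    pheromone value is below `threshold`, and break ties by straight-line
    distance to the destination (greedy best-first)."""
    def distance(a, b):
        (x1, y1), (x2, y2) = coords[a], coords[b]
        return math.hypot(x2 - x1, y2 - y1)

    candidates = [n for n in neighbours[current] if (current, n) not in blocked]
    if not candidates:
        return None                      # no usable road: backtracking is triggered
    uncongested = [n for n in candidates if pheromone[(current, n)] < threshold]
    pool = uncongested or candidates     # fall back to congested roads if necessary
    return min(pool, key=lambda n: distance(n, goal))
```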

23.3 Proposed CUDA Accelerated HAPO (C-HAPO) Algorithm
The CUDA accelerated HAPO (C-HAPO) algorithm is an enhancement of the HAPO algorithm, which uses the PSO algorithm to optimize the parameters of the MACO algorithm. The main idea is to provide an optimal path to the given destination by combining the advantages of both the MACO and PSO algorithms and hence avoiding congestion en route [8]. The shortcoming of the HAPO algorithm was that it runs on a single CPU; in VANETs there are an enormous number of vehicles on the roads during peak hours, so the computations take significant time to produce results. To provide faster computations, the C-HAPO algorithm, using CUDA with GPU programming, is proposed in this paper. For the proposed C-HAPO algorithm, we use as input a graph matrix taken from the northwest Delhi road network. The host function CUDA_CHAPO is


Algorithm: CUDA_CHAPO (inp_graph, min_distance, blocked_roads, optimal_route, pheromone_value)
1. Initially, the cudaMalloc function allocates the GPU device memory.
2. Next, the cudaMemcpy function copies all the inputs from the CPU to the GPU.
3. Next, the CUDA_CHAPO_KERNEL (inp_graph, min_distance, blocked_roads, optimal_route, pheromone_value) function is called on the device.
4. Next, the cudaMemcpy function copies all the results from the GPU back to the CPU.

Fig. 23.1 CUDA_CHAPO algorithm

given in Fig. 23.1: memory for all variables needed by the GPU device is allocated by the cudaMalloc function. All variables are then copied from the CPU host to the GPU device by the cudaMemcpy function with the HostToDevice option. Next, the kernel function CUDA_CHAPO_KERNEL is called on the grid with five parameters, viz. graph, distance, blocked roads, route and pheromone, with all tasks running in parallel. After this, the results are brought back from the GPU to the CPU using the cudaMemcpy function with the DeviceToHost option. Finally, the cudaFree function releases the GPU memory. The speed of computation increases after the parallelization of all the phases. This gives faster results and hence allows the driver to think and react accordingly in a given situation and avoid unnecessary backtracking, thus decreasing the total travel time for commuters. Algorithm 1, given in Fig. 23.1, is a CPU-enabled serial algorithm, and algorithm 2, given in Fig. 23.2, is a GPU-enabled parallel

Algorithm: CUDA_CHAPO_KERNEL (inp_graph, min_distance, blocked_roads, optimal_route, pheromone_value)
1. First, the function Calculate_Distance (inp_graph, min_distance, optimal_route) is called on the GPU to calculate the distance between the routes from any source to destination in the graph.
2. There exists a thread for every node in the graph.
3. The pheromone of each edge is assigned a randomly defined initial value.
4. Next, the function Adj_Nodes (inp_graph, current_node, pheromone_value) finds the adjacent nodes.
5. Next, the function Update_Pheromone (inp_graph, optimal_route, pheromone_value) updates the value of the pheromone using the PSO algorithm.
6. To avoid congestion, the next node is found using the minimum pheromone_value and the value of the threshold.
7. The function Evaluate_Route (optimal_route, min_distance) calculates the overall travel time.
8. The function Increase_Pheromone (optimal_route, min_distance) increases the pheromone deposition.
9. Lastly, the function Evaporate_Pheromone (optimal_route, min_distance) reduces the pheromone deposition.

Fig. 23.2 CUDA_CHAPO_KERNEL algorithm


algorithm. In algorithm 1, at line 3, the GPU kernel is called by the CPU host, and algorithm 2 starts running in parallel on the GPU device. The next section describes the experimental setup and results.

23.4 Experimental Setup and Results
For the C-HAPO algorithm, we used an NVIDIA GeForce 710M GPU with 2 GB of dedicated VRAM. Programming was done using the CUDA 7.5 toolkit with the C language in Visual Studio 2013, with NVIDIA graphics driver version 361.43. The proposed C-HAPO algorithm significantly reduces the computation time and the total travel time for commuters. The results obtained by running C-HAPO in parallel are compared with those of its counterparts. During the experiments, the assumption that all roads are in working condition was dropped. The simulations were performed on a northwest Delhi road network with different numbers of vehicles; the network snapshot used in the experiments is shown in Fig. 23.3. All four algorithms, Dijkstra, ACO, MACO and HAPO, were implemented in CPU mode as well as GPU mode. The results of all eight algorithms under consideration, in terms of total travel time, are given in Table 23.1, and the corresponding graphical depiction is given in Fig. 23.4. They show that the C-HAPO algorithm yields a noteworthy decrease in commuters' travel time under the same environment. It is also concluded from the obtained results that, with the C-HAPO algorithm, the total travel time is reduced by 81%, 75%, 62%, 53%, 79%, 59% and 17% when compared with the results of the non-parallel and parallel implementations of

Fig. 23.3 Real-time northwest Delhi network


Table 23.1 Results obtained for C-HAPO algorithm: total travel time (in sec)

No. of vehicles   Dijkstra   ACO    MACO   HAPO   Parallel Dijkstra   Parallel ACO   Parallel MACO   Parallel HAPO
100               542        478    410    375    478                 342            203             112
200               648        576    504    411    549                 389            249             135
300               825        704    578    498    734                 504            273             167
400               963        799    651    569    862                 573            298             173
500               1027       882    759    584    912                 615            323             186
600               1169       971    820    685    1126                751            382             209
700               1421       1083   957    843    1364                923            439             217
800               1564       1245   1068   897    1448                967            467             256
900               1676       1327   1195   932    1570                1038           498             294
1000              1824       1540   1239   997    1798                1156           557             325
1100              2138       1821   1304   1009   1895                1232           581             371
1200              2267       1986   1328   1027   2042                1349           651             427
1300              2375       2053   1532   1123   2189                1398           697             469
1400              2463       2167   1663   1299   2337                1435           729             507
1500              2592       2265   1792   1362   2475                1584           756             538
1600              2769       2384   1896   1497   2632                1626           781             571
1700              2824       2412   1968   1518   2779                1723           824             602
1800              3021       2538   2026   1549   2926                1867           895             635
1900              3283       2679   2075   1594   3074                1928           926             683
2000              3397       2760   2129   1615   3221                2079           957             714
2100              3573       2863   2147   1634   3372                2152           973             736
2200              3748       2974   2238   1656   3523                2245           992             768
2300              3926       3095   2255   1678   3665                2326           1028            792
2400              4085       3182   2278   1735   3817                2418           1059            815
2500              4259       3326   2394   1750   3926                2517           1084            837
2600              4432       3477   2443   1847   4118                2639           1116            869
2700              4604       3561   2498   1906   4234                2693           1137            926
2800              4780       3643   2561   1948   4342                2751           1162            948
2900              4951       3788   2618   1959   4460                2842           1183            954
3000              5116       3879   2642   1987   4689                2925           1209            971
3100              5287       4014   2723   2016   4753                3014           1245            998
3200              5455       4118   2765   2158   4806                3088           1268            1017
3300              5638       4236   2834   2265   4947                3129           1291            1035
3400              5809       4352   2892   2313   5038                3193           1326            1062
3500              5973       4467   2937   2392   5232                3265           1347            1123


[Fig. 23.4: line chart of overall travel time (in sec), 0–7000, versus the number of vehicles (100–3500), with one curve for each of the eight algorithms: Dijkstra, ACO, MACO, HAPO and their parallel versions.]

Fig. 23.4 Graphical representation of result for C-HAPO algorithm

the existing Dijkstra, ACO and MACO algorithms and with the non-parallel HAPO algorithm, respectively, for 3500 vehicles. Thus, the C-HAPO algorithm decreases the total travel time and helps people complete their journey faster.
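For example, reading the 3500-vehicle row of Table 23.1, the reported reductions follow directly from the travel times:

\[
\frac{T_{\text{Dijkstra}} - T_{\text{C-HAPO}}}{T_{\text{Dijkstra}}} = \frac{5973 - 1123}{5973} \approx 0.81,
\qquad
\frac{T_{\text{parallel MACO}} - T_{\text{C-HAPO}}}{T_{\text{parallel MACO}}} = \frac{1347 - 1123}{1347} \approx 0.17 .
\]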

23.5 Conclusion
Congestion in VANETs is caused by the exponential growth in the number of vehicles and the limited road capacity. Its major drawbacks are that the overall travel time of a journey increases and that it has a very bad effect on the health of commuters. To overcome these problems, a mechanism is needed that makes the decision-making process faster, so that congestion en route can be avoided, the overall travel time decreased and commuters helped to reach their destination sooner. The HAPO algorithm described in the literature is used to reduce congestion en route, but it uses a single CPU and is therefore slow in execution due to its serial implementation. In this paper, a CUDA accelerated HAPO (C-HAPO) algorithm is presented that runs on the latest NVIDIA architecture with the CUDA 7.5 toolkit. The C-HAPO algorithm provides faster computations that enable the commuter to respond sooner, and hence the total travel time decreases. The results obtained with the C-HAPO algorithm were compared with its counterparts. It was found that using the C-HAPO algorithm reduces the total travel


time by 81%, 75%, 62%, 53%, 79%, 59% and 17% when compared with the results of the non-parallel versions of the Dijkstra, ACO, MACO and HAPO algorithms and the parallel versions of the Dijkstra, ACO and MACO algorithms, respectively, for 3500 vehicles. Thus, it is concluded that the presented C-HAPO algorithm outperforms all the algorithms under consideration during peak hours.

References 1. He J, Shen W, Divakaruni P, Wynter L, Lawrence R (2013) Improving traffic prediction with tweet semantics. In: Twenty-third international joint conference on artificial intelligence, (IJCAI-2013), pp 1387–1393 2. Bedi P, Mediratta N, Dhand S, Sharma R, Singhal A (2007) Avoiding traffic jam using ant colony optimization—a novel approach. In: International conference on computational intelligence and multimedia applications, vol 1, pp 61–67. Sivakasi, Tamil Nadu, India 3. Bonabeau E, Dorigo M, Theraulaz G (1999) Swarm intelligence: from natural to artificial systems. Santa Fe Institute Studies in the Sciences of Complexity, Ed. Oxford University Press, New York, NY 4. Dorigo M, Stützle T (2004) Ant colony optimization. MIT Press, USA 5. Jindal V, Dhankani H, Garg R, Bedi P (2015) MACO: modified ACO for reducing travel time in VANETs. In: Third international symposium on women in computing and informatics (WCI-2015), pp 97–102. ACM, Kochi, India 6. Lazinica Aleksandar (2009) Particle swarm optimization. In-Tech, intechweb.org, Austria 7. Teodorovic´ D (2008) Swarm intelligence systems for transportation engineering: principles and applications. Transp Res Part C, Elsevier 16:651–667 8. Jindal V, Bedi P (2018) Parameter tuning in MACO for actual road conditions. Communicated in an International Journal 9. Dorigo M, Caro G, Gambardella L (1999) Ant algorithms for discrete optimization. Artif Life 5(2):137–172 10. Elloumia Walid, El Abeda Haikal, Abra Ajith, Alimi Adel M (2014) A comparative study of the improvement of performance using a PSO modified by ACO applied to TSP. Appl Soft Comput 25:234–241 11. Deneubourg JL, Aron S, Goss S, Pasteel JM (1990) The self-organizing exploratory pattern of the Argentine ant. J Insect Behav 3(2):159–168 12. Bell John E, McMullen Patrick R (2004) Ant colony optimization techniques for the vehicle routing problem. Adv Eng Inform 18:41–48 13. Fu J, Lei L, Zhou G (2010) A parallel ant colony optimization algorithm with GPU-acceleration based on all-in-roulette selection. In: Third international workshop on advanced computational intelligence (IWACI), pp 260–264 14. Dawson L, Stewart I (2013) Improving ant colony optimization performance on the GPU using CUDA. In: IEEE congress on evolutionary computation (CEC), pp 1901–1908 15. Cecilia JM, Garcia JM, Ujaldon M, Nisbet A, Amos M (2011) Parallelization strategies for ant colony optimisation on GPUs. In: IEEE international symposium on parallel and distributed processing workshops and PHD forum (IPDPSW), pp 339–346 16. Nanda BK, Das G (2011) Ant colony optimization: a computational intelligence technique. Int J Comput Commmun Technol 2(6):105–110 17. Rizzoli AE, Montemanni R, Lucibello E, Gambardella LM (2007) Ant colony optimization for real-world vehicle routing problems: from theory to applications. Swarm Intell 1:135–151 18. Turky AM, Ahmad MS, Yusoff MZM (2009) The use of genetic algorithm for traffic light and pedestrian crossing control. Int J Comput Sci Netw Secur 9(2):88–96


19. Claes R, Holvoet T (2011) Ant colony optimization applied to route planning using link travel time prediction. In: International symposium on parallel distributed processing, pp 358–365 20. Khanra A, Maiti MK, Maiti M (2015) Profit maximization of TSP through a hybrid algorithm. Comput Ind Eng 88:229–236 21. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: International conference on neural networks IEEE, pp 1942–1948, Piscataway, NJ 22. Xie X, Wu P (2010) Research on the optimal combination of ACO parameters based on PSO. In: International conference on networking and digital society, pp 94–97 23. Shi C, Bu Y, Li Z (2008) Path planning for deep sea mining robot based on ACO-PSO hybrid algorithm. In: International conference on intelligent computation technology and automation, pp 125–129 24. Kiran MS, Gündüz M, Baykan ÖK (2012) A novel hybrid algorithm based on particle swarm and ant colony optimization for finding the global minimum. Appl Math Comput 219:1515– 1521

Chapter 24

A Survey of Portfolio Optimization with Emphasis on Investments Made by Housewives in Popular Portfolios Sunita Sharma and Renu Tuli

24.1 Introduction
Financing is one of the major needs of every individual, in order to channel one's income and resources in the best possible way. Financial mathematics has been applied to portfolio optimization since the 1950s, when Markowitz [1, 2] first applied mathematical tools to optimizing portfolio selection and proposed the Markowitz model. Later, Hatemi and El-Khatib [3] developed an approach combining the optimization of risk and return. A very important tool in portfolio optimization, the Sharpe ratio, was developed by Sharpe [4]. Lorentz [5] modified the Sharpe ratio and showed that investing at least half of the capital among assets gives the best portfolio performance. Tsuil and Chadam [6] developed two strategies to improve the Sharpe ratio. Sinha and Tripathi [7] compared the Markowitz and Sharpe models on the basis of risk and return. The present work focuses on identifying the popular portfolios preferred by housewives, identifying the tools best suited for optimizing portfolios, and applying those tools to the popular portfolios. Section 24.2 gives the basic terms and definitions, Sect. 24.3 deals with the collection and analysis of data, Sect. 24.4 shows the steps to find the optimal portfolio and the results obtained on implementing them on the data, and Sect. 24.5 gives the concluding remarks.

S. Sharma Department of Mathematics, Kalindi College, Delhi University, New Delhi, India e-mail: [email protected] R. Tuli (B) Department of Applied Sciences, The NorthCap University, Gurugram, Haryana, India e-mail: [email protected]; [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_24


24.2 Basic Terms and Definitions
(i) Financial Instruments: Monetary contracts like currencies, shares, bonds, stocks and mutual funds are called financial instruments.
(ii) Portfolio: A combination of a person's financial instruments is called a portfolio.
(iii) Asset Return [8]: An asset is a financial instrument which can be bought and sold. The return of an asset is R = x_1/x_0, where x_0 is the amount at which the asset is purchased and x_1 is the amount at which it is sold. The rate of return on the asset is r = (x_1 - x_0)/x_0 = R - 1.
(iv) Expected Return of a Portfolio [8]: Let a portfolio consist of n assets. The expected return of the portfolio is r = Σ_{i=1}^{n} w_i r_i, where r_i = R_i - 1 (i = 1, 2, ..., n), R_i is the return on asset i and w_i is the weight factor for asset i, satisfying Σ_{i=1}^{n} w_i = 1.
(v) Portfolio Variance [9]: The variance of the portfolio is given by σ² = Σ_{i=1}^{n} w_i² σ_i² + Σ_{i≠j} w_i w_j Cov_ij, where σ_i² denotes the variance of the rates of return for asset i and Cov_ij denotes the covariance between the rates of return for assets i and j.
(vi) Sharpe Ratio: The Sharpe ratio = (mean portfolio return − risk-free rate)/standard deviation of portfolio return.
(vii) Markowitz Model: This model considers the return, standard deviation and coefficient of correlation to determine the efficient set of portfolios.
(viii) Portfolio Optimization: The process of finding the optimal portfolio so as to maximize the expected return and minimize the risk.
(ix) Efficient Frontier: The set of all optimal portfolios that maximize the expected return for a given level of risk, or minimize the risk for a given level of expected return.
A short numerical illustration of these quantities, using figures reported later in this chapter, is given below.
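As a quick numerical illustration of definitions (iv)–(vi), using the yearly returns, the optimal-portfolio figures and the 6.65% risk-free rate reported with Tables 24.5 and 24.6:

\[
r_{\text{equal}} = \sum_{i=1}^{4} w_i r_i = 0.25\,(49.06 + 5.10 + 44.02 + 25.11)\% \approx 30.82\%,
\qquad
\text{Sharpe ratio} = \frac{r_p - r_f}{\sigma_p} = \frac{26.58\% - 6.65\%}{3.48\%} \approx 5.73 .
\]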

24.3 Collection and Analysis of Data
A survey-based methodology has been used; it is described below and illustrated with the help of the tables and charts that follow (Tables 24.1–24.4, Figs. 24.1–24.4). It has been assumed that a housewife does not have knowledge of the various investment portfolios available to her; she knows three basic investment options, viz. cash, gold and property. The rest are only popular portfolios, i.e., portfolios made popular by advertisements in the different media accessible to a reasonably intelligent housewife. With increasing awareness, some women have independently started taking decisions on investing their surplus income/savings. The main objective of such investment is to meet the future needs of children's education and marriage, which means they look for a long-term investment horizon of about 12–15 years. Side by side, homemakers have also started investing in


short-term portfolios to increase the value of their money quickly for a big purchase in the near future, such as jewellery or a car. Besides the above, investments are made into portfolios like insurance and mutual funds to provide for retirement or old age; these investments lock returns in for 20–30 years. Apart from that, housewives in Delhi have always invested small amounts in gold and cash, which they call 'savings'. In this changing scenario, there is a need to optimize the portfolio of a homemaker so that the investment can be optimally allocated to various assets for generating the best returns in the near future as well as over the next 20–30 years. Another consideration is that certain allocations should be easily convertible to liquidity, to be at par with 'savings'. A sample of 120 housewives in Delhi has been surveyed regarding their saving and investment pattern, and the collected data has been studied and analysed with the help of MS-Excel. The following results have been obtained (Table 24.1, Fig. 24.1, Table 24.2, Fig. 24.2, Table 24.3, Fig. 24.3, Table 24.4, Fig. 24.4).


Fig. 24.1 Column chart representing relationship between net income and investment options

Table 24.1 Relationship between net income and investment options

Net income          Bank   Property   Gold   Life insurance   Fixed deposit   Committee
10,000–20,000       8      1          4      2                3               5
20,000–40,000       16     5          9      7                9               3
40,000–50,000       11     2          6      6                4               1
50,000 and above    33     10         16     19               14              9


Table 24.2 Relationship between savings and investment options

Savings             Bank   Property   Gold   Life insurance   Fixed deposit   Committee
5000–10,000         33     9          17     16               14              14
10,000–20,000       9      1          4      3                4               3
20,000–30,000       20     6          10     11               7               1
30,000 and above    4      2          3      3                3               0


Fig. 24.2 Column chart representing relationship between savings and investment options

Table 24.3 Relationship between expenditure and investment options

Expenditure         Bank   Property   Gold   Life insurance   Fixed deposit   Committee
10,000–20,000       22     2          8      6                10              5
20,000–40,000       28     9          15     15               13              6
40,000–50,000       9      3          7      6                3               4
50,000 and above    7      4          3      5                2               2

24.4 Optimal Portfolio
As expected, the survey revealed that housewives in Delhi invest mainly in bank deposits, gold and fixed deposits. This work provides an optimal investment scenario for homemakers; in other words, it determines how much an Indian



Fig. 24.3 Column chart representing relationship between expenditure and investment options

Table 24.4 Relationship between net income and savings

Net income          Savings: 5000–10,000   10,000–20,000   20,000–30,000   30,000 and above
10,000–20,000       9                      0               0               0
20,000–40,000       19                     0               1               0
40,000–50,000       8                      2               2               0
50,000 and above    6                      7               17              4


Fig. 24.4 Column chart representing relationship between net income and savings

housewife should allocate to various preferred stocks to obtain an optimal return on her investments while reducing the risks involved. Historic data [10] for the following three stocks have been considered on a monthly basis for a period of one year, from January 2017 to January 2018: • Hindustan Unilever (Hindunilvr) • ABB India • Axis Bank.


The portfolio contains these stocks along with gold prices on a monthly basis for the same one-year period, January 2017 to January 2018 [11]. MS-Excel Solver has been used for this purpose. The following steps have been taken to obtain the optimal portfolio (a script-level sketch of the same workflow is given after the steps):
Step 1: The rates of return for each of the four investment options are calculated.
Step 2: The average monthly rate of return is calculated using AVERAGE().
Step 3: The monthly variance of the rate of return is calculated using VAR().
Step 4: The yearly rate of return and variance are obtained by multiplying the values from Steps 2 and 3 by 12.
Step 5: The variance–covariance table is constructed using COVAR().
Step 6: An equally weighted portfolio is considered, assigning weight 0.25 to each of the four options. The portfolio return is calculated using MMULT(TRANSPOSE(WEIGHTS), YEARLY RETURNS), and the standard deviation using SQRT(MMULT(MMULT(TRANSPOSE(WEIGHTS), COV), WEIGHTS)).
Step 7: The optimal risky portfolio is found using the Excel Solver add-in, maximizing the Sharpe ratio.
Step 8: The efficient frontier is obtained by considering ten differently weighted portfolios (Tables 24.5 and 24.6).
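The sketch below reproduces Steps 1–7 in Python, with numpy/scipy standing in for Excel's AVERAGE/VAR/COVAR/MMULT functions and the Solver add-in. The price matrix and the 6.65% risk-free rate are taken from the chapter's tables; the sample/population convention of the variance and covariance estimators is an implementation choice here, so the resulting weights may differ slightly from those reported in Table 24.5.

```python
# Illustrative re-implementation of Steps 1-7 with numpy/scipy instead of
# MS-Excel Solver. Prices are the monthly closes listed in Table 24.5.
import numpy as np
from scipy.optimize import minimize

prices = np.array([
    # Hindunilvr, Gold, ABB India, Axis Bank  (Jan-17 ... Jan-18)
    [855.15, 81187.80, 1095.80, 466.00],
    [864.90, 82090.33, 1198.20, 506.65],
    [909.75, 81945.53, 1279.90, 490.80],
    [934.70, 81804.88, 1409.60, 509.65],
    [1066.80, 79082.26, 1459.60, 514.05],
    [1081.60, 81620.98, 1450.66, 517.35],
    [1153.35, 79439.00, 1421.00, 519.80],
    [1217.35, 82217.85, 1340.30, 500.35],
    [1175.15, 85275.38, 1398.90, 509.15],
    [1236.85, 84385.40, 1385.85, 523.15],
    [1273.75, 83603.48, 1398.80, 535.40],
    [1368.10, 80269.67, 1402.90, 563.95],
    [1369.65, 84983.77, 1653.20, 593.60],
])

monthly = prices[1:] / prices[:-1] - 1        # Step 1: monthly rates of return
yearly_ret = 12 * monthly.mean(axis=0)        # Steps 2 and 4: annualised mean return
cov = 12 * np.cov(monthly, rowvar=False)      # Steps 3-5: annualised covariance matrix
rf = 0.0665                                   # risk-free rate used in the chapter (6.65%)

def neg_sharpe(w):
    ret = w @ yearly_ret
    sd = np.sqrt(w @ cov @ w)
    return -(ret - rf) / sd                   # Step 7: Solver maximises the Sharpe ratio

w0 = np.full(4, 0.25)                         # Step 6: start from the equally weighted portfolio
res = minimize(neg_sharpe, w0, method="SLSQP",
               bounds=[(0, 1)] * 4,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
print(res.x, -res.fun)                        # optimal weights and maximised Sharpe ratio
```

The optimizer starts from the equally weighted portfolio of Step 6 and searches the weight simplex for the maximum Sharpe ratio, which is what Solver does in Step 7.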

24.5 Conclusion
The data collected from the sample of housewives in Delhi reveals that most of their investments are equivalent to savings. In other words, by way of investment they believe in saving money either in a savings bank account or in a fixed deposit. To extend the scope of their investment, housewives may also invest some amount in gold ETFs or insurance policies. Some have also trusted investment in committees, which double as social gatherings. This research finds that the investment options used by housewives in Delhi offer a comparatively low rate of interest, fluctuating around 7–8% p.a. Moreover, not all investments provide tax savings, and after deduction of tax the return on investment is reduced further. Although these portfolios offer a low rate of interest, they are almost risk-free and are for this reason preferred by housewives, who appear to be risk-averse investors. Housewives do not make short-term investments, yet portfolios that may be risky in the short term can become low risk in the long term. In the present work, returns have been analysed over a period of one year, using stocks of well-known companies with a history of good performance, along with gold. On


Table 24.5 Optimal portfolio construction

Month    Hindunilvr   Gold price   ABB India   Axis Bank
Jan-17   855.15       81,187.80    1095.80     466.00
Feb-17   864.90       82,090.33    1198.20     506.65
Mar-17   909.75       81,945.53    1279.90     490.80
Apr-17   934.70       81,804.88    1409.60     509.65
May-17   1066.80      79,082.26    1459.60     514.05
Jun-17   1081.60      81,620.98    1450.66     517.35
Jul-17   1153.35      79,439.00    1421.00     519.80
Aug-17   1217.35      82,217.85    1340.30     500.35
Sep-17   1175.15      85,275.38    1398.90     509.15
Oct-17   1236.85      84,385.40    1385.85     523.15
Nov-17   1273.75      83,603.48    1398.80     535.40
Dec-17   1368.10      80,269.67    1402.90     563.95
Jan-18   1369.65      84,983.77    1653.20     593.60

Average monthly return (%)   4.09    0.43    3.67    2.09
Monthly variance (%)         0.20    0.10    0.42    0.12
Average yearly return (%)    49.06   5.10    44.02   25.11
Yearly variance (%)          2.35    1.16    5.06    1.48

Variance–covariance table
             Hindunilvr   Gold      ABB India   Axis Bank
Hindunilvr   0.0215       −0.0112   −0.0115     −0.0048
Gold         −0.0112      0.0106    0.0081      −0.0005
ABB India    −0.0115      0.0081    0.0464      0.0133
Axis Bank    −0.0048      −0.0005   0.0133      0.0136

                          Weights (equally weighted portfolio)   Weights (optimal risky portfolio)
Hindunilvr                0.25                                   0.37835018
Gold                      0.25                                   0.39137903
ABB India                 0.25                                   0.01253053
Axis Bank                 0.25                                   0.21774025
Sum                       1                                      1
Expected return (%)       30.82                                  26.58
Standard deviation (%)    7.02                                   3.48
Sharpe ratio                                                     5.729

Table 24.6 Expected return and standard deviation for different weights

Weights (Hindunilvr, Gold, ABB India, Axis Bank)   Sum   Expected return (%)   Standard deviation (%)
1.00, 0.00, 0.00, 0.00                             1     49.06                 14.66
0.75, 0.15, 0.10, 0.00                             1     41.96                 9.38
0.60, 0.15, 0.10, 0.15                             1     38.37                 7.15
0.50, 0.15, 0.15, 0.20                             1     36.92                 6.30
0.50, 0.25, 0.15, 0.10                             1     34.92                 5.65
0.37835, 0.391379, 0.012531, 0.21774               1     26.58                 3.48
0.25, 0.70, 0.05, 0.00                             1     18.04                 5.49
0.15, 0.80, 0.05, 0.00                             1     13.64                 7.19
0.00, 1.00, 0.00, 0.00                             1     5.10                  10.30
0.25, 0.25, 0.25, 0.25                             1     30.82                 7.02
Risk-free rate                                           6.65                  0.00



comparison with the risk-free interest rate, an optimal ratio has been obtained for investing in high-performing companies and gold relative to risk-free investments. The study finds that housewives have to change from totally risk-averse investors to partially risk-averse investors to optimize their returns on investment. In other words, housewives have to invest some part in risky portfolios while using the remainder for risk-free investments. The proportion invested in risky portfolios will vary from individual to individual depending on that individual's risk aversion. This study, however, suggests that housewives' risk aversion should not be 100%, as is found in the data collected; instead, housewives should put some part into risky portfolios as well to enhance their return.

References
1. Markowitz H (1952) Portfolio selection. J Financ 7(1):77–91
2. Markowitz HM (1999) The early history of portfolio theory: 1600–1960. Financ Anal J 55(4):5–16
3. Hatemi JA, El-Khatib Y (2015) Portfolio selection: an alternative approach. Econ Lett 135:424–427
4. Sharpe WF (1994) The Sharpe ratio. J Portf Manag 21(1):49–58
5. Lorentz P (2012) A modified Sharpe ratio based portfolio optimization. Master of science thesis, Royal Institute of Technology, School of Engineering Sciences, Sweden
6. Tsuil LK, Chadam J (2008) Portfolio optimization. In: Proceedings of the 2nd Fields–MITACS industrial problem-solving workshop
7. Sinha R, Tripathi PK (2017) A comparison of Markowitz and Sharpe's model of portfolio analysis. Wealth: Int J Money, Bank Financ 6(1):18–24
8. Luenberger DG (1998) Investment science. Oxford University Press
9. Brown R (2012) Analysis of investments and management of portfolios. Cengage Learning
10. Historic prices of stocks and indices—Moneycontrol. Available at https://www.moneycontrol.com/stocks/histstock.php
11. Gold Price in Indian Rupee—Ycharts. Available at https://ycharts.com/indicators/gold_price_in_indian_rupee

Chapter 25

Single Vacation Policy for Discrete-Time Retrial Queue with Two Types of Customers Geetika Malik and Shweta Upadhyaya

25.1 Introduction • Retrial queues have always attracted a lot of researchers ever since it was introduced by Cohen [1] in 1957. On arrival of the unit if each and every server is unavailable, then that unit goes into the orbit to retry for that same service and such a situation is referred as retrial queueing system. Such queues are highly useful to model any problem in areas like computer networks, telecommunication systems. • A lot of work has been done in this area. The reader may go through some excellent surveys done by Falin [2] and Yang and Templeton [3]. Some recent references include Jain and Bhagat [4], Choudhury and Ke [5]. • Whenever a situation arises where time is slotted, then discrete-time queues are more appropriate. The level of complexity of discrete-time queueing system is higher than that in retrial queues with continuous time. This is one of the major reasons that still a lot of work is needed to be done in this field. • Yang and Li [6] were the ones who gave the concept of discrete-time queues. For a complete study and understanding this concept, the reader may go through [7]. Authors like Upadhyaya [8], Li [9], Wei et al. [10] have contributed a lot in this area. • Further, Li [9] worked on the discrete time working vacation queueing model and determined the probability generating function (pgf ) of total units in the queue at departure point as well as waiting time of the unit. An application to network scheduling is also discussed. Wei et al. [10] examined Geo/G/1 retrial queue in discrete environment, wherein unit may balk and opt for optional service as well. They established the pgf of the system size and orbit size for this model. G. Malik (B) · S. Upadhyaya Department of Mathematics, Amity Institute of Applied Science, Amity University, Noida, India e-mail: [email protected] S. Upadhyaya e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_25


• In most of the queueing systems, unit enters the system and leaves only after the completion of service. However, in real world, there are a lot of units who either do not wish to enter the queue at all and exit before even being served (balking) or breaks the queue to get the service immediately (preferred units). • Ke and Chang [11] analysed a general service model with repeated attempts and Bernoulli feedback and used the supplementary variable technique for finding some performance indices. Also, Gao [12] worked on retrial queue with two variants of units, i.e. priority and ordinary units to obtain the sojourn time of an arbitrary ordinary unit. • Later on, Xu et al. [13] described a single server queue with balking along with higher priority units. They have performed various numerical examples in their work. Wu et al. [14] and Atencia [15] are few of the researchers who have worked on discrete-time queues with preferred units and priority services, respectively. • Many a times, there arises a situation when the server being idle leaves the system and goes on a vacation. Rajadurai et al. [16] and Upadhyaya [17] have contributed a lot in retrial queues with vacations. • Zang and Zu [18] examined a Geo/G/1 queue in discrete environment and have compared it with the corresponding continuous-time model. Atencia [19] considered a discrete-time model with last come first serve (LCFS) discipline and different vacation times. Upadhyaya [8] worked on queue with retrials, Bernoulli feedback in discrete environment and the condition that server follows J- vacation policy. • By going through the literature survey and our best knowledge, we have observed that such work is not there in the past which includes discrete-time queues with retrial, preferred and impatient units, single vacation and state-dependent policy. This has motivated us to do this work. The remaining part of our work is organised as follows: Detailed explanation of the considered model plus the symbols used are described in Sect. 25.2. Further, the steady-state probabilities and system size distribution are given in Sect. 25.3. Section 25.4 is devoted to find some of the useful performance measures. Thereafter, in Sect. 25.5, we have illustrated a numerical example along with trends in average system size with respect to some sensitivity parameters. Finally, we have given concluding remarks and future scope in Sect. 25.6.

25.2 Model Details • This investigation considers a single server Geo/G/1 retrial queue under discrete environment including Bernoulli feedback along with preferred plus impatient unit under single vacation and state-dependent policy. • Here, the time axis gets equally segmented into slots such as 0, 1, 2, 3, …, m. All the queueing activities take place at these slot boundaries. In this queueing model, we have considered an early arrival system (EAS) wherein the interval






(m⁻, m) is taken for the finishing activities, such as departures or the end of the vacation, whereas (m, m⁺) is considered for the beginning activities, such as arrivals, retrials or the beginning of the vacation, in that sequence. Units arrive to the system following a geometrical process with rate p₀ if the server is idle (retrial), with rate p₁ if the server is busy and with rate p₂ if the server is on vacation (p₁ > p₂). We assume that, owing to the unavailability of any waiting space, an arriving unit immediately goes to receive service if the server is idle and then exits the system forever once the service is completed. Else, if the server is rendering service on arrival, the arriving unit may interrupt the ongoing work to initiate its own service with probability qθ (priority unit), may enter the orbit with probability q(1 − θ) (repeated unit), or may exit permanently with the complementary probability 1 − q (impatient unit). The unit whose service gets interrupted joins the orbit. Consecutive inter-retrial times of a unit follow an arbitrary distribution {r_ρ} with generating function R(e) = Σ_{ρ=0}^{∞} r_ρ e^ρ. The service time is independent and iden-

tically distributed with probability distribution {w_ρ} and probability generating function W(e) = Σ_{ρ=1}^{∞} w_ρ e^ρ, with ρth factorial moment w_ρ (ρ ≥ 1).

• We assume that the probability distribution of the vacation time is {s_ρ}, with probability generating function S(e) = Σ_{ρ=0}^{∞} s_ρ e^ρ and ρth factorial moment s_ρ (ρ ≥ 1).

• When a unit finishes its service, it either decides to go into the retrial group to receive an additional service, with probability α, or departs from the system completely, with the complementary probability 1 − α, where 0 ≤ α < 1. When the orbit has no unit, the server goes for a vacation. Once the vacation is over, the server stays free and rests until a unit arrives, either from the orbit or from outside. • The times between arrivals, the times between repeated attempts and the vacation times are assumed to be mutually independent.

25.3 Distribution of System Size
In this era of busy networks and high-speed technology, analysing queueing models is essential: system designers must be aware of every critical parameter to attain maximum profit, and knowing the performance metrics of the queueing network helps avoid losses. This is done by analysing the model. The technique considered in our work to estimate the performance of the system is the probability generating function (pgf) technique. It is very useful when dealing with differential-difference equations, since it converts a set of probabilities into a single function of a dummy variable.
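For reference, with the notation used below, the pgf of an orbit-size distribution {χ_m} and the quantities recovered from it are the standard ones:

\[
\omega(\kappa) = \sum_{m \ge 0} \chi_m \kappa^m , \qquad \omega(1) = \sum_{m \ge 0} \chi_m , \qquad E[N] = \omega'(1) .
\]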


• At time t+, we describe the system as the Markov process {Z_t; t ≥ 1} with Z_t = {C_t, β_{0,t}, β_{1,t}, β_{2,t}, N_t}, where C_t is 0, 1 or 2 depending upon the three states, viz. idle, in service and on vacation, respectively, and N_t is the number of repeated units present in the orbit.
• If C_t = 0 with N_t > 0, then β_{0,t} denotes the remaining time between repeated attempts; if C_t = 1, then β_{1,t} denotes the remaining working time of the unit under service at that instant; similarly, if C_t = 2, then β_{2,t} shows the leftover vacation time.
• We define the state probabilities as follows:
χ_{0,1}: chance that the system is idle.
χ_{n,1}: chance that n ≥ 1 units are there in the queueing network while the server is busy.
χ_{n,2}: chance that n ≥ 0 units are there in the queueing network while the server is on vacation.
χ_{ε,ρ,m} = lim_{t→∞} P{C_t = ε, β_{ε,t} = ρ, N_t = m}, for ε = 0, ρ ≥ 1, m ≥ 1; ε = 1, ρ ≥ 1, m ≥ 0; ε = 2, ρ ≥ 1, m ≥ 0.

We have constructed the following state governing equations:

χ_00 = p_o χ_00 + p_2 χ_{2,1,0}    (25.1)

χ_{0,ρ,m} = p_o χ_{0,ρ+1,m} + α p_1 r_ρ χ_{1,1,m−1} + α p_1 r_ρ χ_{1,1,m} + p_2 r_ρ χ_{2,1,m}    (25.2)

χ_{1,ρ,m} = δ_{0,m} p_o w_ρ χ_00 + p_0 w_ρ χ_{0,1,m+1} + (1 − δ_{0,m}) p_0 w_ρ Σ_{ε=1}^{∞} χ_{0,ε,m} + (1 − δ_{0,m}) α p_1 w_ρ χ_{1,1,m−1} + (α p_1 + α p_1 r_0) w_ρ χ_{1,1,m} + α p_1 r_0 w_ρ χ_{1,1,m+1} + (p_1 + p_1 q) χ_{1,ρ+1,m} + (1 − δ_{0,m}) p_1 qθ χ_{1,ρ+1,m−1} + (1 − δ_{0,m}) Σ_{ε=2}^{∞} p_1 qθ w_i χ_{1,ε,m−1} + (p_2 w_ρ + p_2 r_0 w_ρ) χ_{2,1,m+1};  ρ ≥ 1, m ≥ 0    (25.3)

χ_{2,1,m} = p_2 χ_{2,ρ+1,m} + (1 − δ_{0,m}) p_2 χ_{2,ρ+1,m−1} + δ_{0,m} p_1 α χ_{1,1,0} s_ρ;  ρ ≥ 1, m ≥ 0    (25.4)

Now, the probability generating functions are defined as

ω_{0,ρ}(κ) = Σ_{m=1}^{∞} χ_{0,ρ,m} κ^m;   ω_{1,ρ}(κ) = Σ_{m=1}^{∞} χ_{1,ρ,m} κ^m;   ω_{2,ρ}(κ) = Σ_{m=1}^{∞} χ_{2,ρ,m} κ^m;

ω_0(e, κ) = Σ_{ρ=1}^{∞} ω_{0,ρ}(κ) e^ρ;   ω_ε(e, κ) = Σ_{ρ=1}^{∞} ω_{ε,ρ}(κ) e^ρ,  ε = 1, 2.

Multiplying Eqs. (25.2) and (25.4) by κ^m and summing over m, we get

ω_{0,ρ}(κ) = p_0 ω_{0,ρ+1}(κ) + η(κ) p_1 r_ρ ω_{1,1}(κ) + p_2 r_ρ ω_{2,1}(κ) − α p_1 r_ρ χ_{1,1,0} − p_2 r_ρ χ_{2,1,0}    (25.5)

where η(κ) = (α + ακ), and

ω_{2,ρ}(κ) = (p_2 + p_2 κ) ω_{2,ρ+1}(κ) + p_1 α s_ρ χ_{1,1,0}    (25.6)

We now multiply Eq. (25.6) by e^ρ and sum over ρ:

[(e − (p_2 + p_2 κ)) / e] ω_2(e, κ) = −(p_2 + p_2 κ) ω_{2,1}(κ) + p_1 α s(e) χ_{1,1,0}    (25.7)

Setting e = p_2 + p_2 κ, we have

ω_{2,1}(κ) = [p_1 α s(γ) / γ(κ)] χ_{1,1,0},  where γ(κ) = p_2 + p_2 κ    (25.8)

Solving Eqs. (25.7) and (25.8), we get

ω_2(e, κ) = {e p_1 α [s(e) − s(γ)] / [e − γ(κ)]} χ_{1,1,0}    (25.9)

Taking the derivative of the above equation with respect to e and letting e = κ = 0,

χ_{2,1,0} = [s(p_2) α p_1 / p_2] χ_{1,1,0}    (25.10)

Using Eq. (25.10) in Eq. (25.5), we get

ω_{0,ρ}(κ) = p_0 ω_{0,ρ+1}(κ) + η(κ) p_1 r_ρ ω_{1,1}(κ) + p_2 r_ρ ω_{2,1}(κ) − {p_0 r_i [1 + s(p_2)] / s(p_2)} χ_00,  where η(κ) = (α + ακ)    (25.11)


 e − p0 ω0 (e, κ) = − p0 ω0,1 (κ) + p1 [R(e) − r0 ]η(e)ω1,1 (κ) e   − p2 s(η) + [1 + s( p2 )]γ (κ) χ00 (25.12) − p0 [R(e) − r0 ] γ (κ)s( p2 ) ⎤ ⎡ ω1,ρ (k) =

p 0 wρ ω0,1 (κ) + ς(κ)ω1,ρ+1 (κ) + ⎣ κ

(α + ακ)wρ ( p1 κ + p1 r0 ⎦ω1,1 (κ) κ − p1 q1 θkwρ

( p2 κ + p2 r 0 ) + p0 wρ ω0,1 (κ) + wρ ω2,1 (κ) κ   r0 κ − r0 + p 0 wρ − χ00 κ κs( p2 )

(25.13)

where ς (κ) = p1 + p1 q1 + p1 q1 θ κ Multiplying Eq. (25.13) by eρ and summing over ρ, we get 

 e − ς (κ) p0 w(e) ω1 (e, κ) = ω0,1 (κ) + p0 w(e)ω0 (1, κ) e κ + p1 q1 θ κw(e)ω1 (1, κ)  

p1 κ + p1 r 0 η(κ) − p1 q1 θ κ w(e) − ς (κ) ω1,1 (κ) + κ  

r0 ( p2 e + p2 r0 ) p0 s(γ )w(e) κ − r0 − p0 w(e) + χ00 ) + κ κs( p2 ) κ γ (κ)s( p2 )

(25.14)

Putting e = 1 in Eq. (25.14), we have p0 ω0,1 (κ) + p0 ω0 (1, κ) q(1 − κ)ω1 (1, κ) = κ 

 p1 κ + p1 r 0 η(κ) − p1 q1 θ κ x(e) − ς (e) ω1,1 (κ) + κ  

r0 ( p2 κ + p2 r0 ) p0 s(γ )w(e) κ − r0 − p0 w(e) + χ00 + κ κs( p2 ) κ γ (k)s( p2 )

(25.15)

Putting e = 1 in Eq. (25.12), we have p0 ω0 (e, κ) = − p0 ω0,1 (κ) + p1 [1 − r0 ]η(κ)ω1,1 (κ)   − p2 s(γ ) + [1 + s( p2 )]γ (κ) χ00 − p0 [1 − r0 ] γ (κ)s( p2 ) Using Eqs. (25.15) and (25.16) in Eq. (25.14), we get 

 e − ς (κ) p0 ω1 (e, κ) = (1 − θ κ)s(e)ω0,1 (κ) e κ

(25.16)


 (1 − θ κ) κ + p1r0 (1 − κ) θ κw(e) w(e)η( f ) − ξ( f ) − + ω1,1 (κ) (1 − κ) κ 1−κ ⎤ ⎡

κ − r0 r0 ( p2 κ + p2 r0 ) s(γ )   − + ⎢ (1 − θκ) κ κs( p2 ) κ γ (z)s( p2 ) ⎥ ⎥χ00  + p0 w(e)⎢ ⎣ − p2 s(γ ) + [1 + s( p2 )]γ (κ) ⎦ (1 − κ) −(1 − r0 ) γ (κ)s( p2 ) (25.17) Setting e = p0 in Eq. (25.12) and e = ς (κ) in Eq. (25.17), we have p0 [R( p0 ) − r0 ] [{−κ(1 − θ κ)η(κ)w(ς (κ)) + κς (κ) − κ 2 ς (κ) p0 s( p2 )γ (κ)H (κ) + θ κ 2 ς (κ)}{γ (κ)[1 + s( p2 )] − p2 s(γ )}

ω0,1 (κ) =

+ {κγ (κ) − ks(γ )} p1 (1 − θ κ)η(κ)w(ς (κ))]χ00

(25.18)

p0 (1 − θ κ)w(ς (κ)) [κγ (κ) − κs(γ ) + R( p0 )(1 − κ){γ (κ)[1 + s( p2 )] s( p2 )γ (κ)H (κ) (25.19) − p2 s(γ )}]χ00

ω1,1 (κ) =

where H ( f ) = (1 − θκ)η(κ)w(ς(κ))[ p1 R( p0 )(1 − κ) + κ] − κς(κ) + κ 2 [ς(κ) − θ x(ξ(κ))]

Theorem 1 The marginal generating function (mgf) of the total units present in the orbit while server is in busy state is (1 − θ κ)χ00 [{γ (κ)[1 + s( p2 )] κqθ (1 − κ)s( p2 )γ (κ)H (κ) − p2 s(γ )}{(r0 − κ)H (κ) + κ(1 − κ)2 ς (κ)[1 − w(ς (κ))]} + [κγ (κ) − κs(γ )]T1 (κ) + {(κ − r0 )s( p2 )γ (κ) − r0 γ (κ)

ω1 (1, κ) =

+ ( p2 κ + p2 r0 )s(γ )}H (κ)] where H ( f ) = (1 − θκ)η(κ)w(ς(κ))[ p1 R( p0 )(1 − κ) + κ] − κς(κ) + κ 2 [ς(κ) − θ x(ξ(κ))]

T1 (k) = (1 − θ κ)η(κ)w(ς (κ)){[R( p0 ) − r0 ] p1 (1 − κ) + (κ + p1r0 (1 − κ))} − κ(1 − κ)ς (κ) − ακ 2 w(ς (κ)) Theorem 2 The mgf of the total units present in the buffer during idle state of the server is


 s( p2 )γ (κ)H (κ) + T2 (κ) + T3 (κ)− ω00 χ00 + ω0 (1, κ) = s( p2 )γ (κ)H (κ) (1 − r0 )H (κ){γ (k)[1 + s( p2 ) − p2 s(γ )} where T2 (κ) = p1 η(κ)(1 − r0 )(1 − θ κ)w(ς (κ)){κγ (κ) − κs(γ ) + R( p0 )(1 − κ){γ (κ)[1 + s( p2 ) − p2 s(γ )}} T3 (κ) = −[R( p0 ) − r0 ]{−κ(1 − θ κ)η(κ)w(ς (κ)) + κς (κ) − κ 2 ς (κ) + θ κ 2 wς (κ))}{γ (κ)[1 + s( p2 )] − p2 s(γ ) + {κγ (κ) − κs(γ )} p1 (1 − θ κ)η(κ)w(ς (κ)) Theorem 3 The units who wait in the orbit have their generating functions as follows: χ00 κqθ (1 − κ)s( p2 )γ (κ)H (κ) [κqθ (1 − κ){s( p2 )γ (κ)H (κ) + T2 (κ) + T3 (κ) − (1 − r0 )H (κ){γ (κ)

χ00 + ω0 (1, κ) + ω1 (1, κ) + ω2 (1, κ) =

[1 + s( p2 )] − p2 s(γ )}} + (1 − θ κ)[{γ (κ)[1 + s( p2 )] − p2 s(γ )}{(r0 − κ)H (κ) + κ(1 − κ)2 ς (κ)[1 − W (ς (κ))]} + [κγ (κ) − κs(κ)]T1 (κ) + {(κ − r0 ) p0 [1 − s(γ )]χ00 s( p2 )γ (κ) − r0 γ (κ) + ( p2 κ + p2 r0 )s(γ )}H (κ)]] + s( p2 )[1 − γ (κ)] Theorem 4 The units who wait in the system have their generating functions as follows: χ00 κqθ (1 − κ)s( p2 )γ (κ)H (κ) [κqθ (1 − κ){s( p2 )γ (κ)H (κ) + T2 (κ) + T3 (κ) − (1 − r0 )H (κ){γ (κ) χ00 + ω0 (1, κ) + κω1 (1, κ) + ω2 (1, κ) =

[1 + s( p2 )] − p2 s(γ )}} + (1 − θ κ) f [{γ (κ)[1 + s( p2 )] − p2 s(γ )}{(r0 − κ)H (κ) + κ(1 − κ)2 β(κ)[1 − W (β(κ))]} + [κγ (κ) − κs(γ )]T1 (κ) + {(κ − r0 ) p0 [1 − s(γ )]χ00 s( p2 )γ (κ) − r0 γ (κ) + ( p2 κ + p2 r0 )s(γ )}H (κ)]] + s( p2 )[1 − γ (κ)]

25.4 Performance Measures After the construction of a queueing model and obtaining the stationary distributions, we check whether the system is performing well or not. This can be done by obtaining and measuring some useful performance measures. The probabilities of the server


in various conditions, average units present in the orbit or system, etc., are some of these important measures which we have discussed in our work. • The system is idle with probability given by χ00 =

p0 qθ S( p2 ){θ W (ς(1))( p0 − p1 R( p0 ) − 1) − θw(ς(1)) + ς(1)} [{θ( p1 R( p0 ) − p2 r0 + q)(w(ς(1))}{(1 − θ p0 ) − ς(1) + T4 (1)} +{θw(ς(1))( p0 − p1 R( p0 ) − 1) − θw(ς(1)) + ς(1)}T5 (1) + { p2 ς(1)S(1)}]

where T4 (1) = { p02 q( p0 − p1 R( p0 )) − p0 p2 (1 − r0 )}{θ 2 w(ς (1))(1 + p0 + s( p2 ))} T5 (1) = p0 qθ (r0 S( p2 ) − p2 (1 − r0 ) + p1 θ (1 − 2 p0 )) • The chance that the server is free is given by χ00 + ω0 (1, 1) =

χ00 p0 s( p2 ){θ w(ς (1))(α − p1 R( p0 ) − 1) − θ w(ς (1)) + ς (1)} ∗ [r0 p0 s( p2 ){θ w(ς (1))(θ − p1 R( p0 ) − 1) − θ w(ς (1)) + ς (1)} + p1 s( p2 )T6 (1)]

T6 (1) = w(ς (1))((1 − θ w) − (ς (1)) − p1 p2 (1 − r0 ) − θ w(ς (1))( p2 + R( p0 ))(1 + p1 + s( p2 )) The server is busy with probability as given below

ω1 (1, 1) =

{( p2 + s( p2 ))w(ς (1))(1 − θ w − ς (1))(R( p0 ) − r0 ) + p0 qθ ( p2 − R( p0 ))(1 + p1 + s( p2 ))w(ς (1))} qs( p2 )θ w(ς (1))(θ − p1 R( p0 ) − 1) − θ w(ς (1)) + ς (1) p1 −1) + θ(2 λ qθs( p2 ) 00

χ00

• The average number of units in the orbit, in the case p_0 = p_1 = p_2 = p, is
E[N] = φ′(κ)|_{κ=1} = lim_{κ→1} (d/dκ)[χ_00 + ω_0(1, κ) + ω_1(1, κ) + ω_2(1, κ)] = lim_{κ→1} [ω_0′(1, κ) + ω_1′(1, κ) + ω_2′(1, κ)]

Let ω_0(1, κ) = Num_1(κ)/Dem_1(κ),  ω_1(1, κ) = Num_2(κ)/Dem_2(κ),  ω_2(1, κ) = Num_3(κ)/Dem_3(κ). Now,


Dem1 (1)Num1 (1) − Num1 (1)Dem1 (1) Num1 (κ) = k→1 Dem1 (κ) 2(Dem1 )2

lim ω0 (1, κ) = lim

κ→1

where Num1 (1) = (R( p) − 1)χ00 [( p + s( p))(αθw(ς(1)) − w(ς(1)) + (ς(1)) − pθw(ς(1)))s( p) + θ p 2 ps  (1)w(ς(1))] Num (1) = (R( p) − 1)λ00 [( p + s( p)){−2w(ς(1)) + 2αθw(ς(1)) + 2 pθw(ς(1)) + 2ς(1) − −2θαw(ς(1)) − 2θ pqw  (ς(1)) − 2θ pw(ς(1)) + θw  (ς(1))(θ pq)2 2

+ 2θαθ pqw  (ς(1)) + θθqp 2 w  (ς(1)) + θαpw(ς(1)) + θ θqpw  (ς(1)) − pθs( p)w(ς(1)) + θ p p 2 w(ς(1))} + { p(1 + s( p)) − p ps  (1)(−3θw(ς(1))) + pqθ + ς(1) − 2θθ pqw  (ς(1))}] Dem (1) = s( p)[w(ς(1)){1 − θ(α − p R( p))} + ς(1)] Dem (1) = s( p)[−2θαw(ς(1)) + 2αθw(ς(1)) − 2w(ς(1)) − 2θ pqw  (ς(1)) + 2θ p R( p)w(ς(1)) − 2αθ p R( p)w(ς(1)) + 2αθθ pqw  (ς(1)) − 2θθqp p R( p)w  (ς(1)) + 2 + 2θqp 2 + 2 pw(ς(1)){αθ − 1 − p R( p)}]



  2Dem2 (1)Num 2 (1) − 2Dem2 (1)Num2 (1)

lim ω1 (1, κ) =

κ→1



 −3Dem 2 (1)Num2 (1)

4(Dem2 (1))2

To solve this, we suppose Num2 (κ) = (1 − θκ)Num3 (κ) where Num2 (1) = θNum3 (1); Num2 (1) = −2θNum3 (1) + θNum3 (1); Num3 (1) = pw(ς(1))(1 + p − w(ς(1))); Num3 (1) = { p − s( p)( p + 2) + r0 p(1 + s( p))}{w(ς(1)) (1 − θ(α − p R( p)) − ς(1)) + ( p + s( p))(r0 − 1)[−2θ θ pqw  (ς(1)) (α + p R( p)) − 2θw(ς(1))(1 − p R( p)) − θw(ς(1))(1 + α − α(1 − p R( p)) + 2ς(1) + 2 pqθ − 2θw(ς(1))]} + 2 pθ(1 − w(ς(1))){w(ς(1)) + pqθ w  (ς(1))} + 2w(ς(1)) p{θα − θ (1 + p) + θ(1 − r0 p) + ς(1) − 2θw(ς(1)) − θθ pqw  (ς(1)) + (1 − r0 )( p + s( p))H  (1)} H  (1) = −2θ αw(ξ(1)) − 2(θ)2 pqw  (ξ(1)) − 2θw(ς(1))(1 − p R( p)) + 2αθ θ pqw  (ς(1)) + 2αθw(ς(1))(1 − p R( p)) + 2(θ)2 pqw  (ς(1))(1 − p R( p)) + 2(ς(1) − θw(ς(1))) + 2( pqθ − θθ pqw  (ς(1))) − 2 − θ θ pqw  (ς(1)) + 2 p H  (1)

where

H  (1) = ς(1) − x(ς(1))(1 − θ(α − p R( p)))   Num 2 (1) = −3θ Num3 (1) + θNum3 (1)

Num 3 1) = −2 p(1 + s( p))(1 + θα)(ς(1) − w(ς(1))) + (1 + s( p)) ( p(r0 − 1) − 1){−2θ pqw (ς(1)) − 2θθ pq(α + 1 − p R( p))w (ς(1)) 2

− 2θw(ς(1))(α + 1 − p R( p)) + 2θαw(ς(1))(1 − p R( p)) + 2θ pq + 2ς(1) − 2θw(ς(1))} + (1 + s( p))(r0 − 1){−3(θ pq)2 w (ς(1))(1 + θ (α + 1 − p R( p)) − 5θ pqw (ς(1))(α + 1 − p R( p)) + 6θαθ pqw (ς(1)) 2

(1 − p R( p)) − 6θαw(ς(1))(1 − p R( p)) + 6θ pq − 6θθ pqw (ς(1))} + 4q p 2 θw (ς(1)) + 2θ pbw (ς(1))(1 − w(ς(1))) + {4 pw(ς(1)) + θ pqw (ς(1))} {θ(α + 1 − pr0 ) − θ + ς(1) − 2θw(ς(1)) − θ θ pqw (ς(1))} + [2 pw(ς(1)) {−θ α(2 + p) + αθ(1 − pr0 ) − 2θ(1 − pr0 ) + 2ς(1) + θ pq − 4θθ pqw (ς(1)) − 2θw(ς(1)) − θ(θ pq)2 w (ς(1))}] + 6 ps( p)H  (1) + 3{S( p) + (1 − r0 ) ps( p) − r0 p}H  (1) + {(1 − r0 )s( p) − r0 + p + pr0 }H  (1)

H  (1) = (−6θ α − 5θ (1 − p R( p)) − θ + θ α(1 − p R( p) − 6θ ) θ pqw  (ς (1)) − 5αθ w(ς (1))(1 − p R( p))) + (θ pq)2 w  (ς (1)) (3αθ − 6θ − 2θ + 3θ (1 − p R( p))) + 6θ pq Dem2 (1) = −2 pθ s( p)H  (1);   Dem 2 (1) = −qθ s( p)[H (1){5 + 6 p} + 3H (1)];    Dem 2 (1) = −qθ s( p)[21 p H (1) + {11 p + 10}H (1) + 3H (1)]

where the values of H′(1), H″(1), H‴(1) are already defined above. Applying L'Hôpital's rule to the third term, we get

lim_{κ→1} ω_2(1, κ) = [p S′(1) / S(p)] χ_{0,0}

• The average number of units in the system is
E[L] = φ′(κ)|_{κ=1} = lim_{κ→1} (d/dκ)[χ_00 + ω_0(1, κ) + κ ω_1(1, κ) + ω_2(1, κ)],
which gives E[L] = E[N] + ω_1(1, 1), where all parameters are defined above.


25.5 Numerical Example and Sensitivity Analysis

Discrete-time retrial queues are used in various fields today to minimise the queue length and the waiting time of the units and to obtain an optimal, cost-effective model. We consider here the example of a computer system (server) in a hospital that provides ultrasound reports to patients with service rate μ = 1/w = 0.4 units/min. Let us assume that each patient has a unique code in the system, i.e. a data packet (unit), which arrives prior to the slot closing n− and departs after the slot closing n+, with rate p0 = p1 = p2 = p = 0.2 units/min. We further assume that if the server is free, the reports are immediately printed (packets are transmitted and depart); otherwise, the data packet of that particular patient is given the choice to retry after a certain time gap, and the time between repeated trials is geometrically distributed with generating function R(a) = 3/(5 − 2a). There is a possibility of misprinting due to corruption of packets, in which case the report has to be taken again (feedback); this feedback probability is taken to be α = 0.1. Due to network issues, the server may sometimes run at a slower pace (vacation mode) with rate s = 1/s1 = 0.1. Let the vacation time be geometrically distributed with generating function S(a) = r1/(1 − q̄a), where r1 = 0.8 and q̄ = 1 − r1 (so that S(1) = 1). If there is an emergency and a patient is in a bad health condition, then his data packet is transmitted first. The joining probability of impatient units is assumed to be q = 0.99, and the emergency patients are given priority over others with probability θ = 0.01. This model is a Geo/G/1 queue with repeated trials in a discrete environment, including preferred along with impatient units and Bernoulli feedback under a single vacation policy. We have written a program in MATLAB to derive the average system size E[N] = 2.6158. Further, we observe how the model behaves under three types of service time distributions, i.e. exponential, Erlangian and gamma. The results obtained are shown in Fig. 25.1(i)–(iv). The default parameters for these figures are μ = 0.4, p0 = p1 = p2 = p = 0.2, s = 0.1, α = 0.1, q = 0.99 and θ = 0.01. From Fig. 25.1(i)–(iv), the trend observed is that E[N] is strongly affected by changes in p, μ, s and α: E[N] increases as the arrival rate and the feedback probability increase, whereas if the service rate is high enough, E[N] decreases, and it also reduces with a rising vacation rate. The same feature can be observed in many service centres, where the system size reduces to some extent whenever the server serves at a high rate, while the system size grows when more and more units accumulate in the absence of the server. We also observe that the system size is higher when the service time follows a gamma distribution than in the exponential and Erlangian cases, so choosing the right distribution is equally important. The above numerical results match those of Upadhyaya [8] for J = 1 (i.e. single vacation policy) in the special case q = 1, α = 0 (i.e. no preferred and impatient units), and also match those of Upadhyaya [20] in the special case with no batch arrivals and no vacation.
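The closed-form expression for E[N] is not re-derived here, but the moment-extraction step used throughout the sensitivity analysis can be illustrated numerically. The sketch below is an added illustration (it is not the authors' MATLAB program): it evaluates the retrial-time generating function R(a) = 3/(5 − 2a) quoted above and recovers its mean by differentiating at a = 1.

```python
def R(a):
    """Generating function of the inter-retrial time from the example."""
    return 3.0 / (5.0 - 2.0 * a)

def mean_from_pgf(G, h=1e-6):
    """Mean of a discrete distribution from its PGF, i.e. G'(1)."""
    return (G(1.0 + h) - G(1.0 - h)) / (2.0 * h)

print("R(1) =", R(1.0))                        # 1.0, so R is a proper PGF
print("mean retrial time =", mean_from_pgf(R)) # 6/9 ≈ 0.667 slots
```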

Fig. 25.1 Effect of (i) p, (ii) μ, (iii) s and (iv) α on E[N] under various distributions of service time

25.6 Conclusion

• For the developed model, we have derived the probability generating functions for the system size and the orbit size by applying the probability generating function technique and the supplementary variable method.
• This investigation is applicable to modelling digital systems with discrete parameters, for instance computer and communication networks and ATMs.
• Nowadays, the dynamic flow of information is modelled using discrete-time queues, for instance, the flow of units in technological and communication areas or on the Internet.
• The results obtained in this investigation are quite helpful when dealing with discrete-time retrial queueing systems where the server can go for a vacation whenever it is idle.
• The vacation may refer to activities carried out by the server to improve the system, such as virus scanning in digital systems. This study provides useful results with which technological experts can study the impatient behaviour of their units and, in turn, improve their respective systems.
• In the future, one may consider batch arrivals instead of single arrivals and a multiple vacation policy rather than a single vacation policy.

References 1. Cohen JW (1957) Basic problem of telephone traffic theory and the influence of repeated calls. Phillips Telecommun Rev 49–101 2. Falin G (1990) A survey of retrial queues. Queueing Syst 7(2):127–167 3. Yang T, Templeton JGC (1987) A survey on retrial queues 201–233 4. Jain M, Bhagat A (2014) Unreliable bulk retrial queues with delayed repairs and modified vacation policy. J Ind Eng Int 10(3):63 5. Choudhury G, Ke JC (2014) An unreliable retrial queue with delaying repair and general retrial times under Bernoulli Vacation schedule. Appl Math Comput 230:436–450 6. Yang T, Li H (1995) On the steady state queue size distribution of discrete time Geo/G/1 queue with repeated units. Queueing Syst 21(1–2):199–215 7. Bruneel H (1993) Performance of discrete time queueing systems. Comput Oper Res 20(3):303– 320 8. Upadhyaya S (2018) Performance analysis of a discrete-time Geo/G/1 retrial queue under J-vacation policy. Int J Indus Syst Eng 29(3):369–388 9. Li JH (2013) Analysis of discrete time Geo/G/1 working vacation queue and its applications to network scheduling. Comput Indus Eng 65(4):594–604 10. Wei CM, Cai L, Wang JJ (2016) A discrete time Geo/G/1 retrial queue with balking units and second optional service. Opsearch 53(2):344–357 11. Ke JC, Chang FM (2009) Modified vacation policy for M/G/1 retrial queue with balking and feedback. Comput Indus Eng 57(1):433–443 12. Gao S (2015) A preemptive priority retrial queue with two classes of units and general retrial times. Oper Res 15(2):233–251 13. Xu B, Xu X, Wang X (2016) Optimal balking strategies for high priority units in M/G/1 queues with two classes of units. J Appl Math Comput 51(1–2):623–642


14. Wu J, Wang J, Liu Z (2013) A discrete time Geo/G/1 retrial queue with preferred and impatient units. Appl Math Model 37(4):2552–2561 15. Atencia I (2017) A Geo/G/1 retrial queueing systems with priority services. Eur J Oper Res 256(1):178–186 16. Rajadurai P, Saravanarajan MC, Chandrasekaran VM (2013) Analysis of an MX /(G1 , G2 )/1 retrial queueing system with balking, optional re-service under modified vacation policy and service interruption. Ain Shams Eng J 5(3):935–950 17. Upadhyaya S (2016) Queueing systems with vacation an overview. Int J Math Oper Res 9(2):167–213 18. Zhang F, Zhu Z (2015) A discrete time Geo/G/1 retrial queue with two types of vacations. Math Probl Eng 12 19. Atencia I (2016) A discrete-time queueing system with changes in the vacation times. Int J Appl Math Comput Sci 26(2):379–390 20. Upadhyaya S (2016) Performance prediction of a discrete-time batch arrival retrial queue with Bernoulli feedback. Appl Math Comput 108–119

Chapter 26

Intuitionistic Fuzzy Hybrid Multi-criteria Decision-Making Approach with TOPSIS Method Using Entropy Measure for Weighting Criteria

Talat Parveen, H. D. Arora, and Mansaf Alam

26.1 Introduction

The decision of university selection plays a crucial role in the life of each young person, as it has a long-lasting effect on job profile, lifestyle and salary [1]. The decision to attend college has life-changing effects on an individual and plays a significant role in one's career building and future prospects [2]. Collecting information about different colleges is itself a decision-making process: the student has to go through the various criteria each college offers before finally deciding, and the vast information accompanying the whole process makes it a complicated and stressful decision [3]. Students have to gather ample information to make a well-informed choice [4, 5]. A large number of studies have examined the effect of various factors on the decision to finalize a university, including academic, non-academic, sociological and psychological factors such as faculty, fee, reputation of the university, location, sports and social life [6, 7]. Universities can target the best students for their institution by estimating the factors that influence applicants [8, 9], and resources can also be utilized in a better way [10–12]. Higher education has a decisive effect on the economic condition of the individual; it ensures better career prospects with higher salaries [2, 3] and lifelong economic security. Several studies have been carried out on university selection using statistical methods. University selection is based on several criteria [4, 5, 13], and much work has been done in this field considering several criteria

that affect the decision process, such as the quality of faculty, academic program quality and available courses, cost, geographical location, campus, social life, future prospects [1, 7, 8], reputation of the university, visa approval process, foreign language requirements and transport facilities to and from campus [2].

Multi-criteria decision-making (MCDM) is deployed here to solve the university selection problem using several academic and non-academic criteria. In MCDM, a decision-maker ranks alternatives after assessing several interdependent and independent criteria. The presence of vague knowledge creates ambiguity that is challenging for decision-makers, which has motivated researchers to evaluate MCDM techniques in a fuzzy environment. The TOPSIS method [14] is a classical MCDM method based on the principle that the selected alternative must be nearest to the positive ideal solution and farthest from the negative ideal solution. The supplier selection MCDM problem has been solved by applying fuzzy set theory [15, 16]. Chen et al. [17] applied the TOPSIS method to solve multi-criteria decision-making problems in a fuzzy setting. A model was presented in [10] for assessing an MCDM problem with the AHP method and was validated using fuzzy AHP. Bayrak et al. [18] implemented a multi-criteria decision-making method based on fuzzy arithmetic operations.

This paper proposes an intuitionistic fuzzy entropy-based [19, 20] multi-criteria decision-making approach with the TOPSIS method to understand the criteria affecting university selection and to select a university based on chosen academic and non-academic criteria. In the literature, the university selection problem has been solved using statistical methods such as logistic regression analysis, SPSS, ANOVA and the chi-squared test [3, 11]. It is, however, also a multi-criteria decision problem with several criteria influencing the decision, and it can therefore be modelled and solved using multi-criteria decision-making methods. The TOPSIS method is selected to solve the university selection problem because of its simplicity, rationality, comprehensibility, good computational efficiency and ability to measure the relative performance of each alternative in a simple mathematical form. Its intuitive and clear logic reflects the rationale of human choice, and it lends itself to visualization. The TOPSIS technique helps decision-makers structure the problem, carry out the analysis and compare and rank the various alternatives. The classical TOPSIS method solves problems in which all decision data are known and represented by crisp numbers; most real-world problems, however, have a more complex structure with uncertainty, lack of information or vagueness. The importance of the criteria and the impact of the alternatives on the criteria, as given by decision-makers, are difficult to represent as crisp data; thus, the intuitionistic fuzzy sets (IFS) of Atanassov [19] are appropriate in the multi-criteria decision-making environment. In the TOPSIS process, a decision matrix is formed that represents the satisfaction value of each alternative with respect to each criterion, and the criteria weights for the university selection problem are determined using the intuitionistic fuzzy entropy method.


26.2 Methodology

26.2.1 Preliminaries

Zadeh [21] proposed fuzzy sets to model vagueness in 1965, and the classical fuzzy set theory was extended by Atanassov [19] to address vagueness in the decision-making process. An intuitionistic fuzzy set I on a universe X is defined as follows:

I = {⟨x, ξ_I(x), ρ_I(x)⟩ : ∀x ∈ X}

where the functions ξ_I(x) and ρ_I(x) : X → [0, 1] represent the degree of membership and the degree of non-membership of an element x ∈ I ⊂ X, respectively, and

ϕ_I(x) = 1 − ξ_I(x) − ρ_I(x)

is called the degree of uncertainty, with the condition 0 ≤ ξ_I(x) + ρ_I(x) ≤ 1. The parameter ϕ_I(x) is the degree of hesitation about whether x belongs to I or not. A low value of ϕ_I(x) indicates that the information about x is more certain, whereas a high value of ϕ_I(x) implies uncertainty in the information about x [22]; when ξ_I(x) = 1 − ρ_I(x) for all x, the set reduces to an ordinary fuzzy set. For IFS U and V of set X, the multiplication operator is given by Atanassov [19]:

U ⊗ V = {ξ_U(x) · ξ_V(x), ρ_U(x) + ρ_V(x) − ρ_U(x) · ρ_V(x) | x ∈ X}    (26.1)
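As a small, hedged illustration of Eq. (26.1), added here for concreteness (the tuple representation and helper name are ours, not the chapter's), an intuitionistic fuzzy value can be stored as a (membership, non-membership) pair and the ⊗ operator applied element-wise:

```python
def ifn_multiply(u, v):
    """Multiplication of two intuitionistic fuzzy values, following Eq. (26.1).

    u, v are (membership, non-membership) pairs for the same element x.
    """
    mu_u, nu_u = u
    mu_v, nu_v = v
    mu = mu_u * mu_v
    nu = nu_u + nu_v - nu_u * nu_v
    return (mu, nu)

# Example with arbitrary illustrative values
print(ifn_multiply((0.8, 0.1), (0.6, 0.3)))   # (0.48, 0.37)
```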

26.2.2 An Algorithm for Intuitionistic Fuzzy TOPSIS

Let I = {I_1, I_2, …, I_m} be the set of m alternatives and X = {X_1, X_2, …, X_n} be the set of n criteria. The steps involved in the intuitionistic fuzzy TOPSIS method [23] are the following:

Step 1: Decision-Makers Weight Evaluation
Linguistic terms expressed as intuitionistic fuzzy numbers are used to determine the importance of each of the l decision-makers. Let Q_k = [ξ_k, ρ_k, ϕ_k] be an IF number rating the kth decision-maker. The kth decision-maker's weight is then evaluated as

γ_k = [ξ_k + ϕ_k (ξ_k / (ξ_k + ρ_k))] / Σ_{k=1}^{l} [ξ_k + ϕ_k (ξ_k / (ξ_k + ρ_k))],  with Σ_{k=1}^{l} γ_k = 1.    (26.2)
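A short sketch of Eq. (26.2) is given below as an added illustration (not the authors' code). Applied to the intuitionistic fuzzy numbers of Table 26.2 for the three decision-makers rated "very important", "medium" and "important", it reproduces the weights 0.397, 0.232 and 0.371 reported in Table 26.3.

```python
def dm_weights(ratings):
    """Decision-maker weights from IF ratings (xi, rho) using Eq. (26.2)."""
    scores = []
    for xi, rho in ratings:
        pi = 1.0 - xi - rho                     # hesitation degree
        scores.append(xi + pi * xi / (xi + rho))
    total = sum(scores)
    return [s / total for s in scores]

# DM I: very important, DM II: medium, DM III: important (Table 26.2)
print(dm_weights([(0.9, 0.1), (0.5, 0.45), (0.8, 0.15)]))
# ≈ [0.397, 0.232, 0.371], matching Table 26.3
```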


Step 2: Aggregated IFS Matrix—DM's Opinion
Let P^(k) = (p_ij^(k))_{m×n} be the IF decision matrix of the kth decision-maker and let γ = {γ_1, γ_2, …, γ_l} denote the decision-makers' weights, with Σ_{k=1}^{l} γ_k = 1 and γ_k ∈ [0, 1]. In a multi-criteria decision-making method, all the individual opinions have to be fused into a single judgement to form an aggregated intuitionistic fuzzy decision matrix; the IFWA operator [24] is utilized to obtain

P = (p_ij)_{m×n}    (26.3)

where

p_ij = IFWA_γ(p_ij^(1), p_ij^(2), …, p_ij^(l)) = γ_1 p_ij^(1) ⊕ γ_2 p_ij^(2) ⊕ γ_3 p_ij^(3) ⊕ ⋯ ⊕ γ_l p_ij^(l)
     = ( 1 − Π_{k=1}^{l} (1 − ξ_ij^(k))^{γ_k},  Π_{k=1}^{l} (ρ_ij^(k))^{γ_k},  Π_{k=1}^{l} (1 − ξ_ij^(k))^{γ_k} − Π_{k=1}^{l} (ρ_ij^(k))^{γ_k} )    (26.4)

Here, p_ij = (ξ_{I_i}(x_j), ρ_{I_i}(x_j), ϕ_{I_i}(x_j)) (i = 1, 2, …, m; j = 1, 2, …, n), and the aggregated decision matrix has the form

P = [ p_11  p_12  p_13  ⋯  p_1m
      p_21  p_22  p_23  ⋯  p_2m
      p_31  p_32  p_33  ⋯  p_3m
       ⋮     ⋮     ⋮    ⋱   ⋮
      p_n1  p_n2  p_n3  ⋯  p_nm ]

Step 3: Weight of Criteria Using  Entropy g g g In this step, weight vector W = w1 , w2 , . . . , wn is obtained [17, 25] using intun g g wi = 1. The entropy itionistic fuzzy entropy measure [1], where wi ≥ 0 and i=1 values of each IFS are evaluated, and the value in the decision matrix P is expressed into an entropy value Hi j as shown in Eq. (26.6) where Hi j is intuitionistic fuzzy entropy.

26 Intuitionistic Fuzzy Hybrid Multi-criteria Decision …

⎡ H11 H12 H13 H21 H22 H23 H31 H32 H33 ... ... ...

⎢ ⎢ ⎢ P=⎢ ⎢ ⎢ ⎣

Hn1 Hn2 Hn3

355

· · · H1m . . . H2m . . . H3m . . .. .. . . . Hnm

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

(26.6)

The intuitionistic fuzzy entropy values are normalized in the decision matrix using Eq. (26.7) gi j =

Hi j Hi2 Hi1   , ,..., max(Hi1 ) max(Hi2 ) max Hi j

(26.7)

where i = 1, 2,…, l; j =1, 2,…, m h i j defines the normalized value. The decision matrix as obtained after normalization is as follows: ⎤ ⎡ g11 g12 g13 · · · g1m ⎥ ⎢ ⎢ g21 g22 g23 . . . g2m ⎥ ⎥ ⎢ . . . g3m ⎥ P=⎢ ⎥ ⎢ g31 g32 g33 ⎢ ... ... ... . . .. ⎥ .. ⎦ ⎣ gn1 gn2 gn3 . . . glm g

Objective weights w j of criteria x j thus obtained as:   1 × 1 − ej n−T m  where e j = i=1 gi j and Q = nj=1 e j g

wj =

(26.8)

Step 4: Weighted Aggregated IFS Matrix—DM’s Opinion Weighted aggregated intuitionistic fuzzy [19] decision matrix is constructed utilizing the following definition:    P ⊗ W = x, ξ Ii (x) · ξW (x), ρ Ii (x) + ρW (x) − ρ Ii (x) · ρW (x) |x ∈ X and ϕ Ii ·W (x) = 1 − ρ Ii (x) − ρW (x) − ξ Ii (x) · ξW (x) + ρ Ii (x) · ρW (x) (26.9) ⎡ ⎢ ⎢ ⎢ P =⎢ ⎢ ⎢ ⎣



p11 p21 p31 ...





p12 p13 p22 p23 p32 p33 ... ...



pi1 pi2 pi3



· · · p1 j . . . p2 j . . . p3 j . . .. .. . . . pi j

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

356

T. Parveen et al.

(26.10)          where pi j = ξi j , ρi j , ϕi j = ξ Ii W x j , ρ Ii W x j , ϕ Ii W x j Step 5: Intuitionistic Fuzzy Set Deviations Let K 1 be benefit criteria and K 2 be the cost criteria, α + is IFPIS—intuitionistic fuzzy positive ideal solution and α − is IFNIS—intuitionistic fuzzy negative ideal solution:           α + = (ξ A∗ W x j , ρ A∗ W x j and α − = (ξ A− W x j , ρ A− W x j where           min max ξI ∗ W x j = ξ Ii W x j | j ∈ J2 ξ Ii W x j | j ∈ J1 , i i           max min ρI ∗ W x j = ρ Ii W x j | j ∈ J2 ρ Ii W x j | j ∈ J1 , i i           max min ξI − W x j = ξ Ii W x j | j ∈ J2 ξ Ii W x j | j ∈ J1 , i i           min max (26.11) ρI − W x j = ρ Ii W x j | j ∈ J2 ρ Ii W x j | j ∈ J1 , i i Step 6: Separation Measures     The separation measures, θIFS αi , αi+ and θIFS αi , αi− , of each alternative from IFPIS and IFNIS are calculated using the following distance measure [26], n              max ξ Ii W x j − ξ I + W x j , ρ Ii W x j − ρ I + W x j  θIFS αi , αi+ = j=1

  θIFS αi , αi− =

n 

          max ξ Ii W x j − ξ I − W x j , ρ Ii W x j − ρ I − W x j  (26.12)

j=1

Step 7: Closeness Coefficient The relative closeness coefficient with respect to the IFPIS αi+ of an alternative αi is given as follows:   θIFS αi , αi−     i ∗ = θIFS αi , αi+ + θIFS αi , αi−

(26.13)

26 Intuitionistic Fuzzy Hybrid Multi-criteria Decision …

357

where 0 ≤ i ∗ ≤ 1, i = 1, 2, 3, . . . , l Step 8: Alternative’s Rank The alternatives are assigned ranks in decreasing order after determining the i ∗ .

26.3 Case Study: University Selection Based on Criteria The information gaining concerning multiple aspects about colleges is an essential part of student’s decision-making process. The process of collecting information at broad level to serve the concern of university selection tends to further complicate the decision-making process for the student. It is essential to have enough information available for students to help them make a well-informed decision [3, 5]. There are several academic and non-academic factors that play pivotal role in university selection by student, academic criteria such as the reputation of university, faculty, courses offered, quality of program, social criteria like future career options and prospects, social life in university, group of people, information provided by university, influence of family, friend and teachers. Various physical aspects also affect the university selection process such as the geographical location of university, local infrastructure, institute’s infrastructure, campus, foreign language requirement and the facility for especially abled students. Major criteria that are essential in university selection are university’s tuition fee, scholarship offers and living expense in the city or campus. Here, the main objective is to select the university based on important criteria, and the proposed model of IF decision-making process using TOPSIS model provides a platform for decision-makers to analyse the several criteria to finalize the university. To implement the model, ten criteria are chosen that mainly affect the selection process by student, these factors are faculty/research facility, reputation and prestige, program, program quality, location, institute infrastructure, future career aspects and opportunities, tuition fee, living expense, scholarships, and these criteria are key indicator that facilitate the decision-makers to evaluate problem. The proposed study is concerned with academic and non-academic criteria affecting the student’s university preference and to establish a multi-criteria decision model for estimating and choosing the most appropriate university among the four universities, considered as alternatives, TOPSIS is used to obtain the ranking while intuitionistic fuzzy entropy measure is used to obtain the criteria weight. The following criteria as shown in Table 26.1 are considered for the study. Step 1: Decision-Makers Weight Evaluation The significance of decision-makers is determined utilizing Eq. (26.5), the linguistic terms are shown in Table 26.2, and resultant weights are shown in Table 26.3. The idea of using the intuitionistic fuzzy method in decision-making process is to integrate weights of each decision-maker. The capabilities of each decision-maker contribute to the process, and judgements of each decision-maker based on their experiences and capabilities are well ensured in IFS.

358

T. Parveen et al.

Table 26.1 Criteria description X1

Faculty/research facility

X6

Institute infrastructure

X2

Reputation and prestige

X7

Future career aspects and opportunities

X3

Program

X8

Tuition fee

X4

Program quality

X9

Living expense

X5

Location

X 10

Scholarships

Table 26.2 Linguistic terms

Table 26.3 Weight of decision-maker

Linguistics terms

IFNs

Very important

0.9

0.1

Important

0.8

0.15

Medium

0.5

0.45

Unimportant

0.3

0.65

Very unimportant

0.1

0.9

Decision-makers

Linguistics term

Weight (λ)

I

Very important

0.397

II

Medium

0.232

III

Important

0.371

Step 2: Aggregated IFS Matrix—DM’s Opinion Linguistic terms are shown in Table 26.4 with notations, extremely good (L 1 ), very very good (L 2 ), very good (L 3 ), good (L 4 ), medium good (L 5 ), fair (L 6 ), medium bad (L 7 ), bad (L 8 ), very bad (L 9 ) and very very bad (L 10 ) are used to construct the opinion of decision-maker based on all criteria for each alternative expressed in Table 26.5, and using Eq. (26.7) aggregated decision matrix is obtained as shown in Table 26.6. Step 3: Weight of Criteria Using Entropy Decision-makers evaluated the ten criteria, and their opinions were aggregated to estimate the weights of each criterion using Eq. (26.6); the resultant weights are shown in Table 26.7. Step 4: Weighted Aggregated IFS Matrix—DM’s Opinion Using Eq. (26.7), a weighted aggregated intuitionistic fuzzy decision matrix is constructed and Table 26.8 is thus obtained. Step 5: Intuitionistic Fuzzy Set Deviations Now, intuitionistic fuzzy positive ideal solution (IFPIS) and intuitionistic fuzzy negative ideal solution (IFNIS) are evaluated using Eq. (26.9) and are shown in Table 26.9.

0.1

0.05

IFNs

L2

0.85

L1

0.95

Linguistic terms

Table 26.4 Linguistic terms L3 0.15

0.75

L4 0.2

0.7

L5 0.25

0.65

L6 0.35

0.55

L7 0.5

0.4

L8 0.65

0.25

L9 0.75

0.1

L 10 0.9

0.1

26 Intuitionistic Fuzzy Hybrid Multi-criteria Decision … 359

360

T. Parveen et al.

Table 26.5 Linguistic terms Criteria

Alternative

Decision-makers I

X1

X2

X3

X4

X5

X6

X7

X8

X9

II

III

A1

G

VG

G

A2

VVG

VG

VG

A3

MG

G

G

A4

F

MG

MB

A1

MG

G

MG

A2

VG

G

VG

A3

VG

VG

MG

A4

G

MG

G

A1

VG

G

VG

A2

VG

VG

G

A3

VG

VG

G

A4

G

G

MG

A1

F

MG

F

A2

VG

VG

VVG

A3

MG

VG

VG

A4

VG

G

MG

A1

G

VG

G

A2

VVG

G

VG

A3

G

F

F

A4

VG

G

G

A1

MG

VG

G

A2

VG

G

VVG

A3

MG

MG

G

A4

F

MB

G

A1

G

VG

VG

A2

VVG

MG

VG

A3

VG

G

MG

A4

MB

MB

G

A1

G

MG

VG

A2

VVG

VVG

VG

A3

VG

MG

VG

A4

G

VG

VG

A1

VG

MG

VG

A2

VVG

G

VVG

A3

MG

G

G (continued)

26 Intuitionistic Fuzzy Hybrid Multi-criteria Decision …

361

Table 26.5 (continued) Criteria

Alternative

Decision-makers I

X 10

II

III

A4

F

MB

G

A1

G

MG

G

A2

VG

G

VG

A3

MG

MG

MG

A4

MB

MB

MG

Table 26.6 Aggregated IF decision matrix Criteria

A1

A2

A3

A4

X1

(0.712, 0.187)

(0.796, .0128)

(0.681, 0.219)

(0.528, 0.37)

X2

(0.662, 0.237)

(0.739, 0.16)

(0.717, 0.181)

(0.689, 0.211)

X3

(0.739, 0.16)

(0.733, 0.167)

(0.733, 0.167)

(0.682, 0.217)

X4

(0.575, 0.324)

(0.793, 0.129)

(0.714, 0.184)

(0.705, 0.194)

X5

(0.712, 0.187)

(0.787, 0.137)

(0.617, 0.28)

(0.721, 0.178)

X6

(0.694, 0.204)

(0.784, 0.138)

(0.669, 0.23)

(0.586, 0.309)

X7

(0.731, 0.17)

(0.779, 0.155)

(0.705, 0.2)

(0.536, 0.405)

X8

(0.709, 0.194)

(0.819, 0.119)

(0.73, 0.174)

(0.731, 0.17)

X9

(0.73, 0.174)

(0.824, 0.124)

(0.681, 0.22)

(0.586, 0.339)

X 10

(0.689, 0.212)

(0.739, 0.162)

(0.65, 0.25)

(0.509, 0.419)

Table 26.7 Weights of criteria Criteria

X1

X2

X3

X4

X5

Weight

0.0564

0.10053

0.09935

0.0643

0.075

Criteria

X6

X7

X8

X9

X 10

Weight

0.0754

0.05189

0.10057

0.0634

0.0588

Step 6 and 7: Separation Measures and Closeness Coefficient The positive and negative separation measures together with relative closeness coefficient are evaluated, and result is thus shown in Table 26.10 and are evaluated using Eqs. (26.10) and (26.11). Step 8: Alternative’s Rank The alternatives ranked based on i ∗ are A2 > A1 > A3 > A4
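The last two columns of Table 26.10 follow directly from its separation measures. The short check below, added for illustration, recomputes the closeness coefficients from the reported θ_IFS(α_i, α_i^+) and θ_IFS(α_i, α_i^−) values and recovers the ranking A2 > A1 > A3 > A4.

```python
seps = {                     # (d_plus, d_minus) from Table 26.10
    "A1": (0.22334, 0.23263),
    "A2": (0.00355, 0.44184),
    "A3": (0.23187, 0.21879),
    "A4": (0.37167, 0.07735),
}
cc = {a: dm / (dp + dm) for a, (dp, dm) in seps.items()}
for a in sorted(cc, key=cc.get, reverse=True):
    print(a, round(cc[a], 3))
# A2 0.992, A1 0.51, A3 0.485, A4 0.172 -> ranking A2 > A1 > A3 > A4
```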

362

T. Parveen et al.

Table 26.8 Weighted aggregated IF decision matrix Criteria

A1

A2

A3

A4

X1

(0.068, 0.91

(0.086, 0.890)

(0.062, 0.920)

(0.042, 0.950)

X2

(0.103, 0.865)

(0.126, 0.832)

(0.119, 0.842)

(0.111, 0.855)

X3

(0.125, 0.834)

(0.123, 0.837)

(0.123, 0.837)

(0.108, 0.860)

X4

(0.054, 0.930)

(0.097, 0.877)

(0.077, 0.897)

(0.076, 0.900)

X5

(0.089, 0.882)

(0.109, 0.862)

(0.069, 0.910)

(0.091, 0.880)

X6

(0.085, 0.887)

(0.109, 0.861)

(0.080, 0.895)

(0.064, 0.915)

X7

(0.071, 0.905)

(0.082, 0.900)

(0.067, 0.913)

(0.042, 0.950)

X8

(0.067, 0.912)

(0.092, 0.887)

(0.071, 0.906)

(0.072, 0.905)

X9

(0.071, 0.906)

(0.093, 0.888)

(0.062, 0.918)

(0.048, 0.941)

X 10

(0.064, 0.916)

(0.073, 0.902)

(0.057, 0.925)

(0.039, 0.952)

Table 26.9 IFPIS and IFNIS Criteria

α+

α−

Criteria

X1

0.0857

0.0415

X6

0.1091

0.0643

0.89053

0.94547

0.86137

0.91531

0.02372

0.01307

0.02958

0.0204

0.1263

0.1033

0.0816

0.0424

0.8317

0.8653

0.9002

0.9503

0.042

0.031

0.018

0.007

0.125

0.108

0.092

0.067

0.834

0.859

0.887

0.912

0.0415

0.0332

0.0212

0.0211

0.096

0.054

0.8765

0.9301

0.027

0.016

0.109

0.069

0.8616 0.029

X2

X3

X4

X5

X7

X8

X9

X 10

α+

α−

0.093

0.049

0.8889

0.9408

0.018

0.011

0.073

0.039

0.909

0.9024

0.9521

0.0216

0.0246

0.0086

Table 26.10 Separation measures of closeness coefficient     Alternatives θIFS αi , αi+ θIFS αi , αi−

i ∗

Rank

A1

0.22334

0.23263

0.51

2

A2

0.00355

0.44184

0.992

1

A3

0.23187

0.21879

0.485

3

A4

0.37167

0.07735

0.172

4

26 Intuitionistic Fuzzy Hybrid Multi-criteria Decision …

363

26.4 Result and Conclusion Selecting a university which provides best education and fulfils all criteria among many alternatives for specific student is challenging task; with hesitancy and vagueness present in it, it is a multi-criteria decision-making problem among several alternatives involving several academic and non-academic criteria best dealt with intuitionistic fuzzy sets, as otherwise for students it is difficult to decide precisely based on available information. The study is concerned with the criteria that influence the student’s university selection and to establish the multi-criteria model for ranking the universities based on important criteria affecting selection by students. In this paper, an intuitionistic fuzzy entropy-based MCDM is proposed with TOPSIS method to determine the vagueness and exactness of alternatives over the effective criteria and thus to rank the universities based on the academic and non-academic criteria which affect the student’s selection process. The IFS method includes the rating of available alternatives based on the criteria selected, and weight to each criterion is given in linguistic terms and has been characterized by IF numbers. In this further, opinions of decision-makers were aggregated using the IF operator. Based on Euclidean distance, IFPIS and IFNIS are obtained and thus closeness coefficients of alternatives are calculated with which alternatives were ranked. The proposed methodology differs from the established statistical approach for university selection problem. In this paper, a multi-criteria decision-making method is proposed for selecting a university, and comparing university alternatives based on several academic and non-academic criteria, the proposed method models the vagueness and hesitancy in selection of university using intuitionistic fuzzy hybrid MCDM with TOPSIS method where entropy method is used for weighting criteria, which is different from existing methods in the literature which are based on statistical approaches. It can help decision-makers analyse the alternatives and criteria precisely. The TOPSIS method is considered for solving this problem because of its simplicity, rationality, quality, comprehensibility, computational efficiency and its capability to estimate mathematical form of the alternative’s relative performance. Based on the analysis, the university A2 with very good opinion about criteria such as faculty/research facility, future career aspects and opportunities, tuition fee, living expense and location is ranked first, and A4 with below average opinion about future career aspects and opportunities, tuition fee, living expense, faculty/research facility and institute infrastructure is ranked fourth. The results indicate that the four criteria which majorly influence the student’s preference are tuition fee, living expense, future career aspects and opportunities and faculty/research facility; these are the main criteria that affect the university selection by students.

364

T. Parveen et al.

References 1. Conard MJ, Conard MA (2000) An analysis of academic reputation as perceived by consumers of higher education. J Mark High Educ 9(4):69–79 2. Pampaloni AM (2010) The influence of organizational image on college selection: what students seek in institutions of higher education. J Mark High Educ 20(1):19–48 3. Simoes C, Soares AM (2010) Applying to higher education: information sources and choice factors. Stud High Educ 35(4):371–389 4. Briggs S (2006) An exploratory study of the factors influencing undergraduate student choice: the case of higher education in Scotland. Stud High Educ 31(6):705–722 5. Briggs S, Wilson A (2007) Which university? A study of the influence of cost and information factors on Scottish undergraduate choice. J High Educ Policy Manag 29(1):57–72 6. Adams J, Evenland V (2007) Marketing online degree programs: how do traditional residential programs compare? J Mark High Educ 17(1):67–90 7. Soutar G, Turner J (2002) Students’ preferences for university: a conjoint analysis. Int J Educ Manag 16(1):40–45 8. Broekemier GM, Seshadri S (2000) Differences in college choice criteria between deciding students and their parents. J Mark High Educ 9(3):1–13 9. Poock MC, Love PG (2001) Factors influencing the program choice of doctoral students in higher education administration. NASPA J 38(2):203–223 10. Haq AN, Kannan G (2006) Fuzzy analytical hierarchy process for evaluating and selecting a vendor in a supply chain model. Int J Adv Manuf Technol 29:826–835 11. Sojkin B, Bartkowiak P, Skuza A (2011) Determinants of higher education choices and student satisfaction: the case of Poland. High Educ 63(5):565–581 12. Taylor JS, Brites R, Correia F, Farhangmehr M, Ferreira B, Machado ML (2008) Strategic enrolment management: improving student satisfaction and success in Portugal. High Educ Manag Policy 20(1):129–145 13. Warwick J, Mansfield PM (2004) Perceived risk in college selection: differences in evaluative criteria used by students and parents. J Mark High Educ 13(1):101–125 14. Hwang CL, Yoon K (1981) Multiple attribute decision making: methods and applications. Springer, New York 15. Holt GD (1998) Which contractor selection methodology? Int J Project Manage 16(3):153–164 16. Li CC, Fun YP, Hung JS (1997) A new measure for supplier performance evaluation. IIE Trans Oper Eng 29:753–758 17. Chen TY, Li CH (2010) Determining objective weights with intuitionistic fuzzy entropy measures: a comparative analysis. Inform Sci 180(21):4207–4222 18. Bayrak MY, Çelebi N, Taskin H (2007) A fuzzy approach method for supplier selection. Prod Plan Control Manag Oper 18(1):54–63 19. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20:87–96 20. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423 21. Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–356 22. Shu MS, Cheng CH, Chang JR (2006) Using intuitionistic fuzzy sets for fault tree analysis on printed circuit board assembly. Microelectron Reliab 46(12):2139–2148 23. Xu Z, Zhang X (2013) Hesitant fuzzy multi-attribute decision making based on TOPSIS with incomplete weight information. Knowl-Based Syst 52:53–64 24. Xu ZS (2007) Intuitionistic fuzzy aggregation operators. IEEE Trans Fuzzy Syst 15(6):1179– 1187 25. Zeleny M (1976) The attribute-dynamic attribute model. Manage Sci 23:12–26 26. Grzegorzewski P (2004) Distances between intuitionistic fuzzy sets and/or interval-valued fuzzy sets based on the Hausdoff metric. Fuzzy Sets Syst 149(2):319–328

Chapter 27

Dynamic Analysis of Prey–Predator Model with Harvesting Prey Under the Effect of Pollution and Disease in Prey Species Naina Arya, Sumit Kaur Bhatia, Sudipa Chauhan, and Puneet Sharma

27.1 Introduction

The traditional Lotka–Volterra model is familiar to researchers. Toxicants such as sulphur oxides and oxides of carbon enter the environment from various sources, including the incessant use of natural resources; industrial complexes, the burning of household waste and vehicles are a few of the factors emitting hazardous toxicants into the environment. Hence, both the terrestrial and the aquatic environment are constantly stressed by various kinds of emissions which are hazardous to the populations. Our motivation is to consider the impact of pollutants on a prey–predator model and how it affects the harvesting rate. In recent decades, there has been a growing interest in the study of infected prey. An attack on the infected prey by another parasite was considered in [1]. The same model, along with exponential growth in the populations, was studied in [2] by considering more complex models involving multispecies interactions. Likewise, more models were examined by some researchers in [3], and infectious transmission issues were examined in [3–5]. A large number of species in natural ecosystems are being harvested. Normally, an optimal harvesting policy is especially required for such models [6–8]. Biswas et al.

365

366

N. Arya et al.

considered a model with powerless Allee effect and prey populace with harvesting, and along side, found that optimal harvesting policy is reliant on the Allee effect as well as the incubation delay. Chattopadhy et al. [7] researched a reaping display involving prey and predator, with contamination in the prey populace and reasoned that cycle oscillations have been cut off due to harvesting on vulnerable prey. Huang et al. [9] exhibited ‘a beast of prey-victim’ arrangement, composed with steady yield prey harvesting, which was acquired with different bifurcations. For alternate models with optimal harvesting, we allude to [9–13]. In our model, the predator catches the infected prey as well as gets the susceptible prey. Furthermore, the harvesting of environmental assets and populace is a typical conduct. From the viewpoint of the real world, capture cannot be overlooked in the ecosystem. Based on the above, we proposed a model with steady contribution of susceptible prey populace and its impact on harvesting prey populace. The paper has been sorted out as follows. Followed by an Introduction, in Sect. 27.2, we have established a model with infection in harvesting prey species, and the essential properties of model have been discussed. In Sect. 27.3, we focus on boundedness and dynamical behaviour of the model. In the following section, we examined the presence of all equilibria. In Sect. 27.5, by using the relating characteristic equation, we examined the local stability of the positive equilibria of model, and in the next section, we examined local and global stability of model at the infectionfree equilibrium and the positive equilibrium. In the next section, the optimal control issue has been established, by picking effort as the control variable. To help our hypothetical examination, some numerical simulations are given in Sect. 27.7. An overall conclusion has been given in Sect. 27.8.

27.2 Development of Model Here, a general prey–predator model has been reviewed, in which prey is exposed to an illness. Consider the populace density of the aggregate prey populace to be W (t) and the populace density of the predator populace to be P(t). Along these lines, the prey populace can be written as the sum of two subdivided sections, one of which is susceptible populace and another is contaminated populace, defined by M(t) and N(t) state factors, respectively. Predator populace P(t) is expected to feast upon both the contaminated and susceptible prey populace alongside diverse predation rates. Presently, the following presumptions are considered in order to develop our eco-epidemiological model: • The total prey can be seen as the sum of two subclasses: vulnerable prey (N) and susceptible prey (M). Hence, they signify the total mass of the prey as M (t) + N (t). • The illness cannot be transmitted vertically but spreads among the prey masses just by contact. Hence, the rate of the sickness in the prey is bilinear type of βMN.

27 Dynamic Analysis of Prey–Predator Model with Harvesting …

367

• The infective prey along with the susceptible prey is caught. Thus, the odds of predation of the susceptible prey are less in comparison than that of an infective prey. This implies that the demise rate of infectious prey holds mortality in view of illness as well as mortality due to natural causes. • The harvesting of the prey species is done at a linear rate. The above leads to the following model: dM = M(λ − M) − β M N − α1 M P − q1 E M − r1 U M dt

(27.1)

dN = β M N − α2 N P − d2 N − q2 E N − r2 U N dt

(27.2)

dP = eα1 M P + eα2 N P − d3 P dt

(27.3)

dC = Q − αC − δC(M + N ) dt

(27.4)

dU = δC(M + N ) − mU dt

(27.5)

The following initial conditions have been described for the above ordinary differential equations: M(0) = M0 > 0, N (0) = N0 > 0, P(0) = P0 > 0, U (0) = U0 > 0, C(0) = C0 > 0 where λ is the recruitment rate of vulnerable prey, β is the disease transmission coefficient, α1 and α2 are the catch rate of the susceptible prey and infected prey, respectively, d2 and d3 are the passing rate of the contaminated prey and the predator, respectively, the harvesting effort is E, the conversion coefficient of the predator from the prey is e, q1 and q2 are the catchability coefficient of susceptible and contaminated prey, respectively, r1 and r2 are the rates due to which the susceptible prey is diminishing because of toxins, C is the environmental concentration of toxins, α is the natural diminishing rate of the toxins which exist in the environment, U is the toxicant concentration in the population, δ is the uptake rate of toxin by life form, and m is the natural washout out rate of the toxin from the life form.

27.3 Boundedness and Dynamical Behaviour In this section, we will study the boundedness as well as the dynamical behaviour of the model. We have the lemma as follows: Lemma 1 The following region holds all the solutions of the system as t → ∞ 5 B1 = {(M, N , P, C, U ) ∈ R+ : 0 ≤ M + N + P ≤ A1 , 0 ≤ C ≤ A2 , 0 ≤ U ≤ A3 }, where Ai (i = 1, 2, 3) and a j ( j = 1, 2, 3) are given in the proof of the lemma.

368

N. Arya et al.

Proof Consider the following function W (t) = M(t) + N (t) + P(t)

(27.6)

Therefore,  W˙ (t) = Mλ − M 2 − β M N − α1 M P − q1 E M − r1 U M + β M N − α2 N P − d2 N −q2 E N − r2 U N + eα1 M P + eα2 N P − d3 P] Then,  W˙ (t) + μW (t) = Mλ − M 2 + (e − 1)(α1 M + α2 N )P + (μ − q1 E − r1 U )M + (μ − d2 −q2 E − r2 U )N + (μ − d3 )P] If we suppose μ = min(r1 + q1 E, d2 + q2 E, d3 ) and 0 < e < 1, then we get, W˙ (t) + μW (t) ≤ λ2 It means lim supW ≤

t→+∞

λ2 μ

(27.7)

Also, from Eqs. (27.4) and (27.5) we get: dC = Q − αC dt

(27.8)

dU = δ A1 A2 − mU dt

(27.9)

And

Then, by the usual comparison theorem, as t → ∞, we get: Q = A2 α

(27.10)

δ A1 A2 = A3 m

(27.11)

C(t) ≤ And U (t) ≤ This completes the proof.

27 Dynamic Analysis of Prey–Predator Model with Harvesting …

369

27.4 The Equilibrium Points The equilibrium points considered are as follows: E ∗ (M ∗ , N ∗ , P ∗ , C ∗ , U ∗ ), E 1 (M, N , P, 0, 0) and E o (0, 0, 0, C, 0), and local behaviour of equilibrium points E 1 (M, N , P, 0, 0) and E o (0, 0, 0, C, 0) is as follows: • E o (0, 0, 0, Qα , 0) is always feasible. d1 , α1 , −m,0,0 are the eigenvalues, correlated with the equilibrium point. Thus, E o is unstable. • Existence of E 1 (M, N , P, 0, 0), is obtained as follows: From Eqs. (27.1), (27.2) and (27.3), we get (λ − M) − β N − α1 P − q1 E = 0 and β M − α2 P− d2 − q2 E = 0, which implies P = β M−dα22−q2 E . Also, from Eq. (27.3), 1 M+d3 we get, which implies, N = −eαeα 2 Hence, substituting the values of P and N in (λ − M) − β N − α1 P − q1 E = 0, we would get M and hence, the values of N and P can be evaluated, which shows the feasibility of E 1 . Also, for obtaining the local stability of E 1 (M, N , P, 0, 0), we obtain the characteristic equation of the equilibrium point which is given as λ5 + r1 λ4 + r2 λ3 + r 3 λ2 + r 4 λ + r 5 = 0 where r1 = (α + δ(M + N ) + (m + M)) r2 = (α + δ(M + N ))(m + M) + (eα22 P N + eα12 M P + m M + β M N ) r3 = (α + δ(M + N ))(eα22 P N + eα12 M P + m M + β M N ) + (meα22 P N + meα12 M P + mβ M N + eα22 M N P) r4 = (α + δ(M + N ))(mβ M N + eα22 M N P + meα22 P N + meα12 M P) + (meα22 M N P) r5 = (α + δ(M + N ))(meα22 N P) Consequently, every one of the coefficients of characteristic equation is positive. Therefore, E 1 is locally asymptotically stable, by Routh–Hurwitz criteria, if the following conditions hold: (i) ri > 0, for i = 1, 2, 3, 4, 5 (ii) r1r2 r2 > r32 + r12 r4 and (iii) (r1r4 − r5 )(r1r2 r2 − r32 − r12 r4 ) > r5 (r1r2 − r3 )2 + r52 r1 . • Now, we shall exhibit the presence of interior point E ∗ . • The following system of equations will be used to find out the non-trivial positive equilibrium point: M ∗ (λ − M ∗ ) − β M ∗ N ∗ − α1 M ∗ P ∗ − q1 E M ∗ − r1 U M ∗ = 0

(27.12)

370

N. Arya et al.

β M ∗ N ∗ − α2 N ∗ P ∗ − d2 N ∗ − q2 E N ∗ − r2 U ∗ N ∗ = 0 eα1 M ∗ P ∗ + eα2 N ∗ P ∗ − d3 P ∗ = 0

(27.14)

Q − αC ∗ − δC ∗ (M ∗ + N ∗ ) = 0

(27.15)

δC ∗ (M ∗ + N ∗ ) − mU ∗ = 0

(27.16)

Thus, from (27.13), (27.14) and (27.15), we get, M ∗ = ∗

(27.13)



d3 eα1



α2 N ∗ , C∗ α1

=

P ∗ = βαM2 − αd22 − qα2 2E − r2αU2 Substituting these values in Eqs. (27.12) and (27.16) and further simplifying, we get: Q , α+δ(M ∗ +N ∗ )

F(N , U ) = K 1 N 2 + K 2 N + K 3 U + K 4 U N − K 5 = 0

(27.17)

where α2 α2 d3 β , K 2 = −λ + − d2 − q2 E α1 α1 eα1 α2 2d3 α2 α2 d3r2 r1 d3 + + q1 E , K 3 = − α1 eα2 eα1 eα12  α2 d3 d3 q2 E d3 q1 E K 4 = −r2 − r1 , K 5 = − λ + − α1 eα1 eα2 eα1  2  d3 d2 d3 (d3 )2 β + + − eα1 eα2 eα2 α1 K1 = −

Similarly, we get G(N , U ) = l1 N + l2 U + l3 U N − l4

(27.18)

where l1 = (−δ Qα2 e + δ Qα1 e), l2 = (−meα1 α − mδd3 ), l3 = (mδα2 e − mδα1 e), l4 = −(δ Qd3 ) From Eqs. (27.17) and (27.18), we get: √ 2 • F(N , 0) = K 1 N 2 + K 2 N − K 5 = 0, solving we get N = −K 2 ± 2KK 2 +4K 1 = 1 H11 (say) • F(0, U ) = K 3 U − K 5 = 0, ∴ solving we get, U = KK 5 = H12 (say) 3 • G(N , 0) = l1 N − l4 = 0, ∴ solving we get, N = ll4 = H21 (say) 1 • G(0, U ) = l2 U − l4 = 0, ∴ solving we get, U = ll4 = H22 (say). 2 Hence, two isoclines lie across each other in positive-phase space, if any one of the conditions given as follows holds:

27 Dynamic Analysis of Prey–Predator Model with Harvesting …

371

• H11 > H21 and H12 < H22 • H11 < H21 and H12 > H22 The presence of these two isoclines lying across each other in the positive-phase space guarantees the presence of the equilibrium point, and this point is said to be exceptional only if U  (N ∗ ) < 0; therefore, it implies that E ∗ exists in positive-phase plane.

27.5 Local and Global Stability of E*

In this section, the local stability and the global stability of $E^*$ are discussed using the Routh–Hurwitz criteria and Lyapunov's direct method, respectively.

• The characteristic equation corresponding to the interior equilibrium point $E^*$ is

$\lambda^5 + t_1\lambda^4 + t_2\lambda^3 + t_3\lambda^2 + t_4\lambda + t_5 = 0$

where

$t_1 = \alpha + \delta(M^* + N^*) + m + M^*$
$t_2 = (\alpha + \delta(M^* + N^*))(m + M^*) + eP^*(\alpha_1^2 M^* + \alpha_2^2 N^*) + \delta C^*(r_1 M^* + r_1 N^*) + M^*(m + \beta^2) + e\alpha_2^2 N^* P^*$
$t_3 = (\alpha + \delta(M^* + N^*))\big(eP^*(\alpha_1^2 M^* + \alpha_2^2 N^*) + M^*(m + \beta^2) + e\alpha_2^2 N^* P^*\big) + \alpha\delta C^*(r_1 M^* + r_2 N^*) + meP^*(M^*\alpha_1^2 + N^*\alpha_2^2) + \delta C^* N^* M^*(r_1\beta + r_2\beta + \delta C^*) + m\beta^2 M^* N^* + e\alpha_2^2 M^* N^* P^*$
$t_4 = (\alpha + \delta(M^* + N^*))\big[meP^*(M^*\alpha_1^2 + N^*\alpha_2^2) + \delta C^* N^* M^*(r_1\beta + r_2\beta + C^*) + m\beta^2 M^* N^*\big] + \big[e\delta P^* C^* N^* M^*(r_1\alpha_1^2 + r_2\alpha_2^2) - e\alpha_1\alpha_2\delta P^* C^* N^* M^*(r_1 + r_2) + e\alpha_2^2 M^* N^* P^*\big] + \delta(M^* + N^*)\big[\delta\beta C^* N^* M^*(r_2 - r_1) + me\alpha_2^2 M^* N^* P^*\big]$
$t_5 = e\alpha\delta P^* C^* N^* M^*(r_1\alpha_1^2 + r_2\alpha_2^2) - e\alpha\alpha_1\alpha_2\delta P^* C^* N^* M^*(r_1 + r_2) + me\alpha\alpha_2^2 M^* N^* P^* + \delta(M^* + N^*)(me\alpha_2^2 M^* N^* P^*)$

Thus, by the Routh–Hurwitz criteria, $E^*$ is locally asymptotically stable if the following conditions hold: (i) $t_i > 0$ for $i = 1, 2, 3, 4, 5$; (ii) $t_1 t_2 t_3 > t_3^2 + t_1^2 t_4$; and (iii) $(t_1 t_4 - t_5)(t_1 t_2 t_3 - t_3^2 - t_1^2 t_4) > t_5(t_1 t_2 - t_3)^2 + t_5^2 t_1$.
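Conditions (i)–(iii) are straightforward to verify numerically once the coefficients $t_1,\dots,t_5$ (or $r_1,\dots,r_5$) have been evaluated for a given parameter set. The following minimal sketch is not part of the chapter; the placeholder coefficients come from the stable test polynomial $(\lambda + 1)^5$, not from the model.

```python
import numpy as np

def quintic_routh_hurwitz(c):
    """Check conditions (i)-(iii) for lambda^5 + c1 l^4 + c2 l^3 + c3 l^2 + c4 l + c5 = 0."""
    c1, c2, c3, c4, c5 = c
    cond_i = all(v > 0 for v in c)
    cond_ii = c1 * c2 * c3 > c3**2 + c1**2 * c4
    cond_iii = (c1 * c4 - c5) * (c1 * c2 * c3 - c3**2 - c1**2 * c4) \
               > c5 * (c1 * c2 - c3)**2 + c5**2 * c1
    return cond_i and cond_ii and cond_iii

# Placeholder coefficients: (lambda + 1)^5 = l^5 + 5l^4 + 10l^3 + 10l^2 + 5l + 1,
# whose roots all equal -1, so both checks below should agree on stability.
coeffs = (5.0, 10.0, 10.0, 5.0, 1.0)
roots = np.roots((1.0,) + coeffs)
print(quintic_routh_hurwitz(coeffs), bool(np.all(roots.real < 0)))
```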


Theorem 2 $E^*$ is globally asymptotically stable if $\beta < \beta^*$, $\alpha_1 < \alpha_1^*$, $\alpha_2 < \alpha_2^*$, $\delta < \delta^*$, $r_1 < r_1^*$, $r_2 < r_2^*$, $Q < Q^*$ and $m > m^*$, where the values of $\beta^*, \alpha_1^*, \alpha_2^*, \delta^*, r_1^*, r_2^*, Q^*, m^*$ are given in the proof.

Proof Consider the following positive definite function:

$V(M, N, P, C, U) = \frac{1}{2}\left(M - M^* - M^*\log\frac{M}{M^*}\right)^2 + \frac{1}{2}(N - N^*)^2 + \frac{1}{2}(P - P^*)^2 + \frac{1}{2}(C - C^*)^2 + \frac{1}{2}(U - U^*)^2$

Its time derivative is

$\dot V(t) = z_1\dot M + z_2\dot N + z_3\dot P + z_4\dot C + z_5\dot U$  (27.19)

where $z_1 = (M - M^*)$, $z_2 = (N - N^*)$, $z_3 = (P - P^*)$, $z_4 = (C - C^*)$, $z_5 = (U - U^*)$. Therefore, using Eqs. (27.1) to (27.5) and Eq. (27.19), we get:

$\dot V(t) = z_1\big((\lambda - M) - \beta N - \alpha_1 P - q_1 E - r_1 U\big) + z_2\big(\beta M N - \alpha_2 N P - d_2 N - q_2 E N - r_2 U N\big) + z_3\big(e\alpha_1 M P + e\alpha_2 N P - d_3 P\big) + z_4\big(Q - \alpha C - \delta C(M + N)\big) + z_5\big(\delta C(M + N) - mU\big)$

From the above equation, we get:

$\dot V(t) = -\{a_{11}z_1^2 + a_{12}z_1 z_2 + a_{13}z_1 z_3 + a_{15}z_1 z_5 + a_{22}z_2^2 + a_{23}z_2 z_3 + a_{25}z_2 z_5 + a_{33}z_3^2 + a_{44}z_4^2 + a_{14}z_1 z_4 + a_{24}z_2 z_4 + a_{55}z_5^2 + a_{45}z_4 z_5\}$

where

$a_{11} = 1$, $a_{12} = \beta - \beta N^*$, $a_{13} = \alpha_1 - eP^*\alpha_1$, $a_{14} = \delta C^*$, $a_{15} = r_1 - \delta C^*$,
$a_{22} = -\beta M + \alpha_2 P + d_2 + q_2 E + r_2 U$, $a_{23} = \alpha_2(N^* - eP^*)$, $a_{24} = \delta C^*$, $a_{25} = r_2 N^* - \delta C^*$,
$a_{33} = -e\alpha_1 M - e\alpha_2 N + d_3$, $a_{44} = \alpha + \delta(M + N)$, $a_{45} = -\delta(M + N)$, $a_{55} = m$

Then, by Sylvester's criteria, the following conditions hold:

$4a_{12}^2 < a_{11}a_{22}$  (27.20)

$2a_{13}^2 < a_{11}a_{33}$  (27.21)


$3a_{14}^2 < a_{11}a_{44}$  (27.22)

$3a_{15}^2 < a_{11}a_{55}$  (27.23)

$2a_{23}^2 < a_{22}a_{33}$  (27.24)

$3a_{24}^2 < a_{22}a_{44}$  (27.25)

$3a_{25}^2 < a_{22}a_{55}$  (27.26)

$9a_{45}^2 < 4a_{44}a_{55}$  (27.27)

Then, $\dot V(t)$ is a negative definite function, and therefore, by Lyapunov's direct method, $E^*$ is globally asymptotically stable. Bounds for $\beta, \alpha_1, \alpha_2, \delta, r_1, r_2, Q$ and $m$ can now be obtained from the above conditions. From Eq. (27.20), we get $4\beta^2(1 - N^*)^2 - (\alpha_2 P_{\max} + r_2 U_{\max} + d_2 + q_2 E) < 0$. Rearranging it, we get

$f_1(\beta) = \pi_{11}\beta^2 - \pi_{13} < 0$  (27.28)

where $\pi_{11} = 4(1 - N^*)^2$, $\pi_{13} = (\alpha_2 P_{\max} + r_2 U_{\max} + d_2 + q_2 E)$. Then, there exists a positive root of $f_1(\beta) = 0$ for some $\beta = \beta^*$, and the inequality clearly holds when $\beta < \beta^*$. Now, from Eq. (27.21), we get $2\alpha_1^2(1 - eP^*)^2 < d_3$. Rearranging it, we get

$f_2(\alpha_1) = \pi_{21}\alpha_1^2 - \pi_{23} < 0$  (27.29)

where $\pi_{21} = 2(1 - eP^*)^2$, $\pi_{23} = d_3$. Then, there exists a positive root of $f_2(\alpha_1) = 0$ for some $\alpha_1 = \alpha_1^*$, and the inequality clearly holds when $\alpha_1 < \alpha_1^*$. Now, from Eq. (27.22), we get $3\delta^2 C^{*2} < \alpha + \delta(M_{\max} + N_{\max})$. Rearranging it, we get

$f_3(\delta) = \pi_{31}\delta^2 - \pi_{32}\delta - \pi_{33} < 0$  (27.30)

where $\pi_{31} = 3C^{*2}$, $\pi_{32} = (M_{\max} + N_{\max})$, $\pi_{33} = \alpha$. Then, by the theory of equations, there exists a positive root of $f_3(\delta) = 0$ for some $\delta = \delta^*$, and the inequality clearly holds when $\delta < \delta^*$. Now, from Eq. (27.23), we get $3(r_1 - \delta C^*)^2 < m$. Rearranging it, the inequality holds when $r_1 < r_1^*$, where

$r_1^* = \sqrt{\dfrac{m}{3}} + \delta C^*$  (27.31)

Now, from Eq. (27.24), we get $2\alpha_2^2(N^* - eP^*)^2 < (\alpha_2 P_{\max} + d_2 + q_2 E + r_2 U_{\max})\,d_3$. Rearranging it, we get

$f_4(\alpha_2) = \pi_{41}\alpha_2^2 - \pi_{42}\alpha_2 - \pi_{43} < 0$  (27.32)

where $\pi_{41} = 2(N^* - eP^*)^2$, $\pi_{42} = d_3 P_{\max}$, $\pi_{43} = (d_3 d_2 + d_3 q_2 E + d_3 r_2 U_{\max})$. Then, by the theory of equations, there exists a positive root of $f_4(\alpha_2) = 0$ for some $\alpha_2 = \alpha_2^*$, and the inequality clearly holds when $\alpha_2 < \alpha_2^*$. Now, from Eq. (27.25), we get $3\delta^2 C^{*2} < (\alpha_2 P_{\max} + d_2 + q_2 E + r_2 U_{\max})(\alpha + \delta(M_{\max} + N_{\max}))$. Rearranging it, the inequality holds when $Q < Q^*$, where

$Q^* = \left(\dfrac{\alpha}{\delta} + M^* + N^*\right)\sqrt{\dfrac{(\alpha_2 P_{\max} + d_2 + q_2 E + r_2 U_{\max})(\alpha + \delta(M_{\max} + N_{\max}))}{3}}$  (27.33)

Now, from Eq. (27.26), we get $3(r_2 N^* - \delta C^*)^2 < (\alpha_2 P_{\max} + d_2 + q_2 E + r_2 U_{\max})\,m$. Rearranging it, the inequality holds when $r_2 < r_2^*$, where

$r_2^* = \dfrac{\sqrt{\dfrac{m(\alpha_2 P_{\max} + d_2 + q_2 E + r_2 U_{\max})}{3}} + \delta C^*}{N^*}$  (27.34)

Now, from Eq. (27.27), we get $9\delta^2 < 4(\alpha + \delta(M_{\max} + N_{\max}))\,m$. Rearranging it, the inequality holds when $m > m^*$, where

$m^* = \dfrac{9\delta^2}{4\alpha + 4\delta(M_{\max} + N_{\max})}$  (27.35)

Hence, $E^*$ is globally asymptotically stable if $\beta < \beta^*$, $\alpha_1 < \alpha_1^*$, $\alpha_2 < \alpha_2^*$, $\delta < \delta^*$, $r_1 < r_1^*$, $r_2 < r_2^*$, $Q < Q^*$ and $m > m^*$. This proves the theorem.


27.6 Optimal Harvesting

This section presents the optimal harvesting policy for the proposed model. Taking into account the benefit earned from harvesting, and assuming a quadratic harvesting cost, we explore the optimal harvesting policy; the quadratic cost is used primarily because it allows an analytical expression for the optimal harvest to be derived. It is also assumed that the cost function decreases as the prey population increases. Hence, we consider

$J = \int_0^{t_f} e^{-\delta t}\big[(p_1 - v_1 q_1 E M)q_1 E M + (p_2 - v_2 q_2 E N)q_2 E N - cE\big]\,dt$  (27.41)

The maximization of Eq. (27.41) is subject to the control constraint $0 \le E \le E_{\max}$ and to the system. Our goal is to locate an optimal control $E^*$ such that $J(E^*) = \max_{E\in U} J(E)$.

Here, $c$ is the harvesting cost per unit effort, $v_1$ and $v_2$ are economic constants, $p_1$ is the constant price per unit biomass of the susceptible prey population, $p_2$ is the constant price per unit biomass of the infected prey population, $E_{\max}$ denotes the maximum harvesting effort capacity, $U$ is the control set defined as $U = \{E : [t_0, t_f] \to [0, E_{\max}] \mid E \text{ is Lebesgue measurable}\}$, and $\delta$ is the annual instantaneous discount rate.

Theorem 3 For $t \in [0, t_f]$, there exists an optimal control $E^*$ such that $J(M(t), N(t), P(t), E^*) = \max_{E\in U} J(M(t), N(t), P(t), E)$, subject to the control system with the initial conditions.

Proof The following conditions need to be checked in order to prove the existence of the optimal control:

• The control and state variables are positive.
• The control set $U$ is convex and closed.
• The right-hand side of the state system is bounded by a linear function of the state and control variables.
• The integrand of the objective functional is concave on $U$.
• The integrand in Eq. (27.41) of the objective functional satisfies the required bound because of the existence of the constants.

The result of Fleming and Rishel has been used to verify the above conditions. Since the solutions are bounded and, by definition, the set of admissible controls $E(t) \in U$ is convex and closed, the existence of the optimal control follows from compactness, as the optimal system is bounded. In Eq. (27.41), the integrand given as


$e^{-\delta t}\big[(p_1 - v_1 q_1 E M)q_1 E M + (p_2 - v_2 q_2 E N)q_2 E N - cE\big]$ is convex on the control set and can be rewritten as $e^{-\delta t}\big[(v_1 q_1 E M - p_1)q_1 E M + (v_2 q_2 E N - p_2)q_2 E N - cE\big]$. Hence, it can be seen that there exist a constant $\gamma > 0$ and positive numbers $w_1$ and $w_2$ such that $J(E) \ge w_1|E|^{\gamma/2} + w_2$, which proves that the optimal control exists.

27.7 Numerical Example and Graphical Simulation

We consider the following hypothetical and biologically feasible parameter values for the model: $\lambda = 50$, $\beta = 0.01$, $\alpha_1 = 0.01$, $\alpha_2 = 0.02$, $\alpha = 0.2$, $d_2 = 0.1$, $d_3 = 1$, $q_1 = 0.037$, $q_2 = 0.01$, $\delta = 0.001$, $r_1 = 0.001$, $r_2 = 0.01$, $E = 0.5$, $Q = 10$, $e = 0.9$ and $m = 0.3$. Then, $B_1 = \{(M, N, P, C, U) \in R_+^5 : 0 \le M + N + P \le 128205.15,\ 0 \le C \le 50\}$ is the region of attraction for the model, and $E^* = (49.5271, 30.7940, 14.7361, 35.6741, 9.5505)$ is the corresponding interior equilibrium point. Here, $\beta^* = 0.90134$, $\alpha_1^* = 0.05766$, $\alpha_2^* = 208.565$, $\delta^* = 100.739$, $r_1^* = 0.3519$, $r_2^* = 0.55271$, $Q^* = 139355$, $m^* = 8.7681\times 10^{-9}$. Under the above parametric values, Figs. 27.1 and 27.2 show that the system approaches the interior equilibrium point starting from different initial conditions, thus showing the global asymptotic stability of the interior equilibrium point.
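The time histories shown in Figs. 27.1–27.6 were produced with MATLAB; a minimal equivalent sketch in Python is given below for orientation only. The right-hand sides are written from the equilibrium conditions (27.12)–(27.16), which is an assumption about the full system (27.1)–(27.5) since those equations are not restated in this part of the chapter.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameter values quoted in Sect. 27.7.
p = dict(lam=50, beta=0.01, a1=0.01, a2=0.02, alpha=0.2, d2=0.1, d3=1,
         q1=0.037, q2=0.01, delta=0.001, r1=0.001, r2=0.01, E=0.5,
         Q=10, e=0.9, m=0.3)

def rhs(t, y):
    # State ordering (M, N, P, C, U); assumed form of the model equations.
    M, N, P, C, U = y
    dM = M * (p['lam'] - M) - p['beta'] * M * N - p['a1'] * M * P \
         - p['q1'] * p['E'] * M - p['r1'] * U * M
    dN = p['beta'] * M * N - p['a2'] * N * P - p['d2'] * N \
         - p['q2'] * p['E'] * N - p['r2'] * U * N
    dP = p['e'] * p['a1'] * M * P + p['e'] * p['a2'] * N * P - p['d3'] * P
    dC = p['Q'] - p['alpha'] * C - p['delta'] * C * (M + N)
    dU = p['delta'] * C * (M + N) - p['m'] * U
    return [dM, dN, dP, dC, dU]

sol = solve_ivp(rhs, (0, 1000), [1, 1, 1, 1, 1], rtol=1e-8, atol=1e-8)
# The chapter reports convergence towards E* = (49.53, 30.79, 14.74, 35.67, 9.55).
print(sol.y[:, -1])
```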


Fig. 27.1 Trajectories of the system described in the model, with the parametric values λ = 50, β = 0.01, α1 = 0.01, α2 = 0.02, α = 0.2, d2 = 0.1, d3 = 1, q1 = 0.037, q2 = 0.01, δ = 0.001 r1 = 0.001, r2 = 0.01, E = 0.5, Q = 10, e = 0.9, and m = 0.3 and initial conditions as [1,1,1,1,1]



Fig. 27.2 Trajectories of the system described in the model, with the parametric values λ = 50, β = 0.01, α1 = 0.01, α2 = 0.02, α = 0.2, d2 = 0.1, d3 = 1, q1 = 0.037, q2 = 0.01, δ = 0.001 r1 = 0.001, r2 = 0.01, E = 0.5, Q = 10, e = 0.9 and m = 0.3 and initial conditions as [10,10,10,10,10]

In both cases, the trajectories converge to the common interior equilibrium point $E^* = (49.5271, 30.7940, 14.7361, 35.6741, 9.5505)$. From Fig. 27.3, we can see that the system in the absence of pollution is unstable, whereas from Fig. 27.2, it can easily be seen that the presence of toxicants has a stabilizing effect. At $Q \le 41$, as shown in Fig. 27.4, the system approaches asymptotically the point $E^* = (49.6379, 30.6179, 0.0001, 146.2958, 39.1362)$, which shows the existence of the predator population, whereas for $41 < Q < 61$, as shown in Fig. 27.5, the system approaches asymptotically the point


Fig. 27.3 Trajectories of the system described in the model, with the parametric values λ = 50, β = 0.01, α1 = 0.01, α2 = 0.02, d2 = 0.1, d3 = 1, q1 = 0.037, q2 = 0.01, r1 = 0.001, r2 = 0.01, E = 0.5, Q = 10, e = 0.9.



Fig. 27.4 Trajectories of the system described in the model, with the parametric values λ = 50, β = 0.01, α1 = 0.01, α2 = 0.02, α = 0.2, d2 = 0.1, d3 = 1, q1 = 0.037, E = 0.5, Q = 41, e = 0.9, q2 = 0.01, δ = 0.001, r1 = 0.001, r2 = 0.01 and m = 0.3


Fig. 27.5 Trajectories of the system described in the model, with the parametric values λ = 50, β = 0.01, α1 = 0.01, α2 = 0.02, α = 0.2, d2 = 0.1, d3 = 1, q1 = 0.037, q2 = 0.01, δ = 0.001, r1 = 0.001, r2 = 0.01, E = 0.5, Q = 41, e = 0.9 and m = 0.3

$E^* = (49.6880, 28.0115, 0.0000, 151.2566, 39.1622)$, which shows that the increase in pollution leads to the extinction of the predator population. Now, considering $Q = 61$, as shown in Fig. 27.6, the system approaches asymptotically $E^* = (49.9656, 0.0000, 0.0000, 244.0578, 40.6281)$, showing that a further increase in pollution leads to the extinction of the infected prey population. Increasing the pollution further, beyond a certain level, would lead to the extinction of both the prey and predator populations together. Thus, it can be concluded that pollution not only has a stabilizing effect on the system but also leads to an increase in the prey population, showing



Fig. 27.6 Trajectories of the system described in the model, with the parametric values λ = 50, β = 0.01, α1 = 0.01, α2 = 0.02, α = 0.2, d2 = 0.1, d3 = 1, q1 = 0.037, q2 = 0.01, δ = 0.001 r1 = 0.001, r2 = 0.01, E = 0.5, Q = 61, e = 0.9 and m = 0.3

that an increase in pollution up to a certain level will lead to the survival of the healthy prey population.

27.8 Conclusion

A predator–prey model has been proposed in order to review and analyse the effect of toxicants on a system with harvesting of prey subject to disease. It has been shown that the model is bounded. The equilibrium points are locally as well as globally stable, depending upon the system parameters. The existence of optimality has been shown using Pontryagin's maximum principle. Numerical verification of the analytical results has been carried out with the help of suitable examples, and the necessary graphs have been plotted using MATLAB. With the help of the numerical examples, it was observed that, because of extreme effects of toxicants, the equilibrium levels of the model decrease. It could also be seen that an increase in pollution leads to the survival of the healthy prey population.


Chapter 28

Univariate and Multivariate Process Capability Indices—Measures of Process Performance—A Case Study Vivek Tyagi and Lalit Kumar

28.1 Introduction

Process capability analysis is a part of statistical process control (SPC), which consists of statistical techniques, derived from the normal curve and various control charts, combined with sound analytical judgment to demystify and analyze the production process. In a production operation, variability is inherited by the product during the various stages of manufacturing. Quantifying this variability, with the objective of mitigating defects in the manufacturing process, is the prime goal of quality management. Process capability refers to the assessment of how precisely a process conforms to its specification limits, or the competence of the process to manufacture parts that conform to quality control limits. A genuine state of optimum statistical control of the process can be sustained, and the steadiness of the process ensured, with the help of capability indices. The manufacturing process should be under statistical control before the capability indices are assessed, i.e., only chance causes must affect the process, and it must also be ensured that the process data and observations are independent and identically distributed.
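The univariate indices formalized later in Sect. 28.2.1 (Eqs. (1), (2), (6) and (7)) can be computed directly from sample data. The following minimal sketch is illustrative only and is not part of the case study; the measurements are simulated, loosely based on the GSM characteristic reported in the chapter, and the specification limits and target are assumed for illustration.

```python
import numpy as np

def univariate_indices(x, lsl, usl, target):
    """Cp, Cpk, Cpm and Cpmk for one quality characteristic."""
    mu, sigma = np.mean(x), np.std(x, ddof=1)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    cpm = cp / np.sqrt(1 + ((mu - target) / sigma) ** 2)
    denom = 3 * np.sqrt(sigma**2 + (mu - target) ** 2)
    cpmk = min((usl - mu) / denom, (mu - lsl) / denom)
    return cp, cpk, cpm, cpmk

# Simulated GSM-like measurements (illustrative, not the shop-floor data).
rng = np.random.default_rng(1)
gsm = rng.normal(143.8, 4.9, 500)
print([round(v, 2) for v in univariate_indices(gsm, lsl=130, usl=150, target=143.75)])
```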

28.2 Literature Review

In a complex multistage production process, where there are many variables to consider, process capability becomes complex to comprehend and calculate. Process capability can be analyzed




using univariate capability indices. Sometimes, however, the variables are correlated, and the alternative concept of multivariate capability indices can be utilized to analyze such correlated process variables. It has been suggested that MPCIs be divided into four distinct groups [4]:

(a) Based on the ratio of a tolerance region with respect to the process region
(b) Based on the probability of non-conforming product
(c) Based on principal component analysis (PCA)
(d) Others.

28.2.1 Univariate Capability Indices The process capability index C p which determines whether a process is capable or not by calculating a unit-less value was introduced [2]. The index is expressed by: Cp = (USL − LSL)/6σ

(1)

Here, in C p , p stands for process. σ depicts standard deviation of the process. USL and LSL represent upper specification limit and lower specification limit, respectively. For complying with the specifications, the process must have values of C p more than 1.33. The process, having C p value below 1.33, requires close monitoring to be within the controlling regime. But as and when the C p value drowns below 1, the process is unable to meet with specifications. When C p is equal to 1 and the process is lying centered within the control limits and is following to the normal distribution, it leads to a fraction non-conformance of 0.27%. It is also known as process potential. The limitation of C p is not measuring the process performance in terms of the target value. To consider the process location, C pk was developed [2]. The magnitude of process decentralization can be quantified by measuring C pk relative to C p , assuming process output “normal.” However, this index does not consider whether the process location (μ) diverges from the target (T ) or not. For processes only with the LSL, the C pk is given by:  Cpk = min

USL − μ μ − LSL , 3σ 3σ

Cpl = And for processes with only USL,

μ − LSL 3σ

 (2) (3)

28 Univariate and Multivariate Process Capability Indices …

Cpu =

383

USL − μ 3σ

(4)

Hence,   Cp = Cpl + Cpu /2

(5)

When the process is absolutely centered within the controlling limits, then C p = C pk . Since C p and C pk indices do not consider the contrast between the process mean and its target value. Ideally, a C pk nearer to 2 indicates that a process is good enough in capability aspect. If the value of C pk is more than 1.33, it can be the inference that a process is capable, and if it is less than 1.33, it signifies that either there is more variation as compared to control limits or that the location of the variation is not in the center of the conforming limits. It may be a combination of both width and location. C pm index was flourished which counters the complication of difference between the process mean and its target value under the assumption that target is in the middle of the specification limits [1]. Cpm = 

Cp   2 1 + μ−T σ

(6)

Here, T represents the target value of a quality characteristic. C pmk was developed which is a combination of C pk and C pm [3]. Cpmk = min

USL − μ

μ − LSL

,

3 σ 2 + (μ − T )2 3 σ 2 + (μ − T )2

(7)

C pmk decreases or increases more rapidly than other indices (C p , C pk , C pm ) when μ gets departed from T or approaches to T.

28.2.2 Multivariate Capability Indices A multivariate capability index, MC pm, was suggested [5]. It is defined as the ratio of two volumes, modified tolerance region (R1 ) and scaled 99.73% process region (R2 ). Both regions are elliptical. Their shapes are of ellipses when the regions are in two dimensions, and they are ellipsoids, in more than two dimensions. The modified tolerance region, R1 , is the largest ellipsoid and entirely lies within the undeniable tolerance region with its major axes parallel to the sides of the rectangular tolerance region. It is expressed as MCpm =

Cp 1 Vol(R1 ) = × Vol (R2 ) D D

(8)

384

V. Tyagi and L. Kumar

where Cp =

Vol(Modified Tolerance Region) Vol(R1 )    = Vol(R2 ) π · χ 2 v, 0.9973 v/2|S|1/2  v2 + 1 − 1

(9)

where ‘v’ is the number of quality characteristics. ‘S’ represents variance–covariance matrix. ‘D’ is the correcting factor when process mean is diverged from the target value and is calculated as  n D = 1+ (10) (−T ) S − 1(X − T ) n+1 The multivariate capability vector (C p M, PV, LI) was developed [6] and established on the genuine work of Hubele, Shahriari and Cheng (1991). The vector consists of three components. C p M, first component of the vector, is defined as the ratio of volumes of “engineering tolerance region” and “modified process region,” stipulated as the smallest region alike in shape of the engineering tolerance region, confined about a specified probability contour. It is defined as,  Vol. of engineering tolerance region 1/v Vol. of modified process region   v (USLi − LSLi ) Cp M = i=1 1/v v i=1 (UPLi − LPLi ) 

Cp M =

(11)

where USLi is the upper specification limit for the ith quality characteristic, LSLi is the lower specification limit for the ith quality characteristic, v is the number of quality characteristics. 

  χ 2 (v, α)det i−1 UPLi = μi + det(−1 )    χ 2 (v, α) det i−1   LPLi = μi − det  −1

(12)

(13)

  where  −1 is the inverse of sample variance–covariance matrix. det i−1 is the determinant of a matrix derived from  −1 , omitting ith row and column. The second component PV capability vector measures the closeness of the process mean to the target. The third component of the capability vector LI indicates whether modified process area lies outside the engineering tolerance region or not. LI equals to 1 depicts that the modified process area lands entirely within the engineering tolerance region, else the value of LI is equal to zero.

28 Univariate and Multivariate Process Capability Indices …

385

Another multivariate capability index [7] was expressed as NMCpm =

|A∗ |  1/2  

(14)

where the components of A* are stated by  Ai∗j

= ρi j

USLi − LSLi

2 χ 2 v, 0.9973



USL j − LSL j

2 χ 2 v, 0.9973



where ρ ij represents correlation coefficient between ith and jth univariate attribute. The index NMC pm in (14) is estimated by 

NMCpm =



 |A∗ | 1/2 |S|

where ‘S’ denotes the sample variance–covariance matrix.

28.3 Paswara Papers Ltd., Meerut: A Case 28.3.1 Collection and Storage The unused or scrap papers from home, schools or office are being collected through various vendors and being stored into warehouses until needed. Clean paper can be recovered with proper recycling. There should not be contaminants like plastic, wax, metal and other waste material, which makes recycling an arduous process. Finally, forklifts carry the paper from warehouse to big conveyors.

28.3.2 Pulping and Screening The conveyor, then, forwards the paper to a big vat called pulper, which accommodates water and chemicals. The recovered paper is being cut by the pulper into minor pieces. The heating process transforms the paper more rapidly into tiny strands of cellulose (organic plant material) called fibers. Eventually, the processed paper converts to a mushy mixture called pulp. The pulp is then processed through screens consisting holes and slots of different shapes and sizes. The screens terminate trivial contaminants such as bits of plastic and globs of glue. This process is called screening. At this stage, the consistency of the pulp remains 4.5–5 gm/100 ml, and Schopper-Riegler degree (°SR) ranges from 23 to 26.

386

V. Tyagi and L. Kumar

28.3.3 Cleaning Pulp is also being cleaned by spinning it around in large cone-shaped cylinders. Ponderous contaminants like staples are propelled out of the cone and fallen through the bottom of the cylinder, whereas the lighter contaminants get stored in the center of the cone and are separated. This process is called cleaning.

28.3.4 Deinking At some occasions, the pulp undergoes a “pulp laundering” operation called deinking (de-inking) to eliminate printing ink and “stickies” (sticky materials like glue residue and adhesives). Tiny splinters of ink are drenched from the pulp with water in a process called washing. Heavy particles and stickies are separated with air bubbles in another process called floatation. Pulp is being sent to a large floatation cell (vat) where air and soap-like chemicals, called surfactants, are administered into the pulp. The surfactants convict ink and stickies to loosen from the pulp and stick to the air bubbles as they float to the top of the mixture. The inky air bubbles create foam or froth which is removed from the top, leaving the clean pulp behind.

28.3.5 Refining During refining, the pulp is battered to make the recycled fibers expand, making them perfect for papermaking. If the pulp consists of any heavy bundles of fibers, refining segregates them into individual fibers. If color is there in the recovered paper, color stripping chemicals detach the dyes from the paper. At this stage, the consistency of the pulp remains 3.5–4 gm/100 ml, and Schopper-Riegler degree (°SR) ranges from 27 to 30. Now, the pulp gets muddled with water and chemicals to make it 99.5% water. This watery pulp mixture infiltrates the head-box, a giant metal box at the beginning of the paper machine, and then is scattered in a continuous wide jet (with the help of molds) onto a huge flat wire screen which is moving very rapidly through the paper machine. On the screen, water begins draining from the pulp, and the recycled fibers swiftly start to bond to construct a watery sheet.

28 Univariate and Multivariate Process Capability Indices …

387

28.3.6 Pressing Successively, pulpy (watery) sheet is pressed through a series of felt-covered press rollers which squeeze out excess water since at this stage, it contains 72% of moisture. After three successive pressings, watery sheet remains with 49% of moisture.

28.3.7 Drying (Steam) The sheet, which now resembles paper, proceeds through a series of heated metal rollers (steam inside) which wilt the paper. At this stage, sheet remains with 11% of moisture. Now, the sheet is mixed with starch powder and alum to maintain the pH level and to provide the surface size. At this stage, it consists 32% of moisture. Thereafter, it again passes through the steam dryers which dry the sheet up to with 6.5–7.5% of moisture.

28.3.8 Paper Reel (Mother Roll) At final stage, the finished paper is grazed into a giant roll and separated from the paper machine. The roll of paper is then cut into smaller rolls or sometimes into sheets before being transferred to a converting plant where it will be printed or made into products such as envelopes, paper bags, or boxes (Fig. 28.1).

Pulping

Screening

Refining

Size Press

Drying (Steam)

Pressing

Drying (Steam)

Paper Reel

Fig. 28.1 Production stages of kraft paper at Paswara Papers Ltd

388

V. Tyagi and L. Kumar

28.4 Process Variables Under Study 28.4.1 GSM It refers to a measurement of paper density in grams per square meter. The higher the GSM number, the heavier the paper.

28.4.2 BF Bursting factor of a material can be calculated by determining the bursting strength of the papers and related materials. The bursting strength of a paper material explains the amount of pressure that a material can tolerate before it ruptures.

28.4.3 Cobb It is referred as the ability of a hygroscopic material to repel the perforation of water. Specifically, this test quantifies the water that can be soaked up by the surface of paper or board in a given time. Water absorbency is quoted in g/m2 .

28.4.4 RCT (Cd) The ring crush test (RCT) is applied to ascertain the ring crush resistance of a paper strip formed into a ring.

28.4.5 Bulk Bulk depicts the specific volume of a material. It is the inverse of density. Bulk = 1/density (cm2 /g). In paper production, bulk is a more commonly used measure than density to determine the “compactness” of paper.

28 Univariate and Multivariate Process Capability Indices …

389

28.4.6 Caliper Thickness This method employs a measuring tool called a micrometer to determine the thickness of paper in points (or pts). One point equals 1/1000 of an inch (or 0.001 ). Caliper measurement is most generally used to represent card stock thickness.

28.4.7 Ply Bond Ply bond testing is used to determine the internal strength of paper and board material.

28.4.8 Paper Ash The amount of filler is considered as paper ash. Paper consists of organic cellulose fibers combined with inorganic fillers such as clay, titanium dioxide or calcium carbonate, which are added during papermaking process to increase paper properties such as brightness, whiteness or opacity.

28.4.9 Moisture The moisture content of paper and paperboard is the quantity of water present and measurable in paper.

28.5 Objective The objective of this study is to measure the capability indices (univariate and multivariate) for process variables in multistage paper manufacturing system.

28.6 Research Methodology Observed data (500 samples) of three months has been collected from the shop floor of Paswara Papers Ltd, Meerut. Univariate and multivariate capability indices have been computed to analyze the process performance. For analysis and computation purposes, STATISTICA 10 and R 3.5.2 software have been used.

390

V. Tyagi and L. Kumar

Table 28.1 Descriptive statistics of the process variables Variables

N

Mean

Minimum

Maximum

St. dev.

GSM

500

143.78

132.64

155.18

4.88

BF

500

19.00

18.19

19.70

0.39

Cobb

500

42.19

36.13

50.46

2.98

RCT (CD)

500

1.06

0.63

1.31

0.14

Bulk

500

1.45

1.36

1.57

0.04

Caliper thickness

500

0.21

0.20

0.22

0.01

Ply bond

500

226.34

212.19

240.61

6.14

Paper ash

500

7.25

6.89

7.73

0.20

Moisture

500

6.73

6.24

7.11

0.16

28.7 Results and Discussions 28.7.1 Descriptive Statistics The mean values for the variables have been computed along with range and standard deviation. GSM, Cobb and ply bond show the high standard deviation as 4.88, 2.98 and 6.14, respectively, as compared to other variables. Table 28.1 depicts the same.

28.7.2 Capability Indices Table 28.2 shows the computation of various univariate capability indices, viz C p , C pk , C pm and C pmk . Table 28.3 depicts the measures of studied multivariate capability indices of the process variables of multistage production processes of paper manufacturing. It shows that C p , proposed by Kane [2] considers the actual process spread except location (mean). When C p ≥ 1, it is considered that 99.73% process region is within the tolerance limit and process is capable, though it gives wrong indication about process performance. C pk , proposed by Kane [2] overcomes the limitation of C p and takes into account process mean location. However, this index does not consider the divergence of process location, μ, from the target value, T. Chen et al. [1] proposed C pm index which assumed that target value is in the middle of the specification interval. When average (μ) diverges from target value, T, value of C pm decreases. From the results, it is observed that μ is close to T for “bulk” (C pm = 2.58) and farthest from T for “Cobb” (C pm = 0.56). Pearn et al. [3] developed C pmk , combination of C pk and C pm . C pmk responds more rapidly that other univariate indices, viz C p , C pk , C pm, and decreases more rapidly when μ departs from T. It reflects more accuracy in the actual process performance. Taam and Liddy [5] suggested multivariate process capability index, MC pm , which is estimated by the ratio (C p /D), where C p > 1

28 Univariate and Multivariate Process Capability Indices …

391

Table 28.2 Univariate capability indices of the variables Variables

LSL

Nominal

USL

Kane [2] Cp

Kane [2] C pk

Chan et al. [1] C pm

Pearn et al. [3] C pmk

GSM

130.00

143.75

150.00

1.09

0.68

0.68

0.42

BF

18.00

19.00

20.00

1.09

1.08

0.85

0.85

Cobb

35.00

42.16

45.00

0.65

0.37

0.56

0.31

RCT (CD)

0.90

1.06

1.50

0.89

0.48

0.71

0.38

Bulk

1.20

1.45

1.80

3.93

3.28

2.58

2.16

Caliper thickness

0.19

0.21

0.22

1.15

0.83

0.98

0.71

Ply bond

200.00

226.30

250.00

1.64

1.55

1.36

1.28

Paper ash

7.00

7.25

9.00

1.98

0.50

1.67

0.42

Moisture

6.50

6.73

7.50

1.16

0.54

1.06

0.49

Table 28.3 Multivariate capability indices of the variables

Taam and Liddy [5]

MC pm

0.1408

Shahriari and Lawrence [6]

C pm

0.6197

Pan and Lee [7]

NMC pm

0.0130

signifies that the process has a less variation with respect to the specification limit. The larger value of 1/D implies that the mean is close to the target and it must satisfy the criteria, 0 < (1/D) < 1. The calculated value of MC pm is 0.1408. The values of C pm and NMC pm , proposed by Shahriari and Lawrence [6] and Pan and Lee [7], are 0.6197 and 0.0130, respectively.

28.8 Conclusion Process capability analysis is integrated and significant in the enforcement of Six Sigma technique. It provides information that act as a base for the process correction and improvement. The paper manufacturing process is best represented by Shahriari and Lawrence [6]. The study of the multivariate capability index, C pm, can be done on more batches of paper production with variation in process parameters and chemical composition of raw materials. The batch with the highest value of C pm depicts that parameters of the process, used in that batch, are appropriate to optimize the production process. To improve the quality of the end product, the quality of the input raw materials, process parameters of manufacturing and heat treatment processes have to be improved.

392

V. Tyagi and L. Kumar

The data under study has been converted to normal by applying inverse DF transformation function. Capability indices should be measured with non-normal behavior of data because of inherits complexity of multistage production process. This research can be standardized by analyzing other or new multivariate capability index in the studied manufacturing process.

References 1. Chan LK, Cheng SW, Spring FA (1988) A new measure of process capability: Cpk. J Qual Technol 20:162–175 2. Kane VE (1986) Process capability indices. J Qual Technol 18:11 3. Pearn WL, Kotz S, Johnson NL (1992) Distributional and inferential properties of process capability indices. J QualTechnol 24:216–233 4. Khadse KG, Shinde RL (2006) Multivariate process capability using relative importance of quality characteristics. Indian Assoc Prod Qual Reliab (IAPQR) Trans 31(2):85–97 5. Taam SP, Liddy JW (1993) A note on multivariate capability indices. J Appl Stat 20(3):12 6. Shahriari H, Lawrence FP (1995) A multivariate process capability vector. In: 4th industrial engineering research conference, pp 304–309 7. Pan JN, Lee CP (2010) New capability indices for evaluating the performance of multivariate manufacturing processes. Qual Reliab Eng Int 26(1):3–15

Chapter 29

Improving Customer Satisfaction Through Reduction in Post Release Defect Density (PRDD) Hemanta Chandra Bhatt and Amit Thakur

29.1 Introduction Software is usually developed through time-bound projects. Execution and management of software projects within the committed time and budget are the bedrock of any successful software organization. Management of software projects generally involves a series of planning and tracking activities. Tracking is important to keep the project on its planned path and is performed using metrics such as Schedule Variance (SV), Effort Variance (EV) and Post Release Defect Density (PRDD) [1]. PRDD is a key metrics in software projects which reflects the quality of the software delivered [2]. PRDD is computed as the number of defects reported by customer divided by the size of the software. It impacts the customer satisfaction which helps in the sustenance of the business [3]. Customer satisfaction is considered as an important objective of quality management and has a direct, positive impact on organizational cost, profit, and sales growth [4]. This paper suggests a mechanism based on Capability Maturity Model Integration (CMMI) for quantitative monitoring and tracking of a parameter of interest in a software project [5]. It then illustrates the suggested mechanism through a case study on PRDD. The scope of this study is limited to enhancement and maintenance type of software projects as they constitute majority of software projects in our organization. In addition, such projects are executed through frequent releases and thus provide sufficient data for analysis. This paper explains that improvements and changes in planning of certain engineering activities in the projects, selected based on statistical tools such as process





capability analysis, scenario analysis, and sensitivity analysis would result in reduction in PRDD. It also suggests that tracking of the above activities using statistical tools such as control chart and cause and effect diagram would ensure the sustenance of improvements in PRDD.
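The prediction model described above is built with multiple linear regression on historical release data and screened on per-factor p-values and overall fit. The sketch below is only an illustration of that approach; the column names (PRDD, CQ, CRE_pct) and the synthetic data are assumptions, not artifacts from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative release-level data; in practice this would be the organization's
# historical releases (PRDD, Creator Quality, %Code-review effort).
rng = np.random.default_rng(7)
n = 40
df = pd.DataFrame({
    "CQ": rng.uniform(1, 4, n),             # creator quality rating (1-4)
    "CRE_pct": rng.uniform(0.02, 0.12, n),  # code-review effort fraction
})
# Synthetic PRDD that decreases with both factors, plus noise (assumption).
df["PRDD"] = 1.2 - 0.15 * df["CQ"] - 4.0 * df["CRE_pct"] + rng.normal(0, 0.05, n)

model = smf.ols("PRDD ~ CQ + CRE_pct", data=df).fit()
print(model.params)        # fitted coefficients of the prediction model
print(model.pvalues)       # screen each sub-process factor for significance
print(model.rsquared_adj)  # overall goodness of fit
```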

29.1.1 Mechanism for Quantitative Monitoring and Tracking The mechanism explained in this paper employs statistical tools for quantitative monitoring and tracking of the parameter of interest in a software project. The statistical tools used as part of this mechanism include statistical models [6, 7], control charts, capability analysis, model-based sensitive analysis [8], and scenario analysis. Releases delivered in last six months from the set of enhancement and maintenance type of software projects executed in the organization were considered for this study. Data on PRDD and sub-process factors like Creator Quality (CQ), Percentage Code review effort (%CRE) was used for the purpose of analysis. 1. PRDD—It is computed as the number of defects reported by customer divided by the size of the software. 2. Creator Quality (CQ)—It indicates the competency skill of software project team members and is computed based on rating on a scale of 1–4 provided by project manager to each team member. 3. Percentage Code review effort (%CRE)—It is computed as the total efforts spent in the code review activity divided by the total efforts in coding phase. Usage of statistical model [9, 10] is a key pre-requisite for the use of the mechanism explained in this paper. Statistical models are developed [11–13] using historical data available in the organization and are helpful in prediction of the parameter of interest based on certain sub-process factors that influence it. The statistical model [14] used in this case study was developed using the following steps. 1. PRDD was selected as the parameter of interest because of its criticality for software projects. 2. Sub-process factors influencing the PRDD were identified through brainstorming with the subject matter experts. 3. Multiple linear regression analysis [15 16] was used to build the statistical model. 4. The model was validated through the following criteria and subsequently put into use as part of the mechanism explained in this paper. (a) Individual p-value for each sub-process factor should be 1

95% CI for σ(Pre - %CRE) / σ(Post - %CRE) F-Test P-Value 0.001

0

2

4

6

8

10

95% Chi-square CIs for StDevs Pre - %CRE Post - %CRE 0.00

0.01

0.02

0.03

0.04

Boxplot of Pre - %CRE, Post - %CRE Pre - %CRE Post - %CRE 0.00%

2.00%

4.00%

6.00%

8.00%

10.00%

12.00%

Fig. 29.2 Plot from variance test shows significant improvement in variance of %CRE at project level Boxplot Graph

12.00% 10.00%

Data

8.00% 6.00% 4.00% 2.00% 0.00% Pre - %CRE

Post - %CRE

Fig. 29.3 Box plot from 2-sample-t-test shows significant improvement in Mean of %CRE at project level

29 Improving Customer Satisfaction Through Reduction …

399

Test and CI for Two Variances: Pre-PRDD, Post-PRDD Ratio = 1 vs Ratio ≠ 1

95% CI for σ(Pre-PRDD) / σ(Post-PRDD) F-Test P-Value

0

5

10

15

20

25

0.000

30

95% Chi-square CIs for StDevs Pre-PRDD Post-PRDD 0.00

0.25

0.50

0.75

1.00

1.25

1.50

Boxplot of Pre-PRDD, Post-PRDD Pre-PRDD Post-PRDD 0.0

0.5

1.0

1.5

2.0

Fig. 29.4 Plot from variance test shows significant improvement in variance of PRDD at project level

Boxplot Graph 2.0

Data

1.5

1.0

0.5

0.0 Pre-PRDD

Post-PRDD

Fig. 29.5 Box plot from 2-sample-t-test shows significant improvement in mean of PRDD at project level

29.1.3.7

Sustain the Improvements

Process capability in the postphase was measured, and it was found that control limits were well within the customer defined specifications limits which indicated

400

H. C. Bhatt and A. Thakur

Fig. 29.6 Individual chart showed the statistical monitoring and tracking of PRDD and %CRE at project level

that process had become capable. PRDD and critical sub-process factor, and %CRE were tracked through effective usage of individual control chart (Fig. 29.6). Above results prove that tracking of certain engineering activities (%CRE in this case) using statistical tools such as control chart and cause and effect diagram has ensured the sustenance of improvements in PRDD.

29.1.4 Improvement at Organization Level in Stage-2 The following steps illustrate the replication of action items to organization level.

29.1.4.1

Extend and Replicate the Improvements

In this stage, action items identified in stage-1 were extended and replicated across the organization in similar type of projects.

29.1.4.2

Analyze and Validate the Performance

Improvements in PRDD at organization were validated by 2-variance test (for variance) and 2-sample-t-test (for mean). Results from test of variance showed that there was statistically significant improvement in variance of PRDD (Fig. 29.7). Results from test of mean showed that there was statistically significant improvement in mean of PRDD (Fig. 29.8). Individual control chart of PRDD in prephase and postphase were drawn to determine the overall impact on PRDD at organization level (Fig. 29.9).

29 Improving Customer Satisfaction Through Reduction …

401

Fig. 29.7 Plot from variance test shows significant improvement in variance of PRDD at organization level

Fig. 29.8 Box plot from 2-sample-t-test shows significant improvement in mean of PRDD at organization level

402

H. C. Bhatt and A. Thakur I Chart of PRDD by Phase

Individual Value

2.0

Pre-PRDD

Post-PRDD

1.5 1.0 0.5

UCL=0.570

X=0.328

_ X=0.091

0.0

LCL=-0.388

-0.5 -1.0 1

7

13

19

25

31

37

43

49

55

61

Observation

Fig. 29.9 Graph shows significant improvement in mean and standard deviation of PRDD at organization level

29.2 Linkage of PRDD and Customer Satisfaction As the eventual objective of this exercise was to improve customer satisfaction, it was decided to study the linkage between PRDD and customer satisfaction. For this exercise, customer satisfaction data was collected in the form of customer satisfaction rating (CSAT) received by projects. CSAT is defined as average of the ratings provided by the customer to a set of pre-defined questions on a scale of 1–5. Linkage between PRDD and CSAT was established statistically using chi-square test (Fig. 29.10). This illustrated how reduction in PRDD has eventually resulted in higher level of customer satisfaction. The improvement in CSAT at organization level was validated by 2-sample-t-test. Results from test of mean showed that there was statistically significant improvement in mean of CSAT (Fig. 29.11).

29 Improving Customer Satisfaction Through Reduction …

403

Chi-Square Test for Association: Outcomes by X Summary Report Do the percentage profiles differ? 0

0.05

0.1

Comments > 0.5

Yes

No

P = 0.031

Differences among the outcome percentage profiles are significant (p < 0.05). You can conclude there is an association between Outcomes and X.

• Test: You can conclude that there are differences among the outcome percentage profiles at the 0.05 level of significance. • Percentage Profiles Chart: Use to compare the profile for each value of X and the average profile. • % Difference Chart: Look for long bars to identify outcomes with the greatest % difference between observed and expected counts.

Percentage Profiles Chart Compare the profiles. High Low

Average

% Difference between Observed and Expected Counts

71%

High Low

29%

PRDD

PRDD

58% 42%

CSAT CSAT

83% 17%

0%

20%

40%

60%

80%

40% -40% -20% 0% 20% Positive: Occur more frequently than expected Negative: Occur less frequently than expected

Fig. 29.10 This summary graph shows significant linkage between PRDD and CSAT Boxplot Graph 5.00 4.75

Data

4.50 4.25 4.00 3.75 3.50 CSAT_Pre

Graph variables

CSAT_Post

Fig. 29.11 Box plot from 2-sample-t-test shows significant improvement in mean of CSAT

404

H. C. Bhatt and A. Thakur

29.3 Conclusion The mechanism explained in this paper employs statistical tools for quantitative monitoring and tracking of the parameter of interest in a software project. The case study in this paper demonstrated the above for PRDD, which is a critical parameter with significant impact on customer satisfaction. As illustrated in the previous sections, the mechanism explained in this paper is quite effective in quantitative monitoring and tracking of the parameter of interest in the software project. The case study also demonstrated how PRDD can be improved in enhancement and maintenance type of software projects. The mean of PRDD was reduced from 0.33 to 0.1 and variance from 0.48 to 0.15. This paper described the approach to identify and implement improvements and changes in a selected software project. It then described the extension and replication of these improvements and changes across the organization. Improvement in PRDD was achieved through improvement in critical sub-process factor that influences PRDD. As shown in this paper the sub-process factor was related to planning of certain engineering activities in a software project and improvements to the same resulted in reduction in PRDD. It was further shown that tracking of above activity would ensure the sustenance of improvements in PRDD. Finally, it illustrated how reduction in PRDD eventually resulted in higher level of customer satisfaction.

29.4 Future Scope In this paper, PRDD was selected as the parameter of interest. Similar exercise can be carried out for other parameters of a software project like EV and SV. The current case study focused on enhancement and maintenance type of projects. Similar exercise can be carried out for other type of software projects especially development.

References 1. Graham D, Veenendaal EV, Evans I, Black R (2007) Foundations of software testing: ISTQB certification. Thomson Learning, UK 2. Pressman RS (2010) Software engineering a practitioner’s approach, 7th edn. McGraw-Hill, 1221 Avenue of the Americas, New York (2010) 3. Nafees T. Impact of user satisfaction on software quality in use. Int J Electr Comput Sci IJECS-IJENS 11(03) 4. Madu CN, Kuei CH, Jacob RA (1996) An empirical assessment of the influence of quality dimensions on organizational performance. Int J Prod Res 34(7):1943–1962 5. CMMI Product Team, CMMI® for Development, Version 1.3, 11 Stanwix Street, Suite 1150 Pittsburgh, PA 15222

29 Improving Customer Satisfaction Through Reduction …

405

6. Suffian MDM, Ibrahim S (2012) A prediction model for system testing defects using regression analysis. Int J Soft Comput Softw Eng (JSCSE) 2(7). ISSN 2251-7545 7. Wikipedia Encyclopaedia. https://en.wikipedia.org/wiki/Regression_analysis 8. Collofello JS (2002) Simulating the system test phase of the software development life cycle. In: Proceedings of the 2002 summer software computer simulation conference 9. Fenton NE, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689 10. Clark B, Zubrow D (2001) How good is the software: a review of defect prediction techniques. Carnegie Mellon University, USA 11. Nayak V, Naidya D (2003) Defect estimation strategies. Patni Computer Systems Limited, Mumbai 12. Thangarajan M, Biswas B (2002) Software reliability prediction model. Tata Elxsi Whitepaper 13. Wahyudin D, Schatten A, Winkler D, Tjoa AM, Biffl S (2008) Defect prediction using combined product and project metrics: a case study from the open source “Apache” MyFaces project family. In: Proceedings of software engineering and advanced applications (SEAA’08), 34th Euromicro conference, pp 207–215 14. Sinovcic I, Hribar L (2010) How to improve software development process using mathematical models for quality prediction and element of six sigma methodology. In: Proceedings of the 33rd international convention 2010 (MIPRO 2010), pp 388–395 15. Fehlmann T (2009) Defect density prediction with six sigma. Presentation in Software Measurement European Forum 16. Zawadski L, Orlova T (2012) Building and using a defect prediction model. Presentation in Chicago Software Process Improvement Network

Chapter 30

An Architecture for Data Unification in E-commerce using Graph Sonal Tuteja and Rajeev Kumar

30.1 Introduction E-commerce applications are becoming de facto mode of business and banking, encapsulating different assets to provide an integrated and coherent view to merchants and customers both for performing various activities. With millions of customers having several needs and opinions, it is challenging to bring out recommendations for satisfying them with appropriate products and services. Therefore, before rendering results to customers, the applications must consider several aspects, e.g., their preferences, search logs, social collaborations, etc. Different subsystems like Searching, Recommendation, Navigation, Presentation, etc., of e-commerce applications coordinate and collaborate among them to provide customized services to users: merchant and customer both [1]. Several data sources in e-commerce applications like customers order details, log files, demographic information and social network are rich sources of information dissemination for understanding customers’ interests [2–4]. However, these data sources are scattered in unstructured forms at different locations. For example, customer order details may be present at a database server in relational database, however, log files may be present at Web server in unstructured format. The unification of these heterogeneous data sources into a single format can help in understanding customers’ behavior and subsequently providing personalized products and contents to them. Relational databases have been extensively utilized for conventional applications having requirements of concrete schema, consistency, etc. For fetching data from relational model, join operations are used [5]. In e-commerce applications, flexible S. Tuteja (B) · R. Kumar Jawaharlal Nehru University, New Delhi, India e-mail: [email protected] R. Kumar e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_30

407

408

S. Tuteja and R. Kumar

schema is required as new products to be included in the application that can have different set of attributes from the previous ones. In addition, speed is an important concern for the customers as they would like to receive responses to their queries interactively without any perceivable delay. Due to schema inflexibility and larger number of joins in relational database, it is not a good choice for large-scale ecommerce applications [6–8]. On the other hand, graph database provide a flexible schema which can be utilized to unify data from several sources of e-commerce application [7, 9]. Graph can also capture relationships among customers and products which help in speeding up searching and recommendation of products to customers. In graph database, entities and relationships are represented using nodes and edges, respectively. Also, properties to nodes and edges of the graph can be associated using key-value pairs [10]. To utilize graph model for e-commerce applications, data from heterogeneous sources having different formats need to be unified into a single graph. Therefore, we present an architecture which can be used by an e-commerce application to map data into a graph and subsequently utilized by different subsystems of the application. Researchers have proposed several architectures for e-commerce which mostly focus on conceptual framework ignoring data-centric view [1, 2]. Many e-commerce industries like eBay, Walmart, etc., have successfully applied graph for data management and real-time recommendation [11, 12]. Therefore, in this article, we focus to present a data-centric architecture for e-commerce applications using graph which models relationships among products and customers. The presented architecture is generic which can be adapted to any e-commerce application with varied requirements. The paper is organized as follows. In Sect. 30.2, system architecture based on graph model has been proposed. In Sect. 30.3, we have discussed implementation details and result analysis for different e-commerce queries. The issues and challenges for adopting graph model for e-commerce applications have been discussed in Sect. 30.4. The paper has been concluded in Sect. 30.5.

30.2 Graph-Based System Architecture In an e-commerce application, data is generated as well as captured from multiple sources having different formats. However, for faster processing of customers queries, data needs to be unified at a single place in appropriate format. Subsequently, unified data can be consumed by different subsystems of the application to provide personalized results to the customers. To realize this, we propose an architecture which consists of two phases: (1) offline phase for bookkeeping and (2) online phase for addressing faster query. In offline phase, data from multiple sources is unified at a single place and converted into a graph capturing requirements of the application. In online phase, different subsystems of the application collaborate with each other for providing results to customers. The graph generated by offline phase helps in faster query processing by the online phase (Fig. 30.1).

30 An Architecture for Data Unification in E-commerce …

409

Fig. 30.1 E-commerce architecture using graph

30.2.1 Offline Phase The offline phase of the architecture mainly deals with unification and processing of data from multiple sources based on application requirements. In an e-commerce application, multiple sources of data are: • Transactional Data: It contains information about products, customers, orders, etc., having several attributes associated with them. In most of the e-commerce applications, relational database is used for storing transactional data. • Server Logs: In an e-commerce application, server logs are maintained at Web server which reveals about customer behavior like login time, time spent on application, products browsed, etc.


• Social Network: The social network information of customers can depict their interests, connections, etc., which helps in providing customized products to customers.
These data sources, representing different information about customers and products, are utilized to create a graph with nodes representing products, customers, etc., and edges representing the relationships among them. Depending on the application requirements, various mapping criteria can be used to create relationships among nodes [13]. For example, when a customer buys a product on an e-commerce application, products similar to the product bought are recommended to the customer. To realize this, relationships among products are created a priori based on the similarity among them, which helps in providing faster and more appropriate recommendations. The following mappings can be used to create relationships in e-commerce applications:
• Communication Mapping: This mapping is used to create relationships among nodes communicating with each other. For example, a relationship based on communication mapping is created between two customers socially connected with each other.
• Co-existence Mapping: This mapping is used to create relationships among nodes co-existing with each other in a container. For example, a relationship based on co-existence mapping is created between an HP laptop and an HP mouse when they are bought together. In addition, weights can be assigned to relationships based on the frequency of co-existence.
• Co-relation Mapping: This mapping is used to create relationships among nodes co-related with each other above a certain threshold. For example, a relationship based on co-relation mapping is created between an HP laptop and a Lenovo laptop when they have the same values of attributes like RAM, hard disk, screen size, etc. In this mapping too, weights can be assigned to relationships based on the co-relation between nodes, which is calculated using several metrics.
• Transactional Mapping: This mapping is used to create relationships among nodes participating in a transaction. For example, a relationship based on transactional mapping is created between a customer and the corresponding products interacted with by him in terms of buying, reviewing, liking, visiting, etc. In this mapping too, weights can be assigned to relationships based on the type of interaction, with buying a product having higher weightage than viewing a product.
Based on the requirements of the e-commerce application, these mappings can be used to create relationships among the nodes of the graph. As data is updated in the heterogeneous data sources, the unified graph also needs to be periodically updated. After mapping the data from heterogeneous sources to a graph, various indexes on the graph are also created, which further help to reduce the query execution time in the online phase. Different graph-based indexing techniques, such as path-based indexing, subgraph-based indexing, etc., have been proposed for reducing the query execution time in a graph model [8].
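As a concrete illustration of the co-existence and transactional mappings described above, the following Python sketch uses the official Neo4j driver to create weighted relationships. The node labels and relationship types (PRODUCT, ORDER, CUSTOMER, CONTAINS, CO_OCCURS_WITH, INTERACTED_WITH) and the interaction weights are illustrative assumptions, not the exact schema of the proposed architecture.

```python
# A minimal sketch of the co-existence and transactional mappings, using the
# official Neo4j Python driver. All labels, relationship types and weights
# below are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Co-existence mapping: connect products appearing in the same order and
# weight the relationship by how often they co-occur.
CO_EXISTENCE = """
MATCH (p1:PRODUCT)<-[:CONTAINS]-(o:ORDER)-[:CONTAINS]->(p2:PRODUCT)
WHERE id(p1) < id(p2)
WITH p1, p2, count(o) AS freq
MERGE (p1)-[r:CO_OCCURS_WITH]->(p2)
SET r.weight = freq
"""

# Transactional mapping: connect a customer to a product and weight the edge
# by the type of interaction (buying counts more than viewing).
TRANSACTIONAL = """
MATCH (c:CUSTOMER {id: $cid}), (p:PRODUCT {id: $pid})
MERGE (c)-[r:INTERACTED_WITH]->(p)
SET r.weight = $weight, r.kind = $kind
"""

INTERACTION_WEIGHTS = {"buy": 5, "review": 3, "like": 2, "view": 1}

def build_mappings(interactions):
    """interactions: iterable of (customer_id, product_id, kind) tuples."""
    with driver.session() as session:
        session.run(CO_EXISTENCE)
        for cid, pid, kind in interactions:
            session.run(TRANSACTIONAL, cid=cid, pid=pid,
                        kind=kind, weight=INTERACTION_WEIGHTS.get(kind, 1))

if __name__ == "__main__":
    build_mappings([("c1", "p1", "buy"), ("c1", "p2", "view")])
    driver.close()
```

A periodic job running such statements is one way the unified graph could be kept in step with updates in the underlying data sources.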


30.2.2 Online Phase

In the online phase, the different subsystems of an e-commerce application collaborate and coordinate with each other to provide personalized results to the customers, which are served by the graph generated in the offline phase. The different subsystems of an e-commerce application are:
• Searching: It consists of providing results to customers based on the filtering criteria set by them. Indexes created on several properties of product nodes are used for searching products by the customers [14].
• Recommendation: It consists of providing product recommendations to customers based on their interests retrieved from past data. Using the relationships among customers and products in the unified graph, products are recommended to customers [3, 15].
• Navigation: It consists of providing a roadmap to customers from one product to another in a large pool of products, which helps in improving sales as well as the shopping experience. By traversing the relationships among products in the graph, faster and more accurate navigation is provided.
• Presentation: It is infeasible for a customer to view all products listed on a Web page; therefore, the products must be presented in such a way that the most relevant products appear at the top. Several graph properties like degree, centrality, etc., are used to find the relevance of products for customers.
As discussed, the offline phase processes the data and represents it in an appropriate format based on the requirements of the application. Therefore, the processing time of the various subsystems in the online phase is substantially reduced. Also, in the online phase, new data is generated by the customers, which updates the graph by creating new nodes and edges.
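As a small, hypothetical illustration of the presentation idea above, the sketch below ranks a toy product graph by degree centrality using networkx; it is not the chapter's implementation, and the product names are made up.

```python
# Toy illustration of presentation ranking: well-connected products in the
# unified graph are treated as more relevant and listed first.
import networkx as nx

G = nx.Graph()
# Edges connect products that are frequently co-purchased (made-up data).
G.add_edges_from([
    ("hp_laptop", "hp_mouse"),
    ("hp_laptop", "laptop_bag"),
    ("lenovo_laptop", "laptop_bag"),
    ("hp_mouse", "mouse_pad"),
])

# Degree centrality as a crude relevance score for ordering the listing page.
relevance = nx.degree_centrality(G)
ranked = sorted(relevance, key=relevance.get, reverse=True)
print(ranked)
```

In practice the same graph properties could be computed directly inside the graph database, but the idea of ranking by connectivity is the same.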

30.3 Implementation and Results

To justify the architecture proposed in Sect. 30.2, we have created a prototype using e-commerce data taken from Kaggle. In the offline phase, the data has been preprocessed and stored as a graph using the graph database Neo4j. In the online phase, different types of queries have been executed on the graph data to verify speed as well as accuracy. The implementation details of the offline and online phases are described below.

30.3.1 Offline Phase Implementation

In the offline phase, the data preparation required for the online phase has been completed. After extraction of the e-commerce data from Kaggle, the data has been cleaned, transformed and loaded into the graph model using Neo4j as the graph database. The nodes and edges in the graph model are shown in Fig. 30.2.

Fig. 30.2 Graph model with node and edge types

To compute the similarity among products and/or customers, the following mappings have been utilized:
• Co-existence Mapping: The products occurring in the same order, or the customers ordering the same products, have been connected to each other with the help of co-existence mapping.
• Co-relation Mapping: The co-relation among products has been calculated using the cosine similarity method, where each product is represented as a bag of words using its description.
The graph generated after loading the data and creating the data mappings contains approximately 40 K nodes and 100 K edges, which have been utilized in the online phase for query analysis.
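A minimal sketch of the co-relation mapping step is shown below, assuming scikit-learn for the bag-of-words representation and cosine similarity. The product descriptions, the 0.4 threshold and the idea of turning pairs above the threshold into SIMILAR_TO edges are illustrative assumptions, not the prototype's exact settings.

```python
# Bag-of-words + cosine similarity for the co-relation mapping (illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = {
    "p1": "15 inch laptop 8 GB RAM 1 TB hard disk",
    "p2": "14 inch laptop 8 GB RAM 512 GB SSD",
    "p3": "wireless optical mouse",
}

ids = list(descriptions)
bow = CountVectorizer().fit_transform([descriptions[i] for i in ids])
sim = cosine_similarity(bow)          # pairwise similarity matrix

THRESHOLD = 0.4                       # assumed cut-off for creating an edge
edges = [
    (ids[i], ids[j], float(sim[i, j]))
    for i in range(len(ids))
    for j in range(i + 1, len(ids))
    if sim[i, j] >= THRESHOLD
]
print(edges)   # pairs above the threshold become weighted SIMILAR_TO edges
```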

30.3.2 Online Phase Implementation

In the online phase, we have run different types of queries for e-commerce applications, including searching, recommendation and data analysis (Table 30.1). In query Q1, searching of product nodes based on a given description is performed. In query Q2, products similar to a given product are searched using graph traversal. Query Q3 calculates the purchase frequency of a given product by counting the number of relationships of a node with other nodes. The same queries have been run using the relational model as well, for analysis and comparison with the graph model. The execution time of these queries has been measured in milliseconds using Neo4j (version 3.5) as well as MySQL (version 5.7), as discussed in the following subsections. The system used for testing runs Windows 10, version 1709, with an Intel Core i7 CPU running at 3.60 GHz and 12 GB RAM.

Table 30.1 Different types of graph queries in e-commerce applications
Query | Type | Description
Q1 | Keyword-based searching | Given keyword d, find out the products containing d in the product description
Q2 | Content-based recommendation | For a given product p, find out the products which are similar to p
Q3 | Data analysis | Find out the purchase frequency of a given product p in the database

Fig. 30.3 Execution time for query Q1 using graph model and relational model
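The sketch below shows how queries of the three types in Table 30.1 could be issued and timed from Python against the Neo4j graph. PRODUCT, P_DESC and ORDER follow the names used in the text; the CONTAINS and SIMILAR_TO relationship types, the id property and the connection details are assumptions for the example.

```python
# Hedged sketch: issuing and timing the three query types from Python.
import time
from neo4j import GraphDatabase

QUERIES = {
    # Q1: keyword-based searching on the product description
    "Q1": ("MATCH (p:PRODUCT) WHERE p.P_DESC CONTAINS $kw RETURN p",
           {"kw": "laptop"}),
    # Q2: content-based recommendation via one-hop traversal
    "Q2": ("MATCH (:PRODUCT {id: $pid})-[:SIMILAR_TO]-(q:PRODUCT) RETURN q",
           {"pid": "p1"}),
    # Q3: purchase frequency = number of ORDER relationships of a product
    "Q3": ("MATCH (:PRODUCT {id: $pid})<-[:CONTAINS]-(o:ORDER) RETURN count(o)",
           {"pid": "p1"}),
}

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    for name, (cypher, params) in QUERIES.items():
        start = time.perf_counter()
        session.run(cypher, **params).consume()     # force full execution
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{name}: {elapsed_ms:.1f} ms")

driver.close()
```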

30.3.2.1 Keyword-Based Searching

For analyzing the behavior of the graph model for keyword-based searching, we have considered query Q1, which searches nodes of type PRODUCT containing keyword d in the attribute P_DESC. The execution time of Q1 for a given keyword is measured ten times using the graph model as well as the relational model, as represented in Fig. 30.3. As can be seen from Fig. 30.3, the cold execution time as well as the subsequent execution times for the graph model are substantially lower than those of the relational model.

30.3.2.2 Recommendation

For analyzing the behavior of the graph model for recommendation, we have considered query Q2, which searches nodes of type PRODUCT similar to a given product p. The execution time of Q2 for a given product is measured ten times using the graph model and the relational model, as represented in Fig. 30.4. It can be observed from Fig. 30.4 that the cold execution time as well as the subsequent execution times for the graph model are substantially lower than those of the relational model. This is justified by the fact that the complexity of relationship traversal in the graph model is asymptotically lower than that of the join operation in the relational model [7].


Fig. 30.4 Execution time for query Q2 using graph model and relational model

30.3.2.3 Data Analysis

For analyzing the behavior of the graph model for data analysis, we have considered query Q3, which counts the number of relationships with nodes of type ORDER for a given product p. The execution time of Q3 for all products in the database is measured ten times using the graph model and the relational model, as represented in Fig. 30.5. It can be observed from Fig. 30.5 that the graph model outperforms the relational model for all executions.

Fig. 30.5 Execution time for query Q3 using graph model and relational model


30.3.3 Discussion

As we can see from Sect. 30.3.2, the graph model outperforms the relational model for keyword-based searching, recommendation and analysis queries. The queries considered for comparing the graph model with the relational model are traversal queries, which involve one or more levels of traversal from a given source node. As discussed in [7], the run-time complexity of traversal queries using the graph model is asymptotically lower than that of join queries using the relational model, which has also been verified by our analysis. However, the performance of the graph model for aggregation queries has not been explored yet.

30.4 Issues and Challenges

Graphs have already been successfully employed in many e-commerce organizations like Walmart, eBay, etc., for different subsystems such as data management, recommendation, delivery tracking and so on [11, 12, 16]. However, there are some issues which need to be addressed while adopting a graph database for e-commerce applications:
• Co-existence with Other Databases: In e-commerce applications, relational and document databases are also used for representing product information, order details, cart information, etc. On the other hand, a graph database can provide better personalization, scalability, presentation, etc. The main challenge for an application is to decide whether data should be moved from the other databases to the graph, or whether the other databases should co-exist with the graph, each one serving its own purpose.
• Data Mapping: To move from an existing database to a graph database for an e-commerce application, the existing database must be mapped to the graph database with nodes and relationships among nodes. Deciding the various types of nodes, attributes and relationships among them is an important issue.
• Online Learning: A customer looking for products expects quick and contextual responses for the required product. Therefore, the model must be fast enough to accommodate the customer behavior in the existing database and return the results.
• Update Frequency: As new data is continuously generated by an e-commerce application in terms of transactions, logs, etc., it needs to be updated into the unified graph. To improve the accuracy of results, the update frequency should be high, which will consume more processing power. Deciding the trade-off between accuracy and processing power is also a big issue.
• Privacy: To improve sales and enhance customer experience, e-commerce applications capture data from internal and external sources, which may breach customers' privacy.


Other than these, there are further challenges like scalability, flexibility, data storage, domain dependency, etc., to be considered while adopting a graph database for e-commerce applications. For example, deciding the nodes and the relationships among nodes may vary from application to application. Therefore, e-commerce application design using a graph database is an emerging field having several issues and challenges which are being addressed.

30.5 Conclusions

To conclude, we have discussed how a graph database can be advantageous for an e-commerce application. To realize it, we have defined a generic graph-based architecture which can be utilized by the various subsystems of an e-commerce application. The architecture is divided into two phases: (1) an offline phase which deals with data management considering the application requirements, and (2) an online phase which consists of the various subsystems utilizing the data managed by the offline phase. The unification and processing of data in the offline phase help in speeding up the processing of queries in the online phase. To justify the same, we have implemented a prototype using our proposed architecture. We have compared the query execution time of the unified graph with the relational model for different types of traversal queries. In addition, we have discussed the issues in adopting a graph database for real-time applications. In future, we will explore how we can resolve these issues using our architecture. We will also verify the applicability of the unified graph for larger datasets and different types of queries.

References
1. Zimeo E, Oliva G, Baldi F, Caracciolo A (2013) Designing a scalable social e-commerce application. Scalable Comput: Pract Experience 14(2):131–141
2. Yu H, Huang X, Hu X, Wan C (2009) Knowledge management in E-commerce: a data mining perspective. In: 2009 International conference on management of e-Commerce and e-Government. IEEE, pp 152–155
3. Huang Z (2005) Graph-based analysis for E-commerce recommendation. Ph.D. Dissertation, University of Arizona, USA
4. Ding L, Han B, Wang S, Li X, Song B (2019) User-centered recommendation using us-elm based on dynamic graph model in e-commerce. Int J Mach Learn Cybernet 10(4):693–703
5. Codd EF (1990) The relational model for database management: version 2. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA
6. Kaur K, Rani R (2015) Managing data in healthcare information systems: many models, one solution. Computer 48(3):52–59
7. Ma S, Li J, Hu C, Lin X, Huai J (2016) Big graph search: challenges and techniques. Front Comput Sci 10(3):387–398
8. Srinivasa S (2012) Data, storage and index models for graph databases. In: Graph data management: techniques and applications. IGI Global, pp 47–70


9. Pokorný J (2015) Graph databases: their power and limitations. In: IFIP international conference on computer information systems and industrial management. Springer, Cham, pp 58–69
10. Sasaki BM, Graphista A (2016) Graph databases for beginners: other graph data technologies. https://neo4j.com/blog/other-graph-database-technologies/
11. Neo-Technology (2015) Walmart uses Neo4j to optimize customer experience with personal recommendations (case study). Neo-Technology
12. Neo-Technology (2014) eBay now tackles e-Commerce delivery service routing with Neo4j (case study). Neo-Technology
13. Costa LDF, Oliveira ON Jr, Travieso G, Rodrigues FA, Villas Boas PR, Antiqueira L, Viana MP, Correa Rocha LE (2011) Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 60(3):329–412
14. Neo-Technology (2015) The power of graph-based search (white paper). Neo-Technology
15. Neo-Technology (2014) Powering recommendations with a graph database. Neo-Technology
16. Webber J, Robinson I (2015) The top 5 use cases of graph databases (white paper). Neo-Technology

Chapter 31

Road Accidents in EU, USA and India: A critical analysis of Data Collection Framework

Alok Nikhil Jha, Geetam Tiwari, and Niladri Chatterjee

A. N. Jha (B) · G. Tiwari
TRIPP, Department of Civil Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
e-mail: [email protected]
N. Chatterjee
Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, Delhi, India

31.1 Introduction

Road traffic accidents are uncertain, and the several parameters involved in causing these accidents are themselves uncertain and unpredictable. The injuries and fatalities caused by road accidents are a globally acknowledged area to work on. Hundreds of thousands of people across the world lose their lives to road accidents and road disasters every year, and these road accidents cause wastage of national resources. Road traffic injury (RTI) has moved to the fifth position globally, which shows the impact of these accidents. However, the dependence of accidents on various dynamic parameters makes them heterogeneous in nature. Around 12.4 lakh (1.24 million) people are killed across the world due to traffic accidents as per World Health Organization (WHO) reports [1, 2]. Looking at the details globally, it can be observed, for example, that in Spain 2478 fatalities were reported in the year 2010 [3]. The European Union reported 34,500 fatalities as a result of traffic accidents in 2009 [4], and further reported 25,900 fatalities and approximately 1.4 million injuries in more than 1 million car accidents in 2013 [5]. According to the US Department of Transportation [6], 33,561 fatalities were reported in the year 2012 due to road accidents. Due to the increasing fatalities in road accidents, the Global Burden of Disease Study [7] report has moved accidents up from the tenth to the fifth position. The conditions are not the same for all countries; compared with many of the Organisation for Economic Co-operation and Development (OECD) countries [8], it is the lowest- and middle-income countries [1] that are most affected [9]. The report also illustrated a decline of fatalities in eighty-eight countries, of which around 48% are high-income countries and 47% are middle-income


countries, and only 5% are low-income countries. The report also presented an increase in accidents and fatalities in eighty-seven countries. Based on data, researchers also believe that accidents occur due to human behavior, which at times is linked with traffic violations and offenses. There are studies that blame poor human behavior as the core reason for road crashes, while a few attribute crashes to roads, vehicles, environmental conditions, etc.; since these parameters are very discrete, clear identification needs to be done. According to the status report published by the NCRB in 2015, around 1.4 lakh people died and 4.7 lakh were injured in road crashes in India in 2014 [4]; however, the actual figures are probably higher, as around 3–4 lakh persons visited hospitals for road traffic injuries and most such cases go unreported. For every such incident, an epidemiological [10] approach is followed and data are collected from the available sources. However, we still do not have an organized data set in a standard, scalable, and machine-understandable format. As the problem is serious in nature in many countries and is a matter of attention, the United Nations declared the period 2011–2020 as the "decade of action for road safety" [2, 11, 12], with an aim to control and minimize fatalities by at least 50% in low- and middle-income countries [13] and to establish an efficient road safety framework and management model for improved safety of road users and better post-crash response data for understanding and designing precautionary measures. A descriptive analysis of the various parameters of road accidents is essential to understand and estimate their magnitude; the situation of road accidents is important, but understanding the data quality, data availability, factors related to dangerous situations, and the different interesting patterns in the data is equally important, so that a proactive capability and capacity can be built. Section 31.2 provides the literature background on the importance of data collection and its mechanisms and the new initiatives taken up by the countries of the European Union, the USA, and India. Section 31.3 presents the methodology and analysis of the systems as a whole for these countries and elaborates the different variables deemed fit as a generalized approach overcoming the limitations. Section 31.4 finally provides the conclusion on the importance of accident data collection and how the relevant knowledge can be extracted for better planning.

31.2 Accidents and Data Collection

A road accident is unpredicted and has unpredictable impact, depends on several factors, and can be prevented through understanding of the available data sets and case studies. There are several parameters, not limited to environmental parameters, road user and driver-related parameters (e.g., mental, behavioral, etc.), vehicle-related parameters, road- and infrastructure-related parameters, and many others. Government and other non-government organizations are working philanthropically on educating road users. Related industries, like automobiles, are adding several new technologies inside vehicles, such as rear cameras, airbags, stability control systems, antilock brake systems, and many other kinds of applications and speed notifications; preventive laws are also enacted globally. But the accident and injury contribution still exists.


A large count of disabilities, grievous injuries, and fatalities is reported in India, which has a significant share of the world population [1]; setting up efficient road safety measures is an opportunity area [14], and a lot of such knowledge has to be learned from other, developed countries. Road traffic injury (RTI) has moved to the fifth position globally; this gives a clear indication of how road traffic accidents (RTAs) are contributing to fatalities and injuries. RTAs are not only a prime factor in injuries and fatalities but also waste the resources of the nation. The objective of the study and the identification of these parameters are important, as they set the baseline for analysis of accidents and of the main factors associated with different types of road and traffic accidents. The heterogeneity in the nature of road accidents hinders proper analysis, and to overcome this heterogeneous behavior, the data are segmented for examination. The variables recorded in road accidents are discrete in nature; hence, the collation and study of such variables for quantification require an organized approach [15]. When an accident occurs, it is recorded by the police, the victim, or any other witness; the information thus collected is called accident information, and the extracted data are the accident data. These data form the backbone of any accident safety measures or patterns. As per a research study by TRIPP [16], about fifty percent of fatalities in road traffic accidents occur among pedestrians, cyclists, and motorcyclists; hence, they are also referred to as the most vulnerable road users. In low-income countries approximately 57% of fatalities involve these road users, in middle-income countries the figure is 51%, and in high-income countries the average lies around 39%. The different mix of participating parameters in different countries results in different distributions of fatalities over transport modes. For example, most countries in the southeast Asian region, where two-wheelers and three-wheelers are a common transportation mode with high density, have a high proportion of fatalities among these modes, which is not the case in high-income American or European countries. A direct comparison of different countries is therefore not valid because of the differing factors, the key one being vehicle type. As per WHO's report, 93% of fatalities globally occur in low-income and middle-income countries, even though these countries have only a 60% share of the total vehicles [17]. Since road accidents are one of the causes of injury and fatalities, a careful analysis of the various parameters can collectively provide a lot of details and information that can be utilized in designing a safety system, and the correctness of the collected data is very important for interpreting possible behaviors [18]. The facts and information from the accident site are always recorded; however, the issue arises in the analysis of these data sets. Subjective analysis of a larger set of information is very complicated and becomes tougher as the data keep increasing. Organizing the collected information in a structure was therefore a major concern and need. The best possible way was to create a database schema and import the relevant information from police records and other reports into that database.
Another key challenge arose regarding having a suitable system that can help in organizing the properly collected data and that has all the relevant parameters needed to place any new information. Countries have done research on creating a best-fit system that can set a baseline to collect all the relevant data and can be easily extended with new information as it becomes available.

This paper shows the mechanisms of data collection used in the European Union, the USA, and India.

31.2.1 Data Collection in European Union

The European Union reports more than 170 lakh injuries and 40,000 fatalities from road crashes almost every year [19, 20]. All the countries in the European Union had been collecting accident data with their own national systems. With the main goals of recording safety information of road users, evaluating the efficiency of road safety measures, and quantifying road traffic problems, the EU Council created a new database system, CARE, a database to record data from accidents across Europe. The structure of the CARE system, a central repository of all information collected from various road accidents, was followed by all member countries. The schema framework of CARE included variables like person details, details of the vehicle, and road details, i.e., junction, road type, area, area type, motorways, junctions, collision type, and condition of weather and light. The key objective of the CARE system was to enable a qualitative evaluation of safety problems, quantify various kinds of accidents, and create and implement efficient safety measures to minimize road accidents. Initially, 15 EU countries participated in CARE and followed its data structure, while the other countries relied on their existing national data collection. The major advantage of CARE over country-specific data collection systems is its high-level categorization of variables learned from the systems already in use across EU countries, as the CARE system is created based on member states' accident data. The process and schema of organized information are flexible enough, as they are designed on the best practices of the member countries. The CARE system has 55 variables with 255 values. The raw data are collected from the police department and stored in a national database, where the raw data are cleaned, validated, and organized. The European countries faced challenges in recording accident records, as the various accident data variables, across all road accidents, were collected under different definitions in different countries. The harmonization process was time-consuming. The CARE system initially proved to be a useful system for its data collection mechanism, as it tried to follow a generalized way. However, varying accident data based on different demography, and much other emerging information specific to a city or country not captured in the system, were leading to various kinds of inaccuracies. It was also observed that the data collection forms that were distributed had varying structures after the relevant data were filled in and hence could not be compared [21]. The quality and availability of accident data are thus affected, which in turn affects the reliability of accident data analysis in the CARE [22] system. The key challenges in the system were uniformity of the data, such as differences in the collected data variables and values across the countries using the system. Each country had adopted its own definitions of accident variables, which causes confusion when evaluated from an overall EU perspective; similarly, the formats of metadata impact quality and availability.

31.2.1.1 Common Accident Data Set (CADaS)

The CARE systems’ limitation due to unsynchronized organization of data and datarecording system has affected the efficiency of accident data collection. New system common accident data set (CADaS) was formulated and designed with intent to create an uniformity in data organization and collection [23]. The system was empowered to be flexible enough to accommodate new variables and changes. The data sets consisted of minimum set of standardized data elements which are common across all EU countries and easily fit in their system of data collection, thus allowing a comparable data set collected across Europe. The CADaS is developed over CARE adding more variables and values with target of maximizing the scope of CARE in a detailed and reliable fashion. CADaS is initiated as a voluntary system for European countries and collected the data from CARE and other participating countries from existing national systems of data collection, thereby strengthening it. Over the time, nation police road accident data collection systems also adopted the CADaS. The system of CADaS has covered all variables to understand casualty, root causes, and other property losses based on knowledge organization from member countries. The system also ignored data which is difficult to collect to avoid any low-quality data. The preference was given to existing CARE data and other high-quality data targeting useful variables in accident analysis and whose detailed information can also be collected. The system was flexible enough to keep the data adaptive to have alternate forms of data for localized data varying with EU countries. The system followed a structured process of migration from the previous system as shown in Fig. 31.1 [24].

Fig. 31.1 Process for inclusion of CADaS. Source IRTAD conference, Seoul, Sep 2009


The system ensured a smooth migration from existing national records and CARE data through the CAREPLUS protocol, translating them into the CADaS system. CADaS variables fall into categories related to the accident, the crash, the traffic unit, the road, and the person; these categories have 73 variables and 471 values. The variables and values were selected considering their significance and utility for accident analysis in the countries of the European Union. The variables in CADaS are disaggregated and designed to be recorded and organized independently, with the flexibility for member countries in the EU to add new variables or ignore any existing variable. The accident data variables are organized in four categories, viz. accident-related variables (A), road-related variables (R), vehicle-related variables (V), and person-related variables (P). The variables are further segregated into variables of higher importance and lower importance. Table 31.1 shows the categories of variables of the CADaS system with a summary of variables and values. The codes corresponding to the variables (as cited in Table 31.1) are further illustrated with their values in Tables 31.2, 31.3, 31.4, and 31.5. A total of 88 variables under all categories are identified, which are further classified into detailed values and alternative values. The structure of all the variable lists in CADaS is stated below.

Table 31.1 CADaS variables and values
Variable category | Code | Variables: High (H) importance | Lower (L) importance | Total | Values: Detailed | Alternative (A) | Total
Accident | A | 7 | 5 | 12 | 76 | 15 | 91
Road | R | 16 | 21 | 37 | 151 | 15 | 166
Vehicle | V | 7 | 10 | 17 | 110 | 7 | 117
Person | P | 15 | 7 | 22 | 96 | 11 | 107
Total |  | 45 | 43 | 88 | 433 | 48 | 481

Table 31.2 Accident data from CADaS
Accident variables: 1. Date, 2. Time, 3. Weather condition, 4. Lighting condition, 5. Accident with pedestrians, 6. Accident with parked vehicle, 7. Single-vehicle accidents, 8. Two-vehicle accidents—no turning, 9. Two-vehicle accidents—turning or crossing

Table 31.3 Traffic unit data from CADaS
Traffic unit variables: 10. Traffic unit type, 11. Vehicle special function, 12. Trailer, 13. Engine power, 14. Active safety equipment, 15. Vehicle drive, 16. Make, 17. Registration country, 18. Model, 19. Registration year, 20. Traffic unit maneuver, 21. First point of impact, 22. First object hit in, 23. First object hit off, 24. Insurance, 25. Hit and run

Table 31.4 Road data from CADaS
Road variables: 26. Latitude, 27. Longitude, 28. Road type, 29. Road kilometers, 30. Speed limit, 31. Motorway, 32. Urban area, 33. Junction, 34. Road segment grade, 35. Junction condition, 36. Obstacles, 37. Consignment type, 38. Number of lanes, 39. Emergency lane, 40. Tunnel, 41. Bridge, 42. Work zone-related, 43. Road curve

Table 31.5 Person's data from CADaS
Person variables: 44. Age, 45. Gender, 46. Nationality, 47. Road user type, 48. Injury severity, 49. Alcohol test, 50. Alcohol test result, 51. Alcohol level, 52. Drug test, 53. Driving license issue date, 54. Driving license validity, 55. Safety equipment, 56. Distracted by device, 57. Trip/journey purpose


Some of the variables corresponding to these categories are given in Tables 31.2, 31.3, 31.4, and 31.5, respectively [22]. All the variables collected for the accident category are given in Table 31.2; these variables capture the basic information from the site of the accident. Table 31.3 shows the vehicle- and traffic-related variables at the time of the accident. It tries to capture other relevant information to understand the possible causes and the status of other related information; it has the details of the vehicle and, after the accident, whether maneuvering was required, impact details, etc. Road variables and person variables are given in Tables 31.4 and 31.5, respectively. The road variables provide precise details of the road and its type. The person variables provide the details of the people involved in the accident and the primary features of the injury, along with other details such as drugs or alcohol. The person variables are recorded for every participating vehicle or pedestrian. A quick review of these variables shows the depth of the information collected, enabling effective knowledge to be extracted from the parameters. All these variables have separate value sets: the inputs for the accident, traffic unit, person, and road variables are drawn from the value sets summarized in Table 31.1, and every possible value for each variable is taken from its value set, thus making the system flexible enough to be used and easily measurable locally while remaining comparable with other countries. The detailed values are not presented here due to the size of the list. The holistic approach used in the European Union gives in-depth information on the various parameters that can be considered, is valid from an international perspective, and is useful for different stakeholders in accident data analyses [12]. The approach in the CADaS system is hence fully active, and the new way of collecting data from the sources and facts makes it easier to move toward analysis.
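To make the structure of a disaggregated CADaS-style record more concrete, the following sketch holds one record as per-category dictionaries keyed by the codes A, R, V and P and checks a few high-importance variables; the variable names and the check are simplified assumptions, not the official CADaS glossary.

```python
# Illustrative (unofficial) in-memory shape for a CADaS-style record:
# one structure per category code, with a simple completeness check.
HIGH_IMPORTANCE = {
    "A": {"date", "time", "weather_condition", "lighting_condition"},
    "R": {"latitude", "longitude", "road_type", "speed_limit"},
    "V": {"traffic_unit_type", "make", "model"},
    "P": {"age", "gender", "road_user_type", "injury_severity"},
}

record = {
    "A": {"date": "2019-06-01", "time": "18:40",
          "weather_condition": "rain", "lighting_condition": "daylight"},
    "R": {"latitude": 48.20, "longitude": 16.37,
          "road_type": "urban", "speed_limit": 50},
    "V": [{"traffic_unit_type": "passenger_car", "make": "XX", "model": "YY"}],
    "P": [{"age": 34, "gender": "M",
           "road_user_type": "driver", "injury_severity": "slight"}],
}

def missing_high_importance(rec):
    """Return the assumed high-importance variables absent from a record."""
    missing = {}
    for cat, required in HIGH_IMPORTANCE.items():
        entries = rec.get(cat, {})
        entries = entries if isinstance(entries, list) else [entries]
        for entry in entries:
            gap = required - entry.keys()
            if gap:
                missing.setdefault(cat, set()).update(gap)
    return missing

print(missing_high_importance(record))   # empty dict when nothing is missing
```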

31.2.2 Data Collection in the USA

The National Highway Traffic Safety Administration (NHTSA) [25] of the USA has identified and considered crashes that involve a vehicle and fixed objects (barricades, road dividers, bridge supports, etc.) on the road, dynamic objects (e.g., other vehicles, moving or parked), and other instances, e.g., slipping on the road, fire, tire bursting, etc. "Accident" is the colloquial term, though crash and accident are often used together; here we discuss the parameters precisely in terms of crashes. The safety council of the USA is responsible for transportation safety practices and reports on road users being prone to accidents, particularly from vehicles like cars and motorcycles, referred to here as light motor vehicles (LMVs), and heavy motor vehicles (HMVs) like trucks and lorries. Road accidents, apart from impacting lives, have other overheads like medical expenses, wage and productivity losses, and property damage, estimated at somewhere around $152 billion, and other tasks such as safety precautions, road maintenance, and repairs also get affected [26]. As per a report by the National Safety Council of the USA [27], it was observed that the accident and fatality count on a scale of 100 million passengers was reduced in 2014 by 30% compared to 0.49 in 2015, specifically for LMVs [28]. In 2015, 38,300 fatalities and 4.4 million injuries were reported. In 2016, a 2% rise in accidents was recorded. There are several possible correlations for this increased count, e.g., an increased number of vehicles and road trips with the development of new roads and flyways, urbanization, etc. The authorities have been actively working on preventive measures for the last several decades and have been actively collecting data to assess crashes and prepare action plans. NHTSA developed a sampling system called the National Automotive Sampling System (NASS) in 1979, with an objective to reduce motor vehicle crashes, injuries, fatalities, and other property damage. NASS was responsible for maintaining all the data collected from motor vehicle crashes in the USA. NASS collected detailed crash data from all kinds of motor vehicle crashes and injuries, which was further used for research and for designing new plans for minimizing accidents. NASS's primary data sources, called primary sampling units (PSUs), were different counties of the states, and the secondary data collection sources were police jurisdictions (PJs); this has remained the case since the induction of the system. In 1980, the system was re-evaluated and new features and data collection programs were appended; hence, two new systems, the NASS General Estimate System (NASS GES) [29] and the NASS Crashworthiness Data System (CDS), were evolved [30].

31.2.2.1 General Estimate System or GES

The General Estimate System or GES [29] is completely based on accident reports collected by the police, also called PARs. GES data are used to present an overview of the crash population and trends and to find out safety problem areas for road users, with a baseline architecture for regulatory and consumer information initiatives. Incomplete PARs are not accepted, and hence there were many underreported crashes. Unlike the Fatality Analysis Reporting System (FARS) [31], GES recorded all accidents that involve loss of property, injury, or any fatality. The provision for adding new data fields to support countermeasures or cite emerging issues was not available in GES. The data fields needed to be strengthened with different information, e.g., inputs related to vehicles, pedestrians, non-motorized transport vehicles, emergency and other specialized vehicles, with detailed information on injuries, severity, and other environment factors. The performance of the existing data collation method also had to improve over time, and hence the system required modernization.

31.2.2.2 Crashworthiness Data System or CDS

CDS or the Crashworthiness Data System is a crash data collection by the Transportation Department of the USA. The CDS system collects detailed information about crashes and organizes the precise data, sampled from police-recorded cases of light motor vehicle crashes [30]. NASS CDS provides an automated, comprehensive national traffic crash database. CDS considers crashes involving LMVs and HMVs with a gross vehicle weight of less than 10,000 lbs and vehicles less than four years old, and only crashes resulting in grievous injuries to a pedestrian or an occupant of a non-motorized or motorized vehicle, with the condition that a vehicle had to be towed due to damage. CDS collects all the information from police records, interviews of the injured and other participants in the crash, analysis of medical reports, interaction with emergency services and doctors, vehicle health, and the driver's background regarding past accidents, health issues, or habits, etc. CDS also captures the safety of road users in several cases, for example, the crash of a heavy vehicle hitting a pole or divider while trying to avoid a cyclist or pedestrian. The vehicle age is collected to ensure that the mechanical health is in order. CDS has been improved with a subset structure of parameters, for example, general vehicle, exterior vehicle, and interior vehicle. All the details collected from occupants are organized as occupant assessment and occupant injury records. With more information coming up in deep detail, and this being an efficient way of solving and understanding crash cases with quality, the system was improved with accident event recording at a wider level instead of limited situational parameters. The Crashworthiness Data System differs from the General Estimate System in several aspects. CDS uses a subset of the information recorded by GES from the accident site and other sources; the subset structure minimizes duplicate data collection work, as the GES system records all information for all kinds of traffic crashes, thus saving cost and time. The CDS system was focused on understanding the crashworthiness of vehicles and the possible consequences for occupants in such crashes. The dependency on collecting information from the GES system makes it compulsory for CDS to record data in the same structure and schema, and the conditions must meet GES compliance; the data recorded cannot be customized for CDS, as it follows the standards of GES. The increase in crashes required a larger sample size; however, the sample size available for analysis was too small to manage the safety issues, and with emerging areas of crashes, modernization of the CDS system was required. To handle the increase in cases, the Crash Report Sampling System (CRSS) was formulated to replace the GES system, and the Crash Investigation Sampling System (CISS) was formulated to replace the CDS system.

31.2.2.3 Crash Report Sampling System (CRSS)

The General Estimate System was replaced in 2016 by the Crash Report Sampling System (CRSS). The CRSS system is an efficient system with enough flexibility regarding demographic features and traffic-related information [30]. CRSS gathers, from police records, all crashes that involve at least one motor vehicle and result in property damage, injury, or death, including crashes involving all types of motorized transport vehicles, pedestrians, and cyclists. The purpose of the system is aligned with understanding the whole crash picture, measuring trends, planning and executing new initiatives, and forming a basis for overall safety regulations. Annually, CRSS obtains around 5–6 million police-reported crashes from across the USA. To design the system, reports are collected from 60 selected areas, chosen so as to cover all parts of the geography in terms of population density, vehicle density, weather conditions, miles driven, etc., and about 120 data elements are coded into a common format. The data received from the police (the police accident report or PAR) are classified into nine parameters for further analyses. The parameter cases are:
1. Fatality or injury of a non-motorist, with unknown injury severity if injured
2. Fatality or injury of a motorcyclist, with unknown injury severity if injured
3. At least one occupant is killed in a vehicle not more than 4 years old
4. At least one occupant is killed in a vehicle more than 4 years old
5. At least one occupant of a vehicle not more than 4 years old suffered injury with unknown severity
6. At least one occupant of a vehicle more than 4 years old suffered injury with unknown severity
7. At least one heavy truck, school bus, transit bus, motor coach, or medium truck with a gross vehicle weight rating of 10,000 lbs or more
8. At least one vehicle (not more than 4 years old) without any injury or fatality in the crash
9. Crashes with only property damage.

The CRSS sample is the result of probability sampling featuring stratification, clustering, and selection with unequal probabilities. It involves probability sampling from sampling units of data collected from counties and police jurisdictions. The CRSS survey system is more efficient, with a sample design independent of GES or any other NHTSA survey. All reports are drawn from CRSS, except for fatality reports, for which FARS is referred to.
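The toy sketch below illustrates the sampling idea behind CRSS: stratify the police-reported crashes and sample each stratum with its own (unequal) inclusion probability, carrying a design weight for estimation. The strata sizes and sampling rates are invented for the example and do not reflect the actual CRSS design.

```python
# Toy stratified sampling with unequal inclusion probabilities (illustrative).
import numpy as np

rng = np.random.default_rng(0)

# crash_id -> stratum (1..9); in CRSS the stratum comes from the PAR contents.
crashes = {f"crash_{i}": int(rng.integers(1, 10)) for i in range(10_000)}

# Higher assumed sampling rates for rarer, more severe strata.
SAMPLING_RATE = {1: 0.60, 2: 0.60, 3: 0.50, 4: 0.40, 5: 0.25,
                 6: 0.25, 7: 0.20, 8: 0.05, 9: 0.02}

sample = [cid for cid, stratum in crashes.items()
          if rng.random() < SAMPLING_RATE[stratum]]

# Each sampled crash carries a design weight = 1 / inclusion probability,
# which is what makes population-level estimates from the sample possible.
weights = {cid: 1.0 / SAMPLING_RATE[crashes[cid]] for cid in sample}
print(len(sample), "crashes sampled out of", len(crashes))
```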

31.2.2.4 Crash Investigation Sampling System (CISS)

CISS or the Crash Investigation Sampling System deals with detailed and precise data from the accident site and any other relevant data connected to the accident, for cases where at least one passenger vehicle in operational traffic is involved. The detailing of the data is stringent enough to collect skid marks, any kind of object or obstacle on the road, problems in the vehicle, and spills of coolant, engine oil, or any other fluid. CISS brings in a pre-crash assessment model to predict and recreate simulations to collect scenarios, and then performs a root cause analysis using the latest techniques and methodologies. With the detailed data collected by CISS, relationships between different variables are established and precautionary measures are planned. This learning and knowledge are used to assess the safety standards for highway roads and the motor vehicles plying on highways.


CISS is focused on passenger vehicles which are not in a condition to be driven and are towed. It collects information right from the crash site and follows a thorough crash site inspection. The system is different from CRSS, as CRSS is focused on the cases reported to the police and its data are always based on police records. Since CRSS is based on police records, it does not need any onsite presence; it manages and handles the large number of cases available with the police and has a lenient response time. CISS has strict response times, as onsite data collection is subject to many other environmental and ecosystem factors that can hamper any available data source.

31.2.2.5 Fatality Analysis Reporting System (FARS)

Another active database system in the USA is the Fatality Analysis Reporting System or FARS. The system was designed and developed in 1975 [30] by the National Center for Statistics and Analysis (NCSA) of NHTSA. The objectives of FARS are aligned toward crashes leading to fatalities, thus measuring all aspects of safety on the road, providing objective evaluation of motor vehicle safety standards, identifying traffic safety problems, and suggesting possible solutions. Data in FARS are derived from fatal traffic crashes in all states, the District of Columbia, and Puerto Rico, where the crash involves a motor vehicle traveling on an operational traffic way and a fatality is reported for at least one person. The details are gathered from police reports, medical and death certificates, CCTV, highway accident data, and other state records, and then coded into FARS forms in a standard format. The data sets are very dynamic and are updated from time to time based on NCSA's analytical data; thus, auxiliary data sets are maintained which take care of much derived information from the existing data sets. The crash data were initially coded in 143 different variables and have been improved continuously, with the parameter count reaching 300 in 2017 [32]. The system has key details of all accidents and related parameters of people, vehicles, etc. The key parameters of the FARS data comprise [33]:
1. Accident-level (crash-level) data
2. Vehicle-level data
3. Person-level data
4. Parked and working vehicle data
5. Alcohol Multiple Imputation for crashes (MIACC) and Alcohol Multiple Imputation for persons (MIPER) data.

A total of around 300 variables are captured against the parameters specified above in the FARS data. At the accident level, it collects 51 variables; at the vehicle level, approximately 104 variables are collected; at the person level, 66 data variables are recorded; parked and working vehicles have 59 data variables; MIACC has 9 variables and MIPER has 12 variables, and the compilation of all this information completes FARS, making the system accurate [30]. Table 31.6 shows some of the vehicle-level data as recorded in FARS, and Table 31.7 shows crash-level information.

Table 31.6 Vehicle-related data in FARS
Vehicle-level information—FARS data: 1. Area of impact, 2. Attempted avoidance maneuver, 3. Body type, 4. Bus use, 5. Cargo body type, 6. Crash type, 7. Contributing circumstance, body type, 8. Device functioning, 9. Emergency use, 10. Extent of damage, 11. Fire occurrence, 12. Critical event—pre-crash category, 13. Critical event—pre-crash event, 14. Hazardous vehicle, 15. Model year, 16. Location of rollover, 17. Most harmful event, 18. Motor carrier identification number, 19. No of occupants, 20. Pre-event movement (prior to accident), 21. Pre-impact location, 22. Gross vehicle weight rating, 23. Pre-impact stability, 24. Registered vehicle owner, 25. Registration state, 26. Related factors, 27. Roadway alignment, 28. Roadway grade, 29. Roadway surface condition, 30. Roadway surface type, 31. Rollover, 32. Sequence of events, 33. Special use, 34. Speed limit, 35. Total lanes in roadway, 36. Traffic control device, 37. Traffic way description, 38. Travel speed, 39. Underride/override, 40. Unit type, 41. Vehicle configuration, 42. Vehicle make, 43. Vehicle model, 44. Vehicle removal

The vehicle-level information also considers variables related to the road and other related parameters: the crash details, collision details, vehicle registration, work zone, and so on, as cited. A sample of the data variables recorded at the vehicle level and the crash level is shown in Tables 31.6 and 31.7. The variables are collected in standardized forms. CISS and CRSS also capture some similar data sets; however, FARS focuses only on fatalities. The detailed approach to data collection and the regular revamping of the data collection systems show the importance given to road accidents and the awareness of the authorities toward controlling them, with a system sufficient for the present scenario and an established futuristic vision. The deep extraction of information from the data gives enough knowledge to be used in planning. The accident data captured are not the same as those captured for Europe, due to different rules, regulations, and procedures.

Table 31.7 Crash-level data in FARS
Crash-level information—FARS data: 45. Arrival time EMS, 46. No of non-motor vehicle occupant forms submitted, 47. Atmospheric conditions, 48. No of motor vehicle occupant forms submitted, 49. State, 50. No of vehicle forms submitted, 51. City, 52. Rail grade crossing identifier, 53. County, 54. EMS time at hospital, 55. Special jurisdiction, 56. Notification time EMS, 57. Crash date, 58. Related factors—crash level, 59. Crash events, 60. Relation to junction, 61. Crash time, 62. Roadway function class, 63. Mile point, 64. Route signing, 65. First harmful event, 66. School bus-related, 67. Global position, 68. Traffic way identifier, 69. Light condition, 70. Work zone, 71. Manner of collision, 72. National highway system
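For readers who want to explore FARS directly, a hedged sketch of joining two of its analytic tables with pandas is shown below; the file names and the ST_CASE crash identifier reflect the commonly distributed FARS CSV release and should be treated as assumptions rather than part of this chapter.

```python
# Hedged sketch: joining FARS crash-level and person-level tables with pandas.
import pandas as pd

# Assumed file layout of a yearly FARS CSV release.
accident = pd.read_csv("FARS2017/accident.csv")   # crash-level variables
person = pd.read_csv("FARS2017/person.csv")       # person-level variables

# Persons involved per fatal crash, joined back onto the crash-level record
# via the assumed ST_CASE crash identifier.
persons_per_crash = (person.groupby("ST_CASE").size()
                     .rename("n_persons").reset_index())
crashes = accident.merge(persons_per_crash, on="ST_CASE", how="left")

print(crashes["n_persons"].describe())
```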

31.2.3 Data Collection in India

31.2.3.1 Safety Scenario

India has a larger share of injuries and fatalities among the low- and middle-income countries [34]. The road network of India spans 5,231,922 km [35] as on March 31, 2013, and contributes 4.8% to India's GDP [36]. A summary of the accident data and the impact of accidents is given in Table 31.8. The National Crime Records Bureau of India reports that in 2015 about 1.49 lakh fatalities were recorded, a rate of eleven deaths per one lakh population, whereas the rates in many other countries tend to be around three to four deaths per one lakh [4]. Some states have also improved by managing fatalities. Figure 31.2 shows a comparison of fatal accident metrics. Tamil Nadu, Himachal Pradesh, Karnataka, and Andhra Pradesh show an increase in accidents over the period; industrialization, and tourism in the case of Himachal Pradesh, are possible reasons. Delhi has reported a decline in the number of accidents [37]. The report shows a 3.1% increase in 2015 compared to 2014. Over 1979–2009, India witnessed a compound annual growth of 3.8% in road accidents, with a 5.7% increase in fatalities and 5.2% in injuries. These data are given in Tables 31.8 and 31.9. The diversity in population density, demographics, and other factors possibly contributes to traffic accidents. The country has embarked on an increasing trend in accidents as well as fatalities. The road traffic system and its operations have deficiencies, and the accident records are deficient in providing clear clues about accident details.

Table 31.8 Summary of accidents with injury and fatality
S. no | Year | Road accidents (in thousand) | Injuries (in thousand) | Fatalities (in thousand) | No. of vehicles (in million)
1 | 2011 | 440.1 | 468.8 | 136.8 | 141.86
2 | 2012 | 440.0 | 469.9 | 139.0 | 159.49
3 | 2013 | 443.0 | 469.9 | 137.4 | 182.44
4 | 2014 | 450.9 | 477.7 | 141.5 | 182.45*
5 | 2015 | 464.6 | 482.3 | 148.7 | 182.45*
*Figures of the year 2013 used due to non-availability of data

Fig. 31.2 Number of fatalities per 100,000 persons in India


Table 31.9 Growth in number of vehicles and road accidents in India (1970–2009) [38]
Year | GDP (INR in 10 M) | Road crashes ('000) | Fatalities ('000) | Persons injured ('000) | Registered vehicles ('000) | Road length ('000 km) | Population ('000) | Fatalities per 10,000 vehicles
1970 | 474,131 | 114.1 | 14.5 | 70.1 | 1401 | 1188.7 | 539,000 | 103
1980 | 641,921 | 153.2 | 24.6 | 109.1 | 4521 | 1491.9 | 673,000 | 54
1990 | 1,083,572 | 282.6 | 54.1 | 244.1 | 19,152 | 1983.9 | 835,000 | 28
2000 | 1,864,773 | 391.4 | 78.9 | 399.3 | 48,857 | 3316.1 | 1,014,825 | 16
2006 | 2,848,157 | 460.9 | 105.8 | 496.5 | 89,618 | 3880.6 | 1,112,186 | 12
2009 | 4,464,081 | 486.4 | 125.7 | 515.5 | 114,951 | 4120.0* | 1,160,813 | 11
Source: Reserve Bank of India and Ministry of Shipping, Road Transport and Highways, GoI
*2009 road length is estimated

As per statistics from the road transport ministry of the Government of India [39], for the period from 2011 to 2017, as given in Fig. 31.3, an average of 487 K accidents were reported, with the highest count in 2015. From 2015 to 2018, intermittent variation is observed; in the period from 2015 to 2017 there is a decline in total accidents, yet the number of deaths was highest in 2016 with 150.8 K, followed by 2017 with 147.9 K deaths, as shown in Fig. 31.4. This can be interpreted as follows: the number of accidents reported is lower due to better infrastructure and safety measures, but the severity of those accidents may have been much higher, leading to more deaths. The hypothesis that can be derived is that the improved infrastructure, e.g., roads, lighting, etc., has played some role in severe accidents.

Fig. 31.3 Total road accidents during 2011–2017

Fig. 31.4 Total road deaths during 2011–2017

31.2.3.2 Accident Variables

The ways of accident data collection have evolved over time. These methods did not follow a similar approach but converge on the same set of variables to be collected. The primary source of accident data, as in other countries, is the police. The police file a first information report (FIR) that comprises all the data related to the road accident, considering all factors concerning the road, the vehicle, and the victims. Some of the recorded data depend on emergency services, ambulances, trauma centers, hospitals, victims, CCTV footage, other witnesses, and visual inspection. Many accidents also go unreported, so no data are available for such cases and the figures can only be estimated. The operational procedure in India involves the emergency services of the police, acting on information or escalation about an incident from a victim, a passerby, or a hospital or ambulance service. In the past, the primary data collection had localized formats that varied from state to state and differed even within a state. The local language was used, and hence it was difficult to analyze the historical data. The governments of states and union territories have taken initiatives to identify a common set of data to be recorded and adopted a newly designed Accident Recording Format in 2017. The new format was based on analysis of historically recorded data and identifies parameters that include, but are not limited to, date and time, location, victim(s) and other vehicle/accused details, age and gender, injury severity, fatality, and other details like road and lighting conditions. The hospital emergency or trauma center collects other related information such as blood alcohol, preexisting diseases if any, injury impact, and other health conditions like blood pressure and pulse. Based on the accident details and the patient's/victim's condition, the hospital's data are used for evaluating injury severity and possible causes of fatality; once those data are available, the case is placed in the appropriate injury category. The Ministry of Road Transport and Highways of the Government of India has established a format for recording accidents [40]. The information is recorded in the Road Accident Reporting Form, which has more than 50 parameters covering all perspectives of road accident data. The new system records details of timings, location description with the nearest police station, details of the impacting vehicle, the victim and the infrastructure, and road and environment factors. The parameters are collected in different segments as:

1. Accident details
2. Details of road infrastructure factors
3. Details of vehicles, injuries, and other information
4. Details of the impacting vehicle and driver, and of persons impacted other than occupants.

All the variables are recorded by police, as shown in Table 31.10. The data attributes recorded by police for a road accident are segmented into accident-, road-, vehicle-, and victim-related entities; around 60 parameters are recorded. The process of collecting these detailed variables has evolved over time and now provides better insight. Police officers are required to report these key factors in FIRs. The factors are localized to the Indian ecosystem. The accident type indicates whether the accident caused a fatality, a grievous injury, or a minor injury; no-injury cases are also captured. The total numbers of fatalities and injuries are also collected. Accidents also result in property damage, which is recorded as damage to public, private, or other property. Weather conditions such as rain, heavy dust storms, and fog are among the major causes, as they affect traffic flow. A collision can be caused by a static pole or tree, by animals, or by a pedestrian. Collisions with other vehicles are recorded along a limited set of dimensions (collision from the back, side, or front) together with other cases such as running off the road, overturning, skidding, and collision while parked.

Table 31.10 Details of accident variables recorded in accidents

Accident data: 1. Name of place; 2. Police station; 3. District; 4. State; 5. Date and time of accident; 6. Holiday status; 7. Type of area; 8. Accident type; 9. No. of fatalities; 10. No. of persons grievously injured; 11. No. of persons with minor injuries; 12. Property damage; 13. No. of motorized vehicles involved; 14. No. of non-motorized vehicles involved; 15. No. of pedestrians involved; 16. Type of weather; 17. Hit and run; 18. Type of collision; 19. Accident severity.

Road-related data: 20. Road name; 21. Road number; 22. Landmark; 23. Road chainage; 24. GPS; 25. Lanes; 26. Surface condition; 27. Road type; 28. Physical divider; 29. Ongoing road works/construction; 30. Speed limit; 31. Visibility (at the time of accident); 32. Accident spot; 33. Road features; 34. Road junction; 35. Type of traffic control (if accident is at a junction); 36. Pedestrian involved.

Vehicle data: 37. Vehicle registration no. (all involved vehicles); 38. Vehicle type; 39. Age of vehicle; 40. Maneuvering at crash; 41. Load condition; 42. Disposition; 43. Mechanical failure, hazardous cargo, fire, other; 44. Impacted vehicle (count, type); 45. Vehicle model and make.

Victim data: 46. Victim count; 47. Occupant vehicle; 48. Impacting vehicle; 49. Age; 50. Gender; 51. Seating position; 52. License no.; 53. License type; 54. Using requisite safety; 55. Injury or fatality; 56. Severity; 57. Preexisting injury; 58. Number of days in hospital; 59. Others (pedestrian, cyclist); 60. Count; 61. Age; 62. Gender; 63. Type of injury; 64. Safety device.


The road-related variables are centered on parameters such as the road name and number where applicable; the landmark and the chainage of the link are also important parameters. GPS latitude and longitude are the keys to all these accident locations. The road type classifies the road as rural, urban, highway, or another type, along with the number of lanes. Ongoing road construction without proper signals or lights, together with over-speeding and wrong driving practices, causes accidents resulting in fatalities and grievous injuries. Data about speed signage are collected along with vehicle speeds, which helps in understanding patterns and delineating accident-prone zones. The system also collects environmental factors such as visibility at the time of the accident. The spot of the accident describes the locality (market, residential, commercial, or institutional area) and road features such as potholes, bridges, curved roads, and straight roads. This locality information helps in classifying the accident parameters properly. Traffic flow and controls are also important parameters, as road and traffic flow are not always organized. Although signals are automated at most junctions, the situation differs from location to location, and sometimes the police control traffic. Vehicle details comprise the different types of vehicle involved in the accident. In hit-and-run accidents the impacting vehicle is unknown. The health of a vehicle depends on the time since its registration and the mileage it has covered; poor vehicle health may lead to mechanical failures. The accident also depends on the driver, in particular on age, experience, and any record of past offenses. The driver's license answers most of these questions; other factors must still be identified, such as whether proper safety measures like seat belts and helmets were used, whether the driver was using a mobile phone, driving drunk, driving on the wrong side, over-speeding, or jumping signals, or whether the driver was impacted for other reasons, e.g., weather conditions causing static objects to fall, animals coming in front of the vehicle, or the sudden appearance of a pedestrian or another vehicle. All the recorded parameters are used to extract several different types of reports, which are further used for analysis to gain a deeper understanding of root causes and to design preventive measures.
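For illustration, the following is a minimal sketch of how a subset of these recorded variables might be represented programmatically, using hypothetical field names loosely based on Table 31.10; the official Road Accident Reporting Form defines its own, much larger schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AccidentRecord:
    # Accident data (subset of Table 31.10)
    place: str
    police_station: str
    district: str
    state: str
    occurred_at: datetime
    accident_type: str            # fatal / grievous / minor / no injury
    fatalities: int
    grievous_injuries: int
    minor_injuries: int
    hit_and_run: bool
    collision_type: str           # back / side / front / overturning / ...
    # Road-related data
    road_name: str
    road_type: str                # rural / urban / highway / other
    surface_condition: str
    visibility: str
    gps_lat: Optional[float] = None
    gps_lon: Optional[float] = None


# Hypothetical example record
record = AccidentRecord(
    place="NH-48 km 112", police_station="Example PS", district="Example",
    state="Example State", occurred_at=datetime(2017, 6, 1, 22, 30),
    accident_type="grievous", fatalities=0, grievous_injuries=2,
    minor_injuries=1, hit_and_run=False, collision_type="front",
    road_name="NH-48", road_type="highway", surface_condition="wet",
    visibility="poor",
)
```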

31.3 Methodology and Analysis

The study is aimed at understanding the data variables recorded after an accident occurs, collected mostly from the accident site along with other relevant details from stakeholders such as emergency services, medical care, and CCTV. On a broader perspective, the variables in the USA, the EU, and India are always classified into segments of accident-level data, vehicle-related data, person-related data, and road- and environment-related data. These four sets are identified as common heads in these countries, and within them various variables are collected.


The study follows a qualitative approach, based on an understanding of how the procedures for recording variables in different countries have evolved over time. The evolution occurred because existing systems were unable to accommodate different cases of accident parameters. The literature on the data systems followed in the European Union, the USA, and India is presented together with the existing accident recording systems and deliberations on their parameters. A quantitative assessment and hypothesis-based dependency analysis cannot be performed, as the recorded variables follow the same semantics but are presented in different formulations. Changing trends in road accidents, driven by factors such as environment, technology, and infrastructure, changed the character of road transportation, which in turn affected road accidents and motivated the identification of techniques and processes for recording them.

The EU countries originally followed their own structures, processes, and mechanisms for recording accidents. Over time, the need for a standard schema for recording accident variables was recognized, which is reflected in the migration from CARE to CADaS in the EU and in GES, CISS, CDS, CRSS, and FARS in the USA [41]. Much data went missing during capture by the countries, and other field-level issues arose, which led to the migration to a new system [42]. The new system, CADaS, contains all possible values considering the requirements of the EU countries. The variables are aligned to provide the information required to derive relative information; however, information such as vehicle health and driver health could have been added for better collection and for understanding driver behavior. The way accident rates are compared across consecutive years should also consider the increase in population density, the increase in vehicle density, new roads and speed limits, and any other relevant inputs. The FARS system was engineered after merging with the GES system and has evolved as a standard system in the USA over time; it therefore has a strongly validated schema structure that can channel information at various levels. In comparison with the systems followed in the European Union, the FARS database manages all information at various levels of filtering. With the large categorization of variables in FARS, suited to the scale of the USA, both FARS and CADaS have their own relevance. Most of the accident-, vehicle-, and victim-related data are similar, but further generalization is aligned with other cases of accidents.

Unlike the systems established in European countries and the USA, where data collection is organized according to the situation of the respective countries, the recording of data in India followed the approach of collecting and signing off FIR reports. The FIR data have a standard structure, and relevant variables are extracted from the FIR files. The record-keeping system has been digitized, and the variable collection system has been organized so as to collect segregated data right at the site. The data collected are aligned with a standard set of data variables, viz. accident-related parameters containing all accident-related details, road and environment details, vehicle-related parameters, and victim- and pedestrian-related details.

The systems established in the European Union and the USA have evolved over time, learning from existing systems and adding new functions to those parameters. The systems are now stabilized, though they are flexible enough to capture


any new accident-related parameter that may need to be brought in. All of these countries have established systems based on requirements identified after proper field-level research and validated against both present and past information. However, provisions for possible future variables are not considered or explained in these systems. The variables and structures are strict in nature, and adding such provisions is a continuous effort. Upcoming innovations in the transportation sector, with new kinds of roads, flyways, and vehicles, along with other kinds of accidents and exceptional cases that could cause huge losses if they occur, should also be provisioned for.

India, on the other hand, has established a system fitted to its environment and covering all the possible and significant values. On one side, it can be commented that India does not record accident parameters intensively after an accident; this can be explained on the grounds that the country has a very high and growing population density, and the police department, which is primarily responsible for data collection, must collect the maximum data in a limited span of time. Even so, with these collected details, the police and research departments are able to draw significant conclusions. On the other side, it can also be argued that the parameters are comprehensive considering the diversity of the different states in India, and are in parallel aligned with the other countries referred to in this paper. The system is used uniformly across the country and is now designed to extract knowledge from the data, which informs different preventive measures.

The semantic validation in this observational study can only be done by manual observation, as the parametric differences may vary across accidents recorded in different regions. With a greater number of variables recorded at the time of an accident, the system becomes more prone to human error and missing values [42]. Flexibility in accommodating new variables on a case-to-case basis broadens the system to the point that, in analyzing accident patterns within its own attributes, it becomes difficult to derive calculations, because an attribute available in one case may not be available in another, and the behavior can be sparse over shorter periods. Hypotheses could be established based on a review of actual data; in this case, however, the study is limited to the observation and study of accident parameters. In the case of India, the FIR parameters were initially kept as manual textual records, later organized by extraction, and are now digitized. The limitation discussed earlier remains: a minimal data set may be incapable of covering unforeseen situations arising from changes in various dependency factors. The study of these variables in different countries can be used to identify the variables that are more strongly associated with the causes of accidents. However, unlike the USA and the European Union, the Indian system misses some parameters, e.g., pre-impact information about road factors, traffic flow, and vehicle factors such as tire bursting, roadway grade, type of road, and concrete roads. Nevertheless, all the parameters more or less provide the same inputs, some in a more detailed and precise manner. These similarities establish a direct relationship in accident patterns, with many dissimilarities due to different segments of vehicles, road users, road types, environmental conditions, etc.


The effectiveness of data variable collection and of interventions has to be enforced strictly in a multi-disciplinary way that includes, at different steps, strict enforcement, engineering, awareness, education, and other psychological approaches. With accident data available, road safety interventions must address not only the sustainability of the outcomes but also the cost-effectiveness of implementing and maintaining them. Given the large number of recorded entities and attributes, the database needs to be analyzed to identify the key factors that are most responsible compared with others, so that they can be controlled to minimize crash cases. Techniques such as principal component analysis (PCA) can be applied to these variables with their collected values, as sketched below. Errors and missing information [42] can likewise be handled properly; the more data and variables available, the better the performance of any kind of predictive approach.
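As an illustration of the PCA suggestion above, the following is a minimal sketch assuming a hypothetical CSV file and column names; real accident records would first need cleaning and numeric encoding of categorical fields such as weather and road type.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical file and numeric columns drawn loosely from Table 31.10.
df = pd.read_csv("accident_records.csv")
numeric_cols = ["fatalities", "grievous_injuries", "minor_injuries",
                "vehicles_involved", "speed_limit", "vehicle_age"]
X = df[numeric_cols].dropna()

# Standardize so each variable contributes on a comparable scale,
# then keep enough components to explain ~90% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.90)
components = pca.fit_transform(X_scaled)

print("Components retained:", pca.n_components_)
print("Explained variance ratio:", pca.explained_variance_ratio_)
# The loadings indicate which recorded variables dominate each component.
print(pd.DataFrame(pca.components_, columns=numeric_cols))
```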

31.4 Conclusion

To ensure a safer system for road users, preventive measures have to be designed, and these precautionary and preventive plans are based on accident data, which makes such data the backbone of any country's road safety system. The study presents facts and figures about the entities captured in road accidents in the selected countries, articulated in a prospective observational approach, and is intended to understand the standard variables and their variations across countries that differ geographically, culturally, and economically. The study explains the global impact of road traffic accidents, which cause injuries and fatalities to road users and other wastage of resources. Road safety data help in understanding trends based on the values collected under these variables over time, which may be explained in terms of various parameters. The study elaborates the importance of data from accident sites and of available accident data, and provides a perspective on the importance of structured data for managing all sets of information. It explains the scenarios of the European Union, the USA, and India regarding their systems of accident data collection and management, with their structured architectures for the various kinds of data sets recorded and organized from accident sites and integrated with other respective departments such as hospitals, highways, and roads, and briefly describes the structured systems established after their research for managing and storing information in database systems. The prospective observational approach shows that most of the parameters in all these countries are broadly similar. The Indian parameters for vehicle, accident, and victim data are far fewer and collect limited information when compared with those in CADaS and FARS, as shown in Tables 31.3, 31.4, 31.5, and 31.7, respectively. These two systems are intensive in their metadata. The system in India needs more enrichment to cover more data and thus derive more information. Some of the standard variables can be taken from the CADaS and FARS systems; however, the significance of the differences between these variables also needs to be justified.


The semantic validation in the observational study can be done by manual observation, as the parametric differences may vary across accidents recorded in different regions. The study of these countries shows significant changes in their systems over time, driven by genuine requirements and leading to a large number of bifurcations in the variables. A large collection of variables recorded at the accident site definitely empowers the system; however, any human error or missing data will also affect it [42]. The flexibility should also be restricted: in the case of the EU, the countries can modify the standard CADaS structure, which affects the central system even if it is otherwise well organized. Such needs are valid for recording more specific and detailed information from the accident site, but flexibility should be allowed only to a certain extent. The FIR system in India follows a subjective approach, with notes entered based on the information available at the accident site and later extracted to derive the right variables. Such systems are also prone to errors, mostly due to failure to record all the facts and figures and to losses at the time of information extraction. The digitization and restructuring of crash recording forms are in place, designed on the basis of standard parameters recorded in the past, evaluated, and finalized considering different regions. There are several other factors missed by the countries that are accountable for road accidents and can be treated as preventive check-ups, e.g., tire pressure, alcohol limit, mechanical health of the vehicle, use of a mobile phone, and seat belt use. The analysis and study can be used for better planning toward understanding the root causes of road accidents and minimizing them. It is also concluded that several possible pieces of information that could be collected from the accident site are not recorded, and hence many of these findings are limited in nature. In countries with effective data recording mechanisms, several algorithms and technologies can be developed, pre-incident systems can be created, and post-incident trauma care can be planned effectively. In the case of India, the FIR parameters were initially kept as manual textual records, later organized by extraction, and are now digitized. As discussed, the limitation remains that a minimal data set may be incapable of covering unforeseen situations arising from changes in various dependency factors. With innovations in the transportation sector, and new kinds of roads, flyways, and vehicles coming into place, the system for managing accident data needs continuous improvement too. Finally, in planning and designing a safety framework for road users, understanding the accident data, managing accidents, and deriving the respective information from the collected data are of special importance. The study of these variables in different countries can be used to identify the variables that are more strongly associated with the causes of accidents and can be useful for pre-crash monitoring systems, prevention strategies, and post-crash support such as trauma centers in a cost-effective and sustainable way.


References 1. World Health Organization (2018) Global status report on road safety 2013: supporting a decade of action. World Health Organization 2013 [Online] http://www.etsi.org. Accessed: 17 Nov 2018 2. Global Plan for the Decade of Action for Road Safety, 2011–2020, https://www.who.int/ roadsafety/decade_of_action/en/ 3. Fogue M, Garrido P, Martinez FJ, Cano J-C, Calafte CT (2013) A novel approach for traffic accidents sanitary resource allocation based on multi-objective genetic algorithms. Expert Syst Appl 40(1):323–336 4. National Crime Records Bureau (NCRB) (2015) Accidental deaths and suicides in India 2014, National Crime Records Bureau, Ministry of Home Affairs, New Delhi, https://ncrb.gov.in 5. CARE: European Road Accident Database, http://ec.europa.eu/transport/road_safety/ specialist/statistics/index_en.htm 6. Department of Transportation Reports, USA, https://www.transportation.gov/ 7. Global Burden of Disease Study, https://www.thelancet.com/gbd 8. International Traffic Safety Data and Analysis Group (IRTAD) (2014) Annual report. OECD International Transport Forum, Paris, p 526 9. OECD/ITF Road Safety Annual Report (2016). OECD Publishing, Paris 10. Jha N, Srinivasa DK, Roy G, Jagdish SJ (2014) Epidemiological study of road traffic accident cases: a study from south India. Ind J Commun Med, Mar 2014 11. Sleet DA, Baldwin G, Dellinger A, Dinh-Zarr B (2011) The decade of action for global road safety. J Saf Res 42:147–148 12. Organisation for Economic Co-operation and Development (OECD) (2000) International seminar on road traffic and accident data needs for the new century. OECD Seminar, Vienna 13. Bliss T, Breen J (2012) Meeting the management challenges of the decade of action for road safety. IATSS Res 35:48–55 14. Mohan D, Tsimhoni O, Sivak M, Flannagan MJ (2009) Road safety in India: challenges and opportunities. The University of Michigan Transportation Research Institute, Ann Arbor, MI, pp 1–57 15. Montella A, Chiaradonna S, Criscuolo G, Martino SDJ (2017) Development and evaluation of a web-based software for crash data collection, processing and analysis. Accid Anal Prev, Jan 2017 16. Wegman F (2013) Road safety in India: a system approach SWOV Institute for road safety research. Delft University of Technology, The Netherlands; 5th TRIPP Annual Lecture, TRIPP IIT Delhi, New Delhi, India, RP13-01, https://repository.tudelft.nl/islandora/object/ uuid:07131621-fde2-4359-84e4-b3425d90c68f/datastream/OBJ/download 17. Road Traffic Injuries (2018) https://www.who.int/news-room/fact-sheets/detail/road-trafficinjuries, Dec 2018 18. Chokotho LC, Matzopoulos R, Myers JE (2013) Assessing quality of existing data sources on road traffic injuries (RTIs) and their utility in informing injury prevention in the western cape province, South Africa. Traffic Inj Prev 14:267–273 19. CARE: Community Road Accident Database, http://ec.europa.eu/idabc/en/document/2281/ 5926.html 20. Eurostat: Statistical Office of the European Communities (2012) Transport statistics in the European Union, https://ec.europa.eu/eurostat 21. Frantzeskakis J, Yannis G, Handanos J (1998) Road safety in the urban network and access management. In: Proceedings of the second pan-hellenic road safety conference, University of Thessalia—Technical Chamber of Greece, Volos, May 1998, pp 347–358 22. Yannis G, Evgenikos P, Chaziris A (2009) CADaS—a common road accident data framework in Europe. In: IRTAD conference 2009. Korea 23. De Meester D (2011) Recommendation for a common accident data set. 
Reference guide, version 3.11


24. Evgenikos P (2009) Road safety data: collection and for targeting and monitoring performance and progress. In: 4th IRTAD conference, Seoul, Sep 2009 25. National Highway Traffic Safety Administration (NHTSA), https://www.nhtsa.gov/ 26. Injury facts: deaths by transportation modes, https://injuryfacts.nsc.org/home-and-community/ safety-topics/deaths-by-transportation-mode/ 27. National Safety Council of USA, https://www.nsc.org/ 28. Traffic Safety Facts Annual Report Tables, NHTSA, https://cdan.nhtsa.gov/tsftables/tsfar.htm 29. National Automotive Sampling System General Estimates System (NASS GES), https://www. nhtsa.gov/research-data/national-automotive-sampling-system-nass 30. National Center for Statistics and Analysis (NCSA) Publication of United States https:// crashstats.nhtsa.dot.gov 31. Fatality Analysis Reporting System (FARS), https://www.nhtsa.gov/research-data/fatalityanalysis-reporting-system-fars 32. Chen C, Subramanian R, Zhang F, Noh EY (2015) Proceedings of the 2015 federal committee on statistical methodology (FCSM) research conference. New Jersey 33. NHTSA’s Fatality Analysis Reporting System (FARS) Data, https://www.nber.org/data/fars. html 34. Statistical Year Book of India, http://mospi.nic.in/sites/default/files/Statistical_year_book_ india_chapters/ACCIDENTS.pdf 35. Basic Road Statistics of India, 2012–13, TRW, MoRT&H, Government of India, http://www. indiaenvironmentportal.org.in/files/file/basic%20road%20statistics%20of%20india.pdf 36. Road Transport Year Book, 2012–13, TRW, MoRT&H, Government of India, http://morth. nic.in/ 37. Road Safety in Delhi Issues and Solutions, http://www.cseindia.org/userfiles/Anil% 20Shukla,%20Delhi%20Traffic%20Police.pdf 38. Reserve Bank of India, Government of India. Macro-economic aggregates (at constant prices) in India. Available from: http://www.indiastat.com 39. Ministry of Road Transport, Government of India, http://morth-roadsafety.nic.in/report.asp 40. Accident recording format, Ministry of Road Transport and Highways, Government of India, http://pib.nic.in/newsite/PrintRelease.aspx?relid=158600 41. European Commission (2008) Directive 2008/96/EC of the European Parliament and of the Council of 19 November 2008 on road infrastructure safety management. Off J Eur Union L. 319/59. Iowa Department of Public Safety, 2016. Highway 42. Jha AN, Chatterjee N, Tiwar G (2018) Data recording patterns and missing data in road crashes: case study of five Indian cities. In: 15th world conference on transport research (unpublished)

Chapter 32

Improving the QFD Methodology
Yury S. Klochkov, Albina Gazizulina, and Maria Ostapenko

32.1 Introduction

The article discusses theoretical provisions and practical recommendations for the implementation, development, and use of QFD analysis to improve the competitiveness of an organization and customer satisfaction, and explores the application of the Weber–Fechner law. Based on an analysis of the theoretical and practical works of [1–8] and other authors, we can conclude that a lot of work has been devoted to the problems of consumer satisfaction research. However, most of the research concerns processes that ensure the quality of the products produced at each stage of the life cycle, so as to guarantee a result that meets the requirements and expectations of the consumer. Less attention is paid to issues of the competitiveness of organizations, which is currently the most urgent problem for domestic manufacturers [9]. A universal tool for product development is QFD analysis, which forms a continuous information flow, ensuring that all elements of the production system are interconnected and subordinated to consumer requirements [10–18]. From the point of view of the organization's management, the introduction of QFD analysis is seen as an improvement in the design, technology, or process in order to save costs, improve quality, and achieve other strategic goals that ensure the competitiveness of the organization [19].

Y. S. Klochkov (B) · A. Gazizulina Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia e-mail: [email protected] A. Gazizulina e-mail: [email protected] M. Ostapenko Tyumen Industrial University, Tyumen, Russia e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_32


The disadvantage of the existing approach is that the method of calculating the significance of product characteristics does not take into account the degree to which the organization lags behind its competitors.

32.2 Key Research Findings

32.2.1 The Method of Calculating the Importance of Product Characteristics, Which Allows Taking into Account the Result of the Comparison of Products with Competitors

At present, the calculated weight of the characteristics of the design, technology, or process is key in determining the path of improvement, i.e., in increasing the competitiveness of products. Since under competition there are several organizations in the market, it is necessary to take into account the degree of fulfillment (satisfaction) of customer requirements (competitive advantages) not only for the organization in question but also for all competitors. Therefore, the model for calculating the weight of product characteristics can be represented as

B = f(I_x; R_x; P_x; C_x; K_x),  (32.1)

where I_x is the relationship between the requirement and the product characteristics (1; 3; 9), R_x is the absolute weight of the requirement, P_x is the difference between the target and the obtained values of the customer satisfaction level (or the difference between the values of the competitor and the organization), taken into account only if the result of the calculation is non-negative, K_x is the evaluation of the implementation of the requirements by competitors, and C_x is the target value of the degree of implementation of customer requirements [20–30].

In the classical model for calculating the weight of characteristics of a structure, technology, or process, the position on the market is not taken into account. That is, it is necessary not only to fulfill the requirements of the consumer but also to focus on competitors. Thus, we need to expand the list of components of product competitiveness when analyzing the consumer assessment of products manufactured by an enterprise and by its competitors [31]. We need to specify the degree of implementation of each requirement depending on the weight of its importance, and then the revised calculation of the weight of characteristics of a design, technology, or process is as follows:

B1_prod.cr. = Σ(I_x · R_x) + Σ[(P_x/C_x) · (I_x · R_x)],  (32.2)


where I_x is the relationship between the requirement and the product characteristics (1; 3; 9), R_x is the absolute weight of the requirement, P_x is the difference between the target and obtained values of the customer satisfaction level (or the difference between the values of a competitor and the organization in question), taken into account only if the result of the calculation is non-negative, and C_x is the target value of the degree of implementation of the consumer's requirement. By comparing the estimates of the degree of satisfaction of the requirements by the new products and by the products of competitors on the market, it is possible to identify ways to improve the designed products. The calculation of the weight of characteristics of the design, technology, or process, taking into account the difference between the target and the obtained values of the level of customer satisfaction (P_x + 1), is as follows:

B2_prod.cr. = Σ(I_x · R_x)(P_x + 1),  (32.3)

where I_x is the relationship between the requirement and the product characteristics (1; 3; 9), R_x is the absolute weight of the requirement, and P_x is the difference between the target and obtained values of the customer satisfaction level (or the difference between the values of a competitor and the organization), taken into account only if the result of the calculation is non-negative. A potential problem of QFD, and a high probability of losing a consumer, is that factors of product competitiveness such as competitors' activities (new products), new technologies, tools, or materials are not taken into account. Thus, an exclusive orientation of products toward the wishes of consumers is not optimal when the quality problem is considered in a holistic way. The weight of characteristics of the design, technology, or process, taking into account the difference between the target and the obtained values of the level of customer satisfaction, can also be calculated using another dependency:

B3_prod.cr. = Σ(R_x + P_x) · I_x,  (32.4)

where I_x is the relationship between the requirement and the product characteristics (1; 3; 9), R_x is the absolute weight of the requirement, and P_x is the difference between the target and obtained values of the customer satisfaction level (or the difference between the values of a competitor and the organization), taken into account only if the result of the calculation is non-negative. Since the orientation toward competitors appears in the chain of improvement of a design, technology, or process, the model for calculating the weight of characteristics of a design, technology, or production process can be represented as in Fig. 32.1. The proposed QFD model makes it possible to highlight the strengths of competitors' products and, after analyzing them, to offer even stronger technical solutions as an alternative. QFD allows a focus on the best producers of specific products in the industry (the market leaders). The method makes it possible to identify their technical and technological advantages and, on that basis, to develop its own, even stronger,


Fig. 32.1 Competitive QFD model

Table 32.1 Recommendations on the choice of dependencies for calculating the weight of product characteristics with regard to competition

Level of competition | Recommended dependency
Low | B3_prod.cr. = Σ(R_x + P_x) · I_x
Medium | B1_prod.cr. = Σ(I_x · R_x) + Σ[(P_x/C_x) · (I_x · R_x)]
High | B2_prod.cr. = Σ(I_x · R_x)(P_x + 1)

technical solutions for its products, which will absorb all the requirements of potential customers. Recommendations on the choice of dependencies for calculating the weight of product characteristics with regard to competition are presented in Table 32.1; a computational sketch of these dependencies is given below. The basic model of the house of quality may include a focus on market position and thus take into account the importance of meeting one or another requirement relative to competitors. Thus, we obtain additional information that is unknown to the consumer but must be implemented to improve competitiveness [32–39]. For the manufacture of technically advanced products that satisfy both the manufacturer and the consumer, it is necessary to take into account all the presented requirements, both for the product itself and for the processes of its production and assembly. This increases the accuracy of the formation of product requirements, provides the optimal combination of optimization parameters for the designed products, and increases their reliability and efficiency.
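For illustration, the following is a minimal sketch of dependencies (32.2)–(32.4) and the selection rule of Table 32.1, using hypothetical requirement data; the relationship values I_x, requirement weights R_x, satisfaction gaps P_x, and targets C_x would come from an actual house of quality.

```python
def b1(I, R, P, C):
    """Dependency (32.2), recommended for a medium level of competition."""
    return (sum(i * r for i, r in zip(I, R))
            + sum((p / c) * i * r for i, r, p, c in zip(I, R, P, C)))


def b2(I, R, P):
    """Dependency (32.3), recommended for a high level of competition."""
    return sum(i * r * (p + 1) for i, r, p in zip(I, R, P))


def b3(I, R, P):
    """Dependency (32.4), recommended for a low level of competition."""
    return sum((r + p) * i for i, r, p in zip(I, R, P))


def characteristic_weight(level, I, R, P, C):
    """Pick the dependency recommended in Table 32.1 for the competition level."""
    P = [max(p, 0.0) for p in P]  # only non-negative gaps are taken into account
    if level == "low":
        return b3(I, R, P)
    if level == "medium":
        return b1(I, R, P, C)
    return b2(I, R, P)            # "high"


# Hypothetical data for one product characteristic and four customer requirements
I = [9, 3, 1, 9]    # relationship strengths between requirement and characteristic
R = [10, 8, 9, 9]   # absolute weights (importance) of the requirements
P = [1, 0, 2, 1]    # competitor-vs-organization satisfaction gaps
C = [5, 5, 5, 5]    # target satisfaction levels

print(characteristic_weight("medium", I, R, P, C))
```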

32.2.2 Method of Calculating the Significance of Product Characteristics

The method of calculating the significance of product characteristics based on the Weber–Fechner law provides an algorithm for choosing among the calculation methods that ensures the reliability of the calculation results. The model for calculating the weight of characteristics of a structure, technology, or process based on the Weber–Fechner law includes a coefficient k, a constant depending on the subject of the sensation; k = 2 is the power to which


the difference between the target and the obtained values of the customer satisfaction level is raised. This chapter examines the previously established dependencies of the QFD analysis methodology taking into account the market situation, introduces the additional coefficient k according to the Weber–Fechner law, and constructs a map of the weights of the product characteristics taking into account the Weber–Fechner law:

B4_prod.cr. = Σ(I_x · R_x) + Σ[(P_x^k/C_x) · (I_x · R_x)],  (32.5)

B5_prod.cr. = Σ(I_x · R_x)(P_x^k + 1),  (32.6)

B6_prod.cr. = Σ(R_x + P_x^k) · I_x,  (32.7)

where I_x is the relationship between the requirement and the product characteristics (1; 3; 9), R_x is the absolute weight of the requirement, P_x is the difference between the target and obtained values of the customer satisfaction level (or the difference between the values of a competitor and the organization), taken into account only if the result of the calculation is non-negative, and C_x is the target value of the degree of implementation of the consumer's requirement (Fig. 32.2). Recommendations on the choice of dependencies for calculating the weight of product characteristics with regard to competition and the Weber–Fechner law are presented in Table 32.2. The essence of the expert assessment method is to conduct an intuitive–logical analysis of the problem with a quantitative assessment of judgments. The generalized

Fig. 32.2 Weights map of the product characteristics based on the Weber–Fechner law


Table 32.2 Recommendations on the choice of dependence for calculating the weight of product characteristics, taking into account the law of Weber–Fechner

Level of competition | Recommended dependency
Low | B6_prod.cr. = Σ(R_x + P_x^k) · I_x
Medium | B4_prod.cr. = Σ(I_x · R_x) + Σ[(P_x^k/C_x) · (I_x · R_x)]
High | B5_prod.cr. = Σ(I_x · R_x)(P_x^k + 1)

expert opinion obtained as a result of this processing is accepted as the solution to the problem. The comprehensive use of intuition ("thinking without thinking"), logical thinking, and quantitative assessments with their formal processing makes it possible to obtain an effective solution to the problem. Because the consumer uses the organoleptic method, the Weber–Fechner law effect should be taken into account. In accordance with the Weber–Fechner law, a difference of one point corresponds to a two-fold improvement; hence, if the assessment is performed on a five-point scale, the difference between C_max and C_org must be reviewed using the following scale (Table 32.3; Fig. 32.3). The differences between the points are then calculated as follows. The difference between five points and four points is 40 units (80–40). The difference between four points and three is 20 units (40–20). The difference between three points and two is 10 units (20–10). The difference between two points and one is 5 units (10–5).

Table 32.3 An example of determining the weight of the points in accordance with the law of Weber–Fechner

Points | 1 | 2 | 3 | 4 | 5
Weight of points | 5 | 10 | 20 | 40 | 80

Fig. 32.3 Example of determining the weight of the points in accordance with the Weber–Fechner law


Table 32.4 An example of the distribution of the importance of the requirements for the degree of satisfaction on one requirement

Points | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Weight of points | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 20 | 40

Such a determination of the weight of the points has its drawbacks: if the number of points ranges from 1 to 10, the difference in the weights of the points may become too large. Another option is possible for determining the weight of the score given by the consumer. In cases where some grades carry no weight, for example grades from 1 to 7, the weights of the points can take the form shown in Table 32.4. If some values are not important, they can be excluded. The weight of the points in general terms is determined as follows:

x_i = i_f.p. · 2^(i − i_i.p.),  (32.8)

where i_i.p. is the initial point, i_f.p. is the final point, and i is the calculated point.
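For illustration, the following is a minimal sketch of the doubling scheme behind Tables 32.3 and 32.4, assuming the weight of the initial (lowest non-zero) point is supplied explicitly:

```python
def point_weights(initial_point, final_point, base_weight):
    """Weight of each point doubles with every step above the initial point;
    points below the initial point get zero weight (cf. Tables 32.3 and 32.4)."""
    weights = {}
    for i in range(1, final_point + 1):
        if i < initial_point:
            weights[i] = 0
        else:
            weights[i] = base_weight * 2 ** (i - initial_point)
    return weights


# Reproduces Table 32.3: {1: 5, 2: 10, 3: 20, 4: 40, 5: 80}
print(point_weights(initial_point=1, final_point=5, base_weight=5))
# Reproduces Table 32.4: points 1-7 -> 0, 8 -> 10, 9 -> 20, 10 -> 40
print(point_weights(initial_point=8, final_point=10, base_weight=10))
```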

32.3 Implementing QFD Analysis

An example demonstrating the new methods of applying the QFD methodology is the house of quality that we built for automotive electrical equipment. The electrical equipment located in the front part of the car includes the starter, generator, lighting devices, horn, power fuse box, electric fan, and other electronic equipment. All of these devices, located at the front in the engine compartment of the car, are connected using a front wiring harness. Its schematic diagram is presented in Fig. 32.4. Electrical equipment located at the front and rear of the car is connected with the wiring harness. In conducting the study, the written questionnaire method was used, as it requires less cost. A written survey took place in a service center during 2018. The respondents were owners of cars whose wiring harnesses had warranty-case defects. After determining the requirements of the consumer, the house of quality was formulated, as presented in Table 32.5. To determine the competitiveness of the products, we conducted a study and found out how satisfied consumers are with the products of the competing enterprises. The data are summarized in Table 32.6.


Fig. 32.4 Front wiring diagram. Designations on the diagram: 1, 3 – block headlight; 2 – electric motor of the washer fluid reservoir; 4 – engine starter; 5 – rechargeable battery; 6 – power fuse block; 7 – generator; 8 – sound signal; 9, 10, 11 – instrument panel connectors; 12 – reversing lamp limit switch; 13 – cooling fan.


Table 32.5 An example of the distribution of the importance of the requirements for the degree of satisfaction on one requirement

Points | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Weight of points | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 20 | 40

Table 32.6 Satisfaction with the products of competitors

Consumer requirements | I | II | III
Regular start of the car engine on the first attempt | 3 | 3 | 4
Increased protection of electrical wiring against short circuits (car fire) | 3 | 4 | 5
Stable battery charge | 4 | 3 | 4
No wiring noise | 3 | 3 | 5
Regular operation of the lantern lighting in the trunk | 5 | 4 | 4
Regular headlamps work | 3 | 4 | 5
Regular rear window heating | 4 | 5 | 4
Stable lighting in the cabin | 5 | 5 | 5
Stable door lock operation | 4 | 3 | 3
Stable parking sensor | 3 | 3 | 5


The diagram shows the following designation of numbers: the first indicates to which element the wire leads, and the second the contact number. Thus, virtually every electronic element in the engine compartment is connected using this harness. The weight of the product characteristics, taking into account the situation on the market, is calculated according to dependence (32.2), since the lag behind competitors is average:

B1_prod.cr. with due regard for competition = Σ(I_x · R_x) + Σ[(P_x/C_x) · (I_x · R_x)],

where I_x is the relationship between the requirement and the product characteristics (1; 3; 9), R_x is the absolute weight of the requirement, P_x is the difference between the target and the obtained values of the customer satisfaction level (or the difference between the values of the competitor and the organization), taken into account only if the result of the calculation is non-negative, and C_x is the target value of the degree of implementation of customer requirements (Table 32.7).

The weight of the product characteristics, taking into account the situation on the market and the Weber–Fechner law (with coefficient k = 2, which accounts for the degree of intensity of the expert-consumer sensation), is calculated from dependence (32.5):

B4_prod.cr. Weber–Fechner = Σ(I_x · R_x) + Σ[(P_x^k/C_x) · (I_x · R_x)],

The results of the calculations of the weight of the product characteristics with due regard for competition and taking into account the Weber–Fechner law are summarized in Table 32.8. By comparing the assessment of the degree of satisfaction of the requirements by the new products and by the products of competitors on the market, the directions for improving the designed products are identified. To improve the quality of production of wiring harnesses, attention should be paid primarily to improving the following characteristics:

1. Absence of contact oxidation;
2. Pull-off force from the wire;
3. Insulation integrity.

32.4 Discussion and Results

For most organizations that are just starting to implement the QFD methodology, the application of the presented improvements is not mandatory. The new approach will be most in demand in situations where product competition becomes very tough. In such cases, the consumer cannot always clearly perceive the changes. For example, in the automotive industry, under conditions of equal prices, it is very difficult to win the consumer. In such markets, advanced mechanisms for assessing the competitiveness of products are needed. QFD is a methodology in which all customer requirements are integrated and an assessment of their feasibility is given. The proposed improvements can significantly expand the capabilities of the methodology. For example, taking into account the consumer's perception of the quality level improves the effectiveness of product improvement measures. In conditions of limited resources, the selection of the most significant characteristics is a priority. The proposed calculation of the importance of product characteristics is based on a detailed analysis of the competitive environment, as well as an understanding of how the consumer reacts to changes in product quality.

Table 32.7 QFD analysis of wiring harnesses

Consumer requirements (importance):
- Regular start of the car engine on the first attempt (10)
- Increased protection of electrical wiring against short circuits (car fire) (10)
- Stable battery charge (9)
- No wiring noise (8)
- Regular operation of the lantern lighting in the trunk (9)
- Regular headlamps work (9)
- Regular rear window heating (8)
- Stable lighting in the cabin (8)
- Stable door lock operation (9)
- Stable parking sensor (9)

Characteristics of wiring harnesses (absolute weight; relative weight (%)):
- Breakout force from the wire: 90; 0.063
- The force of the liner from the wire: 10; 0.007
- The force of a separation of a tip from a wire: 345; 0.242
- Insulation integrity: 180; 0.13
- The presence of a protective cap: 90; 0.063
- The presence of adhesive tape (Scapa): 72; 0.05
- No contact oxidation: 549; 0.384
- No shells in the structure of the metal terminals: 90; 0.063
- Total: 1426; 1


Table 32.8 Results of calculations of product characteristics, taking into account the situation on the market (columns a–h denote the eight wiring-harness characteristics in the order listed in Table 32.7)

Dependency | a | b | c | d | e | f | g | h
B_prod.cr. = Σ(I_x · R_x) | 90 | 10 | 345 | 180 | 90 | 72 | 549 | 90
B1_prod.cr. = Σ(I_x · R_x) + Σ[(P_x/C_x) · (I_x · R_x)] | 126 | 14 | 438 | 252 | 126 | 115.2 | 661 | 126
B4_prod.cr. = Σ(I_x · R_x) + Σ[(P_x^k/C_x) · (I_x · R_x)] | 162 | 18 | 515 | 324 | 162 | 201.6 | 725 | 162

QFD should be taken as a tool for compiling interconnected matrices. The more knowledge there is about competitors and consumers, the more accurate the product quality calculation result will be.

Managerial implications. It should be noted that managers need to consider how the procedures for communication with customers are developed in the organization. If the manufacturer pays little attention to customer opinion polls, the results of this study will be ineffective. This study refers to markets where design or technology changes are quite common, which is why automotive components were chosen as an example; for such a market, this research is most relevant. As a result, the manager receives a reliable study of consumer opinions. This assessment provides an opportunity to develop the design and technology so as to obtain the maximum competitive advantage. If the products do not change so often and the market for their sale remains stable for a very long time, the proposed method will be too complex and inefficient.

Acknowledgements This research work was supported by the Academic Excellence Project 5-100 proposed by Peter the Great St. Petersburg Polytechnic University.

References 1. Antipov DV, Akhmetzhanova GV, Antipova OI, Gazizulina AU, Sharov R (2017) Organizational models of teal organizations. Paper presented at the 2017 6th international conference on reliability, Infocom technologies and optimization: trends and future directions, pp 222–230. https://doi.org/10.1109/icrito.2017.8342428 2. Arjunwadkar PY (2018) FinTech: the technology driving disruption in the financial services industry. Auerbach Publications 3. Ashta A, Biot-Paquerot G (2018) FinTech evolution: strategic value management issues in a fast changing industry. Strateg Change 27(4):301–311 4. McKelvey B (2018) Modelling and statistical analysis of empirical data. Handbook of research methods in complexity science: theory and applications, 221p 5. Bandara W (2018) Developing enterprise-wide business process management capability: a teaching case from the financial sector. J Inf Technol Teach Cases, pp 1–17 6. Behr A (2015) Stochastic Frontier analysis, production and efficiency analysis with R. Springer, Cham, pp 183–201 7. Blakstad S, Allen R (2018) New standard models for banking, FinTech revolution. Palgrave Macmillan, Cham, pp 147–166


8. Blomstrom D (2018) The relationship between financial services and technology. Emotional Banking. Palgrave Macmillan, Cham, pp 3–16 9. Battese GE, Coelli TJ (1995) A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empirical Economics 20(2):325–332 10. Bychkova A, Rudskaia I (2018) Assessing the efficiency of a regional innovation system as one of the models for running an innovative business. Paper presented at the ACM international conference proceeding series, pp 208–212. https://doi.org/10.1145/3230348.3230350 11. Demidenko DS, Malinin AM, Litvinenko AN (2017) A new classification of the risks of the quality processes. In: Proceedings of the 30th international business information management association conference, IBIMA 2017—vision 2020: sustainable economic development, innovation management, and global growth, p 2897 12. Didenko NI, Skripnuk DF, Kikkas KN, Sevashkin V, Romashkin G, Kulik SV (2018) Innovative and technological potential of the region and its impact on the social sector development. In: International conference on information networking, pp 611–615. https://doi.org/10.1109/ icoin.2018.8343191 13. Didenko NI, Skripnuk DF, Mirolyubova OV (2017) Big data and the global economy. In: Proceedings of 2017 10th international conference management of large-scale system development, MLSD 2017, p 8109611. https://doi.org/10.1109/mlsd.2017.8109611 14. Didenko NI, Skripnuk DF, Mirolyubova OV, Merkulov V, Sevashkin V, Samylovskaya E (2018) System of econometric equations of the world market of natural gas. In: International conference on information networking, pp 217–222. https://doi.org/10.1109/icoin.2018.8343113 15. Didenko N, Skripnuk D, Mirolyubova O, Radion M (2017) Analysis of rural areas development of the region using the ADL-model. Res Rural Dev 2(2):142–147. https://doi.org/10.22616/ rrd.23.2017.061 16. Gazizulina A, Eskina E, Vasilieva I, Valeeva O (2017) The reasons for the increase in selforganization in companies. Int J Reliab Qual Saf Eng 24(6):1740002. https://doi.org/10.1142/ S0218539317400022 17. Ismagilova LA, Gileva TA, Galimova MP, Glukhov VV (2017) Digital business model and smart economy sectoral development trajectories substantiation. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 10531, pp 13–28. https://doi.org/10.1007/978-3-319-67380-6_2 18. Jaeger A (2018) Achieving business excellence through self-assessment for personal and professional excellence. Total Qual Manag Bus Excell 29(13–14):1612–1632. https://doi.org/10. 1080/14783363.2017.1288564 19. Demidenko DS, Malevskaia-Malevich ED, Dubolazova YA (2018) ISO as a real source of funding. Pricing issues. Paper presented at the international conference on information networking, Jan 2018, pp 622–625. https://doi.org/10.1109/icoin.2018.8343193 20. Klochkov Y (2018) Conflicts between quality management methods. Paper presented at the 2017 international conference on Infocom technologies and unmanned systems: trends and future directions, ICTUS 2017, pp 34–36. https://doi.org/10.1109/ictus.2017.8285970 21. Kolesnikov AM, Malevskaia-Malevich ED, Dubolazova YA (2017) Peculiarities of quality management methodology for innovation projects of industrial enterprises. 
Paper presented at the proceedings of the 30th international business information management association conference, IBIMA 2017—vision 2020: sustainable economic development, innovation management, and global growth, pp 2898–2901 22. Koop G, Osiewalski J, Steel MFJ (1999) The components of output growth: a stochastic frontier analysis. Oxford Bull Econ Stat 61(4):455–487 23. Kozlovskiy V, Aydarov D (2017) Analytical models of mass media as a method of quality management in the automotive industry. Qual Access Succ 18(160):83–87 24. Kozlovskiy V, Aydarov D (2017) Development of remote tools to assess the effectiveness and quality of car service enterprises work. Int J Qual Res 11(3):573–586. https://doi.org/10.18421/ IJQR11.03-06 25. Kumbhakar SC, Lovell CAK (2003) Stochastic frontier analysis. Cambridge University Press 26. Minarchenko IM, Saiko IL (2018) The future of neobanks in the development of banking sector


27. Ozerov ES, Pupentsova SV, Leventsov VA, Dyachkov MS (2017) Selecting the best use option for assets in a corporate management system. Paper presented at the 2017 6th international conference on reliability, Infocom technologies and optimization: trends and future directions, ICRITO 2017, pp 162–170. https://doi.org/10.1109/icrito.2017.8342418 28. Pogodaeva TV, Zhaparova DV, Rudenko DY, Skripnuk DF (2015) Innovations and socioeconomic development: problems of the natural resources intensive use regions. Mediterr J Soc Sci 6(1):129–135. https://doi.org/10.5901/mjss.2015.v6n1p129 29. Raskin D, Vassiliev A, Samarin V, Cabezas D, Hiererra SE, Kurniawan Y (2018) Rapid prototyping of distributed embedded systems as a part of internet of things. Proc Comput Sci 135:503–509. https://doi.org/10.1016/j.procs.2018.08.202 30. Romashkina GF, Didenko NI, Skripnuk DF (2017) Socioeconomic modernization of Russia and its arctic regions. Stud Russ Econ Dev 28(1):22–30. https://doi.org/10.1134/ S1075700717010105 31. Mednikov MD, Sokolitsyn AS, Ivanov MV, Sokolitsyna NA (2017) Forming optimal enterprise development strategies. Paper presented at the proceedings of the 30th international business information management association conference, IBIMA 2017—vision 2020: sustainable economic development, innovation management, and global growth, pp 1053–1063 32. Rudenko AA, Iskoskov MO, Antipov DV, Polyakova TV, Zaharov SO (2016) Peculiarities of enterprises functioning under conditions of cyclicality of the economy. Int J Econ Financ Issues 6(2):219–224 33. Schmidt P (2002) Stochastic frontier analysis, F156–F158 34. Sedevich-Fons L (2018) Linking strategic management accounting and quality management systems. Bus Process Manag J 24(6):1302–1320. https://doi.org/10.1108/BPMJ-02-2018-0038 35. Silkina GY (2017) Information and communication technologies in ensuring of innovative development. Paper presented at the proceedings of the 29th international business information management association conference—education excellence and innovation management through vision 2020: from regional development sustainability to global economic growth, pp 1165–1176 36. Sokolitsyn AS, Ivanov MV, Sokolitsyna NA (2017) Financial policy: defining short-term credit under fixed and circulating capital for providing financial sustainability of industrial enterprise development. Paper presented at the proceedings of the 29th international business information management association conference—education excellence and innovation management through vision 2020: from regional development sustainability to global economic growth, pp 201–215 37. Tsvetkova NA, Tukkel IL, Ablyazov VI (2017) Simulation modeling the spread of innovations. Paper presented at the proceedings of 2017 20th IEEE international conference on soft computing and measurements, SCM 2017, pp 675–677. https://doi.org/10.1109/scm.2017. 7970686 38. Volkova VN, Vasiliev AY, Efremov AA, Loginova AV (2017) Information technologies to support decision-making in the engineering and control. Paper presented at the proceedings of 2017 20th IEEE international conference on soft computing and measurements, SCM 2017, pp 727–730. https://doi.org/10.1109/scm.2017.7970704 39. Klochkov Y, Gazizulina A, Muralidharan K (2019) Lean six sigma for sustainable business practices: a case study and standardisation. Int J Qual Res 13(1):47–74. https://doi.org/10. 24874/IJQR13.01-04

Chapter 33

Software Defect Prediction Based on Selected Features Using Neural Network and Decision Tree

Prarna Mehta, Abhishek Tandon, and Neha

33.1 Introduction

Currently, software industries are following new design strategies (for example, software systems are being developed around object-oriented characteristics such as abstraction, encapsulation and inheritance), along with software quality assurance tactics, in order to reduce system failures. Many companies are investing in open-source software, so it has become important to keep a check on software quality. Software metrics are used to estimate the quality of the software. Many metrics have been suggested in the literature, but object-oriented metrics give better-performing prediction models. In previous studies [1], experiments were conducted to validate object-oriented metrics by finding interlinks between the metrics and their effect on defect prediction. However, keeping a complex software system defect free has become a practically impossible task. Software defect prediction has therefore become an emerging field, which helps testers allocate their limited resources optimally, and such prediction models have become a part of the software quality process. Software defect prediction models are developed on the basis of software metrics. In this paper, we use twenty software metrics to construct a model that predicts whether a module is defective or not (Fig. 33.1).

Software defect data is characterized by software metrics, which are used for prediction. The complexity of the software data affects the accuracy and makes prediction a time-consuming process.


Fig. 33.1 Software defect prediction model

A software defect prediction model's efficiency depends on the software metrics that act as input features, so it is important to choose the right set of input features to improve the model's efficiency. Feature selection is the process of selecting a suitable subset of features from the available set. Feature selection techniques focus on selecting useful subsets to build prediction models and on reducing the misclassification error. Kumar et al. worked on 12 feature selection techniques along with a support vector machine, evaluating the performance of the techniques using accuracy and F-measure values [2]. Different techniques are available for feature reduction, for example linear discriminant analysis, principal component analysis, OneR and info-gain ratio. In our paper, we use the principal component analysis technique to identify the right set of metrics, which helps us develop the software defect prediction model.

There are several machine learning methods used extensively to build software defect prediction models, among them Naïve Bayes, decision tree, support vector machine, logistic regression and neural network. Reliable prediction models are much needed to classify software modules as defective or non-defective. Kanmani, Uthariaraj, Sankaranarayanan and Thambidurai worked on two neural network models, namely the back-propagation neural network (BPN) and the probabilistic neural network (PNN); PCA was applied to the standardized metrics to generate an optimal metric subset [3]. The performance of the two neural models was compared with that of discriminant analysis and logistic regression models. Software datasets are prone to error due to their huge size, which hinders the performance of the prediction model. In this paper, we compare the results of two classifying models, namely neural network and decision tree.

The rest of the paper is organized into six sections. Section 33.2 describes the literature on defect prediction models. Section 33.3 presents the techniques used for feature extraction and modeling. Section 33.4 discusses the modeling and the results. Section 33.5 covers the threats to validity, and Sect. 33.6 presents the conclusion.


33.2 Related Work

There are numerous machine learning techniques for building software defect prediction models, among them artificial neural network, support vector machine, decision tree, Naïve Bayes, random forest and logistic regression. Guo, Ma, Cukic and Singh suggested an algorithm based on random forests with real-life applications. The algorithm was compared with 27 other machine learning techniques, such as logistic regression and discriminant analysis, using the WEKA and SEE5 tools, and random forest achieved higher prediction and classification accuracy in this comparison [4]. Elish and Elish evaluated the ability of support vector machines and eight other statistical and machine learning techniques to identify defect-prone modules on four NASA datasets. The techniques were evaluated on the basis of F-measure, accuracy and recall, with results varying between 0.83 and 0.94 [5].

Many researchers came up with innovative and hybrid machine learning techniques. An adaptive neuro-fuzzy inference system (ANFIS) was applied to McCabe metrics for the first time by Erturk and Sezer [6]; the performance of ANFIS was compared with that of artificial neural network and support vector machine techniques on the basis of ROC-AUC. Arar and Ayan [7] introduced a hybrid technique combining an artificial neural network with the artificial bee colony optimization algorithm to obtain a cost-effective neural model. They worked on datasets collected from the NASA metrics data repository; performance was measured in terms of probability of false alarm, probability of detection, accuracy, balance and AUC, and the accuracy of the hybrid technique was observed to be 68.4%.

Many companies are investing in open-source software, so testing the quality and reliability of these software modules has become an important issue. Gyimothy et al. [8] used object-oriented metrics over data collected from Bugzilla for the open-source web browser Mozilla to detect defect-prone code. Logistic regression, linear regression, decision tree and neural network were used to build defect prediction models, and they also analyzed the evolution of Mozilla over seven versions on the basis of bug reports. Arisholm et al. [9] evaluated three sets of software metrics, namely object-oriented metrics, process metrics and code churn metrics. Different combinations of the metric sets were tested with eight modeling techniques chosen on the basis of a selection criterion; their aim was to develop a cost-effective model in a systematic way. The results show that the model was most cost-effective when process metrics were considered. Sohan et al. [10] introduced a model to study the consistency between balanced and imbalanced datasets. The model was validated on eight datasets using seven machine learning algorithms, and the authors showed that the imbalance of non-faulty and faulty classes has a great impact on software defect prediction. Another study on defect prediction and metric selection was carried out by Huda et al. [11] with the help of SVM and ANN.


Classifying software modules correctly becomes a challenging task when extra information is present, and inaccurate classification leads to error. Mesquita et al. [12] suggested a reject option that classifies modules with a level of certainty and rejects the modules that do not match a threshold value. They developed software defect prediction models, namely rejoELM and IrejoELM, and tested their accuracy over five datasets on the basis of F-measure. Along the same lines, Li et al. [13] developed a cost-effective software defect prediction framework in which classification error was minimized, accuracy was upgraded and ensemble learning was used to predict defect-prone modules. Jiang et al. [14] attempted to introduce cost curves as a cost-effective technique in software quality analysis that would help minimize project cost and improve the efficiency of the technique. Khoshgoftaar and Gao [15] performed an empirical study on imbalanced software defect datasets aimed at reducing the resulting misclassification error. Random sampling of eight datasets was used as a preprocessing technique, and five different datasets were generated; comparing their performance showed that data sampled using feature selection performs better than the original data. Lessmann et al. [16] used large-scale data from the NASA repository to compare 22 classification techniques. Their aim was to perform the experiment on large data, to find appropriate predictive measures since different studies gave contradictory outputs, and to extend the statistical analysis for better output. Novakovic [17] worked on the accuracy of the supervised learning method Naïve Bayes used as a classifier, whose result was improved by OneR; six feature selection techniques were used on 11 datasets, including real and artificial benchmark data. Zhou and Leung [18] considered fault severity as a factor having significant influence on fault proneness and software metrics. Logistic regression and other machine learning methods were used to examine the influence of fault severity. The results show that, statistically, design metrics were related to defect proneness when fault severity was considered, and that the severity of faults enhances the prediction power of the techniques. Mishra and Shukla [19] combined three learning techniques, namely support vector machine, fuzzy inference system and genetic algorithm, to develop a classification model. This methodology was tested on data from an open-source project, and the result was compared with two other techniques, Naïve Bayes and support vector machine; the support-vector-based fuzzy classification system showed improved accuracy in comparison with the other methods.

In our study we use ANN and DT to predict the fault proneness of the metrics in the software system. Hammouri and Hammad [20] proposed a bug prediction model with NB, ANN and DT and compared the performance of all the classifiers; the results showed that DT and ANN predicted with greater accuracy than NB. ANN has also been used by Jayanthi and Florence [21] for defect prediction on different NASA software datasets. Fokaefs et al. [22] proposed an empirical study on web service evolution and used DT to validate the model.


33.3 Techniques

33.3.1 Feature Selection Methods

In previous studies, subsets of software metrics have been used as independent variables and fault proneness as the dependent variable. Deriving a useful set of software metrics as input is therefore an important data processing exercise in developing a defect prediction model; in other words, feature selection is an integral step of prediction model development. It is a process of minimizing the number of features that are capable of characterizing the data, so that the reduced subset still contains enough information to classify the defect data. Real-world problems are complicated by many irrelevant features, and redundant features hamper the efficiency of the model used for classifying the data. It is therefore important to determine the features that are significant for the problem at hand. The aims of the feature selection process are:

• improving the prediction model's efficiency,
• reducing the size of the data, and
• preserving only the features relevant for classifying the defect data.
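As a purely illustrative sketch of this step (the present study itself uses PCA, described next), the following Python snippet ranks candidate metrics by their mutual information with the defect label; scikit-learn and pandas are assumed to be available, and the file name and column names are hypothetical placeholders.

```python
# Illustrative feature ranking by mutual information (not the method used in
# this chapter); file name and column names are hypothetical placeholders.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

data = pd.read_csv("defect_data.csv")        # one row per class/module
X = data.drop(columns=["defective"])         # the candidate software metrics
y = data["defective"]                        # 1 = defective, 0 = non-defective

scores = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
selected = scores.sort_values(ascending=False).head(10).index.tolist()
print("Top-ranked metrics:", selected)
```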

33.3.2 Principal Component Analysis (PCA)

PCA converts a high-dimensional set of highly correlated software metrics into a low-dimensional set of uncorrelated software metrics that retains the most significant features. If the independent variables are highly correlated, they end up measuring the same property of the dependent variable rather than explaining different relations between the variables. PCA therefore converts these correlated metrics into new metrics called principal components. The main idea of PCA is to retain the variability in the data while reducing the number of variables, and to identify hidden patterns in the data so as to classify them [23]. The principal components are linear combinations of the original variables with the effect of correlation removed; the first principal component explains the maximum variance in the dataset. In PCA, the eigenvalues measure the variance carried by each principal component, and we retain those principal components whose eigenvalue is greater than 1. The eigenvalues are computed such that:

1. the principal components are uncorrelated, i.e., orthogonal to each other, and
2. the variance of the first principal component is the highest, the variance of the second component is the next highest, and so on.

The principal components are used to explain the variability in the data without losing much information.
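A minimal sketch of this selection rule, assuming the metric data is available as a NumPy array: it standardizes the metrics and keeps only the principal components whose eigenvalue exceeds 1.

```python
# Sketch: PCA on standardized metrics, retaining components with eigenvalue > 1.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_reduce(X):
    """X: array of shape (n_classes, n_metrics) holding raw metric values."""
    Xs = StandardScaler().fit_transform(X)       # standardize each metric
    eigenvalues = PCA().fit(Xs).explained_variance_
    n_keep = int(np.sum(eigenvalues > 1.0))      # eigenvalue-greater-than-1 rule
    return PCA(n_components=n_keep).fit_transform(Xs)

# Demo with random data standing in for the twenty metrics of Table 33.1.
components = pca_reduce(np.random.rand(100, 20))
print(components.shape)
```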

466

P. Mehta et al.

33.3.3 Artificial Neural Network (ANN)

ANN is a mathematical, non-linear, non-parametric classifier inspired by the neurons of the human brain. The model consists of interlinked elements called nodes and links, through which the information is processed. The nodes are arranged in three layers, namely the input layer, one or more hidden layers and the output layer. ANN works on the simple concept of learning from experience, which makes it a highly adaptive model with no distributional assumptions. It can be used to model highly complex non-linear problems and for pattern recognition. The basic algorithm of a neural network is as follows:

• Variable values are taken as input at the input layer.
• Weights assigned to each link between nodes are calculated from the training set and affect the output.
• At the output layer, the result is compared with a threshold value to predict the output.

A hidden layer is a stack of neuron nodes between the input and output layers; it helps the model learn more complicated features. The activation function is placed between the hidden and output layers and helps the network give its output as yes or no, depending on the function used; the sigmoid function is one of the most popularly used activation functions. The gradient measures how much the output will change for a change in the input, and the most commonly used method to calculate it is backpropagation: the error is propagated backwards from the output to the input layer, modifying and adjusting the weights to minimise the difference between the actual and desired output (Fig. 33.2).

Fig. 33.2 Artificial neural network
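To make the layer structure and the role of the sigmoid activation concrete, here is a tiny NumPy sketch of one forward pass through a single hidden layer; the weights are random placeholders rather than trained values, and the 20-12-1 shape is only for illustration.

```python
# Illustrative forward pass of a small feed-forward network (random weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(20)               # one module described by 20 metric values
W1 = rng.normal(size=(12, 20))   # weights on links input -> hidden nodes
W2 = rng.normal(size=(1, 12))    # weights on links hidden -> output node

hidden = sigmoid(W1 @ x)         # hidden-layer activations
output = sigmoid(W2 @ hidden)    # network output in (0, 1)
label = int(output > 0.5)        # threshold: defective (1) or not (0)
print(float(output), label)
```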

33 Software Defect Prediction Based on Selected Features …

467

33.3.4 Decision Tree

The decision tree is another popular method used for classification; it is easy to comprehend and offers a simple way to represent a decision model visually. Whereas a neural network is modelled on the human brain and consists of neurons, a decision tree has the structure of a tree. It consists of decision nodes, which represent tests on the value of an input parameter, and leaf nodes, which represent the resulting values; branching is done by testing the value of a parameter, and each branch is followed by a sub-tree. The aim of the algorithm is to develop a model that predicts the target variable's value based on the input variables. The starting node is called the root (or parent) node, and an end node is a leaf that represents the value of the target variable, which is also the decision of the tree. The values of the input variables are passed along a path from the root to a leaf.
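A minimal scikit-learn sketch of a decision-tree classifier on toy data; the two features stand in for reduced metric values, and the printed rules show the root-to-leaf paths described above.

```python
# Sketch: fitting and inspecting a small decision tree on toy data.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1.2, 0.3], [0.4, 2.1], [3.3, 0.9], [0.2, 0.1], [2.8, 1.7], [0.5, 0.6]]
y = [1, 0, 1, 0, 1, 0]           # 1 = defective, 0 = non-defective

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["P1", "P2"]))  # decision rules
print(tree.predict([[1.0, 0.5]]))                     # classify a new module
```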

33.4 Methodology and Numerical Analysis

The methodology we use for building and evaluating the software fault prediction models consists of the following steps:

1. First, we extract the attributes (metrics) by performing principal component analysis on the standardized data to reduce its dimension.
2. We then split the data into training and testing datasets to train and validate our algorithm.
3. Validation is done using the decision tree and the artificial neural network.
4. In the last step, we compare the predicted values to the actual values using the confusion matrix (true/false positives and negatives) and then check the performance of the classifiers with respect to several statistical measures.

For the study we have considered 20 object-oriented metrics, which are given in Table 33.1. To implement the methodology we obtained datasets from open-source Java projects available on the GitHub repository (https://github.com). The datasets contain 965 and 660 classes, respectively, of which 188 and 66 classes are defective. Descriptive statistics such as the mean, median and standard deviation of the collected datasets are presented in Tables 33.2 and 33.3; these statistics summarize the data. We now describe the steps of the methodology in detail and illustrate them with a numerical example.

Step 1: In this step we reduce the dimension of the data with the help of PCA. For our experiment we have used the twenty object-oriented metrics shown in Table 33.1. PCA removes the dependencies among the metrics by combining the metrics that are correlated with each other. The selected metrics from both datasets are given in Tables 33.4 and 33.5.
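The following sketch strings the four steps together under stated assumptions (scikit-learn; random placeholder data instead of the GitHub datasets; six retained components, as in Tables 33.4 and 33.5). It is meant only to illustrate the flow; the detailed settings used in the study are described in the remainder of this section.

```python
# Sketch of the methodology: standardize -> PCA -> 10-fold CV -> confusion matrix.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X = np.random.rand(200, 20)             # placeholder for the 20-metric data
y = np.random.randint(0, 2, size=200)   # placeholder defect labels

classifiers = {
    "ANN": MLPClassifier(hidden_layer_sizes=(12,), activation="logistic",
                         max_iter=2000, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), PCA(n_components=6), clf)  # step 1
    y_pred = cross_val_predict(pipe, X, y, cv=10)                     # steps 2-3
    print(name, confusion_matrix(y, y_pred), sep="\n")                # step 4
```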


Table 33.1 Software metrics

OO Metric | Characterization | References
“Weighted method per class” (WMC) | “Sum of complexities of the methods defined in class” | Chidamber and Kemerer [24]
“Depth of inheritance tree” (DIT) | “Maximum height of the class hierarchy” | Chidamber and Kemerer [24]
“Number of children” (NOC) | “Number of immediate descendants of the class” | Chidamber and Kemerer [24]
“Coupling between object classes” (CBO) | “Number of classes coupled to a given class” | Chidamber and Kemerer [24]
“Response for a class” (RFC) | “Number of different methods that can be executed when an object of that class receives a message” | Chidamber and Kemerer [24]
“Lack of cohesion in methods” (LCOM) | “Number of sets of methods that can be executed when an object through the sharing of some of the class’s fields” | Chidamber and Kemerer [24]
“Afferent coupling” (Ca) | “Number of classes that use the specific class” | Martin [25]
“Efferent coupling” (Ce) | “Number of classes used by the specific class” | Martin [25]
“Number of public methods” (NPM) | “Number of methods in a class that are declared as public” | Bansiya and Davis [26]
LCOM3 | “Lack of cohesion in methods, Henderson-Sellers version” | Henderson-Sellers [27]
“Lines of code” (LOC) | “Number of lines in the text of the source code” | Halstead [28]
“Data access metric” (DAM) | “Ratio of the number of private (protected) attributes to the total number of attributes declared” | Bansiya and Davis [26]
“Measure of aggregation” (MOA) | “Number of data declarations (class fields) whose types are user defined classes” | Bansiya and Davis [26]
“Measure of functional abstraction” (MFA) | “Ratio of the number of methods inherited by a class to the total number of methods accessible by member methods of the class” | Bansiya and Davis [26]
“Cohesion among methods of class” (CAM) | “Sum of number of different types of method parameters in every method divided by a multiplication of number of different method parameter types in whole class and number of methods” | Bansiya and Davis [26]
“Inheritance coupling” (IC) | “Number of parent classes to which a given class is coupled” | Tang et al. [29]
“Coupling between methods” (CBM) | “Number of new/redefined methods to which all the inherited methods are coupled” | Tang et al. [29]
“Average method complexity” (AMC) | “Average method size of each class” | Tang et al. [29]
“Maximum McCabe’s cyclomatic complexity” (Max-CC) | “Maximum cyclomatic complexity of methods defined in a class” | McCabe [30]
“Average McCabe’s cyclomatic complexity” (Avg-CC) | “Average cyclomatic complexity of methods in a class” | McCabe [30]

PCA reduces the metrics by transforming the high-dimensional data into low-dimensional data consisting of the most significant metrics. PCA removes the high correlation among the metrics and forms new, largely uncorrelated metrics called principal components. Here, six principal components are formed for each of datasets 1 and 2 (Tables 33.4 and 33.5). This small number of principal components is considered sufficient to represent most of the significant patterns in the data.

Step 2: In this step we split the data into two sets for prediction and classification. We use 10-fold cross-validation to randomly split the data into training and validation sets [20]: 90% of the data is used for training and 10% for validation, which gives the module's fault proneness. The test dataset helps in choosing between alternative classifier architectures, and the training dataset is used to define the classifier's parameters through the training procedure.

Step 3: In our methodology we use a decision tree (DT) and an artificial neural network (ANN). The ANN is a multilayer perceptron (MLP) based on biological neurons with three layers, namely input, hidden and output. The proposed model is developed using a perceptron class with a feedforward MLP and the backpropagation algorithm. The architecture details are: an input layer with 2 units, a hidden layer with 12 units and an output layer with 2 units. We use the sigmoid function


Table 33.2 Descriptive statistics of dataset 1

Metrics | Min | Max | Mean | Standard deviation
WMC | 0 | 166 | 8.5731 | 11.2049
DIT | 0 | 6 | 1.9544 | 1.2737
NOC | 0 | 39 | 0.52539 | 2.6321
CBO | 0 | 448 | 11.1026 | 22.5269
RFC | 0 | 322 | 21.4228 | 25.0093
LCOM | 0 | 13,617 | 79.3316 | 523.7547
CA | 0 | 446 | 5.2715 | 21.6919
CE | 0 | 76 | 6.429 | 7.0095
NPM | 0 | 157 | 6.9554 | 10.1553
LCOM3 | 0 | 2 | 1.1043 | 0.72193
LOC | 0 | 2077 | 117.1554 | 176.584
DAM | 0 | 1 | 0.61122 | 0.47924
MOA | 0 | 9 | 0.63212 | 1.2241
MFA | 0 | 1 | 0.40233 | 0.41947
CAM | 0 | 1 | 0.49015 | 0.26477
IC | 0 | 4 | 0.38964 | 0.60727
CBM | 0 | 21 | 0.91399 | 2.585
AMC | 0 | 158.6667 | 11.1442 | 11.9547
MAX_CC | 0 | 33 | 2.287 | 2.8184
AVG_CC | 0 | 8.5 | 0.95888 | 0.65767

as the activation function. Each target value in the training and test datasets is either 0 or 1. We train with continuous backpropagation on the supervised data, with the weights adjusted after each forward pass.

Step 4: Finally, we carry out an experimental study of software defect prediction and classification based on the reduced features. Using dataset 1 and dataset 2, we evaluate the software defect prediction, and to measure the performance of the proposed methodology we use standard data-mining techniques. Among all the parameters, the most significant one for performance analysis is the confusion matrix. Based on this matrix we calculate the performance of the model in terms of precision, F-score, sensitivity, accuracy and AUC. Each measurement parameter used in our study is given in Table 33.6. The performance measures of both classifiers on datasets 1 and 2 are provided in Table 33.7, and ROC curves are plotted in Fig. 33.3.

In this paper, we study the relationship of object-oriented (OO) metrics with fault proneness at two levels (i.e., faulty and non-faulty). We used PCA to reduce the number of metrics in the datasets and to identify the most significant patterns for classifying fault proneness. Finally, we applied the two classifiers, ANN and DT, to analyse the effect of the OO metrics on fault proneness. Precision, sensitivity,


Table 33.3 Descriptive statistics of dataset 2

Metrics | Min | Max | Mean | Standard deviation
WMC | 1 | 40 | 8.7273 | 7.5372
DIT | 1 | 5 | 1.3894 | 0.68245
NOC | 0 | 20 | 0.2303 | 1.6632
CBO | 0 | 76 | 10.2106 | 8.8225
RFC | 1 | 154 | 24.2379 | 22.2935
LCOM | 0 | 492 | 41.6818 | 85.7086
CA | 0 | 74 | 4.4985 | 7.454
CE | 0 | 38 | 5.9955 | 6.8786
NPM | 0 | 37 | 7.2485 | 7.1112
LCOM3 | 0 | 2 | 1.258 | 0.61151
LOC | 1 | 1051 | 147.8333 | 159.7212
DAM | 0 | 1 | 0.58622 | 0.47985
MOA | 0 | 11 | 0.94545 | 1.8601
MFA | 0 | 1 | 0.22045 | 0.34822
CAM | 0.11111 | 1 | 0.49643 | 0.25095
IC | 0 | 3 | 0.31061 | 0.52455
CBM | 0 | 4 | 0.45758 | 0.86739
AMC | 0 | 198.5 | 17.4043 | 21.0685
MAX_CC | 0 | 47 | 3.3818 | 4.3451
AVG_CC | 0 | 9 | 1.2577 | 0.93585

Table 33.4 Extracted metrics of dataset 1

Factors | Metrics
P1 | WMC, RFC, LCOM, CE, NPM, LOC
P2 | AMC, Max-CC, Avg-CC
P3 | LCOM3, DAM, CAM
P4 | DIT, MFA, IC
P5 | NOC, CBO, CA
P6 | MOA, CBM

Table 33.5 Extracted metrics of dataset 2

Factors | Metrics
P1 | WMC, LCOM, NPM, CAM
P2 | RFC, CE, LOC, MOA
P3 | DIT, MFA, IC, CBM
P4 | AMC, Max-CC, Avg-CC
P5 | LCOM3, DAM
P6 | NOC, CBO, CA


Table 33.6 Measures for the classification

Measure | Evaluation focus
Precision | Percentage of positive predictions that were correct
Sensitivity | Portion of positive cases caught
F-score | The weighted average of precision and recall
Accuracy | Fraction of predictions that were correct
AUC | Area under the receiver operating characteristic curve

Table 33.7 Performance measures using DT and ANN (dataset 1 and dataset 2)

Classifier/performance measure | DT (Dataset 1) | DT (Dataset 2) | ANN (Dataset 1) | ANN (Dataset 2)
Precision | 0.75 | 0.80 | 0.78 | 0.80
Sensitivity | 0.82 | 0.89 | 0.83 | 0.89
F-score | 0.77 | 0.84 | 0.78 | 0.85
Accuracy | 0.82 | 0.89 | 0.83 | 0.90
AUC | 0.59 | 0.64 | 0.60 | 0.67

F-score, accuracy and AUC are considered as evaluation metrics for comparing the performance of the fault proneness prediction models proposed in this study. The results for these performance measures are given in Table 33.7. ANN performed best with 78% precision, 83% sensitivity, 83% accuracy and 0.67 AUC for dataset 1. These results indicate overall good performance and validate the extracted OO metrics in terms of precision, sensitivity, F-score, accuracy and AUC for fault proneness. To generalize these results, we also applied our fault proneness model to the other dataset and again found that the ANN classifier outperformed the DT classifier, with 80% precision, 89% sensitivity, 85% F-score, 83% accuracy and 0.67 AUC. The ANN classifier has been used in many previous studies and has mostly outperformed other classifiers [31–33]. Overall, the results are favorable and show the utility of the fault proneness model in terms of the five performance evaluation metrics for fault prediction.
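For reference, the five measures of Table 33.6 can be computed directly from the predictions, as in the following sketch; the toy labels and scores are placeholders, and the standard scikit-learn definitions are assumed.

```python
# Sketch: computing precision, sensitivity, F-score, accuracy and AUC.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                     # actual labels
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                     # predicted labels
y_score = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]     # scores for the ROC/AUC

print("Precision  :", precision_score(y_true, y_pred))
print("Sensitivity:", recall_score(y_true, y_pred))
print("F-score    :", f1_score(y_true, y_pred))
print("Accuracy   :", accuracy_score(y_true, y_pred))
print("AUC        :", roc_auc_score(y_true, y_score))
```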

33.5 Threats to Validity

Threats to validity are of two types: internal and external. Threats to internal validity arise from the degree to which conclusions can be drawn about the relationship between the dependent and independent variables (REF). Threats to external validity are more severe than internal ones because they concern the generalizability of the predicted model. Here we have used open-source software datasets, which are not necessarily


Fig. 33.3 ROC curves for dataset 1 (a DT classifier, b ANN classifier) and for dataset 2 (c DT classifier, d ANN classifier)

applicable to other systems. Moreover, the datasets are not large. Threats to external validity can be reduced by carrying out a large number of similar studies across several systems.

33.6 Conclusion

Machine learning techniques are being used extensively in software defect prediction. The aim of our study was to develop a software defect prediction model using machine-learning techniques. We worked on data from open-source Java projects containing twenty object-oriented metrics as features and used principal component analysis on these metrics for feature reduction. We then used machine-learning techniques to predict whether software modules are defective or non-defective. Prediction modeling was done on two datasets using two classifiers, namely neural network and decision tree, and the comparison was made on the basis of accuracy. The results were obtained using Python 3.


The accuracy of ANN was observed to be better than that of the decision tree. A similar type of analysis can be performed on different datasets with a larger number of classifiers to obtain a more generalized result.

References 1. Aggarwal K, Singh Y, Kaur A, Malhotra R (2009) Empirical analysis for investigating the effect of object-oriented metrics on fault proneness: a replicated case study. Softw Process Improv Pract 14(1):39–62 2. Kumar L, Sripada SK, Sureka A, Rath SK (2018) Effective fault prediction model developed using least square support vector machine (LSSVM). J Syst Softw 137:686–712 3. Kanmani S, Uthariaraj VR, Sankaranarayanan V, Thambidurai P (2007) Object-oriented software fault prediction using neural networks. Inf Softw Technol 49(5):483–492 4. Guo L, Ma Y, Cukic B, Singh H (2004) Robust prediction of fault-proneness by random forests. In: 15th international symposium on software reliability engineering. IEEE 5. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660 6. Erturk E, Sezer EA (2015) A comparison of some soft computing methods for software fault prediction. Expert Syst Appl 42(4):1872–1879 7. Arar ÖF, Ayan K (2015) Software defect prediction using cost-sensitive neural network. Appl Soft Comput 33:263–277 8. Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910 9. Arisholm E, Briand LC, Johannessen EB (2010) A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J Syst Softw 83(1):2–17 10. Sohan MF, Kabir MA, Jabiullah MI, Rahman SSMM (2019) Revisiting the class imbalance issue in software defect prediction. In: 2019 international conference on electrical, computer and communication engineering (ECCE). IEEE 11. Huda S, Alyahya S, Ali MM, Ahmad S, Abawajy J, Al-Dossari H, Yearwood J (2017) A framework for software defect prediction and metric selection. IEEE Access 6:2844–2858 12. Mesquita DP, Rocha LS, Gomes JPP, Neto ARR (2016) Classification with reject option for software defect prediction. Appl Soft Comput 49:1085–1093 13. Li W, Huang Z, Li Q (2016) Three-way decisions based software defect prediction. Knowl Based Syst 91:263–274 14. Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empirical Softw Eng 13(5):561–595 15. Khoshgoftaar TM, Gao K (2009) Feature selection with imbalanced data for software defect prediction. In: 2009 international conference on machine learning and applications. IEEE 16. Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496 17. Novakovic J (2010) The impact of feature selection on the accuracy of Naïve Bayes classifier. In: 18th telecommunications forum TELFOR 18. Zhou Y, Leung H (2006) Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Trans Softw Eng 32(10):771–789 19. Mishra B, Shukla K (2012) Defect prediction for object oriented software using support vector based fuzzy classification model. Int J Comput Appl 60(15) 20. Hammouri A, Hammad M, Alnabhan M, Alsarayrah F (2018) Software bug prediction using machine learning approach. Int J Adv Comput Sci Appl (IJACSA) 9(2):78–83 21. Jayanthi R, Florence L (2018) Software defect prediction techniques using metrics based on neural network classifier. Cluster Comput, 1–12


22. Fokaefs M, Mikhaiel R, Tsantalis N, Stroulia E, Lau A (2011) An empirical study on web service evolution. In: 2011 IEEE international conference on web services. IEEE 23. Yang L, Xu Z (2019) Feature extraction by PCA and diagnosis of breast tumors using SVM with DE-based parameter tuning. Int J Mach Learn Cybernet 10(3):591–601 24. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493 25. Martin R (1994) OO design quality metrics—an analysis of dependencies. In: Proceedings of the workshop pragmatic and theoretical directions in object-oriented software metrics, OOPSLA’94 26. Bansiya J, Davis CG (2002) A hierarchical model for object-oriented design quality assessment. IEEE Trans Softw Eng 28(1):4–17 27. Henderson-Sellers B (1996) The mathematical validity of software metrics. ACM SIGSOFT Softw Eng Notes 21(5):89–94 28. Halstead MH (1977) Elements of software science. vol 7. Elsevier, New York 29. Tang M-H, Kao M-H, Chen M-H (1999) An empirical study on object-oriented metrics. In: Proceedings sixth international software metrics symposium (Cat. No. PR00403). IEEE 30. McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 4:308–320 31. Song Q, Shepperd M, Mair C (2005) Using grey relational analysis to predict software effort with small data sets. In: 11th IEEE international software metrics symposium (METRICS’05). IEEE 32. Moraes R, Valiati JF, Neto WPG (2013) Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Syst Appl 40(2):621–633 33. Khoshgoftaar TM, Allen EB, Hudepohl JP, Aud SJ (1997) Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Netw 8(4):902–909

Chapter 34

Testing-Effort Dependent Software Reliability Assessment Integrating Change Point, Imperfect Debugging and FRF

Rajat Arora, Anu Gupta Aggarwal, and Rubina Mittal

34.1 Introduction

Software systems are embedded in many kinds of systems in diverse fields such as medicine, transportation, the military, education, banking and industry. It is virtually impossible to work without software in today's world, and its importance keeps increasing with new advances in technology. In recent years, users have been demanding highly reliable software systems. The software development life cycle (SDLC) consists of the requirement description, design, coding, testing, implementation and maintenance phases. Researchers have conducted vast research in this area, leading to a number of software reliability growth models (SRGMs). Software reliability is a key characteristic of software quality, and SRGMs are quantitative mathematical tools to assess the failure phenomenon of a software system and thus predict its reliability. They are extensively used to plan and schedule testing and to monitor changes in fault removal.

In the past, most SRGMs assumed that the rate at which faults are detected during the testing phase remains constant. In practical situations this is not true: there can be a notable change in the detection rate due to the resources allocated, the complexity of the code, the reliability allocated to the components, the fault removal efficiency, the fault density, etc. The time point at which the rate changes is generally termed the change point. Therefore, we have incorporated the effect of the change point on the reliability growth process of the software system. It is a


very realistic approach for increasing the accuracy of SRGMs in reliability assessment. Typical causes of a change point are a change in testing conditions, a change in the amount of testing resources or a change of fault target. A lot of work has been done so far on SRGMs incorporating change points. Zhao was among the first researchers to incorporate the change point in software and hardware reliability [1]. Huang introduced the change-point modelling concept together with testing effort [2]. Zou also incorporated a change point in his study [3].

Several factors, including the FRF proposed by Musa et al. [4], affect the reliability progression of a software system. Musa defined FRF as 'the average ratio of the rate of reduction of faults to the rate of failure occurrence' and considered a constant value for FRF. FRF also represents the effect of environmental factors on the reliability of the software system. Here, in this paper, we integrate a time-dependent FRF into the SRGM. In practice, during the testing process, FRF is subject to factors such as the debugging conditions, the resources allocated, the dependency of faults, the time lag between detection and correction, and human skills. Pachauri et al. proposed an SRGM with an inflexion S-shaped FRF [5]. In reality, FRF depends on different environmental factors and has no definite pattern across datasets, so it is worthwhile to consider a definite behaviour for FRF; in this paper, we consider a logistic-type FRF.

Testing consumes a large amount of resources, namely CPU hours and manpower, and the allocation of these resources is usually not constant during the testing process. The testing-effort function (TEF) describes the testing resource distribution. This factor is positively correlated with reliability and should be taken as time-dependent [6]. Musa et al., Yamada et al. and Kapur et al. discussed TEF in their models [7–9]; Yamada et al. discussed a Weibull-type TEF that includes three curves, namely exponential, Rayleigh and Weibull. The reliability of the software product is controlled by testing, which in turn affects the quality of the system. In complex software systems, testers are not able to eliminate all the generated faults completely, and sometimes the removal of a fault introduces new faults. This leads to an increase in the number of latent faults at a constant rate, i.e., an imperfect debugging environment. Yadavalli et al. [10] proposed a logistic-distribution-based change-point problem with imperfect debugging, but they did not consider FRF.

In this paper, in order to develop a more accurate SRGM with better goodness-of-fit, a modelling framework is proposed based on a TEF-dependent failure process, the change-point concept and a time-dependent FRF under imperfect debugging conditions. We model the fault detection/removal process corresponding to a resource expenditure function modelled by a Weibull curve, incorporating a logistic FRF and a change point, and we also compare it with the case of constant FRF. The results of the proposed model are validated on two real software failure datasets, and a comparison is made with the case of perfect debugging.

The rest of the paper is organized as follows. In the next section, we review the literature corresponding to the factors incorporated in the model. In Sect. 34.3, the proposed model is discussed, followed by its validation on fault datasets in Sect. 34.4. The comparison of the model with the perfect debugging scenario is illustrated in Sect. 34.5. At the end, conclusions and future scope are presented.


34.2 Literature Review

In this section, we review reliability modelling work related to the factors used in our framework. Software reliability is closely associated with the amount of testing effort spent on error detection and correction, and in the literature the consumption of testing resources has been modelled using different functions. Yamada et al. proposed the relation between effort and other characteristics of the software system [8]. Kapur et al. modelled testing effort with exponential, Rayleigh, Weibull and logistic functions in a flexible SRGM [11]. Kuo [12], Huang and Kuo [6] and Huang [13] used a logistic TEF; Gokhale et al. [14] and Bokhari and Ahmad [15] incorporated a log-logistic TEF in their research. Li et al. [16] utilized an inflexion S-shaped TEF. Chang et al. discussed the integrated effect of TEF and testing coverage for the assessment and improvement of reliability [17].

Next, we discuss the inclusion of change points in SRGMs. Incorporating change points helps improve the accuracy of reliability models in describing the software failure process. Zou used the change-point concept for analysing software reliability when inter-failure times are modelled by the Weibull distribution [6]. Huang introduced an SRGM with change point and testing effort [4]. Zhao discussed change-point models to assess the failure phenomenon of hardware and software systems [18]. Li et al. checked the sensitivity of the release time of a software system considering a growth model with TEF and change points [19]. Inoue and Yamada compared the failure phenomenon of a software system before and after a change in the testing environment [20].

Another important factor that greatly influences the reliability of software systems is FRF. Musa coined the term FRF and defined it as 'the proportion of failures experienced to the number of faults removed' [7]. Several others defined FRF differently: Malaiya et al. defined FRF in terms of the fault exposure ratio [21], and Friedman et al. defined it in terms of three ratios, namely detectability, associability and fault growth [22]. Hsu et al. discussed constant, increasing and decreasing curves of FRF and validated these trends on six real-life fault datasets [23]. Pachauri et al. considered an inflexion S-shaped FRF to increase the accuracy of the growth model [5]. Later, Aggarwal et al. proposed an exponentiated Weibull (EW) FRF-based SRGM with change points [24].

Yet another important concept, error generation, has also been incorporated by some authors in recent studies. Madhu et al. discussed a Weibull-type TEF considering multiple change points and imperfect debugging and extended it to evaluate the total expected cost, the warranty cost and the optimal release time of the software [25]. Chatterjee et al. incorporated imperfect debugging with a Weibull FRF and change point [26]. Anand et al. incorporated two-dimensional multi-release software reliability modelling considering FRF under imperfect debugging [27].


34.3 Proposed Modelling Framework

In this section, we model the SRGM incorporating the factors discussed above. The subsections detail the sequence of the model development process.

34.3.1 Assumptions of the Proposed Model

1. All the faults are mutually independent, and the rate of failure is affected by the faults latent in the software.
2. The failure phenomenon is based on an NHPP.
3. The number of errors detected by testing effort in the time interval (t, t + dt) is proportional to the number of detectable errors in the software.
4. The TEF is described by the Weibull function.
5. The detected faults are removed immediately, but removal may also lead to the introduction of new faults.
6. The rate of error detection may change at any moment of time.
7. The factor of proportionality is, in one case, taken as a constant FRF and, in the other case, as a time-dependent FRF modelled by the logistic distribution.

34.3.2 Notations

The notations used for developing the proposed model framework are given as:

Notation | Description
W(t) | Expected testing-effort expenditure by time t
m(W) | Mean value function (MVF) corresponding to testing effort
α, k | Scale and shape parameters of the Weibull function; α > 0, k > 0
a | Initial fault content in the software
r | Fault detection rate (FDR)
B(W) | Testing-effort-dependent fault reduction factor
B | Constant fault reduction factor
l, b | Shape and scale parameters of the logistic function
b1, b2 | Scale parameter before and after the change point
r1, r2 | FDR before and after the change point
l1, l2 | Shape parameter before and after the change point
g1, g2 | Fault introduction rate before and after the change point


The rate of change of the mean value function m(t) can be represented by the following differential equation:

\frac{d}{dt} m(t) = r(t)\,(a - m(t))    (34.1)

and

r(t) = r \times B(t)    (34.2)

where r(t) is the FDR function and B(t) is the time-dependent FRF following the logistic distribution. In the above equations, time t is taken as the independent variable. Here, in our proposed model, we consider W_t, the cumulative testing effort by time t, as the independent variable. Therefore, the modified form of the above equations using testing effort is:

\frac{d}{dt} m(W_t) = r(W_t)\,(a - m(W_t))    (34.3)

and

r(W_t) = r \times B(W_t)    (34.4)

34.3.3 Weibull TEF

The Weibull TEF fits most of the data used in software reliability growth modelling. The cumulative TEF W(t) is given by

W(t) = \bar{W} \times \left(1 - e^{-\alpha t^{k}}\right)    (34.5)

where \bar{W} is the upper limit on the available testing-effort expenditure, i.e., the maximum budget available for testing effort. The density function of the testing effort is given by:

\frac{d}{dt} W_t = w_t = \bar{W} \alpha k t^{k-1} e^{-\alpha t^{k}}    (34.6)
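As a small numerical sketch of Eqs. (34.5)–(34.6), the functions below evaluate the cumulative and instantaneous Weibull testing effort; the parameter values passed in the example are illustrative assumptions, not fitted values.

```python
# Sketch: Weibull testing-effort functions of Eqs. (34.5) and (34.6).
import numpy as np

def W_cumulative(t, W_bar, alpha, k):
    """Cumulative testing effort W(t), Eq. (34.5)."""
    return W_bar * (1.0 - np.exp(-alpha * t**k))

def w_density(t, W_bar, alpha, k):
    """Instantaneous effort w(t), Eq. (34.6)."""
    return W_bar * alpha * k * t**(k - 1) * np.exp(-alpha * t**k)

weeks = np.arange(1, 20)
print(W_cumulative(weeks, W_bar=47.65, alpha=0.05, k=1.5))  # illustrative values
```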


34.3.4 FRF

Musa considered the fault reduction factor to be constant. However, FRF is affected by different environmental factors, such as the debugging time lag, fault spawning or imperfect debugging, which may cause it to vary with time. Here, we consider the following two cases.

Case 1. Constant FRF

B(t) = B, \quad 0 < B \le 1

Case 1(a): Imperfect Debugging Without Change Point

The differential equation can be written as

\frac{d}{dt} m(w) = r(w)\,(a - (1-g)\,m(w))    (34.7)

\frac{d}{dt} m(w) = r \times B(w)\,(a - (1-g)\,m(w))    (34.8)

\frac{d}{dt} m(w) = r \times B\,(a - (1-g)\,m(w))    (34.9)

On integrating and using the initial condition m(0) = 0, we get:

m(W) = \frac{a}{1-g}\left(1 - e^{-(1-g)\,r B W}\right)    (34.10)
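A one-function sketch of Eq. (34.10); the parameter values in the example are taken from Table 34.2a (dataset 1) purely for illustration.

```python
# Sketch: mean value function of Eq. (34.10) (constant FRF, no change point).
import numpy as np

def m_constant_frf(W, a, r, B, g):
    return a / (1.0 - g) * (1.0 - np.exp(-(1.0 - g) * r * B * W))

W = np.array([10.0, 30.0, 47.65])
print(m_constant_frf(W, a=353.084, r=0.083, B=0.378, g=0.373))
```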

Case 1(b): Imperfect Debugging with Change Point

The differential equation for this case can be written as:

\frac{d}{dt} m(w) = r \times B(w)\,(a(w) - m(w))    (34.11)

where

r = \begin{cases} r_1, & t \le \tau \\ r_2, & t > \tau \end{cases}    (34.12)

a(w) = a + g \times m(w) = \begin{cases} a + g_1\, m(w), & t \le \tau \\ a + g_2\, m(w), & t > \tau \end{cases}    (34.13)

B(w) = \begin{cases} B_1, & t \le \tau \\ B_2, & t > \tau \end{cases}    (34.14)


On integrating the above differential equation, we get, for t > \tau:

m(W) = \frac{a}{1-g_2}\left[1 - e^{-(1-g_1)\,r_1 B_1 W(\tau)}\, e^{-(1-g_2)\,r_2 B_2 [W(t) - W(\tau)]}\right] + \frac{g_1 - g_2}{1 - g_2}\, m(W(\tau))    (34.15)

Case 2. FRF to Follow a Logistic Distribution

B(t) = \frac{\alpha}{1 + \beta e^{-\alpha t}}

where α and β denote the shape and scale parameter, respectively, of the logistic distribution.

Case 2(a): Imperfect Debugging Without Change Point

The differential equation can be written as

\frac{d}{dt} m(w) = r(w)\,(a - (1-g)\,m(w))    (34.16)

\frac{d}{dt} m(w) = r \times B(w)\,(a - (1-g)\,m(w))    (34.17)

\frac{d}{dt} m(w) = r \times \frac{l}{1 + b e^{-l w}}\,(a - (1-g)\,m(w))    (34.18)

On integrating and using the initial condition m(0) = 0, we get:

m(w) = \frac{a}{1-g}\left[1 - \frac{(1+b)^{r(1-g)}\, e^{-l r w (1-g)}}{\left(1 + b e^{-l w}\right)^{r(1-g)}}\right]    (34.19)

Case 2(b): Imperfect Debugging with Change Point

The differential equation for this case can be written as:

\frac{d}{dt} m(w) = r \times B(w)\,(a(w) - m(w))    (34.20)

where

r = \begin{cases} r_1, & t \le \tau \\ r_2, & t > \tau \end{cases}    (34.21)

a(w) = a + g \times m(w) = \begin{cases} a + g_1\, m(w), & t \le \tau \\ a + g_2\, m(w), & t > \tau \end{cases}    (34.22)

B(w) = \begin{cases} \dfrac{l_1}{1 + b_1 e^{-l_1 t}}, & t \le \tau \\ \dfrac{l_2}{1 + b_2 e^{-l_2 t}}, & t > \tau \end{cases}    (34.23)

On integrating the above differential equation, we get:

m(w) = \frac{a}{1-g_1}\left[1 - \frac{(1+b_1)^{r_1(1-g_1)}\, e^{-l_1 r_1 w (1-g_1)}}{\left(1 + b_1 e^{-l_1 w}\right)^{r_1(1-g_1)}}\right]; \quad t \le \tau    (34.24)

m(w) = \frac{a}{1-g_2}\left[1 - \frac{(1+b_1)^{r_1(1-g_1)}\, e^{-l_1 r_1 w(\tau)(1-g_1) - l_2 r_2 (w - w(\tau))(1-g_2)}}{\left(1 + b_1 e^{-l_1 w(\tau)}\right)^{r_1(1-g_1)}} \times \left(\frac{1 + b_2 e^{-l_2 w(\tau)}}{1 + b_2 e^{-l_2 w}}\right)^{r_2(1-g_2)}\right] + \frac{(g_1 - g_2)\, a}{(1-g_2)(1-g_1)}\left[1 - \frac{(1+b_1)^{r_1(1-g_1)}\, e^{-l_1 r_1 w(\tau)(1-g_1)}}{\left(1 + b_1 e^{-l_1 w(\tau)}\right)^{r_1(1-g_1)}}\right]; \quad t > \tau    (34.25)
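A sketch of Eqs. (34.24)–(34.25) as reconstructed above, written as plain NumPy functions; the example parameters are illustrative stand-ins (values of this kind are estimated in Sect. 34.4, Table 34.2b).

```python
# Sketch: mean value function for the logistic-FRF model with one change point.
import numpy as np

def m_before(w, a, r1, b1, l1, g1):
    """Eq. (34.24), t <= tau."""
    frac = ((1 + b1) ** (r1 * (1 - g1)) * np.exp(-l1 * r1 * w * (1 - g1))
            / (1 + b1 * np.exp(-l1 * w)) ** (r1 * (1 - g1)))
    return a / (1 - g1) * (1 - frac)

def m_after(w, w_tau, a, r1, b1, l1, g1, r2, b2, l2, g2):
    """Eq. (34.25), t > tau, with w_tau = W(tau)."""
    frac_tau = ((1 + b1) ** (r1 * (1 - g1)) * np.exp(-l1 * r1 * w_tau * (1 - g1))
                / (1 + b1 * np.exp(-l1 * w_tau)) ** (r1 * (1 - g1)))
    decay = np.exp(-l1 * r1 * w_tau * (1 - g1) - l2 * r2 * (w - w_tau) * (1 - g2))
    ratio = ((1 + b2 * np.exp(-l2 * w_tau))
             / (1 + b2 * np.exp(-l2 * w))) ** (r2 * (1 - g2))
    first = a / (1 - g2) * (1 - (1 + b1) ** (r1 * (1 - g1)) * decay * ratio
                            / (1 + b1 * np.exp(-l1 * w_tau)) ** (r1 * (1 - g1)))
    return first + (g1 - g2) * a * (1 - frac_tau) / ((1 - g2) * (1 - g1))

print(m_before(10.0, a=331.6, r1=0.06, b1=0.004, l1=0.55, g1=0.42))
print(m_after(30.0, 15.0, a=343.7, r1=0.047, b1=0.004, l1=0.58, g1=0.51,
              r2=0.034, b2=0.965, l2=0.068, g2=0.44))
```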

34.4 Model Validation (Numerical Example)

The accuracy of the proposed model is validated on two real fault datasets that are widely used for testing model performance in describing the failure phenomenon. The first dataset was given by Wood [28] and the second was reported by Obha [29]. Table 34.1 describes the resources utilized and the faults detected during the testing duration. The Weibull TEF is used to estimate the amount of resources consumed during testing, and using the consumed resources we estimate the number of faults after each week of testing. Tables 34.2a, b show the estimates of the model parameters for the constant FRF and the logistic FRF, respectively, under the imperfect debugging environment. Based on the predicted faults for each dataset under both models, we plot goodness-of-fit curves (Fig. 34.1).

Table 34.1 Data description

Datasets | Description | Testing time (weeks) | Resource utilized | Fault
1 | PL/I database | 19 | 47.65 | 328
2 | Tandem computers | 20 | 10,000 | 100
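The parameter estimates in Tables 34.2a, b were obtained by least-squares estimation (in SPSS, as noted in Sect. 34.5). As a hedged illustration only, a similar least-squares fit of the simplest case, Eq. (34.10), can be done in Python with SciPy; to keep the fit well posed, the sketch estimates the two identifiable combinations a_eff = a/(1 − g) and θ = (1 − g)·r·B, and the weekly data are synthetic stand-ins rather than the Wood or Obha observations.

```python
# Illustrative least-squares fit of Eq. (34.10) written as
# m(W) = a_eff * (1 - exp(-theta * W)), with a_eff = a/(1-g), theta = (1-g)*r*B.
import numpy as np
from scipy.optimize import curve_fit

def m_w(W, a_eff, theta):
    return a_eff * (1.0 - np.exp(-theta * W))

rng = np.random.default_rng(1)
W_obs = np.linspace(2.0, 47.65, 19)                        # cumulative effort per week
faults = m_w(W_obs, 560.0, 0.012) + rng.normal(0, 3, 19)   # synthetic fault counts

(a_eff, theta), _ = curve_fit(m_w, W_obs, faults, p0=[400.0, 0.01])
print("a_eff =", round(a_eff, 2), " theta =", round(theta, 4))
```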


Table 34.2 (a) Parameter estimates (constant FRF)

Without change point:
Model-1 parameters | DS-1 | DS-2
a | 353.084 | 100
B | 0.378 | 0.034
r | 0.083 | 0.005
g | 0.373 | 0.40

With change point:
Model-2 parameters | DS-1 | DS-2
a | 328 | 95
B1 | 0.149 | 0.005
B2 | 0.304 | 0.023
r1 | 0.221 | 0.038
r2 | 0.115 | 0.006
g1 | 0.150 | 0.703
g2 | 0.617 | 0.229

(b) Parameter estimates (time-dependent logistic FRF)

Without change point:
Model-1 parameters | DS-1 | DS-2
a | 331.592 | 105.001
b | 0.004 | 0.888
r | 0.060 | 0.011
l | 0.551 | 0.017
g | 0.423 | 0.214

With change point:
Model-2 parameters | DS-1 | DS-2
a | 343.736 | 105.005
b1 | 0.004 | 0.776
b2 | 0.965 | 0.770
r1 | 0.047 | 0.026
r2 | 0.034 | 0.001
l1 | 0.582 | 0.005
l2 | 0.068 | 0.075
g1 | 0.514 | 0.380
g2 | 0.441 | 0.149

Fig. 34.1 Goodness-of-fit analysis. a Dataset 1. b Dataset 2

The graphs plot the predicted-fault curves of the models with and without change point against the actual faults. Six comparison criteria have been used to evaluate model performance; the criteria are shown in Fig. 34.2. The results are compared with the perfect debugging model and are reported in Tables 34.3a, b. A lower value for each criterion except the first one (R²) signifies a better fit.
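Fig. 34.2 lists the six comparison criteria; their exact formulas are given in that figure. The sketch below computes them using definitions that are common in the SRGM literature (R², MSE, predictive ratio risk, predictive power, predicted relative variation and root mean square prediction error), which is an assumption rather than a quotation of the figure.

```python
# Illustrative computation of the six comparison criteria (common SRGM
# definitions assumed; Fig. 34.2 gives the chapter's own formulas).
import numpy as np

def comparison_criteria(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    n = len(actual)
    r2 = 1.0 - np.sum(err**2) / np.sum((actual - actual.mean())**2)
    mse = np.mean(err**2)
    prr = np.sum((err / predicted)**2)            # predictive ratio risk
    pp = np.sum((err / actual)**2)                # predictive power
    bias = err.mean()
    prv = np.sqrt(np.sum((err - bias)**2) / (n - 1))
    rmspe = np.sqrt(bias**2 + prv**2)
    return {"R2": r2, "MSE": mse, "PRR": prr, "PP": pp, "PRV": prv, "RMSPE": rmspe}

print(comparison_criteria([10, 25, 52, 80, 100], [12, 23, 55, 78, 103]))
```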


Fig. 34.2 Performance criterion

Table 34.3 (a) Model performance comparison (constant FRF)

Model | Dataset | R² | MSE | PRR | PP | PRV | RMSPE
Imperfect debugging
1 | PL/I | 0.985 | 149.88 | 24.033 | 27.26 | 12.54 | 12.56
1 | Tandem | 0.974 | 21.46 | 44.13 | 17.31 | 4.53 | 4.74
2 | PL/I | 0.991 | 93.43 | 16.12 | 18.13 | 10.12 | 10.13
2 | Tandem | 0.990 | 7.90 | 5.34 | 4.37 | 2.86 | 2.88
Perfect debugging
1 | PL/I | 0.984 | 161.79 | 23.58 | 23.37 | 13.04 | 13.05
1 | Tandem | 0.962 | 31.13 | 25.19 | 16.19 | 5.72 | 5.73
2 | PL/I | 0.991 | 91.22 | 15.88 | 17.37 | 9.79 | 9.80
2 | Tandem | 0.982 | 14.66 | 8.14 | 7.01 | 3.92 | 3.93

(b) Model performance comparison (logistic FRF)

Model | Dataset | R² | MSE | PRR | PP | PRV | RMSPE
Imperfect debugging
1 | PL/I | 0.988 | 114.50 | 27.21 | 21.99 | 10.85 | 10.97
1 | Tandem | 0.978 | 18.021 | 40.455 | 15.376 | 4.77 | 4.91
2 | PL/I | 0.993 | 91.26 | 15.89 | 17.38 | 9.8 | 9.8
2 | Tandem | 0.991 | 5.65 | 1.96 | 1.93 | 2.44 | 2.44
Perfect debugging
1 | PL/I | 0.988 | 117.39 | 28.96 | 23.04 | 10.99 | 11.10
1 | Tandem | 0.965 | 16.588 | 33.27 | 13.99 | 4.07 | 4.17
2 | PL/I | 0.993 | 69.52 | 14.43 | 13.71 | 8.50 | 8.55
2 | Tandem | 0.984 | 13.40 | 30.54 | 12.64 | 3.61 | 3.75


The model with change point under the imperfect debugging environment has lower values for all criteria except R² compared with the model without change point. The model under the imperfect debugging scenario also gives a better fit than its counterpart, since it incorporates realistic testing conditions.

34.5 Conclusions and Future Scope

In this paper, we have evaluated model performance in predicting reliability by integrating testing effort and change point under imperfect debugging. The detection of faults is largely affected by the testing resources allocated during the planning and development phases. Traditionally, the resources allocated to the various processes were considered constant; in practice, the need for resources may change with time, and hence it has been treated as time-dependent here. The model also incorporates the error generation phenomenon. We have proposed two SRGMs incorporating a Weibull TEF, a time-dependent FRF and a change point in an imperfect debugging environment. The first model assumes that there is no change in the fault detection rate (FDR), while the second model assumes that at a particular time moment there is a change in the FDR, i.e., a change point. The effort function is modelled by the Weibull function, and the FRF is represented by the logistic function under an imperfect debugging environment.

The accuracy of the proposed models is confirmed on two real-life software fault datasets, and the results show that the models developed in this paper are accurate for fault prediction. We have also compared the entire set of results with the corresponding case of constant FRF and conclude that the time-dependent logistic FRF performs better than the constant one. The performance of the developed model is also compared with similar models under perfect debugging conditions. The least-squares estimation results obtained from SPSS show that the proposed model is more accurate in predicting faults; all six criteria suggest a better model fit. The goodness-of-fit curves show the closeness of the actual and predicted values for both the effort function and the proposed model. We also conclude that the incorporated Weibull TEF can describe the real spending patterns during the development process. This model can be extended to incorporate multiple change points, optimal release-time problems and multi-release software growth modelling.

References 1. Zhao M (1993) Change-point problems in software and hardware reliability. Commun Stat Theory Methods 22(3):757–768 2. Huang CY (2005) Performance analysis of software reliability growth models with testingeffort and change-point. J Syst Softw 76(2):181–194


3. Zou F (2003) A change-point perspective on the software failure process. Softw Test Verifi Reliab 13:85–93 4. Musa JD, Iannino A, Okumoto K (1987) Software reliability, measurement, prediction and application. McGraw-Hill, New York 5. Pachauri B, Dhar J, Kumar A (2014) Incorporating inflection S-shaped fault reduction factor to enhance software reliability growth. Appl Math Modell 39(5):1463–1469 6. Huang CY, Kuo SY (2002) Analysis of incorporating logistic testing-effort function into software reliability modeling. IEEE Trans Reliab 51(3):261–270 7. Musa JD (1975) A theory of software reliability and its application. IEEE Trans Eng SE-1:312– 327 8. Yamada S, Hishitani J, Osaki S (1993) Software reliability growth model with a Weibull testing effort: a model and application. IEEE Trans Reliab 42(1):100–106 9. Kapur PK, Goswami DN, Gupta A (2004) A software reliability growth model with testing effort dependent learning function for distributed systems. Int J Reliab Qual Saf Eng 11(4):365– 377 10. Yadavalli VSS, Aggarwal AG, Kapur PK, Kumar J (2010) Unified framework for developing testing effort dependent software reliability growth models with change point and imperfect debugging. In: Proceedings of the 4th national conference; INDIACom-2010 computing for nation development Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi 11. Kapur PK, Gupta A, Shatnawi O, Yadavalli VSS (2006) Testing effort control using flexible software reliability growth model with change point. Int J Perform Eng 2(3):245–263 12. Huang CY, Kuo SY, Chen IY (1997) Analysis of software reliability growth model with logistic testing-effort function. In: Proceedings of 8th international symposium on software reliability engineering. Albuquerque, NM, pp 378–388 13. Huang CY, Kuo SY, Lyu MR (2007) An assessment of testing-effort dependent software reliability growth models. IEEE Trans Reliab 56(2):198–211 14. Gokhale SS, Trivedi KS (1999) A time/structure based software reliability model. Ann Softw Eng 8(1):85–121 15. Bokhari MU, Ahmad N (2006) Analysis of a software reliability growth models: the case of log-logistic test-effort function. In: Proceedings of the 17th IASTED international conference on modeling and simulation, pp 540–545 16. Li QY, Li HF, Lu MY (2011) Software reliability growth model with S-shaped testing effort function. J Beijing Univ Aeronaut Astronaut 37(2) 17. Chang L, Liu Y, Ren Z, Li H (2015) Software reliability modelling considering both testing effort and testing coverage. In: International symposium on computers & informatics 18. Zhao J, Wang J (2007) Testing the existence of change-point in NHPP software reliability models. J Commun Stat Simul Comput 36(3):607–619 19. Li X, Xie M, Ng SH (2010) Sensitivity analysis of release time of software reliability models incorporating testing effort with multiple change-points. Appl Math Model 34(11):3560–3570 20. Inoue Shinji, Yamada Shigeru (2011) Software reliability measurement with effect of changepoint: modeling and application. Int J Syst Assur Eng Manag 2(2):155–162 21. Li N, Malaiya YK (1996) Fault exposure ratio estimation and applications. In: Proceedings of the seventh international symposium on software reliability engineering. White Plains, NY, USA, pp 372–38177 22. Friedman MA, Tran PY, Goddard PI (1995). Reliability of software intensive systems. Noyes Publications 23. 
Hsu CJ, Huang CY, Chang JR (2011) Enhancing software reliability modelling and prediction through the introduction of time variable fault reduction factor. Appl Math Modell 35:506–521 24. Aggarwal AG (2017) Reliability analysis for multi-release open-source software systems with change point and exponentiated Weibull fault reduction factor. Life Cycle Reliab Saf Eng 6(1):3–14 25. Jain M (2014) Imperfect debugging study of SRGM with fault reduction factor and multiple change point. Int J Math Oper Res 6(12):155–175

34 Testing-Effort dependent Software Reliability Assessment …

489

26. Chatterjee S, Shukla A (2016) Modeling and analysis of software fault detection and correction process through Weibull-type fault reduction factor, change point and imperfect debugging. Arab J Sci Eng 41(12):5009–5025 27. Anand S, Verma V, Aggarwal AG (2018) 2-dimensional multi-release software reliability modelling considering FRF under imperfect debugging. Special issue on emerging approaches to information technology and management, Ingenieria Solidaria. J Eng Educ 14(25):1–2 28. Wood A (1996) Software reliability growth models. Tandem technical report, 96(130056) 29. Obha M, Yamada S (1984) S-shaped software reliability growth model. In Proceedings of the 4th National Conference on Reliability and Maintainability (pp 430–436)

Chapter 35

Assessing the Severity of Software Bug Using Neural Network

Ritu Bibyan, Sameer Anand, and Ajay Jaiswal

35.1 Introduction

Most human activities today are supported by software, and the amount of software in use is growing at a very high speed. While operating any software, various bugs may be encountered that lead to a software failure or defect; these can surface during the testing period or during actual usage. Software bug prediction has therefore been proliferating in recent years in the maintenance and development phases. Various factors lead to software failure, such as lack of resources, lack of new technology, unclear objectives, poor management of time, cost pressure, unknown purpose, unexpected changes, and many others. Users as well as developers report bugs through bug-tracking systems (BTS) such as Fossil, Bugzilla, Trac, Mantis, Zoho BugTracker, Bug host, and FogBugz. These systems help in documenting and recording the bugs encountered by users. Bugs have different repercussions, major or minor, ranging from crashes to slight disruptions in the performance of the software. Thus, it is important to know the criticality, or severity, of a bug. Here we define bug severity as the level of influence a bug has on software performance. The difference between the terms severity and priority is that the severity of a defect describes how severe its impact is, whereas the priority of a defect determines the order in which it should be fixed. The most severe bug is attended to and rectified first, i.e., the higher the priority

R. Bibyan (B) Department of Operational Research, University of Delhi, New Delhi, India e-mail: [email protected] S. Anand · A. Jaiswal S.S. College of Business Studies, University of Delhi, New Delhi, India e-mail: [email protected] A. Jaiswal e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_35


Fig. 35.1 Levels of severity: blocker, critical, major, normal, minor, trivial, and enhancement

Fig. 35.2 Bug triaging process

of the bug, the sooner it is resolved. Priority is related to scheduling, and severity is related to functionality. When there are limited testers or developers and many bug reports, bug triaging helps to resolve bugs on the basis of priority and severity (as shown in Fig. 35.2), and bug severity assessment helps in prioritizing a bug. As shown in Fig. 35.1, there are seven levels of severity, namely blocker, critical, major, normal, minor, trivial, and enhancement. The bug report contains a summary column which gives a brief description of the bug. In 2008, the text summary was first used for the prediction and assessment of bugs by Menzies and Marcus [1]. In this paper, we use text mining in Python to construct a dictionary of critical terms indicating the severity of the bug, which helps to classify the bug reports automatically. A machine learning technique, the artificial neural network, is then used for the severity assessment of software bugs. An artificial neural network (ANN) is a machine learning tool based on the mechanism of the biological nervous system of human beings. It performs tasks such as classification, clustering, and pattern recognition. It consists of an input layer of neurons which sends the data to a second layer, called the hidden layer, which transforms the input into something the output layer can understand, as shown in Fig. 35.3. All these layers have nodes (neurons) which are connected to the preceding layer through links. The most important and crucial task is to find the weights of the linkages between the nodes of the network; these weights express the strength of the interconnection between the neurons. The most widely used algorithm to train multilayer feed-forward networks is back-propagation, which makes two passes through the layers, forward and backward.

Fig. 35.3 Structure of neural network: input layer (I1, I2, I3), hidden layer (H1, H2, H3, H4), and output layer (O1, O2)

In the forward pass, the activation values are calculated; in the backward pass, the error is calculated and propagated backward, and the weights are then readjusted to obtain the desired output. ANN is used here because of its parallel architecture and its ability to model nonlinearity. The technique has an enormous memory and works well even when the population differs significantly from the training dataset; unlike many traditional models, it does not require the past to repeat itself in exactly the same way and can therefore handle much more variability. The paper is divided into the following sections:
• Introduction
• Research related work
• Data description
• Methodology
• Results and conclusions
• References

35.2 Literature Survey

The first idea of using text mining for the assessment of bug severity was given by Menzies and Marcus. Later, in 2010, Lamkanfi and Demeyer used a text mining approach based on TF-IDF and a naïve Bayes (NB) classifier to predict bug severity; the overall accuracy varied from 65 to 75% for Mozilla and Eclipse and 70–80% in the case of GNOME [2]. The study was then extended to compare further techniques, namely SVM, k-NN, naïve Bayes, and the naïve Bayes multinomial classifier, with respect to accuracy. For the GNOME and Eclipse open-source projects, the naïve Bayes multinomial classifier performed better than the others [3].


Using SAS, Gegick observed that bug reports are often misclassified and reported an increase in precision by comparing different feature selection methods, such as Chi-square, correlation coefficient, and info-gain, with an NBM classifier on their datasets [4]. In 2011, Ghaluh Indah Permata attained an accuracy of 99.83% through data reduction with the info-gain technique and SVM [5]. An approach using Chi-square and TF-IDF for feature selection to assign a particular bug to a relevant developer was proposed by Neelofar and Javed; using Chi-square and naïve Bayes, an accuracy of 83% was attained [6]. Chaturvedi and Singh classified bugs into five levels on the basis of priority, from P1 to P5. For NASA datasets, it was observed that ML techniques such as SVM, NB, MNB, k-NN, and RIPPER are significant for determining bug severity, with feasible accuracy above 70% except for the naïve Bayes technique [7]. Further, in 2013, they found that the NB classifier gives significant accuracy for closed-source data and SVM for open-source data [8]. Later, K. K. Chaturvedi used a text mining approach for preprocessing the textual bug summaries of Mozilla, Eclipse, and GNOME, and applied the SVM, RIPPER, NB, J48, and k-NN algorithms to the IV and V projects of NASA [9]. In 2012, Tian focused on the prediction of severity labels using extended BM25 and an NB classifier [10] and found that this approach brought significant improvement over the fine-grained severity prediction approach of Menzies and Marcus. Later, in 2013, he proposed a study to predict the priority level of bugs using ML algorithms [11]. Simple random sampling (SRS) was used by Nagwani to classify reports as bug or non-bug on the basis of textual terms; they also evaluated the accuracy on JBoss Seam, Android, and Mozilla, and the LDA technique was used to generate taxonomic terms for the classification of bug reports [12]. Samina proposed a survey to improve data analysis for feature extraction and feature selection methods. Kanti Singha Roy applied combinations of bigram and Chi-square features with an NB classifier in 2014 and observed that the n-gram application enhances the performance of the classifier [13]. There are several bug-tracking systems (BTS), such as Bugzilla, Fog, ikiwiki, and Jira, which allow developers to report bugs and thereby improve the quality of the software by finding an accurate solution [14]. Later, in 2015, Tian proposed a new approach showing that existing automated approaches achieve 77–86% agreement with manually assigned severity levels, given the unreliable nature of the data [15]. A concept was suggested in 2015 by Zhang and Yang which analyzed historical bug reports; considering two projects, namely Eclipse and Mozilla Firefox, their approach effectively recommended the appropriate developer to fix a given bug and was found to work efficiently for bug severity prediction [16]. Jin and Lee in 2016 gave an approach which produced better results on five open-source project datasets for bug severity prediction, using an NB classifier on textual and meta data [17]. Pandey used various machine learning algorithms and obtained 75% to 83% accuracy [18]. Wang and Liu introduced a ranking-based strategy using ensemble feature selection, which improved the performance in terms of F-measure by 54.76% [19].


35.3 Data Description

Eclipse is an open-source integrated development environment (IDE). The procedure for classifying the bug report instances of Eclipse on the basis of severity is described below. Four components of the Eclipse project, namely UI, Debug, Core, and Text, were taken from the bug-reporting system Bugzilla, as shown in Table 35.1. The reports were collected in CSV format, and some attributes were fixed during the collection of the bug reports (Table 35.2). In a related study, Yang used historical bug reports for triaging and severity prediction, identifying reports with features such as product, severity, component, and priority similar to those of new reports [20].

Table 35.1 Description of data
Components   Description
UI           Java IDE user interface
Debug        Debug support for Java
Core         Java IDE headless infrastructure
Text         Java editing support

Table 35.2 Attribute selection
Attributes   Description
Status       Resolved, verified, closed
Resolution   Fixed, worksforme

The bug reports contain all the levels of bug severity. Since enhancement indicates the renovation of an existing feature or the incorporation of a new one, enhancement-related bugs are removed. The bugs under the normal severity level are also removed, because they might distract the classifiers and because this level indicates that the user is not confident about the severity of that bug. We have ranked the other levels on the basis of severity as shown in Table 35.3.

Table 35.3 Ranking of severity levels
Components   Blocker (I)   Critical (II)   Major (III)   Minor (IV)   Trivial (V)   Total
UI           5             47              190           119          74            435
Debug        21            80              98            55           40            244
Text         7             41              211           146          71            476
Core         14            43              146           94           24            321

In this paper, the brief description of the bug given by the user in the summary column is taken, and preprocessing is then done using a text mining approach in Python. We classify the bug reports into two binary levels, namely severe and non-severe. Under the severe category, we have considered reports with the critical, blocker, and major severity levels; under the non-severe category, we have considered reports with the minor and trivial severity levels.
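A minimal sketch of this data preparation step is given below, assuming a pandas-based workflow; the file name and column names are illustrative placeholders, since Bugzilla CSV exports can be configured differently.

import pandas as pd

# Load the Bugzilla CSV export for one Eclipse component (file name is hypothetical).
reports = pd.read_csv("eclipse_ui_bugs.csv")

# Drop the severity levels that are not used for classification.
reports = reports[~reports["Severity"].str.lower().isin(["enhancement", "normal"])]

# Map the remaining five levels to the two classes used in this paper.
severe_levels = {"blocker", "critical", "major"}
reports["label"] = reports["Severity"].str.lower().apply(
    lambda s: "severe" if s in severe_levels else "non-severe")

summaries = reports["Summary"].astype(str).tolist()
labels = reports["label"].tolist()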

35.4 Methodology

Several preprocessing steps are applied to the textual summary to extract features.
• Tokenization: the segmentation of textual data into words, called tokens. It removes all commas, brackets, punctuation, and other symbols or characters.
• Stop word removal: words that are repeated and carry no distinct meaning, such as articles, conjunctions, and prepositions, are called stop words. They are removed from the bug summary in this step.
• Stemming: affixes are removed from a word, leaving the root; for example, in the word “fixing” the suffix “ing” can be removed. This is done to reduce the textual data.
• Length reduction: the length of each token is checked, and only tokens with a maximum length of 50 and a minimum length of 4 are kept.

After preprocessing, feature extraction is done using term frequency–inverse document frequency (TF-IDF). TF * IDF is the weight used to determine both the occurrence and the importance of a term in a document; it is the product of the term frequency and the inverse document frequency. The term frequency is the number of times a term is present in a document, normalized by the total number of terms in the document to remove bias. The inverse document frequency reflects how significant a word is to a document: it is the logarithm of the quotient obtained by dividing the total number of documents by the number of documents containing the term. TF is larger when the term is frequent, and IDF is larger when the term is rare in the collection, i.e., the higher the document frequency of a term, the lower its inverse document frequency.

IDF_i = log(n / df_i),  i = 1, 2, 3, …, n    (35.1)

where df_i is the document frequency of term i and n is the total number of documents.

Weight = W_i = TF_i ∗ IDF_i    (35.2)
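As an illustration of how the preprocessing steps and the weights of Eqs. (35.1)–(35.2) can be computed, a minimal Python sketch is given below. The stop word list, the crude suffix-stripping rule, and the way per-document weights are accumulated to rank terms are simplifying assumptions, not the exact pipeline of the study.

import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "in", "on", "of", "to", "is"}  # illustrative subset

def preprocess(summary):
    # Tokenization: keep alphabetic tokens only (drops punctuation and symbols).
    tokens = re.findall(r"[a-zA-Z]+", summary.lower())
    # Stop word removal, crude stemming, and length filtering (4..50 characters).
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [t[:-3] if t.endswith("ing") else t for t in tokens]
    return [t for t in tokens if 4 <= len(t) <= 50]

def build_dictionary(documents, top_k=150):
    docs = [preprocess(d) for d in documents]
    n = len(docs)
    # Document frequency df_i: number of documents containing term i.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) for term in df}          # Eq. (35.1)
    weights = Counter()
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values()) or 1
        for term, c in counts.items():
            weights[term] += (c / total) * idf[term]             # Eq. (35.2), accumulated over documents
    # Dictionary of the top-k terms by accumulated weight.
    return [term for term, _ in weights.most_common(top_k)]

dictionary = build_dictionary(["Editor crashes when saving large files",
                               "Minor spacing issue in preferences dialog"])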


We have selected the top 150 words on the basis of their weights, as they gave the maximum accuracy, and a dictionary of these 150 words specifying the bug severity is created. We then train a neural network on the dictionary of terms in Python, with testing and training carried out through a 10-fold cross-validation approach. The procedure for the neural network is given below:
• Start by assigning random weights to all the linkages, where W^l_jk denotes the weight from the kth neuron (in layer l − 1) to the jth neuron in layer l.
• Each input is multiplied by its corresponding weight and the products are summed up; if the summed weight is equal to 0, a bias is added.
• Find the activation rate of the hidden nodes from the inputs and the linkages between the input and hidden layers. The activation rate of H1, for example, is given as

P(H1) = f(W^2_11 ∗ I1 + W^2_12 ∗ I2 + W^2_13 ∗ I3),  where f(x) = 1/(1 + e^(−x))    (35.3)

We assume the logistic sigmoid function between the activation rates of the hidden nodes and the input nodes, and between the activation rates of the output nodes and the hidden nodes, to introduce nonlinearity into the model. The sigmoid function is chosen because it is computationally easy to work with, as its derivative can be expressed in terms of the function itself.
• After finding the activation rates of the hidden nodes, use them to find the activation rates of the output nodes.
• Calculate the error rate at the output nodes and revise all the linkages between the output and hidden nodes:

Error at H1 = W^3_11 ∗ Error at O1 + W^3_21 ∗ Error at O2    (35.4)

• Cascade the error down to the hidden nodes by using the error rates of the output nodes and the weights.
• Revise the weights between the hidden and input nodes.
The process is repeated until the convergence criterion is met, and the activation rates of the output nodes are then calculated using the final linkage weights. In this study, we have estimated the performance measures of the neural network classifier using the confusion matrix (Table 35.4). The different measures are explained in Table 35.5, and the parameter settings for the different operators are given in Table 35.6.
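A compact sketch of the training loop described above is shown next, for a single hidden layer with logistic sigmoid activations trained by back-propagation. The layer sizes, learning rate, epoch count, and toy data are illustrative assumptions rather than the exact configuration used in the study.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 150, 4, 2                   # 150 dictionary features, 2 classes (severe / non-severe)
W2 = rng.normal(scale=0.1, size=(n_in, n_hid))   # input -> hidden linkage weights
W3 = rng.normal(scale=0.1, size=(n_hid, n_out))  # hidden -> output linkage weights
b2, b3 = np.zeros(n_hid), np.zeros(n_out)
lr = 0.1

X = rng.random((20, n_in))                       # toy TF-IDF feature vectors
Y = np.eye(n_out)[rng.integers(0, n_out, 20)]    # toy one-hot labels

for epoch in range(500):
    # Forward pass: activation rates of hidden and output nodes (cf. Eq. 35.3).
    H = sigmoid(X @ W2 + b2)
    O = sigmoid(H @ W3 + b3)
    # Backward pass: output error, cascaded down to the hidden layer (cf. Eq. 35.4).
    delta_out = (O - Y) * O * (1 - O)
    delta_hid = (delta_out @ W3.T) * H * (1 - H)
    # Revise the linkage weights.
    W3 -= lr * H.T @ delta_out
    b3 -= lr * delta_out.sum(axis=0)
    W2 -= lr * X.T @ delta_hid
    b2 -= lr * delta_hid.sum(axis=0)

predicted_severe = O[:, 0] > 0.6                 # 60% threshold, as in Table 35.6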

35.5 Result and Conclusion

Bug report attributes such as the component, severity, and summary were used in our experiment. The bug reports from four components of the Eclipse project were chosen with the goal of enhancing the accuracy of bug severity classification.


Table 35.4 Confusion matrix
                   Predicted severe (S)    Predicted non-severe (NS)
Severe (S)         True positive (TP)      False negative (FN)
Non-severe (NS)    False positive (FP)     True negative (TN)

Here, P severe bug, N non-severe bug, TP bug is severe and predicted as severe, FN bug is severe but predicted as non-severe, TN bug is non-severe and predicted as non-severe, FP bug is non-severe but predicted as severe.

Table 35.5 Performance measures
Measures     Formula                                          Description
Accuracy     (TP + TN)/(TP + FP + TN + FN)                    Classifier effectiveness w.r.t. TP and TN
Precision    TP/(TP + FP)                                     Positively classified severity level within the dataset
Recall       TP/(TP + FN)                                     Sensitivity of the dataset
F-measure    (2 * Recall * Precision)/(Recall + Precision)    Test accuracy measure
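A small sketch of how the measures in Table 35.5 follow from the confusion matrix is given below; the counts are hypothetical and serve only to show the calculation, with “severe” treated as the positive class.

def performance_measures(tp, fn, fp, tn):
    # Formulas from Table 35.5.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for one component, only to illustrate the computation.
acc, prec, rec, f1 = performance_measures(tp=30, fn=10, fp=5, tn=55)
print(acc, prec, rec, f1)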

Table 35.6 Optimal parameters
Process documents from files    Vector creation          TF-IDF
Tokenize                        Characters               [# “”(), ==;: “.”]
Transform case                  Transform to             Lower case
Filter tokens                   Min length of token      4
                                Max length of token      50
Validation (X-validation)       Number of validations    10
                                Sampling type            Stratified sampling
NNET                            Training features        150
                                Threshold value          60%
                                Hidden layer             1
                                Activation function      Logistic sigmoid
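The validation settings of Table 35.6 (10 folds with stratified sampling, 150 features, logistic activation, 60% threshold) can be reproduced roughly as in the sketch below. scikit-learn is assumed only as one possible implementation, the hidden-layer size is an assumption, and the feature matrix and labels are random placeholders.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

# X: TF-IDF features restricted to the 150-word dictionary, y: severe / non-severe labels.
X = np.random.rand(200, 150)                   # placeholder feature matrix
y = np.random.randint(0, 2, size=200)          # placeholder binary labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in skf.split(X, y):
    clf = MLPClassifier(hidden_layer_sizes=(4,), activation="logistic", max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])
    # Apply the 60% probability threshold from Table 35.6 instead of the default 50%.
    prob_severe = clf.predict_proba(X[test_idx])[:, 1]
    pred = (prob_severe > 0.60).astype(int)
    accuracies.append((pred == y[test_idx]).mean())

print("mean 10-fold accuracy:", np.mean(accuracies))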


Table 35.7 Confusion matrix for each component
        UI            Core          Debug         Text
        S      NS     S      NS     S      NS     S      NS
S       29     18     17     7      5      12     30     12
NS      6      38     9      35     4      41     16     42

The severity and summary of the bug reports play the major role in deciding whether a bug is severe or non-severe. We conducted the experiment by creating a dictionary for each component from the top 150 words; these words are used to classify the severity of the bug instances of each component as severe or non-severe. We validated the model using a neural network with one hidden layer and a logistic sigmoid activation function, and the threshold value was set to 60% to evaluate the accuracy of each component with the neural network classifier. Table 35.7 shows the confusion matrices of the four components of Eclipse: UI, Core, Debug, and Text. The performance measures for each component of Eclipse, computed from the confusion matrix, are given in Table 35.8, and Fig. 35.4 compares the accuracy of the different components using the neural network. The minimum accuracy of 72% is attained for the component Text and the maximum of 76.47% for the component Core, so the accuracy for the different components of Eclipse lies in the range of 72–77%. The precision, recall, and F-measure values with respect to severe bugs are 68, 62, and 71% and for non-severe bugs are 83, 86, and 76% for component UI. The precision, recall, and F-measure values with respect to severe bugs are 76, 78.05, and 54.35% and for non-severe bugs are 88.04, 88.57, and 68.83% for component Core. The precision, recall, and F-measure values with respect to severe bugs are 56, 55.56, and 71% and for non-severe bugs are 77, 74, and 76.533% for component Debug. The precision, recall, and F-measure values with respect to severe bugs are 65, 71, and 68% and for non-severe bugs are 78, 72, and 72% for component Text.

Table 35.8 Performance measures of different components using neural network
Components   Accuracy (%)   Precision severe (%)   Precision non-severe (%)   Recall severe (%)   Recall non-severe (%)   F-measure severe (%)   F-measure non-severe (%)
UI           73.63          68                     83                         62                  86                      71                     76
Core         76.47          76                     88.04                      78.05               88.57                   54.35                  68.83
Debug        74.19          56                     77                         55.56               74                      71                     76.35
Text         72             65                     78                         71                  72                      68                     72


Fig. 35.4 Comparison of accuracy for different components of eclipse

References 1. Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: IEEE international conference on software maintenance, 2008. ICSM 2008. IEEE, pp. 346–355 2. Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 1–10 3. Lamkanfi A, Demeyer S, Soetens QD, Verdonck T (2011) Comparing mining algorithms for predicting the severity of a reported bug. In: CSMR, Carl von Ossietzky University, Oldenburg, Germany. IEEE, New York, pp 249–258 4. Gegick M, Rotella P, Xie T (2010) Identifying security bug reports via text mining: an industrial case study. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 11–20 5. Ghaluh Indah Permata S (2012) An attribute selection for severity level determination according to the support vector machine classification result. In: Proceedings of the international conference on information system business competitiveness 6. Neelofar, Javed MY, Mohsin H (2012) An automated approach for software bug classification. In: 2012 sixth international conference on complex, intelligent and software intensive systems (CISIS). IEEE, pp 414–419 7. Sharma M, Bedi P, Chaturvedi KK, Singh VB (2012) Predicting the priority of a reported bug using machine learning techniques and cross project validation. In: 2012 12th international conference on intelligent systems design and applications (ISDA). IEEE, pp 539–545 8. Chaturvedi KK, Singh VB (2012) An empirical comparison of machine learning techniques in predicting the bug severity of open and closed source projects. Int J Open Source Softw Process (IJOSSP) 4(2):32–59 9. Chaturvedi KK, Singh VB (2012) Determining bug severity using machine learning techniques. In: 2012 CSI sixth international conference on software engineering (CONSEG). IEEE, pp 1–6 10. Tian Y, Lo D, Sun C (2012) Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In: 2012 19th working conference on reverse engineering (WCRE). IEEE, pp 215–224 11. Tian Y, Lo D, Sun C (2013) Drone: predicting priority of reported bugs by multi-factor analysis. In: 2013 IEEE international conference on software maintenance. IEEE, pp 200–209 12. Nagwani NK, Verma S, Mehta KK (2013) Generating taxonomic terms for software bug classification by utilizing topic models based on Latent Dirichlet allocation. In: 2013 11th international conference on ICT and knowledge engineering (ICT&KE). IEEE, pp 1–5 13. Roy NKS, Rossi B (2014) Towards an improvement of bug severity classification. In: 2014 40th EUROMICRO conference on software engineering and advanced applications (SEAA). IEEE, pp 269–276 14. 15 most popular bug tracking software to ease your defect management process. http://www. softwaretestinghelp.com/popular-bugtracking-software/, 12 Feb 2015


15. Tian Y, Ali N, Lo D, Hassan AE (2016) On the unreliability of bug severity data. Empirical Softw Eng 21(6):2298–2323 16. Zhang T, Yang G, Lee B, Chan AT (2015) Predicting severity of bug report by mining bug repository with concept profile. In: Proceedings of the 30th annual ACM symposium on applied computing. ACM, pp 1553–1558 17. Jin K, Dashbalbar A, Yang G, Lee JW, Lee B (2016) Bug severity prediction by classifying normal bugs with text and meta-field information. Adv Sci Technol Lett 129:19–24 18. Pandey N, Sanyal DK, Hudait A, Sen A (2017) Automated classification of software issue reports using machine learning techniques: an empirical study. Innov Syst Softw Eng 13(4):279–297 19. Liu W, Wang S, Chen X, Jiang H (2018) Predicting the severity of bug reports based on feature selection. Int J Softw Eng Knowl Eng 28(04):537–558 20. Yang G, Zhang T, & Lee B (2014) Towards semi-automatic bug triage and severity prediction based on topic model and multi-feature of bug reports. In: 2014 IEEE 38th annual computer software and applications conference (COMPSAC). IEEE, pp 97–106

Chapter 36

Optimal Refill Policy for New Product and Take-Back Quantity of Used Product with Deteriorating Items Under Inflation and Lead Time

S. R. Singh and Karuna Rana

S. R. Singh · K. Rana (B) Department of Mathematics, CCS University, Meerut, India e-mail: [email protected] S. R. Singh e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 P. K. Kapur et al. (eds.), Strategic System Assurance and Business Analytics, Asset Analytics, https://doi.org/10.1007/978-981-15-3647-2_36

36.1 Introduction

Environmental problems are receiving justifiable attention from society worldwide. Consumer demand for clean manufacturing and reuse is increasing, and customers expect to be able to trade in an old product as soon as they start bargaining for a new one. Therefore, over the previous few decades, the reverse flow of items from consumers to upstream businesses has received abundant interest. Consumers also prefer to buy from companies with a green image, which is why the recovery of used materials and items has received more attention in the past decade. Earlier, recycling and reusing were narrowly limited to commonly used materials such as metal and glass. In real life, the deterioration of several items, such as chemicals, IC chips, blood banks, medicines, volatile liquids, and electronic gear, during the storage period is an issue. In general, deterioration is defined as the loss, spoilage, decay, evaporation, or obsolescence of stored items, and it results in decreasing effectiveness. Consequently, the control and maintenance of inventories of deteriorating items have become a vital problem for decision makers in modern organizations. Numerous scholars have studied deteriorating inventory earlier. Goyal and Giri [4] provided an exhaustive review of the deteriorating inventory literature. Koh et al. [7] discussed eco-friendly items with a simple recovery process. Singh et al. [12] discussed a perishable inventory model with quadratic demand, partial backlogging, and permissible delay in payments. Inflation is a concept closely linked to time: it is usually associated with rapidly rising prices, which reduce the purchasing power of money, and its effect varies over time and depends on it to a great extent. A small amount of inflation can even affect the

economy positively. It is commonly thought that inflation is not good, but that is not necessarily so; inflation affects different people in different ways. Buzacott [1] is credited with first exploring this avenue when he put forward his study on inflation. Hwang and Shon [5] developed the management of decaying items under inflation. Dutta and Pal [3] derived the effects of inflation and the time value of money on an inventory model with a linear time-dependent demand rate and shortages. Yang et al. [16] also considered inflation in their study. Singh et al. [13] derived a two-warehouse inventory model for deteriorating items with shortages under inflation and the time value of money. Singh and Singh [14] developed an imperfect production process with an exponential demand rate and Weibull deterioration under inflation; in their paper, the demand is fulfilled by a new product and recycled old products. Kannan et al. [6] established a multi-echelon closed-loop supply chain network model for multiple periods and multiple products, studying the case of battery recycling, where old battery material is used in the production of new batteries. Rajoria et al. [8] derived an EOQ model for deteriorating items with a power demand pattern, in which the effects of inflation and the time value of money are also introduced. Inflation is a global phenomenon in present-day times. Recently, Chen et al. [2] developed models for a retailer, assuming that the retailer sells new items as well as collects used items. Shastri et al. [10] presented supply chain management for two-level trade credit financing with selling-price-dependent demand. Singh and Sharma [11] derived a reliable production model for deteriorating products with random demand and inflation. Saxena et al. [15] presented a green supply chain model of vendor and buyer for remanufacturing. The present paper is an extension of the inventory system of Shah and Vaghela [9]. In the presented work, we presume that a retailer sells the new item to customers and, in addition, collects and sells used items. The demand rate is assumed to be a price-dependent quadratic function for deteriorating items, and the return of used items is modelled as a price-dependent and linearly time-varying function. To make our study more relevant to the present-day market, the research is carried out in an inflationary environment. The optimal pricing, the ordering quantity of new items, and the optimal quantity of used items are discussed, where customer demand is sensitive to time and the retail price, and the total profit is maximized with respect to the selling price and cycle time. The rest of the paper is structured as follows: Sects. 36.2 and 36.3 present the notation, assumptions, and mathematical model, followed by the proposed model. The solution procedure is outlined in Sect. 36.4. Section 36.5 provides numerical results, sensitivity analysis, and managerial insights. Finally, the conclusions and future research directions are drawn in Sect. 36.6.

36.2 Assumptions and Notation

36.2.1 Assumptions

1. The inventory system comprises only a single type of product.
2. The replenishment is instantaneous and the planning horizon is infinite.
3. The lead time is zero and shortages are not permissible.
4. Deterioration rates are considered for the new product and the used product, respectively. The inventory system for deteriorating items has been a subject of study for a long period; however, little is known about the effect of investing in reducing the rate of item deterioration and its momentous influence in practice.
5. The time value of money and inflation are considered. Inflation is usually connected with quickly rising prices, which decrease the purchasing power of money; it varies depending on the time and depends on it to a great extent.
6. The demand rate of the new product is taken as R(p, t) = α(1 + α1 t − α2 t²) − βp, where α > 0 represents the scale demand, α1, α2 > 0, and the parameter β > 0 represents the price elasticity.
7. The return rate of the used product is taken as Ru(p, t) = a(1 − bt) − p(1 − p0), where a, b > 0 and p0 are the parameters connected with the price of the used product.

A short coded sketch of these two rate functions is given after this list.
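As a minimal illustration of assumptions 6 and 7, the two rate functions can be written directly in code; the parameter values below are purely hypothetical placeholders and not values used in the paper.

# Demand rate of the new product, R(p, t) = α(1 + α1·t − α2·t²) − βp   (assumption 6)
def demand_new(p, t, alpha=100.0, alpha1=0.2, alpha2=0.05, beta=1.5):
    return alpha * (1.0 + alpha1 * t - alpha2 * t**2) - beta * p

# Return rate of the used product, Ru(p, t) = a(1 − b·t) − p(1 − p0)   (assumption 7)
def return_used(p, t, a=40.0, b=0.1, p0=0.6):
    return a * (1.0 - b * t) - p * (1.0 - p0)

print(demand_new(p=20.0, t=0.5), return_used(p=20.0, t=0.5))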

36.2.2 Notation

A         Ordering cost for retailer ($/order)
C         Purchase cost per item (constant) ($/order)
hn        Inventory holding cost per unit item for new item ($/unit)
hu        Inventory holding cost per unit item for used item ($/unit)
Qn        The replenishment quantity for new item
Qu        The used item’s quantity
θ1        The deterioration rate of new item
θ2        The deterioration rate of used item
r         Inflation rate per unit time, where 0 ≤ r < 1
u         Deterioration cost of used item per unit per time
v         Deterioration cost of new item per unit per time
d         The discount rate of used item
T         The replenishment time (a decision variable) (years)
τ         The point of time when collection of used items starts (years)
P         Selling price per item (a decision variable) ($/unit)
R(p, t)   Demand rate for new item at t ≥ 0 (units)
Ru(p, t)  Demand rate for used item at t ≥ τ (units)
I(t)      Inventory level at time t ≥ 0 for new product (units)
Iu(t)     Inventory level at time t ≥ τ for used product (units)
π(p, T)   Total profit of the retailer during cycle time (in $)

Fig. 36.1 Graphical presentation of the inventory system

36.3 Mathematical Model

In this section, we present the general formulation and solutions of the inventory models for the new product as well as for the used item. For the new item, the inventory is depleted due to the time- and price-dependent demand. Suppose Qn is the ordering quantity to be sold during the cycle time [0, T]; then the status of the inventory at any instant of time t, where 0 ≤ t ≤ T, is governed by the following differential equation:

dI(t)/dt + θ1 I(t) = −R(p, t),  0 ≤ t ≤ T    (36.1)

With boundary conditions I(T) = 0 and I(0) = Q, the solution of Eq. (36.1) is

I(t) = α[(T − t) + α1(T² − t²)/2 − α2(T³ − t³)/3 + θ1((T² − t²)/2 + α1(T³ − t³)/3 − α2(T⁴ − t⁴)/4)] e^(−θ1 t)
       − βp[(T − t) + θ1(T² − t²)/2] e^(−θ1 t)    (36.2)

Q = α[T + α1 T²/2 − α2 T³/3 + θ1(T²/2 + α1 T³/3 − α2 T⁴/4)] − βp[T + θ1 T²/2]    (36.3)

Now, for the used product in the period [τ, T], the inventory level is affected by the return rate of the used item, and the status of the inventory at any instant of time t, where τ ≤ t ≤ T, is governed by the following differential equation:

dIu(t)/dt + θ2 Iu(t) = −Ru(p, t),  τ ≤ t ≤ T    (36.4)

With boundary conditions Iu(τ) = Qu and Iu(T) = 0, the solution of the differential Eq. (36.4) is given by

Iu(t) = a[(T − t) + θ2(T² − t²)/2 − b(T² − t²)/2 − bθ2(T³ − t³)/3] e^(−θ2 t)
        − p(1 − p0)[(T − t) + θ2(T² − t²)/2] e^(−θ2 t)    (36.5)

From the boundary conditions, the quantity of the used product Qu is given by

Qu = a[(T − τ) + θ2(T² − τ²)/2 − b(T² − τ²)/2 − bθ2(T³ − τ³)/3] e^(−θ2 τ)
     − p(1 − p0)[(T − τ) + θ2(T² − τ²)/2] e^(−θ2 τ)    (36.6)

Now, to calculate the total profit, we calculate all the components for both the new product and the used product. The components of the profit function of the inventory system for the new product are as follows.

Sales revenue from the new product:

SRn = (1/T) ∫_0^T p R(p, t) e^(−rt) dt
    = pα[1 − rT/2 + α1(3T − 2rT²)/6 − α2(4T² − 3rT³)/12] − βp²(2 − rT)/2    (36.7)

Purchase cost:

PCn = cQn/T    (36.8)

Ordering cost:

OCn = A/T    (36.9)

Holding cost:

HCn = (hn/T) ∫_0^T I(t) e^(−rt) dt
    = hn α[(T/2 − (θ1 + r)T²/6) + α1(T²/3 − (θ1 + r)T³/8) − α2(T³/4 − (θ1 + r)T⁴/10)]
      + hn αθ1[(T²/3 − (θ1 + r)T³/8) + α1(T³/4 − (θ1 + r)T⁴/10) − α2(T⁴/5 − (θ1 + r)T⁵/12)]
      − hn βp(T/2 − (θ1 + r)T²/6) − hn βpθ1(T²/3 − (θ1 + r)T³/8)    (36.10)

Deterioration cost:

DCn = (v/T) ∫_0^T θ1 I(t) e^(−rt) dt = (vθ1/hn) HCn    (36.11)

The components of the profit function for the used product are as below.

Sales revenue from the used product:

SRu = (1/T) ∫_τ^T p(1 − p0) Ru(p, t) e^(−rt) dt
    = (ap(1 − p0)/T)[(T − τ) − r(T² − τ²)/2 − b(T² − τ²)/2 + br(T³ − τ³)/3]
      − (p²(1 − p0)²/T)[(T − τ) − r(T² − τ²)/2]    (36.12)

Purchase cost:

PCu = C Qu (1 − d)/(T − τ)    (36.13)

Holding cost:

HCu = (1/T) ∫_τ^T hu Iu(t) e^(−rt) dt    (36.14)

Deterioration cost:

DCu = (w/T) ∫_τ^T θ2 Iu(t) e^(−rt) dt = (wθ2/hu) HCu    (36.15)

The closed-form expressions of (36.14) and (36.15) are obtained, exactly as for (36.10) and (36.11), by substituting Iu(t) from (36.5) and integrating term by term with the approximation e^(−(θ2 + r)t) ≈ 1 − (θ2 + r)t.

So the total profit of the system per unit time is given by

π(p, T) = (SRn − HCn − PCn − OCn − DCn) + (SRu − HCu − DCu − PCu)    (36.16)

with the components given by Eqs. (36.7)–(36.15).

36.4 Solution Procedure

The calculated total profit π(p, T) is a function of the two variables p and T. For given values of p and T, the necessary conditions for maximizing the total profit are

∂π(p, T)/∂p = 0 and ∂π(p, T)/∂T = 0    (36.17)

Furthermore, the following conditions must be satisfied:

∂²π(p, T)/∂p² < 0,  ∂²π(p, T)/∂T² < 0,  and  (∂²π/∂p²)(∂²π/∂T²) − (∂²π/∂p∂T)² > 0
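A minimal numerical sketch of this solution procedure is given below. It evaluates the component integrals of Eqs. (36.7)–(36.15) numerically (rather than through the first-order closed forms used in the chapter) and maximizes π(p, T) with SciPy; all parameter values, starting points, and bounds are purely illustrative assumptions and not the data of the paper's numerical section.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Illustrative parameter values only.
alpha, alpha1, alpha2, beta = 100.0, 0.2, 0.05, 1.5
a, b, p0 = 40.0, 0.1, 0.6
theta1, theta2, r = 0.05, 0.08, 0.06
A, c, hn, hu, v, w, d, tau = 50.0, 8.0, 1.0, 0.5, 2.0, 1.0, 0.1, 0.2

def R(p, t):   return alpha * (1 + alpha1 * t - alpha2 * t**2) - beta * p
def Ru(p, t):  return a * (1 - b * t) - p * (1 - p0)

def I_new(p, T, t):    # closed form (36.2)
    return (alpha * ((T - t) + alpha1*(T**2 - t**2)/2 - alpha2*(T**3 - t**3)/3
                     + theta1*((T**2 - t**2)/2 + alpha1*(T**3 - t**3)/3 - alpha2*(T**4 - t**4)/4))
            - beta*p*((T - t) + theta1*(T**2 - t**2)/2)) * np.exp(-theta1*t)

def I_used(p, T, t):   # closed form (36.5)
    return (a * ((T - t) + theta2*(T**2 - t**2)/2 - b*(T**2 - t**2)/2 - b*theta2*(T**3 - t**3)/3)
            - p*(1 - p0)*((T - t) + theta2*(T**2 - t**2)/2)) * np.exp(-theta2*t)

def profit(x):
    p, T = x
    Qn = I_new(p, T, 0.0)                                     # Eq. (36.3)
    Qu = I_used(p, T, tau)                                    # Eq. (36.6)
    SRn = quad(lambda t: p * R(p, t) * np.exp(-r*t), 0, T)[0] / T
    HCn = hn * quad(lambda t: I_new(p, T, t) * np.exp(-r*t), 0, T)[0] / T
    DCn = v * theta1 * quad(lambda t: I_new(p, T, t) * np.exp(-r*t), 0, T)[0] / T
    SRu = quad(lambda t: p*(1 - p0) * Ru(p, t) * np.exp(-r*t), tau, T)[0] / T
    HCu = hu * quad(lambda t: I_used(p, T, t) * np.exp(-r*t), tau, T)[0] / T
    DCu = w * theta2 * quad(lambda t: I_used(p, T, t) * np.exp(-r*t), tau, T)[0] / T
    PCn, OCn, PCu = c * Qn / T, A / T, c * Qu * (1 - d) / (T - tau)
    return (SRn - HCn - PCn - OCn - DCn) + (SRu - HCu - DCu - PCu)   # Eq. (36.16)

# Maximize the profit (minimize its negative) over p and T, keeping T > tau.
res = minimize(lambda x: -profit(x), x0=np.array([40.0, 1.0]),
               bounds=[(1.0, 200.0), (tau + 0.05, 5.0)])
p_opt, T_opt = res.x
print("optimal price:", round(p_opt, 2), "optimal cycle:", round(T_opt, 3),
      "profit:", round(profit(res.x), 2))

The second-order conditions accompanying (36.17) can then be checked numerically, for example by finite-difference estimates of the Hessian of π at (p_opt, T_opt).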