English Pages 472 [453] Year 2021
Lecture Notes on Data Engineering and Communications Technologies 70
Neha Sharma Amlan Chakrabarti Valentina Emilia Balas Alfred M. Bruckstein Editors
Data Management, Analytics and Innovation Proceedings of ICDMAI 2021, Volume 1
Lecture Notes on Data Engineering and Communications Technologies Volume 70
Series Editor Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting edge engineering approaches to data technologies and communications. It will publish latest advances on the engineering task of building and deploying distributed, scalable and reliable data infrastructures and communication systems. The series will have a prominent applied focus on data technologies and communications with aim to promote the bridging from fundamental research on data science and networking to data engineering and communications that lead to industry products, business knowledge and standardisation. Indexed by SCOPUS, INSPEC, EI Compendex. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/15362
Editors Neha Sharma Analytics and Insights Tata Consultancy Services Pune, India Valentina Emilia Balas Aurel Vlaicu University of Arad Arad, Romania
Amlan Chakrabarti A.K.Choudhury School of Information Technology Kolkota, West Bengal, India Alfred M. Bruckstein Faculty of Computer Science Technion – Israel Institute of Technology Haifa, Israel
ISSN 2367-4512 ISSN 2367-4520 (electronic) Lecture Notes on Data Engineering and Communications Technologies ISBN 978-981-16-2933-4 ISBN 978-981-16-2934-1 (eBook) https://doi.org/10.1007/978-981-16-2934-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
These two volumes constitute the proceedings of the International Conference on Data Management, Analytics and Innovation (ICDMAI 2021), held from 15 to 17 January 2021 on a virtual platform due to the pandemic. ICDMAI is a signature conference of the Society for Data Science (S4DS), a not-for-profit professional association established to create a collaborative platform that brings together technical experts across industry, academia, government laboratories and professional bodies to promote innovation around data science. ICDMAI is committed to creating a forum that brings data science enthusiasts onto the same page, and envisions its role in advancing the field through collaboration, innovative methodologies and connections throughout the globe. This year is special, as we have completed 5 years, and it gives us immense satisfaction to put on record that we could successfully create a strong data science ecosystem. In these 5 years, we could bring 50 doyens of data science as keynote speakers, and another 50 technical experts contributed to workshops and tutorials. Besides, we could engage around 200 experts as reviewers and session chairs. To date, we have received around 2093 papers from 42 countries, out of which 361 papers have been presented and published, which is just 17% of the submitted papers. Coming to the specifics of this year, we witnessed participants from 13 countries, 15 industries, and 121 international and Indian universities. In total, 63 papers were selected for oral presentation after a rigorous review process, and Best Paper Awards were given for each track. We tried our best to bring a bouquet of data science through various workshops, tutorials, keynote sessions, plenary talks, panel discussions and paper presentations by the experts at ICDMAI 2021. The chief guest of the conference was Prof. Ashutosh Sharma, Secretary DST, Govt. of India, and the guests of honour were Prof. Anupam Basu, Director, NIT Durgapur, and Mr.
Ravinder Pal Singh, CEO, MerkhadoRHA & GoKaddal, Honour. The keynote speakers were top-level experts: Phillip G. Bradford, Director, Computer Science program, University of Connecticut, Stamford; Sushmita Mitra, IEEE Fellow and Professor, Machine Intelligence Unit, Indian Statistical Institute, Kolkata; Sandeep Shukla, IEEE Fellow and Professor, Department of CSE, Indian Institute of Technology, Kanpur, Uttar
Pradesh; Regiane Relva Romano, Special Adviser to the Ministry of Science, Technology and Innovation, Brazil; Yogesh Kulkarni, Principal Architect (CTO Office), Icertis-Pune; Dr. Aloknath De, Corporate Vice President of Samsung Electronics, S. Korea, and Chief Technology Officer of Samsung R&D Institute India, Bangalore; Sourabh Mukherjee, Vice President, Data and Artificial Intelligence Group, Accenture; Pallab Dasgupta, Professor, Department of Computer Science and Engineering, IIT Kharagpur; and Alfred M. Bruckstein, Faculty of Computer Science, Technion – Israel Institute of Technology, Israel. The pre-conference was conducted by Dipanjan (DJ) Sarkar, Data Science Lead at Applied Materials; Usha Rengaraju, Polymath and India’s first woman Kaggle Grandmaster; Avni Gupta, Senior Data Analyst (IoT), Netradyne; Kranti Athalye, Sr. Manager University Relations, IBM; Sonali Dey, Business Operations Manager, IBM; Amol Dhondse, Senior Technical Staff Member, IBM; and Vandana Verma Sehgal, Security Solutions Architect, IBM. All the experts took the participants through various perspectives of data and analytics. The force behind organizing ICDMAI 2021 was the General Chair, Dr. P. K. Sinha, Vice Chancellor and Director, IIIT New Raipur; Prof. Amol Goje, President-S4DS; Prof. Amlan Chakrabarti, Vice President-S4DS; Dr. Neha Sharma, Secretary-S4DS; the Executive Body Members of S4DS (Dr. Inderjit Barara, Dr. Saptarsi Goswami and Mr. Atul Benegiri); and all the super-active volunteers. There was strong support from our Technical Partner, IBM; Knowledge Partner, Wizer; Academic Partners, IIT Guwahati and NIT Durgapur; and Publication Partner, Springer. Through this conference, we could build a strong data science ecosystem. Our special thanks go to Janusz Kacprzyk (Editor in Chief, Springer, Advances in Intelligent Systems and Computing Series) for the opportunity to organize this guest-edited volume. We are grateful to Springer, especially to Mr.
Aninda Bose (Senior Publishing Editor, Springer India Pvt. Ltd) for the excellent collaboration, patience and help during the evolvement of this volume. We are confident that the volumes will provide state-of-the-art information to professors, researchers, practitioners and graduate students in the area of data management, analytics and innovation, and all will find this collection of papers inspiring, and useful. Pune, India Kolkota, India Arad, Romania Haifa, Israel
Neha Sharma Amlan Chakrabarti Valentina Emilia Balas Alfred M. Bruckstein
Contents
Track I

Verification, Validation and Evaluation of Medicinal Prescription System
  Krantee M. Jamdaade and Seema U. Purohit
Creation of Knowledge Graph for Client Complaint Management System
  Shreya Shinde, Shubhangi Gaherwar, Avani Sathe, Malvika Menon, and Sheetal Barekar
Automation of Bid Proposal Preparation Through AI Smart Assistant
  Sanjeev Manchanda
A Review on Application of Machine Learning and Deep Learning Algorithms in Head and Neck Cancer Prediction and Prognosis
  Deepti and Susmita Ray
Multi-Index Validation Mechanisms for the Land Cover Classification of Multispectral Images: A Case Study of Kabini Reservoir
  Keerti Kulkarni and P. A. Vijaya
Optimizing the Reliability of a Bank with Logistic Regression and Particle Swarm Optimization
  Vadlamani Ravi and Vadlamani Madhav
Performance Evaluation of Classification Models for HIV/AIDS Dataset
  Daniel Mesafint Belete and Manjaiah D. Huchaiah
Impact of Clustering Algorithms in Catastrophe Management: A Task-Technology Appropriate Perspective
  Harshita Jain, Nekkunj Pilani, and Ruchi Goel

Track II

Real-Time Soybean Crop Insect Classification Using Customized Deep Learning Models
  Vivek Tiwari, Himanshu Patel, Ritvik Muttreja, Mayank Goyal, Muneendra Ojha, Shailendra Gupta, Ravi Saxena, and Swati Jain
Writer-Independent Offline Signature Verification Using Deep Siamese Neural Network
  Dipti Pawar and Vivek Mannige
Automated Text and Tabular Data Extraction from Scanned Document Images
  Pushkar Kurhekar, Shivani Nigam, and Shriram Pillai
Detection of Moving Objects in a Metro Rail CCTV Video Using YOLO Object Detection Models
  A. Abhinand, Jaison Mulerikkal, Anil Antony, P. A. Aparna, and Anu C. Jaison
Literature Survey: Sign Language Recognition Using Gesture Recognition and Natural Language Processing
  Aditi Patil, Anagha Kulkarni, Harshada Yesane, Minal Sadani, and Prajakta Satav
Application of Deep Learning Techniques on Sign Language Recognition – A Survey
  Pranjali Barve, Namita Mutha, Anagha Kulkarni, Yashshree Nigudkar, and Yael Robert
Modelling of Surface Roughness Using ANN and Correlation with Acoustic Emission Signals in Turning of AISI 303 Steel
  Nikhil Khatekar, Raju Pawade, and Shivkumar Gaikwad
Optimum Dataset Size for Ayurvedic Plant Leaf Recognition Using Convolution Neural Networks
  K. V. N. Rajesh and D. Lalitha Bhaskari

Track III

OTA and IoT Influence the Room Occupancy of a Hotel
  H. M. Moyeenudin, G. Bindu, and R. Anandan
Shortest Distance Lattice Cryptographic Algorithm for Data Points Using Quantum Processors
  K. Pradheep Kumar and K. Dhinakaran
Exploring the Correlation Between Green Cover and Air Pollution by Data Democratization: A Case Study
  Arpan Sil, Mishita Sharma, Monisha Jhamb, Aboli Marathe, and Neha Sharma
Supply Chain Management During the Time of Pandemic
  Vaishnavi Nair, Shreyas Joshi, Manoj Patil, and Neha Sharma
Augmented Reality Based Supply Chain Management System
  Chahat Bhatia
Devanagari Handwritten Word Recognition for Extended Character Set Using Segmented Character Strokes
  Neelam Chandolikar, Vikas Nagare, Pushkar Joglekar, and Swati Shilaskar
Digital Locker System for College or University Admissions Using Blockchain Technology
  Pooja Vairagkar and Sayli Patil
DietSN: A Body Sensor Network for Automatic Dietary Monitoring System
  Samiul Mamud, Saubhik Bandyopadhyay, Punyasha Chatterjee, Suchandra Bhandari, and Nilanjan Chakraborty

Track IV

Analyzing the Supply of Healthcare Human Resource and Infrastructure of India to Handle COVID-19 Cases and Building a Prediction Model
  Atreyee Saha, Arjun Ghose, Aman Pande, Vineet Tambe, and Neha Sharma
Cognitive Computing Strengthen the Healthcare Domain
  Kanak Saxena and Umesh Banodha
An Alternative Approach to Propensity Score Matching Technique in Real-World Evidence
  Prithwis Kumar De and Tuhin Subhra Dey
Correlation Between Air Quality Index and COVID-19 Recovery Period in India
  Kohinoor Chatterjee, Ishita Karna, and Vamsee Sonti
An Analytical System: Data Modelling Practices for Handling an Epidemic
  Yumnam Somananda Singh, Yumnam Kirani, and Yumnam Jayanta Singh

Author Index
About the Editors
Neha Sharma is working with Tata Consultancy Services and is also a Founder Secretary of the Society for Data Science. Prior to this, she worked as Director of a premier institute in Pune that runs post-graduate courses such as MCA and MBA. She is an alumna of a premier College of Engineering and Technology, Bhubaneshwar, and completed her PhD at the prestigious Indian Institute of Technology, Dhanbad. She is an ACM Distinguished Speaker, a Senior IEEE member and Secretary of the IEEE Pune Section. She is the recipient of the “Best PhD Thesis Award” and the “Best Paper Presenter at International Conference Award” at the national level. Her areas of interest include Data Mining, Database Design, Analysis and Design, Artificial Intelligence, Big Data, Cloud Computing, Blockchain and Data Science.

Prof. Amlan Chakrabarti is a Full Professor in the School of I.T. at the University of Calcutta. He was a Post-Doctoral Fellow at Princeton University, USA, during 2011–2012. He has almost 20 years of experience in engineering education and research. He is the recipient of the prestigious DST BOYSCAST fellowship award in Engg. Science (2011), the JSPS Invitation Research Award (2016), the Erasmus Mundus Leaders Award from the EU (2017) and the Hamied Visiting Professorship from the University of Cambridge (2018). He is an Associate Ed. of the Elsevier Journal of Computers and Electrical Engg. and Guest Ed. of the Springer Nature Journal in Applied Sciences. He is a Sr. Member of IEEE and ACM, IEEE Comp. Society Distinguished Visitor, Distinguished Speaker of ACM, Secretary of IEEE CEDA India Chapter and Vice President of Data Science Society.

Prof. Valentina Emilia Balas is currently Full Professor in the Department of Automatics and Applied Software at the Faculty of Engineering, “Aurel Vlaicu” University of Arad, Romania. She is the author of more than 300 research papers. Her research interests include intelligent systems, fuzzy control, soft computing, smart sensors, information fusion, modeling and simulation.
She is the Editor-in-Chief of the IJAIP and IJCSysE journals in Inderscience. She is the Director of the Department of International Relations and Head of the Intelligent Systems Research Centre at Aurel Vlaicu University of Arad.
Prof. Alfred M. Bruckstein, B.Sc., M.Sc. in EE from the Technion IIT, Haifa, Israel, and PhD in EE, from Stanford University, Stanford, California, USA, is a Technion Ollendorff Professor of Science, in the Computer Science Department there, and is a Visiting Professor at NTU, Singapore, in the SPMS. He has done research on Neural Coding Processes, and Stochastic Point Processes, Estimation Theory, and Scattering Theory, Signal and Image Processing Topics, Computer Vision and Graphics, and Robotics. Over the years he held visiting positions at Bell Laboratories, Murray Hill, NJ, USA, (1987–2001) and TsingHua University, Beijing, China, (2002–223), and made short time visits to many universities and research centers worldwide. At the Technion, he was the Dean of the Graduate School, and is currently the Head of the Technion Excellence Program.
Track I
Verification, Validation and Evaluation of Medicinal Prescription System
Krantee M. Jamdaade and Seema U. Purohit
Abstract Medicinal Prescription System (MPS) efficiently uses the framework provided by an Expert System (ES), including knowledge base, rules, working memory and inference engine, for correct diagnosis and prescription. One of the important steps during the verification, validation and evaluation of MPS is Rule Base refinement. In this paper, the authors formulate a Rule Base (RB) Refinement Scheme for their proposed Medicinal Prescription System (MPSAT) and demonstrate how it can be carried out effectively using ES building tools. Different validation approaches, case-by-case resolution strategies, priorities of rule conditions, reusability of rules, conflict resolution and refinement at different levels are used innovatively to verify, validate and evaluate the MPSAT. The RB Refinement Scheme further recommends system evaluation strategies along with the analysis of evaluation results.
Keywords Expert system · Rule base refinement · Verification · Validation · Evaluation
1 Introduction
Development of a Medicinal Prescription System (MPS) continues until it acts like a perfect human practitioner and its performance is found satisfactory by the experts. Important characteristics of an MPS include the data of patients and diseases, an exhaustive list of symptoms, different test reports (clinical, pathological, sonological and radiological), correct diagnosis, prescription of medicines based on symptoms and test reports, their dosages and instructions for the patients on taking the medicines, other dietary advice, etc.
K. M. Jamdaade (B) L.B.H.S.S. T’s I.C.A., Bandra East, Mumbai, India S. U. Purohit Department of Mathematics, Kirti College, Dadar West, Mumbai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes on Data Engineering and Communications Technologies 70, https://doi.org/10.1007/978-981-16-2934-1_1
MPS, being an Expert System, uses a knowledge base (KB), rules, working memory and an inference engine to achieve the correct diagnosis and prescription. To improve the performance of the MPS, it is necessary to have an exhaustive knowledge base with well-formulated rules that use the working memory efficiently to give accurate inferences. Incorrect inference due to faulty rules in MPS can cause wrong diagnosis and incorrect prescription, which stresses the need for thorough assessment and refinement of the Rule Base (RB). The assessment and refinement of the RB comprises verification, validation and evaluation of the system [1–6]. The main objective of the paper is to provide guidelines to developers on the methodologies to be adopted in the earlier phases for RB validation, verification and evaluation, and on applying them to an RB designed with the help of an ES building tool. ES building tools provide a rich environment that includes the following milestones: word/spreadsheet formats, KB repositories, an interface to write rules, rule engines, logic blocks and UX design. This paper focuses on different approaches of refinement to be carried out at every step of building the MPS vis-à-vis ES building tools. To have an encompassing view of all the tools, researchers have classified 45 different ES building tools into six groups: Free, Commercial, Free for non-commercial use, Commercial – Open Source, Free – Open Source, and earlier existing tools that are currently withdrawn [7]. After studying these tools in detail and analyzing them against the selection criteria (Knowledge Acquisition (KA), Knowledge Representation Scheme (KRS), Interface, Knowledge Base Repository (KBR), Rules Representation (RR), Rule Engine (RE), Rules Optimization Strategies (ROS), Support and Simplicity), three ES building tools, namely Exsys CORVID, Oracle Policy Modelling (OPM) and Open Rules Dialogs (ORD), were identified for designing the MPS [7–9].
It is found that the OPM meets the maximum criteria and the characteristics of good MPS as mentioned above. This paper concentrates on refinement of the important characteristics such as KB, working memory, rules and UX design. Hence an exhaustive literature review is carried out at these levels.
2 Literature Review
Jones and Barrett, in their book “Knowledge Engineering in Agriculture”, give an elaborate account of the systematic development of ES, including systems methodology and the model development steps. They show how this process helps to refine and expand the prototype, as well as to increase the knowledge base in both depth and width [10]. As described by Waterman [11], the stages for developing an ES are:
• Identification stage is like the requirement analysis step in system development, where the problem, the objectives and the resources need to be recognized.
• Conceptualization stage is the initial stage of knowledge acquisition. Here the knowledge engineer conducts overt observation to understand how experts make decisions, what the decision outcomes are and which inputs are required to reach a decision.
• Formalization stage includes formation of the sequence of dialog once the attributes are decided. Based on this, the order of the questions is determined by presenting the system to the domain expert and getting feedback iteratively.
• Implementation and Testing is the last stage, where verification and validation are done to redesign the knowledge base representations and refine the system. In verification, the actual working of the system is tested by domain experts in the field of medical prescription. The validation process of ES ensures the correctness, consistency and completeness of the rules.
2.1 Verification and Validation
Verification of MPS corresponds to determining whether the design and development of the system is appropriate and whether the domain knowledge is represented correctly [12]. O’Leary [2] has discussed possibilities of analyzing MPS at large with respect to accuracy of the KB (examining those rules that have a consequence of death or fatal illness), completeness of the KB (discovering omitted rules using a contingency table), KB weight (weights can be examined for costly rules and highly certain rules), inference engine (three ways of validation are suggested: trace, test data and snapshot) and condition-decision matches (a Turing test is used to generate a decision). Jafar and Bahill [3] developed a general-purpose tool, named “Validator”, for verifying and validating personal computer-based Expert Systems. Validator has four major components: the syntactic error checker, the debugger, the rules and facts validation module and the chaining thread tracer. They concluded that it is difficult to guarantee that a system will meet its software specifications. O’Keefe and O’Leary [5] provided an introductory tutorial on verification and validation. They performed a Turing test to compare the performance of systems, but concluded that even with verification and validation technology there can be unpredicted errors in an ES. For verification of ES, Sasi Kumar et al. [4] discussed the common RB verification problems: (1) Redundant Rules: When two rules have the same conclusions, they are redundant rules. (2) Conflicting Rules: Two rules are conflicting if they succeed in the same situation but lead to contradictory conclusions. (3) Subsumed Rules: One rule is subsumed by another if the conditions in one rule are a subset of the conditions in the other and both rules have the same conclusions.
(4) Unnecessary Antecedent: Two rules have unnecessary antecedents if one rule has a condition and the other rule has the negation of the same condition and the two rules lead to the same conclusion. (5) Circular Rules: A set of rules is circular if the rules are of the form: If A then B, If B then C, If C then A.
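The five rule-base problems listed above can be checked mechanically. The following is a minimal sketch, not the procedure of any cited tool: the `Rule` class, the `"not X"` encoding of negated literals and all function names are assumptions made for illustration. Redundancy is read here in the stricter sense of identical conditions and identical conclusion, which is also an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # literals such as "A" or "not A" (hypothetical encoding)
    conclusion: str        # a single literal


def negate(literal):
    """Return the negation of a literal written as 'X' or 'not X'."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def check_pair(r1, r2):
    """Return the anomaly labels that apply to one pair of rules."""
    found = []
    same_if = r1.conditions == r2.conditions
    same_then = r1.conclusion == r2.conclusion
    if same_if and same_then:
        found.append("redundant")
    if same_if and r1.conclusion == negate(r2.conclusion):
        found.append("conflicting")
    # Proper subset of conditions with the same conclusion.
    if same_then and (r1.conditions < r2.conditions
                      or r2.conditions < r1.conditions):
        found.append("subsumed")
    # Rules differ in exactly one condition, and it is negated in the other.
    only1 = r1.conditions - r2.conditions
    only2 = r2.conditions - r1.conditions
    if same_then and len(only1) == 1 and len(only2) == 1:
        if negate(next(iter(only1))) == next(iter(only2)):
            found.append("unnecessary antecedent")
    return found


def find_pair_anomalies(rules):
    """Map each anomalous pair of rule names to its labels."""
    result = {}
    for a, b in combinations(rules, 2):
        labels = check_pair(a, b)
        if labels:
            result[(a.name, b.name)] = labels
    return result


def has_cycle(rules):
    """Detect circular chains such as: If A then B, If B then C, If C then A."""
    graph = {}
    for r in rules:
        for c in r.conditions:
            graph.setdefault(c, set()).add(r.conclusion)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        cyclic = any(visit(n) for n in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return cyclic

    return any(visit(n) for n in list(graph))
```

Calling `find_pair_anomalies` on a list of `Rule` objects reports the pairwise problems (1) to (4), while `has_cycle` covers problem (5) by a depth-first search over the condition-to-conclusion graph.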
2.2 Rule-Base Refinement
B. Bozorgtabar et al. proposed a system for the diagnosis of malignant melanoma, the deadliest type of skin cancer. For this, they presented an unsupervised multi-scale lesion segmentation method and devised a dynamic rule-based refinement strategy to refine the confidence map and improve the segmentation accuracy [13]. Shen Yuong Wong presented a fast fuzzy reasoning model called F-ELM. It is a modified version of ELM that includes three parameters: the standard deviation of the membership functions, the rule-combination matrix and the don't-care matrix, which make F-ELM more sophisticated than ELM. Fuzzy rules are used to design the knowledge base [14]. The highly logical fuzzy rules used in designing F-ELM make it an efficient candidate for approximation and modelling, and it can control complex processes. Zhuge et al. developed a tool to refine a rule base that can determine and then eliminate all the implications of redundant rules and generate a random one for test purposes. This tool is available at http://kg.ict.ac.cn. They explained how rule-base refinement has been used to enhance the efficiency of utilization of the rule base [15]. Fabien Cadoret et al. designed a pattern for the refinement of the rule base using a superimposition technique: superimposition modules are replaced by the main module if the rules have the same definition. They also decomposed rules into smaller rules to achieve a specific goal [16].
2.3 Evaluation
On careful investigation, ES evaluation techniques can be compared with techniques developed in software engineering, such as object-oriented design, modularization and system specification, which guide us in implementing the aspects of reusability, verifiability and reliability [17]. Peter D. Grogono et al. identified eleven different ES evaluation techniques: consideration of user needs (system verification by the user at an early development stage to find shortcomings), cooperation of human experts (continuous and active support of a human expert to develop an efficient ES), analysis of the knowledge base (testing both dynamically and statically in order to observe responses and examine the knowledge base), field tests (testing in the development laboratory as well as in the area of application), use of previous experience or data (testing by observing data records gathered previously with the help of domain experts), evaluation at all stages (regular system evaluation), independent experts (testing by third-party experts to avoid biases), verification tools (to reveal the errors of the ES), modular design (independent module evaluation to make the system economically viable), prototyping (to grow the system from the initial to the final stage) and performance (to improve the performance of the ES, its reliability, specificity and refractoriness must be checked) [18]. They later developed the COVER tool to check dead-end rules, unnecessary antecedents and missing values. They have also discussed some techniques for verification, validation and testing of ES [17]. Sharma and Conrath have reviewed some qualitative, quantitative and hybrid approaches. They mentioned that quantitative approaches can be used to measure reliability, availability and maintainability using ‘mean time between failures’ (MTBF) and ‘mean time to repair’ (MTTR), but quantitative approaches do not deliver an overall notion of quality [19]. In 2016, Munaiseche and Liando developed an ES for diagnosing skin disease in humans based on a questionnaire methodology and tried to evaluate the usability of the system. They prepared a series of tasks to be given to users to perform formal usability testing and observe the system behaviour, and implemented usability factors including efficiency, understandability, operability, attractiveness, error prevention, learnability, accuracy and effectiveness for system evaluation [20].
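As an illustration of the quantitative measures mentioned above, the standard steady-state availability formula combines mean time between failures and mean time to repair. The sketch below is not taken from the cited review; the function names and the hours-based units are assumptions.

```python
def mean_time_between_failures(total_uptime_hours, failure_count):
    """Average operating time between two consecutive failures."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive")
    return total_uptime_hours / failure_count


def mean_time_to_repair(total_repair_hours, repair_count):
    """Average time needed to restore the system after a failure."""
    if repair_count <= 0:
        raise ValueError("repair_count must be positive")
    return total_repair_hours / repair_count


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: uptime share = MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("times must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

For example, a system that runs 990 hours with 10 failures and spends 10 hours in repair has an MTBF of 99 hours, an MTTR of 1 hour and therefore an availability of 99%.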
3 Methods of Assessment of MPS
3.1 Description of Proposed MPS
Researchers have worked extensively over a period of four years to design an MPS for HIV-AIDS patients using Ayurveda therapy, named MPSAT, as part of a funded research project; its details cannot be described in this paper owing to research ethics. To improve the MPS, after performing the literature review, the researchers developed steps for the refinement of the MPS designed by them, carried out extensive rule base refinement [15], and prepared test cases for testing the system and for the best utilization of the rule base.
3.2 Validation of Proposed MPS
Validation of MPS involves rigorous analysis as per the following:
Step 1. Determine the accuracy of the KB by selecting the appropriate rules using any of the three approaches:
a. Examine those rules that have costly consequences. For example, in a medical knowledge base, rules that have a consequence of death or fatal illness could be examined.
b. Select those rules with high certainty, as they maximize the combination of the probabilities; that is, select the rules that carry high probability.
c. Examine those rules with low probability and eliminate them, as they have very little chance of appearing in a maximized solution.
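The three selection approaches of Step 1 can be sketched as simple filters over a rule list. This is only an illustrative sketch: the `Rule` fields, the numeric scales and the threshold values are hypothetical and would have to be set by the domain experts.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    consequence_cost: float  # hypothetical severity score, 1.0 = fatal outcome
    certainty: float         # probability or certainty attached to the rule


def select_for_validation(rules, cost_threshold=0.8,
                          high_certainty=0.9, low_probability=0.1):
    """Group rule names by the three approaches (a), (b) and (c)."""
    return {
        # (a) rules with costly consequences
        "costly": [r.name for r in rules
                   if r.consequence_cost >= cost_threshold],
        # (b) rules with high certainty
        "high_certainty": [r.name for r in rules
                           if r.certainty >= high_certainty],
        # (c) low-probability rules, candidates for elimination
        "low_probability": [r.name for r in rules
                            if r.certainty <= low_probability],
    }
```

A rule may fall into more than one group, for example a highly certain rule whose consequence is also costly, so the groups are reported separately rather than as a partition.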
To validate the proposed MPSAT, the researchers used three different ES building tools, namely Exsys CORVID, Open Rules Dialogs and OPM. Each tool may follow all three of the above-mentioned approaches, but here a scenario related to one approach is presented for each individual tool.
3.3 Validation of Proposed MPS Using Tools
3.3.1 Tool 1 [Exsys CORVID]
While designing the MPS in Corvid, a tree logic diagram is used along with a confidence factor to handle uncertainty [10, 21]. To write complex rules, the researcher prepared a table of tautology (a truth table of all combinations of conditions), so that no condition could be missed and appropriate decision making is supported. For example, let there be three different symptoms S1, S2 and S3, where different combinations of them call for different dosages of medicine; in such a situation Table 1 is considered. Based on Table 1, the complex rules are written in a Logic Block, shown in Fig. 1. In this way, redundancy is reduced and more accuracy can be achieved. The part of the tree logic diagram shown in Fig. 2 is more illustrative: it shows that if the patient is suffering from symptom S2 and not from S1 and S3, then he/she has to take medicines M7, M9 and M10.
Table 1 Table for decision making
S2 | S1 | S3
T | T | T
T | T | F
T | F | T
T | F | F
F | T | T
F | T | F
F | F | T
F | F | F
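The decision table can be generated rather than typed by hand, which is one way to make sure no combination is missed. The sketch below is plain Python, not Corvid syntax; only the rule for "S2 alone" is taken from the text, and the function names are made up for the example.

```python
from itertools import product

SYMPTOMS = ("S2", "S1", "S3")  # column order of Table 1


def truth_table(symptoms=SYMPTOMS):
    """Return every True/False combination, in the row order of Table 1."""
    return [dict(zip(symptoms, combo))
            for combo in product((True, False), repeat=len(symptoms))]


def prescribe(row):
    """Rule quoted in the text: S2 alone leads to M7, M9 and M10."""
    if row["S2"] and not row["S1"] and not row["S3"]:
        return ["M7", "M9", "M10"]
    return None  # the other combinations are not specified in the text


rows = truth_table()
```

With three symptoms the table has eight rows; the fourth row (S2 true, S1 and S3 false) is the one covered by the rule shown in Fig. 2.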
Verification, Validation, and Evaluation of Medicinal …
Fig. 1 Tree structure for complex rules
Fig. 2 Tree structure for prescribing medicine against symptom S2
IF Patient is suffering from S2 AND not suffering from S1 AND not suffering from S3 THEN take medicine M7, M9 and M10.
Fig. 3 Graphical representation of tree structure for prescribing medicine against symptom S2
Along with this, Corvid generates rules in RuleView that are equivalent to the graphical representation, as shown in Fig. 3.
Validation approach: CORVID uses the second approach defined by Daniel E. O'Leary. Accordingly, Exsys CORVID helps to choose one of the symptoms with confidence factor cf = 1. To implement the concept of a confidence factor, Corvid enables developers to define confidence variables, which are applied only in the THEN part of a rule. A confidence value, also known as a probability or certainty score, is assigned to the confidence variable, and every confidence value shows the degree of certainty [10]. The value assigned to the confidence variable indicates how likely the action or item is to apply in a specific end user's situation, based on the answers they provide. Generally, there are multiple confidence variables that cover the various possible actions, and the system selects the most likely one [15, 21, 22]. The red-marked part of Fig. 2 includes the confidence factor [TRT_S2] = 10, which shows that if a patient is suffering from symptom S2 and not from S1 and S3, then it is 100% certain that the system will display the prescription "Take medicine M7, M9 and M10" [10].
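The idea of assigning confidence values in the THEN part of rules and then picking the most likely action can be mimicked in a few lines. This is a plain-Python sketch, not Corvid code: the name TRT_S2 is an assumed spelling of the confidence variable shown in Fig. 2, TRT_OTHER and its value 5 are invented placeholders, and a 0 to 10 scale is assumed.

```python
def confidences_for(s1, s2, s3):
    """Assign confidence values as the THEN part of hypothetical rules would."""
    conf = {"TRT_S2": 0, "TRT_OTHER": 0}  # TRT_OTHER is a made-up placeholder
    if s2 and not s1 and not s3:
        conf["TRT_S2"] = 10               # value quoted in the text
    else:
        conf["TRT_OTHER"] = 5             # illustrative value only
    return conf


def most_likely(confidences):
    """Return the confidence variable with the highest assigned value."""
    return max(confidences, key=confidences.get)
```

For a patient with S2 only, `most_likely(confidences_for(False, True, False))` selects the S2 treatment variable.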
3.3.2 Tool 2 [OPM]
To handle uncertainty while prescribing medicine in OPM, the researcher wrote some complex rules. OPM uses a patented Linear Inferencing Algorithm, in which a rule-based system has rules and a set of input facts. As soon as new facts are inferred, the corresponding rules and facts are added to the rule base [23]. Knowledge is represented in the form of a fact dependency tree that indicates which facts are used to produce other facts. In this dependency tree, nodes represent facts, whereas an arc is a one-way arrow that joins two facts [23]. In OPM, rules are numbered r1, r2, … and facts are labelled f1, f2, etc. Consider the case "for one symptom, there can be multiple medicines" in the treatment of Cardiac Arrest.
The rule for diagnosis, shown in Fig. 4, consists of eleven facts that help the inference engine to diagnose Cardiac Arrest. A dependency tree can be drawn between the facts of the diagnosis rule and the treatment rules, as shown in Fig. 5. Based on this, MPSAT fires the treatment rule shown in Fig. 6, which consists of eleven facts that help the inference engine to prescribe the treatment for the diagnosed disease.
Validation approach: OPM uses the first approach defined by Daniel E. O'Leary. Accordingly, it examines those rules that have costly consequences. To do this efficiently, OPM uses the fact dependency tree.
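A fact dependency tree of the kind described above can be illustrated with a naive forward-chaining loop. This sketch is not OPM's patented linear inferencing algorithm; it is a simple stand-in, and the facts f1 and f2 and both conclusion names are hypothetical (the real rules in Figs. 4 and 6 use eleven facts each).

```python
def forward_chain(rules, facts):
    """Fire rules until no new fact is inferred; record fact dependencies."""
    known = set(facts)
    depends_on = {}  # inferred fact -> facts used to produce it
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                depends_on[conclusion] = tuple(conditions)
                changed = True
    return known, depends_on


# Hypothetical rules r1 (diagnosis) and r2 (treatment).
RULES = [
    (("f1", "f2"), "diagnosis_cardiac_arrest"),
    (("diagnosis_cardiac_arrest",), "treatment_shown"),
]

known, tree = forward_chain(RULES, {"f1", "f2"})
```

The `tree` dictionary holds the one-way arcs: each inferred fact points back to the facts that produced it, which is what makes it easy to trace a costly conclusion to its inputs.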
Fig. 4 Cardiac arrest diagnosis rule
Fig. 5 Dependency tree
Fig. 6 Cardiac arrest treatment rule
3.3.3 Tool 3 [Open Rule Dialog]
Open Rule Dialog (ORD) is built on a Business Rules Management System. It allows a layman to develop a web-based questionnaire, also called a Dialog. No knowledge of web programming techniques is needed; a little knowledge of Excel is enough to develop Dialogs [7]. These Dialogs can be developed in a very simplified way by laying out pages, sections and questions in Excel tables. The developer followed the steps given below while designing the questionnaire with OpenRules Dialog:
Step 1: Question Sheet. This sheet is prepared for accepting values from the user, as shown in Fig. 7. OpenRules supports the following question types: TextBox, Password, TextArea, ComboBox, CheckButton, RadioButton, ActionButton, ActionHyperlink, etc.
Fig. 7 Question sheet
Fig. 8 Section sheet
Step 2: Section Sheet. The section sheet holds the questions in proper order. A page can be divided into a maximum of five sections, as shown in Fig. 8.
Step 3: Page Sheet. The page sheet holds the sections in the order in which they are placed in it. Figure 9 shows that the page with ID "BasicInfo" has two sections, "PatientDetails" and "HIVtestInfo".
Step 4: Update Rules Sheet. This sheet allows the developer to define special conditions that react to changes made by the user in the content of the page. For example, the update rules sheet helps developers to define conditions to hide or show a page, section or question.
Fig. 9 Page sheet
Fig. 10 Navigation rules sheet
Step 5: Navigation Rules Sheet. This sheet helps to navigate between pages. For example, in Fig. 10, on the WhoStaging page, if the PatientGender is Male then control should go to the page "Curr Symptom" rather than "Female Symptom".
Validation approach: Open Rule Dialog uses the third approach defined by Daniel E. O'Leary. Accordingly, it examines those rules with low probability, because they have very little chance of appearing in a maximized solution and are thus not used in any solution. To apply this, ORD uses the "Update Rules Sheet" and the "Navigation Rules Sheet".
This study focuses on the ES building tool OPM to build the Medicinal Prescription System using Ayurveda Therapy (MPSAT), which prescribes Ayurvedic medicine to HIV/AIDS patients. To validate, verify and evaluate the system, the following tasks are performed. In this study, three types of validation are carried out: System Validation, Field Validation and Rule-base Validation.
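The navigation rule from Fig. 10 can be read as a small decision table. The sketch below expresses it in plain Python rather than in an OpenRules Excel sheet; the row for "Female" is an assumption inferred from the page name, and the function name is made up for the example.

```python
# Each row: (current page, attribute, expected value, next page).
NAVIGATION_RULES = [
    ("WhoStaging", "PatientGender", "Male", "Curr Symptom"),
    ("WhoStaging", "PatientGender", "Female", "Female Symptom"),  # assumed row
]


def next_page(current_page, answers, rules=NAVIGATION_RULES):
    """Return the first matching next page, or None when no row applies."""
    for page, attribute, expected, target in rules:
        if page == current_page and answers.get(attribute) == expected:
            return target
    return None
```

Rows are checked top to bottom, so the order of rows in the table decides which page wins when several conditions match.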
3.4 System Validation
The Knowledge Engineer (KE) has applied all the applicable business validations to avoid irrelevant branches while determining the goal. In this system, the KE has used a specific set of questionnaires to avoid unnecessary branches in goal determination; for example, the system asks whether the patient is suffering from HIV or not. If the patient is found to be HIV negative, the system stops the determination without asking any further questions.
3.4.1 Field Validation
Field validation helps a tester to validate the fields present in an application. It is implemented in MPSAT in such a way that when a field expects alphabetic values, the user cannot enter numeric values for that field. If the user tries to enter a numeric value, field validation raises an error message, which is displayed for correction and prevents the user from navigating to the next screen.
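A minimal sketch of this kind of field validation is shown below. The error message text, the field names and the decision to allow spaces inside a value are assumptions for illustration; MPSAT's actual checks are configured inside the ES building tool.

```python
def validate_alphabetic(value):
    """Return an error message for a non-alphabetic value, else None."""
    text = value.strip()
    if text and all(ch.isalpha() or ch.isspace() for ch in text):
        return None
    return "Only alphabetic characters are allowed in this field."


def can_go_to_next_screen(fields):
    """Navigation is allowed only when every field passes validation."""
    return all(validate_alphabetic(v) is None for v in fields.values())
```

A screen would call `can_go_to_next_screen` with its current field values and show the returned error message next to any offending field.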
3.4.2 Rule-Base Validation
There are two types of rule-base validation: functional and structural [17]. To apply these validations to MPSAT, periodic feedback from medical practitioners is obtained after system development to check whether symptoms are placed properly and whether the system diagnoses the disease and shows the corresponding treatment correctly.
3.5 Verification of MPSAT
Verification of an ES means determining whether the system is developed correctly, that is, whether the system represents the domain knowledge correctly [2]. The ES built using OPM is verified according to guidelines on the verification of expert systems, which helped to verify as well as refine the rules of the KB of the existing system. While refining the RB, the KE should consider the Specificity (more specific rules are customized to specific problems), Recency (rules that use more recent data), and Refractoriness (when rules form a cycle) of the KB. When the antecedents are the same but the rules conclude differently, such rules are called conflicting rules. To handle such situations, the researcher took the following precautions.
Case I: Earlier, the pathological parameters for diagnosing diarrhoea and dysentery in the excretory system were the same, as shown in Table 2.
Table 2 Common pathological parameters for diagnosing diarrhoea and dysentery
S. No | Diagnosis | Pathological parameters | Sonological parameters | Radiological parameters
1 | Diarrhoea | Haemoglobin↓, RBC↓, WBC↑ | – | –
2 | Dysentery | Haemoglobin↓, RBC↓, WBC↑ | – | –
Table 3 Modified pathological parameters for diagnosing diarrhoea and dysentery
Diagnosis | Pathological parameters | Sonological parameters | Radiological parameters
Diarrhoea | Haemoglobin↓, RBC↓, WBC↑; Stool test (Radio): 1. Mucus, 2. Vegetable fibres | – | –
Dysentery | Haemoglobin↓, RBC↓, WBC↑; Electrolytes imbalance (Boolean) | – | –
Table 4 Common pathological parameters for diagnosing dementia and numbness
Diagnosis | Pathological parameters | Sonological parameters | Radiological parameters
Dementia | RBC↓, WBC↑, ESR↑ | – | –
Numbness | RBC↓, WBC↑, ESR↑ | – | –
When the patient entered the values of haemoglobin, RBC and WBC, MPSAT would show diarrhoea and dysentery together. To resolve the conflict, the researcher explained this situation to the expert and noted the suggestions given in Table 3. Now the antecedents are not the same, and thus the conflict is resolved.
Case II: Earlier, the pathological parameters for diagnosing dementia and numbness in the nervous system were the same, as shown in Table 4. When the patient provided the values for RBC, WBC and ESR, the system would show dementia and numbness together. To resolve this conflict, the researcher revisited the expert and gathered the details given in Table 5; based on these, the rules were refined and the problem was solved.
Table 5 Modified pathological parameters for diagnosing dementia and numbness
Diagnosis | Pathological parameters | Sonological parameters | Radiological parameters
Dementia | RBC↓, WBC↑, ESR↑ | – | Brain MRI: 1. Cerebral ischaemia, 2. Cerebral hypoxia, 3. Cerebral atrophy
Numbness | RBC↓, WBC↑, ESR↑ | – | –
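The check behind Cases I and II, finding rules whose antecedents are identical but whose conclusions differ, can be sketched as follows. The parameter strings paraphrase Tables 2 and 3, and the function name is made up for the example; the actual refinement in MPSAT was done with the domain expert, not by a script.

```python
from collections import defaultdict


def find_conflicts(rules):
    """Group diagnoses by antecedent set; more than one diagnosis is a conflict."""
    by_antecedent = defaultdict(set)
    for antecedents, diagnosis in rules:
        by_antecedent[frozenset(antecedents)].add(diagnosis)
    return {a: d for a, d in by_antecedent.items() if len(d) > 1}


COMMON = {"Haemoglobin low", "RBC low", "WBC high"}

# Rules before refinement (Table 2): identical antecedents.
before = [(COMMON, "Diarrhoea"), (COMMON, "Dysentery")]

# Rules after refinement (Table 3): one extra, distinguishing antecedent each.
after = [
    (COMMON | {"Stool test: mucus or vegetable fibres"}, "Diarrhoea"),
    (COMMON | {"Electrolytes imbalance"}, "Dysentery"),
]
```

Running `find_conflicts(before)` reports the diarrhoea/dysentery pair, while `find_conflicts(after)` returns an empty dictionary, mirroring the statement that the conflict is resolved once the antecedents differ.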
After developing an ES, we run cases through the system and check whether the expected output is inferred correctly with the optimum conditions, and we try to find out whether the system can be optimized further. Refinement of knowledge is generally known as fine tuning of an ES. This can be done by simply adding or deleting a condition in a rule, based on priorities, which may lead to direct determination of the goal [2].
3.5.1 Priorities of Rule Conditions
To identify whether a patient is suffering from HIV/AIDS, we collect and add conditions on specific parameters related to HIV/AIDS. Depending on the priority of the condition, this leads to direct determination, so that the disease is diagnosed with fewer rule executions.
3.5.2 Reusability of Rules
If a condition is used in multiple rules, we create a separate conclusion for that condition and reference this conclusion within those rules. This is advantageous when the KE has to update or change any sub-condition of the conclusion (Fig. 11). To compare RBC, WBC or ESR values, the KE has defined a separate conclusion that is used for goal determination, which makes the system more robust and maintainable when future changes are needed.
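The reuse idea can be sketched by factoring the shared blood-parameter check into one intermediate conclusion that several diagnosis rules call. The function names are hypothetical, and using "no brain MRI finding" to separate numbness from dementia is an assumption loosely based on Table 5, not a rule taken from MPSAT.

```python
def blood_profile_abnormal(rbc_low, wbc_high, esr_high):
    """Shared intermediate conclusion reused by several rules."""
    return rbc_low and wbc_high and esr_high


def diagnose_dementia(rbc_low, wbc_high, esr_high, brain_mri_finding):
    """Dementia rule: shared conclusion plus a brain MRI finding."""
    return blood_profile_abnormal(rbc_low, wbc_high, esr_high) and brain_mri_finding


def diagnose_numbness(rbc_low, wbc_high, esr_high, brain_mri_finding):
    """Numbness rule: shared conclusion without a brain MRI finding (assumed)."""
    return blood_profile_abnormal(rbc_low, wbc_high, esr_high) and not brain_mri_finding
```

If the threshold logic for RBC, WBC or ESR changes, only `blood_profile_abnormal` has to be edited, which is the maintainability benefit described above.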
3.5.3 Conflict Resolution
Selecting a rule to fire from a set of conflicting rules is called a conflict resolution strategy. The behaviour of the system depends on the strategy used for conflict resolution. Commercial systems use sophisticated conflict resolution strategies such as specificity, recency, and refractoriness. MPSAT uses a combination of all three strategies in different ways.
Fig. 11 Reusability of rule
Table 6 Conflict resolution
a. Specificity: More specific rules are customized to specific problems, and thus preference is given to them. The easiest way to proceed in problem-specific cases is simply to add extra conditions to the rules to avoid conflicts. In MPSAT, rules are conflicting if they succeed in the same situation but lead to contradictory conclusions. Problem-specific conflict resolution is done as per Table 6.
As can be seen in Table 6, the expected result was "Cholelithiasis", but MPSAT was also showing "Hyper acidity", which is the conflict. To resolve this, as per the discussion with the domain expert (physician), since "Belching" cannot be a symptom for diagnosing "Cholelithiasis", the antecedent "the digestive system symptom is not belching" is added to the consequent "the digestive system diagnosis is cholelithiasis".
b. Recency: A rule that uses more recent data is likely to be more relevant than one that uses older data. Here, each element (fact) of working memory has a time tag that reflects the chronological order in which the fact was added. After comparing the time tags of the facts, the most recent fact (the one with the highest time tag) that matches the first condition of the rules is chosen.
At first visit: Let it be assumed, by way of example, that there is a rule R1 for diagnosis and a rule R2 for treatment.
Diagnosis rule: R1
Treatment rule: R2
Follow-up 1: We consider two cases here: the symptoms of the patient either aggravate or subside.
Case I: Let us consider that throbbing pain is aggravated.
Diagnosis rule: R3
Treatment rule: R4
Case II: Let us consider that the vomiting symptom subsides.
Diagnosis rule: R5
Treatment rule: R6
Consider the following facts in the working memory.
No | Facts | Time tag
1 | CHOLELITHIASIS (throbbing pain, nausea, vomiting) | 1
2 | CHOLELITHIASIS (ArogyaVardhini, Kutaki, SharPunkha….) | 2
3 | CHOLELITHIASIS (throbbing pain is aggravated, nausea, vomiting) | 3
4 | CHOLELITHIASIS (throbbing pain, nausea, vomiting subsides) | 4
Since the most recent fact (the one with the highest time tag) that matches the first condition of the rules is chosen after comparing time tags, rule R4 or R6 will be fired, depending on whether fact 3 or fact 4 occurs.
c. Refractoriness: This ensures that the same rule is not executed again. In other words, it prevents the system from being trapped in a loop and also improves the system by avoiding unnecessary matching. Accordingly, the KE removed Alzheimer's from Dementia, and HIV/AIDS from Numbness, Insomnia and Excessive sweating.
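The three conflict resolution strategies described in items a to c can be sketched in a few lines of Python. The fact list copies the working-memory table; the rule records, the "most conditions wins" reading of specificity and the `Refractory` class are assumptions for illustration, not the mechanism used inside OPM.

```python
# Working-memory facts with time tags, copied from the table above.
FACTS = [
    {"no": 1, "fact": "throbbing pain, nausea, vomiting", "time_tag": 1},
    {"no": 2, "fact": "ArogyaVardhini, Kutaki, SharPunkha", "time_tag": 2},
    {"no": 3, "fact": "throbbing pain is aggravated, nausea, vomiting", "time_tag": 3},
    {"no": 4, "fact": "throbbing pain, nausea, vomiting subsides", "time_tag": 4},
]

# Hypothetical rule records; only the number of conditions matters here.
RULES = [
    {"name": "general", "conditions": ["nausea"]},
    {"name": "specific", "conditions": ["nausea", "not belching"]},
]


def most_specific(matching_rules):
    """Specificity: prefer the rule with the most conditions."""
    return max(matching_rules, key=lambda rule: len(rule["conditions"]))


def most_recent(facts):
    """Recency: prefer the fact with the highest time tag."""
    return max(facts, key=lambda fact: fact["time_tag"])


class Refractory:
    """Refractoriness: a rule fires at most once for the same fact."""

    def __init__(self):
        self.fired = set()

    def fire(self, rule_name, fact_no):
        key = (rule_name, fact_no)
        if key in self.fired:
            return False  # already fired for this fact, so skip it
        self.fired.add(key)
        return True
```

With these helpers, `most_recent(FACTS)` picks fact 4, and a second call to `fire` with the same rule and fact returns False, which is what keeps the system from looping.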
3.5.4 Design Level Refinement
Under this, entity-level refinement and user interface refinement are considered.
Entity Level Refinement
Entities are preferred over the creation of new attributes whenever the object remains the same but the values differ. This avoids unnecessary attribute creation, which makes the whole rule base bulky and difficult to maintain (Fig. 12).
Fig. 12 Block diagram of phase wise change in Data Tab (screenshots of the Data Tab of the Phase II and Phase III systems)
The Project tab of the Phase II system shows 876 attributes (163 goals, 146 intermediates and 567 inputs) displayed on 37 screens. These screens are arranged in 9 stages with 0 entities and 0 relations (Fig. 13). The Project tab of the Phase III system shows 1580 attributes (156 goals, 332 intermediates and 1092 inputs) displayed on 43 screens. These screens are arranged in 3 stages with 29 entities and 28 relations.
Fig. 13 Block diagram of phase wise change in Project Tab (screenshots of the Project Tab of the Phase II and Phase III systems)
User Interface Refinement
Using best practices of user experience design, we have designed an ES that is more intuitive and interactive, keeping in mind the user's age and awareness of ES, by supporting data collection with precise descriptions and images.
Fig. 14 Block diagram for lesser number of clicks (Master Screen for Vikriti and Skeletal System Screen)
Less Number of Clicks
System users are happier if they spend less time entering details. This can be achieved by hiding irrelevant data collection fields based on the case, so that fewer details need to be collected to infer the goal, as shown in Fig. 14. Selecting and compiling the relevant information from the anatomical systems resolves the problem of considering each and every anatomical system separately.
Proper Placement of Screens
Patients should not be confused or clueless while using the ES, so it is particularly important to collect inputs from them in a flow, by maintaining a stable structure of screen placements and hiding irrelevant screens. This structure is common at the system level, like a template. We have followed this approach by designing the screen flow so that, for every anatomical system, the symptom collection screen is followed by the symptomatic diagnosis screen, which includes the inferred treatment, followed by the report parameter collection screen for confirming the diagnosis in the subsequent screen.
3.6 Evaluation of MPSAT
Evaluation of an ES is the last stage of the expert system development life cycle [24]. Here, the researcher prepares a series of tasks and gives them to users to perform formal usability testing and to observe the system behaviour. For the evaluation process, Shafinah et al. have identified eight usability factors: efficiency, understandability, operability, attractiveness, error prevention, learnability, accuracy, and effectiveness [20, 25, 26]. MPSAT is evaluated by taking feedback from the stakeholders: 30 MD doctors, 30 MD students and 5 technical experts. Ten questions were asked to the 30 MD doctors to check the validity of MPSAT and to verify whether it works correctly in implementing the Ayurveda terminologies related to diagnosis, treatment, and advice. The questionnaires were designed using Likert scale values from strongly agree to strongly disagree.
3.6.1 Practicing and Experienced Physician
The questionnaire used to evaluate the system behaviour is shown in Table 7. The analysis of the feedback obtained on the basis of Table 7 is shown in Fig. 15.
3.6.2 Medical Students
Tasks were prepared by the researcher and given to 30 MD students. These respondents were chosen from K.G. Mittal Hospital, Charni Road (W), and YMT College of Ayurveda, Kharghar. These ten tasks were prepared to verify, validate, and evaluate the system behaviour, broadly in three categories: knowledge design, decision making, and knowledge representation in the form of report generation. The set of tasks prepared by the researcher is given in Table 8. According to this table, the evaluation of the Knowledge Design of MPSAT is represented by serial numbers 1, 2, 3 and 4; Decision Making by serial numbers 5 and 6; and Knowledge Representation by serial numbers 7, 8, 9 and 10 [27].
Table 7 Feedback from 30 MD Doctors
Questions | SD (%) | D (%) | N (%) | A (%) | SA (%)
Easy to use and good comfort level for non-IT persons | 0 | 0 | 3 | 50 | 47
Good resemblance to human expert advice | 0 | 6 | 7 | 67 | 20
Graphical user interface and look & feel is appropriate and as per satisfaction | 0 | 0 | 0 | 67 | 33
Medical prescription and reports are as per requirement | 0 | 7 | 17 | 63 | 13
Use of English phrases and sentences is standard and good for establishing dialog with the patient/medical practitioner | 0 | 0 | 0 | 63 | 37
The medical terminologies and symptoms used for system designing is in proximity of Samhitas' guidelines | 3 | 3 | 13 | 44 | 37
The system developed is adequately exhaustive and in depth | 0 | 10 | 10 | 50 | 30
The navigation time between the screens is satisfactory | 0 | 0 | 0 | 50 | 50
The system can act as a support system for medical practitioners | 0 | 0 | 0 | 57 | 43
The generation of reports is satisfactory | 0 | 0 | 17 | 63 | 20
Note: SD strongly disagree, D disagree, N neutral, A agree, SA strongly agree
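The Likert percentages in Table 7 can be summarized per question with a short script. The tuples below are copied from Table 7 in row order; treating "agree" plus "strongly agree" as the favourable share is an analysis choice made for this sketch, not a measure reported by the authors.

```python
# (SD, D, N, A, SA) percentages copied from Table 7, in row order.
TABLE_7 = [
    (0, 0, 3, 50, 47),
    (0, 6, 7, 67, 20),
    (0, 0, 0, 67, 33),
    (0, 7, 17, 63, 13),
    (0, 0, 0, 63, 37),
    (3, 3, 13, 44, 37),
    (0, 10, 10, 50, 30),
    (0, 0, 0, 50, 50),
    (0, 0, 0, 57, 43),
    (0, 0, 17, 63, 20),
]


def favourable(row):
    """Share of respondents who agree or strongly agree."""
    return row[3] + row[4]


def rows_sum_to_100(table):
    """Sanity check: each row of percentages should add up to 100."""
    return all(sum(row) == 100 for row in table)


favourable_shares = [favourable(row) for row in TABLE_7]
```

The sanity check is useful when transcribing survey tables by hand, since a row that does not add up to 100 usually signals a copying error.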
Fig. 15 Feedback from 30 MD doctors
Table 8 Tasks given to 30 MD students
S. No | Category | Question
1 | Knowledge design | Task 1: Enter patient details, check whether HIV test done or not
2 | Knowledge design | Task 2: Respond to questions asked for Prakriti Nishchitikaran
3 | Knowledge design | Task 4: Respond to questions asked for diagnosis/treatment
4 | Knowledge design | Task 8: Respond to questions asked for diagnosis based on report
5 | Decision making | Task 3: Check whether system is showing Prakriti properly
6 | Decision making | Task 5: Check whether system is diagnosing properly
7 | Knowledge representation | Task 6: Check whether the treatment prescribed is proper
8 | Knowledge representation | Task 7: Check whether system is showing proper advice
9 | Knowledge representation | Task 9: Check whether the treatment based on report is proper
10 | Knowledge representation | Task 10: Check whether the system is showing proper advice
All ten tasks given to the thirty MD students are listed in Table 9. The analysis of the feedback obtained on the basis of Table 9 is shown in Fig. 16. The recommendations given by stakeholders are listed in Table 10. This table motivates the researcher to refine MPSAT to align with an Ayurveda physician's way of diagnosis. According to this table, 24% of physicians recommended that MPSAT should accept the details of "Naadi, Koshta, Agni" and then prescribe medicine.
3.6.3 Technical Experts
The questionnaire given to technical experts to evaluate the system behaviour in terms of Usability, Functionality and Rule Formation is shown in Table 11. In Table 11, questions 1, 2, 3 and 4 cover the usability aspect; questions 5, 6, 7, 8, 9 and 10 cover the functionality of MPSAT; and questions 11, 12, 13 and 14 cover the aspect of rule formation [25, 27]. The feedback received from technical experts is shown in Fig. 17. Figure 17, which summarizes the feedback collected from technical experts, reveals that 60% of respondents agreed with the usability of MPSAT, 66.66% agreed with its proper functionality, and 65% agreed that rule formation is as per the requirement.
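The category-level figures quoted above are consistent with averaging the "Agree" column of Table 11 over the questions in each category. The sketch below reproduces that arithmetic; the dictionary and function names are made up for the example, and the averaging rule itself is an inference from the numbers rather than a method stated by the authors.

```python
from statistics import mean

# "Agree" percentages copied from Table 11, keyed by question number.
AGREE = {1: 60, 2: 60, 3: 60, 4: 60,
         5: 80, 6: 60, 7: 60, 8: 60, 9: 80, 10: 60,
         11: 60, 12: 80, 13: 60, 14: 60}

# Question ranges per category, as stated in the text.
CATEGORIES = {
    "Usability": range(1, 5),
    "Functionality": range(5, 11),
    "Rule formation": range(11, 15),
}


def category_agreement(agree=AGREE, categories=CATEGORIES):
    """Average the 'Agree' percentage over the questions of each category."""
    return {name: mean(agree[q] for q in questions)
            for name, questions in categories.items()}


summary = category_agreement()
```

The functionality average is 400/6, which the text reports truncated as 66.66%.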
Table 9 Task given for evaluation
Task given for evaluation | SD (%) | D (%) | N (%) | A (%) | SA (%)
Task 1: Enter patient details, check whether HIV test done or not | 0 | 0 | 17 | 50 | 33
Task 2: Respond to questions asked for Prakriti Nishchitikaran | 0 | 0 | 33 | 47 | 20
Task 4: Respond to questions asked for diagnosis/treatment | 0 | 0 | 3 | 67 | 30
Task 8: Respond to questions asked for diagnosis based on report | 0 | 3 | 20 | 50 | 27
Task 3: Check whether system is showing Prakriti properly | 0 | 0 | 13 | 64 | 23
Task 5: Check whether system is diagnosing properly | 3 | 3 | 23 | 40 | 31
Task 6: Check whether the treatment prescribed is proper | 0 | 0 | 3 | 60 | 37
Task 7: Check whether system is showing proper advice | 0 | 3 | 16 | 54 | 27
Task 9: Check whether the treatment based on report is proper | 0 | 0 | 16 | 61 | 23
Task 10: Check whether the system is showing proper advice | 0 | 0 | 17 | 40 | 43
Note: SD strongly disagree, D disagree, N neutral, A agree, SA strongly agree
4 Findings
Performing verification, validation, and evaluation of the RB at the KB building phase makes the MPS more sophisticated. The important research findings are:
1. The majority (87%) of physicians say that MPSAT provides the necessary insight and guidance to upcoming doctors.
2. Periodic reviews of the system will make MPSAT more familiar to medical practitioners.
3. The proposed MPS can be improved further with some modifications, such as the use of simple words for patients or nonmedical people, making the prescription of medicine more precise, and including more symptoms to make the system foolproof.
4. BMI and age should be considered while prescribing medicine, and medical terminologies and more symptoms should be added in depth.
5 Conclusion
The RB refinement scheme of the Medicinal Prescription System (MPS) presented in Sect. 3 has achieved the objective of providing guidelines to developers about
Fig. 16 Analysis of feedback for evaluation
Table 10 Recommendations by stakeholders
Stakeholders | Recommendations | Percentages
MD students | More symptoms needed to be added. Terminology should be more patient-oriented | 33
MD students | BMI and age should be considered while prescribing medicine | 33
MD students | Treatment part can be improved according to Prakriti | 36
MD doctors | Ayurvedic medicine must be prescribed after thorough examinations like Naadi, Koshta, Agni, etc. | 24
MD doctors | Kostha-Agni-Prakriti should be considered to decide anupan and dosages | 19
MD doctors | As per Ayurveda consider all eleven systems | 20
MD doctors | Prakriti Parikshan needs the family history | 13
the methodologies to be adopted in the earlier phases for RB validation, verification and evaluation. The demonstration of the RB refinement scheme using ES building tools has given researchers direction for refining the MPS effectively. One can cross-verify and validate an RB by considering different test cases to validate the RB
Table 11 Questionnaires given to technical experts
S. No. | Category | Question | SD (%) | D (%) | N (%) | A (%) | SA (%)
1 | Usability | Welcome screen design is appropriate | 0 | 20 | 0 | 60 | 20
2 | Usability | System evaluates Prakriti of the patient correctly | 0 | 0 | 20 | 60 | 20
3 | Usability | Relevant symptoms are displayed on Vikriti screen for each module | 0 | 0 | 0 | 60 | 40
4 | Usability | System behaviour is according to user's satisfaction | 0 | 0 | 20 | 60 | 20
5 | Functionality | All validations are working properly | 0 | 20 | 0 | 80 | 0
6 | Functionality | Screen navigation time is adequate | 0 | 40 | 0 | 60 | 0
7 | Functionality | Expected diseases are shown properly | 0 | 0 | 0 | 60 | 40
8 | Functionality | System shows relevant figure for disease | 0 | 0 | 20 | 60 | 40
9 | Functionality | Treatment is shown in correct way | 0 | 20 | 0 | 80 | 0
10 | Functionality | Report generation is in legal format | 0 | 40 | 0 | 60 | 0
11 | Rule formation | Rules handle mandatory and associative symptoms | 0 | 20 | 0 | 60 | 20
12 | Rule formation | Rules written for handling visibility are proper | 0 | 20 | 0 | 80 | 0
13 | Rule formation | Rules are written by considering entity-relationship aspects | 0 | 40 | 0 | 60 | 0
14 | Rule formation | Conflict resolution done correctly | 0 | 20 | 0 | 60 | 20
Note: SD strongly disagree, D disagree, N neutral, A agree, SA strongly agree
system. The discussion of different validation approaches, case-by-case conflict resolution strategies, and refinement at different levels has provided the necessary foundation for the successful refinement of any MPS. The system evaluation strategies, along with the analysis of the evaluation results, indicate that an MPS built in this manner will come closer to a human medical practitioner.
Fig. 17 Feedback from technical expert
References
1. Shortliffe E, Fagan LM (1982) Expert systems research: modeling the medical decision-making process. Presented at the workshop on integrated approaches to patient monitoring, University of Florida at Gainesville, 5–6 Mar 1982
2. O'Leary DE (1988) Methods of validating expert systems. Interfaces 18(6):72–79
3. Jafar M, Bahill AT (1990) Validator, a tool for verifying and validating personal computer based expert systems. In: Operations research and artificial intelligence: the integration of problem-solving strategies, pp 373–385
4. Sasi Kumar M, Ramani S, Muthu Raman S, Anjaneyulu KSR, Chandrasekar R (1990) A practical introduction to rule based expert systems
5. O'Keefe RM (1993) Expert system verification and validation: a survey and tutorial. Artif Intell Rev 7
6. Darlington KW (2011) Designing for explanation in health care applications of expert systems. SAGE Open 1(1):215824401140861. https://doi.org/10.1177/2158244011408618
7. Jamdaade KM, Purohit SU (2017) Comparison of expert system building tools: a case study of OPM and OpenRules Dialog. Int J Future Revolution in Comput Sci Commun Eng 3(11):241–247. ISSN: 2454–4248
8. Jamdaade KM, Purohit SU (2017) Intelligent medicinal prescription system for HIV/AIDS patients using Ayurveda therapy: working of inference engine of oracle policy modeling and CORVID. Int J Appl Innov Eng Manage (IJAIEM) 6(11). ISSN 2319–4847
9. Kulikowski CA, Weiss SM (2019) Representation of expert knowledge for consultation: the CASNET and EXPERT projects. Artif Intell Med. https://doi.org/10.4324/97804290520712, pp 21–55
10. Jones DD, Barrett JR (1989) Knowledge engineering in agriculture. ASAE
11. Waterman DA (1986) A guide to expert systems. Addison-Wesley Publishing Co., Inc, Reading, MA
12. Preece AD et al (1996) Validating dynamic properties of rule-based systems. Int J Hum Comput Stud 44(2):145–169. https://doi.org/10.1006/ijhc.1996.0008
13. Bozorgtabar B et al (2016) Sparse coding based skin lesion segmentation using dynamic rule-based refinement. Mach Learn Med Imag Lect Notes Comput Sci, 254–261. https://doi.org/10.1007/978-3-319-47157-0_31
14. Wong SY et al (2015) On equivalence of FIS and ELM for interpretable rule-based knowledge representation. IEEE Trans Neural Netw Learn Syst 26(7):1417–1430. https://doi.org/10.1109/tnnls.2014.2341655
15. Zhuge H et al (2003) Theory and algorithm for rule base refinement. Dev Appl Artif Intell Lect Notes Comput Sci, 187–196. https://doi.org/10.1007/3-540-45034-3_19
16. Cadoret F et al (2012) Design patterns for rule-based refinement of safety critical embedded systems models. In: 2012 IEEE 17th international conference on engineering of complex computer systems. https://doi.org/10.1109/iceccs20050.2012.6299202
17. Grogono P et al (1993) A review of expert systems evaluation techniques. AAAI technical report WS-93–05, pp 113–118
18. Grogono P et al (1992) A survey of evaluation techniques used for expert systems in telecommunications. Expert Syst Appl 5(3–4):395–401. https://doi.org/10.1016/0957-4174(92)90023-l
19. Sharma RS, Conrath DW (1993) Evaluating expert systems: a review of applicable approaches. Artif Intell Rev 7(2):77–91. https://doi.org/10.1007/bf00849078
20. Munaiseche CPC, Liando OES (2016) Evaluation of expert system application based on usability aspects. IOP Conf Ser Mater Sci Eng 128:012001. https://doi.org/10.1088/1757-899x/128/1/012001
21. Es-Saheb M, Al-Harkan I (2014) An expert system for powder selection using EXSYSCORVID. Res J Appl Sci Eng Technol 7(10):1961–1977
22. ExsysCorvidAdvancedTutorial.pdf
23. Zusammenfassung (2002) Forward-chaining inferencing. https://www.google.ch/patents/US20050240546
24. Furmankiewicz M et al (2015) Evaluation of the expert system as a stage of the life cycle model ESDLC on the example of WIKex. Comput Sci Math Modell 2(2):23–32. https://doi.org/10.5604/15084183.1197448
25. Moore JD, Mittal VO (1996) Dynamically generated follow-up questions. Computer 29(7):75–86. https://doi.org/10.1109/2.511971
26. Shafinah K et al (2010) System evaluation for a decision support system. Inf Technol J 9(5):889–898. https://doi.org/10.3923/itj.2010.889.898
27. Ahmed A, Aduragba T, Ajani AA, Jimada-Ojuolape B, Ahmed MO (2017) Expert system in rural medical care. Int J Eng Sci Res Technol, Sept 2017. ISSN: 2277–9655
Creation of Knowledge Graph for Client Complaint Management System
Shreya Shinde, Shubhangi Gaherwar, Avani Sathe, Malvika Menon, and Sheetal Barekar
Abstract With the advancement of technology in recent years, IT administration and its infrastructure work on vast amounts of data that need to be stored and managed properly, for instance, in the handling of microservices that span numerous systems. A Client Complaint Management System is one such application system found in every company. The IT Customer Support Department handles the complaints and queries of customers, and managing existing databases is difficult for the support team because of their limited ability to manage and maintain the relationships of increasingly complex data. A Knowledge Graph is a graph database that stores and manages complex data by maintaining nodes and the edges that define the relationships between the nodes. This paper proposes creating a Knowledge Graph for handling a client complaint management system's data, with the intention of solving customer queries quickly and efficiently, using a Naïve Bayesian Classifier and Artificial Neural Networks for text classification, and Natural Language Processing techniques and tools for generating the Knowledge Graph. Using a Knowledge Graph for storing data in this system will handle complex data relationships, reduce cost, ease access for IT administrators and staff for cross-referencing, and increase the speed and efficiency of data management techniques.
Keywords Artificial neural networks · Data management · Knowledge graph · Naïve Bayesian classifier · Natural language processing · Text classification
S. Shinde (B) · S. Gaherwar · A. Sathe · M. Menon · S. Barekar
Computer Department, CCOEW, Pune, India
S. Barekar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes on Data Engineering and Communications Technologies 70, https://doi.org/10.1007/978-981-16-2934-1_2

1 Introduction
With recent advancements in the IT sector, automation has become a necessity. As a result, database systems must be able to handle and analyze large amounts of data. The client complaint management system is one such system: it needs to manage, and support effective analysis of, the issues and complaints raised by customers of particular products. Answering client complaints in the IT industry is a tedious job that requires long man-hours, resources, money and effort. Automating this process helps to provide results much faster than traditional methods. To automate the system for effective analysis of complaints and to provide the necessary help to customers, a large and robust database is needed. A knowledge graph can serve this purpose, as it is capable of handling complex semantic data such as the issues faced by clients.
Traditionally, clients log on to the company portal and describe their problems by typing them in. An employee then reviews the complaint manually and suggests a solution or answers the client's question, as shown in Fig. 1. Often the client's problem is not solved immediately, because searching the database for relevant records and establishing relations manually is slow and difficult when the data is not properly linked. Knowledge cannot be retrieved from the database quickly, and the manual process is prone to human error. An existing smart complaint management system [1] can be used when complaints need to be segregated and assigned to the respective departments. However, the issue must still be analyzed manually by the person concerned in the department. Another proposed system [2] uses the argumentative nature of complaints that take the form of a dialogue to classify complaint scenarios. This paper proposes creating a data structure in the form of a knowledge graph that helps to resolve issues directly and to assign the technical help needed for a problem automatically, without human intervention (Fig. 2).
The knowledge encoded in the form of statements needs to be refined [3] so that the resulting data structure is precise and accurate for client complaint management systems. The paper uses Natural Language Processing techniques and Machine Learning algorithms to train on the data and to create a weighted knowledge graph containing all the necessary information for further use. The resulting knowledge graph can be used in various industries to automate the handling of issues faced by customers.
Fig. 1 Traditional approach of handling databases manually
Fig. 2 Proposed approach of creating knowledge graph for managing database
2 Design and Process Flow
This section describes the overall architecture and process flow of the proposed knowledge graph creation approach for complex problems such as a client complaint management system.
A. Design
The design, shown in Fig. 3, consists of three parts: the input module, the processing module, and the output module.
1. Input Module:
This module comprised data collection, data cleaning and data pre-processing. The pre-processed data was then passed to the processing module.
2. Processing Module:
This module comprised a training phase and a testing phase. We used a text classifier. The text classifier comprised two layers of neurons and one hidden layer, as shown in Fig. 4.
Fig. 3 Architecture of the proposed knowledge graph creation approach
Fig. 4 Multilayer artificial neural network
The algorithm used for text classification [4, 5] was multinomial naïve Bayes. The output produced by this algorithm [6] was a score rather than a probability. Along with this algorithm, we used NLTK (Natural Language Toolkit) [7] for:
1. Tokenization
2. Stemming
Tokenization was used to break client queries up into words. We used NLP techniques [7] for feature extraction in order to obtain the feature vectors from which the corpus was created. The corpus was then fed to our algorithm. NumPy was used to speed up matrix multiplication. The sigmoid function (Fig. 5) was used to normalize values, and its derivative was used to measure the error rate. The bag-of-words representation was used to transform an input query into an array of 0's and 1's.
Fig. 5 Sigmoid function
Mathematical Model:
The predefined classes were represented as $c$. Since the position of a word was immaterial, the bag-of-words representation was used. Considering each word $w_i$ and each class $c_j$, and taking $\mathrm{count}(w_i, c_j)$ as the frequency of occurrence of the word in the utterances of that class in the bag of words, the model could be written using the naïve Bayes algorithm as
$$P(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V} \mathrm{count}(w, c_j)}$$
Here, $V$ was the vocabulary, i.e., the set of words belonging to the bag of words.
The system used the sigmoid function to calculate the relative weights. Let the vector that represents the parameters be $\theta$, with the weights denoted by $\mathit{wt}$ and the bias by $b$. So,
$$\theta = [\mathit{wt}, b], \qquad \Delta\theta = [\Delta \mathit{wt}, \Delta b]$$
A small change to the vector after every iteration gave a new value $\theta_{new}$:
$$\theta_{new} = \theta + \alpha \cdot \Delta\theta$$
Using the first-order Taylor series, which holds for small $\alpha$,
$$g(\theta + \alpha \, \Delta\theta) \approx g(\theta) + \alpha \, \Delta\theta^{T} \nabla g(\theta)$$
where $\Delta\theta$ was chosen such that
$$\Delta\theta^{T} \nabla g(\theta) < 0$$
The loss function considered for the sigmoid was
$$g(\mathit{wt}, b) = \frac{1}{2} (f(x) - y)^2$$
So, with $f(x) = 1 / (1 + e^{-(\mathit{wt} \cdot x + b)})$,
$$\nabla \mathit{wt} = \sum_{i=1}^{2} (f(x_i) - y_i) \, f(x_i) \, (1 - f(x_i)) \, x_i$$
$$\nabla b = \sum_{i=1}^{2} (f(x_i) - y_i) \, f(x_i) \, (1 - f(x_i))$$
The gradient with respect to $b$ carries no factor $x_i$ because $b$ enters $f$ additively.
To find the final relevance value, the mean function was used.
Fig. 6 Process flow
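The multinomial naïve Bayes estimate and the bag-of-words encoding described above can be illustrated with a minimal sketch in plain Python. The toy queries, the class names and the helper names (`tokenize`, `word_likelihoods`, `bag_of_words`) are hypothetical and not taken from the paper; whitespace splitting stands in for NLTK tokenization and stemming, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (query, class) pairs, not from the paper.
TRAINING = [
    ("printer driver not installing", "hardware"),
    ("printer paper jam error", "hardware"),
    ("cannot login to account", "account"),
    ("reset account password", "account"),
]


def tokenize(text):
    """Lower-case whitespace tokenization (a stand-in for NLTK tokenization)."""
    return text.lower().split()


def word_likelihoods(samples):
    """Return P(w | c) = count(w, c) / sum of count(w', c) over the vocabulary."""
    counts = defaultdict(Counter)
    for text, label in samples:
        counts[label].update(tokenize(text))
    likelihoods = {}
    for label, counter in counts.items():
        total = sum(counter.values())  # words unseen in the class contribute 0
        likelihoods[label] = {w: n / total for w, n in counter.items()}
    return likelihoods


def bag_of_words(text, vocabulary):
    """Encode a query as an array of 0's and 1's over the vocabulary."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


likelihoods = word_likelihoods(TRAINING)
vocabulary = sorted({w for text, _ in TRAINING for w in tokenize(text)})
print(likelihoods["hardware"]["printer"])
print(bag_of_words("printer error", vocabulary))
```

In practice a smoothing term would usually be added to the counts so that a word unseen in a class does not force a zero score; the paper does not state whether this was done.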
3. Output Module:
In this module, the final knowledge graph was created and visualized using a graph database.
B. Process Flow
As shown in Fig. 6, the collected data was cleaned and pre-processed to bring it into the desired format and was then fed to the training module. Once training was complete, the testing phase began. In the testing phase, the testing data was classified and the weights of the skill-certificate edges were calculated. Finally, the knowledge graph was created and visualized using a graph database.
3 Implementation Details/Methodology
Python is used as the programming language throughout the implementation. The libraries used are NumPy, pandas and rake, for handling large amounts of multidimensional data, analyzing and manipulating the data, and computing statistics over it. The implementation includes data collection, data preparation and processing, data filtering, training, testing and visualization, which together produce the knowledge graph.
Fig. 7 Queries collected
A. Data Collection
In the data collection phase, information relevant to the client complaint system was collected from various websites. Data was collected manually, including from the official customer support websites of certain companies. The data collected comprised:
1. Customer queries and complaints, which described problems, errors and operating issues with the hardware and software of devices. The data was collected at random across a large number of categories and from a variety of domains. Figure 7 shows a few of the queries collected.
2. Technical staff capable of resolving customer issues and queries, generated manually together with their respective skills. Skills here refer to the domain certificates that define a staff member's specialization in resolving queries. Figure 8 shows some of the generated data, in which each person ID refers to the skill certificates that person holds.
3. Personal information of the technical staff, also generated manually. Personal information is stored for authentication and authorization purposes and identifies a person at the personal level. It mainly comprises age, address, phone number, etc. Figure 9 shows some of the personal data generated manually.
B. Data Preparation and Processing
Data preparation is the phase in which the collected data is made relevant to the application according to the requirements of the pipeline design. In this phase, the data is made suitable as input to the machine learning model to be used.
Fig. 8 Data of the technical staff and their qualification certificate
Fig. 9 Personal information of staff
The data collected in the previous phase was processed by generating tags, which categorize the queries and also define the characteristics of the certificates that the technical staff hold. The collected data was thus split as follows:
1. Training Data: 50% of the queries collected were classified under the certificate that addressed the issue specified in the query. Figure 10 shows the training dataset generated.
2. Testing Data: The remaining half of the queries were similarly classified under both the relevant certificate and the tags of that certificate. Figure 11 shows the testing dataset.
Fig. 10 Training dataset
Fig. 11 Testing dataset
C. Data Filtering
Data filtering is the phase in which the training data was stripped of unnecessary parts and made ready for the text classifier model. The rake package, together with the Natural Language Toolkit library [8, 9], was used to remove duplicates and stop words and to create a data frame whose entries were organized under classes and utterances for the model. This yielded the filtered training data.
D. Data Training
The data from the data filtering phase was used as input for the training phase. Here, we used a text classification model based on the naïve Bayesian algorithm [6, 10] and artificial neural networks. Training the data through the text classification model involved the following steps:
Step 1: Tokenize and stem the words and organize the data into documents, classes and words.
Step 2: Generate a bag of words for each statement and create an array of 0's and 1's for the corpus.
Step 3: Clean the data, use the sigmoid function to normalize the values, and perform matrix multiplication to reduce the error rate.
Step 4: Generate the synaptic weights with the neural network function.
Step 5: Run the Train() function, defining the number of hidden-layer neurons.
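Steps 3 to 5 can be sketched as a small network with one hidden layer, sigmoid activations and gradient descent on the squared error. This is only an illustration under stated assumptions: the paper's implementation uses NumPy for the matrix products, whereas the version below uses plain lists so that it runs without dependencies, and the layer sizes, learning rate `alpha`, epoch count, toy data and the helper name `forward` are hypothetical. Bias terms are omitted for brevity.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return hidden activations and output scores for one bag-of-words vector."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_hidden]
    output = [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in w_out]
    return hidden, output


def train(inputs, targets, hidden_neurons=4, alpha=0.5, epochs=3000, seed=1):
    """Learn the synaptic weights by gradient descent on the squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(inputs[0]), len(targets[0])
    # One weight list per neuron, initialised uniformly in [-1, 1].
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden_neurons)]
    w_out = [[rng.uniform(-1, 1) for _ in range(hidden_neurons)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal (f - y) * f * (1 - f), using the sigmoid derivative.
            delta_out = [(o - t) * o * (1 - o) for o, t in zip(output, y)]
            # Back-propagate to the hidden layer before changing any weight.
            delta_hid = [
                h * (1 - h) * sum(d * w_out[k][j] for k, d in enumerate(delta_out))
                for j, h in enumerate(hidden)
            ]
            for k, d in enumerate(delta_out):
                for j, h in enumerate(hidden):
                    w_out[k][j] -= alpha * d * h
            for j, d in enumerate(delta_hid):
                for i, xi in enumerate(x):
                    w_hidden[j][i] -= alpha * d * xi
    return w_hidden, w_out


# Hypothetical toy corpus: 4-word vocabulary, two classes with one-hot targets.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
w_hidden, w_out = train(X, Y)
for x in X:
    print(x, [round(score, 2) for score in forward(x, w_hidden, w_out)[1]])
```

The output values are scores in the interval (0, 1) rather than normalized probabilities, which matches the paper's remark that the classifier produces a score.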
E. Data Testing
After training the model on the training data, we generated the tag-certificate relevance scores using the testing data. The testing data (Fig. 11) was tested with the following steps:
Step 1: The testing data was read by the classify function to generate the relevance scores between certificate and tag. A score was generated for each query under the tag and the certificate. This is the output of the text classifier model.
Step 2: The mean function was applied to the output of Step 1 to calculate the final relevance score between certificate and tag, as shown in Fig. 12.
Step 3: The output was stored in a CSV file and used as the final data for creating the knowledge graph.
F. Result and Analysis
The output of the testing phase was used to generate the results. The output was in CSV format and served as the input source. By importing the data into various tools, a knowledge graph can be generated and visualized. Platforms such as GraphDB, Neo4j and Tableau can be used according to the requirements. We used Neo4j to create and visualize the knowledge graph.
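Steps 2 and 3 of the testing phase, averaging the per-query scores into one weight per certificate-tag pair and writing the result as CSV, might look like the sketch below. The score triples, column names and function names are hypothetical; the paper does not list the actual schema of its CSV file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical classifier output: (certificate, tag, score) for each test query.
QUERY_SCORES = [
    ("Networking", "wifi", 0.90),
    ("Networking", "wifi", 0.70),
    ("Networking", "router", 0.60),
    ("Hardware", "printer", 0.80),
]


def relevance_by_pair(query_scores):
    """Average the per-query scores of every (certificate, tag) pair."""
    grouped = defaultdict(list)
    for certificate, tag, score in query_scores:
        grouped[(certificate, tag)].append(score)
    return {pair: mean(scores) for pair, scores in grouped.items()}


def to_csv(relevance):
    """Serialize the pair weights as CSV text for import into a graph tool."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["certificate", "tag", "relevance"])
    for (certificate, tag), value in sorted(relevance.items()):
        writer.writerow([certificate, tag, f"{value:.2f}"])
    return buffer.getvalue()


relevance = relevance_by_pair(QUERY_SCORES)
print(to_csv(relevance))
```

Each CSV row then corresponds to one weighted edge between a certificate and a tag in the knowledge graph.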
Fig. 12 Output of the text classification model
4 Result: Creation of Knowledge Graph
The technical staff data collected in the data collection phase and the output data from the testing phase were used as the knowledge base for creating the knowledge graph [11]. Multiple platforms can be used to generate and visualize [12] a knowledge graph. The data files of the knowledge base were imported into the tool. The knowledge base was generated from the model output obtained after the operations on the data in the testing phase. This output served as the database for our knowledge graph and was used for the visualization. With the help of the Cypher Query Language, queries were run to link the data together, define the nodes and edges by cross-referencing, and create the graph database. The resulting knowledge graph is shown in Figs. 13 and 14.
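As an illustration of how Cypher statements could link staff, certificates and tags, the sketch below builds MERGE statements as strings. The node labels (`Person`, `Certificate`, `Tag`), relationship types (`HOLDS`, `RELEVANT_TO`) and property names are assumptions, since the paper does not give its graph schema, and the statements are only printed here rather than sent to a database.

```python
def cypher_string(value):
    """Quote a Python string as a Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def relevance_statement(certificate, tag, weight):
    """Build a statement that links a certificate node to a tag node."""
    return (
        f"MERGE (c:Certificate {{name: {cypher_string(certificate)}}}) "
        f"MERGE (t:Tag {{name: {cypher_string(tag)}}}) "
        f"MERGE (c)-[r:RELEVANT_TO]->(t) SET r.weight = {weight:.2f}"
    )


def holds_statement(person_id, certificate):
    """Build a statement that links a staff member to a certificate held."""
    return (
        f"MERGE (p:Person {{id: {int(person_id)}}}) "
        f"MERGE (c:Certificate {{name: {cypher_string(certificate)}}}) "
        f"MERGE (p)-[:HOLDS]->(c)"
    )


# Hypothetical rows, e.g. read from the CSV files of the knowledge base.
print(holds_statement(7, "Networking"))
print(relevance_statement("Networking", "wifi", 0.8))
```

MERGE is used instead of CREATE so that importing the same row twice does not duplicate nodes or edges. When running such statements through a database driver, query parameters are generally preferable to string formatting.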
Fig. 13 Knowledge graph
Fig. 14 Closer view of knowledge graph

5 Conclusion and Future Work
The paper proposes an efficient and effective solution to a problem faced by major IT industries. Existing machine learning models were studied carefully, and the relevant ones were used to build our text classification model. The comparison shows that our model gave results that are more favorable to the needs of the concern. Applied to the given client complaints data, the model gave results with an accuracy of about 85% and a latency of less than 1 s. The pie chart in Fig. 15 shows the results for the testing data given to the model after the training phase, classified into correctly predicted, incorrectly predicted and null-predicted categories with respect to the expected result.
Fig. 15 Pie chart on result accuracy of the text classifier model
For complete usage, the model needs to be integrated with further processes. The output of the text classification model is a data structure that downstream systems can use to obtain the desired result, which gives flexibility in how the model is used. Besides managing client complaints, it can be used in other applications such as:
(1) Maintaining a record of teachers' certified domains to help a student find a suitable guide or mentor.
(2) Any online delivery system, where the knowledge graph can hold data on restaurants or shops of any kind.
(3) Tourism websites, which can use the model to help tourists find locations of public interest such as theatres and museums.
(4) Any kind of specific search that must comply with the characteristic needs of the user and map them onto the huge data of possible outcomes.
Acknowledgements The authors would like to thank Mr. Vishal Sharma and Mr. Rupesh Deore for their valuable guidance and support to help with the novel implementation of knowledge graph for client complaint management system. The authors offer their gratitude to Director, MKSSS Cummins College of Engineering, Pune and the Computer Department for their constant support.
References
1. Kormpho P, Liawsomboon P, Phongoen N, Pongpaichet S (2018) Smart complaint management system. In: 2018 seventh ICT international student project conference (ICT-ISPC), Nakhonpathom, pp 1–6. https://doi.org/10.1109/ICT-ISPC.2018.8523949
2. Galitsky BA, González MP, Chesñevar CI (2009) A novel approach for classifying customer complaints through graphs similarities in argumentative dialogues. Decis Support Syst 46(3):717–729. https://doi.org/10.1016/j.dss.2008.11.015
3. Paulheim H (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic Web 8(3):489–508
4. Frank E, Bouckaert RR (2006) Naive Bayes for text classification with unbalanced classes. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006. Lecture notes in computer science, vol 4213. Springer, Berlin. https://doi.org/10.1007/11871637_49
5. Singh G, Kumar B, Gaur L, Tyagi A (2019) Comparison between multinomial and Bernoulli naïve Bayes for text classification. In: 2019 international conference on automation, computational and technology management (ICACTM), London, United Kingdom, pp 593–596. https://doi.org/10.1109/ICACTM.2019.8776800
6. Xu S, Li Y, Wang Z (2017) Bayesian multinomial naïve Bayes classifier to text classification. In: Park J, Chen SC, Raymond Choo KK (eds) Advanced multimedia and ubiquitous engineering. FutureTech 2017, MUE 2017. Lecture notes in electrical engineering, vol 448. Springer, Singapore. https://doi.org/10.1007/978-981-10-5041-1_57
7. Farkiya A, Saini P, Sinha S, Desai S (2015) Natural language processing using NLTK and WordNet. (IJCSIT) Int J Comput Sci Inf Technol 6(6):5465–5454
8. Paramkusham S (2017) NLTK: the natural language toolkit. Int J Technol Res Eng 2845–2847
9. Yogish D, Manjunath TN, Hegadi RS (2019) Review on natural language processing trends and techniques using NLTK. In: Santosh K, Hegadi R (eds) Recent trends in image processing and pattern recognition. RTIP2R 2018. Communications in computer and information science, vol 1037. Springer, Singapore
10. Xu S (2018) Bayesian naïve Bayes classifiers to text classification. J Inf Sci 44:48–59
11. Kertkeidkachorn N, Ichise R (2018) An automated knowledge graph creation framework from natural language text. IEICE Trans Inf Syst E101-D:90–98
12. Zhu A (2013) Knowledge graph visualization for understanding ideas. Int J Cross-Discip Subjects Educ (IJCDSE) 3(1)
Additional References
13. Buluç A et al (2013) High-productivity and high-performance analysis of filtered semantic graphs. In: 2013 IEEE 27th international symposium on parallel and distributed processing, Boston, MA, pp 237–248. https://doi.org/10.1109/IPDPS.2013.52
14. Dörpinghaus J, Stefan A (2019) Knowledge extraction and applications utilizing context data in knowledge graphs, pp 265–272. https://doi.org/10.15439/2019F3
15. Nazaruka E (2019) Identification of causal dependencies by using natural language processing: a survey, pp 603–613. https://doi.org/10.5220/0007842706030613
16. Kertkeidkachorn N, Ichise R (2017) T2KG: an end-to-end system for creating knowledge graph from unstructured text. AAAI workshops
17. Manrique R, Mariño O (2018) Knowledge graph-based weighting strategies for a scholarly paper recommendation scenario. In: KaRS@RecSys 2018, pp 5–8
18. Al Omran FNA, Treude C (2017) Choosing an NLP library for analyzing software documentation: a systematic literature review and a series of experiments. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR), Buenos Aires, pp 187–197. https://doi.org/10.1109/MSR.2017.42
Automation of Bid Proposal Preparation Through AI Smart Assistant
Sanjeev Manchanda
Abstract Preparing a bid proposal in response to a new tender is an exhaustive process for any organization in the manufacturing industry. It requires understanding the contents of the tender, analyzing the feasibility of the project, querying deviations, estimating timelines and costs, preparing the bid proposal and responding to the tender within stringent timelines, and it involves a lot of manual effort from different teams of the organization. Moreover, dependence on humans is error prone, which may increase the cost of bid preparation and may lead to losses. Manually extracting the right information, analyzing it, preparing the proposal and making the right decisions involves effort from multiple teams in different departments of the organization. This paper presents an artificial intelligence (AI) smart assistant system that automates the process of extracting insights from tender documents and preparing bid proposals for new tenders.
Keywords Automated proposal preparation · Request for proposal · Artificial intelligence/machine learning · Natural language processing · Natural language generation
S. Manchanda (B)
A & I R & I, TCS, Mumbai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes on Data Engineering and Communications Technologies 70, https://doi.org/10.1007/978-981-16-2934-1_3

1 Introduction
Manufacturing organizations respond to many requests for proposals (RFPs) for new tenders issued by their clients. Preparing a bid proposal in response to a new tender is an exhaustive process for any organization in the manufacturing industry and consumes a lot of resources from various teams. Typical bid preparation involves many steps, such as understanding the contents of the new tender, analyzing the feasibility of the project, querying deviations, estimating timelines and costs, preparing the bid proposal and responding to the tender within stringent timelines. Most of these activities involve a lot of manual effort from teams in different departments of the organization. Moreover, dependence on humans is error prone, which may increase the cost of the project and may lead to losses. Manually extracting the right information, analyzing it, preparing the proposal and making the right decisions involves effort from multiple teams in different departments of the organization. To help organizations with bid preparation, a system based on artificial intelligence/machine learning (AI/ML) and natural language processing (NLP) is proposed that helps in preparing bid proposals and supports better decision making about bid proposal preparation and submission. This paper presents the processes of the AI smart assistant system with results and comparisons.
2 Historical Background
Automated proposal preparation has not been explored much from the proposal creation perspective; in the past, the focus was more on proposal evaluation. Recent state-of-the-art advancements in AI/ML technologies, such as NLP, natural language generation (NLG) and neural networks, have made its automation possible. Sheflott [1] patented a similarity-based approach for generating a proposal response, in which questions are compared with the questions in a repository to find the associated answer. Arslan et al. [2] presented an e-bidding proposal preparation system for construction projects. Philbin [3] discussed a bid management process based on systems engineering. Renault et al. [4] explained time optimization through a requirements pattern catalogue and the errors made during the proposal preparation process. Hochstetter et al. [5] discussed the process of software proposal requests and its automation. Paech et al. [6] described the experiences of suppliers in analyzing received proposal documents and evaluated different requirements gathering techniques. Hochstetter et al. [7] explained the core solution development parameters for RFPs and enumerated the challenges in practice. Hamid et al. [8] explained the process of identifying requirements from proposals to define the scope, associated services and required service levels. Ellis [9] presented a list of tools that help in automating proposal preparation. Rajbhoj et al. [10] presented an RFP response generation approach, with reasonable accuracy, for deployment in business units that have high volumes of proposal turnover. As new technologies have made it possible to develop next-generation solutions for automating complex tasks such as proposal preparation, the current paper presents a system that helps to create proposals automatically and supports the key decisions in responding to RFPs for new tenders.
3 Problem Definition
Manufacturing organizations need to respond, in a timely and accurate manner, to many requests for proposals (RFPs) for new tenders issued by their clients. Manual preparation of a bid proposal in response to a new tender is time consuming and error prone. Multiple teams work together to create the proposal document iteratively. A typical tender document may run from a few hundred to thousands of pages in portable document format (PDF), and because of the limited capabilities of humans, many important details may be missed, which may lead to losses for the organization and may also damage its reputation. Preparation of a bid proposal involves certain checkpoints at which decisions are taken by management. Figure 1 depicts the end-to-end process of bid proposal preparation. The process has three major checkpoints, as follows:
Fig. 1 Bid proposal preparation process with decision points
3.1 Bid/No-Bid Decision
The Bid/No-Bid decision involves a high-level review of the key requirement parameters or attributes in the tender document to understand the requirement and to ensure the feasibility of the deliverables.
(a) Tender Document Extraction
The process starts when a new tender document is received in portable document format (PDF). The key highlights of the tender document are extracted manually.
(b) Tender Document Review
Different sections of the tender document are reviewed manually by teams from different departments of the organization that received the tender from its client.
(c) Summary of Important Metrics Preparation
The decision whether or not to bid for the new tender is taken on the basis of key metrics extracted from the tender document. This is done to ensure the high-level feasibility of the project. If the decision is No-Bid, the process terminates here; otherwise, it moves to the next step, i.e., the Go/No-Go decision.
3.2 Go/No-Go Decision
The Go/No-Go decision involves a detailed review of the contents of the tender document by teams from different departments to understand the requirement.
(a) Detailed Content Extraction
Detailed content is extracted from the new tender document in PDF format and shared with the teams from departments or outsourced partners of the organization that need to review and respond to it.
(b) Division of Contents for Different Teams
The contents extracted for different teams are of two types: content specific to a team or partner, and content common to all teams and partners.
(c) Review of Tender Contents
The teams review the content and raise their concerns as requests for clarification and deviations.
(d) Submission of Queries
The collated queries are shared with the tender issuer for clarification. The tender issuer receives queries from all of its partners, then prepares a combined draft of clarifications and shares it with the partners.
(e) Go or No-Go Decision
Based on the clarifications received from the tender issuer and the subsequent reviews from the different teams of the organization, the management team decides whether to 'Go' or 'No-Go' for bidding. If the decision is 'No-Go,' the process terminates here; otherwise, the 'Go' decision leads to the next steps below.
3.3 Preparation and Submission of Bid Proposal
This process finds the closest tender in the repository to derive recommendations for the new tender and to prepare the bid proposal manually.
(a) Matching and Finding Closest Historical Tender and Bid
Once the 'Go' decision is taken, a search is initiated to find the closest tender in the repository and its associated bid, in order to prepare bid recommendations for the new tender.
(b) Preparation of Parameters or Components List
Once the closest tender and associated bid are found, the historical bid is searched for the key parameters or components that the project requires and that involve cost. There may be two types of parameters:
(i) Tender Parameters
Tender parameters are specified within the tender document, such as electric motor specifications and manufacturing machine specifications.
(ii) Non-Tender Parameters
Non-tender parameters are not specified within the tender document but are implicitly required to complete the project, such as electrical wires and switches.
All the available parameters or components of the project are listed and matched against those required for the new tender. New components or parameters that are not available in the historical tender and bid are also added to the list. After a few iterations, the list of parameters or components is complete and the associated cost estimates are calculated. Because component costs will have changed between the old tender and the new one, the costs are updated with the latest estimates. After many iterations, the final recommendation list of parameters or components for preparing the bid for the new tender is created.
(c) Bid Proposal Preparation
Based on the recommendations, the new bid proposal is prepared by team members from different departments. The bills of materials (BoM) and commercials are added to finalize its contents. During preparation, the proposal is reviewed repeatedly to check the detailed feasibility of the project and to take account of any observation not seen earlier. After multiple iterations and reviews, the final bid proposal is prepared.
(d) Bid Proposal Submission
Once the bid proposal is finalized by the preparation team, management reviews it and gives approval. The bid proposal is then submitted to the tender issuer and archived in the system together with the tender document.
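Step (a), finding the closest historical tender, is described above as a manual search. One simple way such a search could be approximated is cosine similarity over word counts, sketched below. The repository contents, tender identifiers and function names are hypothetical, and the paper does not state which similarity measure, if any, is used.

```python
import math
from collections import Counter


def term_counts(text):
    """Count lower-cased, whitespace-separated terms of a tender text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_tender(new_text, repository):
    """Return the identifier of the most similar historical tender."""
    new_counts = term_counts(new_text)
    return max(
        repository,
        key=lambda tid: cosine_similarity(new_counts, term_counts(repository[tid])),
    )


# Hypothetical repository of historical tender summaries.
REPOSITORY = {
    "T-001": "steel drum with electric motor and conveyor",
    "T-002": "office building civil works and concrete foundation",
}
print(closest_tender("electric motor for steel drum assembly", REPOSITORY))
```

The bid associated with the returned tender would then serve as the starting point for the parameters list in step (b).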
Figure 1 depicts the bid proposal process as explained in the steps above. Most of the activities in this process are manual, and a lot of time and effort is consumed in reviewing and preparing the bid proposal. Manual outputs are error prone even after many iterations and teamwork. Often, a project is found to be infeasible only at the end of the overall process, and the process terminates without a bid proposal being submitted. The manufacturing industry seeks support from technology to automate bid proposal preparation to a significant extent. The complexity of the whole process makes it evident that complete automation may not be possible, but many sub-processes can be automated through a human-in-the-loop approach, in which machine and human experts work hand in hand to achieve output of reasonable quality with minimal effort, time and cost.
Fig. 2 Bid proposal preparation process with decision points
4 Proposed Solution
Bid proposal preparation is a complex process that takes a lot of time and effort, for tenders ranging from simple to complex projects received by different industry verticals worldwide. State-of-the-art advancements in technology have made it possible to automate many processes that were mostly manual. Bid proposal preparation is one such complex process, and its automation is designed very carefully to meet the objectives. The automation scoped for bid proposal preparation augments the different decisions taken during the process, as follows (Fig. 2).
5 Solution Implementation
The AI Smart Assistant is evolving and is currently being developed in three phases to support the key decisions described in the problem statement.
Fig. 3 Attributes extraction process
5.1 Attributes Extraction Process
The attributes extraction process, depicted in Fig. 3, is designed to support the Bid/No-Bid decision by extracting key attributes from the document that can inform the decision at a high level. The user logs into the system and uploads the new tender document, which is usually in PDF format. The contents of the input document are extracted and divided into sections according to the paragraphs of the document. The sections are then processed by a natural language processing (NLP) engine that is customized to extract the right attributes from the document. The NLP engine uses standard dictionaries, custom-made domain dictionaries and state-of-the-art deep learning neural network algorithms to identify the right set of attributes. Once the attributes have been extracted by the attributes extraction module, the customizing output module prepares them for reporting in the output file and highlights key attributes and exceptions in those attributes. For example, the system highlights that the wall thickness of a drum to be manufactured causes a very large variation in cost, so a thicker wall implies higher cost. The system also suggests whether the new tender should be considered for bidding. Once the output file has been generated with key attributes, exceptional highlights and recommendations, it is reviewed by experts and then presented to management for the final decision. If the decision is No-Bid, the process terminates here; otherwise, it triggers the next process for automated bid preparation.
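The flow from paragraph-level sections to a list of attributes with highlighted exceptions can be sketched as follows. This is a deliberately simplified stand-in: regular expressions play the role of the domain dictionaries, no deep learning model is involved, and the attribute names, patterns, review limit and sample text are all hypothetical. The text is assumed to have been extracted from the PDF already.

```python
import re

# Hypothetical domain dictionary: attribute name -> pattern for value and unit.
ATTRIBUTE_PATTERNS = {
    "wall_thickness": re.compile(
        r"wall thickness\D{0,20}?(\d+(?:\.\d+)?)\s*(mm|cm)", re.I),
    "delivery_period": re.compile(
        r"deliver\w*\D{0,40}?(\d+)\s*(days|weeks|months)", re.I),
}

# Hypothetical exception rule: values above this limit are flagged for review.
REVIEW_LIMITS = {"wall_thickness": 40.0}


def split_sections(text):
    """Split extracted document text into paragraph-level sections."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def extract_attributes(text):
    """Return one record per attribute found, with an exception flag."""
    records = []
    for number, section in enumerate(split_sections(text), start=1):
        for name, pattern in ATTRIBUTE_PATTERNS.items():
            match = pattern.search(section)
            if match is None:
                continue
            value = float(match.group(1))
            limit = REVIEW_LIMITS.get(name)
            records.append({
                "attribute": name,
                "value": value,
                "unit": match.group(2).lower(),
                "section": number,
                "exception": limit is not None and value > limit,
            })
    return records


SAMPLE = """The drum shall have a wall thickness of 45 mm.

The supplier shall deliver the equipment within 12 weeks of the order."""

for record in extract_attributes(SAMPLE):
    print(record)
```

Records flagged as exceptions correspond to the highlights that experts review before the output file is presented to management.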
5.2 Team-Wise Tender Contents Extraction Process
Once management decides to bid on the new tender, the contents of the tender document are reviewed in detail by the system and by the teams that specialize in reviewing those contents. As depicted in Fig. 4, the content sections extracted from the PDF document are processed further by the team-wise content extraction module, which identifies the contents related to each team and highlights the key points in them. Some contents are specific to one team: the manufacturing team gets manufacturing-related contents, the finance team gets finance-related contents and the civil team gets the needs related to civil structures. Other contents are needed by all teams, e.g., electrical switches and wires, and are shared as common contents. The system highlights the key points in each section of the tender document and presents them to each team, so that specialized teams or partners can use the system’s input to review the tender and create the final list of key points for bid preparation. Based on these key points, teams and partners prepare a list of deviations and clarifications to raise with the tender-issuing organization, combine the queries from the different teams and share the combined list with the tender issuer. The tender-issuing organization receives queries from its bidding partners, prepares a combined list of clarifications and sends it to all of its partners. After receiving the clarifications, the teams and management of the bidding organization decide whether to Go or No-Go for bidding on the new tender. A No-Go decision can be taken at any time if certain parameters are infeasible and may incur losses, or if there is any other severe uncertainty or risk. If management takes a No-Go decision, the process terminates here; otherwise, the next steps of bid preparation are triggered.
Fig. 4 Team-wise tender contents extraction process
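The routing of sections to teams and to the shared common contents can be sketched as follows. This is only an illustration under stated assumptions: keyword matching stands in for the team-wise content extraction module, whose real logic is not given in the text, and the keyword sets, the "unassigned" bucket and the sample sections are hypothetical.

```python
import re

# Hypothetical keyword sets per team; the real module is not keyword-based by necessity.
TEAM_KEYWORDS = {
    "manufacturing": {"drum", "welding", "fabrication"},
    "finance": {"payment", "invoice", "guarantee"},
    "civil": {"foundation", "concrete", "structure"},
}
# Hypothetical keywords for items that every team needs.
COMMON_KEYWORDS = {"switches", "wires"}


def tokenize(section):
    """Lower-case a section and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", section.lower()))


def route_sections(sections):
    """Assign each section to the common bucket or to every matching team."""
    routed = {team: [] for team in TEAM_KEYWORDS}
    routed["common"] = []
    routed["unassigned"] = []  # assumption: left for manual review
    for section in sections:
        words = tokenize(section)
        if words & COMMON_KEYWORDS:
            routed["common"].append(section)
            continue
        matched = [team for team, keywords in TEAM_KEYWORDS.items()
                   if words & keywords]
        if not matched:
            routed["unassigned"].append(section)
        for team in matched:
            routed[team].append(section)
    return routed


# Made-up sections, used only to exercise the sketch.
sections = [
    "The drum shall be of welded fabrication.",
    "Payment terms: 30 days after invoice.",
    "Electrical switches and wires are required at every unit.",
    "Concrete foundation for the drum.",
]
routed = route_sections(sections)
```

A section may reach more than one team, as the last sample section does here; whether the actual system allows such overlap is not stated in the text.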
5.3 Closest Tender/Bid Extraction and Bid Recommendations Process
Once the ‘Go’ decision is taken, the system starts searching the repository of historical tender documents for the tender closest to the new one. The closest tender is matched on a wide range of parameters: the system, trained through rules and machine learning techniques, searches for the closest tender and its associated bid so that suitable recommendations can be prepared from past experience. There are two possibilities: either an acceptable closest tender is available in the repository, or it is not.
Fig. 5 Closest tender/bid extraction, bid proposal preparation process
As depicted in Fig. 5, the system initiates the search and finds the closest tender that meets the eligibility criteria over a wide range of parameters, together with its associated bid. It then compares the key attributes and points identified in the previous processes to prepare a list of the key requirements in the tender document and of how the associated bid proposed to fulfil them. The historical tender, its associated historical bid, the key attributes and points of the new tender generated during the previous processes, and the new tender document with the clarifications received from the tender issuer are processed together to generate recommendations for the new tender. This is a very complex process that executes a rule and machine learning engine to evaluate all inputs and generate recommendations. If, on the other hand, no closest tender meets the eligibility criteria, the system generates its recommendations from the new tender document and the files generated during the previous processes alone. The recommendations generated by the system are of two types, viz. tender parameter-based recommendations and non-tender parameter-based recommendations.
(a) Tender Parameters-Based Recommendations: Tender parameters are directly specified in the tender document, and the required specifications are described or specifically prescribed, e.g., thickness of drum, power of motors, etc. Based on these specified parameters, the system finds suitable recommendations through certain rules and machine learning-based processing.
(b) Non-Tender Parameters-Based Recommendations: Non-tender parameters are not specified in the tender document, and the required specifications are unavailable, e.g., electric switches, wires, etc. Based on the tender parameters, the system calculates the need for non-tender parameters and finds suitable recommendations through certain rules and machine learning-based processing.
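The two cases of the closest-tender search (an eligible historical tender exists, or none does) can be sketched in Python as below. The similarity measure, the threshold value, the record layout and the sample data are all assumptions made for illustration; the text does not specify how the rule and machine learning engine scores tenders.

```python
def similarity(new_params, old_params):
    """Mean per-parameter closeness over shared parameters.

    Assumes non-negative numeric values, for which the result lies in [0, 1].
    """
    shared = [key for key in new_params if key in old_params]
    if not shared:
        return 0.0
    total = 0.0
    for key in shared:
        a, b = new_params[key], old_params[key]
        largest = max(abs(a), abs(b))
        total += 1.0 if largest == 0 else 1.0 - abs(a - b) / largest
    return total / len(shared)


def find_closest(new_params, repository, threshold=0.8):
    """Return (record, score) for the best match, or (None, score) if ineligible.

    `threshold` is a hypothetical eligibility criterion.
    """
    best, best_score = None, 0.0
    for record in repository:
        score = similarity(new_params, record["tender"])
        if score > best_score:
            best, best_score = record, score
    if best is None or best_score < threshold:
        return None, best_score
    return best, best_score


# Made-up repository of historical tenders with their associated bids.
repository = [
    {"id": "T-001", "bid": "B-001",
     "tender": {"drum_thickness_mm": 10, "motor_power_kw": 75}},
    {"id": "T-002", "bid": "B-002",
     "tender": {"drum_thickness_mm": 30, "motor_power_kw": 200}},
]
new_tender = {"drum_thickness_mm": 12, "motor_power_kw": 75}

match, score = find_closest(new_tender, repository)
```

When `find_closest` returns `None`, the caller would fall back to generating recommendations from the new tender document and the earlier output files only, which corresponds to the second case described above.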
Figure 6 depicts the machine learning process, which trains the model to identify the closest tender and its associated bid, and then applies it to find the closest tenders and their associated bids for new tenders. After the recommendations for tender and non-tender parameters have been generated, they are processed further by a natural language generation (NLG)-based proposal preparation engine, which generates the bid proposal draft automatically. The system-generated bid proposal is reviewed by bid proposal preparation experts, and after multiple iterations a final draft is prepared and sent to the finance team, which adds the commercials and bills of materials (BoM) to calculate the final bidding quotation. After multiple reviews, the final bid proposal is prepared and submitted to the tender issuer, and the new tender with its final bid proposal is archived for training of the algorithms and for future use. All processes are depicted in Algorithm 1.
Fig. 6 Training and learning processes for bid recommendations
Algorithm 1: AI Smart Assistant System’s Algorithm
Input: Input Tender Documents D, Associated Bid Documents B, Document Sections Set S ← null, Attributes Set A ← null, Team-Wise Contents Set T ← null, Common Content Set C ← null and Recommendations Set R ← null
Output: Bid Proposal Document Bnew.
Step 1: Input new tender document set Dnew.
Step 2: Initialize S ← null, i ← 1, j ← 1
Step 3: while i