English Pages 109 Year 2023
SpringerBriefs in Applied Sciences and Technology Computational Intelligence Parikshit N. Mahalle · Pravin P. Hujare · Gitanjali Rahul Shinde
Predictive Analytics for Mechanical Engineering: A Beginners Guide
SpringerBriefs in Applied Sciences and Technology
Computational Intelligence Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
SpringerBriefs in Computational Intelligence are a series of slim high-quality publications encompassing the entire spectrum of Computational Intelligence. Featuring compact volumes of 50 to 125 pages (approximately 20,000-45,000 words), Briefs are shorter than a conventional book but longer than a journal article. Thus Briefs serve as timely, concise tools for students, researchers, and professionals.
Parikshit N. Mahalle Department of AI and DS Vishwakarma Institute of Information Technology Pune, Maharashtra, India
Pravin P. Hujare Department of Mechanical Engineering Vishwakarma Institute of Information Technology Pune, Maharashtra, India
Gitanjali Rahul Shinde Department of Computer Science and Engineering (AI and ML) Vishwakarma Institute of Information Technology Pune, Maharashtra, India
ISSN 2191-530X ISSN 2191-5318 (electronic) SpringerBriefs in Applied Sciences and Technology ISSN 2625-3704 ISSN 2625-3712 (electronic) SpringerBriefs in Computational Intelligence ISBN 978-981-99-4849-9 ISBN 978-981-99-4850-5 (eBook) https://doi.org/10.1007/978-981-99-4850-5 © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Foreword
As the field of mechanical engineering continues to evolve and progress at an unprecedented pace, it’s becoming increasingly important for engineers to embrace the latest analytical tools and techniques to stay ahead of the curve. One field that has shown tremendous potential in this regard is predictive analytics, a powerful approach that enables engineers to anticipate and prevent problems before they occur.
In this groundbreaking book, Predictive Analytics for Mechanical Engineering, our esteemed authors take a deep dive into the world of predictive analytics, exploring a wide range of applications and techniques that can help mechanical engineers achieve greater efficiency, accuracy, and productivity in their work. From machine learning algorithms and data mining techniques to statistical models and simulation tools, this book provides a comprehensive overview of the latest technologies and methodologies that are transforming the way engineers approach their work.
Whether you’re a seasoned mechanical engineer looking to upskill and improve your analytical capabilities, or a student just starting out in the field, this book is an essential resource for anyone seeking to stay at the forefront of this rapidly evolving discipline. Although written with mechanical engineers in mind, the insights and techniques discussed in this book are relevant to anyone involved in the field of engineering and beyond.
I extend my sincere gratitude to the authors for their insightful contributions and dedication to advancing the field of mechanical engineering. I am confident that this book will serve as a valuable resource for many years to come, inspiring engineers and students alike to embrace the exciting possibilities of predictive analytics.
Prof. Dr. Anil D. Sahasrabudhe Chairman, National Educational Technology Forum Ministry of Education, Government of India New Delhi, India
The original version of this book was revised. Foreword has been placed in the Frontmatter of the book.
Preface
Work for work’s sake, not for yourself. Act but do not be attached to your actions. Be in the world, but not of it. —Bhagwad Gita
Artificial intelligence and machine learning are the buzzwords of the day. Their applicability is wide, and they possess tremendous potential to unveil meaningful insights through analytics. Understanding the data, its patterns, and its relationships is critical before selecting, developing, or using any AI/ML approach. The fundamental requirement, therefore, is to understand the data and to communicate effectively through it. Predictive maintenance is one of the important applications of data analytics and is becoming a popular tool for preventive mechanisms. This book also presents prominent use cases from mechanical engineering that use a Predictive Maintenance System (PMS), along with its benefits.
In the current era of digitization and automation, smart computing is playing a crucial role. An increasing number of companies are becoming more reliant on technology and are converting all their operations to be artificial intelligence (AI) enabled, and this AI revolution is bringing big changes to the labor market. As a consequence, these advancements in operations and technologies are having a significant impact on human lives and livelihoods. The use of AI for predictive analytics and smart computing has become an integral component of the use cases surrounding us. This is giving rise to the fourth industrial revolution and, with it, Maintenance 4.0. The current trend in Maintenance 4.0 leans toward preventive mechanisms enabled by a predictive approach and condition-based smart maintenance. Intelligent decision support, earlier detection of spare part failure, and fatigue detection are the main elements of an intelligent PMS leading toward Maintenance 4.0.
This book focuses on the key components required for building a predictive maintenance model. It also presents prominent use cases from mechanical engineering using PMS, along
with the benefits. A basic understanding of data preparation is required to develop any AI application; in view of this, the types of data, the data preparation process, and the relevant tools are also presented in this book.
1. Helps the reader understand the data analytics techniques required for prediction.
2. Helps the reader understand how different predictive analytics techniques are implemented in the mechanical engineering domain.
3. Helps the reader understand data pre-processing techniques and tools.
4. Helps the reader understand the techniques and types of data analytics.
5. Helps the reader understand the role of PMS in the mechanical design system and manufacturing domains.
6. Presents case studies for building a PMS.
7. Offers a concise and crisp book for the novice reader, from the introduction to building a basic PMS.
In a nutshell, this book puts forward the best research roadmaps, strategies, and challenges for designing and developing a PMS. It motivates the use of technology for better analysis, serving users from laypersons to educated users, in designing various use cases in mechanical engineering, particularly in the context of PMS. The book also contributes to social responsibility by devising different ways to cater to the requirements of government and manufacturing communities. The book is useful for undergraduates, postgraduates, industry professionals, researchers, and research scholars in ICT, and we are sure that it will be well received by all stakeholders.
Pune, India
Parikshit N. Mahalle Pravin P. Hujare Gitanjali Rahul Shinde
Contents
1 Introduction to Predictive Analytics
  1.1 Data Analytics
  1.2 Type of Data Analytics
    1.2.1 Descriptive Analytics
    1.2.2 Diagnostic Analytics
    1.2.3 Predictive Analytics
    1.2.4 Prescriptive Analytics
  1.3 Techniques of Data Analytics
  1.4 Predictive Analytics and Data Science
    1.4.1 Predictive Analytics Process
  1.5 Applications of Predictive Analytics
  1.6 Summary
  References
2 Data Acquisition and Preparation
  2.1 Exploratory Data Analysis
  2.2 Types of Dataset
    2.2.1 Categorization Based on Data Formats
    2.2.2 Categorization Based on Data Source
    2.2.3 Categorization Based on Perspective of Data Generation
  2.3 Data Preprocessing
    2.3.1 Data Cleaning
    2.3.2 Data Transformation
    2.3.3 Data Reduction
  2.4 Tools
  2.5 Summary
  References
3 Intelligent Approaches
  3.1 Overview
  3.2 Conventional Learning Techniques
  3.3 Deep Learning
  3.4 Deep Neural Networks
  3.5 Applications
  3.6 Popular Techniques
  3.7 Summary
  References
4 Predictive Maintenance
  4.1 Overview
  4.2 Predictive Maintenance and Machine Learning
  4.3 Predictive Maintenance Model
  4.4 Implementation of Predictive Maintenance
  4.5 Summary
  References
5 Predictive Maintenance for Mechanical Design System
  5.1 Overview
  5.2 Design Issues
    5.2.1 Perception of Diagnostics and Prognostics
    5.2.2 ML Algorithms for Fault Detection
    5.2.3 Machine Learning-Based Bearing and Gear Health Indicators
    5.2.4 AI for Condition-Based Monitoring and Fault Detection Diagnosis
  5.3 Case Study
    5.3.1 Predictive Analysis for the Useful Life of Bearing
    5.3.2 Predictive Analysis for Gear Tooth Failure
  5.4 Summary
  References
6 Predictive Maintenance for Manufacturing
  6.1 Overview
  6.2 Design Issues
  6.3 Case Studies
    6.3.1 Predictive Analysis for Sound Absorption of Acoustic Material
  6.4 Summary
  References
7 Conclusions
  7.1 Summary
  7.2 Research Opening
  7.3 Future Outlook
About the Authors
Dr. Parikshit N. Mahalle is a senior member of IEEE and is Dean of Research and Development, professor, and head of the Department of Artificial Intelligence and Data Science at Vishwakarma Institute of Information Technology, Pune, India. He has 23 years of teaching and research experience. He completed his Ph.D. at Aalborg University, Denmark, and continued as a postdoctoral researcher at CMI, Copenhagen, Denmark. His research interests are machine learning, data science, algorithms, Internet of things, identity management, and security. He is guiding eight Ph.D. students in the area of IoT and machine learning, and recently, six students have successfully defended their Ph.D. under his supervision. He is also the recipient of the “Best Faculty Award” by Sinhgad Institutes and Cognizant Technologies Solutions. He has delivered 200 plus lectures at national and international level and authored 54 books.
Dr. Pravin P. Hujare is an associate professor in the Department of Mechanical Engineering, Vishwakarma Institute of Information Technology, Pune, India. He holds a Ph.D. in Noise and Vibration Control from COEP, Savitribai Phule Pune University, Pune. He has 24+ years of teaching and research experience. He obtained his M.E. (Mechanical Engineering) degree from the University of Mumbai. He has 15 patents and 35+ research publications in national and international conferences and journals. He has received research funding for the project “Investigations on Damping Performance of Segmented Passive Constrained Layer Damping Treatment” from SPPU, Pune. He has presented a research article at the SAE Noise and Vibration International Conference, Grand Rapids, USA. His research interests are machine design, mechanical vibration and noise control, machine learning, and data analytics.
Dr. Gitanjali Rahul Shinde has 14 years of overall experience and is presently working as an associate professor in the Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India. She holds a Ph.D. in Wireless Communication from CMI, Aalborg University, Copenhagen, Denmark, on the research problem statement “Cluster Framework for Internet of People, Things and Services”; the Ph.D. was awarded on May 8, 2018. She obtained her M.E. (Computer Engineering) degree from the University of Pune, Pune, in 2012 and her B.E. (Computer Engineering) degree from the University of Pune, Pune, in 2006. She has received research funding for the project “Lightweight group authentication for IoT” from SPPU, Pune. She has presented a research article at the World Wireless Research Forum (WWRF) meeting, Beijing, China. She has published 50+ papers in national and international conferences and journals.
Chapter 1
Introduction to Predictive Analytics
1.1 Data Analytics
Data has always been an important part of any organization. To get value from data, whether it is generated by giant corporations or by a single person, every facet of it must be studied. But how do we go about it? This is where data analytics comes into play. Data analytics plays a crucial part in improving a company: it is used to uncover hidden insights, generate reports, conduct market analysis, and refine business requirements.
Data analytics is broadly defined as the process of analyzing raw data to find patterns and answer questions. However, it comprises a variety of techniques with a variety of goals. It is a general term for the use of quantitative methods and specialized knowledge to extract meaning from data and answer crucial questions about a company, the environment, healthcare, science, and other fields of interest [1–3]. A few basic functions of data analytics are as follows.
• Gather hidden insights: Data are mined for hidden insights, which are then examined from the perspective of business needs.
• Generate reports: Reports are generated from the data and distributed to the appropriate teams and individuals, who act on them to drive business growth.
• Perform a market analysis: A market analysis can be done to determine the strengths and weaknesses of competitors.
• Enhance business requirements: Data analysis helps improve the business-to-customer experience and requirements.
By combining these elements, data analytics gives you a clear picture of where you are, where you have been, and where you should go. Descriptive analytics is usually the first step in this process: it describes trends in historical data. The goal of descriptive analytics is to explain what occurred. This frequently entails calculating conventional indicators such as return on investment (ROI). Each industry employs its own set of indicators. Descriptive analytics does not make predictions or directly support decision-making; it emphasizes providing a meaningful, descriptive summary of the data.
Advanced analytics is the next crucial component of data analytics. This area of data science uses advanced methods to extract data, anticipate the future, and identify trends. These methods include both machine learning and traditional statistics. Machine learning technologies such as neural networks, natural language processing (NLP), and sentiment analysis make advanced analytics possible. This data-driven knowledge offers fresh insights into the data, and advanced analytics addresses what-if scenarios. The availability of machine learning techniques, large data sets, and inexpensive computing power has made it possible to apply these techniques in numerous industries, and the collection of massive data sets is what makes them feasible. Thanks to advancements in parallel processing and low-cost computing power, big data analytics allows businesses to derive valuable insights from complex and varied data sources [4, 5].
1.2 Type of Data Analytics
Data analytics is divided into four types based on the insights drawn from the data, as shown in Fig. 2.1.
1.2.1 Descriptive Analytics
The goal of descriptive data analytics is to provide a clear picture of the current situation using existing raw data, i.e., to answer “What is happening?” For instance, a company’s monthly profit and loss statements can be used to learn more about its performance [6]. Many metrics and measurements about the company can also be collated to provide a comprehensive picture of its strengths and weaknesses. Another benefit of descriptive analytics is that it presents findings for further investigation. For example, the percentage of customers in a specific age group can be determined by statistically analyzing customers’ demographic information, and data on sales and prices can be aggregated and compared over time or between departments. Data mining and data aggregation are two of the methods employed in this process.
Data summarization is a component of descriptive analytics; it makes data easier to understand. Descriptive analytics is also used during the data cleaning step to help detect faulty or unusable data. It analyzes data to establish its average or its distribution as a percentage, using basic mathematical operations such as sum, mean, median, proportions, and percentage values. In short, descriptive analytics describes what is happening.
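The basic operations named above (sum, mean, median, and percentage shares) can be sketched in a few lines of Python. This is only an illustrative sketch: the profit figures, the age groups, and the helper names `summarize` and `percentage_by_group` are hypothetical and do not come from the book.

```python
from statistics import mean, median

# Hypothetical monthly profit figures (in thousands), for illustration only.
monthly_profit = [12.0, 15.5, 9.8, 14.2, 15.5, 11.0]

# Hypothetical customer age groups, for illustration only.
customer_age_groups = ["18-25", "26-40", "26-40", "41-60", "18-25", "26-40"]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "sum": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def percentage_by_group(labels):
    """Return the percentage share of each distinct label."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return {label: 100.0 * count / total for label, count in counts.items()}


profit_summary = summarize(monthly_profit)
age_share = percentage_by_group(customer_age_groups)
print(profit_summary)
print(age_share)
```

Such a summary only describes the data; it makes no statement about why the values look the way they do or what they will be next month.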
1.2.2 Diagnostic Analytics
After “What is happening?”, it is important to know “Why is it happening?” Diagnostic data analytics aids this process of reasoning. To determine the cause of a situation, analysts read, scan, filter, and extract pertinent data. As the name implies, diagnostic analytics focuses on dissecting the information at hand and pinpointing the reasons behind particular issues, occurrences, and behaviors. For instance, a large firm might want to learn more about its difficult workforce issues. Using data analytics, managers can look up and create snapshots of personnel working across many locations and divisions. They can also compare and filter metrics related to employees’ performance, tenure, and succession at work [7].
Diagnostic analysis provides an explanation for the situation: it lets you understand why something is occurring rather than just what is happening. Combined with descriptive analysis, it makes for a fairly effective analytic pairing. Diagnostic analysis draws on a range of analytical techniques to determine the causes of a specific statistical relationship. Interactive BI dashboards are very helpful in this regard for identifying the root causes of issues. Correlation analysis, data mining, data discovery, and drill-down are a few of the frequently used approaches in diagnostic analysis.
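Two of the approaches mentioned above, correlation and drill-down, can be illustrated with a minimal sketch. The personnel records, field layout, and function names below are hypothetical; the Pearson coefficient is used here as one common correlation measure, not as a method prescribed by the book.

```python
from math import sqrt
from statistics import mean

# Hypothetical records: (department, tenure in years, performance score).
records = [
    ("assembly", 1.0, 55.0),
    ("assembly", 2.0, 60.0),
    ("assembly", 4.0, 72.0),
    ("design", 3.0, 70.0),
    ("design", 5.0, 78.0),
    ("design", 8.0, 90.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drill_down(rows, department):
    """Filter the records down to a single department."""
    return [row for row in rows if row[0] == department]


tenure = [row[1] for row in records]
score = [row[2] for row in records]
overall_r = pearson(tenure, score)

assembly = drill_down(records, "assembly")
assembly_mean_score = mean(row[2] for row in assembly)
print(round(overall_r, 3), round(assembly_mean_score, 2))
```

A strong correlation points to a relationship worth investigating; it does not by itself establish the cause, which is why drill-down into subgroups usually follows.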
1.2.3 Predictive Analytics
One of the most fascinating subsets of data analytics is predictive analytics. It helps us learn about the future. There is much uncertainty in the world, and we may never be entirely certain of what will occur. However, we can make better choices if we try to forecast what will happen. With predictive data analytics, we can estimate the likelihood of an occurrence, a potential timeframe, or the magnitude of a future change. To predict the future, it examines facts from the past and the present. Will there be an uptick or decline in sales? What will the financial situation look like in 2025? Analysts want their projections to be as accurate as they can possibly be. Data modeling and machine learning are among the methods that are becoming increasingly popular in this field [6].
To forecast future behavior, predictive analysis makes use of historical data, patterns, and relationships. To make a prediction, it compares models built on historical data with models built on current data. If patterns from the current and historical models are closely parallel, future data is likely to follow a pattern similar to the data within the time period of the historical model. Predictive models are built using statistical algorithms and machine learning, a form of automated model building that looks for patterns in the data it is fed. Predictive analysis does not assess whether an event will actually happen; it merely determines how likely it is to occur. The question “What are the chances that this will happen?” can be answered using predictive analytics.
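As a minimal sketch of forecasting from historical data, the sales question above can be approached by fitting a straight-line trend with least squares and extrapolating one step ahead. The sales figures and the function names `fit_line` and `forecast` are hypothetical, and a linear trend is only one simple modeling assumption among many.

```python
from statistics import mean

# Hypothetical quarterly sales figures, oldest first (illustration only).
sales = [100.0, 104.0, 110.0, 113.0, 120.0]


def fit_line(ys):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    ts = list(range(len(ys)))
    t_mean, y_mean = mean(ts), mean(ys)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    denominator = sum((t - t_mean) ** 2 for t in ts)
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(ys, steps_ahead=1):
    """Extrapolate the fitted trend line beyond the last observation."""
    intercept, slope = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


next_quarter = forecast(sales)
trend = "uptick" if next_quarter > sales[-1] else "decline"
print(round(next_quarter, 1), trend)
```

The forecast is an estimate based on the assumption that the historical pattern continues; it does not guarantee the outcome.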
1.2.4 Prescriptive Analytics
Whereas predictive analytics focuses on forecasts, prescriptive analytics uses those predictions to create value. It gives us the key to the future by recommending the optimal course of action among the available options. At this level, analytics develops a potential solution to a problem using the insights from the previous three stages. It also involves comparing the alternatives and choosing the best recommendation for the particular circumstance, rather than merely picking one. For instance, a mobile traffic app can help you determine the most efficient route from your current position to your home. The app calculates the quickest or most efficient route by taking into account the distance, speed, and any traffic jams. Another illustration is a consulting firm that uses data analytics to recommend promising areas in which to launch a new product.
Prescriptive analytics is the final stage and supports the quantitative analysis of decision-making. Here, the emphasis is on obtaining practical insights into the anticipated effects of hypothetical future actions. Those making use of the results can plan their future moves more precisely using the options generated by prescriptive analytic models. The question “What is the best course of action?” is at the center of prescriptive analytics [8].
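The traffic app example can be reduced to a small sketch: estimate the travel time of each candidate route and recommend the one with the lowest estimate. The route names, distances, speeds, and delays are hypothetical, and the simple time formula is an assumption made for illustration, not how any particular app works.

```python
# Hypothetical candidate routes: distance in km, expected average speed in km/h,
# and expected delay from traffic jams in minutes.
routes = {
    "highway": {"distance_km": 18.0, "speed_kmh": 60.0, "delay_min": 12.0},
    "city": {"distance_km": 12.0, "speed_kmh": 30.0, "delay_min": 5.0},
    "ring_road": {"distance_km": 22.0, "speed_kmh": 80.0, "delay_min": 2.0},
}


def travel_time_min(route):
    """Estimated travel time in minutes: driving time plus expected delay."""
    return 60.0 * route["distance_km"] / route["speed_kmh"] + route["delay_min"]


def recommend(options):
    """Compare all options and return the name of the fastest one with all estimates."""
    times = {name: travel_time_min(route) for name, route in options.items()}
    best = min(times, key=times.get)
    return best, times


best_route, estimated_times = recommend(routes)
print(best_route, estimated_times)
```

The point of the sketch is the structure: predictions (estimated times) feed a comparison step that produces a recommended action.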
1.3 Techniques of Data Analytics
Data science (DS) and business intelligence (BI) are two approaches to drawing insights from data. Data collection has been around for a while. In fact, forward-thinking businesses began gathering data before they had a plan for it. Even though they were still learning what to do with it, they understood that data had immense value. The question now is how to use such data to gain important business insights, and professionals in business intelligence and data science are concentrating their efforts on how to utilize all of it.
The complexity of data is growing along with its volume, velocity, and variety. New data sources, both structured and unstructured, from the cloud and SaaS applications must be connected with existing on-premises data warehouses. Faster ingestion and processing are necessary because decisions must be made in the moment. To solve these problems, data science and business intelligence must work together. To integrate the two efficiently, you need technologies that can handle both while seamlessly using the same data.
The most straightforward way to distinguish between the two is to see data science as the future and business intelligence as the past and present. While BI focuses on descriptive analysis, data science deals with predictive and prescriptive analysis [9].
• Business intelligence
Business intelligence (BI) is the process of descriptively analyzing data, with the aid of technology and expertise, in order to make sound business decisions. A suite of BI tools is used to gather, manage, and manipulate data. By enabling data sharing between internal and external stakeholders, BI enhances decision-making. BI’s objective is to produce useful intelligence from data. Among the activities that BI makes possible are:
  • improving one’s knowledge of the market
  • identifying fresh business possibilities
  • enhancing business operations
  • keeping a step ahead of rivals
The cloud has been the most significant BI enabler in recent years. Thanks to cloud computing, more data from more sources can now be processed more effectively than was ever possible before.
• Data science
Data science is an interdisciplinary field that focuses on using data to derive important insights for the future. It draws on mathematics, computer science, statistics, and subject-matter knowledge of whatever is being analyzed. Most frequently, the purpose of data science is to answer hypothetical “what if” questions [10].
| Parameter | Business intelligence | Data science |
| --- | --- | --- |
| Type of analysis | Descriptive, diagnostic | Predictive, prescriptive |
| Perspective | Looks backward, based on real data from real events | Looks forward, interpreting the information to predict what might happen in the future |
| Focus | Reports, KPIs, and trends | Patterns |
| Process | Static and comparative | Dynamic |
| Data sources | Pre-planned and slowly added data | Real-time data |
| Transform | Answers a set of predefined questions | Helps to discover new questions from data |
| Results | Provides a single form of the facts | Provides multiple forms of the facts, e.g., accuracy, precision, and confidence level |
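The last row of the comparison mentions accuracy and precision as forms in which data science reports its results. As a minimal sketch under the usual binary-classification definitions, the two measures can be computed as follows; the label lists and the function name are hypothetical.

```python
# Hypothetical outcomes: 1 = event occurred, 0 = event did not occur.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]


def accuracy_and_precision(actual_labels, predicted_labels):
    """Return (accuracy, precision) for binary labels.

    accuracy  = correct predictions / all predictions
    precision = true positives / all predicted positives
    """
    pairs = list(zip(actual_labels, predicted_labels))
    correct = sum(1 for a, p in pairs if a == p)
    true_positives = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)
    accuracy = correct / len(pairs)
    predicted_positives = true_positives + false_positives
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    return accuracy, precision


accuracy, precision = accuracy_and_precision(actual, predicted)
print(accuracy, precision)
```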
1.4 Predictive Analytics and Data Science
Predictive analytics is a key discipline within data science. To forecast activity, behavior, and trends, predictive analytics uses both current and historical data. Statistical analysis methods, data queries, and machine learning algorithms are applied to data sets to develop predictive models that assign a score or numerical value to the likelihood that a specific action or event will occur. Predictive analytics is more important than ever. Data is essential to business analytics and is increasingly the driving force behind commerce. Data generated and gathered from internal operations and outside sources powers all businesses, large and small. For instance, businesses gather information on each stage of the purchasing process, keeping track of the products, services, quantities purchased, and frequency of purchases. They keep tabs on client churn, complaints, late payments, credit defaults, and fraud. Yet the enormous amounts of data that organizations gather about their clients, operations, suppliers, staff performance, and so forth are useless unless they are put to use.
Predictive analytics requires high-level statistical proficiency and the ability to build predictive models. It is normally the purview of data scientists, statisticians, and other trained data analysts. Data engineers support them by collecting pertinent data and preparing it for analysis, while BI developers and business analysts support them by creating dashboards, reports, and data visualizations.
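The idea of a model that assigns a score to the likelihood of an event, described earlier in this section, can be sketched with a tiny logistic regression trained by stochastic gradient descent. Everything here is an illustrative assumption: the customer data, the two features (complaints and late payments), the learning rate, and the function names are hypothetical, and logistic regression is just one of many possible scoring models.

```python
from math import exp

# Hypothetical training data:
# (number of complaints, number of late payments) -> churned (1) or stayed (0).
customers = [
    ((0.0, 0.0), 0),
    ((1.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((3.0, 2.0), 1),
    ((4.0, 1.0), 1),
    ((2.0, 3.0), 1),
]


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def train(data, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def churn_score(features, weights, bias):
    """Return the estimated likelihood (0 to 1) that a customer churns."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


weights, bias = train(customers)
high_risk = churn_score((4.0, 3.0), weights, bias)
low_risk = churn_score((0.0, 0.0), weights, bias)
print(round(high_risk, 3), round(low_risk, 3))
```

The output is a score rather than a verdict: a customer with many complaints and late payments receives a high churn likelihood, but the model does not state that the customer will leave.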
1.4.1 Predictive Analytics Process
The predictive analytics process is broken down into five distinct steps: planning, collecting data, data analysis and statistical analysis, building the model, and monitoring the model.
• Plan: The first stage is deciding what you are aiming to accomplish, because it is difficult to do anything without a strategy. Develop your questions at this point and determine what you are attempting to forecast. Decide in advance which techniques you will employ and how. Define what constitutes high-quality data: How will it be measured? What kind of information do you need? Which data sources will be used? You often need to refine your questions after a preliminary data exploration, so you might incorporate an initial exploratory data analysis (EDA) phase into your predictive model building process. Once you have finished the EDA, you must still have a plan for your goals.
• Collect: The period during which you collect your data is known as the gathering phase. Here is where all of your preparation pays off. Whether the data is internal corporate data, survey data, or data received from outside sources, use your intended procedures to extract it. Knowing the type of data you will be concentrating on is also crucial. Are you seeking quality, quantity, or a combination of the two? Are you gathering data that is quantitative, qualitative, or both? Make sure you are looking at the proper type of data in order to avoid setbacks.
• Data Analysis and Statistical Analysis: After gathering data, you must filter through it to identify the information that will be most helpful to you. You will clean the data and perform a quality check in this phase. Data cleaning and quality control are crucial for several reasons. Do you have any null or missing values? How will you handle these values? There are numerous methods. Data relevance is checked as part of cleaning. Checking for duplicate data, corrupted data, inaccurate data, and improper data formatting are all important. If at all feasible, take the time to correct your data. If not, weigh the risk of removing incorrect or irrelevant data against the risk of letting it affect your outcome. Statistical analysis entails the use of proper statistical techniques for the quantitative testing and validation of hypotheses.
• Build the Model: You should now be able to state with confidence which predictive model best accounts for the multiple impact factors you have discovered and examined. You will construct, test, and validate your predictive model during the model building process.
Classification, clustering, and time series are just a few instances of potential predictive models that might be used. Predictive analytics uses machine learning techniques like Logistic Regression, K-Nearest Neighbors (KNN), and Random Forest.
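The cleaning and model-building steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sensor readings, the column meanings, and the helper names `clean` and `knn_predict` are hypothetical, median imputation is a deliberately naive way to fill missing values, and the hold-out split is far too small for a real validation.

```python
import math
from collections import Counter
from statistics import median

# Hypothetical rows: (vibration_mm_s, temperature_C, failed_within_week).
# None marks a missing value; the last row duplicates the second one.
raw = [
    (1.0, 60.0, 0), (1.2, 62.0, 0), (1.1, None, 0), (1.3, 61.0, 0),
    (4.8, 90.0, 1), (5.1, 92.0, 1), (None, 91.0, 1), (5.0, 89.0, 1),
    (1.2, 62.0, 0),
]

def clean(rows):
    """Drop exact duplicates, then fill missing features with the column median."""
    unique = list(dict.fromkeys(rows))  # keeps the first occurrence, in order
    n_features = len(unique[0]) - 1
    medians = [
        median(r[j] for r in unique if r[j] is not None)
        for j in range(n_features)
    ]
    return [
        tuple(medians[j] if r[j] is None else r[j] for j in range(n_features))
        + (r[-1],)
        for r in unique
    ]

def knn_predict(train, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda r: math.dist(r[:-1], x))[:k]
    return Counter(r[-1] for r in nearest).most_common(1)[0][0]

rows = clean(raw)

# Simple hold-out validation: every 4th row is kept aside for testing.
held_out = rows[::4]
train = [r for i, r in enumerate(rows) if i % 4 != 0]
accuracy = sum(knn_predict(train, r[:-1]) == r[-1] for r in held_out) / len(held_out)
print(f"rows after cleaning: {len(rows)}, hold-out accuracy: {accuracy:.2f}")
```

In practice the features would usually be scaled before computing distances, and a library implementation with cross-validation would replace the hand-written classifier; the sketch only shows how cleaning, building, and validating fit together.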
1.5 Applications of Predictive Analytics
• Marketing: Predictive analytics in marketing has revolutionized how businesses engage with customers and close deals. There are many use cases for predictive analytics, including next best action, lead qualification, proactive churn management, demand forecasting, and “data-driven creatives”, that is, the use of predictive
analytics to determine what media style and form of messaging will resonate best with specific customers [11].
• Reducing Risk: Consider a straightforward example of lowering risk: credit scores, which are frequently used to estimate a customer’s likelihood of default from purchasing behavior. In reality, a person’s credit score is the result of a predictive model’s analysis of data related to their credit history. Insurance claims and fraud claim collections are two further types of risk [12].
• Healthcare: The use of predictive analytics in healthcare is challenging but is expected to grow, as it may help to save lives by predicting disease at an early stage. Predictive analytics is useful for managing supply chains, identifying patients at high risk of readmission to the hospital, and other applications in the field of health administration [13].
– Medical professionals can use predictive analytics to analyze data on global disease statistics, drug interactions, and patient diagnostic histories in order to deliver specialized treatment and carry out more efficient medical procedures.
– By applying predictive analytics to clinics’ historical appointment data, it is possible to predict no-shows and cancellation delays more precisely, which saves time and money.
– The health insurance sector uses predictive analytics to identify patients who are most at risk of developing chronic or incurable diseases, which aids businesses in developing effective therapies.
• Retail: Predictive analytics is crucial because every retailer, whether operating online or in physical stores, wants to manage inventory and operations. In order to improve operations and efficiency, the method enables merchants to connect vast amounts of data, such as historical sales data, consumer behavior and product preferences, and geographic references [14].
– Through predictive analytics and better targeting built on real-time data, retailers can use customer sales data to offer customized recommendations and promotions for specific customers. This helps merchants plan campaigns and create ads and promotions that will be well received by consumers.
– Predictive analytics of sales and logistics data helps merchants ensure the timely availability of high-quality merchandise in stores and adequate inventory in warehouses.
– Timing sales and promotions has become an art; using predictive analytics on previous sales, inventory, and consumer data, one may determine the best conditions and times to lower or raise prices.
– Retailers can analyze the effects of promotional events and determine the best offers for customers using predictive analytics for item planning and price optimization.
• Managing the supply chain: Better statistical models and predictions in supply chain management are now more important than ever as a result of the COVID-19 epidemic. According to research analyst Alexander Wurm, the epidemic compelled businesses to update their processes with real-time data and third-party
information while throwing “previous data out the window.” Predictive analytics becomes more relevant in fast-changing contexts because, for instance, IoT-generated real-time data informs businesses of products that have gone bad or are otherwise damaged [15].
• Fraud Detection: According to the most recent study on global crime, fraud rates are at all-time highs and have cost businesses around the world $42 billion in the last two years. Because many businesses have only small teams of investigators, detecting fraud requires the use of predictive technologies. Hundreds of thousands of insurance claims are scanned using predictive analytics, and only those that are most likely to be fraudulent are forwarded to investigative teams. Retailers also use it to verify consumer identities during the login process and to watch for questionable activity [16]. Predictive analytics can be useful in
– preventing credit card fraud by flagging atypical transactions,
– deciding whether to approve or deny loan applications based on credit score,
– analyzing customer churn data so that banks can contact at-risk clients before they are likely to leave,
– calculating credit risk, making the most of cross-sell and up-sell opportunities, and retaining loyal clients.
For example, Commonwealth Bank uses predictive analytics to detect fraudulent activity within 40 ms of a transaction being initiated, before the transaction is carried out.
• Predictive maintenance and monitoring: IoT data is used in predictive modeling to predict equipment breakdowns. Manufacturers install sensors on mechatronic products such as automobiles and factory floor machinery. The sensor data is then used to forecast when maintenance and repair work should be performed in order to avert failures. Monitoring of drilling rigs, wind farms, oil and gas pipelines, and other industrial IoT installations also uses predictive analytics.
Another IoT-driven predictive modeling application is localized weather forecasts for farmers, which are partially based on information gathered from weather data stations with sensors put in farm fields [17].
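To make the predictive maintenance idea above concrete, the following sketch fits a straight-line trend to a short series of sensor readings and extrapolates it to estimate when an alarm level would be reached. It is a simplified illustration, not a method taken from the text: the readings, the `ALARM_LEVEL` threshold, and the helper names are all hypothetical, and real degradation is rarely linear.

```python
# Hypothetical daily vibration readings (mm/s) from one bearing sensor.
readings = [2.0, 2.1, 2.3, 2.4, 2.6, 2.7, 2.9, 3.0]
ALARM_LEVEL = 4.5  # assumed maintenance threshold, chosen for illustration

def fit_line(ys):
    """Least-squares fit of y = slope * day + intercept for day = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_alarm(ys, level):
    """Extrapolate the fitted trend; return days after the last reading, or None."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None  # no upward trend, so no crossing is predicted
    crossing_day = (level - intercept) / slope
    return max(0.0, crossing_day - (len(ys) - 1))

slope, intercept = fit_line(readings)
remaining = days_until_alarm(readings, ALARM_LEVEL)
print(f"trend: {slope:.3f} mm/s per day; alarm level expected in about {remaining:.1f} days")
```

A maintenance planner could schedule an inspection before the estimated crossing; with noisy data, a confidence band around the trend would be needed before acting on such an estimate.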
1.6 Summary
Data analytics plays a vital role in today’s business era because it helps to understand every facet of data. Data analytics is classified into four categories: descriptive, diagnostic, predictive, and prescriptive analytics. Descriptive analytics explains what is in the data, whereas diagnostic analytics explains why it is happening. Predictive analytics helps to predict future events from past data patterns. In this chapter, the process and applications of predictive analytics are discussed in detail.
References
1. Elgendy N, Elragal A (2014) Big data analytics: a literature review paper. In: Industrial conference on data mining, pp 214–227. Springer, Cham
2. Saranya P, Asha P (2019) Survey on big data analytics in health care. In: 2019 International conference on smart systems and inventive technology (ICSSIT), pp 46–51. IEEE
3. Maganathan T, Senthilkumar S, Balakrishnan V (2020) Machine learning and data analytics for environmental science: a review, prospects and challenges. In: IOP conference series: materials science and engineering, vol 955, no 1, p 012107. IOP Publishing
4. Lazarova-Molnar S, Mohamed N, Al-Jaroodi J (2018) Collaborative data analytics for industry 4.0: challenges, opportunities and models. In: 2018 sixth international conference on enterprise systems (ES), pp 100–107. IEEE
5. Sharma A, Pandey H (2020) Big data and analytics in industry 4.0. In: A Roadmap to Industry 4.0: smart production, sharp business and sustainable development, pp 57–72. Springer, Cham
6. Williams G (2011) Descriptive and predictive analytics. In: Data Mining with rattle and R, pp 171–177. Springer, New York, NY
7. Reddicharla N, Ali MA, Cornwall R, Shah A, Soni S, Isambertt J, Sabat S (2019) Next-generation data-driven analytics-leveraging diagnostic analytics in model based production workflows. In: SPE middle east oil and gas show and conference. OnePetro
8. Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Manage Sci 66(3):1025–1044
9. Wazurkar P, Bhadoria RS, Bajpai D (2017) Predictive analytics in data science for business intelligence solutions. In: 2017 7th international conference on communication systems and network technologies (CSNT), pp 367–370. IEEE
10. Provost F, Fawcett T (2013) Data science and its relationship to big data and data-driven decision making. Big data 1(1):51–59
11. Balusamy B, Jha P, Arasi T, Velu M (2017) Predictive analysis for digital marketing using big data: big data for predictive analysis. In: Handbook of research on advanced data mining techniques and applications for business intelligence, pp 259–283. IGI Global
12. Torvekar N, Game PS (2019) Predictive analysis of credit score for credit card defaulters. Int J Recent Technol Eng 7(1):4
13. Gonçalves F, Pereira R, Ferreira J, Vasconcelos JB, Melo F, Velez I (2018) Predictive analysis in healthcare: Emergency wait time prediction. In: International symposium on ambient intelligence, pp 138–145. Springer, Cham
14. Belarbi H, Tajmouati A, Bennis H, Tirari MEH (2016) Predictive analysis of Big Data in Retail industry. In: Proceedings of the international conference on computing wireless and communication systems
15. Stefanovic N (2014) Proactive supply chain performance management with predictive analytics. Scientific World J
16. Tiwari P, Mehta S, Sakhuja N, Kumar J, Singh AK (2021) Credit card fraud detection using machine learning: a study. arXiv preprint arXiv:2108.10005
17. Fu C, Ye L, Liu Y, Yu R, Iung B, Cheng Y, Zeng Y (2004) Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Trans Energy Convers 19(1):179–186
Chapter 2
Data Acquisition and Preparation
2.1 Exploratory Data Analysis
Exploratory Data Analysis (EDA) is the process of understanding, interpreting, and analyzing data. The data can be examined in various ways, for example statistically or graphically. In order to establish a foundation for further analysis, EDA seeks to spot patterns, anomalies, relationships, and other aspects of the data [1]. EDA is a crucial step in the data science process and can help improve the quality of the results obtained from statistical models.
Data preparation is an important step in data analysis; it includes data cleaning, data transformation, and data reduction. A dataset may contain noise and duplicates that need to be removed before analysis, as they might lead to wrong results. Similarly, features should be on the same scale, so various transformation methods are used. Data scientists use exploratory data analysis tools and techniques to explore, analyze, and summarize the key characteristics of datasets using data visualization methodologies. EDA requires a thorough understanding of the data; hence data scientists need to know the type of data, the formats in which data is stored, and the techniques of data cleaning, transformation, and reduction [2].
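A first statistical pass over a single numeric column might look like the sketch below. It summarizes the values, flags outliers with the interquartile-range rule, and rescales the remaining values to a common range. The temperature readings and helper names are hypothetical, and both the 1.5 × IQR rule and min-max scaling are common defaults chosen here for illustration rather than methods prescribed by the text.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical shaft-temperature readings (°C); 140.0 is a suspicious spike.
temps = [61.0, 63.5, 62.0, 64.0, 140.0, 62.5, 63.0, 61.5]

def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    return {"mean": mean(values), "median": median(values),
            "stdev": stdev(values), "q1": q1, "q3": q3}

def iqr_outliers(values, k=1.5):
    """Flag points lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]

def min_max_scale(values):
    """Rescale to [0, 1] so features in different units become comparable."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

stats = summarize(temps)
outliers = iqr_outliers(temps)
cleaned = [v for v in temps if v not in outliers]
scaled = min_max_scale(cleaned)

print(stats)
print("outliers:", outliers)
```

Comparing the mean with the median in `stats` already hints at the spike: a single extreme value pulls the mean upward while the median barely moves.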
2.2 Types of Dataset
Data analysis depends on the type of data, so understanding the data is an important step of data preparation. Many data cleaning and transformation methods are available, and suitable ones are chosen based on the type of data. There are several types of data in data science [3], including:
• Numerical Data: Data present in the form of numbers is called numerical data, e.g., the price of a house or an employee’s salary. It is subdivided into continuous and discrete data, and the analysis of continuous data differs from the analysis of discrete data.
• Categorical Data: In many applications, data represents various groups or categories; these are called the categories or labels of the data. Categorical data is subdivided into nominal and ordinal data. Nominal categories are used where the categories do not represent any ordering or grading of the data, e.g., the gender categories male, female, and transgender. Ordinal categories represent an ordering of the data, e.g., examination results labeled first class and second class.
• Text Data: Data present in the form of words and sentences is called text data. Many applications around us analyze text data, e.g., the Google search engine. This type of data is seen in the fields of natural language processing and text mining.
• Image Data: Image data is the input for computer vision applications. It contains pixel values representing color intensity. There are different types of images, such as binary, greyscale, and RGB. Handling image data includes various stages such as image acquisition, image enhancement, image compression, and image segmentation.
• Audio Data: In the digital era, people increasingly use digital assistants such as Alexa, Siri, and Google Assistant rather than traditional systems. These applications analyze audio data, which consists of sound waves. This type of data is seen in the fields of speech recognition and audio processing. Analyzing audio data requires knowledge of digital signal processing (DSP), including low-pass and high-pass filters.
• Video Data: Many public and private spaces are under CCTV surveillance for security and other purposes. The output of CCTV cameras is video data, which consists of multiple frames. This type of data is seen in the fields of video analysis and computer vision. Analyzing video requires knowledge of filtering, window size, and DSP.
• Time-Series Data: In the majority of applications, data is updated over time; such data is called time-series data, e.g., daily temperature, stock market prices, and the number of COVID-19 patients. Time-series data consists of observations taken at regular time intervals. This type of data is seen in the fields of trend analysis and forecasting.
The data types mentioned above are the ones most frequently used in data science. The analysis of the data will depend on the problem in question.
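The distinction between numerical, nominal, and ordinal data matters as soon as the data has to be fed to a model, because each type is encoded differently. The sketch below illustrates one common convention; the records, the `GRADE_RANK` mapping, and the `one_hot` helper are hypothetical examples, not part of the original text.

```python
# Hypothetical records mixing the data types described above.
records = [
    {"price": 250000.0, "rooms": 3, "city": "Pune",   "grade": "second class"},
    {"price": 410000.0, "rooms": 4, "city": "Mumbai", "grade": "first class"},
    {"price": 180000.0, "rooms": 2, "city": "Pune",   "grade": "first class"},
]

# Ordinal data: the order carries meaning, so map labels to ranked integers.
GRADE_RANK = {"second class": 1, "first class": 2}

def one_hot(value, categories):
    """Nominal data has no order, so give each category its own 0/1 column."""
    return [1 if value == c else 0 for c in categories]

cities = sorted({r["city"] for r in records})  # fixed column order for the one-hot part

# price is continuous, rooms is discrete; both are already numeric.
encoded = [
    [r["price"], r["rooms"], GRADE_RANK[r["grade"]], *one_hot(r["city"], cities)]
    for r in records
]

for row in encoded:
    print(row)
```

Encoding a nominal variable such as the city as a single integer would impose an ordering that does not exist in the data, which is why it gets separate indicator columns here.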
2.2.1 Categorization Based on Data Formats
Data can be subdivided based on its format [4], as follows:
A. Structured Data: Structured data is stored in a well-defined format such as spreadsheets, database files, or CSV files. As the name suggests, this type of data has a clear underlying structure in which each data item has its own place and meaning in the dataset. Structured data is comparatively easy to analyze and process: it can be accessed using query languages and sorted with a wide range of available tools. Examples of structured data include:
• Customer database data such as customer name, address, and shopping history.
• Financial sheet data such as sales and purchase data, daily expenses, and generated revenue.
• Inventory database data such as product details, available stock, and costing.
• Public data such as government statistics and demographic information.
Structured data is commonly used in many fields, such as business, finance, healthcare, and government. It also plays a vital role in data science, as it facilitates more detailed and advanced data analysis and modeling.
B. Unstructured Data: This type of data does not follow any structured or well-defined format. It is usually stored in the form of text documents, images, or audio and video files. As the name suggests, unstructured data does not follow any specific format or predefined field names, and it can combine text, images, and other types of data. Because it lacks structure, this type of data is more difficult to analyze, and special tools are required to extract information from it. Examples of unstructured data include:
• Email messages and attachments.
• Social media posts and comments.
• Customer reviews and feedback.
• News articles and blog posts.
• Images, audio, and video files.
As more and more data is generated in an unstructured format, unstructured data is becoming increasingly important in data science. It is seen in the fields of natural language processing, text mining, and computer vision, where the main task is to extract meaningful information from it. To gain a comprehensive understanding of an underlying problem, structured and unstructured data are sometimes used in combination.
C. Semi-Structured Data: Semi-structured data, such as emails or XML files, has a certain degree of organization, but not as much as fully structured data. It usually contains both structured and unstructured components and is usually saved in a format that enables some level of arrangement, such as JSON or XML. Because it is more organized than unstructured data but less organized than structured data, it is somewhat challenging to analyze and process, but still easier than unstructured data.
Examples of semi-structured data include:
• Emails have a specific format that includes headers, subject lines, and body text, but they can also have additional elements like attachments and images that are not structured.
• Social media posts have a set format for the post and its metadata, but they can also include unstructured data such as text, images, hashtags, and mentions of other users.
• Web pages follow a defined structure for HTML and CSS, but they can also have unstructured data like videos and images.
The significance of semi-structured data is growing in the field of data science, as it allows insights to be extracted from data that cannot be easily analyzed through conventional structured data methods. To analyze and process semi-structured data, data scientists often use a blend of structured data tools and unstructured data methods.
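The blend of structured and unstructured techniques mentioned above can be sketched as follows: semi-structured JSON records are flattened into fixed columns, loaded into a relational table, and then queried with SQL. The message contents, field names, and table layout are hypothetical, and reducing the free text to a word count is only a stand-in for real text processing.

```python
import json
import sqlite3

# Hypothetical semi-structured records: fixed keys plus free-form, optional parts.
raw = """
[
  {"id": 1, "from": "a@example.com", "subject": "Pump P-101 vibration",
   "body": "Readings look high since Monday.", "attachments": ["trend.png"]},
  {"id": 2, "from": "b@example.com", "subject": "Re: Pump P-101 vibration",
   "body": "Bearing replaced."}
]
"""

def flatten(message):
    """Keep the structured fields; reduce the unstructured parts to simple features."""
    return (
        message["id"],
        message["from"],
        message["subject"],
        len(message.get("body", "").split()),  # word count of the free text
        len(message.get("attachments", [])),   # a missing key counts as 0 attachments
    )

rows = [flatten(m) for m in json.loads(raw)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, sender TEXT, subject TEXT, "
    "body_words INTEGER, n_attachments INTEGER)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", rows)

# Once flattened, the data can be queried like any structured table.
with_files = conn.execute(
    "SELECT sender FROM messages WHERE n_attachments > 0"
).fetchall()
print(with_files)
```

The optional `attachments` key is what makes the input semi-structured: the flattening step has to decide what a missing field means before the data fits a fixed schema.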
2.2.2 Categorization Based on Data Source
Data can be subdivided based on the sources that generate it, as follows:
A. Internal Data: Data that is created and gathered within a company is referred to as internal data. It is mainly generated by regular business operations and may include employee data, customer information, and sales transactions. Internal data is usually stored in various systems and databases and is more structured than crowdsourced data. Examples of internal data include:
• Sales data, including product information, transaction logs, and customer information.
• Employee data, including designation, income, and other employee information.
• Inventory data, including stock levels, supplier information, and pricing data.
• Log files and system data, including network and server logs, software usage data, and system performance data.
Internal data provides significant insights into a company’s performance and operations, making it an essential resource for data science. With internal data, data scientists can build predictive models, recognize trends and patterns, and make informed decisions that can enhance business performance. However, because internal data may contain sensitive information, it is important to ensure that it is accurate, secure, and protected from unauthorized access.
B. External Data: Information that is gathered from sources outside of an organization, such as social media data, economic indicators, and public datasets, is known
as external data. This type of data is usually collected from government agencies, industry organizations, or other companies, and can offer a wider understanding of the market, industry trends, and customer behavior, which can complement the internal data of an organization. Examples of external data include:
• Market data, including economic indicators, consumer behavior, and demographic information.
• Competitor data, including product offerings, pricing, and market share.
• Social media data, such as reviews, reactions, and analysis of how customers feel toward a product or service.
• Weather information, including both past weather patterns and current conditions.
• Publicly available data, including government-provided data such as census or environmental data.
In data science, external data is useful for offering a broader view of a problem or event and can help verify or complement internal data. Nevertheless, it is crucial to evaluate the credibility and reliability of external data sources thoroughly, since they may be biased or inaccurate and may need extensive cleaning and preprocessing before use.
C. Crowdsourced Data: Data collected from a vast number of individuals, commonly through surveys, polls, and feedback forms, is known as crowdsourced data. This type of data is typically contributed by a large group of people through online platforms or mobile applications. Unlike traditional data sources, crowdsourced data is generated by individuals who may not have expertise in a particular field or a strong understanding of the data’s structure or format. Crowdsourced data is frequently used in fields such as marketing, urban planning, and disaster response, where it is crucial to gather data quickly and efficiently from a large group of people [5].
Examples of crowdsourced data include:
• Traffic data obtained from GPS-enabled mobile applications like Google Maps or Waze.
• Reviews and feedback provided by customers on online platforms like Amazon or TripAdvisor.
• Disaster-response data collected through social media or mobile apps during emergency situations.
• Responses to surveys and questionnaires distributed to a large number of people through online platforms.
Crowdsourced data is beneficial for data science, as it enables the gathering of real-time data from a large population, which can supplement or validate other sources of data. Nonetheless, processing and analyzing crowdsourced data
could be difficult due to its potential for inconsistency, bias, or errors, necessitating thorough cleaning and validation prior to use.
D. Public Data: Public data refers to information that is openly accessible and encompasses different types of data, including geographical data, government statistics, and weather data. This data is generated and collected by public entities, such as government agencies and non-profit organizations, and is usually provided to the general public either for free or for a small fee. Public data plays a crucial role in data science, serving as a valuable resource for research, analysis, and decision-making. It provides extensive information on various subjects, including demographics, health, the economy, and the environment, which can be used for solving problems, conducting experiments, and making informed choices. Nevertheless, one should be cautious about the credibility and accuracy of public data, as it may be incomplete, outdated, or biased. Moreover, public data may need considerable preparation and reorganization before it is suitable for use in data science applications.
Data science relies on various sources of data, and it is crucial to consider the quality and origin of the data, as they can greatly affect the outcome of a data analysis project.
2.2.3 Categorization Based on Perspective of Data Generation
Data can also be subdivided based on the way it is generated and the kind of information it provides.
A. Time-Series Dataset: A set of data that is collected at regular intervals over time is referred to as a time-series dataset. It provides information on a particular subject over time and allows changes in one or more variables to be tracked. The primary purpose of a time-series dataset is to analyze trends, patterns, and relationships between variables. This type of dataset is commonly used in fields such as finance, economics, and meteorology [6]. A time-series dataset consists of individual data points, each pairing a specific time or date with the corresponding value of a variable. The variables can include sales figures, stock prices, weather measurements, or other relevant characteristics. The data points are usually collected at regular intervals, such as daily, weekly, or monthly. Examples of time-series datasets include:
• Stock market data records the price of a stock at a particular date and time, along with its trading volume and other relevant features.
• Climate data records the environmental conditions at a specific date and time, such as temperature and precipitation, along with other relevant environmental features.
• Economic data records the value of a specific economic indicator, such as GDP or the unemployment rate, at a particular date, along with relevant economic details.
Time-series datasets are advantageous for data science because they give an overview of changes and patterns over a period of time and can assist in forecasting. Nonetheless, these datasets may contain seasonality, trend changes, and other confounding variables that can make interpretation and analysis difficult.
B. Experimental Dataset: A dataset that is produced through a controlled experiment or intervention is known as an experimental dataset and is primarily used to establish causality. This type of dataset is extensively employed in fields such as biology, psychology, and economics to examine cause-and-effect relationships and to test hypotheses. In an experimental design, individuals are randomly allocated to one or more groups, and each group is exposed to different conditions or treatments. The outcomes of the experiment are then measured and documented, resulting in a dataset that can be used to compare the results between groups and to draw inferences about the impact of the intervention. Examples of experimental datasets include:
• In medical trials, participants are randomly assigned to receive either a treatment or a placebo, and their progress is monitored over time to evaluate the effectiveness of the treatment.
• Marketing experiments involve randomly assigning subjects to various groups and exposing them to different marketing strategies or promotions to determine the most successful approach.
• Social science experiments randomly assign participants to different conditions, and the results are measured to analyze the effects of different factors on attitudes, behaviors, or other outcomes.
Experimental datasets are crucial in determining cause-and-effect relationships and are frequently used to make decisions in fields such as policy implementation, product development, and treatment selection. However, such datasets may contain selection bias, confounding factors, and other sources of error, which can make analysis and interpretation difficult.
C. Observational Dataset: An observational dataset is gathered through observation, without any manipulation of the subjects. This type of dataset is widely used in fields such as sociology, epidemiology, and marketing to investigate the links between variables and to draw conclusions about populations. During an observational study, data is collected from subjects without any intervention or manipulation, and the outcomes are measured and recorded. The primary aim
of an observational study is to understand the relationship between variables and to draw conclusions about the population based on the gathered data. Examples of observational datasets include:
• Surveys, in which data is collected from individuals by means of questionnaires or interviews.
• Naturalistic observation studies, in which data is gathered by observing behavior in real-world scenarios.
• Epidemiological studies, which collect data on health outcomes and risk factors in populations over a period of time.
In contrast to an observational dataset, which is collected by observing subjects in their natural setting without interference, an experimental dataset is obtained through controlled testing or intervention, with individuals randomly assigned to groups that are exposed to different conditions or treatments. The key benefit of an experimental dataset is the ability to establish a cause-and-effect relationship: because the intervention is controlled, it can be determined whether changes in the dependent variable are a direct result of the intervention or are due to other factors. The key contrast between observational and experimental datasets therefore lies in their data collection methods and the types of conclusions that can be drawn from them. Experimental datasets enable the identification of causal relationships, whereas observational datasets offer valuable insight into the associations between variables.
D. Synthetic Dataset: A synthetic dataset is a type of dataset that is artificially generated and not obtained from real-world sources.
It is commonly used for testing and developing machine learning models, as well as for experimentation and benchmarking purposes. Synthetic datasets offer the benefit of being easily controlled and manipulated, which allows a model's performance to be evaluated under known conditions. However, synthetic datasets may not always provide an accurate representation of real-world conditions, and therefore their results should be interpreted with care [7]. There are several methods to generate synthetic datasets:
• Sampling: Creating a new synthetic dataset by randomly choosing data points from pre-existing datasets and merging them together.
• Generative Adversarial Networks (GANs): A deep learning technique in which two neural networks are trained together, one acting as a generator and the other as a
discriminator, to create synthetic data samples that closely resemble the original dataset.
• Simulation: Creating synthetic datasets that depict real-world situations using either mathematical or physical models.
• Bootstrapping: The process of creating numerous artificial datasets by resampling with replacement from the available dataset.
• Data augmentation: The process of modifying current data points artificially in order to expand the dataset's size and introduce diversity.
• Synthetic Overlapping Data Generation (SODG): A technique that generates artificial datasets by combining random segments of already available data.
• Synthetic Minority Over-sampling Technique (SMOTE): The process of creating synthetic examples of the underrepresented class in an imbalanced dataset in order to achieve a more equal distribution of classes.
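Of the methods listed above, bootstrapping is the simplest to illustrate. The following is a minimal sketch using only the Python standard library; the function name bootstrap_samples and the shaft-diameter values are hypothetical and serve only as an illustration.

```python
import random

def bootstrap_samples(data, n_datasets, seed=0):
    """Create n_datasets resampled copies of data, drawing with replacement."""
    rng = random.Random(seed)
    size = len(data)
    # random.choices samples with replacement, so values may repeat
    return [rng.choices(data, k=size) for _ in range(n_datasets)]

# Hypothetical measurements, e.g. shaft diameters in mm
diameters = [25.01, 24.98, 25.03, 24.99, 25.00, 25.02]
resamples = bootstrap_samples(diameters, n_datasets=3)
for sample in resamples:
    print(sample, "mean =", round(sum(sample) / len(sample), 3))
```

Each resampled dataset has the same size as the original, but some values appear more than once and others not at all, which is what gives the artificial datasets their variation.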
E. Big Data: Big Data is a term used to describe extremely large and complex datasets that are generated and processed by both individuals and organizations on a daily basis. These datasets are typically terabytes or petabytes in size and require specialized tools and techniques for processing and analysis. Due to their size, complexity, and fast-moving nature, traditional data processing tools and techniques are unable to handle such datasets [8]. The “4 Vs” (volume, velocity, variety, and veracity) are the characteristics that define Big Data.
• Volume: The sheer amount of data generated.
• Velocity: The speed at which data is generated and processed.
• Variety: The different types of data, such as structured, semi-structured, and unstructured data.
• Veracity: The quality and accuracy of the data.
The use of Big Data has allowed organizations to obtain valuable insights, enhance their decision-making processes, and foster innovation. Nevertheless, handling and examining Big Data also pose technical and ethical difficulties, encompassing privacy, security, and data bias. Big Data holds an important role in data science since it provides an enormous amount of data that can be used to create and train predictive models, conduct exploratory data analysis, and obtain insights into complex systems. In data science, Big Data is usually managed and analyzed using technologies such as Hadoop, Spark, and NoSQL databases, which can deal with the vast amounts and diverse types of data. Furthermore, machine learning algorithms and statistical models can be applied to Big Data to extract patterns and make predictions. The utilization of Big Data has made it possible for data scientists to tackle previously intractable problems such as image recognition, natural language processing, and recommendation systems.
Nevertheless, working with Big Data also has its difficulties, such as data cleansing and preprocessing, feature selection, and model interpretability. Data scientists must possess a thorough comprehension of the data and analysis methods used, as well as consider ethical and legal concerns such as data privacy and
bias. Categorization of data based on formats, sources, and perspectives is shown in Fig. 2.1.
Fig. 2.1 Types of data
2.3 Data Preprocessing
The process of preparing raw data for analysis and modeling is known as data preprocessing. It involves cleaning, transforming, and arranging the data in an appropriate format [9], as shown in Fig. 2.2. This step is critical in the data science process because the accuracy and quality of the results are heavily influenced by the quality of the data used. It can be compared to preparing your house for guests. You would clean, eliminate clutter, and arrange things to ensure that your guests feel comfortable and have a great experience. Similarly, in data preprocessing, it is imperative to ensure that the data is in good condition before commencing with analysis. Data preprocessing therefore helps ensure that the results of the analysis are accurate. Some of the most frequently used steps in data preprocessing involve filling in any missing information, making sure all the data is in the same format, removing any errors or duplicates, and combining data from different sources into one place. The nature and structure of the data influence the steps required for data preprocessing, but some common tasks are:
• Handling missing values: Assigning values to missing data or eliminating the records that have missing data, so that the remaining data is accurate and complete.
• Data normalization: Normalizing or scaling the data to the same range, which prevents features with larger values from dominating some models.
Fig. 2.2 Data preparation
• Data transformation: Transforming data from one form to another (for example, converting categorical data into numerical data) to support better analysis and modeling.
• Data cleaning: Eliminating outliers and duplicate data and correcting errors, which can significantly improve the accuracy of the results.
• Data integration: Gathering data from various sources into a single dataset to obtain a comprehensive view of the data.
Data preprocessing is a vital step in data science for ensuring the quality and accuracy of the results. It is repeated as new data becomes available.
2.3.1 Data Cleaning
The process of data cleaning involves inspecting and correcting errors in your data. It is similar to the process of ensuring the accuracy of your facts and information before making important decisions or writing a report. It is essential to verify the reliability and correctness of the data used. Some common tasks in data cleaning involve removing duplicates, filling in missing information, correcting errors, handling unusual or extreme values, and making sure all the data is in the same format.
To ensure the accuracy and reliability of the results obtained from statistical models, data cleaning is a crucial process that involves identifying and correcting errors, inconsistencies, and inaccuracies in the data. This process is also known as data cleansing or data scrubbing and plays a significant role in the data science process. Even minor inconsistencies and errors can have a significant impact on the accuracy of the results, making data cleaning an essential step in the analysis process. Data cleaning tasks can include:
A. Removing duplicates: Duplicate records give some observations more weight than they should have, which can distort the results.
B. Handling missing values: Missing values should be handled properly so that they do not distort the analysis. This can be achieved by assigning values to the missing data or by eliminating the samples that have missing data. If a sample is missing data for the majority of its features, dropping that sample may be the best solution. This can be done using the dropna() function in Python (pandas). While dropping, we can decide whether to delete a sample if the value of any single feature is missing or only if the values of all features are missing, as shown in code screenshot 2.2.
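The dropping options described above can be sketched with pandas as follows. The column names and values are hypothetical; the how and thresh parameters are those of the pandas DataFrame.dropna method.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor log with missing entries
df = pd.DataFrame({
    "temperature": [71.2, np.nan, 69.8, np.nan],
    "pressure": [1.01, np.nan, np.nan, np.nan],
    "speed": [1500, 1480, 1510, np.nan],
})

# Drop a sample if ANY feature value is missing
drop_any = df.dropna(how="any")

# Drop a sample only if ALL feature values are missing
drop_all = df.dropna(how="all")

# Keep only samples with at least 2 non-missing values
keep_mostly_complete = df.dropna(thresh=2)

print(drop_any.index.tolist())              # [0]
print(drop_all.index.tolist())              # [0, 1, 2]
print(keep_mostly_complete.index.tolist())  # [0, 2]
```

The thresh option is a middle ground between the two extremes: it removes samples in which most values are missing while keeping those that are only partly incomplete.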
Use the fillna() function to fill missing values with the mean, median, or mode of the respective feature.
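A minimal pandas sketch of this kind of filling is given below. The column names and values are hypothetical, and the choice of which statistic to use for which column is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical test records with missing entries
df = pd.DataFrame({
    "load": [10.0, np.nan, 14.0, 12.0],
    "cycles": [100.0, 300.0, np.nan, 1000.0],
    "material": ["steel", "steel", None, "aluminium"],
})

filled = df.copy()
# Mean for a roughly symmetric numeric feature
filled["load"] = filled["load"].fillna(filled["load"].mean())
# Median for a skewed numeric feature
filled["cycles"] = filled["cycles"].fillna(filled["cycles"].median())
# Mode (most frequent value) for a categorical feature
filled["material"] = filled["material"].fillna(filled["material"].mode()[0])

print(filled)
```

The median is less affected by extreme values than the mean, and the mode is the only one of the three statistics that is defined for categorical features.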
Use the replace() function to replace missing values with the mean, median, or mode of the respective feature.
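The replace() approach can be sketched as follows. The sentinel value -999.0, used here to mark a failed reading, is a hypothetical convention assumed for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; -999.0 is an assumed sentinel for a failed reading
readings = pd.Series([0.52, np.nan, 0.48, -999.0, 0.50])

# First turn the sentinel into a proper missing value ...
cleaned = readings.replace(-999.0, np.nan)
# ... then replace every missing value with the mean of the valid readings
cleaned = cleaned.replace(np.nan, cleaned.mean())

print(cleaned.tolist())
```

Unlike fillna(), replace() can target any value, which is useful when missing data has been recorded as a placeholder number rather than left empty.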
Use the forward fill and backward fill functions to fill missing values with the previous and the next valid observation, respectively.
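In pandas, forward and backward filling are available as the ffill() and bfill() methods. The following sketch uses a hypothetical series of vibration readings.

```python
import numpy as np
import pandas as pd

# Hypothetical vibration amplitudes recorded in time order
vibration = pd.Series([0.20, np.nan, np.nan, 0.35, np.nan])

# Forward fill: carry the previous valid observation forward
forward = vibration.ffill()

# Backward fill: use the next valid observation
backward = vibration.bfill()

print(forward.tolist())   # [0.2, 0.2, 0.2, 0.35, 0.35]
print(backward.tolist())  # [0.2, 0.35, 0.35, 0.35, nan]
```

The last value remains missing after the backward fill because no later observation exists, so the two methods are sometimes applied one after the other.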
C. Correcting errors: Errors in the data, such as incorrect values and typos, must be identified and corrected, because they lead directly to inaccurate results.
D. Handling outliers: Extreme values can distort the output. They can be handled by removing the outliers or by transforming the data.
In the data science process, data cleaning plays a significant role in ensuring the precision and quality of the outcomes derived from statistical models. It is an essential step that cannot be overlooked.
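The two ways of handling outliers can be sketched as follows. The text does not prescribe a particular rule for detecting outliers; the interquartile range (IQR) rule with a factor of 1.5 used here is a common convention and is an assumption of this example, as are the temperature values.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return the lower and upper limits of the IQR rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical bearing temperatures with one suspicious reading
temps = np.array([70.1, 69.8, 70.4, 70.0, 69.9, 95.0])
low, high = iqr_bounds(temps)
is_outlier = (temps < low) | (temps > high)

# Option 1: remove the outliers
removed = temps[~is_outlier]

# Option 2: transform the data by clipping values to the limits
clipped = np.clip(temps, low, high)

print("outliers:", temps[is_outlier])
print("after removal:", removed)
print("after clipping:", clipped)
```

Removal shortens the dataset, whereas clipping keeps every sample but limits its influence. Which option is appropriate depends on whether the extreme value is a measurement error or a genuine event.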
2.3.2 Data Transformation Data transformation is a crucial process that involves reorganizing or altering the format of data to enhance its accessibility and comprehensibility. This step is crucial when using data to make informed decisions. It’s similar to converting a recipe from metric units to imperial units or reorganizing your computer files for easy access. The objective is to ensure that data is in a suitable format that facilitates effective analysis. Some common tasks in data transformation include: converting text categories into numbers, making sure all the data is on the same scale, combining multiple pieces of data into one, and changing the structure of the data so it’s easier to work with. Data transformation is an essential process that ensures that data is in a suitable format for analysis and prediction. It involves converting data from one format to another to make it usable for analysis and modeling. The primary objective of data transformation is to ensure that data is transformed into a format that is suitable for the specific type of analysis and modeling that will be performed.
Some common data transformation tasks include:
• Encoding categorical variables: Converting categorical variables into numerical variables that can be used in statistical models.
• Scaling data: Bringing features into the same range, which is vital for some statistical models.
• Aggregation: Combining multiple records or data points into a single summary value, such as calculating the mean of a set of values.
• Reshaping data: Altering the structure of the data, such as pivoting a data table or aggregating data by group.
• Normalization: Converting and transforming data so that it has a mean of zero and a standard deviation of one.
Data transformation plays a crucial role in the data science process as it assists in organizing and refining data for analysis and modeling. Its significance lies in the fact that it can greatly influence the outcomes of statistical models. Thus, it is an essential step that cannot be overlooked.
A. Normalization
• Min-Max Normalization
To ensure uniformity of data in a given set, min-max normalization is utilized. This is of great significance as certain computer applications employed for data analysis function more effectively when all the data is on the same scale. It is similar to comparing the height of individuals in a group, where some might be very tall and others very short. Min-max normalization changes everyone's height into a range of 0 to 1, where 0 indicates the shortest person, and 1 represents the tallest person. The process involves identifying the smallest and largest values in a set of information, then changing each value into a new value between 0 and 1 by subtracting the smallest value and dividing the result by the range, which is the difference between the largest and smallest values.
\[ x' = \frac{x - \min(x)}{\max(x) - \min(x)} \]
The technique of min-max normalization is employed in data preprocessing to scale the features of a dataset to a specific range, usually between 0 and 1. This method aims to bring all the features to a common scale, which is crucial for some machine learning algorithms that are sensitive to differences in feature scale. However, this technique can be distorted by outliers or unusual values in the dataset, and it is better to use an alternative normalization method in such cases. In min-max normalization, the minimum value of each feature is subtracted from all the values, and the outcome is then divided by the range (maximum value minus minimum value) of the feature.
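The procedure described above can be sketched in plain Python as follows. The function name min_max_normalize and the height values are hypothetical, and returning zeros for a constant feature is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale values to the range [0, 1] using min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the range is zero, so return zeros (assumed convention)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150, 160, 170, 180, 200]
print(min_max_normalize(heights_cm))
# [0.0, 0.2, 0.4, 0.6, 1.0]

# A single outlier stretches the range and squeezes the other values together
with_outlier = min_max_normalize(heights_cm + [1150])
print(with_outlier)
```

In the second call, the five original heights all fall between 0 and 0.05, which illustrates why min-max normalization is sensitive to outliers.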
The formula for min-max normalization is as follows:
\[ X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \]
where X is the original value, X_min is the minimum value of the feature, X_max is the maximum value of the feature, and X' is the normalized value. The method of min-max normalization is easy to use and effective; however, it can be affected by outliers, as even one outlier can significantly change the feature range. Alternative normalization methods like Z-score normalization or log normalization may be better suited for datasets that have outliers.
• Z-Score Normalization
\[ v' = \frac{v - \mu_A}{\sigma_A} \]
where μ_A is the mean and σ_A is the standard deviation of feature A.
Z-score normalization refers to a technique used to standardize data in a set of information by bringing all the data onto the same scale. This allows computer programs used for analyzing data to function more efficiently. Z-score normalization works by adjusting the data so that the average value is 0 and the standard deviation is 1, similar to how a person's height can be converted into a score based on the number of standard deviations from the average height. This is achieved by subtracting the average value from each data point and dividing the result by the standard deviation. The advantage of this method is that it is less sensitive to outliers in the data compared to the min-max normalization method. However, it works best when the data is approximately normally distributed, and it gives rise to negative values, which may not be well handled by some algorithms. Z-score normalization, or standardization, is a data preprocessing method that rescales the features of a dataset so that each has a mean of zero and a standard deviation of one. This technique is used to standardize the scale of the features, which is essential for certain machine learning algorithms that are affected by the feature scale. Z-score normalization involves subtracting the mean of each feature from all values and dividing the result by the standard deviation of the feature. The formula for Z-score normalization is as follows:
\[ X' = \frac{X - \text{Mean}}{\text{Standard Deviation}} \]
where X is the original value, Mean is the mean of the feature, Standard Deviation is the standard deviation of the feature, and X' is the standardized value. Compared to min-max normalization, Z-score normalization is less affected by outliers because it does not depend directly on the minimum and maximum values of the feature. Nevertheless, Z-score normalization can generate negative values, which may not be compatible with
some algorithms. Moreover, this technique works best when the data is approximately normally distributed, which may not be true in all situations.
• Decimal Scaling
\[ v' = \frac{v}{10^{j}} \]
where j is the smallest integer such that max(|v'|) < 1.
Decimal scaling normalization is a data preprocessing technique that scales a dataset's features by moving the decimal point of their values. The main objective of this technique is to bring the features to a common scale, which is crucial for certain machine learning algorithms that are sensitive to feature scale. It is analogous to measuring a person's height relative to the tallest individual. The decimal scaling normalization process adjusts the data so that the largest absolute value is less than 1. This is achieved by dividing all the values by a power of 10. The advantage of this approach is that it is easy and effective. However, negative values remain negative after scaling (the results lie between −1 and 1), so when a non-negative range is required, an alternative technique such as min-max normalization may be more appropriate. The process of decimal scaling normalization involves identifying the highest absolute value in a dataset and then dividing all values by a power of 10, chosen so that the largest absolute value in the scaled dataset is less than 1. The formula for decimal scaling normalization is as follows:
\[ X' = \frac{X}{10^{k}} \]
where X is the original value, k is the number of digits in the integer part of the largest absolute value in the dataset (it plays the same role as j above), and X' is the normalized value. For example, if the largest absolute value is 986, then k = 3 and 986 becomes 0.986.
B. Encoding
Encoding is a technique commonly used in data preprocessing to represent categorical data in numerical form so that it can be used by machine learning algorithms. As the name suggests, categorical data divides the underlying data into specific categories such as gender (male or female), etc. There are several encoding techniques that can be used. When deciding on which encoding technique to use, it is important to consider both the nature of the data and the needs of the specific machine learning algorithm being used.
Different algorithms may have varying sensitivities to the order of categories and binary data. Therefore, the selection of an appropriate encoding technique should be based on the specific requirements of the algorithm and the characteristics of the data. Computers do not understand the categories or the names mentioned in the dataset. Encoding converts them into numbers that the computer can process. This sometimes becomes a crucial part of data science when you are dealing with different types of
categories, like the color of the mobile (black, white, purple), the type of animal (mammals, fish, birds), or the categories of the food (high protein food, low protein food), etc. To do this, there are a few different methods, including:
• One-hot encoding: This method creates a separate binary (0 or 1) column for each category. For each record, the column of its own category is set to 1 and all the others are set to 0. For example, if there are three animals in the dataset (cat, dog, rabbit), then three new columns will be created (cat, dog, rabbit).
• Ordinal encoding: This method assigns a number to each category according to the order of the categories. For example, if there are three priorities in the dataset (low, medium, high), then the categories will be numbered as 1, 2, and 3.
• Label encoding: This method assigns a number to each category found in the dataset, but the order of the categories is not considered. For example, if there are three colors in the dataset (red, green, blue), then the categories will be assigned numbers like 1, 2, and 3.
The method you select depends on two things: the type of data you are working with and the requirements of the computer program that you are using. For some computer programs, the order of the categories is crucial and must be preserved, while other programs do not take it into consideration. Some programs are able to deal with binary data (0's and 1's) while others are not. Hence, while selecting a technique, you first have to consider the type of data you are handling along with the requirements of the program.
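The three encoding methods above can be sketched with pandas as follows. The data frame and its column names are hypothetical. Note that pandas.factorize, used here for label encoding, numbers the categories from 0 in order of first appearance rather than from 1.

```python
import pandas as pd

# Hypothetical dataset using the examples from the text
df = pd.DataFrame({
    "animal": ["cat", "dog", "rabbit", "dog"],
    "priority": ["low", "high", "medium", "low"],
    "color": ["red", "green", "blue", "green"],
})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["animal"], prefix="animal")

# Ordinal encoding: numbers follow the meaningful order low < medium < high
priority_order = {"low": 1, "medium": 2, "high": 3}
df["priority_encoded"] = df["priority"].map(priority_order)

# Label encoding: arbitrary integer per category, no order implied
df["color_encoded"], color_labels = pd.factorize(df["color"])

print(one_hot)
print(df[["priority", "priority_encoded", "color", "color_encoded"]])
```

One-hot encoding avoids suggesting an order that does not exist, at the cost of adding one column per category. Label encoding is compact, but a model may wrongly treat the numbers as ordered.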
2.3.3 Data Reduction Data reduction is a technique used to decrease the size of a large dataset, making it more manageable. This method allows for the retention of crucial information while eliminating superfluous data that is not required. There are different ways to reduce data, including: • Making the data simpler: The process of eliminating unnecessary data that is not relevant to the subject of study is known as “dimensionality reduction.” • Using a smaller sample: The act of utilizing only a segment of the entire data collection, rather than the entire set, can simplify the process of handling the data and reduce the time required for analysis. • Compressing the data: The process involves reducing the size of the data by eliminating duplicated information. This can be achieved without losing significant details or by sacrificing some information while still providing a comprehensive view of the dataset.
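The compression option in the list above can be illustrated with a short sketch. It uses the zlib module from the Python standard library for the lossless case and simple rounding as a stand-in for a lossy scheme; the data values are hypothetical and rounding is only one assumed example of lossy reduction.

```python
import zlib

# Lossless: highly repetitive hypothetical log data compresses well
raw = ",".join(["20.5"] * 500).encode("utf-8")
compressed = zlib.compress(raw, level=9)
restored = zlib.decompress(compressed)

print(len(raw), "bytes ->", len(compressed), "bytes")
print("identical after decompression:", restored == raw)

# Lossy: rounding discards detail that cannot be recovered afterwards
precise = [20.4987, 20.5013, 20.5102]
lossy = [round(v, 1) for v in precise]
print(lossy)  # [20.5, 20.5, 20.5]
```

In the lossless case the original bytes are recovered exactly. In the lossy case the three distinct readings become indistinguishable, which is acceptable only if the discarded precision does not matter for the analysis.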
The selection of a data reduction technique depends on the nature of data and the research objectives. Different methods are suitable for different types of data and studies. It is essential to choose the appropriate method that fits the requirement of the specific situation. In data science, data reduction is utilized to decrease the size of a dataset while maintaining significant information. The objective is to simplify the data and facilitate its handling while retaining its vital characteristics. There are several data reduction techniques that can be used, including: • Dimensionality reduction: The process of reducing the features or variables in a dataset is commonly employed to handle high-dimensional datasets. This method is particularly useful when a large number of features are present, but not all of them are essential for addressing the problem at hand. • Sampling: The process of sampling is employed to decrease the dataset’s size by choosing a smaller portion of the information. Sampling can be either random or systematic and is beneficial in terms of reducing the resources such as time and memory needed to manage the data. • Data compression: The purpose of this method is to decrease the amount of data in a set by getting rid of repetitive details. There are two types of data compression: lossless, which enables the complete recovery of the original information, and lossy, which leads to the loss of some information. The selection of a data reduction method is dependent on the properties of the data and the objectives of the problem at hand. Particular techniques may be better suited for datasets with high dimensions, while others are more suitable for larger datasets. The choice of technique should be based on the data characteristics and problem requirements. A. Sampling Sampling is a technique that helps in reducing the size of a large dataset, thereby making it simpler to analyze. 
Instead of examining the entire data, only a part of it is considered. When the sample is chosen well, crucial information is not overlooked, while the data becomes more manageable to work with and comprehend. There are two main types of sampling:
• Random sampling: Data points are selected at random, so that every point has the same chance of being chosen. This is advisable for large datasets that contain multiple types of information, because it helps the sample reflect all of them.
• Systematic sampling: Data points are selected according to a fixed pattern. For instance, you can opt for selecting every 11th data point. This approach is suited to datasets that have a natural ordering, such as those gathered over a period of time.
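Both types of sampling can be sketched with the Python standard library. The function names and the list of 100 record numbers are hypothetical.

```python
import random

def random_sample(data, n, seed=0):
    """Pick n distinct items at random (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(data, n)

def systematic_sample(data, step, start=0):
    """Pick every step-th item, beginning at position start."""
    return data[start::step]

records = list(range(1, 101))  # hypothetical record numbers 1..100

rand = random_sample(records, 10)
syst = systematic_sample(records, step=11)

print(rand)
print(syst)  # [1, 12, 23, 34, 45, 56, 67, 78, 89, 100]
```

The fixed seed makes the random sample reproducible. For systematic sampling, the step should not coincide with a periodic pattern in the data, otherwise the sample may be biased.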
The choice of sampling method depends on the available data and the research objective. While both sampling methods can be advantageous, one may suit a particular situation better. The selection of data for sampling should be such that it accurately represents the entire dataset. B. Data compression Data compression refers to a method utilized in the reduction of data to minimize the size of a large dataset. The primary aim is to retain essential information while decreasing the volume of data that is required to be handled and saved. There are two main types of data compression: • Lossless compression: When data needs to be preserved completely without any loss of information, a compression method is used that reduces the data size without compromising any information. This technique is particularly valuable when all the data must be retained. • Lossy compression: When dealing with large amounts of data, it can be advantageous to use compression techniques that eliminate less critical information, resulting in a smaller overall file size. This approach is particularly beneficial when the data is extensive, and the removal of certain details will not significantly impact the analysis results. When it comes to choosing a data compression method, it is important to take into account the specific requirements of the analysis as well as the unique characteristics of the data. If the data needs to be preserved exactly as it is, then lossless compression is the way to go. On the other hand, if the data has extraneous information that can be removed without affecting the analysis results, then lossy compression is the better option. Data compression is generally a useful technique to reduce the size of a dataset for storage and processing purposes. However, it is crucial to exercise caution when using data compression as it may introduce errors into the data, which can ultimately affect the accuracy of the analysis results. C. 
Dimensionality Reduction
The method of dimensionality reduction is frequently employed in data science to simplify intricate datasets by decreasing the number of variables or characteristics that define the data. The main objective is to extract crucial information from the data while minimizing the amount of data that needs to be analyzed and processed. This is necessary when the data is described by so many elements that analyzing and comprehending it becomes difficult. There are several ways to do dimensionality reduction:
• Picking the most important information: Retaining only the most important information and eliminating the rest.
• Making new information: Creating new information that describes the data better than the original information does.
• Changing how the information is organized: Rearranging the data in a manner that simplifies its comprehension and usage.
The approach used depends on the specific details of the data. While dimensionality reduction can simplify data processing, it must be approached with caution to avoid the risk of losing significant information. Dimensionality reduction can be achieved through several methods such as:
• Feature selection: Choosing the significant variables or features that accurately represent the data and disregarding the others. This approach is particularly beneficial when a large dataset has numerous variables or features that are not pertinent to the analysis.
• Feature extraction: Generating new variables or attributes from the existing ones so that they effectively represent the significant information contained in the data. This method comes in handy when the data exhibits intricate relationships between variables or attributes that require simplification.
Feature selection
To simplify complex datasets, it is common to use a method of dimensionality reduction that transforms the data into a new coordinate system which captures the most significant information. However, the choice of which dimensionality reduction method to use depends on the characteristics of the data and the requirements of the analysis. Although reducing the number of variables or features can make working with complex datasets easier, it can also introduce bias and lose important information. Feature selection therefore aims to choose the most relevant information in a dataset that will help answer research questions or solve problems. There are different ways to pick which information is important:
• Ranking: Pieces of information are ordered by importance, and the highest-ranked ones are kept.
• Testing: Different combinations of information are tried out with a machine learning model, and the combination that performs best is kept.
• Automatically: The machine learning model itself selects the vital information as part of its training process.
The selection of a method depends on the purpose for which the data is being used. Choosing the appropriate information can enhance the accuracy of analysis, decrease the likelihood of errors, and simplify comprehension of the model's functioning. In data science, feature selection involves selecting a portion of the variables or features in a dataset to analyze. The aim of this process is to pinpoint the most significant and essential variables or features that can aid in addressing research questions or resolving the current issue. There are several approaches to feature selection including:
• Filter methods: Filter methods rank features or variables based on specific criteria and choose the top-ranked ones. The filter approach prioritizes relevant information in a dataset based on
its association with the problem statement. The information that bears the most relevance is deemed the most significant and is employed in the analysis. This method is simple and efficient, making it an ideal initial step when dealing with a large amount of data. Nevertheless, it may not consider the interaction between different pieces of information and may overlook pertinent data that is not directly related to the problem but is crucial in solving it. In short, the filter method is useful for deciding which information to keep, especially when you are dealing with a very large amount of information, and it helps make the analysis better and faster. The filter method is one of the techniques used in data science to select features, where features are ranked by a statistical score or correlation measure, and only the top-ranked features are considered for further analysis. Multiple measures such as mutual information, the chi-squared test, or the correlation coefficient can be used to calculate the score. In this method, each feature is assessed independently of the other features and of any particular machine learning model, and the features are ranked based on their relationship with the target variable. The features with the highest score are chosen for analysis as they are considered most relevant. The filter method has the advantage of being simple and fast to implement, and it can also be used as a preprocessing step to reduce the number of features before training a machine learning model. The disadvantage is that interactions between features are not considered, so features that have a weak individual relationship with the output variable but are important in combination with others may be missed. In summary, the filter method can be used for feature selection in large datasets with many features. It reduces the dimensionality of the dataset and can enhance the performance of the machine learning model.
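A filter method based on the correlation coefficient can be sketched with NumPy as follows. The function name, the feature names (speed, load, noise), and the synthetic wear data are all hypothetical, and keeping the top two features is an arbitrary choice made for illustration.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Score each feature by |Pearson correlation| with the target, highest first."""
    scores = {}
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        scores[name] = abs(r)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic data: wear depends strongly on speed, moderately on load, not on noise
rng = np.random.default_rng(0)
n = 500
speed = rng.normal(size=n)
load = rng.normal(size=n)
noise = rng.normal(size=n)
wear = 3.0 * speed + 1.5 * load + rng.normal(scale=0.1, size=n)

X = np.column_stack([speed, load, noise])
ranking = rank_features_by_correlation(X, wear, ["speed", "load", "noise"])
top_k = [name for name, _ in ranking[:2]]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
print("selected:", top_k)
```

Each feature is scored on its own, without training a model, which is what makes the filter method fast. It is also the reason why feature interactions go unnoticed.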
• Wrapper methods: The method involves evaluating groups of features by training a machine learning model on various subsets of the features and picking the most effective subset based on the model's performance. The wrapper method is a technique used to identify the most significant pieces of information in a dataset to solve a problem. It employs a machine learning model to assess the effectiveness of different combinations of information. The primary objective is to discover the combination that yields the best results. In one common form, the algorithm starts with all the information and then tries smaller combinations to determine their effectiveness. This process is repeated with fewer and fewer pieces of information until the best combination is determined. The wrapper method is advantageous because it considers how various pieces of information interact and affect the problem being addressed. However, it is also more time-consuming and complex than other methods. In conclusion, the wrapper method is a valuable tool for selecting crucial information when dealing with a large amount of data and for understanding how various pieces of information work together.
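The procedure of starting with all features and trying smaller and smaller subsets can be sketched as follows. This is only one possible form of a wrapper method (often called backward elimination); the use of a least-squares linear model, a single hold-out split, and the synthetic data are assumptions made for this illustration.

```python
import numpy as np

def holdout_mse(X_tr, y_tr, X_te, y_te, cols):
    """Fit least squares on the chosen columns and return the hold-out MSE."""
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])  # add intercept
    A_te = np.column_stack([X_te[:, cols], np.ones(len(X_te))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_te @ coef - y_te) ** 2))

def backward_elimination(X_tr, y_tr, X_te, y_te):
    """Start from all features; drop one at a time while the error does not get worse."""
    selected = list(range(X_tr.shape[1]))
    best = holdout_mse(X_tr, y_tr, X_te, y_te, selected)
    while len(selected) > 1:
        trials = []
        for drop in selected:
            cols = [c for c in selected if c != drop]
            trials.append((holdout_mse(X_tr, y_tr, X_te, y_te, cols), drop))
        err, drop = min(trials)  # removal that hurts the least
        if err > best:
            break  # every removal makes the model worse, so stop
        best = err
        selected.remove(drop)
    return selected, best

# Synthetic data: only features 0 and 1 influence the target
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

selected, error = backward_elimination(X[:200], y[:200], X[200:], y[200:])
print("kept feature indices:", selected, "hold-out MSE:", round(error, 4))
```

The model is retrained for every candidate subset, which shows why wrapper methods are more expensive than filter methods. In practice, cross-validation would usually replace the single hold-out split.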
The wrapper method is advantageous because it considers the interaction between features, their relationship with the target variable, and the specific machine learning algorithm used for predictions. However, it is computationally expensive and time-consuming, and it can be challenging to determine the optimal number of features to select. The wrapper method is a powerful strategy for feature selection, particularly when the interaction between features is crucial. Nonetheless, its computational demand should be taken into account, and it is essential to have a clear understanding of the machine learning algorithm before applying this method.
• Embedded methods: This technique incorporates feature selection into the machine learning algorithm itself. As the model is trained, it automatically identifies the most pertinent variables or characteristics.
The choice of feature selection method depends on factors such as the type of problem, the characteristics of the data, and the analysis requirements. Using a feature selection technique can enhance model performance, reduce overfitting, and increase the model’s interpretability.
Feature extraction
Feature extraction involves taking the information in a dataset and constructing new pieces of information that help solve the problem better, similar to developing new ingredients that work together in a recipe. The goal of feature extraction is to keep the most important information and discard information that is not needed. This makes the solution to the problem simpler and easier to understand. There are many ways of performing feature extraction, for example searching for crucial parts of the information and transforming them into new information, or collecting and combining parts of the information to create new information. Hence, feature extraction is a vital step before using any machine learning model.
Features derived carefully from the raw information lead to better-designed solutions, which in turn perform better. In data science, feature extraction refers to the process of deriving new features from existing ones in order to enhance the performance of a machine learning model. This involves transforming raw data into a more useful representation that can be used by a machine learning algorithm. The main objective of feature extraction is to capture the most relevant and significant information in the data while eliminating redundant or irrelevant information. This can lead to a more efficient and effective machine learning model, reduce overfitting, and improve the interpretability of the results. Feature extraction incorporates various techniques, including dimensionality reduction, feature scaling, and feature selection, with common methods such as principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA).
To put it briefly, feature extraction is a crucial data preprocessing step before the data is given to a machine learning model. It transforms raw data into a more useful form, which in turn enhances efficiency and performance.
• Principal Component Analysis (PCA)
PCA is a statistical method used in data science to simplify a dataset by transforming it into a lower-dimensional space while preserving as much of the original information as possible. It identifies the directions in which the data varies the most and represents the data in a new coordinate system aligned with these directions, known as principal components. The first principal component captures the largest share of the variance, followed by the second, and so on. By retaining only the first few principal components, most of the information in the original data can be preserved while its dimensionality is reduced. PCA is commonly used as a preprocessing step for machine learning algorithms, particularly for visualizing high-dimensional data and eliminating noise and redundancy. This can enhance the efficacy of machine learning algorithms and make the data easier to interpret and examine.
The following steps are taken to compute principal components for feature extraction:
• Standardize the data: Transform the data to have zero mean and unit variance, because PCA is sensitive to the scale of the variables.
• Compute the covariance matrix: The covariance matrix is a square matrix that represents the relationship between each pair of variables in the dataset, that is, how each variable varies together with every other variable.
• Compute the eigenvectors and eigenvalues of the covariance matrix: The eigenvectors represent the directions along which the data varies the most. The eigenvalues give the amount of variance along each eigenvector.
• Sort the eigenvectors and eigenvalues by the size of the eigenvalues: The eigenvectors with the largest eigenvalues capture the most information in the dataset.
• Choose the top k eigenvectors: Retain only the k eigenvectors with the largest eigenvalues, where k is the desired number of dimensions for the reduced data. These eigenvectors form the new coordinate system for the reduced data.
• Project the data onto the new coordinate system: Multiplying the standardized data by the matrix of selected eigenvectors transforms the original data into a lower-dimensional representation, which is used for further data analysis and feature extraction.
To put it briefly, the calculation of principal components for feature extraction includes standardizing the data, computing the covariance matrix, finding the eigenvectors and eigenvalues, sorting them by size, choosing the top k eigenvectors, and projecting the data onto the new coordinate system.
• Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a technique that simplifies data so that it is easier to understand and analyse. It does so by finding the vital features that separate the groups in the dataset. Here’s how LDA works:
• Calculate the average of each feature for each group.
• Calculate the distribution of the data within each group.
• Calculate the spread between the groups.
• Calculate the eigenvectors.
• Select the vital features and eliminate the rest.
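The LDA steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: "distribution within each group" is taken to mean the within-class scatter matrix and "spread between the groups" the between-class scatter matrix, a pseudo-inverse is used in case the within-class matrix is singular, and the function name `lda_fit_transform` and the synthetic two-class data are hypothetical.

```python
import numpy as np

def lda_fit_transform(X, y, k):
    """Project X (n_samples x n_features) onto the top-k discriminant directions."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)             # average of each feature per group
        centered = X_c - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += X_c.shape[0] * (diff @ diff.T)
    # Eigen-decomposition of pinv(S_w) @ S_b; pinv guards against a singular S_w
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]    # largest eigenvalue first
    W = eigvecs[:, order[:k]].real            # keep the top-k directions
    return X @ W, W

# Synthetic two-class data (assumed for illustration)
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

Z, W = lda_fit_transform(X, y, k=1)
print(Z.shape)  # (40, 1)
```

With two classes there is only one useful discriminant direction, so `k=1` is the natural choice here; in general at most (number of classes minus one) directions carry class information.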
The new transformed dataset contains only the vital features of the dataset. In machine learning and statistical pattern recognition, LDA is used to reduce the number of features in a dataset while maintaining the essential information about the different groups. It differs from methods like PCA by using group (class) information to determine the most significant features. The main objective of LDA is to identify the variables that most effectively differentiate between the classes or categories in a dataset. In more detail, LDA works as follows:
• First, calculate the mean vector of each class by finding the average of each feature for every class.
• Compute the within-class scatter matrix, which describes the spread of data points around their own class mean.
• Calculate the between-class scatter matrix, which describes the spread of the class means around the overall mean.
• Use linear algebra to find the eigenvectors and eigenvalues of the matrix formed from the scatter matrices (the inverse of the within-class scatter matrix multiplied by the between-class scatter matrix).
• Keep only the top k eigenvectors with the largest eigenvalues; they form the new coordinate system for the reduced data.
• Finally, project the data onto the new coordinate system by multiplying it by the matrix of eigenvectors, resulting in a lower-dimensional representation of the data with k variables instead of the original number of features.
LDA is beneficial when there are numerous features and the objective is to reduce the size of the data while keeping the most important information regarding the class labels. Unlike PCA, LDA is a supervised method and uses the class labels to identify the most relevant variables for class separation.
• Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is a tool that simplifies complex data by breaking it down into its fundamental parts. For instance, imagine sorting a mixed-up jigsaw puzzle into various piles based on colour or pattern.
ICA can help by identifying the independent patterns or colours that form the data components, which are uncorrelated with one another, thereby enabling better comprehension of the data’s underlying structure. ICA can be applied in many fields, such as separating multiple sounds in an audio recording, dividing an image into different parts, separating background noise from speech, and identifying the edges in an image.
In data science, ICA is a statistical technique used to separate signals into independent sources and to uncover hidden information in data. ICA operates under the assumption that the signals present in a dataset are a combination of independent signals, and it recovers these independent signals by minimizing their mutual dependence. This process results in new signals or features that are not correlated with one another, which can be useful for understanding the underlying structure of the data. ICA is employed in fields such as audio and image processing, speech recognition, and EEG signal analysis. It can be used to disentangle sounds or images into their independent components, for example separating speech from background noise in audio or separating textures and edges in images.
In data science, PCA, LDA, and ICA are all techniques used to reduce the dimensionality of data and extract features. However, they have different underlying assumptions and are used for different purposes:
PCA is an unsupervised technique that aims to determine the directions in the data that capture the highest variance. It is used to locate correlations and patterns within the data and to decrease its dimensionality while retaining as much information as feasible.
LDA, on the other hand, is a supervised method used for classification. It employs class labels to discover the directions in the data that best distinguish the different classes. It is helpful in identifying the most distinguishing features in the data and improving the performance of classifiers.
ICA is an unsupervised method that presumes that the signals in the data are linear combinations of independent, non-Gaussian signals. It is used to separate the independent signals in the data and expose the data’s hidden structure.
In short, PCA is used for dimensionality reduction, LDA for classification, and ICA for signal separation.
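The PCA steps listed earlier in this section (standardize, compute the covariance matrix, find and sort the eigenvectors, keep the top k, project) can be sketched with NumPy. This is a minimal sketch under assumptions: the function name `pca`, the synthetic three-feature dataset, and the choice of `numpy.linalg.eigh` (suitable because a covariance matrix is symmetric) are choices made for this example only.

```python
import numpy as np

def pca(X, k):
    """Reduce X (n_samples x n_features) to k principal components."""
    # 1. Standardize: zero mean and unit variance per feature
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(Z, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh returns them in ascending order)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the top-k eigenvectors as the new coordinate system
    components = eigvecs[:, :k]
    # 6. Project the standardized data onto the new coordinate system
    projected = Z @ components
    explained = eigvals[:k] / eigvals.sum()  # share of variance per component
    return projected, components, explained

# Synthetic data: two strongly correlated features and one independent feature
rng = np.random.default_rng(42)
load = rng.normal(size=200)
X = np.column_stack([
    load + 0.1 * rng.normal(size=200),
    2.0 * load + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

projected, components, explained = pca(X, k=2)
print(projected.shape)  # (200, 2)
print(explained)        # share of total variance kept by each component
```

Because the first two features are nearly redundant, two components retain almost all of the variance in this example, which illustrates how PCA removes redundancy rather than discarding a measured variable outright.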
2.4 Tools
There are several tools available for data acquisition and preparation, including:
• ETL (Extract, Transform, Load) tools: These tools are used to collect data from multiple sources, convert it into an appropriate format, and store it in a data warehouse or other database. Popular tools in this category include Talend [10], Informatica [11], and Alteryx [12].
• Data scraping tools: These tools can be used to extract data from websites, APIs, and other online sources. Examples include Beautiful Soup [13], Scrapy [14], and Octoparse.
• Data wrangling tools: These tools help clean, transform, and process data. Examples include Trifacta [15], OpenRefine [16], and Alteryx.
• Data visualization tools: These tools are used to examine and visualize the data, assisting in the discovery of trends and connections. Tableau [17], QlikView, and Power BI [18] are a few examples.
• Data storage tools: These tools offer a means of managing and storing large amounts of data. Hadoop, NoSQL databases like Cassandra and MongoDB, and SQL databases like MySQL and PostgreSQL are a few examples.
• Data profiling tools: These tools examine the data to find problems with data quality and recommend ways to improve the data. Google Cloud Dataprep, DataRobot, and Talend Data Quality are a few examples.
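To make the extract, transform, load pattern behind the ETL tools above concrete, the following sketch uses only the Python standard library rather than any of the listed products. The CSV content, the column names, the table name `readings`, and the Fahrenheit-to-Celsius cleaning rule are all hypothetical and serve only to illustrate the three stages.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export from a machine logger, inlined to keep the sketch self-contained
RAW_CSV = """machine_id,temp_f,timestamp
M1,212.0,2023-01-01T00:00:00
M2,,2023-01-01T00:00:00
M1,230.0,2023-01-01T01:00:00
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing reading and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip incomplete records
        temp_c = (float(row["temp_f"]) - 32.0) * 5.0 / 9.0
        cleaned.append((row["machine_id"], round(temp_c, 2), row["timestamp"]))
    return cleaned

def load(rows, conn):
    """Load: store the cleaned rows in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (machine_id TEXT, temp_c REAL, ts TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute("SELECT machine_id, temp_c FROM readings ORDER BY ts").fetchall()
print(stored)  # [('M1', 100.0), ('M1', 110.0)]
```

Dedicated ETL tools add scheduling, connectors, and monitoring on top of this basic pattern; the sketch only shows the data flow.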
2.5 Summary
Understanding and preparing the data is the most crucial step in data analysis, as wrong data can lead to wrong predictions. Data preparation includes data cleaning, data transformation, dimensionality reduction, and so on. This chapter discussed data preparation techniques in detail. Because these techniques vary with the type and source of the data, the chapter also discussed the various types and sources of data.
References
1. Data MC, Komorowski M, Marshall DC, Salciccioli JD, Crutain Y (2016) Exploratory data analysis. Secondary analysis of electronic health records, 185–203
2. Shinde GR, Majumder S, Bhapkar HR, Mahalle PN (2022) Exploratory data analysis, pp 97–105. Springer Singapore
3. Mahalle PN, Shinde GR, Pise PD, Deshmukh JY (2022) Data collection and preparation. Foundations of data science for engineering problem solving, 15–31
4. Eberendu AC (2016) Unstructured data: an overview of the data of big data. Int J Comput Trends Technol 38(1):46–50
5. Barbier G, Zafarani R, Gao H, Fung G, Liu H (2012) Maximizing benefits from crowdsourced data. Comput Math Organ Theory 18:257–279
6. Shrestha MB, Bhatta GR (2018) Selecting appropriate methodological framework for time series data analysis. J Finance Data Sci 4(2):71–89
7. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2013) A review of feature selection methods on synthetic data. Knowl Inf Syst 34:483–519
8. Duan Y, Edwards JS, Dwivedi YK (2019) Artificial intelligence for decision making in the era of big data–evolution, challenges and research agenda. Int J Inf Manage 48:63–71
9. Brownlee J (2020) Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. Machine Learning Mastery
10. https://www.talend.com/
11. https://www.informatica.com/
12. https://www.alteryx.com/
13. https://realpython.com/beautiful-soup-web-scraper-python/
14. https://scrapy.org/
15. https://www.trifacta.com/
16. https://openrefine.org/
17. https://www.tableau.com/
18. https://powerbi.microsoft.com/en-us/
Chapter 3
Intelligent Approaches
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. N. Mahalle et al., Predictive Analytics for Mechanical Engineering: A Beginners Guide, SpringerBriefs in Computational Intelligence, https://doi.org/10.1007/978-981-99-4850-5_3
3.1 Overview
The term “intelligent approaches” in the context of soft computing generally refers to methods that use artificial intelligence (AI) techniques to solve problems. Soft computing [1] is a branch of AI that focuses on developing intelligent systems that can handle uncertainty, imprecision, and partial truth, among other things. It comprises a collection of computational techniques, including fuzzy logic, genetic algorithms, and neural networks, which are used in situations where traditional “hard” computing techniques are inadequate, impractical, or unreliable. Intelligent approaches are increasingly being applied in soft computing to improve the efficiency and effectiveness of various tasks: artificial intelligence techniques such as machine learning and data analytics are applied to improve the performance and accuracy of soft computing techniques. For instance, intelligent continuous software process management and improvement uses artificial intelligence techniques to build intelligent systems that can identify inefficiencies in software testing or development. Similarly, hybrid intelligent approaches that combine soft computing techniques such as fuzzy logic and genetic algorithms can be used in practical applications in fields such as healthcare and smart energy management. Intelligent approaches in soft computing are therefore becoming increasingly important, as they offer efficient and effective solutions in situations where traditional hard computing methods are inadequate. The use of soft computing along with intelligent approaches can significantly improve performance and accuracy, making them suitable for a range of applications in various fields. These methods seek to emulate human-like intelligence by using algorithms that can learn from data, adapt to changing environments, and make decisions based on complex information. Overall, the goal of intelligent approaches in soft computing is to develop systems that can
perform complex tasks such as image recognition, natural language processing, and decision-making with speed, accuracy, and efficiency. Soft computing is a field of computer science that deals with approximate reasoning, imprecision, uncertainty, and partial truth. The following are some examples of intelligent approaches in soft computing:
1. Fuzzy logic: This approach deals with reasoning that is approximate rather than precise. It is often used for problems that have some degree of uncertainty or imprecision.
2. Neural networks: These are computational models influenced by the structure and function of the brain. They learn from data and can be used for tasks like classification, prediction, and pattern recognition.
3. Genetic algorithms: This approach uses evolutionary principles to search for solutions to complex problems. It involves creating a population of candidate solutions and using selection, crossover, and mutation to evolve new and better solutions.
4. Swarm intelligence: This approach is inspired by the behavior of social animals like ants and bees. It involves creating a population of agents that interact with each other and with their environment to solve problems.
5. Meta-heuristics: This is a general term for algorithms that can be used to solve optimization problems. Examples include simulated annealing, tabu search, and particle swarm optimization.
Intelligent approaches in soft computing have been applied in fields such as healthcare, industrial measurement systems, and continuous software development, and many researchers and practitioners are developing new methods and applications in this field.
The use of machine learning in predictive maintenance is becoming increasingly common across industries. Predictive maintenance seeks to identify potential equipment issues before they result in unplanned downtime, which can be costly for businesses. Machine learning algorithms can analyze time-series data from machines and other equipment to identify patterns that may indicate potential failures. These algorithms can detect patterns that are difficult for humans to identify, allowing earlier detection of issues and proactive maintenance. Additionally, machine learning models can be trained on historical data to predict future failures and recommend appropriate maintenance actions. Overall, the use of machine learning in predictive maintenance can help businesses maximize the lifespan of equipment, optimize maintenance schedules, and reduce the costs associated with unplanned downtime and unnecessary maintenance.
Another popular area for machine learning-guided applications is the health sciences. There is significant research and development on using machine learning for directed evolution, that is, optimizing the design and function of biological systems through iterative experimentation and computational analysis. In [2], the authors describe the use of machine learning to optimize the design of a protein sensor for detecting serotonin. In [3], the authors describe the use of a machine-learning approach to optimize the design of molecular complexes for applications in chemistry. In [4], the authors describe the use of machine learning to guide experiments in the directed evolution of proteins. In [5], the authors describe the use of a machine-learning approach for predicting the evolution of biological traits.
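As a concrete illustration of the genetic algorithm loop described in the list of soft computing approaches earlier in this section (a population of candidate solutions evolved through selection, crossover, and mutation), the sketch below maximizes the number of ones in a bit string. The toy fitness function, the tournament selection scheme, the elitism step, and all parameter values are assumptions chosen for this example, not a method prescribed by the text.

```python
import random

def fitness(bits):
    """Toy objective: count the ones in the bit string."""
    return sum(bits)

def tournament(population, rng, size=3):
    """Selection: pick the fittest of a few randomly drawn candidates."""
    return max(rng.sample(population, size), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Flip each bit with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=30, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: carry over the best unchanged
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best individual is always carried into the next generation, the best fitness recorded in `history` never decreases from one generation to the next; in an engineering setting the bit string would encode design parameters and `fitness` would be replaced by a problem-specific objective.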
3.2 Conventional Learning Techniques
Machine learning is needed for complex problems that cannot easily be solved with traditional rule-based programming. Machine learning algorithms are designed to learn patterns in data and make accurate predictions or decisions based on that learning. In situations where humans are unable to create rules or algorithms that accurately solve a problem, machine learning can be used to analyze data and identify patterns that help make predictions or decisions. Machine learning is also useful when the problem involves large amounts of data that would be difficult for humans to parse and analyze manually. A variety of conventional machine learning techniques are commonly used to process data and derive patterns or predictions. Some examples of these techniques include:
• Linear regression: A statistical technique that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the available data.
• Logistic regression: A statistical method used to analyze binary outcomes, such as dead/alive or pass/fail.
• Decision tree: A tree-like model used to classify data based on a set of decision rules.
• Support Vector Machine (SVM): A method that separates data points in multidimensional space by constructing hyperplanes.
• Naive Bayes: A probabilistic model that makes predictions based on the probability of an event occurring given the evidence.
• kNN: A classification method based on similarity measures that identifies the k nearest data points to a target data point to determine its classification.
• K-Means: A clustering algorithm that partitions data into K clusters based on similarity measures.
• Random forest: A method that constructs multiple decision trees and then combines their predictions to make a final prediction.
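As an illustration of one technique from the list above, the following is a minimal sketch of kNN classification using only the Python standard library. The function name `knn_predict`, the two normalized features (imagined here as vibration and temperature), and the "healthy"/"faulty" labels are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical, already-normalized features: (vibration, temperature)
train_X = [
    (0.20, 0.30), (0.30, 0.20), (0.25, 0.35),   # healthy machines
    (0.80, 0.90), (0.90, 0.80), (0.85, 0.95),   # faulty machines
]
train_y = ["healthy"] * 3 + ["faulty"] * 3

print(knn_predict(train_X, train_y, (0.82, 0.88)))  # faulty
print(knn_predict(train_X, train_y, (0.22, 0.28)))  # healthy
```

Because kNN relies on distances, features measured on very different scales should be normalized first, which is why the example values are assumed to lie between 0 and 1.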
These are just a few examples of the conventional machine learning techniques that are available. There are many others, and the choice of technique depends on the specific problem being addressed and the nature of the available data. However, machine learning raises many ethical concerns and has several limitations: it is not suitable for deterministic problems or for use cases where data is lacking or where there are interoperability issues, it can underperform as the dataset size grows, and it lacks explainability. Although machine learning has been an emerging paradigm shift for addressing several soft computing problems, there are many situations where a machine learning algorithm does not work as required [5].
The design issues of machine learning algorithms vary depending on the specific algorithm and its intended use. For any machine learning algorithm, the lack of good data is the key design issue, because without it the algorithm cannot learn effectively and may produce inaccurate or biased results. Another design issue is over-fitting, which happens when the algorithm is too complex and fits the training data too closely. This can lead to poor performance when predicting new data. Conversely, under-fitting happens when the algorithm is too simple and cannot capture the complexity of the data, leading to poor performance on both the training and testing data. Machine learning algorithms can also suffer from bias, which means they produce predictions that are systematically off from the true values. This can be due to various factors, such as sampling bias, feature bias, or algorithmic bias. Some design issues arise from the lack of interpretability of machine learning models. Black-box models, for example, can produce accurate predictions, but it can be difficult to understand how they arrived at those results. In conclusion, designing effective machine learning algorithms requires careful consideration of these and other issues, as well as ongoing testing and refinement to ensure that the algorithms perform well in various contexts.
When machine learning algorithms fail, there can be several different causes. Some of the most common reasons include:
1. Insufficient or poor quality data: Machine learning algorithms require large datasets with high-quality data to work effectively. If the data is incomplete, biased, or noisy, it can lead to poor model performance.
2. Over-fitting: If a machine learning model is trained too closely on a particular dataset, it may become overfit and therefore unable to generalize to new or unseen data.
3. Unrepresentative data: A machine learning model may fail if the training data is not representative of the real-world scenarios the model will encounter.
4. Algorithmic bias: If a machine learning model is trained on biased data, it may perpetuate and amplify the biases inherent in that data.
5. Incorrect assumptions: Sometimes assumptions about the data or the underlying problem are incorrect, leading to a mismatch between the problem and the model.
6. Limited control of learning algorithm: Adversaries have limited control and incomplete knowledge of the features of learning algorithms.
It is important to investigate the cause of the failure carefully and address the underlying issue to improve and optimize the model’s performance.
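The under-fitting and over-fitting behaviour described in this section can be explored with a small NumPy experiment. This is a sketch under assumptions: the sine-shaped ground truth, the noise level, the sample sizes, and the polynomial degrees 1, 3, and 9 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_signal(x):
    """Assumed ground-truth relationship used to generate synthetic data."""
    return np.sin(np.pi * x)

# Small noisy training set and a separate test set
x_train = np.linspace(-1, 1, 15)
y_train = true_signal(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_signal(x_test) + rng.normal(scale=0.2, size=x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given data."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree={degree}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

The training error can only go down as the degree increases, since a higher-degree polynomial can reproduce any lower-degree fit. The quantity to watch is the gap between training and test error: a straight line is expected to under-fit the curve, while a high-degree polynomial typically starts fitting the noise, which shows up as a test error noticeably larger than its training error.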
3.3 Deep Learning
Deep learning techniques [6] are essential in situations where the amount of data is very large and traditional machine learning algorithms may not be able to accurately capture the underlying relationships in the data. This is because deep learning algorithms use multiple layers to learn and represent the data in a hierarchical way, allowing them to identify complex patterns and relationships that may be difficult or impossible for traditional machine learning algorithms to capture. Additionally, deep learning techniques are particularly useful in areas such as computer vision, natural language processing, and speech recognition, where the data can be very high-dimensional and complex. These techniques can help achieve state-of-the-art performance in these fields, in some tasks matching or surpassing human-level performance. Overall, the need for deep learning techniques arises when traditional machine learning algorithms fall short due to the complexities and characteristics of the data.
Deep learning techniques typically involve training artificial neural networks with large amounts of data, allowing them to learn complex patterns and relationships within that data. The neural network is initially configured with random weights and biases, and is then trained on a set of training examples for which the correct output is known for each input. During training, the network’s weights and biases are adjusted to minimize the difference between its predicted output and the correct output. Deep learning techniques can involve various types of neural networks, including convolutional neural networks (CNNs) for image or video processing, recurrent neural networks (RNNs) for sequence data like speech or text, and generative adversarial networks (GANs) for generating new data that resembles the training data. Many deep learning models are trained using a process called back propagation, in which the error between the network’s output and the expected output is propagated backward through the layers of the network to adjust the weights and biases and so improve the output.
Overall, deep learning techniques involve using large amounts of data and powerful computing resources to train complex models that can make accurate predictions and classifications. Deep learning has many applications in fields like computer vision, natural language processing, and robotics. The main objective of deep learning techniques is to build better models than conventional machine learning techniques. However, there are several challenges associated with deep learning [7]. Key challenges include a lack of flexibility and multitasking, dynamic and frequently changing incremental data, dependency on the preciseness of the data, an abundance of poor-quality big data, under-fitting, over-fitting, and vulnerability to spoofing.
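The training procedure described above (random initial weights, a forward pass, and back propagation of the error to adjust weights and biases) can be sketched with NumPy on the classic XOR problem. This is a minimal sketch under assumptions: the network size (one hidden layer of four sigmoid units), the mean squared error loss, the learning rate, and the number of epochs are arbitrary choices for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training examples: the correct output is known for each input
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Random initial weights, zero initial biases
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []

for epoch in range(5000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    error = output - y
    losses.append(float(np.mean(error ** 2)))

    # Backward pass: chain rule, constant factors folded into the learning rate
    grad_output = error * output * (1 - output)
    grad_W2 = hidden.T @ grad_output
    grad_b2 = grad_output.sum(axis=0, keepdims=True)
    grad_hidden = (grad_output @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update of weights and biases
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

print(f"initial loss={losses[0]:.4f}, final loss={losses[-1]:.4f}")
```

Deep learning frameworks automate the backward pass through automatic differentiation; writing it by hand here only serves to show how the error flows from the output layer back to the earlier weights.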
Deep learning models are typically composed of multiple layers of interconnected neurons. Here is a brief description of some of the commonly used layers in deep learning:
• Input Layer: The first layer in a neural network is the input layer. It receives the input data and passes it through to the next layer.
• Convolutional Layer: This layer performs convolution operations on the input data. It applies a set of filters to the input data, sliding them over it to produce a set of feature maps.
• Pooling Layer: This layer reduces the size of the feature maps produced by the convolutional layer by down-sampling or subsampling them. This helps to reduce the number of parameters in the model and prevent over-fitting.
• Fully Connected Layer: Also known as a dense layer, this layer connects every neuron in the previous layer to every neuron in the current layer. It is typically used toward the end of the model to perform classification or regression.
• Dropout Layer: This layer randomly drops out some of the neurons in the previous layer during training. This helps to prevent over-fitting and improves the generalization of the model.
• Batch Normalization Layer: This layer normalizes the output of the previous layer by subtracting the mean and dividing by the standard deviation. It helps to improve the performance and stability of the model.
Note that these are just some of the commonly used layers in deep learning, and there are many more variations and types of layers that can be used depending on the specific problem and data. The high-level picture of deep learning is depicted in Fig. 3.1. The difference between machine learning and deep learning is summarized in Table 3.1 to aid understanding. It will help the reader decide which learning technique to apply to a given use case.
Fig. 3.1 Working of deep learning
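The convolutional, pooling, and batch normalization layers described above can be sketched with NumPy for a single-channel input. This is a minimal sketch under assumptions: stride 1, no padding, non-overlapping pooling windows, and no learnable scale or shift in the normalization; the function names and the example filter are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over the image (stride 1, no padding) to build a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Down-sample with non-overlapping windows; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def batch_norm(x, eps=1e-5):
    """Normalize each column over the batch: subtract the mean, divide by the std."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

image = np.arange(36, dtype=float).reshape(6, 6)    # toy 6x6 "image"
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # hypothetical vertical-edge filter
feature_map = conv2d(image, edge_kernel)
pooled = max_pool2d(feature_map)
print(feature_map.shape, pooled.shape)  # (5, 5) (2, 2)

batch = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
normalized = batch_norm(batch)
print(normalized.mean(axis=0))  # approximately zero for each column
```

The shapes show how the layers shrink the data: a 2x2 filter turns the 6x6 input into a 5x5 feature map, and 2x2 pooling reduces that to 2x2, which is the parameter-saving effect attributed to pooling above.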
Table 3.1 Difference between machine learning and deep learning
Sr. no | Machine learning | Deep learning
1 | Machine learning takes decision on past data | Deep learning takes decision on artificial neural networks
2 | It requires low data for training and model building | It requires huge data for training and model building
3 | Machine learning algorithms can be implemented on low-end systems | In contrast, deep learning algorithms require high-end system and computing infrastructure for implementation and deployment
4 | Divide and conquer strategy is used to solve the problem. The problem is divided into parts, they are solved and then the method is devised to combine them into the solution of original problem | In deep learning applications the problem is solved in end-to-end manner by following end-to-end workflow
5 | Testing phase requires more time in machine learning applications | Testing phase requires less time in deep learning applications
6 | Crisp rules and past data are used to take decision | Decision is taken based on multiple factors and hence it is difficult to interpret
7 | It has less scalability | It has better scalability
3.4 Deep Neural Networks
Deep neural networks (DNNs) are a class of artificial neural networks composed of more than two layers of interconnected nodes, which allows them to model and learn complex patterns in data. A DNN consists of an input layer, one or more hidden layers, and an output layer. In contrast to shallow neural networks, which typically have only one or two hidden layers, DNNs can be much deeper, sometimes on the order of dozens or even hundreds of layers. This depth enables DNNs to perform highly complex computations and learn abstract features from data, making them effective in a wide range of applications, such as image and speech recognition, natural language processing, and game playing.
DNNs produce their output through a process called forward propagation (or feed forward): the input data is passed through the layers of the network, and the output is computed from the learned weights of the connections between neurons. During training, the network adjusts these weights in small increments based on the difference between the actual output and the desired output. This process, called back propagation, allows the network to improve its accuracy over time as it is exposed to more training data. Each layer in a DNN is composed of many individual neurons, which are connected to neurons in the previous layer and the next layer. Each connection between neurons has an associated weight, which is adjusted during training to improve the accuracy of the network's output.
DNNs have been shown to be particularly effective at tasks such as object recognition in images and language translation, and they are also applied to autonomous driving. Their ability to learn complex patterns and relationships in data makes them a powerful tool for a wide range of applications in machine learning and artificial intelligence. The main challenges in using DNNs are listed below:
1. Lack of transparency (the "black box" problem): deep learning algorithms are often difficult to interpret and understand.
2. Supervised learning: training requires large amounts of labeled data, which can be difficult and expensive to obtain.
3. Solving inverse problems: neural networks have limitations in solving certain classes of problems.
4. Network compression: the size and computational requirements of deep neural networks need to be reduced.
5. Data scope: the size of the input data has to be reduced, and the large amount of data that deep learning requires has to be handled.
6. Applicability: deep learning may not be suitable for all kinds of data and tasks.
7. Interpretation: it is often difficult to explain the decision-making process of a deep neural network.
These are only some of the challenges associated with using deep neural networks; other challenges may arise depending on the specific application or problem.
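Forward propagation and the small weight adjustments made during back propagation can be sketched with a tiny network written in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the network has one hidden layer, uses sigmoid activations and a squared-error loss, and is trained on a hypothetical four-row toy dataset (an alarm that should fire when either of two binary indicators is set). The learning rate, layer size, and seed are arbitrary choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward propagation: inputs -> hidden layer -> one output neuron.
    The 1.0 appended to each layer's input plays the role of the bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = [mean_squared_error(data, w_hidden, w_out)]
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Back propagation: error signal at the output, then at each hidden neuron.
            delta_out = (out - y) * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            # Adjust every weight by a small step against its gradient.
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * v
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= lr * delta_hidden[j] * v
        losses.append(mean_squared_error(data, w_hidden, w_out))
    return w_hidden, w_out, losses

# Hypothetical toy data: output is 1 if either binary indicator is 1.
toy_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_hidden, w_out, losses = train(toy_data)
```

Comparing `losses[0]` with `losses[-1]` shows the error shrinking as the weights are adjusted, which is the improvement over time described above.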
3.5 Applications
Compared with other techniques, DNNs usually have a large number of layers, and they are commonly used in supervised, semi-supervised, and unsupervised learning applications. Optimization is a critical component of a DNN. Many DNNs use convolutional layers, which apply kernels to the input data to extract features for classification, and a DNN can learn representations of features at multiple levels of abstraction. Abundant available resources and the ability to extract global sequence features are other important characteristics of DNNs.
Developing with deep neural networks involves designing, training, and evaluating neural network models to solve machine learning problems. The general steps are as follows:
• Define the problem: Determine the specific machine learning problem that you want to solve using a neural network, for instance text classification, image classification, regression, time-series prediction, or anomaly detection.
• Choose the architecture: Select the most appropriate neural network architecture for the problem at hand. Common deep neural network architectures include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and deep belief networks (DBNs).
• Prepare the data: Gather and preprocess the data to make it suitable for feeding into the neural network. Depending on the problem, this could involve tasks such as scaling, normalization, encoding, and splitting into training, validation, and testing sets.
• Train the model: Use the prepared data to train the neural network with the chosen architecture. This typically involves optimizing a loss function using gradient descent and back propagation.
• Evaluate the model: Validate and test the trained neural network using appropriate evaluation metrics. This is important to determine whether the model generalizes well to new data.
• Tune the model: Fine-tune the hyper-parameters of the neural network to improve its performance on the evaluation metrics.
• Deploy the model: Use the trained neural network to make predictions on new data. This could involve integrating the model with other software systems or deploying it to a cloud-based service.
The specific details of developing with deep neural networks vary with the task and the data.
DNNs have a wide range of applications across various industries. A few examples are listed and explained below:
1. Image and speech recognition: DNNs can be used to recognize and classify images and speech. This technology has a wide range of applications, from identifying objects in photos to transcribing spoken words.
2. Natural language processing: DNNs are used in natural language processing (NLP) to understand and interpret human language. This enables applications such as sentiment analysis, chatbots, and language translation.
3. Fraud detection: DNNs can analyze large datasets and identify patterns that may indicate fraudulent activity, such as credit card fraud.
4. Robotics: DNNs can be used to build robots with advanced capabilities, such as object recognition and decision-making.
5. Autonomous vehicles: DNNs can be used in self-driving cars to recognize and react to traffic and road conditions.
6. Healthcare: DNNs have multiple applications in healthcare, including diagnosis, treatment planning, and drug discovery.
7. Financial analysis: DNNs can analyze large volumes of financial data to identify patterns and make predictions, improving financial planning and risk management.
8. Gaming: DNNs are used in gaming to improve AI opponents and create more realistic virtual environments.
These are just some examples of the many applications of deep neural networks. With the growth of artificial intelligence and machine learning, we can expect to see DNNs being applied in even more areas in the future.
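Returning to the development workflow in this section, the data preparation step (scaling and splitting into training, validation, and testing sets) can be sketched in plain Python as follows. The sensor rows, the 70/15/15 split, and the function names are assumptions made for illustration. The scaling bounds are fitted on the training rows only, so that no information from the validation or testing sets leaks into training.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle the rows reproducibly and cut them into three disjoint sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def fit_min_max(train_rows):
    """Learn per-column (min, max) bounds from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def apply_min_max(rows, bounds):
    """Scale each column to [0, 1] using the bounds learned from the training set."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)]
            for row in rows]

# Hypothetical readings: [temperature in degrees C, vibration amplitude].
rows = [[20.0 + i, 0.1 * i] for i in range(20)]

train_rows, val_rows, test_rows = train_val_test_split(rows)
bounds = fit_min_max(train_rows)
scaled_train = apply_min_max(train_rows, bounds)
scaled_val = apply_min_max(val_rows, bounds)  # may fall slightly outside [0, 1]
```

Validation or test values can fall outside the range seen in training, which is expected when the bounds come from the training set alone.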
3.6 Popular Techniques
DNNs are a type of machine learning algorithm loosely inspired by the workings of the human brain. They are made up of multiple layers of artificial neurons that process information and attempt to make more accurate predictions as the data propagates through the layers. Compared to traditional neural networks, DNNs have more layers and are capable of processing much larger amounts of data. This allows them to identify complex patterns and relationships within the data, making them well-suited for tasks such as image recognition, natural language processing, and speech recognition. One of the key advantages of DNNs is their ability to automatically extract features from raw data: they can learn to identify the important features of a dataset without explicit feature engineering. Another advantage is that DNNs can continue to improve their accuracy as they are trained on additional data. However, DNNs are computationally expensive to train and can require large amounts of data to achieve good accuracy. They are also often considered black box models, meaning that it can be difficult to understand how they make their predictions [8]. Overall, DNNs are a powerful tool for a variety of machine learning tasks, and their popularity has continued to grow in recent years as new techniques and architectures are developed to improve their performance.
There are several popular DNN techniques, and the choice among them depends heavily on the type of data and the underlying application. The design issues of a DNN also play a crucial role in the selection of an appropriate technique. A few popular techniques are listed and explained below:
1. Convolutional Neural Networks (CNNs): These are used mainly in image and video processing tasks such as image classification, object detection, and segmentation.
2. Recurrent Neural Networks (RNNs): These networks are used mainly in sequence modeling tasks such as sequence classification and time-series prediction.
3. Long Short-Term Memory Networks (LSTMs): LSTMs are a type of RNN specially designed to capture long-term dependencies in data sequences.
4. Auto-Encoders: Auto-encoders are neural networks trained to copy their input to their output, usually through a lower-dimensional internal representation. They can be used for dimensionality reduction, noise reduction, and anomaly detection.
5. Generative Adversarial Networks (GANs): GANs consist of two neural networks pitted against each other. They can be used to generate realistic synthetic data and are used in image and video generation tasks.
6. Deep Belief Networks (DBNs): DBNs are hierarchical models that consist of multiple layers of restricted Boltzmann machines (RBMs). They are used mainly in unsupervised learning tasks such as density estimation, data clustering, and anomaly detection.
DNNs have become increasingly popular for solving complex problems, but they also have drawbacks. First, they require large amounts of data to be trained effectively. Without sufficient data, the model may not be able to learn the patterns in the data and may perform worse than other techniques. Second, training a DNN can be computationally expensive, especially with large amounts of data, which makes it challenging to train DNNs on standard hardware. Third, DNNs are often referred to as a "black box" because it can be difficult to understand how the network makes its predictions or decisions. This makes it hard to troubleshoot issues or to interpret the results of the model. DNNs can also be prone to over-fitting the training data, which leads to poor performance on new, unseen data. Over-fitting occurs when the model becomes too complex and starts to memorize the training data rather than learning patterns that generalize. Finally, DNNs may not be as transferable to new tasks or domains as other machine learning techniques, because they are often trained on large amounts of data specific to a particular domain or application and may not generalize well to new types of data.
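Among the techniques listed above, the defining idea of a recurrent neural network, a hidden state that is carried from one time step to the next, can be sketched in a few lines of plain Python. This is a single-unit forward pass only, with hypothetical fixed weights (`w_x`, `w_h`, `b`) rather than trained ones; a practical RNN or LSTM would use weight matrices, learn them by back propagation, and be built with a deep learning library.

```python
import math

def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence.

    At each step the new hidden state mixes the current input with the
    previous hidden state: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    """
    h = 0.0
    states = []
    for x_t in sequence:
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

# A single pulse followed by zeros: the hidden state keeps a decaying trace of it.
states = rnn_forward([1.0, 0.0, 0.0])
```

Because the later states remain positive although the later inputs are zero, the cell retains a fading memory of the earlier input, which is what makes recurrent networks suitable for sequences and time series.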
3.7 Summary
The use of intelligent approaches based on soft computing is increasing across real-world use cases. Deep learning is a family of machine learning methods built on artificial neural networks with multiple layers of neurons. Each layer in a deep neural network extracts progressively higher-level features from the input data, allowing more accurate and sophisticated predictions or decisions to be made. Deep learning is particularly well-suited to complicated, high-dimensional problems such as image and voice recognition and natural language processing. Because computational models with multiple processing layers can represent data at multiple levels of abstraction, they are able to learn complex patterns in data. This chapter focused on the need for DNNs in various applications. Design issues, applications, and popular techniques of DNNs were presented and discussed in the last part of the chapter. To conclude, the DNN has become an important tool in modern machine learning and is driving advances in many areas, including image and speech recognition, natural language processing, and autonomous driving.
References
1. Xiaolong H, Huiqi Z, Lunchao Z, Nazir S, Jun D, Khan AS (2021) Soft computing and decision support system for software process improvement: a systematic literature review. Scientific Programming, vol 2021, Article ID 7295627, 14 p
2. Unger EK, Keller JP, Altermatt M, Liang R, Matsui A, Dong C, Hon OJ, Yao Z, Sun J, Banala S, Flanigan ME, Jaffe DA, Hartanto S, Carlen J, Mizuno GO, Borden PM, Shivange AV, Cameron LP, Sinning S, Underhill SM, Olson DE, Amara SG, Temple Lang D, Rudnick G, Marvin JS, Lavis LD, Lester HA, Alvarez VA, Fisher AJ, Prescher JA, Kash TL, Yarov-Yarovoy V, Gradinaru V, Looger LL, Tian L (2020) Directed evolution of a selective and sensitive serotonin sensor via machine learning. Cell 183(7):1986–2002.e26. https://doi.org/10.1016/j.cell.2020.11.040. PMID: 33333022; PMCID: PMC8025677
3. Janet JP, Chan L, Kulik HJ. Accelerating chemical discovery with machine learning: simulated evolution of spin crossover complexes with an artificial neural network. J Phys Chem Lett 9(5)
4. Yang KK, Wu Z, Arnold FH (2019) Machine-learning-guided directed evolution for protein engineering. Nat Methods 16:687–694
5. Suciu O, Mărginean R, Kaya Y, Daumé H III, Dumitraș T (2018) When does machine learning FAIL? Generalized transferability for evasion and poisoning attacks. In: 27th USENIX security symposium, August 15–17, 2018, Baltimore, MD, USA
6. Ahuja S, Panigrahi BK, Dey N, Gandhi T, Rajinikanth V (2020) Deep transfer learning-based automated detection of COVID-19 from lung CT scan slices. Appl Intell
7. Alzubaidi L, Zhang J, Humaidi AJ et al (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8:53
8. Sun S, Chen W, Wang L, Liu X, Liu T-Y (2016) On the depth of deep neural networks: a theoretical view. In: Schuurmans D, Wellman MP (eds) Proceedings of the thirtieth AAAI conference on artificial intelligence, February 12–17, 2016, Phoenix, Arizona, USA. AAAI Press, pp 2066–2072
Chapter 4
Predictive Maintenance
4.1 Overview
Predictive maintenance is one of the important applications of predictive analytics and is becoming a popular tool for preventive mechanisms. In the current era of digitization and automation, smart computing plays a crucial role [1]. An increasing number of companies are becoming more reliant on technology and are making their operations artificial intelligence (AI) enabled, and this AI revolution is bringing big changes to the labor market. These advancements in operations and technologies are in turn having a significant impact on human lives and livelihoods. The use of AI for predictive analytics and smart computing has become an integral component of the use cases surrounding us. This is giving rise to the maintenance counterpart of the fourth industrial revolution, i.e., Maintenance 4.0 [2]. Maintenance has become a vital factor in reaching corporate and economic goals. The current trend of Maintenance 4.0 leans toward preventive mechanisms enabled by a predictive approach and condition-based smart maintenance. Intelligent decision support, earlier detection of spare part failure, and fatigue detection are the main elements of an intelligent and predictive maintenance system (PMS) leading toward Maintenance 4.0. The main objective of a PMS is to use emerging technologies, such as sensors, actuators, and smart devices, together with machine learning and data science techniques [3], to ease the work of humans and free them for higher-quality work.
For any company, modern and smart equipment, minimal unexpected breakdowns, and optimal maintenance time are key performance indicators. These can be achieved using new technologies and, essentially, by using past/historical data to build the prediction model for predictive maintenance.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. N. Mahalle et al., Predictive Analytics for Mechanical Engineering: A Beginners Guide, SpringerBriefs in Computational Intelligence, https://doi.org/10.1007/978-981-99-4850-5_4
Machine learning (ML) plays a key role in every PMS: it helps to optimize repair time and to reduce unexpected and uncertain downtime, which leads to an improved lifetime. The outcomes of a PMS also help to improve the utilization rate of all resources and assets in the company. Typical use cases are listed below:
• Spare part failure: Predicting when a part will fail so that a replacement is ready
• Vibration analysis: Estimating the variation in vibrations of critical machine components
• Fatigue analysis: Predicting fatigue in regular and important components of the machine
• Oil analysis: Predicting the state of the machine's oil lubrication system
• Preventive healthcare: Predicting the occurrence of a disease or situation to support a preventive approach
• Acoustic analysis: Predicting and detecting gas and vacuum leaks
Many information technology service and solution providers use PMS for business development and to improve their corporate standing. The following are some real-world examples of PMS:
1. PredictEasy
PredictEasy is a no-code predictive analytics platform that enables users to build data science applications for prediction. It provides functionalities such as descriptive, predictive, and prescriptive analysis and the building of machine learning-based models [3].
2. Amazon
Amazon recently launched need-based industrial ML services such as Amazon Monitron and Amazon Lookout for Equipment. Monitron is an end-to-end equipment monitoring solution that detects abnormal equipment conditions in applications where there is no existing sensor network. Lookout for Equipment is designed for equipment that already has a sensor network and uses ML models to detect anomalies in equipment behavior. This kind of predictive maintenance is useful for scheduling maintenance regularly and helps to avoid equipment downtime.
3. Nestle
Nestle [4] is known worldwide for its coffee service, which has become more popular in the corporate world over the last decade. Nestle's corporate coffee vending machines are now Internet of Things (IoT) enabled, with the capacity to serve coffee to more than 2500 users at a time. The IoT-based machines allow remote configuration, and PMS helps to achieve smooth-running maintenance, leading to an improved lifetime.
4. Chevron
Chevron and similar organizations involved in renewable fuel [5] use machine learning to predict preventative maintenance needs for high-speed metro lines across the country. The data generated by the sensor ecosystem of the metro lines is used to predict key issues and to detect when a specific part needs replacement.
In addition, General Motors uses AI to detect component failure, with multiple cameras deployed on assembly lines. Robots equipped with the cameras analyze the motion of failing robotic components, and this data is used to avoid unplanned outages. VOLVO applies AI to large datasets in its early warning system to track the events that happen during machine operation (a rise and/or decrease in the different machine parameters). These large datasets are processed using AI techniques, and the outcomes are used to assess the impact of different events on unexpected breakdowns and failure rates. Corrosion is one of the major issues faced by the oil, gas, shipping, chemical, and utility industries. Timely detection of corrosion is very important for protecting human life and preventing environmental damage. Human intervention in this case is not affordable and can lead to serious consequences in the long run. Image-based automated corrosion monitoring and detection using AI is one of the most successful projects powered by Infosys. A deep learning approach is used for better prediction, enabling iterative gathering of new data and model re-training.
The sample use cases mentioned above require effective predictive models for better prediction. They also show that having a PMS in place brings many benefits to the enterprise. The potential benefits of PMS are listed below:
Benefits of PMS
• Reduced unexpected downtime
• Regular scheduling of maintenance activities
• Improved lifetime of machines, parts, and equipment
• Minimum breakdown and failure of industrial spare parts
• Improved service life for industrial IoT
• Optimized time and cost in logistics
The use of PMS is positively impacting a wide range of mechanical manufacturers, all of whom are looking for a strong AI roadmap toward PMS. PMS needs to be incorporated right from the design phase in order to avoid major defects that would cost more to fix at a later stage. Technological solutions in PMS also require enhanced processes with respect to quality, work safety, and AI-based operational manufacturing technologies, which will help to create a better landscape. PMS requires the design and development of preemptive algorithms that predict the next state of a machine or a part failure, so that provisions can be made to prevent the failure from happening. The potential challenges in designing PMS are listed below:
• There is a scarcity of the expertise required to build predictive analytics solutions, as it requires a deep understanding of inferential statistics and emerging programming languages.
• PMS are standalone tools and need to be integrated with existing business applications. In addition, PMS are hard to scale up, deploy, and upgrade. Integration of PMS with existing AI-based tools is another major challenge.
• A lack of technical expertise and of resources in terms of time, scale, and talent is a very important challenge in the design and development of PMS.
• The quality of data needed for better data storytelling and effective prediction toward a preventive approach is also a crucial challenge for accurate predictions.
4.2 Predictive Maintenance and Machine Learning
Data is the fuel of any efficient PMS, and a standard dataset includes categories such as asset usage data, maintenance data, fatigue details of the assets, environmental data, and equipment/machine state data. Most of the data used in a PMS comes from sensors deployed in various parts of the ecosystem. Smart sensors regularly or periodically collect data on the status of the equipment, machine, or component, and this data is given as input to the PMS. These datasets are available from internal and external sources; internal networks deployed for infrastructure, gateways, the enterprise cloud, and IoT networks are used to distribute the collected data. ML models are then applied to these datasets. ML algorithms are designed and developed using a soft computing approach; the difference between hard computing and soft computing is shown in Table 4.1. Using the benefits of soft computing, intelligent and self-learning ML algorithms can build an ML model. This model is developed using a training dataset and rules based on inferential statistics. The models for PMS are incremental and progressive in nature and can be refined in terms of capabilities and prediction with new training datasets. ML models and algorithms generally perform better on larger datasets than on smaller ones. These self-learning ML algorithms are proactive in nature and can improve their performance through incremental learning. The main stakeholders, such as managers looking after maintenance operations and technicians looking after monitoring, can be empowered by PMS, which aims at the proactive prevention of asset failures.
Table 4.1 Hard and soft computing
| Sr. no | Hard computing | Soft computing |
|---|---|---|
| 1 | Based on an exact model | Based on an uncertain model |
| 2 | Uses binary logic and crisp systems | Uses formal logic and probabilistic reasoning |
| 3 | Deterministic in nature | Stochastic in nature |
| 4 | Performs sequential computations | Capable of performing parallel computations |
| 5 | Needs exact data | Can work on ambiguous data |
| 6 | Produces crisp and precise results | Produces approximate results |
| 7 | Programs need to be written | Can evolve its own programs |
| 8 | Cannot address randomness | Can address randomness |
| 9 | Example: merge sort, binary search | Example: AI and ML use cases like PMS |
ML is mainly used to design and develop smart PMS, where the first step is data acquisition. Collecting condition data and setting a baseline for the equipment is a crucial step. Correct and reliable data is very important for designing and building an efficient model with more precise results. IoT-based devices, sensor technologies, and interactive dashboards are used to collect the right data from the equipment in service. The data collected from these sources can be structured or unstructured. Structured data is organized in rows and columns (table form) and includes spreadsheet data and data stored in databases, whereas unstructured data includes image and video datasets. PMS also requires data on environmental parameters (humidity, temperature, etc.). The taxonomy of ML datasets is depicted in Fig. 4.1. A huge amount of open-source data is available on the web for building ML models. However, it is important to understand the perspective from which these datasets were generated; otherwise, the entire analysis and prediction can fail. It is therefore recommended to generate datasets as per the requirements and to have in mind the questions to be posed to the dataset [6]. In addition, the following points are crucial to consider before processing the datasets.
Fig. 4.1 ML datasets taxonomy
• Purpose
The purpose of the dataset and of the analytics should be well defined and formulated before the ML model is built. The purpose of the algorithms should also be clear before they are applied to the datasets.
• Context
Many parameters, including the order, hierarchy, clarity, relationships, and conventions of the datasets, define their context.
• Machine learning and annotations
Supervised and semi-supervised ML require annotated datasets, whereas unsupervised ML works with non-annotated datasets. In either case, time and resources must be invested in data preparation.
• General requirements
In addition, the inputs to the PMS, data linearity and complexity, the limitations and constraints of the system, and feature extraction and analysis are general points to be considered.
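The ideas in this section of setting a baseline for the equipment from collected sensor data, and of refining it incrementally as new data arrives, can be sketched in plain Python. The sketch below is an assumption-laden illustration rather than a method prescribed by the text: it keeps a running mean and standard deviation using Welford's online algorithm and flags a reading that lies more than a chosen number of standard deviations from the baseline. The class name, the temperature readings, and the threshold of 3 are all hypothetical.

```python
import math

class RunningBaseline:
    """Incrementally updated baseline (mean and spread) of one sensor channel."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new reading into the baseline (Welford's online algorithm)."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Sample standard deviation of the readings seen so far."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, value, threshold=3.0):
        """Flag a reading that is more than `threshold` deviations from the mean."""
        if self.n < 2 or self.std == 0.0:
            return False
        return abs(value - self.mean) / self.std > threshold

# Hypothetical bearing temperatures (degrees C) recorded during normal operation.
baseline = RunningBaseline()
for reading in [70.1, 69.8, 70.3, 70.0, 69.9, 70.2]:
    baseline.update(reading)

overheating = baseline.is_anomalous(85.0)
normal = baseline.is_anomalous(70.1)
```

Because `update` touches only three numbers per reading, the baseline can be refined continuously on an edge device without storing the full history.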
4.3 Predictive Maintenance Model
A PMS model requires a predictive algorithm and sensor data from the equipment. There are three major maintenance strategies: reactive maintenance, preventive maintenance, and predictive maintenance. To design a more accurate PMS, it is important to understand all three strategies fundamentally, with examples.
• Reactive Maintenance
In reactive maintenance, the machine is used until its full capacity is exhausted, and the repair is scheduled when a particular part of the machine fails. This approach is neither feasible nor affordable for complex and expensive systems. It can be applied to simple and inexpensive systems, such as a simple smart home automation system for bulbs.
• Preventive Maintenance
Regular checks and monitoring are carried out to detect and prevent the failure of parts before it occurs. In this case, however, it is challenging to decide when to schedule the maintenance. More human effort is required in the tracking process, and scheduling maintenance too early wastes usable machine life.
• Predictive Maintenance
This is the task of estimating the time remaining before a machine part fails, which enables maintenance to be scheduled at the optimal time. This approach is more suitable for complex and expensive systems, where it can be decided well in time which parts need to be fixed.
These strategies are depicted in Fig. 4.2, which shows the relation between the time window and the machine status for the three strategies.
Fig. 4.2 Maintenance strategies
The main outcome of any PMS model is the time window, essentially the number of days, within which the machine or a part is likely to fail, so that a replacement can be kept ready and maintenance can be scheduled. A PMS model mainly identifies and establishes the relationship between the extracted features and the fatigue analysis of the equipment. This in turn enables estimation of the time remaining before failure and prediction of the time window for maintenance. The main steps required for building a PMS model are listed below:
• Data collection
This step deals with the collection of data under varied maintenance conditions and time windows with the help of sensors and IoT.
• Knowing data
In this step, the data is understood in terms of syntax and semantics and then cleaned by filtering out the noise.
• Condition indicators
In this step, condition indicators (extracted features whose behavior varies over time) are identified in order to draw a clear line between normal and faulty operation.
• Model training
The extracted features, in the form of condition indicators, are then used to build the model for predicting residual service life. This model is then applied to multiple datasets to make predictions.
• Model implementation
An edge device or the cloud layer is the appropriate place to implement, deploy, and integrate the ML model for PMS.
• Model assessment
Key performance parameters are then used to evaluate the model, and the metrics derived from the confusion matrix (accuracy, precision, recall, etc.) are validated against the requirements of the underlying application.
This ML workflow for PMS is presented in Fig. 4.3.
Fig. 4.3 ML workflow for PMS
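The model training step above, using a condition indicator to estimate the time remaining before failure, can be sketched under a deliberately simple assumption: the indicator degrades roughly linearly and failure occurs when it crosses a known threshold. The text does not prescribe this model; the straight-line fit, the vibration values, and the threshold below are hypothetical and serve only to show how a time window for maintenance could be derived.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
             / sum((t - mean_t) ** 2 for t in times))
    return slope, mean_v - slope * mean_t

def remaining_useful_life(times, values, failure_threshold):
    """Extrapolate the fitted trend to the threshold and return the time left.

    Returns None when the indicator shows no upward (degrading) trend.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    time_of_failure = (failure_threshold - intercept) / slope
    return max(0.0, time_of_failure - times[-1])

# Hypothetical condition indicator: daily RMS vibration (mm/s) of a bearing.
days = [0, 1, 2, 3, 4]
vibration_rms = [1.0, 1.5, 2.0, 2.5, 3.0]

# Assumed failure threshold of 5.0 mm/s (an illustrative value, not a standard).
rul_days = remaining_useful_life(days, vibration_rms, failure_threshold=5.0)
```

With these values the trend rises by 0.5 mm/s per day, so the threshold is reached four days after the last measurement, which gives the time window in which a replacement should be ready.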
4.4 Implementation of Predictive Maintenance
Distinctive features play an important role in PMS, mainly in building the ML models. Because these features are used to train the model, their distinctness enables more accurate predictions, and they can also be used to estimate the equipment's current state. The model can be progressively fine-tuned by feeding it new data. Time-domain features are useful in identifying condition indicators [7] for better forecasting and prediction. Common statistical operations such as the mean, variance, and skewness are used to select useful time-domain features. A boxplot is a good tool to confirm whether these time-domain features can differentiate between the different operating conditions. In addition, several factors are important to consider while developing a deployable PMS model. These factors are listed below:
1. Erroneous historical data
Data for both normal and abnormal (failure) conditions is required to train the ML model for PMS. The dataset must therefore contain training examples of both conditions from the historical data.
2. Past maintenance track
The past maintenance track of the machine, i.e., when the machine was repaired in the past, is one of the most valuable features in the dataset required for PMS. Inaccurate data for this feature is likely to produce wrong results and hence lead to the failure of the ML model for PMS.
3. Machine state
Identification of fatigue or aging patterns, together with the related periodic or continuous data values, is very valuable. Anomalies in fatigue or aging degradation also play a crucial role in the ML model. These data values give the correct state of the machine and in turn enable correct prediction of machine health.
4. Metadata of equipment
The metadata of the equipment includes model details, demographic details, service details, the make of the machine, etc. These static features capture the technical information of the equipment and are the icing on the cake when designing a PMS.
5. Clear prediction metrics
The quality of prediction is assessed on several performance metrics such as precision, recall, accuracy, and F1 score. This set of key metrics must be explored, analyzed, and confirmed in order to assess the performance of any PMS.
With the emergence of Industry 4.0 and Society 5.0 [8], digital technologies and automation are driving the requirement for ML-based PMS. ML-based PMS will reduce the burden on humans by offloading it to the machine, so that human expertise can be engaged in higher-quality work. PMS in mechanical engineering is presented and discussed with different case studies in the next chapters.
4.5 Summary
Compliance with safety measures, preemptive actions toward safety, and improved lifetime with minimum downtime are some key outcomes of PMS. This chapter has discussed the main task of the book, i.e., predictive maintenance, and the key components required for building a predictive maintenance model. The next part of the chapter focused on prominent use cases of PMS in mechanical engineering along with their benefits. PMS and machine learning, along with the implementation steps, were presented and discussed in the last part of this chapter.
References
1. Dey N, Wagh S, Mahalle PN, Shafi Pathan M (2019) Applied machine learning for smart data analysis. CRC Press. ISBN 9781138339798
2. Cachada A et al (2018) Maintenance 4.0: intelligent and predictive maintenance system architecture. In: 2018 IEEE 23rd international conference on emerging technologies and factory automation (ETFA), pp 139–146. https://doi.org/10.1109/ETFA.2018.8502489
3. www.predicteasy.com
4. https://www.computerweekly.com/news/450403200/Nestle-picks-Telefonica-to-build-IoT-forcoffee-machines
5. https://www.connection.com/~/media/pdfs/solutions/manufacturing/cnxn-ai-in-manufacturing-whitepaper.pdf?la=en
6. Joshi PM, Mahalle PN (2022) Data storytelling and visualization with Tableau: a hands-on approach. CRC Press
7. Shinde GR, Kalamkar AB, Mahalle PN et al (2020) Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art. SN Comput Sci 1:197
8. Patil RV, Ambritta NP, Mahalle PN, Dey N (2022) Medical cyber-physical systems in society 5.0: are we ready? In: IEEE transactions on technology and society
Chapter 5
Predictive Maintenance for Mechanical Design System
5.1 Overview
Prognostics and health management (PHM) is a growing discipline devoted to scientific methods for managing the health status of engineered systems. Its three main focuses are constructing health indicators, predicting remaining useful life, and managing overall health. By examining these indicators, we can forecast a design's remaining useful life and determine the degree to which a core component will fail to function as intended. To cut maintenance costs and lower safety hazards, PHM works to anticipate probable failures before they happen. Prognostics and health management is essential for reducing downtime and maintenance expenses for rotating machinery components such as bearings, and this is accomplished by forecasting the remaining useful life of the bearings. Early fault detection and diagnosis (FDD), using condition monitoring (CM) of rotating machinery (RM), is crucial. These procedures are necessary for the prompt identification of faults and the mitigation of serious damage. Usually, mechanical signals are recorded and processed in order to anticipate gear and bearing failure using machine learning (ML) or deep learning (DL) approaches.
5.2 Design Issues
Bearings and gears are the most frequently used mechanical parts of rotating machinery, and their state of health is crucial. Based on information gathered from condition monitoring, it is possible to anticipate when such components will fail in practice. The significance of the bearing as a mechanical component in any machine is shown by research demonstrating that flaws in rolling element bearings can result in a considerable number of failures. In order to analyze and predict the condition of rolling element bearings and gears, this chapter presents a thorough overview of recent developments and efforts in machine learning (ML) approaches. ML technology is anticipated to have a significant effect on the prognostics and health management of rolling element bearings and gears, and the chapter also discusses in detail the various difficulties involved in its usage.
In modern rotating equipment, the bearing is one of the most important mechanical components. Operating in hostile settings with high loads, high speeds, and little lubrication can result in bearing issues. Such operating circumstances result in increased corrosion, frictional torque, overheating, and performance loss over time [1, 2]. Failure of the device could result if these flaws are not detected [3]. Any industry that uses machinery can experience considerable financial losses from sudden bearing failures, which can be broken down into tangible and intangible expenses. Tangible costs, including labor, supplies, and materials, are the easier ones to account for in mechanical maintenance.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. N. Mahalle et al., Predictive Analytics for Mechanical Engineering: A Beginners Guide, SpringerBriefs in Computational Intelligence, https://doi.org/10.1007/978-981-99-4850-5_5
5.2.1 Perception of Diagnostics and Prognostics
Condition-based maintenance (CBM) is a maintenance technique that uses data gathered from condition monitoring to guide maintenance decisions. Data collection, data processing, and maintenance decision-making are the three fundamental parts of CBM. Both diagnostics and prognostics are crucial components of CBM. Diagnostics involves detecting faults as they occur, isolating them, and identifying them. Prognostics, on the other hand, focuses on anticipating faults before they occur. The goal of diagnostics is to find faults and their root causes, which is akin to an after-the-event investigation. Prognostics requires assessing and predicting a system's health, including early failure indications and the projected remaining useful life (RUL).
The diagnostic process consists of examining and analyzing a condition, an issue, or a cause. The diagnostic framework includes a system for data collection, modules for signal processing and feature extraction, and modules for extracting fault-related information from expert knowledge, real-world observations, and historical data. Prognostics, however, requires additional steps. Performance assessment, degradation modeling, failure analysis, and prediction are all fundamental components of prognostics, in addition to feature extraction and the use of fault knowledge. Predictive algorithms developed from time series analysis, statistics, or artificial intelligence technologies can forecast when device performance may decline to unacceptable levels of degradation. Early failure diagnosis, evaluation of present health, and estimation of remaining useful life are the current areas of emphasis in prognostics and health management research. Building predictive models with the aid of CBM strategies can help to achieve optimum performance, avoid downtime, and make precisely timed maintenance decisions.
The three primary groups of prognostic approaches are data-driven methods, model-based methods, and hybrid methods. Model-based approaches use data from simulation models run under idealized degradation conditions to calculate the system's remaining life [4]. Data-driven methods track the progression of growing faults without relying on the accuracy of a system model, while hybrid approaches, which forecast the time to reach a predefined threshold level, combine the results of model-based and data-driven methods to generate robust predictions [5]. This technique uses signals from the machine and integrates techniques that have long been used for prediction because of their proven success in machinery applications, such as vibration signature analysis and oil analysis [6–10]. A list of PHM tools for typical critical components is given in Table 5.1, together with details on prevalent problems with the components, likely failure modes, characteristics, available data formats, features, and algorithms for prognostics and diagnostics.
Table 5.1 Information for use of prognostics for bearing and gear
Sr. No | Component | Failure type | Characteristics | Measuring parameters | Features
1 | Bearing | Inner race, outer race, roller, and cage failures | High noise, low amplitude | Vibration, acoustic emission, oil debris | Time domain statistical characteristics, vibration characteristic frequency, size, quantity
2 | Gear | Gear crack, manufacturing error, tooth pitting/spall, gear fatigue/wear, tooth missing | High dynamic signal modulated with other factors, high noise | Vibration, acoustic emission, oil debris | Vibration signature frequencies, time domain statistical features
Algorithm: 1. Artificial neural networks 2. Support vector machine 3. K-nearest neighbor 4. Decision trees 5. Naïve Bayes 6. Deep learning approaches 7. Principal component analysis 8. Hidden Markov modeling 9. Wavelet transform (WT) 10. Gaussian regression 11. Short-time frequency transform
5.2.2 ML Algorithms for Fault Detection
Different machine learning (ML) algorithms are employed for fault detection analysis. The field of ML algorithms is continuously evolving and expanding. This section therefore provides an overview of several frequently used ML algorithms in the context of fault diagnosis and prognosis of rolling element bearings (REBs). These algorithms are depicted in Fig. 5.1.
Fig. 5.1 ML algorithms classification for fault detection (Machine learning. Supervised learning: logistic regression approaches, decision tree, support vector machine, Naïve Bayes, k-nearest neighbor, wavelet transform, artificial neural network. Unsupervised learning: hidden Markov modeling, principal component analysis, self-organizing map, Gaussian mixture model. Reinforcement learning. Deep learning)
• Support Vector Machine
Vapnik et al. [11] initially proposed the Support Vector Machine (SVM). SVMs are supervised machine learning techniques that can address classification and regression problems. The basic goal of an SVM is to find a hyperplane that effectively divides the training data into two groups. To do this, SVMs use a kernel function to project the feature space into a higher-dimensional space. In this projected space, the SVM seeks the separating hyperplane that maximizes the margin around the decision boundary. The technique is known to work well in real-time analysis and performs particularly well for classification and prediction tasks when only a small number of samples is available [12]. A method for machine state prediction based on vibration signals using the wavelet transform and SVM was proposed in a study by Y. Li et al. [13]. The combination of wavelet transform and SVM improves the modeling method. In particular, in scenarios with strong decision constraints, the SVM-WT damage prediction model successfully reduces feature abnormalities and vibration signal complexity, improving decision accuracy. There is no default rule for selecting a kernel for an SVM.
SVMs are binary classifiers by nature, so they can only separate two classes at once. Many real-life problems, nevertheless, call for multiclass classification, and a multiclass SVM is employed in such cases. The "one-versus-all" and "one-versus-one" approaches are the two most popular multiclass SVM methods. In the one-versus-all strategy, one classifier is trained per class (N classifiers in all), where samples from that class are treated as positive samples and all other samples as negative. However, this method frequently leads to an unbalanced dataset. By contrast, the one-versus-one method creates N(N-1)/2 classifiers by training a separate classifier for each pair of classes. Despite being computationally demanding, this strategy is often insensitive to skewed datasets.
• Wavelet Transform
The wavelet transform expresses a time signal in terms of scaled and translated versions of an oscillating waveform of finite length or fast decay. This approach is particularly effective in analyzing non-stationary signals and offers superior resolution compared to other time–frequency analysis techniques.
• Artificial Neural Network (ANN)
The artificial neural network (ANN) model mimics the structure and operation of biological neurons and offers quick decision-making even in the presence of disturbances. It also excels at modeling complex, nonlinear, multidimensional data [14]. ANNs are especially well suited to dynamically complicated systems that exhibit nonlinear behavior and unstable processes. ANNs, which are supervised machine learning algorithms, can address a variety of problems, including pattern recognition, clustering, classification, regression, and nonlinear function approximation. ANNs are layered structures of many nodes that work together to resemble how the human brain works. Figure 5.2 depicts the structure of a simple neural network (NN), including the input, hidden, and output layers, the connection weights and biases, the summation nodes, and the activation functions.
Triangles are used to represent bias threshold nodes. The input layer does not perform any processing; it only passes on the incoming signals. The hidden and output layers, on the other hand, process the signals and produce the outputs. Neurons in the hidden layer act as agents that arrive at coherent solutions. Biases are used to prevent null results even when the inputs are zero. In a simple network, neurons have no connections within their own layer, and each neuron is connected to every neuron in the next layer. During the training phase, information moves forward from the input layer to the output layer, and the computed error (estimated output minus actual output) is propagated backward through the network to adjust the weights. This indicates that there is a directed relationship between the nodes. For instance, the link from node 1 to node 2 differs from the link from node 2 to node 1. When artificial neural networks are being trained, the weights (set randomly at first) are adjusted until the error falls below a predefined acceptable level [15].
Fig. 5.2 Artificial neural network
ANNs are well suited to complicated systems with nonlinear behavior and unstable processes. However, ANN applications for bearing prognosis need a large amount of representative training data covering the degradation process. Due to factors like environmental noise and machine-to-machine variation, discrepancies between the distributions of training and test data frequently occur in real-world applications, and performance may be significantly affected.
• Decision Tree
A decision tree (DT) is an induction technique that classifies future events using a top-down hierarchy and recursive logic. It can be likened to a flowchart or map that shows the possible outcomes of a series of related choices. The algorithm follows a rule-based approach, where the rules are derived from the training samples using information gain. Starting from the root of the tree and following the branches, a decision tree decides on or classifies data objects until it reaches a terminal node, also known as a leaf of the tree. Each path applies a different set of rules in sequence [16]. The primary benefit of decision trees is their clear visualization and simple interpretation, which enable quick analysis. One limitation, however, is that building a tree requires significant knowledge and skill. Since decision trees are non-parametric models, their training is fast compared with parametric models, because there is no need to estimate many parameters. Additionally, decision tree algorithms make it simpler to comprehend how the intended models behave. New branches can be added to existing trees with reasonable ease, but the tendency of trees to overfit makes them difficult to deploy in practice.
• k-Nearest Neighbor
The k-nearest neighbor algorithm (kNN), a supervised learning technique used for both regression and classification applications, was first introduced by Cover and Hart in 1967 [17]. The fundamental concept is to assign a new, unclassified case to the class that holds the majority among its k nearest neighbors. The kNN algorithm is well known for being simple and easy to use. However, its classification speed tends to be slow, which is a drawback. It also struggles to handle high-dimensional data, is sensitive to noise, and is computationally expensive.
• Naïve Bayes
The Naïve Bayes classification method employs Bayes' theorem and assumes conditional independence between attributes to determine the posterior probability of a class. With this approach, uncertainty can be managed efficiently [18]. It is especially useful when limited information is available. The Naïve Bayes classification model is recognized for its simplicity and low storage requirements, and its calculations are fast. It is important to remember, however, that the accuracy of Naïve Bayes depends heavily on its prior assumptions and on accurate prior knowledge.
• Logistic Regression (LR)
Linear regression is a supervised machine learning method that aims to predict an outcome by relating it to one or more independent variables. It seeks the straight line that most closely fits the given data points. The output of linear regression is continuous and is represented by numerical values. Both a single independent variable and several of them can be handled. Logistic regression, on the other hand, is used to find the model that best fits the data and explains the relationship between a number of input variables and the output variable. It works best when the output is limited to the range 0 to 1. Logistic regression requires access to accepted behavior and typical feature domain descriptions in order to make accurate predictions. It is unsuitable when the output is unbounded.
• Self-Organizing Map
The Self-Organizing Map (SOM) is an unsupervised machine learning technique based on the concept of competitive learning. The input layer of a SOM model, also known as a Kohonen neural network, is made up of N neurons, and the output layer is made up of M neurons. Each neuron has a weight vector attached to it whose dimension matches that of the training data. The two-dimensional map in Fig. 5.3 illustrates how the neurons in the output layer are organized. The size of the map influences the precision and generalizability of the SOM.
Fig. 5.3 Self-organizing maps
Without knowing in advance which classes the data belong to, SOM clusters the data and finds intrinsic correlations between the input variables [19]. Its capacity to reduce the visualization of multidimensional data to two dimensions is one of its main advantages. Additionally, during model training SOM relies only minimally on prior knowledge of the properties of the input data and on human involvement. When faced with vast, complex, and nonlinear datasets, it excels at adapting to its environment and organizing itself. However, SOM may encounter problems when working with data that varies slowly or when training becomes challenging.
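Of the algorithms listed above, k-nearest neighbor is simple enough to sketch in a few lines. The following is a minimal illustration of the majority-vote idea, not a production implementation. The function name, the two features, and the data values are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean, by assumption).
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training set (for example, standard deviation and kurtosis).
train_points = [(0.20, 3.0), (0.25, 3.1), (0.22, 2.9),   # healthy
                (0.80, 6.0), (0.90, 6.5), (0.85, 5.8)]   # faulty
train_labels = ["healthy"] * 3 + ["faulty"] * 3

print(knn_predict(train_points, train_labels, (0.30, 3.2)))
print(knn_predict(train_points, train_labels, (0.75, 6.1)))
```

Every prediction scans the whole training set, which illustrates why the classification speed of kNN is considered a drawback on large datasets.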
5.2.3 Machine Learning-Based Bearing and Gear Health Indicators
To develop machine learning-based health indicators for bearings and gears, it is critical to train a statistical and probabilistic model using historical data from typical (healthy) bearings or gears. Any deviation from the trained model can then be used to assess the condition of a bearing or gear. In the evaluation of health status, identification of the fault precedes diagnostics, which in turn precedes prognostics. The main goal of prognostics is to find out when a bearing or gear will fail entirely. This makes it possible to carry out maintenance tasks on schedule and to have a complete understanding of the health situation. Figure 5.4 shows the general steps used to determine the fault or remaining useful life (RUL) of mechanical rotary components, viz. bearings and gears.
Fig. 5.4 Steps of fault prediction (flowchart blocks: mechanical component; experimental testing with sensors; data acquisition; data preparation; training data; testing data; faulty component data; machine learning algorithm; predictive model; prediction of fault in component)
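The idea in Sect. 5.2.3 of training on healthy data and treating any deviation as a health indicator can be sketched as follows. This is a minimal illustration under stated assumptions: the "model" is simply the mean and standard deviation of one feature, and the feature values, the function names, and the alarm level of three standard deviations are all hypothetical.

```python
import statistics


def fit_baseline(healthy_values):
    """Learn the mean and standard deviation of a feature from healthy-component data."""
    return statistics.mean(healthy_values), statistics.stdev(healthy_values)


def health_indicator(value, baseline):
    """Deviation of a new observation from the healthy baseline, in standard deviations."""
    mean, std = baseline
    return abs(value - mean) / std


# Hypothetical RMS vibration levels (arbitrary units) recorded on healthy bearings.
healthy_rms = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
baseline = fit_baseline(healthy_rms)

ALARM_LEVEL = 3.0  # assumed threshold, not taken from the text
for reading in (1.05, 1.9):
    indicator = health_indicator(reading, baseline)
    status = "check component" if indicator > ALARM_LEVEL else "normal"
    print(f"reading={reading:.2f}  indicator={indicator:.2f}  {status}")
```

In practice the baseline would be a richer statistical or probabilistic model over several features, but the principle of scoring new data against the healthy reference is the same.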
5.2.4 AI for Condition-Based Monitoring and Fault Detection Diagnosis
Monitoring the condition of industrial machinery and evaluating its remaining useful life (RUL) depend heavily on fault detection and diagnosis (FDD). Prognostics and health management (PHM) approaches are crucial to ensuring that machinery maintains the required state of health. Predicting how long a piece of equipment will operate before breaking down is useful to industry specialists because it prevents them from investing time and money in unnecessary maintenance. As a result, prognostics has developed into a significant industrial issue and a subject of intense interest to academics. According to ISO 13381-1, prognostics comprises predicting potential failure modes as well as determining the risk and time to failure for one or more existing assets. PHM is the process of identifying abnormal conditions, faults, and their causes, and of providing prognostic insight into how faults will evolve in the future.
5.3 Case Study
5.3.1 Predictive Analysis for the Useful Life of Bearing
In this case study, the support vector machine (SVM) learning method is used to build a multiclass classifier of rolling bearing faults. The main points of emphasis are the classification of rolling bearing conditions and the application of SVM for fault detection.
After the time-domain characteristics of the vibration signal are extracted, the one-versus-one approach is used to classify faults into several categories. The choice of appropriate SVM parameters is discussed in terms of the gamma value (RBF kernel parameter), the ratio of training to testing data, and the number of datasets used for fault classification. The main topics of this study are the classification and prediction of bearing faults at different angular speeds. Five bearing conditions are considered: a healthy bearing, an outer race fault, an inner race fault, a fault in a rolling element (ball), and a combination of all faults. To forecast bearing failures, the statistical parameters standard deviation, skewness, and kurtosis are calculated from the time-domain vibration data. Bearing defects are classified using an SVM with the Gaussian RBF kernel and the one-against-one multiclass classification scheme. The chapter discusses the ideal number of datasets and the training and testing percentages, as well as the selection of SVM parameters such as gamma. The results show that training and testing at higher rotational speeds produces predictions with almost perfect accuracy.
General steps to determine the RUL or fault in a bearing:
(1) In order to anticipate bearing failure, a set of five distinct types of roller bearings, depicted in Fig. 5.5, is considered:
    1. Healthy Bearing (HB)
    2. Bearing with Outer Race fault (BORF)
    3. Bearing with Inner Race fault (BIRF)
    4. Bearing with ball fault (BBF)
    5. Bearing with Combined fault (BCF)
(2) A fault simulator setup is needed to undertake the experimental investigation. The schematic diagram of the setup, depicted in Fig. 5.6, shows a single shaft. A flexible coupling connects a three-phase induction motor to the shaft. A healthy bearing is placed close to the motor end of the shaft, and a defective (test) bearing is placed at the other end.
(3) This configuration permits the investigation of bearing faults and the characterization of their vibration signature by introducing defective bearings into the machine.
(4) There are five main types of bearing condition to take into account when studying bearing faults.
(5) Each of the five bearings must be fitted in the appropriate bearing housing in order to undergo separate experimental testing.
(6) A triaxial accelerometer must be mounted on the upper surface of the bearing housing. This sensor records accelerations in the time domain along the orthogonal x, y, and z directions.
(7) After the vibration data are gathered, they must be processed to derive three statistical parameters along the x, y, and z axes. These three parameters are skewness, kurtosis, and standard deviation. Extracting these parameters prepares the dataset for further analysis.
(8) The training and testing sets drawn from this dataset can be used as input to any machine learning algorithm, including SVM, ANN, deep learning, etc.
(9) The bearing fault and the remaining useful life (RUL) can then be estimated by the machine learning system.
(10) The estimated remaining useful life (RUL) of the bearing is shown in Fig. 5.7.
Fig. 5.5 Types of bearings (labeled parts: ball, inner race, cage, outer race; panels: 1) Healthy Bearing (HB), 2) Bearing with Outer Race fault (BORF), 3) Bearing with Inner Race fault (BIRF), 4) Bearing with ball fault (BBF))
Fig. 5.6 The schematic diagram for the bearing fault simulator setup
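The one-against-one scheme and the Gaussian RBF kernel described at the start of this case study can be illustrated with a short sketch. It lists the pairwise classifiers needed for the five bearing conditions and evaluates the standard RBF kernel, K(u, v) = exp(-gamma * ||u - v||^2), for several gamma values. The feature vectors and gamma values are hypothetical, and no SVM is trained here.

```python
import math
from itertools import combinations

# The five bearing conditions considered in this case study.
CLASSES = ["HB", "BORF", "BIRF", "BBF", "BCF"]

# One-against-one: one binary classifier for every unordered pair of classes,
# which gives N(N-1)/2 classifiers for N classes.
pairs = list(combinations(CLASSES, 2))


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical feature vectors (standard deviation, skewness, kurtosis).
u = [0.5, 0.1, 3.0]
v = [0.9, 0.4, 4.2]

print(len(pairs), "pairwise classifiers:", pairs)
for gamma in (0.01, 0.1, 1.0):
    print(f"gamma={gamma}: K(u, v)={rbf_kernel(u, v, gamma):.4f}")
```

A larger gamma makes the kernel value fall off faster with distance, so each training sample influences a smaller neighborhood. This is one reason the choice of gamma affects classification accuracy.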
Fig. 5.7 Remaining useful life (RUL) (plot labels: bearing health marker, failure threshold value, RUL, time)
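The relationship shown in Fig. 5.7 between the health marker, the failure threshold, and the RUL can be sketched numerically. The example below assumes, purely for illustration, that the health marker grows linearly with time. It fits a least-squares line to the observed history and extrapolates to the threshold. The function names and the data are hypothetical, and real degradation is often nonlinear.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def estimate_rul(times, values, threshold):
    """Time from the last observation until the fitted trend reaches the threshold.

    Returns None when the marker is not increasing, because no crossing
    can then be extrapolated.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - times[-1])


# Hypothetical health-marker history: operating hours versus a vibration level.
hours = [0, 10, 20, 30, 40]
marker = [1.0, 1.5, 2.0, 2.5, 3.0]
rul = estimate_rul(hours, marker, threshold=5.0)
print(f"Estimated RUL: {rul:.1f} hours")
```

The same structure applies when the straight line is replaced by a more realistic degradation model or by a trained ML regressor: the RUL is the time between the current instant and the predicted threshold crossing.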
5.3.2 Predictive Analysis for Gear Tooth Failure
(1) To predict gear failure, four different types of gear, shown in Fig. 5.8, are considered:
    1. Healthy Gear (HG)
    2. Worn Tooth Gear (WTG)
    3. Chipped Tooth Gear (CTG)
    4. Missing Tooth Gear (MTG)
(2) Experimental testing is required to study fault detection in gears. The schematic diagram for the gearbox fault simulator setup is shown in Fig. 5.9.
(3) In this setup, an electric motor is connected to the shaft through a flexible coupling. The shaft is connected to the gearbox through a pulley and belt mechanism.
(4) Using this configuration, it is possible to examine faults by inserting defective gears into the gearbox and analyzing the vibration pattern they produce.
(5) To undergo separate experimental testing, each of the four gears must be fitted in the gearbox.
(6) A triaxial accelerometer that can measure accelerations in the time domain in the three orthogonal x, y, and z directions must be attached to the top surface of the gearbox housing.
(7) After the vibration data are gathered, three statistical parameters, namely the skewness, the kurtosis, and the standard deviation, must be extracted in the x, y, and z directions. This helps to prepare the dataset.
(8) The training and testing sets drawn from this dataset can be used as input to any machine learning method, such as SVM, ANN, deep learning, etc.
(9) The gear defect and the remaining useful life (RUL) can be determined using the machine learning system.
Fig. 5.10 shows the gear fault % prediction accuracy.
Fig. 5.8 Types of gear (panels: Healthy Gear (HG), Worn Tooth Gear (WTG), Missing Tooth Gear (MTG), Chipped Tooth Gear (CTG))
Fig. 5.9 Schematic diagram for gear fault simulator setup
Fig. 5.10 Fault % prediction accuracy
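Step (7) of the procedure above, extracting the standard deviation, skewness, and kurtosis along the x, y, and z directions, can be sketched as follows. The sketch uses population moments and non-excess kurtosis. These are assumptions, since the text does not state which estimator was used. The function names and the short sample records are hypothetical.

```python
import math


def std_skew_kurt(samples):
    """Population standard deviation, skewness and kurtosis of one signal.

    Assumes the signal is not constant; a zero variance would cause a
    division by zero.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in samples) / n  # third central moment
    m4 = sum((s - mean) ** 4 for s in samples) / n  # fourth central moment
    std = math.sqrt(m2)
    return std, m3 / std ** 3, m4 / m2 ** 2


def feature_vector(acc_x, acc_y, acc_z):
    """Concatenate the three statistics for each axis into nine features."""
    features = []
    for axis in (acc_x, acc_y, acc_z):
        features.extend(std_skew_kurt(axis))
    return features


# Hypothetical short acceleration records; real records would be far longer.
x = [0.1, -0.2, 0.3, -0.1, 0.0, 0.2, -0.3, 0.1]
y = [0.0, 0.5, -0.4, 0.2, -0.1, 0.3, -0.6, 0.1]
z = [1.0, 1.2, 0.9, 1.1, 1.0, 2.5, 1.0, 0.9]  # contains one impulsive spike
features = feature_vector(x, y, z)
print(len(features), "features:", [round(f, 3) for f in features])
```

Each recorded test run then contributes one nine-element row to the dataset, which is split into training and testing sets in step (8). Impulsive events such as the spike in the z record raise the kurtosis, which is why kurtosis is a common indicator of localized tooth or race damage.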
5.4 Summary
This chapter has presented a thorough and up-to-date analysis of how machine learning algorithms can be used to identify defects in rolling element bearings and gears and to predict how long they will last. The key findings are listed below:
(1) Statistical characteristics such as the standard deviation, skewness, and kurtosis have been found useful for precisely pinpointing bearing failure.
(2) To diagnose a bearing fault quickly, an SVM machine learning technique with an RBF kernel is preferable.
(3) The RBF kernel gamma value is a crucial input parameter for achieving the best fault prediction accuracy.
(4) The highest possible signal-to-noise ratio is necessary for accurate prediction.
(5) For each input signal, signal preprocessing techniques are recommended to obtain the highest signal-to-noise ratio.
(6) Bearings with inner race and outer race faults are shown to yield higher fault prediction accuracy than bearings with combined faults, because they have more distinctive vibration signature features.
(7) The monitoring of progressive gear deterioration and RUL prediction work best under higher load and lower rotational speed in a gearbox.
(8) Support vector machines can be used to diagnose faults automatically and show enhanced classification ability in identifying various gearbox problems.
(9) Lower rotating speed and heavier load in a gearbox are more effective for tracking progressive gear deterioration and for RUL prediction.
(10) When data from all frequencies are provided for training and the test data are collected at comparable frequencies, the fault prediction accuracy is high. Near-perfect prediction accuracies are seen when the rotational speeds are the same during training and testing.
(11) According to reports, the most widely used methods for defect diagnosis and bearing RUL determination are SVM, ANN, and their extensions. However, the use of deep learning algorithms is expected to grow in the future due to their low computing cost, robustness, dependability, and simplicity.
References
1. Singh J, Darpe AK, Singh SP (2018) Rolling element bearing fault diagnosis based on OverComplete rational dilation wavelet transform and auto-correlation of analytic energy operator. Mech Syst Signal Process
2. Singh J, Darpe AK, Singh SP (2017) Bearing damage assessment using Jensen-Rényi Divergence based on EEMD. Mech Syst Signal Process
3. Singh J, Darpe AK, Singh SP (2019) Bearing remaining useful life estimation using an adaptive data driven model based on health state change point identification and K-means clustering. Meas Sci Technol
4. Luo J, Madhavi N, Pattipati KR, Qiao L, Kawamoto M, Chigusa S (2003) Model-based prognostic techniques. In: Proceedings of the IEEE Autotestcon, pp 330–340
5. Hansen RJ, Hall DL, Kurtz SK (1995) New approach to the challenge of machinery prognostics. J Eng Gas Turbines Power 117:320–325
6. Kemerait RC (1987) New cepstral approach for prognostic maintenance of cycle machinery. In: Proceedings of the IEEE Southeast conference, pp 256–262
7. Fisher C, Baines NC (1988) Multi-sensor condition monitoring systems for gas turbines. J Cond Monit 1:57–68
8. Muir D, Taylor B (1997) Oil debris monitoring for aeroderivative gas turbine. ASME Power Division (Publication) PWR32 547–553
9. Byington CS, Watson MJ, Sheldon JS, Swerdon GM (2009) Shaft coupling model-based prognostics enhanced by vibration diagnostics. Insight: Non Destr Test Cond Monit 51:420–425
10. Orsagh R, Roemer M, Sheldon J (2004) A comprehensive prognostics approach for predicting gas turbine engine bearing life. In: Proceedings of ASME Turbo Expo, vol 2. Vienna, Austria, pp 777–785
11. Vapnik V (2013) The nature of statistical learning theory. Springer Science and Business Media
12. Chen F, Tang B, Song T, Li L (2014) Multi-fault diagnosis study on roller bearing based on multi-kernel support vector machine with chaotic particle swarm optimization. Measurement 47:576–590
13. Li Y, Xu M, Wang R, Huang W (2016) A fault diagnosis scheme for rolling bearing based on local mean decomposition and improved multiscale fuzzy entropy. J Sound Vib 360:277–299
14. Sikorska YZ, Hodkiewicz M, Ma L (2011) Prognostic modelling options for remaining useful life estimation by industry. Mech Syst Signal Process 25:1803–1836
15. Alguindigue IE, Loskiewicz-Buczak A, Uhrig RE (1993) Monitoring and diagnosis of rolling element bearings using artificial neural networks. IEEE Trans Ind Electron 40:209–217
16. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
17. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
18. Siegel D, Ly C, Lee J (2012) Methodology and framework for predicting helicopter rolling element bearing failure. IEEE Trans Reliab 61:846–857
19. Huang R, Xi L, Li X, Liu CR, Qiu H, Lee J (2007) Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods. Mech Syst Signal Process 21:193–207
Chapter 6
Predictive Maintenance for Manufacturing
6.1 Overview
Sound is generated by vibrating surfaces and transmitted through a medium, which is air in most cases. During transmission, part of the sound is absorbed by the air molecules. This absorption is high in open areas but is significantly reduced in closed spaces such as a cabin. When sound strikes a surface, part of it is reflected, part is transmitted through the surface, and the remainder is absorbed by the surface. The absorption occurs mainly due to viscous losses. The ratio of the sound energy absorbed by the surface to the sound energy incident on it is defined as the coefficient of sound absorption and is denoted by alpha (α). The sound absorption coefficient depends on the frequency and the direction of incidence. Every material absorbs some sound; what sets materials apart is how much they absorb. Harder, stronger materials generally show weak absorptive properties owing to their dense structure, as opposed to porous materials. A material is considered sound-absorbing if its average α across six frequencies (125, 250, 500, 1000, 2000, and 4000 Hz) is larger than 0.2 [1].
Tool life prediction is important for all industrial manufacturing processes. Milling tools are sometimes discarded while they still have useful life left. During milling, cutting speed, depth of cut, and feed rate are important parameters that need to be chosen carefully. In the present study, 25 identical HSS tools with a 15 mm diameter were used in experimental testing on a vertical milling machine. The purpose of the study is to determine how these parameters affect the ability to forecast tool life in milling operations using the XGBoost Regressor. With the aim of producing high-quality products at a low cost, Taguchi Design of Experiments (DOE) was used for the experimental testing. The XGBoost Regressor model predicts tool life well, with a training accuracy of 0.99 and a test accuracy of 0.98.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. N. Mahalle et al., Predictive Analytics for Mechanical Engineering: A Beginners Guide, SpringerBriefs in Computational Intelligence, https://doi.org/10.1007/978-981-99-4850-5_6
6.2 Design Issues
Several experimental methods, such as the impedance tube method, the Beranek U-tube setup, the Champoux pressure-difference setup, and the direct and alternating flow methods, are used to determine the coefficient of sound absorption, alpha (α), or quantities involved in its determination, such as porosity and tortuosity. These methods require expensive equipment and ideal surrounding conditions, both of which are difficult to obtain. This chapter explores the idea of determining alpha with the help of a machine learning (ML) model, thereby avoiding the difficult experimental methods and the errors arising from them.
Milling is the machining process of removing unwanted material from a workpiece by feeding it in a direction that is at an angle to the tool's axis. It encompasses a wide range of operations. The milling processes performed by different milling cutters may be classified as peripheral milling (on the periphery of the component) and face milling. To maximize profits, industries worldwide are constantly looking for cost-effective solutions with shorter lead times. It is recognized that milling parameters such as cutting speed, depth of cut, and feed rate can greatly influence the machining economics and thus need to be optimized to improve machining productivity, machining cost per component, and other appropriate criteria. Estimating tool life is an even more crucial problem, influenced by a variety of factors including speed, feed, and depth of cut as well as tool geometry, tool material, cutting fluid, and workpiece material. A shorter tool life can adversely affect the machining economics. The importance of cutting tools cannot be overstated in any machining system. The quality of the cutting tools is frequently a determining factor in the final product's quality, and their quality and performance have a direct impact on the machining system and overall productivity. This chapter provides an overview of tool life prediction.
Due to an inadequate number of open-source predictive maintenance datasets, we suggest a semi-synthetic predictive dataset that exhibits realistic results for the assessment of this work and can be used by the community. The explanatory performance of a comprehensive model with a feasible interface is subsequently discussed, trained, and compared.
6.3 Case Studies
6.3.1 Predictive Analysis for Sound Absorption of Acoustic Material
6.3.1.1 Acoustic Analysis and Machine Learning
Many settings, including factories, car interiors, theaters, electricity generators, and lecture halls, use acoustic materials for noise reduction. The intensity, frequency range, and other characteristics of the sound vary from one application to another. The coefficient of sound absorption is used to quantify the acoustic performance of synthetic and non-synthetic materials. Frequency is important because it influences how materials absorb energy. As a result, different noise control applications cannot all use the same material, and it is essential to know a material's coefficient of sound absorption as a function of frequency. In the late seventeenth century, the research into how sound travels through porous materials began. Delany and Bazley [2] selected samples of commercial fibrous absorbing materials and mounted them on an impedance tube, measuring the axial pressure gradient to obtain the absorption coefficient [3].

$$Z_C = \rho_o C_o \left[ 1 + 9.08 \left( 1000 \frac{f}{\sigma} \right)^{-0.75} - 11.9\, j \left( 1000 \frac{f}{\sigma} \right)^{-0.73} \right] \tag{6.1}$$

where
Z_C = characteristic impedance
ρ_o = air density at rest
C_o = speed of sound in air
f = frequency
σ = airflow resistivity

The formula for the normal incidence energy absorption coefficient is shown in Eq. (6.2),

$$\alpha_n = 1 - \left| \frac{z - \rho_o C_o}{z + \rho_o C_o} \right|^2 \tag{6.2}$$
Their normalized data for Z_C and α have direct applications in areas requiring materials of significant thickness. They also calculated the variation of α with thickness. Because Delany and Bazley were the first to carry out work in this domain, their model is the preferred starting point for most research on sound absorption. Many alterations and improvements were later made to the empirical equations by Miki [4], who discussed regression models based on physical realizability. The original equations were changed so as to achieve the positive-real property, and the propagation constant was transformed into a regular function in the right half plane [5].
$$Z_C = \rho_o C_o \left[ 1 + 5.5 \left( 1000 \frac{f}{\sigma} \right)^{-0.632} - 8.43\, j \left( 1000 \frac{f}{\sigma} \right)^{-0.632} \right] \tag{6.3}$$
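The empirical relations in Eqs. (6.1) to (6.3) can be evaluated numerically. The following is a minimal sketch rather than the authors' procedure: it assumes that z in Eq. (6.2) may be replaced directly by the characteristic impedance Z_C (which corresponds to a very thick layer), and the air properties `RHO_0` and `C_0` are assumed values that the chapter does not state. Because sample thickness is ignored, the printed values are not expected to reproduce the tables in this chapter.

```python
import numpy as np

RHO_0 = 1.204  # air density at rest in kg/m^3 (assumed value, about 20 °C)
C_0 = 343.0    # speed of sound in air in m/s (assumed value)


def delany_bazley_impedance(freq_hz, sigma):
    """Characteristic impedance Z_C from Eq. (6.1)."""
    x = 1000.0 * np.asarray(freq_hz, dtype=float) / sigma
    return RHO_0 * C_0 * (1.0 + 9.08 * x**-0.75 - 11.9j * x**-0.73)


def miki_impedance(freq_hz, sigma):
    """Characteristic impedance Z_C from Eq. (6.3)."""
    x = 1000.0 * np.asarray(freq_hz, dtype=float) / sigma
    return RHO_0 * C_0 * (1.0 + 5.5 * x**-0.632 - 8.43j * x**-0.632)


def absorption_coefficient(z):
    """Normal-incidence absorption coefficient from Eq. (6.2)."""
    reflection = (z - RHO_0 * C_0) / (z + RHO_0 * C_0)
    return 1.0 - np.abs(reflection) ** 2


frequencies = np.array([125, 250, 500, 1000, 2000, 4000], dtype=float)
sigma_polyester = 5119.0  # flow resistivity quoted in the chapter, N·s/m^4

alpha_db = absorption_coefficient(delany_bazley_impedance(frequencies, sigma_polyester))
alpha_miki = absorption_coefficient(miki_impedance(frequencies, sigma_polyester))

for f, a_db, a_miki in zip(frequencies, alpha_db, alpha_miki):
    print(f"{f:6.0f} Hz  Delany-Bazley: {a_db:.3f}  Miki: {a_miki:.3f}")
```

Keeping the impedance model and the absorption formula in separate functions makes it easy to swap one empirical model for the other while reusing Eq. (6.2) unchanged.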
It was determined that single power-law relations are not capable of accurately predicting the acoustical behavior across the entire frequency range. If used, they can result in significant inaccuracies at some values of f/σ when employed for impedance functions that are physically more realizable. Quadratic regression, a statistical technique that represents a parabolic relationship between a dependent variable and one or more independent variables, is an extension of the linear regression method used for predictive analysis. Mathematically, quadratic regression can be expressed as

$$y = ax^2 + bx + c \tag{6.4}$$
where
y = dependent (target) variable
x = independent (predictor) variable
a = coefficient of regression, also called the scale factor
b = correction factor
c = y-intercept
The values of the x and y variables form the training dataset for the quadratic regression model [6]. Quadratic regression assumes a parabolic relationship between the variables. The features are assumed to be free of multicollinearity and autocorrelation. Other assumptions include normally distributed error terms and homoscedasticity.
While developing a new product, the most crucial task is the proper selection of materials and other resources so as to avoid both under- and over-designing and to optimize profits. For this purpose, a wide range of materials needs to be considered. Basic properties of materials, such as density, conductivity, hardness, and stress–strain limits, are easily available on the internet and in printed sources, but certain application-specific properties, such as sound absorptivity, tortuosity, and airflow resistivity, are difficult to obtain. The ML model was trained on data obtained from the impedance tube testing method and then compared with the values obtained using the empirical relation given by Miki for a total of 3 porous acoustic materials. The model reported 97% and 90% accuracy when validated for 2 materials, and it was then used to plot the frequency versus alpha (α) graph for a third material without any experimentation, which showed 88% accuracy. Thus, alpha was determined for a porous material without any experimentation over a wide range of frequency values.
• Data Generation for ML model
The impedance tube method, in which a sound signal is produced by a speaker at one end and made incident on the test sample at the other end, is one of the most widely used methods for determining the sound absorption coefficient. Microphones placed before the test sample measure the incident and absorbed signal levels, which are then processed to analyze the results in the frequency domain. Figure 6.1 shows the impedance tube experimental test setup used to test sound-absorbing material.
Fig. 6.1 Schematic diagram for impedance tube experimental setup
The experimental testing data is used for the preparation of the ML model. Hemp Fiber and Polyester acoustic materials are considered to establish experimental results. Specimens of Polyester and Hemp Fiber were prepared with a fiber-to-binder ratio of 7:3, as suggested by Azma et al. [7]. The density and flow resistivity of the hemp fiber acoustic material are 860 kg/m³ and 1400 N·s/m⁴, respectively. The density and flow resistivity of the polyester acoustic material are 1230 kg/m³ and 5119 N·s/m⁴, respectively. Table 6.1 shows the sound absorption coefficients for Hemp Fiber and Polyester for a set of six frequencies.

Table 6.1 Coefficient of sound absorption by experimental testing

| Frequency (Hz) | Hemp fiber | Polyester |
|---|---|---|
| 125 | 0.016 | 0.034 |
| 250 | 0.046 | 0.074 |
| 500 | 0.113 | 0.134 |
| 1000 | 0.213 | 0.321 |
| 2000 | 0.361 | 0.781 |
| 4000 | 0.537 | 0.799 |

• Data Exploration for ML Model
Data exploration produces summary values that highlight patterns or points of interest and reveal links between variables, which helps assess whether there are any outliers. Manual data exploration is also possible, but owing to the large size of the dataset, the exploration is done with the data.describe() command (a pandas DataFrame method) in Jupyter, which extracts the count, mean, standard deviation, and minimum and maximum values from the dataset. Data exploration characterizes the individual variables and thereby reveals patterns and relationships between them. Tables 6.2 and 6.3 show dataset values for two acoustic materials, viz. Hemp Fiber and Polyester. The single most crucial requirement for accurate modeling is precise data [8]. The dataset used for training the ML model was obtained by substituting the values of airflow resistivity given by Dunne et al. [9] into the empirical formula provided by Miki in Eq. (6.3) to obtain coefficient of sound absorption values.

Table 6.2 Dataset value for hemp fiber material

| Sr. No. | Freq (Hz) | Sound absorption coefficient | Sr. No. | Freq (Hz) | Sound absorption coefficient |
|---|---|---|---|---|---|
| 1 | 50 | 0.011498 | 21 | 2050 | 0.384051 |
| 2 | 150 | 0.034165 | 22 | 2150 | 0.402436 |
| 3 | 250 | 0.052144 | 23 | 2250 | 0.420078 |
| 4 | 350 | 0.068224 | 24 | 2350 | 0.436892 |
| 5 | 450 | 0.083695 | 25 | 2450 | 0.452809 |
| 6 | 550 | 0.099208 | 26 | 2550 | 0.467774 |
| 7 | 650 | 0.115117 | 27 | 2650 | 0.481744 |
| 8 | 750 | 0.131611 | 28 | 2750 | 0.49469 |
| 9 | 850 | 0.14878 | 29 | 2850 | 0.506592 |
| 10 | 950 | 0.166647 | 30 | 2950 | 0.517444 |
| 11 | 1050 | 0.185181 | 31 | 3050 | 0.52725 |
| 12 | 1150 | 0.204319 | 32 | 3150 | 0.536021 |
| 13 | 1250 | 0.223966 | 33 | 3250 | 0.543778 |
| 14 | 1350 | 0.244007 | 34 | 3350 | 0.550547 |
| 15 | 1450 | 0.264316 | 35 | 3450 | 0.556362 |
| 16 | 1550 | 0.284755 | 36 | 3550 | 0.561261 |
| 17 | 1650 | 0.305183 | 37 | 3650 | 0.565288 |
| 18 | 1750 | 0.32546 | 38 | 3750 | 0.568489 |
| 19 | 1850 | 0.345448 | 39 | 3850 | 0.570915 |
| 20 | 1950 | 0.365019 | 40 | 3950 | 0.572618 |

Table 6.3 Dataset value for polyester material

| Sr. No. | Freq (Hz) | Sound absorption coefficient | Sr. No. | Freq (Hz) | Sound absorption coefficient |
|---|---|---|---|---|---|
| 1 | 100 | 0.021859 | 21 | 2100 | 0.714979 |
| 2 | 200 | 0.053565 | 22 | 2200 | 0.737204 |
| 3 | 300 | 0.086626 | 23 | 2300 | 0.757234 |
| 4 | 400 | 0.120586 | 24 | 2400 | 0.775118 |
| 5 | 500 | 0.155597 | 25 | 2500 | 0.790924 |
| 6 | 600 | 0.191761 | 26 | 2600 | 0.804737 |
| 7 | 700 | 0.229048 | 27 | 2700 | 0.816655 |
| 8 | 800 | 0.267306 | 28 | 2800 | 0.826782 |
| 9 | 900 | 0.306292 | 29 | 2900 | 0.835229 |
| 10 | 1000 | 0.345693 | 30 | 3000 | 0.84211 |
| 11 | 1100 | 0.385163 | 31 | 3100 | 0.847539 |
| 12 | 1200 | 0.424338 | 32 | 4000 | 0.84929 |
| 13 | 1300 | 0.462858 | 33 | 3200 | 0.851631 |
| 14 | 1400 | 0.500384 | 34 | 3900 | 0.852039 |
| 15 | 1500 | 0.53661 | 35 | 3800 | 0.85428 |
| 16 | 1600 | 0.57127 | 36 | 3300 | 0.8545 |
| 17 | 1700 | 0.604142 | 37 | 3700 | 0.855919 |
| 18 | 1800 | 0.635054 | 38 | 3400 | 0.856256 |
| 19 | 1900 | 0.663879 | 39 | 3600 | 0.856862 |
| 20 | 2000 | 0.690535 | 40 | 3500 | 0.857008 |

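The data.describe() exploration step mentioned for this dataset can be sketched as follows. This is an illustrative example only: it loads eight rows taken from the hemp fiber dataset in Table 6.2 rather than the full file used by the authors, and the column names `freq_hz` and `alpha` are hypothetical.

```python
import pandas as pd

# Eight (frequency, alpha) pairs copied from Table 6.2 (hemp fiber);
# the column names are illustrative, not taken from the chapter.
data = pd.DataFrame(
    {
        "freq_hz": [50, 550, 1050, 1550, 2050, 2550, 3050, 3550],
        "alpha": [0.011498, 0.099208, 0.185181, 0.284755,
                  0.384051, 0.467774, 0.52725, 0.561261],
    }
)

# describe() reports count, mean, std, min, quartiles and max per column.
summary = data.describe()
print(summary)
```

A quick look at the min and max rows is often enough to spot entries that fall outside the physically meaningful range 0 ≤ α ≤ 1.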
The dataset included 40 values of alpha corresponding to frequencies ranging from 0 to 4000 Hz for two different porous materials, namely Polyester and Hemp Fiber. The α values for Hemp fiber and Polyester acoustic material are assessed using experiments and empirical (Miki) results in Table 6.4.

Table 6.4 Comparison of experimental and empirical (Miki) results of coefficient of sound absorption

| Frequency (Hz) | Hemp fiber: Expt | Hemp fiber: Miki | Hemp fiber: Accuracy (%) | Polyester: Expt | Polyester: Miki | Polyester: Accuracy (%) |
|---|---|---|---|---|---|---|
| 125 | 0.022 | 0.021 | 97.39 | 0.034 | 0.052 | 96.24 |
| 250 | 0.046 | 0.044 | 96.39 | 0.074 | 0.071 | 97.21 |
| 500 | 0.113 | 0.107 | 94.81 | 0.134 | 0.138 | 95.63 |
| 1000 | 0.212 | 0.195 | 92.30 | 0.321 | 0.295 | 91.97 |
| 2000 | 0.361 | 0.312 | 86.95 | 0.718 | 0.669 | 93.24 |
| 4000 | 0.537 | 0.439 | 81.81 | 0.799 | 0.618 | 85.29 |

Table 6.5 Coefficients of regression equation as predicted by the ML model

| Material | a | b | c |
|---|---|---|---|
| Hemp fiber | −1.854 × 10⁻⁸ | 0.0002198 | −0.00689 |
| Polyester | −6.62 × 10⁻⁸ | 0.0004916 | −0.0537 |

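The coefficients in Table 6.5 can be substituted into Eq. (6.4) to evaluate the fitted parabola at any frequency. The sketch below does this with NumPy; the choice of the six evaluation frequencies is an assumption made for illustration, and because the tabulated coefficients are rounded, the printed values need not match the ML columns reported elsewhere in this chapter.

```python
import numpy as np

# (a, b, c) of y = a*x^2 + b*x + c, copied from Table 6.5
coefficients = {
    "Hemp fiber": (-1.854e-8, 0.0002198, -0.00689),
    "Polyester": (-6.62e-8, 0.0004916, -0.0537),
}
frequencies = np.array([125, 250, 500, 1000, 2000, 4000], dtype=float)

peak_frequency = {}
for material, (a, b, c) in coefficients.items():
    alpha = np.polyval([a, b, c], frequencies)  # evaluates Eq. (6.4)
    peak_frequency[material] = -b / (2.0 * a)   # vertex of the parabola
    print(material, np.round(alpha, 3),
          f"parabola peaks near {peak_frequency[material]:.0f} Hz")
```

With these coefficients the polyester parabola reaches its maximum at about 3.7 kHz, inside the 0 to 4000 Hz range, while the hemp fiber parabola peaks above 4 kHz. A quadratic fit therefore starts to bend downward near the upper end of the range and should not be extrapolated beyond the frequencies it was trained on.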
• Machine Learning Model
The model was created in Jupyter for practical convenience and for ease of access to the libraries available in Python. Quadratic regression requires a few additional dependencies to function correctly: Pandas, Seaborn, Matplotlib (pyplot), and NumPy were installed [10]. Based on the provided input dataset, the quadratic regression model estimates the coefficients a, b, and c of the quadratic Eq. (6.4), and the best-fit curve approach is used to assess the model's accuracy [11]. Table 6.5 contains the values of the coefficients for the 2 materials.
• Accuracy of the Model
Several evaluation methods are used in regression, including:
  • R-squared (R²): This corresponds to the goodness of fit of the best-fit curve. A greater R-squared value is usually considered better [12].
  • Root Mean Squared Error (RMSE): This determines the model's average error in predicting the outcome for a specific dataset. The smaller the RMSE, the better the model [13].
  • Residual Standard Error (RSE): This is a version of RMSE that takes the model's number of predictors into account. It is also known as the "model sigma" [14].
  • Mean Absolute Error (MAE): This is the average absolute difference between the predictions and the actual values in the dataset [15].
The R² value for hemp fiber obtained using the ML model in Jupyter is 0.9966, indicating that the predicted curve fits the actual data curve with 99.66% accuracy. The R² value for Polyester fiber is 0.9960, indicating that the predicted curve fits the actual data curve with 99.60% accuracy.
• Discussion
Following this good accuracy, the ML model is used to predict the coefficient α for Natural Coconut Fiber, which is considered to have a flow resistivity of 5910 N·s/m⁴ [16]. As is evident from Table 6.4, the values given by Miki's empirical equations were in good agreement with the experimentally determined values. To evaluate the accuracy of the α values predicted by the ML model for Natural Coconut Fiber, the same empirical equations were therefore used to obtain reference α values in place of experimental ones. The command pol_reg.predict() is used to predict the coefficient of sound absorption for 6 different values of frequency. Table 6.6 contains the experimental, empirical (Miki), and ML values for alpha.
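The quadratic fit and the evaluation metrics listed above can be sketched with NumPy alone. This is a minimal illustration, not the authors' notebook: `np.polyfit` stands in for the chapter's `pol_reg` object, the data are the 40 hemp fiber values from Table 6.2, and the metrics are computed on the training data itself, which is an assumption about how the curve-fitting accuracy was obtained.

```python
import numpy as np

# Hemp fiber dataset from Table 6.2: 50, 150, ..., 3950 Hz
freq = np.arange(50, 4000, 100, dtype=float)
alpha = np.array([
    0.011498, 0.034165, 0.052144, 0.068224, 0.083695, 0.099208, 0.115117,
    0.131611, 0.14878, 0.166647, 0.185181, 0.204319, 0.223966, 0.244007,
    0.264316, 0.284755, 0.305183, 0.32546, 0.345448, 0.365019, 0.384051,
    0.402436, 0.420078, 0.436892, 0.452809, 0.467774, 0.481744, 0.49469,
    0.506592, 0.517444, 0.52725, 0.536021, 0.543778, 0.550547, 0.556362,
    0.561261, 0.565288, 0.568489, 0.570915, 0.572618,
])

# Least-squares estimate of a, b, c in y = a*x^2 + b*x + c (Eq. 6.4)
a, b, c = np.polyfit(freq, alpha, deg=2)
fitted = np.polyval([a, b, c], freq)
residual = alpha - fitted

r2 = 1.0 - np.sum(residual**2) / np.sum((alpha - alpha.mean()) ** 2)
rmse = np.sqrt(np.mean(residual**2))
mae = np.mean(np.abs(residual))
print(f"a={a:.3e}, b={b:.3e}, c={c:.3e}")
print(f"R2={r2:.4f}, RMSE={rmse:.4f}, MAE={mae:.4f}")

# Prediction at six frequencies, analogous to calling pol_reg.predict()
query = np.array([125, 250, 500, 1000, 2000, 4000], dtype=float)
print(np.round(np.polyval([a, b, c], query), 3))
```

Reporting RMSE and MAE alongside R² is useful here because R² close to 1 can still hide absolute errors that are large relative to the small α values at low frequencies.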
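Table 6.6 Comparison of experimental, empirical (Miki), and ML values for alpha

| Frequency (Hz) | Hemp fiber: Exp | Hemp fiber: Miki | Hemp fiber: ML | Polyester: Exp | Polyester: Miki | Polyester: ML | Coconut fiber: Miki | Coconut fiber: ML |
|---|---|---|---|---|---|---|---|---|
| 125 | 0.022 | 0.021 | 0.029 | 0.034 | 0.052 | 0.029 | 0.029 | 0.019 |
| 250 | 0.046 | 0.044 | 0.051 | 0.074 | 0.071 | 0.061 | 0.071 | 0.067 |
| 500 | 0.113 | 0.107 | 0.115 | 0.134 | 0.138 | 0.151 | 0.164 | 0.150 |
| 1000 | 0.212 | 0.195 | 0.198 | 0.321 | 0.295 | 0.368 | 0.371 | 0.349 |
| 2000 | 0.361 | 0.312 | 0.348 | 0.718 | 0.669 | 0.672 | 0.729 | 0.649 |
| 4000 | 0.537 | 0.439 | 0.552 | 0.799 | 0.618 | 0.741 | 0.872 | 0.845 |
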
Fig. 6.2 Frequency versus coefficient of sound absorption graphs for hemp fiber and polyester
As is apparent from Fig. 6.2, all three methods yield similar results with relatively small errors. Hemp Fiber provided an accuracy of 97%, Polyester yielded 90%, and Natural Coconut Fiber gave an accuracy of 88% when compared with experimental readings. Figure 6.3 shows the frequency versus coefficient of sound absorption graph for the Natural Coconut Fiber material. The primary reason for the deviation in the accuracy of the empirical values at higher frequencies is the f/σ ratio. Miki's empirical equations are appropriate for a specific range of frequencies, but as the frequency or airflow resistivity is increased, accuracy gradually decreases [17]. Many researchers, including Delany and Bazley, Qunli, Miki, Komatsu, Allard and Champoux, and Mechel, have proposed ways to evaluate α. Being an improved version of the Delany and Bazley model, the empirical formulae given by Miki were used to obtain the α values against which the accuracy of the ML model was compared. The quadratic regression models for Hemp Fiber and Polyester showed curve-fitting accuracies of 99.66% and 99.60%, respectively, shown in Fig. 6.3. The prediction accuracies are 97% and 90%, respectively. The ML model shows an 88% prediction accuracy when used to estimate the coefficient of sound absorption of coconut fiber. On observation of Fig. 6.3, it was found that the empirical (Miki) sound absorption values start to deviate from the experimental and predicted values. This happens when the f/σ ratio reaches a critical value, which is shown by the black line in Fig. 6.3.
Fig. 6.3 Frequency versus coefficient of sound absorption graph for Natural Coconut Fiber
This again confirms the claim made by Miki that the empirical equations yield accurate values only in the range 0.01 < f/σ < 1. Thus, the machine learning model can be used to replace the traditional, time-consuming, and expensive experimental methods for determining the coefficient of sound absorption with considerable accuracy. It is important to note that a single ML model cannot be used universally to predict alpha for every material, but it can be considered accurate for a material that falls within the range of airflow resistivities for which the ML model was trained.
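The validity range quoted above lends itself to a simple check. The sketch below flags, for each material, which of the six test frequencies satisfy 0.01 < f/σ < 1. It assumes that f is in Hz and that σ is the flow resistivity in N·s/m⁴ exactly as quoted in this chapter; the function name `within_miki_range` is hypothetical.

```python
import numpy as np


def within_miki_range(freq_hz, sigma, lower=0.01, upper=1.0):
    """Return a boolean mask for frequencies with lower < f/sigma < upper."""
    ratio = np.asarray(freq_hz, dtype=float) / sigma
    return (ratio > lower) & (ratio < upper)


frequencies = np.array([125, 250, 500, 1000, 2000, 4000], dtype=float)
flow_resistivity = {  # values quoted in the chapter, N·s/m^4
    "Hemp fiber": 1400.0,
    "Polyester": 5119.0,
    "Natural coconut fiber": 5910.0,
}

validity = {
    name: within_miki_range(frequencies, sigma)
    for name, sigma in flow_resistivity.items()
}
for name, mask in validity.items():
    print(name, dict(zip(frequencies.astype(int).tolist(), mask.tolist())))
```

Under these unit assumptions, the hemp fiber ratio exceeds 1 at 2000 Hz and 4000 Hz, while the other two materials stay inside the range at all six frequencies. A check like this is only a screening aid and does not by itself explain every deviation seen in the tables.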
6.3.1.2 Predictive Analysis for Cutting Tool Life
• Importance of Tool Life in Milling
Milling is the machining process of removing unwanted material from a workpiece by feeding it in a direction that is at an angle to the tool's axis. It encompasses a wide range of operations, performed by different milling cutters. To maximize profits, industries worldwide are constantly looking for cost-effective solutions with shorter lead times. It is recognized that milling parameters such as cutting speed, depth of cut, and feed rate can greatly influence the machining economics and thus need to be optimized to improve machining productivity, machining cost per component, and other appropriate criteria. Estimating tool life is an even more crucial problem, influenced by a variety of factors including speed, feed, and depth of cut as well as tool geometry, tool material, cutting fluid, and workpiece material. A shorter tool life can adversely affect the machining economics. The importance of cutting tools cannot be overstated in any machining system. Most of the time, the quality of the end product is determined by the quality of the cutting tools, whose quality and performance have a direct impact on the machining system and overall productivity. This chapter provides an overview of tool life prediction.
• Prediction of Tool Life in Milling
It has been shown that fine-tuning factors such as feed rate, cutting speed, and depth of cut can significantly improve a machine tool's performance in terms of component tolerances, surface quality, operating time, and tool life [18]. One study introduces a variety of signal processing techniques, including DWT, TDA, isotonic regression, and exponential smoothing, to assess tool wear during face milling operations; features are calculated robustly in order to lessen the impact of random process changes and fixed faults [19]. An ANN machine learning model has been used to predict tool life in milling operations [20]. Many sensors are used to measure cutting forces and monitor their performance [21]. A complex classifier's classification result has been assessed by employing a synthetic predictive maintenance dataset [22]. In another study, a CNC milling machine was used with a ball-nose tungsten carbide cutter, and a stochastic method based on a particle filter was used to predict tool wear [23]. Using statistical and time-series methods, three tool wear indexes were created and 22 milling tests with a specific thresholding system were reported; the results were beneficial in lowering the impact of modifications made to the cutting conditions [24]. Important concepts of the XGBoost regression model, such as estimators and overfitting, have been investigated [25]. Tool wear has also been determined using neural network-based sensor fusion during a CNC milling operation [26].
• Data Preparation for ML Model
The first step is to create a database of various combinations of cutting parameters using the Taguchi Design of Experiments (DOE). In the second step, XGBoost is used to model tool life. In the third step, validation is done by running the experiments.
The statistical (RMS) method was applied to create the XGBoost model. The experimental testing is performed on a CNC milling machine. CNC machines are widely used in the metalworking industry and are the best option for most milling tasks. The experiments were conducted using a vertical milling machine, as shown in Fig. 6.4, with a 15 mm diameter tool with five cutter inserts, as shown in Fig. 6.5. The milling bit was made of HSS, and the workpieces used for this experiment were Al 6061.
Fig. 6.4 Vertical axis CNC machine
Fig. 6.5 Milling bit
The Taguchi approach applies a robust plan of tests to lower the variation in a process. The method's ultimate goal is to produce high-quality products at a low cost to the manufacturer. Taguchi devised a method for designing experiments that explores how various parameters affect the mean and variance of a process performance characteristic, which indicates how well the process is operating. In Taguchi's experimental design, the variables influencing the process and the levels at which they should be varied are organized using orthogonal arrays. Instead of testing every possible combination, as a full factorial design requires, the Taguchi technique tests pairs of combinations. This reduces the amount of experimentation required to obtain data sufficient to understand which factors have the greatest influence on product quality, saving both time and resources. G-code was written for the CNC machine according to the Taguchi DOE. Random results of the experiment are shown in Table 6.7.

Table 6.7 Random values of experiment

| Serial number | Spindle cutting speed (in RPM) | Tool feed rate (in mm/min) | Depth of cut (in mm) | Actual life of tool evaluated during the operation (in min) |
|---|---|---|---|---|
| 1 | 1480 | 182 | 0.9 | 126 |
| 2 | 740.0 | 136.0 | 0.6 | 390 |
| 3 | 1450.0 | 98.0 | 0.6 | 139 |
| 4 | 125.0 | 24.0 | 0.1 | 233 |
| 5 | 1100.0 | 346.0 | 1.0 | 104 |
| 6 | 1290.0 | 26.0 | 0.2 | 246 |
| 7 | 520.0 | 90.0 | 0.3 | 430 |
| 8 | 1290.0 | 345.0 | 1.0 | 108 |
| 9 | 1180.0 | 95.0 | 0.4 | 281 |
| 10 | 680.0 | 24.0 | 0.1 | 539 |
| 11 | 125.0 | 120.0 | 0.7 | 40 |
| 12 | 280.0 | 182.0 | 0.9 | 250 |
| 13 | 600.0 | 360.0 | 1.0 | 255 |

• Preprocessing of Data
The dataset of 200 points is semi-synthesized from the experiment, so as to remove bias and improve accuracy. Spindle speed ranges from 100 to 1600 rpm, feed rate from 22 to 370 mm/min, cutting depth from 0.1 to 1 mm, and tool life from 9 to 612 min. The data points are split into training and test (validation) sets: 90% of the data is used for training and 10% for validation. The data is then scaled, because parameters with smaller numerical values are not necessarily less significant. Scaling is done with StandardScaler from the Python library sklearn.preprocessing. The standard score of a sample x is evaluated as:

$$z = \frac{x - u}{s}$$

where u is the mean of the samples and s is their standard deviation. The density of data points is shown in Fig. 6.6, which illustrates the dataset's 3D visualization and highlights the efficient zone, represented by dark red scatter points, with a spindle speed range of 360–740 RPM, a feed rate range of 50–200 mm/min, and a depth of cut range of 0.2 to 0.7 mm. Various data visualization graphs were plotted to examine the correlation between the parameters, as shown in the heatmap in Fig. 6.7.
• XGBoost Regressor ML model
XGBoost is a distributed gradient-boosting library that is highly efficient, versatile, and portable. It implements machine learning techniques under the Gradient Boosting framework. For regression predictive modeling, XGBoost is an effective gradient-boosting solution.
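The preprocessing steps described above (a 90/10 split followed by standard scaling) can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' pipeline: it uses only the 13 rows of Table 6.7 instead of the 200-point dataset, the random seed is arbitrary, and fitting the scaling statistics on the training portion only is an assumption, since the chapter does not say how the scaler was fitted. The manual z-score below mirrors what StandardScaler computes (mean and population standard deviation per feature).

```python
import numpy as np

# Rows of Table 6.7: spindle speed (RPM), feed rate (mm/min),
# depth of cut (mm), actual tool life (min)
data = np.array([
    [1480, 182, 0.9, 126], [740, 136, 0.6, 390], [1450, 98, 0.6, 139],
    [125, 24, 0.1, 233], [1100, 346, 1.0, 104], [1290, 26, 0.2, 246],
    [520, 90, 0.3, 430], [1290, 345, 1.0, 108], [1180, 95, 0.4, 281],
    [680, 24, 0.1, 539], [125, 120, 0.7, 40], [280, 182, 0.9, 250],
    [600, 360, 1.0, 255],
], dtype=float)
X, y = data[:, :3], data[:, 3]

# 90/10 split on shuffled row indices (seed chosen arbitrarily)
rng = np.random.default_rng(0)
order = rng.permutation(len(X))
n_test = max(1, round(0.1 * len(X)))
test_idx, train_idx = order[:n_test], order[n_test:]

# z = (x - u) / s, with u and s taken from the training rows only
u = X[train_idx].mean(axis=0)
s = X[train_idx].std(axis=0)
X_train = (X[train_idx] - u) / s
X_test = (X[test_idx] - u) / s
y_train, y_test = y[train_idx], y[test_idx]

print("train rows:", len(X_train), "test rows:", len(X_test))
print("scaled training means:", np.round(X_train.mean(axis=0), 6))
```

Reusing the training mean and standard deviation for the test rows keeps information about the held-out data from leaking into the scaling step.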
Fig. 6.6 Histogram representing the effect of machining parameters on the tool life
Fig. 6.7 Heatmap
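The numbers behind a correlation heatmap such as Fig. 6.7 come from a pairwise correlation matrix. The sketch below computes Pearson correlations for the 13 rows of Table 6.7 only, so the values are illustrative and will differ from those of the full 200-point dataset; the short column labels are hypothetical.

```python
import numpy as np

labels = ["speed", "feed", "depth", "life"]  # illustrative column labels
# Rows of Table 6.7: spindle speed, feed rate, depth of cut, tool life
data = np.array([
    [1480, 182, 0.9, 126], [740, 136, 0.6, 390], [1450, 98, 0.6, 139],
    [125, 24, 0.1, 233], [1100, 346, 1.0, 104], [1290, 26, 0.2, 246],
    [520, 90, 0.3, 430], [1290, 345, 1.0, 108], [1180, 95, 0.4, 281],
    [680, 24, 0.1, 539], [125, 120, 0.7, 40], [280, 182, 0.9, 250],
    [600, 360, 1.0, 255],
], dtype=float)

# rowvar=False treats each column as one variable
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(labels, corr):
    print(f"{name:>6}", np.round(row, 2))
```

The resulting 4 × 4 matrix can be passed to any plotting routine that draws heatmaps; for these 13 rows the correlation between depth of cut and tool life is negative.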
Regression trees are weak learners: each tree maps an input data point to one of its leaves, which holds a continuous score. The XGBoost library minimizes a regularized objective over the regression tree functions, consisting of a convex loss function that measures the difference between target and predicted values and a penalty term for the complexity of the model. Training proceeds iteratively by adding new trees that estimate the residuals (errors) of the preceding trees; the trees are eventually combined to produce the final prediction. Here, XGBoost is tuned with n_estimators = 1391 (the number of gradient-boosted trees, equivalent to the number of boosting rounds or iterations), eta (learning rate) = 0.04, and max_depth = 6 (the maximum depth of a tree). These hyperparameters were obtained by trial and error so as to prevent overfitting.
Fig. 6.8 XGBoost RMSE
• Result and Analysis
Figure 6.8 shows the relation between root mean squared error (RMSE) and accuracy. The RMSE and accuracy for the training data are 0.00736 and 0.9999, respectively, whereas the RMSE and accuracy for the test data are 18.78901 and 0.9801, respectively. Table 6.8 shows the effectiveness of XGBoost in forecasting tool life during the milling process. The relationship between the predicted life and the actual life is shown in Fig. 6.9, based on the XGBoost Regressor analysis.
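The idea of adding trees that fit the residuals of the preceding trees can be illustrated with a small from-scratch sketch. This is plain gradient boosting for squared error using depth-1 trees (stumps), not XGBoost itself: it omits the regularization term and the second-order optimization that XGBoost uses, works on the 13 rows of Table 6.7 only, and uses 300 rounds instead of the 1391 quoted above. Only the learning rate eta = 0.04 is taken from the chapter.

```python
import numpy as np

# Rows of Table 6.7: spindle speed, feed rate, depth of cut, tool life
data = np.array([
    [1480, 182, 0.9, 126], [740, 136, 0.6, 390], [1450, 98, 0.6, 139],
    [125, 24, 0.1, 233], [1100, 346, 1.0, 104], [1290, 26, 0.2, 246],
    [520, 90, 0.3, 430], [1290, 345, 1.0, 108], [1180, 95, 0.4, 281],
    [680, 24, 0.1, 539], [125, 120, 0.7, 40], [280, 182, 0.9, 250],
    [600, 360, 1.0, 255],
], dtype=float)
X, y = data[:, :3], data[:, 3]


def fit_stump(X, residual):
    """Find the single (feature, threshold) split that best fits the residuals."""
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature])[:-1]:
            mask = X[:, feature] <= threshold
            left, right = residual[mask].mean(), residual[~mask].mean()
            sse = np.sum((residual - np.where(mask, left, right)) ** 2)
            if best is None or sse < best[0]:
                best = (sse, feature, threshold, left, right)
    return best[1:]


def predict_stump(stump, X):
    feature, threshold, left, right = stump
    return np.where(X[:, feature] <= threshold, left, right)


def boost(X, y, n_rounds=300, eta=0.04):
    """Start from the mean, then repeatedly add a shrunken stump fitted to the residuals."""
    prediction = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - prediction)
        prediction = prediction + eta * predict_stump(stump, X)
        stumps.append(stump)
    return stumps, prediction


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


stumps, fitted = boost(X, y)
baseline_rmse = rmse(y, np.full(len(y), y.mean()))
boosted_rmse = rmse(y, fitted)
print(f"training RMSE: mean-only {baseline_rmse:.2f}, boosted {boosted_rmse:.2f}")
```

The small learning rate means each tree corrects only a fraction of the remaining error, which is why many rounds are needed; a training error that keeps falling says nothing about test error, so the number of rounds still has to be checked against held-out data.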
6.4 Summary
With the objective of modeling and estimating tool life in CNC milling of components made of Al 6061, this study established an XGBoost Regressor analysis. It was found that the Design of Experiments approach helps achieve the required precision.
Table 6.8 Result of XGBoost and actual experimental values

| Serial number | Spindle cutting speed (in RPM) | Tool feed rate (in mm/min) | Depth of cut (in mm) | Actual life of tool evaluated during the operation (in min) | Predicted life of tool evaluated during the operation (in min) | Error observed (in %) |
|---|---|---|---|---|---|---|
| 1 | 1480 | 182 | 0.9 | 126 | 123.009 | 2.374 |
| 2 | 740.0 | 136.0 | 0.6 | 390 | 387.627 | 0.608 |
| 3 | 1450.0 | 98.0 | 0.6 | 139 | 139.550 | −0.396 |
| 4 | 125.0 | 24.0 | 0.1 | 233 | 249.651 | −7.146 |
| 5 | 1100.0 | 346.0 | 1.0 | 104 | 102.649 | 1.299 |
| 6 | 1290.0 | 26.0 | 0.2 | 246 | 246.458 | −0.186 |
| 7 | 520.0 | 90.0 | 0.3 | 430 | 438.070 | −1.877 |
| 8 | 1290.0 | 345.0 | 1.0 | 108 | 102.976 | 4.652 |
| 9 | 1180.0 | 95.0 | 0.4 | 281 | 282.946 | −0.693 |
| 10 | 680.0 | 24.0 | 0.1 | 539 | 521.881 | 3.176 |
| 11 | 125.0 | 120.0 | 0.7 | 40 | 26.118 | 34.705 |
| 12 | 280.0 | 182.0 | 0.9 | 250 | 253.123 | −1.249 |
| 13 | 600.0 | 360.0 | 1.0 | 255 | 259.881 | −1.914 |

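The error column of Table 6.8 is consistent with the signed percentage error (actual − predicted) / actual × 100, which the sketch below reproduces from the actual and predicted columns. The RMSE printed at the end covers only these 13 rows, so it is not expected to equal the test RMSE reported for the full dataset.

```python
import numpy as np

# Actual and predicted tool life (min) copied from Table 6.8
actual = np.array([126, 390, 139, 233, 104, 246, 430,
                   108, 281, 539, 40, 250, 255], dtype=float)
predicted = np.array([123.009, 387.627, 139.550, 249.651, 102.649,
                      246.458, 438.070, 102.976, 282.946, 521.881,
                      26.118, 253.123, 259.881])

# Signed percentage error relative to the actual tool life
error_percent = (actual - predicted) / actual * 100.0

rmse_rows = float(np.sqrt(np.mean((actual - predicted) ** 2)))
worst_row = int(np.argmax(np.abs(error_percent))) + 1  # 1-based serial number

print(np.round(error_percent, 3))
print(f"RMSE over these 13 rows: {rmse_rows:.3f} min")
print(f"largest percentage error at serial number {worst_row}")
```

The largest percentage error occurs at serial number 11, where the actual tool life is shortest, which illustrates how a relative error measure amplifies deviations on small targets.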
Fig. 6.9 Comparison of actual life and predicted life
It was found that the XGBoost predictions closely match the outcomes of the experiments. The accuracy scores (the statistical relationship between predicted and actual values) for the training and test data were 0.99999 and 0.98015, respectively, and the RMSE values for the training and test data were 0.00736 and 18.78901, respectively. The model can be used in industry to maintain an adequate tool stock for inventory management and to help in preparing quotations for customers.
References
1. Li Y, Ren S (eds) Acoustic and thermal insulating materials. In: Woodhead publishing series in civil and structural engineering, building decorative materials. Woodhead Publishing, pp 359–374, ISBN 9780857092571, https://doi.org/10.1533/9780857092588.359
2. Delany ME, Bazley EN (1970) Acoustical properties of fibrous absorbent materials. Appl Acoust 3:105–111
3. Miki Y (1990) Acoustical properties of porous materials: modifications of Delany–Bazley models. J Acoustic Soc Jpn (E) 11(1):19–24
4. Quadratic Regression in Python. https://www.statology.org/quadratic-regression-python/
5. The Room Acoustics Absorption Coefficient Database. https://www.ptb.de/cms/ptb/fachabteilungen/abt1/fb-16/ag-163/absorption-coefficient-database.html
6. Qunli W (1988) Empirical relations between acoustical properties and flow resistivity of porous plastic open-cell foam. Appl Acoust 25:141–148
7. Putra A, Abdullah Y, Efendy H, Farid WM, Ayob MR, SajidinPy M (2012) Utilizing Sugarcane Wasted Fibers as A Sustainable Acoustic Absorber. In: Malaysian technical universities conference on engineering and technology
8. Dunne RK, Desai DA, Heyns PS (2021) Development of an acoustic material property database and universal airflow resistivity model. Appl Acoust 173:107730. ISSN 0003-682X, https://doi.org/10.1016/j.apacoust.2020.107730
9. Komatsu T (2008) Improvement of the Delany-Bazley and Miki models for fibrous sound-absorbing materials. Acoust Sci Technol 29(2):121–129
10. Johnson DL, Koplik J, Dashen R (1987) Theory of dynamic permeability and tortuosity in fluid-saturated porous media. J Fluid Mech 176:379–402
11. Mechel FP (1988) Design charts for sound absorber layers. J Acoust Soc Am 83(3):1002–1013
12. ASTM C384-90a (1990) Standard method for impedance and absorption of acoustical materials by the impedance tube method. American Society for Testing and Materials, Philadelphia
13. Beranek LL (1942) Acoustic impedance of porous materials. J Acoust Soc Am 13:248–260
14. Champoux Y, Stinson MR, Daigle GA (1991) Air-based system for the measurement of porosity. J Acoust Soc Am 89(2):910–916
15. ASTM C522 Standard Test Method for Airflow Resistance of Acoustical Materials
16. Garai M, Pompoli F (2003) A European inter-laboratory test of airflow resistivity measurements. Acta Acustica uw Acustica 89(3):471–478
17. Seddeq HS (2009) Factors influencing acoustic performance of sound absorptive materials. Aust J Basic Appl Sci
18. Liang SY, Hecker RL, Landers RG (2004) Machining process monitoring and control: the state-of-the-art. Trans ASME J Manuf Sci Eng 126–2:297–310
19. Bhattacharyya P, Sengupta D, Mukhopadhyay S (2007) Cutting force based real-time estimation of tool wear in face milling using a combination of signal processing techniques. Mech Syst Signal Process 21:2665–2683
20. Khorasani AM, Yazdi MRS, Safizadeh MS (2011) Tool life prediction in face milling machining of 7075 Al by using artificial neural networks (ANN) and Taguchi design of experiment (DOE). IACSIT Int J Eng Technol 3(1)
21. Matsubara A, Ibaraki S (2009) Monitoring and control of cutting forces in machining processes: a review. Int J Autom Technol 445–456
22. Matzka S (2020) Explainable artificial intelligence for predictive maintenance applications. Third international conference on artificial intelligence for industries (AI4I)
23. Wang J, Wang P, Gao RX Tool life prediction for sustainable manufacturing
24. Yan W, Wong YS, Lee KS, Ning T (1999) An investigation of indices based on milling force for tool wear in milling. J Mater Process Technol 89:245–253
25. Shahani NM, Zheng X, Liu C, Hassan FU, Li P Developing an XGBoost regression model for predicting Young's modulus of intact sedimentary rocks for the stability of surface and subsurface structures. Front Earth Sci 9:761990
26. Ghosh N, Ravi YB, Patra A, Mukhopadhyay S, Paul S, Mohanty AR, Chattopadhya AB (2007) Estimation of tool wear during CNC milling using neural network-based sensor fusion. Mech Syst Signal Process 21:466–479
Chapter 7
Conclusions
7.1 Summary
Data has always been an important part of any organization. To gain value from data, every facet of it must be studied, whether it is generated by giant corporations or by a single person. Data analytics plays a crucial part in improving a company, as it is used to uncover hidden insights, develop reports, conduct market analysis, and refine business requirements. This area of data science uses cutting-edge methods to extract data, anticipate the future, and identify trends. These methods include both machine learning and traditional statistics. In the current era of digitization and automation, smart computing is playing a crucial role. An increasing number of companies are becoming more reliant on technology and are moving all their operations to artificial intelligence (AI)-enabled systems, and this AI revolution is bringing a big change in the labor market. Consequently, these advancements in operations and technologies are having a significant impact on human lives and livelihoods. The use of AI for predictive analytics and smart computing has become an integral component of the use cases surrounding us. This is giving rise to the fourth industrial revolution and, in the maintenance domain, to Maintenance 4.0. The first chapter of the book presents an overview of data analytics and its types, and also discusses the need for predictive analysis and its applications. A basic understanding of data preparation is required to develop any AI application; in view of this, the types of data, the data preparation process, and the associated tools are presented in Chap. 2. That chapter explains data cleaning, transformation, and reduction techniques in detail. Data preparation techniques vary with the type and source of the data, so the various types and sources of data are also discussed there.
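The three data preparation steps named above (cleaning, transformation, and reduction) can be illustrated with a minimal sketch. It is not the procedure from Chap. 2: the sample values, the function names, and the specific choices (median imputation, min-max scaling, block averaging) are assumptions made only for illustration, and only the Python standard library is used.

```python
from statistics import mean, median


def clean(readings):
    """Data cleaning: replace missing values (None) with the median of the observed values."""
    observed = [r for r in readings if r is not None]
    fill = median(observed)
    return [fill if r is None else r for r in readings]


def min_max_scale(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def downsample(values, window):
    """Data reduction: replace each block of `window` samples with its mean."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]


# Hypothetical vibration amplitudes (mm/s) with two missing samples.
raw = [2.0, None, 4.0, 6.0, None, 8.0]

cleaned = clean(raw)
scaled = min_max_scale(cleaned)
reduced = downsample(scaled, window=2)

print(cleaned)
print(reduced)
```

In practice the choice of imputation, scaling, and reduction method depends on the data type and source, as noted above; the sketch only shows how the three steps chain together.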
The current trend of Maintenance 4.0 leans toward preventive mechanisms enabled by a predictive approach and condition-based smart maintenance. Intelligent decision support, earlier detection of spare-part failure, and fatigue detection are the main components of an intelligent predictive maintenance system (PMS) leading toward Maintenance 4.0. This book focuses on the key components required for building a predictive maintenance model. Compliance with safety measures, pre-emptive actions toward safety, and improved lifetime with minimum downtime are some key outcomes of PMS. Chapter 3 presents the main task of the book, i.e., predictive maintenance. In that chapter, the key components required for building a predictive maintenance model are discussed. Predictive maintenance is one of the important applications of data analytics and is becoming a popular tool for preventive mechanisms. This book also presents prominent use cases of mechanical engineering using PMS, along with its benefits, in the last chapter of this book.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. N. Mahalle et al., Predictive Analytics for Mechanical Engineering: A Beginners Guide, SpringerBriefs in Computational Intelligence, https://doi.org/10.1007/978-981-99-4850-5_7
7.2 Research Opening
PMS and AI are growing areas in which machine learning, deep learning, and other AI-driven tools and techniques are playing a crucial role. Leading mechanical engineering companies are already using PMS, yet there is still wide scope to explore research avenues. Sensors and actuators are heavily used on the production floor, and the data generated by these sensors is used for predictions. These predictions rely on pattern identification and feature engineering to perform data analysis. In view of this, there is a lot of scope for building mathematical models and efficient algorithms for feature engineering. Data-centric development of machine learning and AI algorithms is another area that researchers can explore using classification, regression, and anomaly detection. Predicting future failures and maintenance needs is another very important research area, in which methods to estimate machine reliability are to be investigated. Other research areas include prognostics and health management for machine health and performance prediction, decision-making and scheduling of maintenance activities, tiny IoT, tiny ML, and cost–benefit analysis.
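As one concrete instance of the anomaly detection mentioned above, the following minimal sketch flags sensor readings that deviate strongly from a healthy-condition baseline. It is an illustrative assumption, not a method prescribed by this book: the temperature values, the function names, and the three-standard-deviation threshold are all hypothetical, and a simple z-score rule stands in for more capable models.

```python
from statistics import mean, stdev


def fit_baseline(healthy):
    """Summarize healthy-condition readings by their mean and sample standard deviation."""
    return mean(healthy), stdev(healthy)


def find_anomalies(readings, baseline, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    return [i for i, r in enumerate(readings) if abs(r - mu) > threshold * sigma]


# Hypothetical bearing temperatures (deg C) recorded while the machine was known to be healthy.
healthy = [60.0, 61.0, 59.0, 60.0, 62.0, 58.0, 60.0, 60.0]
baseline = fit_baseline(healthy)

# Hypothetical new readings from the same sensor.
new_readings = [60.5, 59.0, 71.0, 61.0, 48.0]
flagged = find_anomalies(new_readings, baseline)

print(flagged)  # indices of readings treated as anomalous
```

A fixed threshold of this kind is easy to interpret but assumes roughly stable operating conditions; choosing features and thresholds that remain valid as load and environment change is part of the feature engineering challenge described above.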
7.3 Future Outlook
The future of PMS is promising, as it is one of the fastest-evolving requirements of the mechanical engineering industries. Re-scaling and up-scaling of existing PMS is a very important future aspect. This scaling involves multiple factors, the 5 Ms: man, machines, materials, methods, and money. Product visualization and product realization are also very important future aspects for PMS in order to get better predictions and estimations. AI-enabled infrastructure, and the deployment of use cases on this infrastructure, is a prominent transformation in application development that is likely to come in the future. Recently, hydrogen missions and quantum missions have become key short-term visions for countries across the globe, and invent-on-demand is going to be the main future perspective. In addition to this, PMS is also becoming tightly coupled with IoT, where data generated by IoT devices will be used for better predictions and micro-scheduling of activities. For better predictions, rather than simply increasing the use of AI, enhanced forms of AI such as narrow AI and trusted AI are going to be the more promising future technologies. Proactive maintenance and standardization for greater adoption are also very important parts of the future outlook for PMS.