English Pages [329] Year 2016
PERFORMABILITY ENGINEERING SERIES
Series Editors: Krishna B Misra and John Andrews
Machine Tool Reliability
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Performability Engineering Series
Series Editors: Krishna B. Misra and John Andrews
Scope: The true performance of a product, system, or service must be judged over the entire life cycle of activities connected with design, manufacture, use and disposal, in relation to the economics of maximizing dependability and minimizing its impact on the environment. The concept of performability allows us to take a holistic assessment of performance and provides an aggregate attribute that reflects the entire engineering effort of a product, system, or service designer in achieving dependability and sustainability. Performance should not just be indicative of achieving quality, reliability, maintainability and safety for a product, system, or service, but of achieving sustainability as well. The conventional perspective of dependability ignores the environmental impact considerations that accompany the development of products, systems, and services. However, any industrial activity in creating a product, system, or service is always associated with certain environmental impacts that follow at each phase of development. These considerations have become all the more necessary in the 21st century as the world’s resources continue to become scarce and the cost of materials and energy keeps rising. It is not difficult to visualize that by employing the strategy of dematerialization, minimum energy and minimum waste, while maximizing the yield and developing economically viable and safe processes (clean production and clean technologies), we will create minimal adverse effect on the environment during production and disposal at the end of the life. This is basically the goal of performability engineering.
It may be observed that the above-mentioned performance attributes are interrelated and should not be considered in isolation for optimization of performance. Each book in the series should endeavor to include most, if not all, of the attributes of this web of interrelationship and have the objective to help create optimal and sustainable products, systems, and services.
Publishers at Scrivener
Martin Scrivener
Phillip Carmical
Contents
Preface
Acknowledgements
Chapter 1: Introduction
1.1 Basic Reliability Terms and Concepts
1.2 Machine Tool Failure
1.3 Machine Tool Reliability: Manufacturer’s View Point
1.4 Machine Tool Reliability: User’s View Point
1.5 Organization of the Book
End Notes
Chapter 2: Basic Reliability Mathematics
2.1 Functions Describing Lifetime as a Random Variable
2.2 Probability Distributions Used in Reliability Engineering
2.3 Life Data Analysis
2.4 Stochastic Models for Repairable Systems
2.5 Simulation Approach for Reliability Engineering
2.6 Use of Bayesian Methods in Reliability Engineering
2.7 Closing Remarks
Chapter 3: Machine Tool Performance Measures
3.1 Identifying Performance Measures
3.2 Mechanism to Link Users’ Operational Measures with Machine Reliability and Maintenance Parameters
3.3 Closing Remarks
End Note
Chapter 4: Expert Judgement-Based Parameter Estimation Method for Machine Tool Reliability Analysis
4.1 Expert Judgement as an Alternative Source of Data in Reliability Studies
4.2 Expert Judgement-Based Parameter Estimation Methods
4.3 Some Desirable Properties of a “Good” Estimator
4.4 Closing Remarks
Chapter 5: Machine Tool Maintenance Scenarios, Models and Optimization
5.1 Overview of Maintenance
5.2 Machine Tool Maintenance
5.3 Machine Tool Maintenance Scenarios
5.4 Preventive Maintenance Optimization Models for Different Maintenance Scenarios
5.5 Closing Remarks
Chapter 6: Reliability and Maintenance-Based Design of Machine Tools
6.1 Optimal Reliability Design
6.2 Optimal Reliability Design of Machine Tools
6.3 Failure Mode and Effects Analysis
6.4 Closing Remarks
Chapter 7: Machine Tool Maintenance and Process Quality Control
7.1 Development of Statistical Process Control (SPC)
7.2 Economic Design of Control Chart
7.3 Process Failure
7.4 Joint Optimization of Maintenance Planning and Quality Control Policy
7.5 Joint Optimization of Maintenance Planning and Quality Control Policy Using X-Control Chart
7.6 Joint Optimization of Preventive Maintenance and Quality Policy Incorporating Taguchi Quadratic Loss Function
7.7 Joint Optimization of Preventive Maintenance and Quality Policy Based on Taguchi Quadratic Loss Function Using CUSUM Control Chart
7.8 Extension of the Joint Optimization of Maintenance Planning and Quality Control Policy for Multi-component System
7.9 Closing Remarks
Endnotes
Chapter 8: Joint Optimization of Production Scheduling with Integrated Maintenance Scheduling and Quality Control Policy
8.1 Production Scheduling
8.2 Exploring the Link between Production Scheduling and Maintenance
8.3 The Optimal Scheduling Problem
8.4 Joint Optimization of Preventive Maintenance and Quality Control
8.5 Integration of Production Scheduling with Jointly Optimized Preventive Maintenance and Quality Control Policy
8.6 Numerical Illustration
8.7 Solving a Larger Problem
8.8 Extension of the Integrated Approach Multiple Machine in Series
8.9 Closing Remarks
Chapter 9: Machine Tool Reliability: Future Research Directions
9.1 Moving towards Servitization
9.2 Multi Agent-Based Systems
9.3 Closing Remarks
References
Appendix: A1
Appendix: A2
Index
Machine Tool Reliability
Bhupesh K. Lad, Divya Shrivastava and Makarand S. Kulkarni
Scrivener Publishing
Wiley
Copyright © 2016 by Scrivener Publishing LLC. All rights reserved. Co-published by John Wiley & Sons, Inc. Hoboken, New Jersey, and Scrivener Publishing LLC, Salem, Massachusetts.
Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. 
For more information about Wiley products, visit our web site at www.wiley.com.
For more information about Scrivener products please visit www.scrivenerpublishing.com.
Library of Congress Cataloging-in-Publication Data: ISBN 978-1-119-03860-3
Preface
Reliability engineering as a subject has developed vastly in the last few decades. Numerous books have been published on the subject, discussing basic principles, theories, models, tools and techniques in general. However, every system is unique, and some require specific treatment when the tools and techniques of reliability engineering are applied. This book explores the domain of reliability engineering for one such very important class of industrial systems: machine tools.
Machine tools are at the heart of manufacturing systems. Manufacturing industries rely on machine tools to fulfil their customers’ demand. Failure of a machine tool hampers production efficiency and creates uncertainty in managing shop-floor operations, resulting in significant economic losses. Moreover, the users of such systems are now sharing the risk of failures with the machine tool manufacturers by entering into long-term maintenance or availability contracts. This has created a new business avenue for machine tool manufacturers: “servicizing” their traditionally product-focused business. Machine tool manufacturers have the opportunity to package effective life cycle maintenance services with the hardware products, i.e., machine tools. It is therefore important for machine tool manufacturers as well as users to focus on the core of reliability engineering to model a machine tool’s failure and repair behaviour and its interaction with other measures of system performance.
This advanced text on machine tool reliability modelling aims to provide a consolidated volume on various dimensions of machine tool reliability and its implications from the manufacturer’s and the user’s points of view. From the manufacturer’s point of view, novel methodologies for reliability- and maintenance-based design of machine tools are covered. From the user’s point of view, novel methodologies are presented to integrate reliability and maintenance of machine tools with production scheduling and quality control.
The application area, machine tools, is very important and spans the entire manufacturing sector. The target audience of the book is researchers and practicing engineers in the fields of reliability engineering and operations management. The book can also help undergraduate students in the area of reliability to get an application flavour of the subject, and it opens up various research dimensions for researchers. All the approaches are illustrated with numerical examples, which makes them easy to understand. This book does not intend to cover the basics of reliability engineering; readers are expected to have some basic knowledge of reliability engineering, probability and statistics. However, Chapter 2 is provided so that readers can refresh the basics of probability and statistics required to follow the text.
Acknowledgements
The authors would like to acknowledge the help received from Dr. Avinash Samvedi and Mr. Vikas Sankhla in writing some of the code used in this book. We also acknowledge the help of Mr. Sandeep Kumar, who helped in editing the references.
Chapter 1
Introduction
Reduced cost of production, timely delivery and high quality of products are the prime objectives for manufacturing industries. Breakdowns of production machinery or machine tools affect the manufacturer’s ability to meet the goals of Cost, Time and Quality (CTQ). One study suggests that the economic loss due to an unexpected stoppage in industry can be as high as US $70,000 to US $420,000 per day [1]. Application of reliability engineering tools and techniques to machine tools for improving the manufacturing system performance is therefore a vital area of study.
The machine tool industry is one of the supporting pillars for the competitiveness of the entire manufacturing sector, since it produces capital goods which in turn may produce manufactured goods. Customers of machine tool manufacturers (termed “users” in this book) are, in many cases, vendors to other customers and have commitments to meet. Breakdowns of machine tools may jeopardize their ability to meet these commitments and also cost the users a lot of money in terms of poor quality, slower production, downtime, etc. Since poor reliability and improper maintenance of a machine tool greatly increase the life cycle cost to the users, many machine tool users have changed their purchase criterion for a machine tool from initial acquisition cost to Life Cycle Cost (LCC) or Total Cost of Ownership (TCO). As reliability engineering plays an important role in reducing the LCC of machine tools, this book will be equally appealing to machine tool manufacturers and users. The book covers both the manufacturer’s and the user’s viewpoint of machine tool reliability.
Decisions made during the design phase of a product have the largest impact on the life cycle cost of a system. The inherent failure and repair characteristics of components and assemblies are frozen with the selection of the machine tool configuration at the design stage. Therefore, the maintenance requirements of the machine tools are also fixed at the design stage itself. For example, a higher reliability component may require a lower replacement frequency for the same operating profile compared to a lower reliability component. Therefore, machine tool manufacturers need to consider the reliability and maintenance aspects at the design stage itself.
On the other hand, the cost effectiveness of machine tools at the user’s end also depends on the shop-floor level operations planning decisions, i.e., scheduling, inventory, quality control, etc. These decisions interact with machine tool reliability and maintenance. Therefore, machine tool users need to consider the reliability and maintenance aspects during operations planning.
The goal of this book is to provide a consolidated volume on various dimensions of machine tool reliability and its implications from the manufacturer’s and user’s point of view. The introductory chapter of the book describes basic reliability terms and defines machine tool failures. The importance of machine tool reliability from the manufacturers’ and users’ point of view is also discussed.
1.1 Basic Reliability Terms and Concepts
This section introduces important reliability terms and concepts which will help the reader in following the rest of the book.
Reliability: This is the probability that an item can perform its intended function for a specified interval under stated conditions [2]. In other words, it is the probability of survival over time. To determine the reliability of a particular component or system, an unambiguous and observable description of failure is essential. Machine tool failures are defined in the next section. If T is a random variable representing the time to failure of the system or component, then the reliability at time t can be expressed as:
$$R(t) = P(T > t) \tag{1.1}$$
It is pertinent here to clearly differentiate between the terms “quality” and “reliability.” If quality is the conformance to the specifications at t = 0, then reliability can be considered as conformance to the specifications at t > 0.
However, in this book, “reliability” is used in the context of the machine tools, while “quality” is used in the context of the products produced using machine tools.
Failure Rate (Hazard Rate): Failure rate or hazard rate is the instantaneous rate of failure at time t [3]. This index is normally used for non-repairable components. A component of the system may have an increasing, decreasing, or constant failure rate. It is further discussed in Chapter 2.
Rate of Occurrence of Failure (ROCOF): This index is often used in place of the hazard rate for repairable systems. Failures occur as a given system ages, and the system is repaired to a state that may be the same as new, or better, or worse. Let N(t) be a counting function that keeps track of the cumulative number of failures a given system has had from time zero to time t. N(t) is a step function that jumps up by one every time a failure occurs and stays at the new level until the next failure. The ROCOF is the total number of failures within an item population, divided by the total number of life units expended by that population during a particular measurement period under stated conditions [2]. Every system will have its own observed N(t) function over time. If we observed the N(t) curves for a large number of similar systems and “averaged” these curves, we would have an estimate of M(t), the expected (average) number of cumulative failures by time t for these systems. The derivative of M(t) with respect to t gives the ROCOF at time t.
Maintenance: All actions necessary for retaining an item in or restoring it to a specified condition [2].
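The averaging of N(t) curves described under ROCOF above can be sketched in a few lines. This is only an illustration under stated assumptions: the failure times are hypothetical, every system is assumed to be observed over the whole period of interest (no censoring), and the function names are invented for this example.

```python
from bisect import bisect_right


def cumulative_failures(failure_times, t):
    """N(t): number of failures one system has had in the interval [0, t]."""
    return bisect_right(sorted(failure_times), t)


def mean_cumulative_failures(systems, t):
    """Estimate M(t) by averaging N(t) over a set of similar systems."""
    return sum(cumulative_failures(ft, t) for ft in systems) / len(systems)


def average_rocof(systems, t0, t1):
    """Average ROCOF over (t0, t1]: change in the estimate of M(t) per unit time."""
    delta_m = mean_cumulative_failures(systems, t1) - mean_cumulative_failures(systems, t0)
    return delta_m / (t1 - t0)


# Hypothetical failure times (operating hours) for three similar machines,
# each assumed to be observed for at least 1000 hours.
systems = [
    [120.0, 450.0, 800.0],
    [200.0, 640.0],
    [90.0, 300.0, 560.0, 950.0],
]

m_500 = mean_cumulative_failures(systems, 500.0)    # (2 + 1 + 2) / 3
m_1000 = mean_cumulative_failures(systems, 1000.0)  # (3 + 2 + 4) / 3
rocof_late = average_rocof(systems, 500.0, 1000.0)  # failures per hour
```

With so few systems the estimate is coarse; the point is only that M(t) is an average of step functions and that the ROCOF over an interval is the slope of that average.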
Corrective Maintenance (CM): All actions performed as a result of failure, to restore an item to a specified condition [2]. Corrective maintenance can include any or all of the following steps: localization, isolation, disassembly, interchange, reassembly, alignment and checkout.
Preventive Maintenance (PM): All actions performed to retain an item in a specified condition by providing systematic inspection, detection, and prevention of incipient failures [2].
Predictive Maintenance: Predictive maintenance or Condition Based Maintenance (CBM) is carried out only after collecting and evaluating enough physical data on performance or condition of equipment, such as temperature, vibration, particulate matter in oil, etc., by performing periodic or continuous (online) equipment monitoring [4].
Maintainability: It is the relative ease and economy of time and resources with which maintenance can be performed. More precisely, it is the probability that an item can be retained in, or restored to, a specified condition within a specified time when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance and repair [2].
Availability: Depending on the purpose of analysis, a number of different definitions are used in the literature, some of which are given below [3]:
Instantaneous or Point Availability, A(t): It is the probability that a system will be operational at any random time t. Unlike reliability, the instantaneous availability measure incorporates maintainability information.
Average Availability: It is the proportion of time a system is available for use during a mission. Mathematically, it is calculated as the mean value of the instantaneous availability function over the period (0, T).
$$\bar{A}(T) = \frac{1}{T}\int_{0}^{T} A(t)\,dt \tag{1.2}$$
Steady State Availability: The steady state availability of the system is the limit of the instantaneous availability function as the time approaches infinity.
$$A(\infty) = \lim_{t \to \infty} A(t) \tag{1.3}$$
Inherent Availability: It is the steady state availability when considering only the corrective maintenance downtime of the system. It does not include delays due to unavailability of maintenance personnel, unavailability of spare parts, administrative procedures, etc. The inherent availability of a system is a function of the reliability of its components and maintainability, which more or less get defined at the design stage of the equipment.
$$A_I = \frac{MTBF}{MTBF + MTTR} \tag{1.4}$$
where MTBF is the mean time between failures and MTTR is the mean time to repair.
Operational Availability: It is a measure of the average availability over a period of time, including all the delays due to unavailability of maintenance personnel, spare parts, administrative procedures, etc. Operational availability is the availability that the customer actually experiences.
$$A_O = \frac{MTBM}{MTBM + SDT + MDT} \tag{1.5}$$
where MTBM is the mean time between maintenance, and SDT and MDT are the supply and maintenance delays respectively. Inherent availability and operational availability are used in this book and are discussed further in Chapter 3.
Life Cycle Cost (LCC): It is the sum of acquisition, logistics support, operating, and retirement and phase-out expenses [2].
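The availability measures defined in this section can be compared numerically. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: times to failure and to repair are taken as exponentially distributed (which gives the well-known closed form for A(t) of a single repairable unit), operational availability is assumed to take the form MTBM / (MTBM + SDT + MDT), and all numerical values and function names are hypothetical.

```python
import math


def inherent_availability(mtbf, mttr):
    """Steady state availability counting only corrective repair time."""
    return mtbf / (mtbf + mttr)


def operational_availability(mtbm, sdt, mdt):
    """Assumed form: uptime between maintenance over uptime plus delays."""
    return mtbm / (mtbm + sdt + mdt)


def point_availability(t, mtbf, mttr):
    """A(t) for one unit with exponential failure and repair times,
    starting in the operational state at t = 0 (modelling assumption)."""
    lam, mu = 1.0 / mtbf, 1.0 / mttr
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)


def average_availability(mtbf, mttr, horizon, steps=10_000):
    """(1/T) * integral of A(t) over (0, T), using the trapezoidal rule."""
    h = horizon / steps
    total = 0.5 * (point_availability(0.0, mtbf, mttr)
                   + point_availability(horizon, mtbf, mttr))
    for i in range(1, steps):
        total += point_availability(i * h, mtbf, mttr)
    return total * h / horizon


# Hypothetical values, all in hours.
mtbf, mttr = 400.0, 8.0
a_inh = inherent_availability(mtbf, mttr)
a_op = operational_availability(mtbm=250.0, sdt=6.0, mdt=10.0)
a_avg = average_availability(mtbf, mttr, horizon=100.0)
a_far = point_availability(1.0e6, mtbf, mttr)  # approaches the steady state value
```

Under these assumptions the point availability starts at 1 and decays towards the inherent availability, so the average over a short mission is somewhat higher than the steady state value, while the operational figure is lower because it also counts supply and maintenance delays.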
1.2 Machine Tool Failure
The first step in applying any reliability engineering technique to any system is to clearly define the failures of that particular system. The Society of Automotive Engineers (SAE) defines the failure of production machinery/equipment as: “any event due to which the machinery/equipment is not available to produce parts at specified conditions when scheduled, or is not capable of producing parts or performing scheduled operations to specification” [5]. However, care should be taken in expressing the failure criteria, as different users may have different expectations in terms of the product performance. There may also be a diversity of opinion between machine tool users and manufacturers as to what exactly constitutes a degraded performance or failure. Therefore, while the SAE definition of failure can serve as a guideline, it is necessary that the failure criteria are clearly and, wherever possible, quantitatively defined by the designer, keeping in mind the user’s viewpoint.
In this book, failures of machine tools are defined in terms of failure consequences. These consequences express the user’s view of failure under the operating conditions mutually agreed upon between the user and the manufacturer. Whenever failure occurs, it leads to one of the following Failure Consequences (FCs):
Failure Consequence 1 (FC1): failure is detected immediately and the machine has to be stopped.
Failure Consequence 2 (FC2): machine continues to operate, but at a lower production rate than designed (i.e., with increased cycle time).
Failure Consequence 3 (FC3): machine continues to run, but produces more rejections than the normal rejection rate.
In many cases, failure consequences 2 and 3 are detected by the users after a time lag, during which the machine tool runs at a reduced performance level. The last two failure consequences can be considered as the result of partial failures and can be defined as degradation in performance without complete failure [6–8]. Figure 1.1 depicts these failure consequences on a time-performance curve. It clearly indicates the relation of a machine tool failure with the user’s shop-floor performance measures like Availability (A), Performance Rate (PR), Quality Rate (QR) and failure costs.
Figure 1.1 Machine tool failure on time-performance curve (Reprinted with permission from [9]; Inderscience Publishers).
It was observed during one of the research projects carried out by the authors with a machine tool industry in India that many failure events of machine tools lead to failure consequences 2 and 3 and finally to consequence 1 when detected. Thus, such failure consequences must be considered explicitly by machine tool manufacturers, as well as by users, to reduce the life cycle cost of the machine tools. Table 1.1 provides examples of all three types of failure consequences for a CNC grinding machine.
Table 1.1 Failure consequences and affected performances (Reprinted with permission from [8]; Inderscience Publishers).
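The link between the three failure consequences and the shop-floor measures Availability (A), Performance Rate (PR) and Quality Rate (QR) can be made concrete with a small calculation. The sketch below is not the model developed in this book: it assumes the commonly used product form OEE = A × PR × QR, reads FC1 as lost operating time, FC2 as an increased cycle time and FC3 as extra rejections, and uses hypothetical shift data and function names.

```python
def shift_measures(planned_time, downtime, ideal_cycle_time,
                   parts_made, parts_rejected):
    """Return (A, PR, QR, OEE) for one shift.

    planned_time, downtime and ideal_cycle_time share one time unit (minutes here).
    """
    operating_time = planned_time - downtime
    availability = operating_time / planned_time                   # reduced by FC1 stoppages
    performance_rate = ideal_cycle_time * parts_made / operating_time  # reduced by FC2
    quality_rate = (parts_made - parts_rejected) / parts_made      # reduced by FC3
    oee = availability * performance_rate * quality_rate
    return availability, performance_rate, quality_rate, oee


# Hypothetical 480-minute shift on one machine.
a, pr, qr, oee = shift_measures(
    planned_time=480.0,    # scheduled production time
    downtime=48.0,         # stoppages (FC1)
    ideal_cycle_time=2.0,  # designed minutes per part
    parts_made=180,        # fewer than the ideal 216 because of slower cycles (FC2)
    parts_rejected=9,      # rejections (FC3)
)
```

In this illustration the three ratios are 0.90, about 0.83 and 0.95, so the combined figure is roughly 0.71 even though none of the individual losses looks severe on its own.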
1.3 Machine Tool Reliability: Manufacturer’s View Point
Historically, machine tool designers have done a good job of evaluating the functions and form of products at the design phase. Once the functional design of the machine tool is done, designers generally have multiple alternatives for many of the components/subassemblies that can satisfy the functional requirements of the system. Such alternatives, apart from their cost, also differ in their inherent failure and repair characteristics, like time-to-failure distribution, time-to-repair distribution, failure consequences, etc. For example, a designer may have two alternatives for the spindle, viz., a motorized and a belted spindle. Even though both these alternatives may satisfy the functional requirements of the machine, they will have different failure and repair characteristics. Therefore, each of these alternatives will contribute differently to the reliability performance of the system. Further, Preventive Maintenance (PM) can also be used to improve the reliability performance of the system. However, preventive maintenance again consumes resources and time which could otherwise be used for production, thereby affecting profit. Therefore, from the viewpoint of a machine tool designer, the problem of reliability- and maintenance-based design of a machine tool finally boils down to selecting the optimal machine tool configuration from the available alternatives for different components/subassemblies by simultaneously considering reliability and maintenance parameters, such that the final configuration meets the user’s performance requirements and budget constraints.
However, optimization of reliability and maintenance schedule poses a challenge when users are unable to express their reliability requirements quantitatively. It was observed during a survey done by the authors that only a few corporate customers express their reliability requirements explicitly in terms of Mean Time Between Failures (MTBF). But even these users are more concerned about their shop-floor level performance, and they judge the reliability of a machine tool based on how well it performs in terms of performance measures like Overall Equipment Effectiveness (OEE), Life Cycle Cost (LCC), Cost Per Piece (CPP), etc. These performance measures are closer to the heart of the users and are affected by the inherent failure and repair characteristics of the machine tool components/subassemblies and the maintenance plans. However, the extent to which the inherent failure and repair characteristics and the preventive maintenance of machine components/subassemblies affect a user’s performance measures also depends on the user’s cost structure and shop-floor level policies.
For example, if a user has alternative machines available to bear the load of a failed machine, then the downtime cost of that machine may not be as significant as in the case where there is no alternative machine available to bear the load of the failed machine [10]. Similarly, if a machine is being used as a stand-alone machine, its downtime cost will be different than that in the case when the same machine is being used in a production line. As can be seen from the examples, the cost structures will be different for each of the cases. Similarly, a tighter quality control policy at the user’s end will detect process shifts due to failure of machine components/subassemblies much earlier, thereby reducing the rejection rate.
Thus, the effect of machine failures and maintenance on LCC, OEE, and other performance measures may be different for different users.
In the case of machine tools, users provide their functional requirements, like cycle time, process capability, material to be machined, etc., to the manufacturers. Based on these requirements, the manufacturer designs the machine tool. In general, the design is for one of the following:
General purpose machine tools
Special purpose machine tools
Customized machine tools
A general purpose machine tool is one which can be used for a wide variety of operations, on work pieces of a wide range of sizes [11]. Thus, they are designed for a wide range of users. A special purpose machine tool is one which is designed for some specific operations on a limited range of work piece sizes and shapes [11]. These are generally engineered to meet the requirements of a specific user. In a customized machine tool, some of the components/subassemblies, especially structural elements, are standard components, while others are designed based on the specific requirements of a customer. Thus, the machine tool designer has three different functional design scenarios (also referred to as manufacturer’s business scenarios in this book).
While considering reliability and maintenance at the design stage of a machine tool, each of the above three functional design scenarios offers different opportunities and challenges to the designer. For example, a general purpose machine tool design must be able to meet the reliability requirements of a wide range of users. On the other hand, the designer of a special purpose machine tool must be able to capture the reliability requirements of a specific user. Therefore, a reliability- and maintenance-based design approach for machine tools must also be able to address the design needs in each of these functional design scenarios.
Figure 1.2 depicts the entire concept of reliability- and maintenance-based design of machine tools. Many times, the existing alternatives available to the designer for some of the components/subassemblies may not be able to give satisfactory performance under the specified operating environment of the users. The designer may then need to improve the existing design of such components/subassemblies. For example, one of the main causes of failure of the workhead spindle is seal failure, which allows coolant and chips to enter the spindle bearing and causes it to fail early. In this case, the designer may have to change the design of the spindle to accommodate a different sealing technology that restricts coolant entry or provides a better chip separation arrangement.
Figure 1.2 Reliability- and maintenance-based design of machine tools.
Similarly, users may also be interested in improving the reliability performance of their existing machines. Users can do this in two ways:
1. By improving the design of components/subassemblies in collaboration with the machine tool manufacturer. The designer can search for a better alternative design for the critical components/subassemblies of the machine.
2. By changing the operating environment, shop-floor policies and cost structure. For example, users may introduce more stringent Statistical Process Control (SPC) procedures to reduce the cost of failures through early detection, thereby reducing the criticality of the failures.
However, any improvement effort needs investment. The designer and users need to make a trade-off between the cost of improvements and the benefits from the improvements. Thus, an approach for considering reliability and maintenance at the design stage will also help in making such decisions.
1.4 Machine Tool Reliability: User’s View Point
Users of machine tools are other manufacturing industries, which use them for producing consumer or capital goods and are under continuous pressure to meet their customers’ requirements of high quality, low cost and timely delivery of products. Failures of production equipment affect the shop-floor level performance of the users. Thus, the users evaluate the reliability of a machine tool based on how well it performs in their production environment to meet their customers’ requirements. Moreover, shop-floor level performance also depends on the operations policy pertaining to scheduling, maintenance and quality. These three aspects of operations planning are affected by machine tool failures and also interact with one another. Hence, joint consideration of the policy options pertaining to quality, maintenance and scheduling, along with their effect on the performance of manufacturing systems, is an important area of investigation.
Only in recent years have researchers started to develop approaches that try to optimize the parameters of these policies simultaneously [12–16]. For example, older approaches for production scheduling do not consider the effect of machine unavailability due to failures or preventive maintenance activities. Similarly, the older maintenance planning models did not consider the impact of maintenance on the due dates set to meet customer requirements. However, maintenance effectiveness cannot be measured in a meaningful way without taking into account whether the maintenance addresses the production requirements [17]. Delaying maintenance actions to meet the production requirement may increase the process variability and the risk of machine failure, which in turn may cause higher rejections or downtime losses. Ollila and Malmipuro [18] observed that maintenance has a major impact on efficiency and quality along with equipment availability.
In a case study carried out in five Finnish industries, it was shown that well-functioning machinery is a prerequisite to quality products. It was also shown that a lack of proper maintenance is usually among the three most important causes of quality deficiencies.
Due to the operational complexity and the presence of deterministic and stochastic events, obtaining optimal policies for manufacturing systems is both theoretically and computationally difficult. The literature as well as input from industries has clearly indicated the need to explore the problem of joint consideration of these shop-floor operational aspects. Thus, from the users’ point of view, the reliability of machine tools can be used to model the interaction of various shop-floor level operations policies. Figure 1.3 depicts the interaction between machine tool reliability and shop-floor operations planning functions such as scheduling, maintenance and quality control.
Figure 1.3 User’s view of machine tool reliability.
1.5 Organization of the Book The rest of the book is organized as follows: The second chapter presents a brief overview of the basic reliability mathematics. It presents a discussion on some of the most common lifetime distributions, viz., exponential, Weibull, Normal, etc. In the third chapter, various performance measures for machine tool reliability are discussed and detailed models are developed for each of these measures. These models relate a machine tool’s reliability and maintenance parameters with the user’s cost structure and shop-floor level operational parameters. To be more specific, models for availability, performance rate, quality rate, overall equipment effectiveness, life cycle costs and cost per piece are developed. The chapter also provides a discussion on the use of such models from the user’s and manufacturer’s point of view. Models developed in Chapter 3 rely on time-to-failure
distribution parameters for estimating the number of failures for a component in a given time interval. If sufficient field failure data are available, the designer can use the conventional methods mentioned in Chapter 2 to estimate the time-to-failure distribution parameters. However, in many real-life situations, designers do not have sufficient field failure data. For such cases, Chapter 4 explores the possibility of utilizing expert knowledge for obtaining time-to-failure distribution parameters. Chapter 5 discusses various maintenance scenarios for machine tools. The following three maintenance scenarios are identified for a machine tool based on the types of preventive maintenance actions and the degree of restoration after a repair: Perfect corrective and preventive replacement; Imperfect corrective repair, imperfect preventive repair and perfect replacement; Minimal corrective repair, imperfect preventive repair, imperfect overhaul and perfect replacement. For each scenario, a maintenance optimization problem is formulated. For the case of imperfect maintenance, complexities in obtaining the optimal preventive maintenance schedule are reduced by developing some approximate models for estimating the number of failures. For the case of minimal corrective repair, a conditional number of failures model is discussed. This model, apart from regular preventive repair and replacement, also helps the designer in considering the effect of major overhauls on the optimal maintenance schedule decisions. It is demonstrated in the chapter that the optimal maintenance schedule decision also depends on the user’s cost structure and shop-floor policy parameters. In Chapter 6, two methods for reliability-based design of machine tools are provided. The first design methodology allows selection of the optimal machine tool configuration based on Life Cycle Cost (LCC) and other performance requirements of the user.
The optimal solution is obtained by simultaneously considering reliability and maintenance at the design stage under three different functional design scenarios, viz., general purpose machine tool design, special purpose machine tool design and customized machine tool design. The second design methodology helps the designer in improving the design of the existing system by identifying the critical
components/subassemblies. A cost-based Failure Consequence Analysis (FCA) is proposed for this purpose. The proposed methodology can help a machine tool manufacturer in making effective cost-driven decisions while improving the reliability performance of the machine tool. It also provides guidance to the machine tool users in identifying the areas where they can focus to obtain better performance from the machine. Chapter 7 and Chapter 8 present the user’s perspectives on machine tool reliability by highlighting the interaction of machine failures and maintenance with shop-floor level operations policies. Chapter 7 presents various approaches for joint optimization of maintenance scheduling and quality control policy. In the first approach, a control chart (goal post approach) and an imperfect time-based maintenance policy for the simultaneous economic design of preventive maintenance and quality control policy are discussed. In the second approach, Taguchi’s loss function is incorporated in the simultaneous economic design of preventive maintenance and quality control policy. A major limitation of the control chart is that it is relatively insensitive to small shifts in the process mean. CUSUM (cumulative-sum) control charts can be used as an effective alternative to the control chart to detect small shifts in the process mean. In the third approach, an integrated maintenance planning and quality control model is developed considering the CUSUM control chart for detecting small shifts in the process. To make the integrated preventive maintenance and quality control model more generic, approach 2 is extended from the component level to the system level, where a system is assumed to comprise a set of multiple independent components. In Chapter 8, a methodology is developed which integrates the production schedule with the jointly optimized maintenance schedule and quality control policy for a single machine problem.
Total enumeration as well as the Backward-Forward heuristic and Genetic Algorithm for solving the optimization problem are discussed. Finally, Chapter 9 discusses, in brief, various possible directions for future research in the area of machine tool reliability. All the approaches explained in this book are illustrated with the help of suitable examples. It is hoped that this book will be useful for students, researchers and practicing engineers.
End Notes
1. Printed with permission from Inderscience Publishers: Lad B.K. and Kulkarni M.S., A mechanism for linking user’s operational requirements with reliability and maintenance schedule for machine tool, Int. J. Reliability and Safety (IJRS), Vol. 4, No. 4, 2010, p. 347.
2. Printed with permission from Inderscience Publishers: Lad B.K. and Kulkarni M.S., Integrated reliability and optimal maintenance schedule design: a life cycle cost based approach, Int. J. Product Lifecycle Management (IJPLM), Vol. 3, No. 1, 2008, p. 86.
Chapter 2
Basic Reliability Mathematics
This chapter discusses in brief the reliability mathematics helpful for understanding the chapters which follow. Specifically, it discusses various functions used in reliability engineering and their interrelationships. Some of the most commonly used life distributions are discussed in brief. The chapter also presents an overview of data analysis for estimating time-to-failure distribution parameters. A note on the simulation approach for estimating the number of failures and a note on the Bayesian approach in reliability engineering are also presented. Extensive discussions of these topics are out of the scope of this book. For more discussion on these topics readers may refer to some of the basic texts in reliability engineering, like Ebeling [3], Birolini [20], Rinne [21], etc.
2.1 Functions Describing Lifetime as a Random Variable
Time to failure of a unit is not fixed; instead it is a random variable. Generally this variable is continuous and non-negative. There are several functions which completely specify the distribution of this random variable. These functions are discussed in brief in the following paragraphs. Let T (T ≥ 0) be a continuous random variable representing the time to failure of a system (component). The Probability Density Function (PDF) of the time-to-failure distribution, f(t), then has the following properties:
$$f(t) \ge 0 \ \text{for all } t \ge 0, \qquad \int_0^{\infty} f(t)\,dt = 1$$
The Cumulative Distribution Function (CDF) of the failure distribution can be written as:
$$F(t) = P(T < t) = \int_0^{t} f(x)\,dx \qquad (2.1)$$
F(t) has the following property:
$$F(0) = 0, \qquad \lim_{t \to \infty} F(t) = 1$$
As the reliability over time t is the probability of failure-free operation over t, the reliability function R(t) is the probability that the time to failure is greater than or equal to t, i.e.,
$$R(t) = P(T \ge t) \qquad (2.2)$$
In terms of the probability density function, R(t) can be written as:
$$R(t) = \int_t^{\infty} f(x)\,dx \qquad (2.3)$$
R(t) has the following property:
$$R(0) = 1, \qquad \lim_{t \to \infty} R(t) = 0$$
It can be seen from Equations 2.1 and 2.3 that:
$$R(t) = 1 - F(t) \qquad (2.4)$$
The function R(t) is normally used when reliability for any time of operation of the system (component) is being computed, and F(t) is normally used when risk of failure is of prime concern. PDF, CDF and reliability function are shown in Figures 2.1–2.3.
Figure 2.1 The probability density function.
Figure 2.2 The cumulative distribution function.
Figure 2.3 The reliability function.
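Another important function used in reliability studies is the Failure Rate or Hazard Rate Function λ(t). It is the instantaneous rate of failure and can be written as the conditional probability of failure per unit time. It can be expressed as:
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \qquad (2.5)$$
This can be easily reduced to:
$$\lambda(t) = \frac{f(t)}{R(t)} \qquad (2.6)$$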
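The relationships among f(t), F(t), R(t) and λ(t) can be checked numerically. The following is a minimal sketch, assuming a two-parameter Weibull lifetime purely for illustration; the function names and the values η = 200 and β = 2 are not taken from this section's equations.

```python
import math

def weibull_pdf(t, eta, beta):
    """f(t) for a two-parameter Weibull lifetime (scale eta, shape beta)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))

def weibull_cdf(t, eta, beta):
    """F(t) = P(T < t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def reliability(t, eta, beta):
    """R(t) = 1 - F(t)."""
    return 1.0 - weibull_cdf(t, eta, beta)

def hazard(t, eta, beta):
    """lambda(t) = f(t) / R(t)."""
    return weibull_pdf(t, eta, beta) / reliability(t, eta, beta)

def cdf_by_integration(t, eta, beta, steps=2000):
    """F(t) obtained by integrating f from 0 to t with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (weibull_pdf(0.0, eta, beta) + weibull_pdf(t, eta, beta))
    for k in range(1, steps):
        total += weibull_pdf(k * h, eta, beta)
    return total * h

if __name__ == "__main__":
    eta, beta = 200.0, 2.0  # illustrative values only
    for t in (50.0, 100.0, 150.0):
        print(t, weibull_cdf(t, eta, beta), cdf_by_integration(t, eta, beta),
              reliability(t, eta, beta), hazard(t, eta, beta))
```

With β = 2 the printed hazard values grow with t, which corresponds to the increasing failure rate region of the curve.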
A component may have a decreasing, constant or increasing failure rate depending on the underlying failure mode or cause. A plot of failure rate against time is called a Bathtub Curve and is shown in Figure 2.4.
Figure 2.4 The Bathtub curve.
For a non-repairable component the failure rate will be decreasing, constant or increasing. A decreasing failure rate indicates the presence of manufacturing defects; for example, a gear wheel may face early life failures due to improper machining or heat treatment. A constant failure rate represents random failures. Generally, electronic components in a machine tool show constant failure rate behavior, and providing excessive strength or redundancy can help in improving the reliability of such components. An increasing failure rate represents wear-out failures of the component. Most mechanical components exhibit increasing failure rates, i.e., wear-out failures, and preventive replacement of such components may help in ensuring higher availability of the system. Optimal replacement of such components in a machine tool is discussed in Chapter 5.
2.2 Probability Distributions Used in Reliability Engineering Various theoretical distributions are used in reliability engineering to model the failure process. These are often called reliability models. In this section the most widely used probability distributions are discussed in brief.
2.2.1 Exponential Distribution
One of the most common failure distributions in reliability engineering is the exponential or Constant Failure Rate (CFR) model. Failures due to completely random or chance events follow this distribution. As a time-to-failure distribution, it is characterized by a single parameter known as the failure rate (λ). The functions introduced in the previous section take the following forms for the exponential distribution:
$$f(t) = \lambda e^{-\lambda t} \qquad (2.7)$$
$$F(t) = 1 - e^{-\lambda t} \qquad (2.8)$$
$$R(t) = e^{-\lambda t} \qquad (2.9)$$
$$\lambda(t) = \lambda \qquad (2.10)$$
One of the important properties of the exponential distribution is memorylessness. The memoryless property makes the time to failure of a component independent of its age, i.e., of how long the component has been operating. Mathematically,
$$P(T \ge t + s \mid T \ge s) = \frac{R(t + s)}{R(s)} = e^{-\lambda t} = R(t)$$
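The memoryless property can also be verified numerically. The sketch below is illustrative only; the failure rate value and the function names are assumptions and do not come from the text. It compares the probability that a component of a given age survives a further mission with the probability that a new component survives the same mission.

```python
import math

def exp_reliability(t, lam):
    """R(t) = exp(-lam * t) for the exponential (CFR) model."""
    return math.exp(-lam * t)

def conditional_survival(mission, age, lam):
    """P(T >= age + mission | T >= age) for the exponential model."""
    return exp_reliability(age + mission, lam) / exp_reliability(age, lam)

if __name__ == "__main__":
    lam = 0.002       # assumed failure rate per hour, for illustration
    mission = 50.0    # additional operating time of interest
    for age in (0.0, 100.0, 1000.0):
        # The printed value is the same for every age.
        print(age, conditional_survival(mission, age, lam))
```

Because the conditional survival probability does not depend on the age, preventive replacement brings no reliability benefit for a component that truly follows this model.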
2.2.2 Weibull Distribution
Weibull distribution is a widely used distribution to model the failure process. It is normally characterized by two parameters, viz., shape (β) and scale (η) parameters. The value of the shape parameter, Beta (β), provides insight into the behavior of the failure process as shown in Table 2.1.
Table 2.1 Shape parameter vs failure process.
[...] (β > 1), the users
prefer to replace the component preventively. However, in the absence of time-to-failure data, users cannot determine optimal replacement interval using quantitative methods. They use their past experience to decide the replacement interval. Nevertheless, the underlying logic used by the users can be considered to follow one of the replacement strategies available in the literature, viz., group replacement, age replacement, etc. In the group replacement policy, a preventive replacement of multiple components of the same type is performed after a constant interval of length tPReplace, irrespective of the age of the item, and failure replacement occurs for every failure during the interval (0, tPReplace) [56]. This type of policy is generally used when the economy of scale exists and the components can be replaced in groups. Expert judgement-based methodology to obtain the time-to-failure distribution parameters in such situations can be implemented as follows. The experts are first asked the following question: “Do you usually observe failures before preventive replacement interval?”. If the response of the experts to the above question is “Yes,” then it can be assumed that the preventive replacement interval used by the user (tPReplace) is more than the mode of the time-to-failure distribution. This is generally the case when the preventive replacement interval is incorrectly decided. In such situations, experts can be further expected to provide the most likely time-to-failure value, i.e., the mode (X) of the right truncated Weibull distribution. Thus Equation 4.2 is again useful in this case. Secondly, experts can be asked to tell the number of times the component failed before replacement in the last 6 months or 1 year. Since the total number of replacements during this period is known, the probability of component failure before preventive replacement time, i.e., F(tPReplace), can be estimated. 
Maintenance engineers of the user industry can give more accurate information about such figures. Thus,
(4.8) Equations 4.2 and 4.8 can be solved to get the values of Weibull distribution parameters. Example 4.2: Consider an example of a “wheelhead spindle belt” used in CNC grinding machines. Machine tool users generally replace the same
preventively at some fixed intervals of time. Thus, it can be assumed that the underlying policy used by the user is the group replacement policy. Questions asked and corresponding judgements of the experts are given in Table 4.6.
Table 4.6 Collection of expert’s judgement in the case of preventive replacements.
Questions asked | Expert’s judgement
Do you replace the belt preventively? | Yes
How do you decide the preventive replacement time? | Past experience
What is the preventive replacement interval used by you? | 3 months
Do you generally observe failures before PM? | Yes
What is the time when the belt is most likely to fail? | 2.5 months
What is the probability of failure before preventive replacement interval? | –
How many times has the belt failed before replacement in the last 1 year? | 2
It can be observed from Table 4.6 that even though the most likely time-to-failure, i.e., mode (X) for the belt, is 2.5 months, the preventive replacement time used by the user is 3 months. The following was observed as one of the reasons behind this. The failure of the wheelhead belt mostly (almost 80–90%) leads to failure consequence 1 (FC1) discussed in Chapter 1. Thus it was critical mainly from the availability point of view. In other words, the cost of failure was mainly due to downtime. Also, the mean active maintenance times required to perform corrective and preventive replacement were almost equal, and the delay time in corrective replacement was not very large. Therefore, the difference in cost of corrective and preventive replacement was not very large. The cost of the component is also not very high. Therefore, the user preferred to replace the belt preventively every 3 months, when preventive maintenance of some of the other components/subassemblies is scheduled. The expert in this case was not able to provide the judgement regarding the probability of failure before the preventive replacement interval. However, from the service record it was observed that the belt was replaced 5 times in the last one year. Thus,
and from Equations 4.2 and 4.8 we get:
and
Solving the above two equations, we get:
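A numerical solution of this kind can be sketched as follows. The sketch rests on assumptions that are not confirmed by the surviving text: Equation 4.2 is taken to be the mode of an untruncated two-parameter Weibull distribution, X = η((β − 1)/β)^(1/β); Equation 4.8 is taken to equate the Weibull CDF at tPReplace with the observed fraction of failures; and that fraction is taken as 2 failures out of 5 replacements, an inference from Table 4.6 and the service record. The function name is hypothetical, and the printed values are the output of this sketch, not results quoted from the book.

```python
import math

def weibull_from_mode_and_cdf(x_mode, t_p, p_fail):
    """Solve mode = x_mode and F(t_p) = p_fail for Weibull (eta, beta).

    Assumes an untruncated two-parameter Weibull and t_p >= x_mode.
    """
    if not (0.0 < p_fail < 1.0) or t_p < x_mode:
        raise ValueError("sketch assumes 0 < p_fail < 1 and t_p >= x_mode")
    target = -math.log(1.0 - p_fail)  # equals (t_p / eta) ** beta

    def h(beta):
        # Mode relation substituted into the CDF relation; zero at the solution.
        return (t_p / x_mode) ** beta * (beta - 1.0) / beta - target

    lo, hi = 1.0, 2.0
    while h(hi) < 0.0:               # expand the bracket until the sign changes
        hi *= 2.0
        if hi > 1024.0:
            raise ValueError("no shape parameter found for these inputs")
    for _ in range(100):             # plain bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = t_p / target ** (1.0 / beta)
    return eta, beta

if __name__ == "__main__":
    # Inputs in months; 2/5 is an assumed reading of the service record.
    eta, beta = weibull_from_mode_and_cdf(x_mode=2.5, t_p=3.0, p_fail=2.0 / 5.0)
    print("eta =", round(eta, 2), "months; beta =", round(beta, 2))
```

Bisection is used here only because it needs nothing beyond the standard library; any root finder would serve equally well.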
If the failure cost is significantly high, the users generally try to avoid failures before preventive replacement. In such situations, it can be assumed that the preventive replacement interval set by user is at or before the mode value of the time-to-failure distribution. Therefore, the failure probability before preventive replacement will be much less. Thus the experts generally do not observe failures before the preventive replacement time. In such situations, experts can provide the estimates of most fail time only if they have past experience of operating the component without preventive replacement. Service engineers of manufacturers can be expected to have such information because they attend the failures of similar machines of many users. Alternatively, experts can be asked to predict the condition of the component at the time of the replacement. This can help the designer in estimating the remaining life of the component at the time of the replacement.
(4.9)
The following example illustrates this case. Example 4.3: Consider a filter used in the hydraulic system of a CNC grinding machine. The filter is replaced preventively after a fixed duration of time. The expert judgement-based method is applied to calculate the time-to-failure distribution parameters of the filter. Table 4.7 shows the questions asked and the corresponding responses of the expert.
Table 4.7 Collection of expert’s judgement in the case of preventive replacements.
Questions asked | Expert’s judgement
Do you replace the filter preventively? | Yes
How do you decide the preventive replacement time? | Past experience
What is the preventive replacement interval used by you? | 2 months
Do you generally observe failures before PM? | No/Only rarely
What % of filters fail before the preventive replacement interval? | 10%
Do you think that the filter can be used for a longer time if not replaced at tp; if yes, then for how long? | No
Note: 10% indicates the percentage of components that have failed before preventive replacement. In other words, if the user has witnessed about 10 replacements, it is likely that he or she has observed one corrective replacement.
It is clear from Table 4.7 that the expected remaining life at tPReplace can be assumed to be negligible. In other words, the component is preventively replaced at the modal value of the time-to-failure distribution. Thus, from Equation 4.9:
Therefore, from Equations 4.2 and 4.8 we get:
and
Solving Equations 4.2 and 4.8 we get:
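When the component is replaced at the mode, the two relations can be solved in closed form. The sketch below assumes, as an interpretation rather than a quotation of the lost equations, that Equation 4.2 is the Weibull mode relation X = η((β − 1)/β)^(1/β) and that Equation 4.8 sets the Weibull CDF at tPReplace equal to the stated failure fraction. With X = tPReplace, both relations give (tPReplace/η)^β, so (β − 1)/β = −ln(1 − p). The function name is hypothetical, and the printed values are the output of this sketch, not the book's reported results.

```python
import math

def weibull_when_replaced_at_mode(t_p, p_fail):
    """Weibull (eta, beta) when the replacement time t_p equals the mode
    and a fraction p_fail of units fail before t_p."""
    s = -math.log(1.0 - p_fail)   # (t_p/eta)**beta from the CDF relation
    if not 0.0 < s < 1.0:
        raise ValueError("requires 0 < -ln(1 - p_fail) < 1")
    beta = 1.0 / (1.0 - s)        # from (beta - 1)/beta = s
    eta = t_p / s ** (1.0 / beta)
    return eta, beta

if __name__ == "__main__":
    # Table 4.7 inputs: replacement every 2 months, 10% fail earlier.
    eta, beta = weibull_when_replaced_at_mode(t_p=2.0, p_fail=0.10)
    print("eta =", round(eta, 2), "months; beta =", round(beta, 3))
```

The shape parameter obtained this way is only slightly above 1, which is what a low failure fraction at the mode implies under these assumptions.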
Goodness test for results obtained from the expert judgement-based method: The accuracy of the expert judgement-based parameter estimation method depends on the accuracy of the information obtained from the experts. In general, it can be expected that the more experience the expert has, the better the accuracy of their judgement will be. However, it is highly unlikely that the expert judgement will be totally free from error. Secondly, the experts usually express their judgement with some amount of uncertainty. On the other hand, the accuracy of all the statistical methods for parameter estimation depends on the amount of data. The more data points there are, the higher the accuracy of parameter estimation will be. However, as identified earlier, machine tool designers are quite often left with only a few data points.
In such situations, it will be important to see how well the expert judgement-based method compares with the statistical methods in the following two cases: 1. Expert information contains error, and 2. Expert information contains uncertainty. The following example shows the goodness test in the above two conditions. Example 4.4: Consider a non-repairable component used in a machine tool whose time-to-failure follows a two-parameter Weibull distribution with shape and scale parameters of 2 and 200 hours respectively. For this component, the Maximum Likelihood Estimates (MLE) of the lifetime distribution parameters are obtained when only a few data points are available in the field failure records. For example, assuming that only 5 failure data points are available in the field failure record, their values are generated 1000 times using the known time-to-failure distribution (η = 200 hours and β = 2) and MLE estimates are obtained each time. Table 4.8 shows these estimates with 90 percent confidence level, considering the availability of different numbers of data points. Table 4.8 Parameter estimation from MLE method.
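The repeated-sampling experiment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the MLE is computed for complete (uncensored) samples by bisection on the standard Weibull likelihood equation for the shape parameter, and the 5th to 95th percentile range of the estimates is used as a stand-in for the 90 percent confidence level, which is an assumption about how the table was produced. The seed and all function names are hypothetical.

```python
import math
import random

def weibull_mle(sample):
    """Return (eta_hat, beta_hat) for a complete sample of failure times."""
    t_max = max(sample)
    z = [t / t_max for t in sample]          # rescaling avoids overflow
    logs = [math.log(v) for v in z]
    mean_log = sum(logs) / len(z)

    def score(beta):
        # Likelihood equation for the shape parameter; increasing in beta.
        w = [v ** beta for v in z]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / beta - mean_log

    lo, hi = 0.01, 1.0
    while score(hi) < 0.0 and hi < 1e3:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum(v ** beta for v in z) / len(z)) ** (1.0 / beta)
    return eta, beta

def percentile(values, q):
    """Nearest-rank style percentile, q in [0, 1]."""
    s = sorted(values)
    return s[int(round(q * (len(s) - 1)))]

def simulate_mle(n_points, runs=1000, eta=200.0, beta=2.0, seed=1):
    """Summarise MLE estimates over repeated samples of size n_points."""
    rng = random.Random(seed)
    etas, betas = [], []
    for _ in range(runs):
        sample = [rng.weibullvariate(eta, beta) for _ in range(n_points)]
        e, b = weibull_mle(sample)
        etas.append(e)
        betas.append(b)
    return {
        "eta": (percentile(etas, 0.05), sum(etas) / runs, percentile(etas, 0.95)),
        "beta": (percentile(betas, 0.05), sum(betas) / runs, percentile(betas, 0.95)),
    }

if __name__ == "__main__":
    for n in (5, 6, 7):
        print(n, simulate_mle(n))   # (5th percentile, mean, 95th percentile)
```

Each tuple holds the lower limit, mean and upper limit of the estimates for one sample size.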
Now for the same component, the actual values of the mode and maximum observed life as obtained from Equations 4.1 and 4.6 are 141 and 429 respectively. These values are obtained considering a very large sample size and, therefore, a very high value of 0.99 as the probability of failure at maximum life, with the justifications mentioned earlier. Let, for the same component, the expert judge the most likely time-to-failure (i.e., X) and the maximum observed life (i.e., Y) with ±15% error. Thus, the expert’s judgement for this component can be summarized as: Time at which component is most likely to fail = X = 141 ± 0.15 × 141 = 120 or 162. Maximum life ever observed by the expert = Y = 429 ± 0.15 × 429 = 365 or 493. Thus, a total of four combinations are possible with these point estimates. Table 4.9 shows the values of the distribution parameters estimated for each of these four combinations using the expert judgement-based method. Table 4.9 Parameter estimation when expert judgement contains error.
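The figures 141 and 429 can be reproduced, and the four error combinations evaluated, with the short sketch below. It assumes that the mode relation is X = η((β − 1)/β)^(1/β) and that the maximum observed life is the 0.99 quantile, Y = η(−ln(1 − 0.99))^(1/β); these forms reproduce the values quoted in the text for η = 200 and β = 2, but they are a reading of the referenced equations rather than a copy of them. The function names are hypothetical, and the printed parameter values are the output of this sketch, not the entries of Table 4.9.

```python
import math

def weibull_mode(eta, beta):
    """Most likely time-to-failure of a Weibull distribution (beta > 1)."""
    return eta * ((beta - 1.0) / beta) ** (1.0 / beta)

def weibull_quantile(p, eta, beta):
    """Time by which a fraction p of units has failed."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

def params_from_mode_and_max(x_mode, y_max, p_max=0.99):
    """Recover (eta, beta) from the mode X and the maximum observed life Y."""
    c = -math.log(1.0 - p_max)
    target = math.log(x_mode / y_max)

    def g(beta):
        # ln(X / Y) as a function of beta; increasing for beta > 1.
        return math.log((beta - 1.0) / (beta * c)) / beta

    lo, hi = 1.0 + 1e-9, 100.0
    if not g(lo) < target < g(hi):
        raise ValueError("X / Y is outside the range handled by this sketch")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = y_max / c ** (1.0 / beta)
    return eta, beta

if __name__ == "__main__":
    print(round(weibull_mode(200.0, 2.0)), round(weibull_quantile(0.99, 200.0, 2.0)))
    for x in (120.0, 162.0):
        for y in (365.0, 493.0):
            eta, beta = params_from_mode_and_max(x, y)
            print(x, y, round(eta, 1), round(beta, 2))
```

The first printed line gives 141 and 429, matching the values stated above.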
It is clear from Tables 4.8 and 4.9 that the expert judgement-based parameter estimation method gives satisfactory estimations of β for ±15% error in expert judgement when compared with maximum likelihood estimates of the same parameter in the case when only a few data points are available. Examples of cases with 5, 6, and 7 data points are shown in this chapter. The expert judgement-based parameter estimation method, however, does not always compare equally well for the second parameter, i.e., η. However, the goodness of any parameter estimation method cannot be judged solely based on how accurately it estimates any parameter; it is the joint effect of all the parameters that are required to be estimated. Figures 4.1–4.4 compare pdf obtained based on both the parameters (η, β) estimated from the expert
judgement-based method with a given level of uncertainty and maximum likelihood estimation method with few data points.
Figure 4.1 +15% in X and +15% in Y.
Figure 4.2 +15% in X and −15% in Y.
Figure 4.3 −15% in X and −15% in Y.
Figure 4.4 −15% in X and +15% in Y.
It is clear from Figures 4.1–4.4 that the overall effect of both the parameters estimated from the expert judgement-based method, in terms of probability density function, is at least as satisfactory as the performance or accuracy of the maximum likelihood estimation method with only a few data points. Thus, the proposed method gives a satisfactory alternative to the statistical methods for time-to-failure distribution parameter estimation in the absence of sufficient failure data. However, as more numbers of data points become available, it is advisable to use the statistical methods for parameter estimation.
Now, assume that the expert information contains uncertainty, i.e., it does not provide any point estimate of the X and Y values, rather it provides interval estimates of the same. Let the experts provide the judgement with an uncertainty within ±15% of the actual values. For example, experts may mention that the most likely time-to-failure is somewhere between 120 hours to 162 hours and maximum life is somewhere between 365 hours to 493 hours. In such situations, a simulation-based approach can be used to model the uncertainty in the expert judgements. Considering a uniform distribution for X and Y values in the above range, 1000 simulation runs were performed using Equations 4.2 and 4.6. Alternatively, a beta distribution can also be used to model the uncertainty in the experts’ judgement using estimates from many experts and classifying them in terms of most likely, optimistic and pessimistic estimates. Figure 4.5 shows how the simulation approach works. Figure 4.5 Monte Carlo simulation model for estimating time-to-failure distribution parameters.
Again, the probability of failure at the maximum observed life is assigned a very high value (let’s say 0.99), with the assumption that the experts are highly experienced, and hence the sample size is very large. Results of the simulation are summarized in Table 4.10. Table 4.10 Parameter estimation from the expert judgement-based method under uncertainty in judgement.
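A simulation of this kind can be sketched as below, under stated assumptions: X and Y are drawn independently from uniform distributions over the quoted ranges, the mode relation is taken as X = η((β − 1)/β)^(1/β), the maximum observed life is taken as the 0.99 quantile of the Weibull distribution, and the spread of the results is summarized by the 5th and 95th percentiles. The seed and function names are hypothetical, and the output is not a reproduction of Table 4.10.

```python
import math
import random

def params_from_mode_and_max(x_mode, y_max, p_max=0.99):
    """Recover Weibull (eta, beta) from the mode X and maximum observed life Y."""
    c = -math.log(1.0 - p_max)
    target = math.log(x_mode / y_max)
    lo, hi = 1.0 + 1e-9, 100.0
    for _ in range(200):                      # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if math.log((mid - 1.0) / (mid * c)) / mid < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return y_max / c ** (1.0 / beta), beta

def simulate_uncertain_judgement(x_range, y_range, runs=1000, seed=7):
    """Propagate interval judgements on X and Y to (eta, beta) estimates."""
    rng = random.Random(seed)
    etas, betas = [], []
    for _ in range(runs):
        x = rng.uniform(*x_range)             # most likely time-to-failure
        y = rng.uniform(*y_range)             # maximum observed life
        eta, beta = params_from_mode_and_max(x, y)
        etas.append(eta)
        betas.append(beta)

    def summary(values):
        s = sorted(values)
        n = len(s)
        return s[int(round(0.05 * (n - 1)))], sum(s) / n, s[int(round(0.95 * (n - 1)))]

    return {"eta": summary(etas), "beta": summary(betas)}

if __name__ == "__main__":
    result = simulate_uncertain_judgement((120.0, 162.0), (365.0, 493.0))
    print(result)   # (5th percentile, mean, 95th percentile) for each parameter
```

Replacing the uniform draws with draws from a beta distribution would follow the alternative mentioned in the text for judgements pooled from several experts.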
In order to study the effect of the uncertainty on the parameter estimated, one can compare the results obtained in Table 4.10 with that of Table 4.8. It reveals that the uncertainty in the judgment of the experts within some limit also does not greatly affect the performance of the proposed expert judgement-based parameter estimation method. The effect of uncertainty on the probability density function can be seen from Figure 4.6. It compares the pdf of the Weibull time-to-failures distribution for the parameters obtained from expert judgement-based parameter estimation method having ±15% uncertainty in judgement and MLE with only a few data points available. It again confirms the goodness of the proposed method. Figure 4.6 Effect of uncertainty on time-to-failure distribution.
Implications of Confidence Interval: While the mean values of the parameters can be used in further system reliability analysis, confidence intervals obtained here can be used to quantify the risk due to uncertainty in the experts’ judgement. In cases where the cost of failure is significantly higher than the cost of preventive maintenance, one may use the lower limit value of η for calculating the preventive maintenance interval. Using the lower limit of η will result in an optimal preventive maintenance interval that is comparatively smaller, thereby reducing the risk of failure. Preventive maintenance should be considered only for those components that have β > 1. If the expert judgement results in a confidence interval for β that contains the value 1, it is not a clear mandate in favor of preventive maintenance.
4.2.2 Repairable Assembly In a machine tool, assemblies like micro-taper, guideways, etc., are generally repaired upon failure. Such components are called repairable components. In the case of repairable components, apart from shape and scale parameters for the two-parameter Weibull distribution, an additional parameter called Restoration Factor (RF) is also required to be estimated from the life data. Restoration factor refers to the degree of repair. Restoration factor 1 indicates perfect repair and is generally analyzed using the Perfect Renewal Processes (PRP). A restoration factor with a value of 0, corresponds to minimal repairs. A commonly used model to analyze minimal repair is the Nonhomogeneous Poisson Process (NHPP). However, many repair activities may not result in such two extreme situations but in an intermediate one called general repair or imperfect repair/maintenance (restoration factor between 0 and 1) [57]. In general, as the number of unknown parameters increases, the life data required by statistical methods in estimating these parameters also increases. Further, in many cases, users repair an assembly only for a limited number of times and then replace the same. This is due to the reason that every time a repair happens, it may affect one or more of the product characteristics. After a few repairs a characteristic may end up being outside its specification limits, and hence the assembly may have to be replaced. For example, the spindle cartridge assembly needs to be checked for tolerances as pull out of bearings during their replacement affects the shaft and seat tolerances. It is because of this reason that users repair a spindle once or twice and then replace the entire cartridge assembly on the next bearing failure. It is also possible that the failure frequency is likely to increase after every repair and it may be economically preferable to replace it after a certain number of repairs. 
As a result, the manufacturer ends up with only a few repair data points. This in turn limits the accuracy of most of the statistical methods of parameter estimation based on the field data for repairable components. In the following section, an expert judgement-based method is presented to estimate the three parameters, viz., shape parameter, scale parameter and the restoration factor for a repairable component/system. The time-to-first-failure distribution parameters can be estimated using the methodology presented in Section 4.2. The experts are first asked about the mode and the maximum observed life for the time-to-first-failure of the component/system, i.e., for the new or replaced component/system. Therefore,
the larger the number of replacement incidents, the greater will be the experience of the maintenance personnel with the failures, and in turn the better will be the accuracy of their judgement regarding the time-to-first-failure distribution. This provides the key to using expert knowledge in estimating the time-to-failure distribution parameters for repairable components. The time-to-first-failure distribution can be obtained for both the cases, i.e., with and without preventive repair of the component. Once the time-to-first-failure distribution parameters are known, the restoration factor needs to be estimated for corrective and preventive repairs (if any). In the absence of preventive repairs, a component is operated till it fails and then repaired correctively upon failure with a restoration factor RFCA. As mentioned earlier, the component is repaired only a limited number of times and is then replaced with a new one upon the next failure. Theoretically, such decisions are based on trade-offs between the cost of repair and replacement. However, in the absence of sufficient failure data, users use their experience to decide the replacement. Thus, the assumption is that the experts have sufficient knowledge about the failure and its cost consequences, and hence they use their past experience to make such trade-offs and decide the number of times the component must be repaired before being replaced. In other words, users will replace the component at the uth failure if they judge it economically advantageous compared to replacing it at the (u − 1)th or (u + 1)th failure for a given restoration factor. Cost per unit time (CPUT) is the economic index used for this comparison. Experts can provide the value of u, i.e., the failure number at which the component is usually replaced. The following algorithm can then be used to obtain the value of the restoration factor.
Assumptions: The time-to-first-failure follows a Weibull distribution; Replacement brings the component to as good as new condition; Starting age, i.e., age at t = 0 is zero (i.e., V0 = 0). Let the cost of corrective repair be CCA and the cost of corrective replacement be CPReplace. The following steps are involved in the algorithm [58]: Step 1: Obtain the time-to-first-failure distribution parameters (η and β) using the expert judgement method mentioned in Section 4.2.
Step 2: Obtain the information from experts regarding the failure number u at which the component is usually replaced correctively. Step 3: Evaluate the Cost per Unit Time (CPUT) for replacement of component at (u − 1)th, uth and (u + 1)th failure for RFCA = 0. CPUT for replacement at wth failure can be calculated as:
(4.10)
(4.11) MRLi is the mean residual life or mean time-to-failure of the component when it has already survived to Vi, where Vi is the age after the ith corrective repair. It can be calculated as [24]: (4.12)
Step 4: If (CPUT)u−1 > (CPUT)u < (CPUT)u+1, then the restoration factor is the current value of RFCA; otherwise repeat Step 3 for RFCA = RFCA + s until RFCA = 1, where s is a very small constant increment in the restoration factor. Example 4.5: To illustrate the general application of this algorithm, consider a subassembly, here a spindle, whose time-to-first-failure distribution is Weibull with scale and shape parameters η = 1000 and β = 2 respectively. Suppose one of the users has mentioned that they repair the spindle not more than three times and, whenever it fails for the fourth time, they replace it with an identical new one, i.e., u = 4. Let the cost of corrective repair and corrective replacement be CCA = 6000 and CPReplace = 18000 respectively. The times to the first five failures and the CPUT for replacement at the 3rd, 4th and 5th failures are shown in Table 4.11. Table 4.11 CPUT and Mean Residual Life (MRL) calculation.
It is clear from Table 4.11 that RFCA = 0.2 gives minimum CPUT for replacement at 4th failure so the estimated value of the RFCA can be taken as 0.2. The above algorithm with some modifications can also be used for cases where users repair the component preventively. In such cases, component is repaired preventively at some fixed time interval (say tPRepair) for some limited number of times with a restoration factor RFPRepair and is replaced with a new identical component at the next preventive repair interval with a restoration factor of 1. The component is repaired correctively if it fails between (0, tPRepair). The algorithm assumes that the corrective repair is minimal and has a restoration factor 0. Thus the problem is to estimate the restoration factor used in preventive repair. Let the cost of preventive and corrective repair be CPRepair and CCA respectively and the cost of preventive replacement be CPReplace. The modified algorithm for estimation of the restoration factor for preventive repair involves the following steps [58]:
Step 1: Obtain the time-to-first-failure distribution parameters (η and β) using the expert judgement method mentioned in Section 4.2. Step 2: Obtain the information from experts regarding preventive repair time (tPRepair) and preventive repair number at which the component is replaced preventively (let’s say, at kth preventive repair). Step 3: Compute mean time to failure for ith failure before jth preventive repair with RFPRepair = 0, using the following equation: (4.13)
Where dj is the number of failures before jth preventive repair such that (4.14) and Vi is the age after ith failure. For minimal corrective repair, Vi can be calculated as: (4.15) where V0 = 0 if j = 1, or else
where the index i counts the failures between 0 and tPRepair and restarts from 1 after each preventive repair. Step 4: Evaluate the Cost per Unit Time (CPUT) for replacement of the component at the (k − 1)th, kth and (k + 1)th preventive repair. CPUT for replacement at the kth preventive repair can be calculated as: (4.17)
Step 5: If CPUTk−1 > CPUTk < CPUTk+1, then the restoration factor is the current value of RFPRepair (0 on the first pass); otherwise repeat Steps 3 to 5 with RFPRepair = RFPRepair + s, where s is a small constant increment in the restoration factor.
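The search over the restoration factor in the first algorithm of this section (Steps 1 to 4 for corrective repair) can be sketched in a few lines of Python. Equations 4.10 to 4.12 are not reproduced in this text, so the forms used below are assumptions and not necessarily those of [58]: CPUT for replacement at the wth failure is taken as ((w − 1)·CCA + CPReplace) divided by the sum of the first w mean residual lives, the age after each repair follows a Kijima Type II update, V_i = (1 − RFCA)(V_{i−1} + MRL_{i−1}), and the mean residual life is obtained by numerical integration of the conditional Weibull reliability. All function names are hypothetical.

```python
import math


def weibull_mrl(age, eta, beta, n=4000):
    """Mean residual life at a given age: integral of R(age + x) / R(age) over x >= 0.

    Simpson's rule on a truncated range; the truncation point assumes beta >= 1.
    """
    x_max = eta * (-math.log(1e-12)) ** (1.0 / beta)
    h = x_max / n  # n must be even for Simpson's rule

    def cond_rel(x):
        # Conditional reliability of surviving x more hours after reaching `age`.
        return math.exp((age / eta) ** beta - ((age + x) / eta) ** beta)

    total = cond_rel(0.0) + cond_rel(x_max)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * cond_rel(k * h)
    return total * h / 3.0


def cput_at(w, rf, eta, beta, c_repair, c_replace):
    """Assumed CPUT when failures 1..w-1 are repaired and failure w triggers replacement."""
    age = uptime = 0.0
    for _ in range(w):
        mrl = weibull_mrl(age, eta, beta)
        uptime += mrl
        age = (1.0 - rf) * (age + mrl)  # assumed Kijima Type II age update
    return ((w - 1) * c_repair + c_replace) / uptime


def estimate_rf(u, eta, beta, c_repair, c_replace, step=0.1):
    """Smallest RF on the grid for which replacement at failure u (u >= 2) minimises CPUT locally."""
    for k in range(round(1.0 / step) + 1):
        rf = round(k * step, 10)
        before, at_u, after = (
            cput_at(w, rf, eta, beta, c_repair, c_replace) for w in (u - 1, u, u + 1)
        )
        if before > at_u < after:
            return rf
    return None  # no grid value satisfies the Step 4 condition


rf_ca = estimate_rf(u=4, eta=1000.0, beta=2.0, c_repair=6000.0, c_replace=18000.0)
```

With the data of Example 4.5 and a step of 0.1, these assumptions give a restoration factor of 0.2. A finer step or a different age model (for example Kijima Type I) would generally give a different value, so the step size should be chosen in line with the resolution the expert information can support.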
4.3 Some Desirable Properties of a “Good” Estimator
Convergence and unbiasedness are two important properties one may look for in a good estimator. If, for a given sample size, the mean value of an estimator equals the true value of the quantity it estimates, then the estimator is called an unbiased estimator. Thus, for a given sample size, the difference between the true value and the mean value of an estimator is the bias in the estimation [59]. The ability of an estimator of a parameter θ to produce estimates that get closer to the true value θ0 with larger sample sizes is called convergence or consistency of the estimation [59]. Convergence is thus an asymptotic property. Both “bias” and “convergence” are related to the sample size used for estimation, which is not available in the case of the expert elicitation-based parameter estimation method discussed above. However, the experience of an expert can be considered equivalent to the sample size. In general it can be assumed that the longer the experience, the larger will be the sample size considered by the expert when answering the questions posed to them, and in turn the lower will be the bias in their judgements. It is important to note that the experience of experts is to be looked at in the context of the expert’s experience with a particular machine and not just years of experience. We have seen that with failure data from just the warranty period, the estimates are not very accurate. This is because all these failures, even though from different machines, are early life failures and may not represent the population behavior. These failure times would most likely lead to a distribution closer to an extreme value distribution obtained from the smallest failure times from different samples.
In short, for the same number of failures of a component, experience with fewer machines over a long duration is better than experience with more machines over a short duration.
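The effect of a short observation window can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the Weibull parameters, machine counts and window lengths are hypothetical, only the first failure of each machine is considered, and the "estimate" is simply the average of the observed failure times. It compares many machines observed over a short window with fewer machines observed over a long window, chosen so that both cases yield a similar number of failures.

```python
import math
import random


def naive_mttf_from_window(n_machines, window, eta, beta, seed=0):
    """Average of the first-failure times that fall inside the observation window."""
    rng = random.Random(seed)
    times = (rng.weibullvariate(eta, beta) for _ in range(n_machines))
    observed = [t for t in times if t <= window]
    return sum(observed) / len(observed), len(observed)


eta, beta = 1000.0, 2.0  # hypothetical scale (hours) and shape
true_mttf = eta * math.gamma(1.0 + 1.0 / beta)

# Many machines, short window (warranty-like): only early failures are seen.
short_est, n_short = naive_mttf_from_window(2000, 500.0, eta, beta)
# Fewer machines, long window: nearly every first failure is seen.
long_est, n_long = naive_mttf_from_window(450, 5000.0, eta, beta)

print(f"true MTTF               : {true_mttf:8.1f} h")
print(f"short window ({n_short} fails): {short_est:8.1f} h")
print(f"long window  ({n_long} fails): {long_est:8.1f} h")
```

Because every failure seen in the short window is by construction smaller than the window length, the short-window average cannot exceed that length, however many machines are pooled. The long-window average is free of this truncation and settles near the true mean.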
4.4 Closing Remarks
Machine tool manufacturers generally do not have sufficient failure data for dedicated reliability and maintenance analysis. In such situations, the knowledge of the maintenance personnel can be used for estimating the time-to-failure distribution parameters. This chapter provided a methodology for both repairable and non-repairable components. Practical use of the proposed methodology was demonstrated with the help of some examples, and the results show that the proposed method is promising, straightforward and useful to the industry. Expert judgement is not a substitute for life data analysis. However, it can serve as a starting point for applying various reliability engineering concepts to machine tools. As soon as sufficient data are available, the parameters obtained from the expert judgement-based method should be updated. The Bayesian approach discussed in Chapter 2 can be used for this purpose.
Chapter 5
Machine Tool Maintenance Scenarios, Models and Optimization
Maintenance is the process used to keep the system in good condition or restore the system to an operational state following failure. Maintenance helps in extending the useful life of a product or system. Proper maintenance of equipment may also help in detecting problems in the components and correcting them before they become a major issue or cause a shutdown of the whole system. Without maintenance, components start degrading, which affects the performance of the system. For a machine tool, degradation also affects product quality and productivity. Moreover, the downtime cost associated with failure has a big impact on profitability. This chapter first provides a brief overview of various types of maintenance and highlights some of the maintenance optimization models and solution techniques. Specific maintenance scenarios for machine tools are then discussed and preventive maintenance optimization problems under various scenarios are solved.
5.1 Overview of Maintenance Maintenance has been categorized based on the nature and purpose of the maintenance work and on its frequency. The previous definition of maintenance implies two types of maintenance actions: Reactive maintenance, or corrective maintenance (CM), and proactive maintenance. Reactive maintenance is generally a corrective or breakdown type of maintenance which is performed on failure of the unit. It generally leads to excessive unplanned downtime losses. The advantages and disadvantages of such types of maintenance are given below.
Advantages: Low cost associated with required maintenance task; Requires less staff since less work is being done.
Disadvantages: Increased cost due to unplanned downtime; Higher cost involved with repair or replacement of equipment; Possible secondary equipment damage from equipment failure. Proactive maintenance is generally performed before occurrence of the breakdown. Proactive maintenance may be either preventive maintenance or predictive maintenance. Preventive Maintenance (PM): As the name itself indicates, it is a scheduled or planned maintenance aimed at preventing future breakdowns and failures of a system that is functioning properly. Usually, it is performed on equipment on a regular basis based on the expected life of the equipment, and the frequency of the maintenance is generally constant.
Advantages: Reduced equipment or process failure; Lower cost of unplanned failures in capital intensive process.
Disadvantages: Labor intensive; Typically costly; Includes performance of unneeded maintenance; Catastrophic failures still likely to occur. The schedule for preventive maintenance planning can be either calendar-time-based or age-based. In a calendar-time-based Preventive Maintenance (PM) strategy, equipment maintenance is performed at a fixed calendar time interval, for example, every two months, four months, etc. Age-Based Maintenance: Under this kind of maintenance strategy, equipment maintenance is performed based on the item’s age. Item age is tracked from its previous maintenance. The CM and PM can be perfect, minimal or imperfect. Perfect repair restores the item to “as good as new” condition, i.e., item age starts again from 0. Minimal repair restores the item to “as bad as old” condition, i.e., the age remains the same as it was at the time of failure. Imperfect repair restores the item to a condition better than “as bad as old” but worse than “as good as
new.” The item age in such situations can be calculated based on Kijima’s models for the general repair process discussed in Chapter 2. Predictive Maintenance: This is an on-demand maintenance strategy and is performed based on the condition of the unit. Therefore, it is often referred to as Condition-Based Maintenance (CBM). It is carried out only after collecting and evaluating enough physical data on the performance or condition of equipment, such as temperature, vibration, etc., by performing periodic or continuous equipment monitoring. Such data are then used to assess the current health status of the system/components. Diagnostics and prognostics are generally used for the purpose. The results of diagnostics and prognostics are then used to prepare an appropriate maintenance plan. The basic aim of this maintenance strategy is to perform maintenance at a scheduled point in time when the maintenance activity is most cost effective. The advantages and disadvantages of predictive maintenance are discussed below.
Advantages: Increased component operational life/availability; Reduces downtime; Decrease in costs for parts and labor.
Disadvantages: Large investment required for monitoring the health of the component; Increased investment in staff training.
5.1.1 Maintenance Models
Maintenance models are the basis of any quantitative maintenance analysis, which can be used to analyze and evaluate the performance of systems [60]. Cui [60] classified the maintenance models into the following categories: Time models; Degradation degree models; Shock models; Inspection models; Reliability/availability evaluation models. In a time model, replacement/repair is done at a specific time instant. Age-dependent models, periodic repair models (calendar-time-based models), etc.,
come under this category. The objective function may be minimization of cost, maximization of reliability/availability, etc. A time-based maintenance model with cost as objective function can be written as:
$$\mathrm{CPUT} = \frac{C_p N_{pm} + C_f N_f}{t_{eval}} \qquad (5.1)$$
where CPUT is the cost per unit time, Nf is the number of failures in the evaluation period teval, Cp is the cost per preventive maintenance action, Cf is the cost per failure, and Npm is the number of preventive maintenance actions in the evaluation period. Simulation methods explained in Chapter 2 can be used to calculate Nf and Npm for any evaluation period based on the time-to-failure distribution parameters and the repair process. Similarly, an age-based maintenance model with cost as the objective function can be written as:
$$\mathrm{CPUT}(t) = \frac{C_p R(t) + C_f\,[1 - R(t)]}{\int_0^t R(x)\,dx} \qquad (5.2)$$
where R(t) is the reliability of the component at time t, Cp is the cost per preventive maintenance and Cf is the cost per failure. The denominator, $\int_0^t R(x)\,dx$, is the mean time to failure of the component in the interval 0 to t. For more details on similar models the reader may refer to a paper by Barlow and Hunter [61]. The degradation degree model determines the repair or replacement time based on deterioration level. Examples of such models are: failure rate limit model, failure number limit model, etc. An example of this kind of model was presented by Bergman [62]. Under cost models, the main focus is on maintenance costs, i.e., the repair/replacement time is decided based on the corrective and preventive maintenance costs. Such models may use any maintenance strategy. For example, age-based or calendar-time-based repair/replacement may be used with cost as the objective function as shown in Equations 5.1 and 5.2. In such cases, cost-based models may be seen as special cases of time-based models. More examples of such cost-based models can be found in Seo and Ahn [63]. Under shock-based models, it is assumed that the system/components is/are subjected to shocks arriving
according to some stochastic process. Inspection-based models apply condition monitoring techniques to monitor component health and decide the maintenance schedule based on the same [64]. Reliability/availability-based models focus on system reliability or availability [65]. The use of mathematical modeling for evaluating, improving and optimizing the performance of repairable equipment through preventive maintenance is well documented in the literature (Valdez-Flores [66], Garg and Deshmukh [67], Patra et al. [68], Wang [69], Nakagawa [70], Pham and Wang [71], Rai and Bolia [72], Yang et al. [73], Ahmad and Kamaruddin [74], etc.). A vast majority of these models assume either perfect repair (renewal) or minimal repair. In the past decade, more attention has been given to the concept of general repair or imperfect repair. Mathematical models for these repair actions are discussed in Chapter 2. In real life systems like machine tools, one can observe all three degrees of repair in different components/subassemblies. Cui [60] made the important observation that the criterion for a better maintenance model is how well it describes the practical situation. Therefore, it is important to identify maintenance scenarios for a particular case at hand and develop a maintenance model which suits the situation. Similar efforts are made in this chapter for machine tools.
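As an illustration of the age-based cost model, the sketch below evaluates the cost per unit time for a Weibull component and picks the cheapest replacement age from a grid. It assumes the standard age replacement form, CPUT(t) = (Cp·R(t) + Cf·(1 − R(t))) / ∫₀ᵗ R(x) dx, which is taken here to correspond to Equation 5.2. The parameter values and function names are hypothetical.

```python
import math


def reliability(t, eta, beta):
    """Two-parameter Weibull reliability."""
    return math.exp(-((t / eta) ** beta))


def age_based_cput(t, eta, beta, c_p, c_f, n=1000):
    """Cost per unit time when the component is replaced at age t or at failure."""
    h = t / n  # n must be even for Simpson's rule
    total = reliability(0.0, eta, beta) + reliability(t, eta, beta)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * reliability(k * h, eta, beta)
    expected_cycle_length = total * h / 3.0  # integral of R(x) from 0 to t
    r = reliability(t, eta, beta)
    return (c_p * r + c_f * (1.0 - r)) / expected_cycle_length


eta, beta = 1000.0, 2.0    # hypothetical Weibull parameters (hours)
c_p, c_f = 1000.0, 5000.0  # hypothetical preventive and failure costs

candidates = range(100, 3001, 50)
t_best = min(candidates, key=lambda t: age_based_cput(t, eta, beta, c_p, c_f))
cput_best = age_based_cput(t_best, eta, beta, c_p, c_f)
run_to_failure = c_f / (eta * math.gamma(1.0 + 1.0 / beta))

print(f"best replacement age on the grid: {t_best} h, CPUT = {cput_best:.3f}")
print(f"run-to-failure cost rate        : {run_to_failure:.3f}")
```

A grid search is used only for transparency. Any one-dimensional optimizer would serve, and preventive replacement pays off only when the failure rate is increasing (β > 1) and a failure costs more than a planned replacement.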
5.1.2 Maintenance Optimization Techniques Once the maintenance strategy and objective function is decided, and the problem is formulated, one can work on it in a mathematical way to find solutions and explain the real physical problems in terms of the solutions [60]. Cui classified the optimization techniques as follows [60]: Conventional approaches such as the usual calculus method; Simulation approaches; Algorithms like metaheuristic algorithms such as genetic algorithms, tabu search and simulated annealing techniques; Artificial neural networks; Programming methods such as linear programming, dynamic programming, implicit enumeration, lexicographic searches procedure, integer programming, mixed integer programming, nonlinear programming techniques, etc.
When the objective and constraints of a practical problem are precisely known, then the model can be built in a precise manner. In most of the real life situations, the objectives and the constraints are not precisely defined; sometimes the resource constraints are not very rigid. Under such imprecise conditions, the classical optimization approach does not serve much purpose. The fuzzy approach is very useful in dealing with qualitative statements, vague objectives, and imprecise information. The reader may refer to Kuo et al. [75] for details on such approaches.
5.2 Machine Tool Maintenance As discussed earlier, most of the industrial systems experience deterioration with usage and age. For such deteriorating systems, preventive maintenance can help in extending their useful life and ensuring the quality of operations as well as reducing the cost of operations and preventing the occurrence of system failures [60]. However, Preventive Maintenance (PM) also consumes time and resources which could otherwise be used for production. Therefore, optimization of the preventive maintenance schedule is an important task for industrial systems. Especially in the case of machine tools, where the failure effects are mainly in economic terms, preventive maintenance optimization becomes more crucial for achieving life time profitability of the systems. Apart from the regular preventive maintenance, machine tools also receive major overhauls once every one or two years. The time required and the degree of repair a component/assembly receives during overhauls, is generally higher than that during regular preventive maintenance. Further, not all the components/subassemblies receive maintenance during overhauls. On the other hand, some of the subassemblies are maintained preventively only for some limited number of times and after that it becomes economical to replace them with new ones. The economic life of such subassemblies in a system depends on the preventive maintenance as well as overhauls and needs to be optimized simultaneously. Time-based preventive maintenance optimization models, in general, need to predict the expected number of failures in a given time period. It is well established that getting a closed form expression for the expected number of failures is not possible for all the distributions [57]. This makes the maintenance modeling complex. The complexity further increases in the case of imperfect maintenance and overhauls. Simulation is one of the
approaches used to solve such problems. However, a simulation-based approach becomes very time consuming and complex while optimizing the preventive maintenance schedules, especially when the number of components/subassemblies in a system is large. The complexity is magnified when there exist different maintenance scenarios based on the types of maintenance action and degrees of repair for different components/subassemblies in the system. The complexity of maintenance optimization models, especially in terms of predicting the number of failures, will vary in each of these scenarios. Secondly, maintenance optimization generally strikes a balance between the cost of corrective and preventive actions. Pascual et al. [10] have classified the cost of production equipment failure and preventive actions, including overhauls, into two categories: the intervention cost and the downtime cost. Intervention cost includes labor and materials, while downtime cost includes cost of lost production as well as other consequential costs such as reduced product quality, lost raw material, etc. Komonen [76,77] proposed a classification of costs related to maintenance considering two groups: 1) direct (intervention) costs due to maintenance operations (administrative costs, labor, material, subcontracting), and 2) lost production costs due to equipment failure and poor quality production due to equipment malfunctioning. A similar cost classification is also presented by Khanlari et al. [78]. Vorster and De la Garza [79] have presented a model that has the capability to quantify the consequential costs of downtime and lack of availability, in four categories. The first category, associated with resource impact costs, deals with the cost that arises when failure in one machine affects the productivity and cost-effectiveness of other machines working in close association with it.
The second category, associated with lack of readiness, addresses the cost that may be incurred when a capital asset is rendered idle by the downtime resulting from a prior failure. The third category deals with the service level impact cost that arises when one machine in a pool of resources fails to the extent that other machines in the pool must work in an uneconomical manner to maintain a given service level. The fourth cost category, alternative method impact cost, deals with the consequential costs that arise when failure causes a change in the method of operations. Khalil et al. [80] proposed a failure cost model for machine tools in terms of lost production cost, production damage cost, bottleneck penalty cost and booked labor cost. Sondalini [81] classified the failure cost into three categories: fixed cost, variable cost and lost profit. Fixed
cost includes the overheads like manager’s salary, the permanent staff and employees’ wages, insurance, equipment leases, etc. Variable costs are the cost of fuel, power, hired labor and raw materials to make product, maintenance, etc. Lost profit includes lost sales for the downtime period. Tam et al. [82] considered the following costs of preventive maintenance activities: Replacement cost; Maintenance cost; Maintenance downtime cost; Failure cost. Both active time and delay time in maintenance are considered as the total downtime. These costs for a machine tool must include the cost of downtime, cost of poor quality, cost of slower production, cost of repair/replacement, etc., which as mentioned earlier, may vary from user to user or application to application. Therefore, the specific cost structure and shop-floor policy parameters of a user need to be considered while optimizing the preventive maintenance schedule. In this chapter, a cost-based optimal preventive maintenance schedule for machine tool components/subassemblies is derived considering user’s cost structure and shop-floor policy parameters. The three maintenance scenarios, based on the types of maintenance action and degrees of repair, are identified with the help of case examples of machine tools components/subassemblies. A generic optimization problem is formulated and the same is then solved for each of the maintenance scenarios. A fixed time-based maintenance policy is used in this research. For the case of minimal repair, a closed form solution for the expected number of failures based on the virtual age model [24] has been obtained. However, in other cases when the corrective maintenance is not minimal, a simulation-based approach is used to estimate the number of failures. The results of simulation have been used to derive regression models to predict the number of failures. 
The optimization models provide the optimal preventive maintenance schedule considering the user’s cost structure and shop-floor policy parameters. For this purpose, the cost models proposed in Chapter 3 have been used.
5.3 Machine Tool Maintenance Scenarios A machine tool is a complex system consisting of many subassemblies and components. Analyzing failure and repair behavior at component level is more convenient. For many components, both corrective and preventive actions generally involve replacement. In such cases, the degree of restoration achieved at component level for both corrective and preventive actions can be assumed to be 1, i.e., perfect corrective and preventive replacement. In a machine tool, seal, filter, etc., are some such components that come under this maintenance scenario. Such types of components are also replaced at the time of scheduled maintenance (repair/replacement/overhaul) of other components/subassemblies. The implication is that the preventive replacement interval of such components will be in multiples of preventive repair/replacement/overhaul schedules of some other subassemblies in the machine tool. Sometimes it is more beneficial, and also practical, to analyze the failure and repair behavior at assembly level. For example, the spindle assembly of the workhead of a CNC grinding machine, which consists of bearings, cartridge, housing, flange, etc., receives preventive repairs like oiling, greasing, cleaning, resetting, etc., followed by major maintenance during overhauls, which in addition, involves replacement of some of the components like bearings, in this example, of the assembly. It is clear that the degree of restoration during overhaul is higher than the regular preventive repair. After a few preventive repairs or overhauls, it may be more economical to replace the assembly with a new one. In this research it is assumed that, under this scenario, the replacement will be done only at one of the overhaul intervals. Regular preventive maintenance intervals are generally decided based on the failure and repair characteristics and cost per corrective and preventive actions. However, the major overhaul is a relatively strategic decision. 
Overhauling is decided considering the OEM recommendations and maintenance requirements of the other machine tools on the shop floor. The implication is that a machine tool designer can either consider a fixed overhaul schedule of his own, or that provided by the user, and optimize the regular preventive repair and replacement decisions. Apart from preventive
repairs, overhauls and replacements, an assembly also receives corrective actions during failures, which typically involve repair/replacement of the failed components only. If there are many components in an assembly, and corrective action related to one component does not significantly affect the state of the assembly, the corrective action is considered a minimal repair for the system. Periodic maintenance actions like oiling, greasing, resetting, etc., of the spindle assembly also affect the life of the spindle bearings, thereby affecting their failure and replacement intervals. Thus, in this case, it is more practical to specify the preventive maintenance schedule at assembly level (e.g., spindle assembly) than at component level (e.g., spindle cartridge). Other examples that follow this maintenance scenario in a machine tool are gear boxes, servo motors, etc. Not all the components/subassemblies receive maintenance during major overhauling of the machine tool. For example, a micro-taper used in the tailstock of a grinding machine receives repair in the form of greasing, cleaning, resetting, etc., on failure or periodically. After some preventive repairs, it is replaced with a new one at the next scheduled repair. Thus, it involves corrective and preventive repairs followed by a preventive replacement. Both the corrective and preventive repairs are imperfect, with a relatively higher degree of restoration for preventive maintenance. Other examples in a machine tool that follow this maintenance scenario are belt tightening assembly, compression spring of tailstock, etc. Based on the above discussion, the following types of maintenance actions are possible for a machine tool:
1. Corrective action
2. Preventive action: preventive repair; overhauling; preventive replacement
Considering these maintenance actions and the corresponding degrees of restoration or Restoration Factors (RF), three Maintenance Scenarios (MSc1, MSc2 and MSc3) are representative of a majority of the maintenance actions in a machine tool. Table 5.1 shows these maintenance scenarios.
Table 5.1 Machine tool maintenance scenarios.
Restoration factor by maintenance action ("-" means the action is not part of the scenario):
Scenario | Corrective action | Preventive repair | Overhaul | Preventive replacement | Comment
MSc1 | 1 (Perfect) | - | - | 1 (Perfect) | -
MSc2 | 0 (Minimal) | RFPRepair (Imperfect) | RFOH (Imperfect) | 1 (Perfect) | RFPRepair < RFOH
MSc3 | RFCA (Imperfect) | RFPRepair (Imperfect) | - | 1 (Perfect) | RFCA < RFPRepair
The problem formulation and complexity involved in obtaining an optimal preventive maintenance schedule may be different in each scenario. The following section first shows a generic formulation of a preventive maintenance optimization problem. The same is then applied to each of the above scenarios and illustrated with the help of some of the examples of CNC grinding machine components/subassemblies.
5.4 Preventive Maintenance Optimization Models for Different Maintenance Scenarios Maintenance optimization is generally done to obtain preventive action (repair/replacement/overhaul) interval such that it maximizes/minimizes one or more of the performance measures while satisfying the constraints. In this chapter, life cycle cost contribution of the components/subassemblies measured in terms of Present Value of Cost (PVC), is considered as the objective criterion. Thus, the preventive maintenance schedule optimization problem formulation for ith component/assembly can be written as: (5.3)
where, (5.4)
(5.5) Caqi is the acquisition or initial cost of the ith component/assembly for which the optimal preventive maintenance schedule is required. Since Caqi is constant for a given component/assembly, it does not affect the optimization. Therefore, Caqi is ignored in the maintenance optimization problems illustrated in the following sections. The details of Equations 5.4 and 5.5 are given in Chapter 3. As given in Chapter 3, E[CCAi] and E[CPAi] depend not only on failure of the component/subsystem but also on the user’s cost structure and shop-floor policy parameters. Thus, the preventive maintenance optimization model will help the designer in obtaining the optimum maintenance schedule, considering the user’s cost structure and shop-floor policy parameters. Equation 5.3 is a generic formulation and can be used for all three maintenance scenarios mentioned in Section 5.3. However, the three scenarios will differ in decision variables and in the complexity of estimating the number of corrective actions in each year. The following subsections present the maintenance optimization approach for each of the three maintenance scenarios.
5.4.1 Preventive Maintenance Optimization in Maintenance Scenario 1 (MSc 1) (Replacement Model) Maintenance Scenario 1 (MSc1) involves only corrective and preventive replacement of the components. Therefore, the expected number of preventive repairs and overhauls in any year will be zero in this case. Thus, the preventive maintenance schedule optimization aims at obtaining optimal preventive replacement interval ‘t*PReplace‘. The problem can be formulated as: (5.6)
If the operating hours in jth year is 4800, then the number of preventive replacements for ith component can be expressed as: (5.7)
Similarly, E[NCAi]j is the number of failures in jth year for the ith component. It can be calculated as follows:
Predicting the Number of Failures E[NCA] in Maintenance Scenario 1 (MSc 1)
A simulation-based approach as discussed in Chapter 2 has been used to obtain the number of failures during a given period. Obtaining the optimal replacement interval requires simulating the failures of the component/assembly for different replacement intervals. The computational effort can be reduced by fitting a regression model to the simulation outputs. However, in order to obtain a satisfactory model fit with a limited number of simulations, it is important to choose an appropriate simulation range. The following procedure can be used for this purpose. At the 95% reliability level, most of the life of the component is still left, hence there is no point in replacing it before this time. Therefore, it is assumed that the replacement time is greater than or equal to the time corresponding to the 95% reliability level of the component. Similarly, at the 5% reliability level, most of the life of the component is over and performing preventive maintenance after this time is almost equivalent to the run-to-fail case. Thus, the times corresponding to the 95% and 5% reliability levels are used as the lower and upper ends of the simulation range. One can also use any other values to represent very high and very low reliability. For example, if the ratio of corrective to preventive replacement cost is very high, one may expect the preventive replacement interval to be comparatively closer to the 95% limit. In that case, one may use a reliability value higher than 5% for the lower reliability limit, which shortens the upper end of the simulation range. In order to illustrate the above procedure, the following example is presented. Example 5.1: Consider the example of a mechanical seal, used to prevent coolant and chip entry into the workhead spindle bearings. Time-to-failure of the seal follows a two-parameter Weibull distribution with shape (β) and scale (η) parameters of 1.8 and 1500 hours respectively.
In the current example, these parameters are obtained using an expert judgement-based parameter estimation method. Expert information was collected from experienced maintenance employees of the machine tool manufacturing industry. When sufficient failure data are available, these parameter values can also be estimated using the Maximum Likelihood Estimation (MLE) method and other methods discussed in Chapter 2. In this example, the ages corresponding to 5% and 95% reliability are approximately 2800 hours and 300 hours respectively. For a Weibull time-to-failure distribution, these can be obtained from the following equation:
(5.8)
Figure 5.1 shows the reliability vs time curve for this component.
Figure 5.1 Reliability vs time curve.
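The two ages quoted in Example 5.1 can be checked with a short calculation. The sketch below assumes the standard two-parameter Weibull reliability function R(t) = exp(-(t/η)^β) and inverts it for t; the function name is illustrative and is not taken from the text.

```python
import math

def weibull_time_at_reliability(reliability, beta, eta):
    """Age t at which a two-parameter Weibull unit has the given reliability.

    Inverts R(t) = exp(-(t / eta) ** beta), i.e. t = eta * (-ln R) ** (1 / beta).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return eta * (-math.log(reliability)) ** (1.0 / beta)

# Seal of Example 5.1: shape beta = 1.8, scale eta = 1500 hours.
t_at_95 = weibull_time_at_reliability(0.95, beta=1.8, eta=1500.0)
t_at_05 = weibull_time_at_reliability(0.05, beta=1.8, eta=1500.0)
print(f"95% reliability: {t_at_95:.0f} h, 5% reliability: {t_at_05:.0f} h")
```

Under this assumption the two ages come out close to 290 hours and 2760 hours, consistent with the rounded values of 300 and 2800 hours used in the example.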
It can be seen from this figure that the reliability vs time curve is comparatively flat above the 95% and below the 5% reliability levels. Thus, it is reasonable to choose the 5% and 95% reliability levels as the tentative simulation range for replacement intervals. Within this range, 10 equally spaced preventive replacement times are selected for simulation. The average number of failures in a year, considering preventive replacement at these times, is obtained from simulation. The simulation output is given in Table 5.2. A regression model for predicting the expected number of failures using the data in Table 5.2 is given in Equation 5.9. Table 5.3 shows the simulation vs predicted values.
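The simulation step described above can be sketched as follows. This is a minimal illustration and not the Chapter 2 procedure itself: it assumes age-based preventive replacement, instantaneous renewal at both failure and planned replacement, and no downtime. The function name, the number of runs, and the seed are arbitrary choices.

```python
import random

def simulate_failures_per_year(replace_age, beta, eta,
                               hours_per_year=4800.0, runs=2000, seed=1):
    """Average number of failures in one year under preventive replacement.

    Assumptions (not from the source): the unit is renewed instantly, either
    on failure or on reaching `replace_age`, and downtime is ignored.
    """
    rng = random.Random(seed)
    total_failures = 0
    for _ in range(runs):
        clock = 0.0
        while clock < hours_per_year:
            # random.weibullvariate(alpha, beta): alpha is scale, beta is shape
            life = rng.weibullvariate(eta, beta)
            if life < replace_age:
                clock += life
                if clock < hours_per_year:
                    total_failures += 1   # failure occurred within the year
            else:
                clock += replace_age      # planned replacement, no failure
    return total_failures / runs

# Ten equally spaced replacement ages between the 95% and 5% reliability ages.
low, high = 288.0, 2759.0
ages = [low + i * (high - low) / 9 for i in range(10)]
for age in ages:
    rate = simulate_failures_per_year(age, beta=1.8, eta=1500.0)
    print(f"replace at {age:6.0f} h -> {rate:.2f} failures/year")
```

The resulting (replacement age, failures per year) pairs play the role of the simulation output to which a regression model is then fitted.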
Table 5.2 Simulation output for Example 5.1.
Table 5.3 Simulation vs predicted output.
(5.9)
The R² value for the model is 0.966, indicating a good fit. It can be seen from Table 5.3 that the maximum error within the simulation range is around 12%. It can be concluded that the model predicts the number of failures satisfactorily. The utility of such models is that they significantly reduce the computational effort of the simulation approach while optimizing the preventive replacement interval. Model 5.9 for predicting the expected number of failures is used along with Model 5.6 to obtain the optimal replacement intervals. One can use any optimization algorithm to solve the optimization problem. If, in Model 5.6, a probability distribution is used to represent random downtimes instead of the mean corrective and preventive downtime, then a simulation-based optimization approach is required, in which the expected value of the objective function is optimized. We have used risk optimizer [83] to solve this. The same is illustrated in the following Example 5.2.
Example 5.2: Let it be required to obtain an optimal preventive replacement interval t*PReplace for the seal for which the regression model has been developed in Example 5.1. Every time the seal fails, the probabilities that it will lead to failure consequences FC1, FC2, and FC3 are 0.3, 0.2, and 0.5 respectively. Let us consider the case of three different users using the same CNC grinding machine. The following assumptions are made:
1. All three users operate the grinding machine for 4800 hours per year;
2. The designed production rate (DPR) is 50 jobs/hour;
3. The mean active corrective and preventive replacement time for all three users is 2 hours;
4. The delay in corrective replacement follows a lognormal distribution with mean = 4 hours and standard deviation = 4 hours;
5. Preventive replacement, being a planned activity, does not face delays;
6. All three users use a control chart to monitor the process quality.
The process is centered, with upper and lower control limits at ±3σ. Whenever the seal failure leads to FC3, it shifts the process mean by k·σ, where k = 1 in this example. Values of the control chart parameters for the three users are shown in Table 5.4, where S is the sample size and tS is the time between two samples.
Table 5.4 Control chart parameters for the three users.
The above data will be used to calculate the increase in rejection due to seal failure and time required to detect the same. Time to detect has been determined from type II error of the control chart using the values of control chart parameters given in Table 5.4. Equations 3.24 and 3.25 are used for this purpose. Similarly, failure of seal leading to FC2 reduces the production rate by 10% and user 1 and user 2 update their production status after every 6 hours. Thus, a reduction in production rate will be detected by both the users at the next update. However, user 3 updates the production status after every 8 hours. Therefore, time to detect the FC2 for user 3 is taken as 8 hours. Note that these 6 hrs and 8 hrs are actually the maximum delay; actual delay may be less than these if the failure occurs somewhere between the previous and next update. Table 5.5 shows the cost structure of the users.
Table 5.5 Cost structure of different users.
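The link between the type II error of a control chart and the time to detect a mean shift can be illustrated with the textbook relation for a Shewhart chart of sample means. This is a sketch under stated assumptions: Equations 3.24 and 3.25 are not reproduced in this section and may differ in detail, the chart type is assumed to be an X-bar chart with normally distributed data, and the parameter values in the usage line are hypothetical rather than taken from Table 5.4.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def type_ii_error(shift_k, sample_size, limit_width=3.0):
    """Probability that one sample mean stays within the limits after a k-sigma shift."""
    offset = shift_k * math.sqrt(sample_size)
    return normal_cdf(limit_width - offset) - normal_cdf(-limit_width - offset)

def expected_time_to_detect(shift_k, sample_size, sampling_interval, limit_width=3.0):
    """Expected number of samples to a signal (1 / (1 - beta)) times the sampling interval."""
    beta_error = type_ii_error(shift_k, sample_size, limit_width)
    average_run_length = 1.0 / (1.0 - beta_error)
    return average_run_length * sampling_interval

# Hypothetical values: k = 1, sample size S = 4, one sample every hour (tS = 1).
print(f"beta = {type_ii_error(1.0, 4):.4f}")
print(f"time to detect = {expected_time_to_detect(1.0, 4, 1.0):.2f} h")
```

A larger sample size or a shorter sampling interval shortens the detection time, which is one way the shop-floor policy parameters enter the cost of an FC3 failure.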
Since replacement restores the component to as-good-as-new condition, the expected number of failures would be approximately the same every year. Therefore, in this example, the expected annual cost (EAC), instead of the life cycle cost, is considered as the objective function. In other words, L is taken as 1 year in Model 5.6. For the above data, the optimal preventive replacement interval is obtained by solving the problem formulation given in Equation 5.6. The optimal replacement interval is obtained for all three users and is given in Table 5.6. The distribution of the expected annual cost at the optimal replacement schedule is shown in Figure 5.2.
Figure 5.2 Distribution of Expected Annual Cost for three users at optimal replacement schedule.
Table 5.6 Optimal replacement schedules for different machine tool users.
It can be seen from Table 5.6 that the optimal preventive replacement interval varies with the user’s cost structure and other shop-floor policy parameters. It is therefore necessary to consider the user’s cost structure and shop-floor level policy while making maintenance decisions. In Figure 5.2 the vertical axis represents the probability density value and the horizontal axis represents the EAC (in INR) value.
5.4.2 Preventive Maintenance Optimization in Maintenance Scenario 2 (MSc 2) (Repair-Replacement Model)
In this scenario, units are repaired preventively at fixed time intervals or are repaired on failure. Both corrective and preventive repairs are imperfect, with restoration factors RFCA and RFPRepair respectively. Past failure data of a repairable system can be used to estimate the values of the restoration factors along with the shape and scale parameters of the time-to-failure distribution. A maximum likelihood estimation method can be used when sufficient data are available. Alternatively, Lad and Kulkarni [58] have provided an expert judgement-based method to estimate restoration factors for repairable systems. After some preventive repairs, it may be economical to replace the unit with a new one. In such a situation, the aim is to obtain the optimal preventive repair interval t*PRepair as well as the optimal replacement duration t*PReplace of the component/assembly such that the contribution of the component/assembly to the life cycle cost of the system is minimized. For the ith component/assembly, the problem can be formulated as:
(5.10)
If the machine operates for 4800 hours in the jth year, then the number of preventive replacements for the ith component can be expressed as:
(5.11)
The number of preventive repairs in the jth year can be expressed as follows:
(5.12)
Since it is assumed that the component is replaced after some preventive repairs, the preventive replacement interval will be a multiple of the preventive repair interval. Thus,
$t_{PReplace} = x \cdot t_{PRepair}$ (5.13)
where ′x′ is an integer decision variable. E[NCAi]j is the number of failures in the jth year for the ith component/assembly for which the optimal preventive maintenance schedule (t*PRepair, t*PReplace) is required. It can be calculated as follows.
Predicting the Number of Failures ′E[NCA]j′ in Maintenance Scenario 2 (MSc 2)
In the present research, the simulation-based regression method is used to develop a model that predicts the expected number of failures as a function of repair and replacement intervals. The range for the preventive repair intervals can be selected based on the reliability vs time curve, as described in the previous subsection. Components/subassemblies are generally replaced after a limited number of preventive repairs. The number mainly depends on the ratio of the cost of preventive replacement to that of preventive repair. The higher the ratio, the larger the number of preventive repairs after which the component/assembly is replaced. In general, for most such components/subassemblies of a machine tool, this cost ratio falls in the range of 1 to 10. Also, the preventive repair number at which a component/assembly is replaced generally ranges from 1 to 8. However, if we simulate the number of failures in a year (E[NCA]) for various preventive repair schedules (y) and for various x (i.e., the number of preventive repairs at which the component is replaced preventively) and fit a model E[NCA] = f(y, x), such a model may suffer from poor accuracy. Therefore, some guidelines based on the ratio of preventive replacement cost to preventive repair cost are provided in Table 5.7 to increase the accuracy of such a regression model.
Table 5.7 Guidelines for selecting simulation range for replacement intervals.
Thus, Table 5.7 suggests that if the ratio of preventive replacement cost to preventive repair cost is less than 4, we expect to get the optimal replacement between the 1st and 4th preventive repair; if the ratio is between 5 and 8, the optimal replacement may be between the 5th and 8th preventive repair, and so on. This information is then used for simulating the number of failures for developing the transfer function, i.e., E[NCA] = f(y, x), which is then used in further optimization. In order to illustrate the above procedure, the following example is presented.
Example 5.3: Consider the example of an assembly for which the time to first failure follows a two-parameter Weibull distribution. Let the shape (β) and the scale (η) parameters of the Weibull distribution be 2 and 1500 hours respectively. Let the restoration factors for corrective and preventive actions be 0.2 and 0.6 respectively. These are obtained from the past failure data. In the present example, the times corresponding to 5% and 95% reliability are 2400 hours and 300 hours respectively. Let the ratio of preventive replacement cost to preventive repair cost be 3. The range of replacement for simulation, according to the guidelines in Table 5.7, will be the (1 to 4)th repair. The number of failures in one year corresponding to various preventive repair intervals (i.e., y), for replacement after the (1 to 4)th repair, is simulated. The results are shown in Table 5.8.
Table 5.8 Number of failures in a year obtained from simulation.
Thus, if the preventive maintenance time for component/assembly is 900 hrs and the component is replaced at 2nd preventive maintenance then the replacement interval will be 1800 hrs (900 × 2). Thus, the entry 2.62 (marked bold in Table 5.8) is the number of failures in the first year if the component is maintained preventively at every 900 hours and is replaced at every 1800 hours. A regression model was fitted to the data in Table 5.8. The model is as shown in Equation 5.14.
(5.14)
The model expresses the number of failures in a year as a function of preventive repair and replacement time. Table 5.9 shows the p and R² values for the model.
Table 5.9 Statistical property of the model.
Table 5.10 compares the values obtained from simulation and the regression model for some randomly selected combinations of repair and replacement durations. Table 5.10 Experimental design vs. Simulation.
Use of the regression model reduces the complexity in maintenance optimization.
5.4.3 Preventive Maintenance Optimization in Maintenance Scenario 3 (MSc 3) (Overhauling Model)
In this scenario, apart from preventive maintenance and replacement, the component/assembly also receives overhauling, generally once or twice a year depending on the production requirements and the availability of maintenance resources. The overhauling action differs from the regular preventive repair in the degree of restoration achieved. The repair effectiveness (i.e., restoration) of overhauling is higher than that of preventive repair. In this scenario, corrective action is considered as minimal. The optimal replacement duration also depends on the preventive repair and overhaul intervals. Thus, models are required that simultaneously optimize the repair and replacement intervals, considering imperfect overhauling of the assembly. The optimization problem seeks to obtain an optimal preventive repair interval t*PRepair and an optimal replacement duration t*PReplace such that the life cycle cost of the component/assembly during the whole life of the machine is minimized. Thus, the maintenance optimization problem can be formulated as:
(5.15)
If a machine operates for 4800 hours per year, then the number of preventive maintenance actions in the jth year can be expressed as follows:
(5.16)
It is assumed that the overhaul is performed at the end of the year and that the component/assembly is replaced at one of the overhauls. Thus, if the planned replacement is at the end of the fourth year, then there will be three overhauls and one replacement over the four-year period. E[NCAi]j is the number of corrective actions (failures) in the jth year for the ith component/assembly for which the optimal preventive replacement interval is required. It can be calculated as follows:
Predicting the Number of Failures ′E[NCAi]′ in Maintenance Scenario 3 (MSc 3)
The probability that a system will survive an additional time ′t′, given that it has already survived until its current age ′V′, can be written as Equation 5.17.
(5.17) For a Weibull time-to-failure distribution it can be expressed as:
(5.18)
And the conditional probability density function can be represented as Equation 5.19. (5.19) Thus the conditional failure rate λ(t|V) of the system is expressed as:
(5.20) If corrective repair is minimal, the number of failures can be calculated as [84]:
(5.21)
Thus, the number of failures at any time ′t′ when the starting age is ′V′ and corrective repair is minimal will be given by: (5.22) The above equation can be further simplified as follows:
(5.23)
Such a simplified expression for number of failures can be used to calculate the number of failures in any year, for a system or component subject to imperfect preventive repair and imperfect overhaul, when corrective repair is minimal.
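The displayed forms of Equations 5.17 to 5.23 are not reproduced in this text. As an aid to the reader, the chain of reasoning can be sketched from the standard Weibull definitions. The notation below (R for reliability, f for the density, N for the expected number of failures) is an assumption and may differ from the original equations.

```latex
\begin{align*}
R(t \mid V) &= \frac{R(V+t)}{R(V)}
  = \exp\!\left[\left(\frac{V}{\eta}\right)^{\beta}
      - \left(\frac{V+t}{\eta}\right)^{\beta}\right], \\
f(t \mid V) &= -\frac{d}{dt}\,R(t \mid V) = \frac{f(V+t)}{R(V)}, \\
\lambda(t \mid V) &= \frac{f(t \mid V)}{R(t \mid V)}
  = \frac{\beta}{\eta}\left(\frac{V+t}{\eta}\right)^{\beta-1}, \\
E[N(t \mid V)] &= \int_{0}^{t} \lambda(u \mid V)\,du
  = \left(\frac{V+t}{\eta}\right)^{\beta} - \left(\frac{V}{\eta}\right)^{\beta}.
\end{align*}
```

The last line is the closed form that follows when corrective repair is minimal, so that a failure does not change the age of the unit.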
Model Validation: The conditional number of failures model has been validated through simulation. The simulation approach discussed in Chapter 2 is used for this purpose. The number of failures is simulated for the component/assembly for a given initial age. Table 5.11 shows the number of failures obtained from both simulation and Equation 5.23 at randomly selected initial ages ′V′ and times ′t′ for a component/assembly having η = 3000 and β = 2. The corrective action is considered as minimal. It is clear from Table 5.11 that Equation 5.23 gives satisfactory results. Similar results were obtained for different values of η and β. The following example is provided to illustrate the use of the overhauling model in obtaining optimal preventive maintenance and replacement intervals.
Table 5.11 Simulation vs conditional number of failures model results.
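A validation of this kind can be sketched as follows. The closed form used here is the standard Weibull result for minimal repair, ((V + t)/η)^β − (V/η)^β, which is assumed to correspond to Equation 5.23. The simulation is not the Chapter 2 procedure: it treats failures under minimal repair as a non-homogeneous Poisson process and samples it on the transformed scale. The initial age and duration in the usage lines are arbitrary illustrative values.

```python
import random

def expected_failures(t, start_age, beta, eta):
    """Expected failures in (V, V + t] under minimal repair (Weibull intensity)."""
    return ((start_age + t) / eta) ** beta - (start_age / eta) ** beta

def simulate_failures(t, start_age, beta, eta, runs=20000, seed=7):
    """Monte Carlo estimate of the same quantity.

    Minimal repair leaves the age unchanged, so failures follow a
    non-homogeneous Poisson process with cumulative intensity (age/eta)**beta.
    On that transformed scale the gaps between failures are Exp(1).
    """
    rng = random.Random(seed)
    end_level = ((start_age + t) / eta) ** beta
    total = 0
    for _ in range(runs):
        level = (start_age / eta) ** beta
        while True:
            level += rng.expovariate(1.0)
            if level > end_level:
                break
            total += 1
    return total / runs

# Parameters of the validation case: eta = 3000 hours, beta = 2.
analytic = expected_failures(4800.0, 2000.0, beta=2.0, eta=3000.0)
simulated = simulate_failures(4800.0, 2000.0, beta=2.0, eta=3000.0)
print(f"analytic = {analytic:.3f}, simulated = {simulated:.3f}")
```

Agreement between the two values for several choices of V and t is the kind of evidence summarized in a table such as Table 5.11.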
Example 5.4: Let the time-to-failure of the spindle assembly used in a CNC grinding machine workhead follow a two-parameter Weibull distribution with shape (β) and scale (η) parameters of 2 and 7000 hours respectively [85]. Let it be required to obtain the optimal preventive repair interval t*PRepair and the optimal replacement interval t*PReplace for the spindle assembly. Failures of the spindle assembly result in three consequences, FC1, FC2, and FC3, with probabilities of 0.3, 0.2, and 0.5 respectively. The machine is operated for 4800 hours per year with a Designed Production Rate (DPR) of 45 jobs/hour. Whenever the spindle assembly fails, it is minimally repaired, and the mean active corrective maintenance time is 1 hour. The mean delay in corrective maintenance is 8 hours. The spindle assembly also receives imperfect preventive repair with a restoration factor of 0.3. The mean active preventive repair time is 1 hour. Apart from preventive repair, the assembly also receives maintenance during major overhauls, which are carried out yearly. The degree of restoration achieved for the spindle assembly in an overhaul is 0.7. The mean active time required per overhaul is 2 hours. The assembly is replaced after some overhauls. Preventive repair and overhaul being planned activities, the delay is considered negligible. Let the spindle assembly be used in machines installed at three different users’ shop floors, and let the repair times be the same for all three users. Users use a control chart to monitor the process quality. The process is centered, with upper and lower control limits at ±3σ. Whenever a spindle assembly failure leads to failure consequence FC3, it shifts the process mean by k·σ. In this example, k = 0.5 is used. The sample size and the time between samples are 4 and 8 hours respectively. Let a failure of the spindle assembly that leads to FC2 reduce the production rate by 30%. The reduction in production rate is detected by the user after 6 hours. Let the acquisition cost of the spindle assembly (Caq) be 80,000 INR. Table 5.12 shows the cost structure of the user.
Table 5.12 Cost data (in Rupee) for Example 5.4.
The discounting factor is taken as 0.1 and is assumed to remain constant throughout the life of the machine. The effective life of the machine is taken as 12 years. Due to logistics issues, the user can carry out preventive maintenance only at the end of a week. Let one week consist of 100 hours. Thus, the optimal maintenance intervals must be multiples of 100 hours. Using the above data, the optimization model given in Equation 5.15 has been solved using a simulation-based optimization approach. A Genetic Algorithm (GA) is used to solve the optimization problem. In the example, risk optimizer [83] is used for the same. The number of failures in each year is calculated using Equation 5.23. The optimal values of the decision variables obtained are as follows:
Thus, it is economical to carry out preventive maintenance of the spindle assembly every 1600 hours and to replace it at the end of the 2nd year, while major overhauling is performed yearly. In order to increase the confidence in the solution, the optimization was repeated with different starting solutions and population sizes. It was observed that the solutions obtained with different starting solutions and population sizes did not differ from each other.
Sensitivity Analysis: The model parameters Clp and Crej are assumed to be known with certainty. However, these parameters depend on various other factors and may be subject to some variation. Thus, it is important to perform a sensitivity analysis to evaluate the robustness of the solution to small variations in these parameter values. To assess the sensitivity of the preventive maintenance schedule, the optimization procedure is repeated with small variations in these model parameters. The results showed that the optimum preventive maintenance schedule does not change with ±15% variation in the cost of lost production and the cost of rejection per job. Thus, the model is robust against small variations in these parameters.
Effects of Change in User’s Cost Structure: As mentioned above, for a given user, the model is robust against small changes in the cost of lost production and rejection. However, these costs, along with the user’s shop-floor policy parameters such as control chart parameters, may take entirely different values for different users. For example, if a user is using the machine in a production line, then the cost of lost production will be higher compared to a stand-alone machine or a machine used in a job shop kind of environment. Similarly, different users may use different control chart policies. Table 5.13 shows the effects of the user’s cost structure and control chart parameters on the optimization results and the life cycle cost of the component/assembly.
Table 5.13 Effect of user’s cost structure and shop-floor policy parameters on optimal maintenance decisions.
It can be seen from Table 5.13 that optimal maintenance decisions vary with users’ cost structure and shop-floor policy parameters. Therefore, it is recommended that the cost structure of each user must be considered while recommending a maintenance schedule to different users.
5.5 Closing Remarks
Depending on the nature of the machine tool component/assembly, it may receive different types and degrees of maintenance actions. Based on the type and degree of maintenance, different maintenance scenarios may be present for a machine tool. The complexity of solving the optimization problem varies in each of the three scenarios. The complexity mainly comes from predicting the expected number of failures for a given time period under different maintenance scenarios. Simulation-based regression models can be used to predict the number of failures in the optimization problem. The importance of regression models is that they reduce the complexity involved in solving the maintenance optimization problem. They can help the designer in simultaneously optimizing the repair and replacement intervals, wherever applicable. However, the accuracy of such regression models depends on the range of the preventive maintenance and/or replacement interval used in the simulation, and the same must be chosen carefully. It has also been demonstrated that the optimal maintenance decisions depend on the user’s cost structure and shop-floor policy parameters. However, the optimal solution remains robust for small variations in the user’s cost structure. Thus, the maintenance optimization approach presented in this chapter has been aimed at capturing realistic situations in the maintenance optimization of machine tools and has provided practical solutions to the designers. The application of the proposed models has been illustrated with the help of examples of machine tool components/subassemblies.
Chapter 6
Reliability and Maintenance-Based Design of Machine Tools
Reliability is one of the most important attributes of performance in arriving at an optimal design of a system, since it directly and significantly influences the system’s performance and its life cycle costs. Achieving the organization’s reliability goals requires that strategic vision, proper planning, sufficient organizational resource allocation, and the integration and institutionalization of reliability practices be put into development projects [86]. It has been estimated that 80% of poor quality products and over 90% of field failures are the result of poor design [4]. Therefore, if there is any phase in the entire life cycle of a product that has maximum impact on field performance, it is the design phase. Poor reliability would greatly increase the life cycle cost of the system, and reliability-based design must be carried out if the system is to achieve its desired performance. An optimal reliability design is one in which all possible means available to the designer have been explored to enhance the reliability of the system with minimum cost under the constraints imposed on the development of a system. There are several alternatives available to a system designer to improve system reliability and availability. The most well-known approaches are [87]:
• Reduction of the complexity of the system;
• Use of highly reliable components through component improvement programs;
• Use of structural redundancy;
• Putting in practice a planned maintenance, repair schedule and replacement policy;
• Decreasing the downtime by reducing delays in performing the repair.
System complexity can be reduced by minimizing the number of components in a system and their interactions. However, a reduction in the system complexity may result in poor stability and transient response. It may also reduce the accuracy and eventually result in the degradation of product quality [87].
The product improvement program requires the use of improved packaging, shielding techniques, derating, etc. Although these techniques result in a reduced failure rate of the component, they nevertheless require more time for design and special state-of-the-art production. Therefore, the cost of a part improvement program could be very high and may not always be an economical way of system performance improvement. Also, this way the system reliability can be improved to some degree, but the desired reliability enhancement may not be attained [87]. On the other hand, the employment of structural redundancy at the subsystem level, keeping system topology intact, can be a very effective means of improving system reliability at any desired level. Structural redundancy may involve the use of two or more identical components, so that when one fails, the others are available and the system is able to perform the specified task in the presence of faulty components. Depending upon the type of subsystem, various forms of redundancy schemes, viz., active, standby, partial, voting, etc., are available. The use of redundancy provides the quickest solution, if time is the main consideration. It is the cheapest method if the cost of redesigning a component is too high [87]. Thus, much of the effort in designing a system is applied to allocation of resources to incorporate structural redundancies at various subsystems, which will eventually lead to a desired value of system reliability. Maintenance, repairs and replacements, wherever possible, undoubtedly enhance system reliability [88] and should be employed in an optimal way. Further, decreasing the downtime by reducing delays in performing the repair can also be used to improve the availability of the system [89]. This can be achieved by optimal allocation of spares, choosing an optimal repair crew size, improving maintainability, etc. 
Therefore, the basic problem in optimal reliability design of a system is to explore the extent of the use of the above-mentioned means of improving the system reliability within the resources available to a designer. Such an analysis requires an appropriate formulation of the problem. The models used for such a formulation should be both practical and amenable to known mathematical techniques of solution. Another widely used methodology at the design stage for improvement of reliability performance of the system is Failure Mode and Effects Analysis (FMEA). Failure Mode and Effects Analysis is a methodology for
identifying potential reliability problems early in the product development cycle, where it is easier to take actions to overcome these issues, thereby enhancing reliability through design [30]. This chapter discusses these two reliability design methodologies, viz., optimal reliability design and FMEA, in detail and illustrates them in the case of machine tools.
6.1 Optimal Reliability Design
In the literature, optimal reliability design problems are broadly put into three categories, namely reliability allocation, redundancy allocation, and reliability-redundancy allocation, according to the types of their decision variables. If component reliabilities are the decision variables, the problem is called reliability allocation [90–92]; if the number of redundant units is the decision variable, the problem becomes a Redundancy Allocation Problem (RAP) [93–97]; if the decision variables of the problem include both the component reliabilities and redundancies, the problem is called a Reliability-Redundancy Allocation Problem (RRAP) [98–100]. The optimization criterion in these types of problems may be reliability, cost, weight or volume. One or more criteria may be considered in an objective function, while the others may be considered as constraints. From a mathematical point of view, the reliability allocation problem is a Nonlinear Programming (NLP) problem. It can be shown as follows [75]:
Maximize $R_S = f(R_1, R_2, \ldots, R_n)$
Subject to $g_i(R_1, R_2, \ldots, R_n) \le b_i, \quad i = 1, \ldots, m$
$R_j^{\min} \le R_j \le R_j^{\max}, \quad j = 1, \ldots, n$
where n is the number of components/subassemblies in a system, RS is the system reliability, Rj is the component/assembly reliability of stage j, Rjmin and Rjmax are the lower and upper limits on Rj, gi(.) is the ith constraint function, f(.) is the system reliability function, bi is the resource allocated to the ith constraint, and m is the number of constraints in the system. In the above formulation, the reliability of components takes any continuous value between 0 and 1. In case the possible values of reliability are discrete, the following formulation can be used. Suppose there are uj discrete choices for component reliability at stage j for j = 1, …, k (≤ n), and the choice for component reliability at stages k + 1, …, n is on a continuous scale. Let Rj(1), Rj(2), …, Rj(uj) denote the component reliability choices at stage j for j = 1, …, k; then the problem of selecting optimal component reliabilities that maximize system reliability can be written as [75]:
Maximize
Subject to
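For a small number of stages, the discrete-choice version of the problem can be solved by direct enumeration. The sketch below is illustrative only: it assumes a series system (so that the system reliability is the product of the stage reliabilities), a single additive cost constraint, and discrete choices at every stage. The reliability and cost figures are hypothetical and not taken from the text.

```python
from itertools import product

def best_discrete_allocation(choices, budget):
    """Pick one (reliability, cost) option per stage of a series system.

    Returns (system_reliability, total_cost, selected_options) for the feasible
    combination with the highest system reliability, or None if none is feasible.
    """
    best = None
    for combo in product(*choices):
        cost = sum(option_cost for _, option_cost in combo)
        if cost > budget:
            continue  # violates the resource constraint
        reliability = 1.0
        for option_reliability, _ in combo:
            reliability *= option_reliability
        if best is None or reliability > best[0]:
            best = (reliability, cost, combo)
    return best

# Hypothetical data: three stages, each with (reliability, cost) options.
stage_choices = [
    [(0.90, 10), (0.95, 18), (0.99, 30)],
    [(0.85, 8), (0.92, 15)],
    [(0.93, 12), (0.97, 20), (0.995, 40)],
]
result = best_discrete_allocation(stage_choices, budget=60)
print(result)
```

Enumeration grows exponentially with the number of stages, which is one reason dedicated solution methods are used for problems of realistic size.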
On the other hand, the Redundancy Allocation Problem is generally formulated as a pure Integer Nonlinear Programming (INLP) problem as shown below [101]:
Maximize
Subject to
xj being an integer. Similarly, Elegbede et al. [102] used cost minimization for redundancy allocation. They formulated the problem as follows:
Minimize
Subject to
xj being an integer. Similarly, the reliability allocation and reliability-redundancy allocation problems can also be formulated as cost minimization problems. Reliability and redundancy allocation problems can be considered as Mixed Integer Nonlinear Programming (MINLP) problems [103].
Maximize
Subject to
xj being an integer. Reliability optimization, in the literature, is also formulated as a multi-objective problem. For example, a multi-objective formulation for a reliability-redundancy allocation problem is used by Sakawa [104]. It is shown below.
Maximize
where f2 represents a convex cost function.
Subject to
xj being an integer. Similarly, multi-objective formulations for redundancy allocation and reliability allocation can also be constructed. Wang et al. [105] have used a multi-objective formulation for the Redundancy Allocation Problem (RAP) for parallel-series systems. Reliability allocation is usually easier than redundancy allocation, but it may be more expensive to improve the component reliability than to add redundant units. Redundancy allocation, on the other hand, results in increased design complexity and increased costs through additional components, weight, space, etc. RAP also increases the computational complexity of the problem, and is classified as NP-hard in the literature [106]. The complexity further increases in the case of the reliability and redundancy allocation problem. Reliability and/or redundancy allocation problems have been researched for different system configurations like series, series-parallel, parallel-series, complex, bridge, etc. In their review, Kuo et al. [75] classified the reliability optimization research on the basis of system configurations. Researchers have also considered issues like types of redundancy [107], mixing of components [108], multi-state systems [109–111], etc.

A majority of the literature on reliability and/or redundancy optimization, in general, does not consider the effect of repair. However, there are many systems, such as machine tools, power generation units, and gas and oil transportation systems, which undergo repair upon failure. Reliability design should also be systematized for such repairable systems. Both reliability and availability are considered as system performance criteria in such systems. Therefore, in the literature, such problems are often also referred to as optimal availability design of systems. The works of Misra [88], Sharma and Misra [112], Gurov et al. [113], Nourelfath and Dutuit [114], Kumar and Knezevic [115], Yu et al. [107], Ouzineb et al. [116], Kajal et al. [117], Jazouli et al. [118], etc., deserve attention in this regard. The problems considered in such literature in general aim at obtaining the optimal number of one or more of the following: redundancy, spares, and number of repair facilities. For example, Misra [88] proposed a joint failure and repair rate allocation problem in order to maximize system availability and/or reliability under system cost constraints. In designing systems for reliability and maintainability, one may be interested in determining the pair (MTBF, MTTR) for which availability reaches a maximum value subject to a cost constraint. This problem of failure and repair rates allocation can be formulated as [88]: Maximize
Subject to
Alternatively, a dual problem can also be formulated. Nourelfath and Ait-Kadi [119] have extended the classical redundancy allocation problem to find, under reliability constraints, the minimal cost configuration of a multi-state series-parallel system, subject to a specified maintenance policy. The component is selected from the discrete choices made from components available in the market. They formulated the problem as: Maximize
Subject to
Monga et al. [120] proposed a joint optimization problem for obtaining optimal system configuration, PM interval and system economic life. Kumar and Knezevic [115] presented three models for spares optimization. The objective is to maximize the availability (or minimize the space) subject to space constraint (or availability constraint). Yu et al. [107] used probability analysis and formulated the system design problem as minimizing the system cost rate subject to an availability constraint to find the optimal reliability in terms of the mean time to failure of the components and the optimal intervals of good-as-new maintenances. Ouzineb et al. [116] proposed an approach to solve the redundancy allocation problem for multi-state series-parallel repairable systems. The proposed method determines the minimal cost system configuration under specified availability constraints. Misra [121] proposed a procedure of designing products, systems and services based on an overall index of performability which includes failure and repair, as well as environmental issues. From the previous discussion, it can be seen that reliability optimization is a nonlinear optimization problem. The solution methods for these problems can be categorized into the following classes:
1. Exact methods
2. Approximate methods
3. Heuristics
4. Metaheuristics
5. Hybrid heuristics
6. Multi-objective optimization techniques

Exact methods provide exact solutions to reliability optimization problems. Dynamic programming (DP) [122,101], branch and bound [123,124], implicit enumeration search technique [125] and partial enumeration search technique [126] are typical approaches in this category. These methods of course provide high solution quality, but higher computational time requirement limits their application to simple system configurations and systems with only a few constraints. On the other hand, many heuristics have also been proposed in the literature to provide an approximate solution in relatively short
computational time [127,128]. A heuristic may be regarded as an intuitive procedure constructed to generate solutions in an optimization process. The theoretical basis for such a procedure in most cases is insufficient and none of these heuristics establish the optimality of the final solution. These methods have been widely used to solve redundancy allocation problems in series systems, complex system configuration, standby redundancy, multistate systems, etc. Recently, metaheuristics have been successfully used to solve complex reliability optimization problems. They can provide optimal or near optimal solutions in reasonable time. These methods are based on artificial reasoning rather than classical mathematics-based optimization. Genetic Algorithm (GA) [129,130,107], Simulated Annealing (SA) [103], Tabu Search (TS) [131,132], Immune Algorithm (IA) [99] and Ant Colony (AC) [133] are some of the approaches in this category which have been applied successfully to solve the reliability optimization problem. Metaheuristic methods can overcome the local optimal solutions and, in most cases, they produce efficient results. However, they also cannot guarantee the global optimal solutions. In the literature, hybrid heuristics [108,134] have also been proposed to solve the redundancy and reliability-redundancy allocation problem. Hybrid heuristics generally combine one or more metaheuristics or a metaheuristic with other heuristics. In this chapter, a novel methodology to look into the reliability optimization problem for a repairable system, i.e., machine tool, is presented. It considers the available alternatives for reliability of the components/subsystems, instead of obtaining a continuous decision variable value between 0 and 1. It also simultaneously optimizes the maintenance policy for various components/subassemblies. The problem is discussed in the following sections.
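The trade-off between exact and heuristic methods can be made concrete with a small sketch. The Python fragment below solves a toy redundancy allocation problem by exhaustive enumeration, the simplest exact method. The three-stage system, the reliability and cost figures, the budget and the limit of four units per stage are all hypothetical values chosen for illustration; none of them come from the cited literature. The number of candidate solutions grows as max_units to the power of the number of stages, which is why such exact approaches are limited to small problems.

```python
from itertools import product
from math import prod

# Hypothetical data for a three-stage system (illustrative values only).
component_reliability = [0.90, 0.85, 0.95]  # reliability of one unit per stage
component_cost = [4.0, 3.0, 5.0]            # cost of one unit per stage
budget = 30.0                               # total cost limit
max_units = 4                               # upper bound on units per stage


def system_reliability(units):
    """Stages in series; each stage holds identical units in parallel."""
    return prod(1.0 - (1.0 - r) ** x
                for r, x in zip(component_reliability, units))


def total_cost(units):
    """Total cost of all units in the system."""
    return sum(c * x for c, x in zip(component_cost, units))


def solve_by_enumeration():
    """Check every allocation and keep the most reliable feasible one."""
    best_units, best_reliability = None, -1.0
    for units in product(range(1, max_units + 1),
                         repeat=len(component_cost)):
        if total_cost(units) > budget:
            continue  # violates the cost constraint
        rel = system_reliability(units)
        if rel > best_reliability:
            best_units, best_reliability = units, rel
    return best_units, best_reliability


best_units, best_reliability = solve_by_enumeration()
print(best_units, round(best_reliability, 4))
```

A metaheuristic such as a genetic algorithm would replace the full loop over `product(...)` with a guided search over a subset of the allocations, trading the guarantee of optimality for shorter computation time.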
6.2 Optimal Reliability Design of Machine Tools

In the case of machine tools, the designers generally have multiple alternatives for most of the components and subassemblies, each having its
own cost as well as failure and repair characteristics like time-to-failure distribution, time-to-repair distribution, failure consequences, degree of restoration, fixed cost per repair, etc. For example, Table 6.1 shows some alternatives for some of the grinding machine components/subassemblies.

Table 6.1 Alternatives for machine tool component/assembly.

| Component/assembly | Alternative 1 | Alternative 2 |
|---|---|---|
| Ball screw | Tsubaki (supplier) | THK (supplier) |
| Servo motor | Fanuc (supplier) | Siemens (supplier) |
| Table feed coupling | Safety couplings | Non-safety coupling |
| Work head spindle | Motorized | Belt driven |
| Radial shaft seal | Bushak (supplier) | TCK (supplier) |
| Spindle bearing | SKF (supplier) | FAG (supplier) |
| Thrust bearing | IKO (supplier) | NTN (supplier) |
| Oil seal | TCK (supplier) | CFW (supplier) |
| Belt | GATES (supplier) | CONTITECH |

The designer decides the machine tool configuration by choosing one of the alternatives for each component/assembly. Thus, the inherent reliability performance of the system is fixed at the design stage. Further, Preventive Maintenance (PM) can also be used to improve the reliability performance of the system. However, preventive maintenance consumes resources and time which could otherwise be used for production, thereby affecting profit. Preventive maintenance optimization is performed to make a trade-off between corrective and preventive maintenance costs. Such trade-offs, in general, depend on the inherent failure and repair characteristics as well as the cost per corrective and preventive maintenance action. Thus, optimal reliability design of a machine finally boils down to selecting the optimal machine tool configuration from the available alternatives for different components/subassemblies by simultaneously considering reliability and maintenance schedule related decision variables, such that the user’s performance requirements are met within the budget constraints.

From the above discussion, the problem of simultaneous optimization of reliability and maintenance can be formulated as follows. In this formulation, life cycle cost, measured in terms of present value of the cost (PVC), is considered as objective function, while Availability (A)
and Overall Equipment Effectiveness (OEE) are considered as the constraints. However, other performance measures discussed in Chapter 3 can also be used as objective function or constraints. Thus, one of the formulations can be shown as:

Minimize
Subject to (6.1)

Additionally, a budget constraint can also be added. Here, RC1, RC2, … RCn and MS1, MS2, … MSn are the decision variables representing Reliability Characteristics and Maintenance Schedule of different components/subassemblies in the machine tool respectively, and ′n′ is the number of components/subassemblies in the machine tool. It is assumed that failure of any one of the components/subassemblies leads to failure of the machine. In other words, these can be considered in series from a reliability point of view. Performance models presented in Chapter 3 can be used to express the objective function and constraints as a function of reliability characteristics and maintenance schedule parameters. These models consider the effect of the user’s cost structure and other shop floor policy parameters, thereby allowing the designer to provide a customized solution in terms of reliability characteristics and maintenance schedule.

Let the expected life of the machine be ′L′ years and the discounting factor be ′r′. It is assumed that ′r′ remains the same for the whole life of the machine. Using the models presented in Chapter 3, the above Formulation 6.1 can be rewritten as follows:

Minimize
Subject to (6.2)

where E[CCAi]j and E[CPAi]j are the expected costs of corrective and preventive actions of the ith component in the jth year. Using Equations 3.32 and 3.33, these costs can be expressed as: (6.3)
and (6.4)
Using Equations 3.12 and 3.27, Availability (A) and Overall Equipment Effectiveness (OEE) can be expressed as follows:
(6.5)
(6.6)
is the required budget, E[NCAi]j is the expected number of corrective actions in the jth year, and E[NPRepairi]j, E[NOHi]j, and E[NPReplacei]j are, respectively, the expected numbers of preventive repairs, overhauls and replacements in the jth year. For given failure and repair characteristics of components/subassemblies (which are decision variables in the above problem and will vary with configuration), the expected number of corrective actions in any year based on Maintenance Scenario 3 (MSc 3) can be obtained from the conditional number of failures model presented in Chapter 5. Similarly, for other Maintenance Scenarios (i.e., MSc 1 & 2), regression models presented in Chapter 5 can be used. Each maintenance scenario considers different types of preventive maintenance actions and different degrees of restoration for maintenance. The number of preventive repairs/replacements in any year will depend on the preventive maintenance schedule (which is also a decision variable in the above problem). For example, if preventive repair of any assembly is done after every 1600 hours and there are 4800 operating hours in any year, then the number of preventive repairs for that assembly in that year will be (4800/1600) − 1 = 2. It is assumed that at the end of every year the assembly receives a major overhaul. Therefore, only overhauling will be done at the end of each year and no regular preventive repair is done at that time.
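The counting rule in the 1600-hour example, together with the discounting of yearly costs over the machine life L at rate r, can be sketched in a few lines of Python. The function names are hypothetical, and the end-of-year discounting convention (dividing the cost of year j by (1 + r)^j) is an assumption made for illustration; the exact form used in Formulation 6.2 follows the Chapter 3 models and may differ.

```python
def preventive_repairs_per_year(operating_hours, pm_interval_hours):
    """Count preventive repairs in one year.

    Assumes, as in the text, that the assembly is overhauled at the
    end of every year, so a repair slot that falls exactly on the
    year end is taken over by the overhaul.
    """
    slots = int(operating_hours // pm_interval_hours)
    if slots > 0 and slots * pm_interval_hours == operating_hours:
        slots -= 1  # the year-end slot is an overhaul, not a repair
    return slots


def present_value(yearly_costs, r):
    """Discount costs of years 1..L at a constant rate r.

    Assumption: each yearly cost is incurred at the end of its year.
    """
    return sum(cost / (1.0 + r) ** j
               for j, cost in enumerate(yearly_costs, start=1))


# Example from the text: 4800 operating hours, repair every 1600 hours.
print(preventive_repairs_per_year(4800, 1600))  # prints 2
```

With a 1600-hour interval the repairs fall at 1600 and 3200 hours, and the slot at 4800 hours is the year-end overhaul.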
6.2.1 Machine Tool Functional Design Scenarios

Machine tool designers generally have the following functional design scenarios:

1. Designing a special purpose machine tool
2. Designing a general purpose machine tool
3. Designing a customized machine tool
Each of these functional design scenarios presents different opportunities and challenges to the designer when simultaneously optimizing the reliability and maintenance parameters of machine tools. These are identified for each scenario in the following paragraphs.
6.2.1.1 Special Purpose Machine Tool Design

Under this scenario, each machine tool is engineered to meet the unique requirements of a user [11]. The implication from a reliability and maintenance centered design point of view is that the designer can also capture the specific cost structure and shop-floor policy parameters of a user, and the same can then be used while optimizing the reliability and maintenance parameters of the machine tool for a specific user.
6.2.1.2 General Purpose Machine Tool Design

A general purpose machine tool is one which can be used for a wide range of operations on work pieces of various shapes and sizes [11]. Such machine tools are therefore designed for a wide range of users and are generally purchased as standard products. The implication from a reliability and maintenance design point of view is that the designer has no opportunity to capture the specific cost structure and shop-floor policy parameters of the user.
6.2.1.3 Customized Machine Tool Design

In a customized machine tool, some of the components/subassemblies, like tables, linear guideways, etc., are standard components/subassemblies and are already designed before a user comes to purchase the machine tool. Other components/subassemblies are designed based on the specific requirements of a customer [11]. Thus, the designer has flexibility to capture the specific requirements of the user only for such customer-specific components/subassemblies. The implication from a reliability and maintenance centered design point of view is that the designer has no opportunity to capture the user’s cost structure and shop-floor policy parameters while optimizing the reliability configuration and maintenance schedule of the selected standard components/subassemblies. However, the designer can capture the cost structure and shop-floor policy requirements of a specific user at the time of optimizing the reliability configuration and maintenance schedule of the customer-specific components/subassemblies. Thus, this scenario combines the characteristics of both of the above design scenarios, viz., general purpose and special purpose machine tool design.
6.2.2 Simultaneous Optimization of Reliability and Maintenance under Three Functional Design Scenarios

6.2.2.1 Simultaneous Optimization for Special Purpose Machine Tool

Under this scenario, the methodology can be described as follows:

Step 1: Identify all the alternatives for different components/subassemblies that meet the functional requirements of the machine tool.
Step 2: Obtain the failure and repair characteristics for each alternative. These characteristics can be obtained from field failure data if the component has been used in a similar application in the past and the failure records are maintained properly. Alternatively, accelerated life test results can be used for the same.
Step 3: Identify the maintenance scenario for each component/assembly, as discussed in Chapter 5.
Step 4: Obtain the cost structure and shop-floor policy parameters of the specific user for which the machine tool is to be designed. As mentioned earlier, being a special purpose design, it is possible to obtain such parameters for each specific user.
Step 5: Formulate the problem as given in Equation 6.2. Use the performance models presented in Chapter 3 and the maintenance scenarios identified in Step 3 for this purpose.
Step 6: Obtain the optimal solution (system reliability configuration as well as maintenance schedules for each component/subassembly). Metaheuristics like Genetic Algorithm (GA) can be used for this purpose.
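The six steps above can be sketched as a short Python program. This is only an illustration under stated assumptions, not the method of this chapter: exhaustive enumeration stands in for the Genetic Algorithm of Step 6, and a deliberately simplified cost and availability model stands in for the Chapter 3 and Chapter 5 models. In that simplified model each periodic preventive action is assumed to restore the subassembly to as-good-as-new, failures between preventive actions receive minimal repair, and the failure intensity follows a power law, so the expected number of failures in an interval T is (T/eta)^beta. All data values, names and thresholds are hypothetical.

```python
from itertools import product

HOURS_PER_YEAR = 4800
LIFE_YEARS = 10                   # assumed machine life L
DISCOUNT_RATE = 0.10              # assumed discounting factor r
MIN_AVAILABILITY = 0.99           # assumed user requirement
PM_INTERVALS = (800, 1600, 2400)  # candidate PM intervals in hours

# Steps 1-2 (hypothetical data): alternatives per subassembly as
# (acquisition cost, Weibull scale eta [h], Weibull shape beta,
#  hours per corrective action, cost per corrective action,
#  hours per preventive action, cost per preventive action)
ALTERNATIVES = [
    [(5000, 3000, 2.0, 10, 800, 2, 150), (7000, 5000, 2.0, 8, 900, 2, 150)],
    [(2000, 2500, 1.5, 6, 400, 1, 80), (2600, 4000, 1.5, 6, 450, 1, 80)],
    [(1500, 6000, 2.5, 4, 300, 1, 60)],
]


def yearly_figures(alt, interval):
    """Return (cost, downtime hours) per year for one subassembly."""
    _, eta, beta, t_cm, c_cm, t_pm, c_pm = alt
    cycles = HOURS_PER_YEAR / interval             # PM cycles per year
    failures = cycles * (interval / eta) ** beta   # expected failures per year
    return failures * c_cm + cycles * c_pm, failures * t_cm + cycles * t_pm


def evaluate(config, intervals):
    """Step 5 stand-in: (present value of cost, availability), series system."""
    acquisition = sum(alt[0] for alt in config)
    figures = [yearly_figures(a, t) for a, t in zip(config, intervals)]
    yearly_cost = sum(f[0] for f in figures)
    downtime = sum(f[1] for f in figures)
    annuity = sum(1.0 / (1.0 + DISCOUNT_RATE) ** j
                  for j in range(1, LIFE_YEARS + 1))
    return acquisition + yearly_cost * annuity, 1.0 - downtime / HOURS_PER_YEAR


def optimize():
    """Step 6 stand-in: enumerate every configuration and PM schedule."""
    best = None
    for config in product(*ALTERNATIVES):
        for intervals in product(PM_INTERVALS, repeat=len(config)):
            pvc, availability = evaluate(config, intervals)
            if availability < MIN_AVAILABILITY:
                continue  # availability constraint not met
            if best is None or pvc < best[0]:
                best = (pvc, availability, config, intervals)
    return best


best = optimize()
print(best)
```

Enumeration is practical here only because the search space is tiny (4 configurations times 27 schedules). With more subassemblies, alternatives and interval choices the space grows multiplicatively, which is the motivation for using a metaheuristic in Step 6.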
Thus the methodology helps in providing customized reliability configuration and customized maintenance schedule for each specific user in the case of special purpose machine tool design. In order to illustrate the above methodology, the following example is provided:
Example 6.1: Consider a machine tool consisting of five subassemblies. The machine operates for 4800 hours per year. Let us consider three alternatives each for assembly 1 and assembly 3, two alternatives for assembly 2, and only one alternative each for assemblies 4 and 5. It is assumed that each of these alternatives for an assembly satisfies the functional requirements of the machine. Apart from acquisition cost, these alternatives also vary in one or more of their failure and repair characteristics like time-to-failure distribution, time required for repair, fixed cost per corrective action, fixed cost per preventive repair, replacement and overhaul, degree of restoration, failure consequences, etc. Table 6.2 shows such alternatives and their failure and repair characteristics.
Table 6.2 Failure and repair characteristics of alternatives (Reprinted with permission from [43]).