English Pages 271 [272] Year 2022
Advances in Reliability Science
Engineering Reliability and
RISK ASSESSMENT
Edited by
HARISH GARG School of Mathematics, Thapar Institute of Engineering and Technology, Deemed to be University, Patiala, Punjab, India
MANGEY RAM Graphic Era Deemed to be University, Dehradun, Uttarakhand, India; Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom 50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States Copyright © 2023 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. ISBN: 978-0-323-91943-2 For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Matthew Deans Acquisitions Editor: Brian Guerin Editorial Project Manager: Clodagh Holland-Borosh Production Project Manager: Kamesh R Cover Designer: Mark Rogers Typeset by TNQ Technologies
Contributors
Daniel O. Aikhuele Faculty of Engineering and Built Environment, University of Johannesburg, Johannesburg, South Africa Joydip Dhar ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India Long Ding State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei, Anhui, China Ahmed El-Awady Department of Environmental Engineering Sciences, Faculty of Graduate Studies and Environmental Research, Ain Shams University, Cairo, Egypt Ameneh Farahani Department of Industrial Engineering, Ooj Institute of Higher Education, Qazvin, Iran Zawar Hussain Department of Statistics, Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur, Pakistan Desmond E. Ighravwe Faculty of Engineering and Built Environment, University of Johannesburg, Johannesburg, South Africa Jie Ji State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei, Anhui, China Komal Department of Mathematics, School of Physical Sciences, Doon University, Dehradun, Uttarakhand, India Girish Kumar Delhi Technological University, Delhi, India Mohit Kumar Department of Basic Sciences, Institute of Infrastructure Technology Research and Management, Ahmedabad, India Ajay Kumar ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India M.K. Loganathan Kaziranga University, Jorhat, Assam, India
Reetu Malhotra Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India A.J. Nakhal Akel Department of Mechanical and Aerospace Engineering, Sapienza University, Rome, Italy Ram Niwas Department of Statistics, Goswami Ganesh Dutta Sanatan Dharma College, Chandigarh, India N. Paltrinieri Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology, Trondheim, Norway R. Patriarca Department of Mechanical and Aerospace Engineering, Sapienza University, Rome, Italy Kumaraswamy Ponnambalam Department of Systems Design Engineering, Faculty of Engineering, University of Waterloo, Waterloo, ON, Canada Vishal Pradhan ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India Anum Shafiq School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China Ahmad Shoja Department of Mathematics and Statistics, Roudehen Branch, Islamic Azad University, Roudehen, Iran Tabassum Naz Sindhu Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan Zhaojun Steven Li Department of Industrial Engineering and Engineering Management, Springfield, MA, United States Hamid Tohidi Department of Industrial Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran Huimin Wang School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China Om Yadav North Dakota University, Fargo, United States
CHAPTER 1
Bayesian networks for failure analysis of complex systems using different data sources
Ahmed El-Awady¹ and Kumaraswamy Ponnambalam²
¹ Department of Environmental Engineering Sciences, Faculty of Graduate Studies and Environmental Research, Ain Shams University, Cairo, Egypt; ² Department of Systems Design Engineering, Faculty of Engineering, University of Waterloo, Waterloo, ON, Canada
1. Introduction
Failure analysis is an important and challenging aspect of the study of complex systems. A system consists of components, subsystems, inputs, and outputs within system boundaries. The inputs provide physical resources and information to the subsystems, which interact with each other to produce outputs. All interactions are assumed to take place within the system boundaries. A complex system can be defined as a system structure composed of a usually large number of components that have complex interactions [1]. Any failure in performing the required interactions among system components, or any failure to obtain the expected output, is considered to contribute to system failure [2]. Thus, analyzing a system together with its components is a crucial step in determining the difficulties and complexities that the system will experience at any stage. In the real world, however, the performance of both inputs and subsystems is affected by probabilistic uncertainty, and hence a failure comes with an associated probability. The main goal of this chapter is to evaluate the probability of failure of complex systems, while finding the failure causes, using Bayesian networks (BNs). For any given system with its inputs and subsystems, probabilistic failure analysis depends on finding the probability of not obtaining the required or estimated output of that system. The required output of the BN analysis may be the effect produced by certain causes (i.e., predictive reasoning) or the cause responsible for certain results and effects (i.e., diagnostic reasoning), both expressed in probabilistic measures. Thus, determining the cause-effect relation is an important first step in probabilistic failure analysis; it allows a better understanding of how to enhance system reliability and supports decisions for mitigating negative effects or better managing their causes.
In this chapter, systems are represented as graphs using BNs, which allow the marginal, conditional, and joint probability measures affecting system components to be represented. BN analysis provides the ability to decompose a large system into a manageable number of subsystems, analyze each on its own, and in the end aggregate these results to provide the results for the whole system. Representing engineering systems using BNs is affected by multiple factors that influence the probabilistic quantification process. The aim of this chapter is to present the different approaches that facilitate the probabilistic quantification of BNs and hence facilitate the prediction of system failures.
Engineering Reliability and Risk Assessment, ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00001-0
2. Risk, reliability, and uncertainty
There are many definitions of risk. Two common definitions are (i) the probability of failure and (ii) the product of the probability of an undesired outcome (failure) and the consequences of that outcome [3–12]. The development of risk estimates, or the determination of risks in a given context, is called risk analysis, while risk assessment is the process of evaluating the risks and determining the best course of action. Uncertainty of outcomes is a common concept in all definitions of risk. Uncertainty may be defined as the state of having limited knowledge about existing events and future outcomes, or an imperfect ability to assign a character state to a process, which forms a source of doubt [3]. Thus, uncertainty is an intrinsic property of risk and is present in all aspects of risk management, including risk analysis and risk assessment [4]. Generally, risk analysis is a systematic tool that facilitates the identification of the weak elements of a complex system and the hazards that mainly contribute to the risk. In Ref. [13], hazard analysis is described as “investigating an accident before it occurs,” with the aim of identifying potential causes of accidents that can lead to losses. According to Refs. [11,12,14,15], availability is the ability of a component or system to function at a specified interval of time. This is closely related to reliability, which describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability engineering is a subdiscipline of systems engineering that emphasizes dependability in the lifecycle management of a product. In reliability engineering programs, where reliability plays a key role in the cost-effectiveness of systems, testability, maintainability, and maintenance are all part of the program.
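To make definition (ii) concrete, the following minimal sketch ranks a few hazards by the product of failure probability and consequence. The hazard names, probabilities, and consequence values are invented for illustration and do not come from the chapter or its references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    p_failure: float    # probability of the undesired outcome (assumed value)
    consequence: float  # loss if it occurs, arbitrary cost units (assumed value)

    @property
    def risk(self) -> float:
        # Definition (ii): probability of failure times its consequence.
        return self.p_failure * self.consequence


def rank_by_risk(hazards):
    """Sort hazards from the largest to the smallest risk contribution."""
    return sorted(hazards, key=lambda h: h.risk, reverse=True)


# Illustrative values only.
hazards = [
    Hazard("gate jam", 0.02, 500.0),
    Hazard("sensor drift", 0.20, 10.0),
    Hazard("structural crack", 0.001, 20000.0),
]

ranked = rank_by_risk(hazards)
total_risk = sum(h.risk for h in hazards)

for h in ranked:
    print(f"{h.name}: risk = {h.risk:.2f} ({h.risk / total_risk:.0%} of total)")
```

With these assumed numbers, the least likely hazard contributes the most risk because of its large consequence, whereas under definition (i), which looks at probability alone, the most frequent hazard would rank first.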
In reliability engineering, the estimation, prevention, and management of high levels of lifetime engineering uncertainty and risks of failure are common concerns. Theoretically, reliability is defined as the probability of success (probability of success = 1 − probability of failure). Probabilistic stability analysis is sometimes referred to as “reliability analysis.” In failure probability estimation, reliability analysis cannot be used on its own; its results must be moderated by engineering judgment, with appropriate models serving as useful tools for estimating conditional probabilities. In some of the literature [6,12,16,17], uncertainty, a common concept for expressing inaccuracies, means that a number of different values can exist for a quantity, while risk means the possibility of loss as a result of uncertainties. Accordingly, any uncertain variable that can take various values over a range should be accompanied by an uncertainty analysis, which is used to assess output uncertainty and to
identify the most efficient ways to reduce that uncertainty according to the contributing variables. Hence, in statistical terms, uncertainty can be thought of as a statistical variable and can be calculated using well-verified statistical procedures. In a broad applied statistical sense, the value reported for a measurement describes the central tendency (mean), while the uncertainty describes the standard deviation (deviation from the mean). Ideally, this measure of uncertainty is calculated from repeated trials; in many engineering tests or research experiments, it is instead taken, in whole or in part, from estimates. Thus, risk analysis forces the engineer to confront uncertainties directly and to use best estimates and predictions, especially when making decisions regarding the safety of large technological (complex) systems. Increasingly, such decisions are based on the results of probabilistic risk assessments, which must be accompanied by adequate quantification of the uncertainties. Uncertain parameters can be treated as random variables with appropriate probability distributions. Such distributions are assigned on the basis of available data (which are often scarce) combined with the judgment of experts (which can vary widely), adding another element of uncertainty to the uncertainty analysis itself. This means that there may be different sources of uncertainty, due to the available data, limited knowledge, and subjective judgment; uncertainty here is assumed to be available in probabilistic terms, either from data or from expert judgment or logical inference [3,16]. A complex system has a structure composed of many components with complex interactions and may be represented as a network in which the nodes represent system components and the edges (links) represent their interactions.
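As a small illustration of the statistical view above, the sketch below reports a measurement as a mean with a standard deviation computed from repeated trials, and then treats the quantity as a random variable. The trial values, the threshold, and the choice of a normal distribution are assumptions made only for this example.

```python
import statistics


def summarize(measurements):
    """Return (mean, sample standard deviation) of repeated trials."""
    mean = statistics.mean(measurements)
    std = statistics.stdev(measurements)  # sample (n - 1) standard deviation
    return mean, std


# Hypothetical repeated measurements of the same quantity.
trials = [9.8, 10.1, 10.0, 10.3, 9.8]
mean, std = summarize(trials)

# Assumption: model the uncertain quantity as normally distributed with the
# estimated mean and standard deviation, then ask how likely it is to exceed
# a (hypothetical) critical threshold.
dist = statistics.NormalDist(mu=mean, sigma=std)
threshold = 10.4
p_exceed = 1.0 - dist.cdf(threshold)

print(f"reported value = {mean:.2f} +/- {std:.2f}")
print(f"P(value > {threshold}) = {p_exceed:.3f}")
```

With only five trials the estimated standard deviation is itself uncertain, which mirrors the point in the text that scarce data add uncertainty to the uncertainty analysis.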
Given any complex system that includes inputs, outputs, subsystems, and boundaries, it is reasonable to assume that all of these system components interact with one another, either directly or indirectly. In order to estimate the probability of failure for such a system, the interactions should be represented mathematically, including any probability measures. A full representation of the system facilitates its analysis from the failure point of view. The main obstacle in the failure analysis of complex systems is how to represent the system components and their basic and conditional probabilities. BNs address this problem. A BN provides a graphical representation of any system using basic probabilities for system inputs and conditional probabilities for subsystems and their interactions. One of the main advantages of BNs is the ability to integrate all types of data (social, environmental, technical, etc.) seamlessly in one representation. This is possible because of the probabilistic nature of BNs: everything is represented as a probability.
3. Bayesian networks (BNs)
According to Refs. [18–22], BNs, or belief networks, are probabilistic graphical models used to represent knowledge about an uncertain domain using a combination of principles from graph theory, probability theory, computer science, and statistics. In the graph, the nodes (vertices) represent random variables, and the edges (arcs) represent the
interrelationships (conditional probabilistic dependencies) among these variables, which can be estimated using known statistical and computational methods. BNs can model the quantitative strength of the interrelationships among variables (nodes), allowing their probabilities to be updated using any newly available data and information. A BN is a graphical structure known as a directed acyclic graph (DAG), which is popular in fields such as statistics, machine learning, and artificial intelligence. This means that a set of directed edges connects the set of nodes, where these edges represent direct statistical dependencies among variables, with the constraint that there are no directed cycles (i.e., no node can be reached again by following directed arcs from it). Thus, the definition of parent nodes and child nodes follows directly. A directed edge points from a parent node to a child node, which means that any child node depends on its parent node(s). BNs are mathematically rigorous, understandable, and efficient in computing the joint probability distribution over a set of random variables, and they are useful in risk analysis. Also, because they are required to be DAGs, BNs are more easily solvable than Markov networks. In BNs, there are two main types of reasoning (inference support): (i) predictive reasoning (top-down or forward reasoning), in which evidence nodes are connected through parent nodes (cause to effect), and (ii) diagnostic reasoning (bottom-up or backward reasoning), in which evidence nodes are connected through child nodes (effect to cause). First, the topology of the BN should be specified (structuring of the graphical causality model); then, the interrelationships among connected nodes should be quantified, i.e., the conditional probability distributions are specified using conditional probability tables (CPTs). Also, the basic probabilities of the basic (evidence) nodes should be determined using basic probability tables (BPTs).
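One possible way to hold BPTs and CPTs in code is as plain dictionaries, as in the sketch below. The node names A, B, and C, their states, and all probability values are hypothetical; the helper functions are illustrative and are not part of any BN library.

```python
from itertools import product
from math import prod

# Hypothetical network: root nodes A and B are the parents of child node C.
states = {"A": ("ok", "fail"), "B": ("ok", "fail"), "C": ("ok", "fail")}

# Basic probability tables (BPTs) for the root nodes (assumed values).
bpt = {
    "A": {"ok": 0.9, "fail": 0.1},
    "B": {"ok": 0.8, "fail": 0.2},
}

# Conditional probability table (CPT) for C: one row per combination of
# parent states, each row a distribution over the states of C (assumed values).
parents_of_c = ("A", "B")
cpt_c = {
    ("ok", "ok"): {"ok": 0.99, "fail": 0.01},
    ("ok", "fail"): {"ok": 0.70, "fail": 0.30},
    ("fail", "ok"): {"ok": 0.50, "fail": 0.50},
    ("fail", "fail"): {"ok": 0.10, "fail": 0.90},
}


def expected_rows(parents):
    """Number of CPT rows: the product of the parents' state counts."""
    return prod(len(states[p]) for p in parents)


def is_valid_cpt(cpt, parents, child, tol=1e-9):
    """Check that every parent combination has a row that sums to one."""
    combos = set(product(*(states[p] for p in parents)))
    if set(cpt) != combos:
        return False
    return all(
        set(row) == set(states[child]) and abs(sum(row.values()) - 1.0) <= tol
        for row in cpt.values()
    )


print(expected_rows(parents_of_c), is_valid_cpt(cpt_c, parents_of_c, "C"))
```

Because the row count is a product over the parents, a child with ten binary parents would already need 2^10 = 1024 rows.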
As the number of parent nodes and/or their states increases, the CPTs become very large. Fig. 1.1 introduces the different types of reasoning in BNs.
Figure 1.1 Types of reasoning in BNs [22].
Nodes without any arrows directed into them are called root nodes, and they have prior (basic) probability tables, while nodes without children are called leaf nodes. Nodes with arrows directed into them are called child nodes, while nodes with arrows directed from them are called parent nodes. The prior (basic) probability tables for the root nodes and the conditional probability tables for the parent-child relationships may be obtained from currently available historical databases and can be updated whenever new data or information become available. Generally, quantifying BNs depends on four sources of data: statistical and historical data, judgment based on experience (i.e., expert judgment), existing physical (or empirical) models, and logical inference. Where sufficient data do not exist, either subjective probabilities from experts or detailed simulation models can be used to estimate conditional probabilities, as discussed in detail later in this chapter. The main challenge in BNs is that statistical data must be available in order to estimate probabilities. When the system is fully represented, the failure probability can be estimated using Bayesian equations. An alternative use of the BN is to evaluate the performance of the system components and their interactions to obtain information about the failure causes. If the postfailure analysis stage is taken into consideration, the determination of causes and of mitigation or treatment actions should be considered in order to improve performance and limit overall system failure. An example of a BN with seven variables is shown in Fig. 1.2. The joint probability function of the random variables in a BN can be expressed as shown in Eq. (1.1):
Figure 1.2 An example of BN with seven variables [23].
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\left[x_i \mid Pa(x_i)\right] \tag{1.1}$$
where $P(x_1, \ldots, x_n)$ is the joint probability of the variables $x_1, x_2, \ldots, x_n$, and $Pa(x_i)$ is the parent set of $x_i$. If $x_i$ has no parents, the corresponding factor reduces to the unconditional probability $P(x_i)$. For further illustration of BNs and their applications, including mathematical relations and equations, see Refs. [18–26]. The joint probability can be derived according to Eq. (1.1), with the conditional probabilities quantified using available information (e.g., statistical and historical data, expert judgment, and physical and empirical models) [24]. One feature of BNs is that evidence can be entered as input, so that the probabilities in the network are updated when new information becomes available. This information propagates through the network, and the posterior probabilities can be estimated:
$$\text{Posterior probability} = \frac{\text{Likelihood} \times \text{Prior probability}}{\text{Evidence}}$$
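The factorization in Eq. (1.1) and the posterior update can be sketched on a tiny, hypothetical three-node network in which two root causes x1 and x2 are the parents of an effect x3. All probability values below are assumptions chosen for illustration, and inference is done by brute-force enumeration of the joint distribution, which is practical only for very small networks.

```python
from itertools import product

# Hypothetical BN: x1 and x2 are root nodes, x3 is their child (1 = failure).
p_x1 = {0: 0.9, 1: 0.1}
p_x2 = {0: 0.8, 1: 0.2}
# P(x3 = 1 | x1, x2), assumed values.
p_x3_given = {(0, 0): 0.01, (0, 1): 0.30, (1, 0): 0.50, (1, 1): 0.90}


def joint(x1, x2, x3):
    """Eq. (1.1) for this network: P(x1) * P(x2) * P(x3 | x1, x2)."""
    p3 = p_x3_given[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1.0 - p3)


def probability(query, evidence=None):
    """P(query | evidence), obtained by summing the joint distribution."""
    evidence = evidence or {}

    def total(fixed):
        s = 0.0
        for x1, x2, x3 in product((0, 1), repeat=3):
            assignment = {"x1": x1, "x2": x2, "x3": x3}
            if all(assignment[k] == v for k, v in fixed.items()):
                s += joint(x1, x2, x3)
        return s

    return total({**evidence, **query}) / total(evidence)


# Predictive reasoning (cause to effect): probability of the failure effect.
p_fail = probability({"x3": 1})

# Diagnostic reasoning (effect to cause): posterior of cause x1 given failure.
p_x1_given_fail = probability({"x1": 1}, {"x3": 1})

print(f"P(x3 = 1) = {p_fail:.4f}")
print(f"P(x1 = 1 | x3 = 1) = {p_x1_given_fail:.4f}")
```

In the diagnostic query, the numerator is the likelihood times the prior summed over the unobserved node x2, and the denominator is the evidence P(x3 = 1), matching the posterior expression above. With these assumed numbers, observing the failure raises the probability of cause x1 above its prior of 0.1.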
The concept of posterior probability allows the events with the highest contribution to the undesired (failure) event to be identified, so that the decision-maker can pay more attention to these important factors [27]. In BNs, the main concern is the cause-effect relationships and the derivation of causal inferences from a combination of diverse assumptions. Generally, the use of BNs helps to answer queries even when no experimental data are available. The structure of a relatively complex BN of the IEEE-RTS system, taken from Ref. [28], is shown in Fig. 1.3. This shows how complicated the system interrelationships can be when represented as a BN, especially when a large number of system components/nodes need to be represented. This also reveals that BNs may be used to represent different applications owing to their probabilistic nature. As the main purpose is to predict the probability of failure, the representation should be probabilistic. BNs can represent the different components of the system with their interrelationships and can define the different causes leading to certain effect(s), all in a probabilistic representation. One of the main advantages of BNs is that they can incorporate any kind of data, because all data are represented in terms of their probabilities of occurrence, not their values. To make the probabilistic representation of the BN useful, it is crucial to have a simplified representation that includes all the system variables and factors. BNs, as DAGs, can represent any network both quantitatively (using probability measures) and qualitatively (using a simple representation and dependency structure).
Figure 1.3 The BN structure of the IEEE-RTS system [28].
4. Probabilistic failure analysis of hydropower dams
Dams and reservoir systems are complex civil engineering systems [29]. Studying the safety of dams requires a comprehensive multidisciplinary analysis that considers all the relevant factors and their interrelationships. Refs. [29,30] show how complex the decision-making process is when dealing with the challenging problem of dam safety. Although past cases of dam failure are used to diagnose the causes of failure, this is not enough to predict the failure probabilities of other dams, as every dam is different in terms of its human, environmental, design, and technical influencing factors. Some of the shortcomings associated with traditional risk analysis and assessment approaches are listed in Ref. [30]. Currently available approaches such as Monte Carlo simulation are computationally expensive, as they require detailed, exhaustive system simulations. They are therefore inefficient for complex systems with a large number of elements and highly nonlinear relationships, and any improved practical approach to dam safety analysis and prediction, not just diagnosis, is of significant value. Along these lines, a paradigm shift has been suggested in Refs. [31,32]: dealing with disaster management by quantifying disaster resilience instead of using traditional risk-based techniques. With these new approaches, system analysis will continue to be a primary approach to understanding system behavior under uncertainty and the other measures that need to be taken into consideration. The work presented here is an attempt to address some of these shortcomings, especially by improving the prediction of the probability of system failure using systems analysis while dealing with data scarcity in some engineering applications. Fig. 1.4 shows that dam operation and control system models incorporate multiple interrelated subsystems. High-level decision-makers may have difficulty understanding such representations.
Decision-makers, as humans, focus on “what is important” when facing such complex systems under uncertainty [29]. They need a simplified system representation that includes all the system components, variables, and subsystems while accounting for the different interactions. With such a representation, when they evaluate the risk situation and take a control or mitigation action, they become aware of the situation of the other system components. This kind of system representation should be at a high level, which allows the system to be analyzed as subnetworks with fewer states instead of dealing with all the components of the entire network. If needed, these subnetworks should be able to be disaggregated into their elemental components. BNs have shown potential in this direction.
Figure 1.4 Example of a dam system model [29].
In dam safety studies, three principal approaches are widely used: failure modes and effects analysis (FMEA), event tree analysis (ETA), and fault tree analysis (FTA). Recently, BN analysis has drawn attention as another alternative for dam safety studies. Building on Ref. [19], Ref. [33] attempts to extend the BN technique to the diagnosis of a specific distressed dam. The main objective of Ref. [19] is to develop a probability-based tool using BNs for the diagnosis of embankment dam distresses at the global level based on past dam distress data. Historical data on dam distresses are used to quantify the interrelations among system parameters. The critical step in dealing with dam safety is that the representation must include, besides the technical factors, at least the human factors. One important recent example is what happened at Oroville Dam, California, in February 2017 [34]. The dam suffered from hydrologic, economic, operational, strategic, and tactical problems, which put the dam structure in a critical situation and put the lives of hundreds of thousands of people at risk. It was not a purely technical problem in the dam design; rather, the operation plan and strategy carried out by humans were part of the disaster. To better present
such cases for future prevention, more than just technical factors should be considered in the failure analysis. A simple example of applying the BN representation to the safety of hydropower dams, in order to predict the failure probability, is illustrated in Figs. 1.5 and 1.6. Two dams are connected in series or in parallel, and the inflows of both dams are statistically dependent (they can be independent in other configurations) [35]. The inflow and the reservoir level of each dam affect the spill event (i.e., having more water than the reservoir capacity); if the spillway gates fail to open at the spill event (owing to any electromechanical failure), the dam experiences an overtopping failure, which affects the system failure according to the connection between the two dams (series or parallel). For this kind of system, the basic and conditional relations among system components/nodes are supposed to be obtained from historical and operational data, if available, in order to populate the basic and conditional probability tables (i.e., BPTs and CPTs) of the BN and predict the failure probability. This can be used to prevent any future failure that may affect the dams or the population at risk (PAR) living around them.
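A rough numerical sketch of the two-dam example follows. Every probability in it is invented, and the chapter does not give the CPT of the system-failure node, so the sketch assumes the usual reliability convention: a series system fails if either dam overtops, while a parallel system fails only if both do. Reservoir levels and gate failures are also assumed independent between the dams; only the inflows are dependent.

```python
# All numbers below are hypothetical; none come from the chapter or Ref. [35].
P_LEVEL_HIGH = 0.3   # P(reservoir level is high), same for both dams
P_GATE_FAIL = 0.05   # P(spillway gates fail to open | spill event)

# CPT: P(spill | inflow state, level state).
P_SPILL = {
    ("high", "high"): 0.8,
    ("high", "normal"): 0.2,
    ("normal", "high"): 0.1,
    ("normal", "normal"): 0.0,
}

# Joint table for the dependent inflows of dam 1 and dam 2.
P_INFLOWS = {
    ("high", "high"): 0.15,
    ("high", "normal"): 0.05,
    ("normal", "high"): 0.05,
    ("normal", "normal"): 0.75,
}


def p_overtop(inflow):
    """P(overtopping | inflow): a spill occurs and the gates fail to open."""
    p_spill = (P_LEVEL_HIGH * P_SPILL[(inflow, "high")]
               + (1.0 - P_LEVEL_HIGH) * P_SPILL[(inflow, "normal")])
    return p_spill * P_GATE_FAIL


def system_failure(connection):
    """P(system failure), summing over the joint inflow states."""
    total = 0.0
    for (inflow_1, inflow_2), p_inflows in P_INFLOWS.items():
        q1, q2 = p_overtop(inflow_1), p_overtop(inflow_2)
        if connection == "series":
            p_fail = 1.0 - (1.0 - q1) * (1.0 - q2)  # assumed: either dam fails
        elif connection == "parallel":
            p_fail = q1 * q2                        # assumed: both dams fail
        else:
            raise ValueError(f"unknown connection: {connection}")
        total += p_inflows * p_fail
    return total


print(f"series:   {system_failure('series'):.6f}")
print(f"parallel: {system_failure('parallel'):.8f}")
```

Given the inflow states, the two dams are treated as conditionally independent, which is what allows the per-dam overtopping probabilities to be multiplied inside the loop.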
Figure 1.5 BN of two series-dependent dams/reservoirs.
Figure 1.6 BN of two parallel-dependent dams/reservoirs.
In preexisting systems, such as hydropower dams, operational, historical, and statistical data can be used to quantify the BNs that represent these systems. The question is: can simulation be used as another source of information to quantify the complex systems represented by BNs? Ref. [36] illustrates the reassessment of dam safety events using BNs. The BNs are built on the basis of event tree analysis and supplemented with Monte Carlo simulations. This combination of BN and simulation, with a sufficient number of sample runs, can be an effective tool for narrowing down the range of probabilities and may cover a wide range of uncertain events leading to failures. However, in the approach of Ref. [36], simulation is performed for relatively small networks (not that complex). Moreover, the basic data and statistics are known from the beginning for the system under study. So, if the network is being updated (reassessed) using simulation models, why not provide the network with probability estimates from simulation from the beginning? If sufficient historical and statistical data are available, there should be no need for simulation. Such data are not available in two cases: for future systems (i.e., blueprint projects) and for networks that do not have an efficient monitoring system to save operational data over time. In both cases, relying on logical inference, expert judgment, or empirical models may be misleading and may add more uncertainty, especially in very complex systems. That is why simulation may be integrated as a useful source of data. The challenge, however, is that simulating a very complex system may be computationally expensive for the purpose of identifying the probabilistic interrelationships among the system’s variables and subcomponents. On the other hand, simulation results for decomposed subsystems may provide the BN with probability estimates that are used to estimate the probabilities of whole systems. A methodology proposed in Ref. [35], the Simulation Supported BN (SSBN) for a complex system, is summarized in Fig. 1.7.
Figure 1.7 Proposed methodology of SSBN.
The simulation will be computationally complicated if performed for the entire network, especially in complex networks with a large number of states. For that reason, SSBN proposes decomposing the network into smaller subnetworks (subtrees). Each subnetwork has its own simulation according to the data available, or from random sampling in case only basic data are available (e.g., lower and upper bounds). For every subnetwork, the purpose of the simulation is the probabilistic quantification of that subnetwork’s BN. Thus, probability values are estimated from simulation and fed into the BN of the subnetwork. Once all Bayesian subnetworks are probabilistically quantified with their basic and conditional probability values, the subnetworks are ready to be recombined into one whole-network representation. SSBN makes the complex system more readable for both operators and decision-makers. SSBN overcomes the following obstacles:
• Complex, time-consuming simulation models,
• Complex representation of systems,
• Propagation of uncertainty measures in a complex network, and
• The integration of different sources of data, including simulation.
As an example, Figs. 1.8a and 1.8b show a 23-node BN that illustrates how complex system components can be interrelated. Each node is assumed to include at least two states, which means at least $2^{23}$ states in that system. The more states the nodes have, the more complex the system is. When the analysis is enhanced using the SSBN method, smaller subsystems may be simulated instead of the entire system (Fig. 1.8a). In Fig. 1.8b, the BN is decomposed into six different subentities (subsystems, subnetworks, or subtrees). Each subsystem is less complex than the whole system, which means fewer states.
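The step in which a single subnetwork is quantified from its own simulation can be sketched as follows. This is not the procedure of Ref. [35]; it is a minimal illustration that assumes a toy spill model (a spill occurs when level plus inflow exceeds capacity), uniform random sampling between hypothetical lower and upper bounds, and arbitrary thresholds for discretizing the parents into two states.

```python
import random
from collections import defaultdict

# Hypothetical quantities for one subnetwork: Inflow, Level -> Spill.
CAPACITY = 100.0              # assumed reservoir capacity
INFLOW_BOUNDS = (0.0, 60.0)   # assumed lower and upper bounds
LEVEL_BOUNDS = (20.0, 90.0)   # assumed lower and upper bounds
INFLOW_THRESHOLD = 30.0       # at or above this value, inflow state is "high"
LEVEL_THRESHOLD = 55.0        # at or above this value, level state is "high"


def state(value, threshold):
    """Discretize a continuous value into a two-state BN node."""
    return "high" if value >= threshold else "low"


def simulate_cpt(n_samples=20000, seed=0):
    """Estimate P(spill | inflow state, level state) as relative frequencies."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # parent states -> [spills, samples]
    for _ in range(n_samples):
        inflow = rng.uniform(*INFLOW_BOUNDS)
        level = rng.uniform(*LEVEL_BOUNDS)
        spill = level + inflow > CAPACITY  # toy physical model (assumption)
        key = (state(inflow, INFLOW_THRESHOLD), state(level, LEVEL_THRESHOLD))
        counts[key][0] += spill
        counts[key][1] += 1
    return {key: spills / n for key, (spills, n) in counts.items()}


cpt = simulate_cpt()
for (inflow_state, level_state), p in sorted(cpt.items()):
    print(f"P(spill | inflow={inflow_state}, level={level_state}) = {p:.3f}")
```

The resulting dictionary has one entry per combination of parent states and can be used directly as the CPT column for the spill state; repeating the same procedure for each subnetwork yields the tables that are later recombined into the whole network.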
In general, a system of N nodes/components, two states each (i.e., 2N possible states) can be decomposed to n subsystems and the number of possible states becomes n*(2N/n), which is much less than 2N [i.e., if N ¼ 12, n ¼ 4, 2N ¼ 4096, and n*(2N/n) ¼ 32]. The subsystem components are interrelated,
Figure 1.8a A 23-node BN using Hugin software.
and the subsystems may also be interrelated with each other. By applying the SSBN concept, every subsystem is simulated separately, using the appropriate methods, to get the probability estimates needed for quantifying the BN. To quantify the conditional interactions among the decomposed subsystems, domain knowledge and expert judgment may be required. If this kind of judgment is not available, different assumed scenarios/states can be used with sensitivity analysis instead. That is, the interactions among the decomposed subsystems are quantified by assuming worst-case, best-case, and normal-case scenarios in order to estimate the system failure probabilities in different situations. According to Ref. [34], the American Society of Civil Engineers (ASCE) issued a report titled “Guiding Principles for the Nation’s Critical Infrastructure.” Risk management of critical infrastructure depends on four interrelated guiding principles, identified as follows:
1. To quantify and communicate risk,
2. To employ an integrated systems approach,
Figure 1.8b A 23-node BN decomposed to six subentities ready to be simulated.
3. To exercise leadership, management, and stewardship in decision-making processes,
4. To adapt critical infrastructure in response to dynamic conditions and practices.
We focus mainly on the first two guiding principles, that is, on how to represent all interrelated system components in a combined representation (integrated systems approach) while enhancing the ability to quantify this kind of system representation in order to better predict failures for many purposes (risk management, risk reduction, etc.). An example of a real-world case study is shown in Figs. 1.9a and 1.9b, which present the proposed BN for probabilistic failure analysis of the Mountain Chute hydropower dam in Ontario, Canada, operated by Ontario Power Generation (OPG). In this network, 21 nodes represent the system components used to analyze the failure of this system. These include Probable Maximum Precipitation (PMP), ice loading, earthquake and seismic actions, water pressure, geology and rock type, flood severity, adequacy of discharge capacity, sluice gates, drainage, vegetation control, seepage, and other components. If more than two states are defined for every node, the system becomes a very large and complex network to analyze. The more states the system components have, the more accurate the results are, but the main problem faced is
Figure 1.9a BN for probabilistic failure analysis of Mountain Chute Dam.
Figure 1.9b BN of Mountain Chute Dam decomposed to subentities ready to be simulated.
having limited historical, operational, and monitoring data. Only basic data, such as lower and upper bounds of inflows, outflows, and flooding events, may be available, along with expert opinions, logical inference, and some accepted empirical models of reservoir system analysis. In such cases, mathematical modeling and simulation may be a first step toward probabilistic estimates. The advantage of the decompositional BN approach is clear when dealing with such networks. The decomposition of the system into new entities is shown in Fig. 1.9b, and the SSBN method can be applied as demonstrated in Ref. [35]. Accordingly, simulation results, logic inference, and expert judgment may provide probabilistic data that can be fed into the recomposed network (Fig. 1.9a) to estimate/predict the probability of failure of the entire system. In Refs. [37,38], the Mountain Chute dam’s BN is quantified with logically inferred data, which have higher uncertainty, and the results are then compared with those obtained by quantifying the BN with expert opinions. Here the logically inferred data play the role of the worst-case scenario, and the expert opinions play the role of the best-case scenario for the conditions that are most probable at present. If sufficient data and mathematical models are available for all system variables and their interactions, simulation using the SSBN decompositional approach may be used to enhance the probabilistic results and to generate scenarios for different decisions on the network.
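The state-count argument behind the decomposition, made earlier in this section, can be checked with a few lines of arithmetic. The sketch below compares the joint state space of a network with the summed state spaces of its subnetworks; the function names are hypothetical, the split of the 23-node network into six parts is an assumed example, and interface nodes shared between subnetworks are ignored for simplicity.

```python
def joint_states(num_nodes, states_per_node=2):
    """Number of joint states of a network treated as a single system."""
    return states_per_node ** num_nodes


def decomposed_states(subnetwork_sizes, states_per_node=2):
    """Total number of states when each subnetwork is analyzed on its own."""
    return sum(states_per_node ** size for size in subnetwork_sizes)


# Example from the text: N = 12 nodes split into n = 4 equal subnetworks.
whole_12 = joint_states(12)                    # 2**12
split_12 = decomposed_states([3, 3, 3, 3])     # 4 * 2**(12/4)

# Assumed split of a 23-node network into six subentities (sizes are
# hypothetical; the figures do not state how many nodes each part holds).
whole_23 = joint_states(23)
split_23 = decomposed_states([4, 4, 4, 4, 4, 3])
```

With unequal subnetwork sizes the count is simply the sum of the individual state spaces, which reduces to $n \cdot 2^{N/n}$ when all subnetworks have the same size.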
5. Summary and conclusions
Failure analysis of complex systems with a huge number of interacting components is challenging, especially when probabilistic events affect the system’s performance. A probabilistic multifactor representation that covers different types of factors (i.e., technical and nontechnical) and events may be helpful in performing failure analysis of complex systems. Complex engineering systems can be represented in many ways; however, BNs have shown advantages in representing such systems in terms of defining the interrelationships among system components. The quantification of BNs depends on different sources of data such as logic inference, expert engineering judgment, empirical mathematical models, historical and operational data, and/or detailed simulation. Simulating a complex system as a whole is complex and exhaustive, whereas the aim is to facilitate the prediction of its probability of failure. Thus, the SSBN decompositional approach is illustrated, together with its application to one class of complex systems, hydropower dams. A real-world case study, the Mountain Chute Dam, is also presented to show how the BN can be quantified in different ways. The posterior capability of the BN may also be helpful in identifying the main contributing components to system failure. This may be useful in the design, operation, or decision-making stages.
References
[1] Y. Bar-Yam, General features of complex systems, Encyclopedia of Life Support Systems (EOLSS) 1 (2002).
[2] J. Berk, Systems Failure Analysis, ASM International, 2009, pp. 1-9.
[3] Office of the Gene Technology Regulator, Department of Health and Aging, Australian Government, “Risk Analysis Framework”, January, 2005.
[4] H.P. Berg, Risk management: procedures, methods and experiences, RT&A #2 1 (17) (June, 2010) 79-95.
[5] I. Häring, Introduction to risk analysis and risk management processes, in: Risk Analysis and Management: Engineering Resilience, Springer Science+Business Media, Singapore, 2015, pp. 9-26.
[6] C. Rodger, J. Petch, “Uncertainty & Risk Analysis, A Practical Guide from Business Dynamics,” Pricewaterhouse Coopers, MCS, Business Dynamics, United Kingdom, April, 1999.
[7] G.L.S. Babu, A. Srivastava, Risk and reliability analysis of stability of earthen dams, in: IGC, Guntur, India, 2009.
[8] FAO (Food and Agriculture Organization), Introduction to Risk Analysis - Basic Principles of Risk Assessment, Risk Management and Risk Communication, Yerevan, Armenia, October, 2010.
[9] U.S. Army Corps of Engineers & U.S. Department of the Interior, Bureau of Reclamation, Best Practices in Dam and Levee Safety Risk Analysis (July, 2015).
[10] U.S. Army Corps of Engineers & U.S. Department of the Interior, Bureau of Reclamation, Best Practices in Dam and Levee Safety Risk Analysis (February 26, 2015).
[11] L. King, Reliability of flow-control systems, BC Hydro for Generations (May 12, 2014).
[12] D.N.D. Hartford, G.B. Baecher, Risk and Uncertainty in Dam Safety, Thomas Telford Ltd., 2004.
[13] N.G. Leveson, Engineering a Safer World, Systems Thinking Applied to Safety, The MIT Press, 2011.
[14] S. Bernardi, et al., Dependability analysis techniques, in: Model-Driven Dependability Assessment of Software Systems, Springer-Verlag, 2013, pp. 73-90.
[15] U.S. Army Corps of Engineers & U.S. Department of the Interior, Bureau of Reclamation, Probabilistic Stability Analysis (Reliability Analysis) (March, 2015).
[16] D.C. Cox, P. Baybutt, Methods for uncertainty analysis: a comparative survey, Society for Risk Analysis 1 (4) (1981) 251-258.
[17] S.J. Kline, The purposes of uncertainty analysis, Journal of Fluids Engineering, ASME 107 (June, 1985) 153-160.
[18] M. Peng, L.M. Zhang, Analysis of human risks due to dam-break floods - part 1: a new model based on Bayesian networks, Natural Hazards 64 (2012) 903-933. Springer.
[19] L.M. Zhang, Y. Xu, J.S. Jia, C. Zhao, Diagnosis of embankment dam distresses using Bayesian networks. Part I. Global-level characteristics based on a dam distress database, Canadian Geotechnical Journal 48 (2011) 1630-1644. NRC Research Press.
[20] C.J. Lee, K.J. Lee, Application of Bayesian network to the probabilistic risk assessment of nuclear waste disposal, Reliability Engineering & System Safety 91 (2005) 515-532. Elsevier Ltd.
[21] I. Ben-Gal, Bayesian networks, in: Encyclopedia of Statistics in Quality & Reliability, Wiley & Sons, 2007, pp. 1-6.
[22] K.B. Korb, A.E. Nicholson, Introducing Bayesian networks, in: Bayesian Artificial Intelligence, second ed., Chapman & Hall/CRC Press LLC, 2004, pp. 29-54.
[23] S. Hosseini, K. Barker, Modeling infrastructure resilience using Bayesian networks: a case study of inland waterway ports, Computers and Industrial Engineering 93 (2016) 252-266. Elsevier Ltd.
[24] F. Nadim, Z.Q. Liu, Quantitative risk assessment for earthquake-triggered landslides using Bayesian network, in: Proceedings of the 18th International Conference on Soil Mechanics and Geotechnical Engineering, Paris, 2013.
[25] P. Li, C. Liang, Risk analysis for cascade reservoirs collapse based on Bayesian networks under the combined action of flood and landslide surge, Mathematical Problems in Engineering (2016) 1-13. Hindawi Publishing Corporation.
[26] M. Smith, Dam risk analysis using Bayesian networks, in: Engineering Conferences International, Geohazards, Lillehammer, Norway, 2006.
[27] X. Zheng, Y. Wei, K.L. Xu, H.M. An, Risk assessment of tailings dam break due to overtopping, European Journal of Governmental and Economics 21 (7) (2016) 1641-1649.
[28] T. Daemi, A. Ebrahimi, M. Fotuhi-Firuzabad, Constructing the Bayesian Network for components reliability importance ranking in composite power systems, Electrical Power and Energy Systems 43 (1) (2012) 474-480. Elsevier Ltd.
[29] D.N.D. Hartford, G.B. Baecher, P.A. Zielinski, R.C. Patev, R. Ascila, K. Rytters, Operational Safety of Dams and Reservoirs, Understanding the Reliability of Flow-Control Systems, Institution of Civil Engineering (ICE) Publishing, 2016.
[30] L.M. King, S.P. Simonovic, D.N.D. Hartford, Using system dynamics simulation for assessment of hydropower system safety, Water Resources Research (2017) 7148-7174. American Geophysical Union, no. 53.
[31] S.P. Simonovic, From risk management to quantitative disaster resilience - a paradigm shift, International Journal of Safety and Security Eng 6 (2) (2016) 85-95.
[32] A. Schardong, S.P. Simonovic, H. Tong, Use of quantitative resilience in managing urban infrastructure response to natural hazards, International Journal of Safety and Security Eng 9 (1) (2019) 13-25.
[33] Y. Xu, L.M. Zhang, J.S. Jia, Diagnosis of embankment dam distresses using Bayesian networks, in: Part II. Diagnosis of a Specific Distressed Dam, vol. 48, NRC Research Press, 2011, pp. 1645-1657.
[34] R.G. Bea, T. Johnson, “Root Causes Analyses of the Oroville Dam Gated Spillway Failures and Other Developments,” University of California, Center for Catastrophic Risk Management, Berkeley, July, 2017.
[35] A. El-Awady, K. Ponnambalam, Integration of simulation and Markov chains to support Bayesian networks for probabilistic failure analysis of complex systems, Reliability Engineering and System Safety 211 (2021). Elsevier.
[36] Z.Q. Liu, F. Nadim, U.K. Eidsvig, S. Lacasse, Reassessment of Dam Safety Using Bayesian Network, Geo-Risk, ASCE, 2017, pp. 168-177.
[37] A. Verzobio, A. El-Awady, K. Ponnambalam, J. Quigley, D. Zonta, An elicitation process to quantify Bayesian networks for dam failure analysis, Canadian Journal of Civil Engineering (2020), https://doi.org/10.1139/cjce-2020-0089.
[38] A. El-Awady, Probabilistic Failure Analysis of Complex Systems with Case Studies in Nuclear and Hydropower Industries, UWSpace, University of Waterloo, 2019.
CHAPTER 2
Failure modes and effect analysis model for the reliability and safety evaluation of a pressurized steam trap
Daniel O. Aikhuele and Desmond E. Ighravwe
Faculty of Engineering and Built Environment, University of Johannesburg, Johannesburg, South Africa
1. Introduction
Pressure steam line traps are a vital part of the engine room of a shipping vessel; they control and connect several units in the engine room, including the boiler and turbine systems that power the ship [1]. The steam trap, which ensures that the steam used as the driving force and for heating in the vessel is not wasted, can be described as an automatic valve that filters out condensed steam and noncondensable gases such as air from the steam line without letting the steam escape [2]. Shipping vessels are often operated in extreme conditions. As they are operated under these conditions over time, the highly pressurized steam line trap becomes weaker and prone to failure or damage [3]. Such failure or damage, which can happen well before expected, may occur not only because of the internal pressure and temperature but also because of loads generated by bad design and insufficient maintenance. It can also result from frequent on-off switching of the boiler, as well as misuse of the start diagram. Given the complexity of the designs of the components and materials employed in the engine room, especially the pressure steam line trap, and the high risk involved whenever any of these systems fails onboard ship, there is a need to constantly monitor and maintain the system design architecture [4]. The reliability of the pressure steam line trap should be at its optimal level at all times, to resist extreme stress, pressure, and temperature and to cater to the many other demands of the system. Any reliability failure onboard could leave the system/component unable to perform its intended function, which in some cases could endanger the lives of the ship operators, become a costly and strategic issue, and lead to an expensive redesign of the part altogether; such failures should be avoided at all costs.
One way of achieving that is to address the reliability failure issues before they occur and improve safety onboard ship by developing a systematic framework, model, and strategy to keep the components onboard the ship up and running.
Engineering Reliability and Risk Assessment ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00013-7
© 2023 Elsevier Inc. All rights reserved.
To prevent accidents and risky situations onboard, safety protocols and assessment methods have been proposed and implemented to evaluate, improve, and prevent failure of the different components in the engine room of a shipping vessel. These include hardware such as the ultrasonic testing instrument for identifying pressure steam line trap failures [5]; this is an extensive and expensive method that requires reference calibration for effective use. Analytical methods such as the Decision-Making Trial and Evaluation Laboratory (DEMATEL) model proposed by Bashan and Demirel [6] have also been used to evaluate the critical operational faults in a marine diesel generator. The model contributes to the safety of the ship at sea and helps prevent the effects of hazardous machinery in the engine room; however, uncertainties in the use of the model were not considered. Jeon et al. [7] applied the failure mode and effects analysis (FMEA) evaluation method, which relies on the risk priority number (RPN) approach, to evaluate the safety and reliability of a fuel-cell-based hybrid power system used onboard ship; the RPN approach, however, is not comprehensive enough for complex reliability assessment [8]. Lazakis et al. [9] deployed a predictive maintenance approach for the reliability and criticality analysis of the diesel generator system of a motored cruise ship vessel. The approach uses the failure modes, effects, and criticality analysis (FMECA) method together with the fault tree analysis (FTA) method and reliability importance measures (IMs) to estimate the reliability of the system. Vizentin et al. [10] reviewed the failures associated with the propulsion components/systems onboard a marine cargo vessel. The consequences of such failures, which could include financial losses, delays in cargo delivery, or a threat to the safety of the people and operators onboard, were studied. Furthermore, experimental, analytical, and numerical methods were used to evaluate the failure of the propulsion system in the vessel. Anantharaman et al. [11] presented a model for estimating the reliability of the main propulsion engine of a merchant vessel. The model uses a Markov model for components with a constant failure rate and a Weibull failure model for components that wear out. Similarly, Aikhuele et al. [12] presented a model for detecting failure in a marine diesel engine auxiliary system; the model, which is based on an interval-valued intuitionistic fuzzy TOPSIS model and an improved score function, uses a group of experts’ opinions to detect the root cause of the failure. Although the above models provide a more realistic and practical approach to reliability estimation, they are specific to propulsion systems and diesel engines. A hybrid risk-assessment model that combines the concepts of FMEA with multiple-criteria decision-making (MCDM) theory was proposed by Lo et al. [13] for identifying potential failure modes (FMs) in power supply equipment. The proposed model considers uncertainties in the information provided by analysts drawn from varied backgrounds and uses entropy weights in the risk index.
Jeon et al. [7] proposed an FMEA evaluation method based on an RPN for the shipbuilding process. The method, which is aimed at securing safe operation of the ship, examines suitability from the design stage of the ship by setting up a preliminary review and countermeasures for failures and defects that may occur during the construction process. Ma et al. [14] proposed a quantifiable risk assessment method based on FMEA for improving the accuracy of braking distance measurement and for reducing the impact of human factors on brake risk assessment of escalator systems. The method, which allows the failure modes, failure mechanisms, and consequences of failure of the elevator systems to be analyzed, uses a special risk index assessment system. Aikhuele [15] improved the concepts of FMEA by proposing a multicriteria decision-making model based on a triangular intuitionistic fuzzy Hamming distance and a flexibility function for the identification, analysis, and ranking of the root causes of failure in a slewing gear system. The model used criteria such as Severity, Occurrence, Detection, Maintenance, and Environmental conditions in the evaluation process. The model, however, is specific to slewing gear systems. Similarly, with the help of field experts, Liang et al. [16] developed a dynamic heterogeneous social network consensus reaching model (DHSNCRM) with minimum adjustment distance as a replacement for the conventional FMEA model; it helps evaluators reach consensus on the priority of failure modes. The model, which is a two-stage feedback process, is used for reliability management of medical equipment.
From the foregoing, it is clear that little consideration has been given to evaluating the reliability and safety of the pressure steam line trap onboard ship, and that no organization worldwide has gathered reasonable data on its failure incidents, including those of related components, despite the many risks associated with failure of the system. Each of the models reviewed in this paper for assessing the reliability and safety of the different components in the engine room of a shipping vessel has at least one major shortcoming. These include the inability of the models to address and account for uncertainty in the evaluation process; the exclusion of key reliability parameters such as the failure rate, mean time between failure, and the hazard rate; and the exclusion of the importance of probability, maintainability, and severity. Thus, it is worthwhile to develop a methodology and protocol for evaluating the failure modes, improving the system reliability, preventing expensive steam leakages and water hammers in the steam lines, and enhancing operational safety onboard the ship. In this paper, an analytical model (a hybrid failure modes and effects analysis model), which consists of a multicriteria decision-making tool that uses an Interval-valued Intuitionistic Fuzzy Number (IVIFN) in a complex intuitionistic fuzzy set, is proposed for the reliability and safety evaluation of the pressure steam line trap
onboard ship. The multicriteria decision-making tool is aimed at identifying the most critical failure modes that could cause high risk and reliability concerns in the pressure steam line trap when it is used as a valve in the engine room of the shipping vessel. Hence, the study contributes to the reliability evaluation literature by addressing some of the shortcomings in the reviewed literature. First, uncertainties in the evaluation process are addressed by the use of fuzzy-based numbers. Second, reliability-based criteria such as severity and maintainability are used in the evaluation process, and parameters such as the failure rate and mean time between failure of the pressure steam line trap are integrated into the multicriteria decision-making model, which, to the best of the authors’ knowledge, has not been done in the existing reliability literature. The complex intuitionistic fuzzy set theory is a generalization of the traditional complex fuzzy set (CFS) theory, obtained by introducing the nonmembership term into the definition of the CFS. The novelty of the complex intuitionistic fuzzy set used in this chapter lies in the wider range of values that its membership and nonmembership functions can take: the ranges are extended to the unit circle in the complex plane for both functions, instead of [0, 1] as in the traditional intuitionistic fuzzy functions. Another benefit of the complex intuitionistic fuzzy set is its ability to represent both the presence and the absence of association, interaction, or interconnectedness in one set, instead of two sets as in the conventional CFS. The complex intuitionistic fuzzy set has membership and nonmembership functions that operate in the same set, whereas in the CFS only the membership functions are used to indicate the presence or absence of association, interaction, or interconnectedness when making critical decisions. The rest of the paper is organized as follows. Section 2 introduces the hybrid failure modes and effects analysis model and its algorithm. Section 3 presents the numerical implementation of the model, observations, and discussion. Finally, Section 4 presents concluding remarks and the authors’ future research.
2. Hybrid failure modes and effects analysis model
To evaluate the reliability and safety of the high-pressure steam trap onboard ship, a hybrid failure modes and effects analysis model is proposed, based on a multicriteria decision-making model that uses an interval-valued intuitionistic fuzzy set (IVIFS). The evaluation is achieved by prioritizing the failure modes that could result in the main critical reliability issues in the system. The model, which can be likened to the reliability-centered maintenance concept, is designed
to probe into the likely failures that could occur when the system is applied in the field and how best to prevent them. In forming the hybrid failure modes and effects analysis model, the complex intuitionistic fuzzy set (CIFS) theory, the complex intuitionistic fuzzy Bonferroni mean (CIFBM) aggregation operator, and the fuzzy VIKOR model are integrated and used to prioritize the failure modes of the high-pressure steam trap system. Details of the new hybrid failure modes and effects analysis model are given in the definitions below.
2.1 Complex intuitionistic fuzzy set (CIFS)
The concept of the CIFS presented by Alkouri and Abdul Razak [17] was developed from the traditional intuitionistic fuzzy set (IFS) theory originally proposed by Atanassov [18]. The concept, which is characterized by a multidimensional function of a complex argument plane, is used for addressing uncertainty, falsity, hesitation, and periodicity in a complex decision-making process with conflicting criteria. The multidimensional functions include a real membership function, an imaginary membership function, a real nonmembership function, and an imaginary nonmembership function [19]. The mathematical form of the CIFS is presented in Definition 1.
Definition 1. [17,20]. A CIFS is defined as a multidimensional function of a complex argument plane in the form
$$\varphi = \{\langle x, \mu_\varphi(x), \nu_\varphi(x) \rangle : x \in U\}, \tag{2.1}$$
where $\mu_\varphi, \nu_\varphi : U \to \{b : b \in \mathbb{C}, |b| \le 1\}$ are defined as $\mu_\varphi(x) = r_\varphi(x)\, e^{i 2\pi \omega_{r_\varphi}(x)}$ and $\nu_\varphi(x) = k_\varphi(x)\, e^{i 2\pi \omega_{k_\varphi}(x)}$, respectively. If Eq. (2.1) is bounded by the condition $0 \le r_\varphi(x),\ k_\varphi(x),\ \omega_{r_\varphi}(x),\ \omega_{k_\varphi}(x),\ r_\varphi(x) + k_\varphi(x),\ \omega_{r_\varphi}(x) + \omega_{k_\varphi}(x) \le 1$, then any pair of the CIFS is denoted by $\varphi_i = [(r_i, \omega_{r_i}), (k_i, \omega_{k_i})]$, which is a complex intuitionistic fuzzy number (CIFN) and is used in the system reliability and safety evaluation.
2.2 Complex intuitionistic fuzzy Bonferroni mean (CIFBM) operator
The CIFBM operator, originally developed by Garg and Rani [21], can be described as a mathematical operator used for the fusion and aggregation of experts’ opinions about a system’s reliability and safety. The CIFBM operator, which consists of at least four multidimensional functions, uses the CIFN for its evaluation. Details of the CIFBM aggregation operator are presented in Definition 2.
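To make the CIFN of Definition 1 concrete, the following sketch stores the four terms of a CIFN, checks the bounding condition, and evaluates the complex-valued membership and nonmembership grades. The class name, field names, and tolerance are assumptions introduced here for illustration; they do not come from Refs. [17,20].

```python
import cmath
from dataclasses import dataclass


@dataclass(frozen=True)
class CIFN:
    """Complex intuitionistic fuzzy number [(r, w_r), (k, w_k)]."""
    r: float    # amplitude term of the membership grade
    w_r: float  # phase term of the membership grade
    k: float    # amplitude term of the nonmembership grade
    w_k: float  # phase term of the nonmembership grade

    def is_valid(self, tol=1e-9):
        """Check the bounding condition stated after Eq. (2.1)."""
        parts = (self.r, self.w_r, self.k, self.w_k)
        in_unit_interval = all(-tol <= p <= 1.0 + tol for p in parts)
        return (in_unit_interval
                and self.r + self.k <= 1.0 + tol
                and self.w_r + self.w_k <= 1.0 + tol)

    def membership(self):
        """Complex membership grade r * exp(i * 2 * pi * w_r)."""
        return self.r * cmath.exp(2j * cmath.pi * self.w_r)

    def nonmembership(self):
        """Complex nonmembership grade k * exp(i * 2 * pi * w_k)."""
        return self.k * cmath.exp(2j * cmath.pi * self.w_k)


# Hypothetical CIFN used only to exercise the class.
example = CIFN(r=0.6, w_r=0.5, k=0.3, w_k=0.4)
```

Because the modulus of each grade equals its amplitude term, both grades stay inside the unit circle whenever the amplitude terms lie in [0, 1].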
Definition 2. [21,22]. Let $\varphi_i = [(r_i, \omega_{r_i}), (k_i, \omega_{k_i})]$, $i = 1, 2, 3, \ldots, n$, be a collection of CIFNs. If the CIFBM aggregation operator of dimension $n$ is a mapping $U^n \to U$, then the CIFBM operator is given as
$$
\mathrm{CIFBM}(\varphi_1, \varphi_2, \varphi_3, \ldots, \varphi_n) =
\left[
\begin{array}{l}
\left( \left( 1 - \prod\limits_{\substack{t,s=1 \\ t \neq s}}^{n} \left( 1 - r_t^{p}\, r_s^{q} \right)^{\frac{1}{n(n-1)}} \right)^{\frac{1}{p+q}},\ 
\left( 1 - \prod\limits_{\substack{t,s=1 \\ t \neq s}}^{n} \left( 1 - \omega_{r_t}^{p}\, \omega_{r_s}^{q} \right)^{\frac{1}{n(n-1)}} \right)^{\frac{1}{p+q}} \right), \\[3ex]
\left( 1 - \left( 1 - \prod\limits_{\substack{t,s=1 \\ t \neq s}}^{n} \left( 1 - (1 - k_t)^{p} (1 - k_s)^{q} \right)^{\frac{1}{n(n-1)}} \right)^{\frac{1}{p+q}},\ 
1 - \left( 1 - \prod\limits_{\substack{t,s=1 \\ t \neq s}}^{n} \left( 1 - (1 - \omega_{k_t})^{p} (1 - \omega_{k_s})^{q} \right)^{\frac{1}{n(n-1)}} \right)^{\frac{1}{p+q}} \right)
\end{array}
\right]
\tag{2.2}
$$
where the products run over all ordered pairs $t \neq s$; $p, q > 0$ are real numbers, which in this case represent the failure and hazard rates; $n$ is the number of experts; $r_t, r_s$ are the real membership functions; $\omega_{r_t}, \omega_{r_s}$ the imaginary membership functions; $k_t, k_s$ the real nonmembership functions; and $\omega_{k_t}, \omega_{k_s}$ the imaginary nonmembership functions.
2.3 Complex intuitionistic fuzzy-VIKOR model
The VIseKriterijumska Optimizacija I Kompromisno Resenje model, otherwise called the VIKOR model, is one of the leading multicriteria decision-making tools that have found application in engineering [23]. It selects the most suitable alternative from a finite set when information about the problem is presented in a decision matrix. Such information, which can be in crisp or linguistic form, is used in the analysis and evaluation of engineering applications. Crisp information is easy to analyze and apply, but its accuracy is always in doubt; hence, linguistic information is mostly recommended for engineering management applications. Linguistic information is first analyzed using a fuzzy-based method; in this case, the CIFN is applied to form an intuitionistic fuzzy structure before the VIKOR methodology is applied. For example, a complex intuitionistic fuzzy-VIKOR (CIF-VIKOR) can be applied to expert-opinion-based engineering problems by converting the linguistic responses from the experts to CIFNs, which improves the robustness of the VIKOR model when dealing with vague data. To aggregate the experts’ opinions about the engineering problem, which are usually in the form of an intuitionistic fuzzy decision matrix, an aggregation operator is applied to analyze and evaluate the information in the finite set (alternatives).
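As an illustration of the CIFBM operator in Definition 2, the sketch below aggregates a list of CIFNs written as nested tuples `((r, w_r), (k, w_k))`. It assumes the usual Bonferroni mean convention that the product runs over all ordered pairs with t ≠ s; the function names and the default values p = q = 1 are placeholders chosen for illustration, not values taken from the chapter.

```python
from itertools import permutations
from math import prod


def _bonferroni_term(values, p, q):
    """(1 - prod_{t != s} (1 - a_t^p * a_s^q)^(1/(n(n-1))))^(1/(p+q))."""
    n = len(values)
    exponent = 1.0 / (n * (n - 1))
    product = prod(
        (1.0 - (a_t ** p) * (a_s ** q)) ** exponent
        for a_t, a_s in permutations(values, 2)  # ordered pairs, t != s
    )
    return (1.0 - product) ** (1.0 / (p + q))


def cifbm(cifns, p=1.0, q=1.0):
    """Aggregate CIFNs given as ((r, w_r), (k, w_k)) tuples, following Eq. (2.2)."""
    if len(cifns) < 2:
        raise ValueError("at least two CIFNs are needed")
    r = [c[0][0] for c in cifns]
    w_r = [c[0][1] for c in cifns]
    k = [c[1][0] for c in cifns]
    w_k = [c[1][1] for c in cifns]
    membership = (_bonferroni_term(r, p, q), _bonferroni_term(w_r, p, q))
    # Nonmembership terms are aggregated on their complements (1 - k), (1 - w_k).
    nonmembership = (
        1.0 - _bonferroni_term([1.0 - x for x in k], p, q),
        1.0 - _bonferroni_term([1.0 - x for x in w_k], p, q),
    )
    return membership, nonmembership


# Hypothetical opinions of two experts on one failure mode and criterion.
aggregated = cifbm([((0.4, 0.5), (0.3, 0.2)), ((0.9, 0.5), (0.1, 0.2))])
```

A quick sanity check on such an implementation is idempotency: aggregating identical CIFNs should return the same CIFN.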
In this paper, the CIFBM aggregation operator is applied to aggregate the experts’ opinions and responses as they relate to the evaluation and analysis of the engineering-based problem. Thereafter, the VIKOR methodology is applied by determining the utility and regret values of the alternatives, whose formulas are given
in Eqs. (2.3) and (2.4). These values are then used to evaluate the VIKOR index of the alternatives, given in Eq. (2.7).
$$S_j = \sum_{i=1}^{n} \frac{w_i \left\{ f_i^{*} - f_{ij} \right\}}{f_i^{*} - f_i^{-}} \tag{2.3}$$
$$R_j = \max_{i} \left( \frac{w_i \left\{ f_i^{*} - f_{ij} \right\}}{f_i^{*} - f_i^{-}} \right) \tag{2.4}$$
and,
$$f_i^{*} = \max_{j} \left\{ f_{ij} \right\} \tag{2.5}$$
$$f_i^{-} = \min_{j} \left\{ f_{ij} \right\} \tag{2.6}$$
The VIKOR index of the alternatives is given as
$$Q_j = \frac{v \left\{ S^{*} - S_j \right\}}{S^{*} - S^{-}} + \frac{\{1 - v\} \left\{ R^{*} - R_j \right\}}{R^{*} - R^{-}} \tag{2.7}$$
where $w_i$ is the weight of criterion $i$, $v$ is the contribution factor, given as 0.5, and, following the standard VIKOR formulation, $S^{*} = \min_j S_j$, $S^{-} = \max_j S_j$, $R^{*} = \min_j R_j$, and $R^{-} = \max_j R_j$.
2.4 Algorithm of the hybrid failure modes and effects analysis model
The algorithm of the proposed hybrid failure modes and effects analysis model, which is based on a multicriteria decision-making model that uses the IVIFS, is presented in this section. The model, which integrates the CIFS, the CIFBM aggregation operator, and the fuzzy VIKOR model, presents a practical approach to systems evaluation. Consider the critical failure modes that could cause high risk and reliability concerns in the steam trap as a problem with the alternatives $A = \{A_1, A_2, A_3, \ldots, A_m\}$. To determine the failure mode with the highest risk and reliability concerns, the alternatives are evaluated against criteria denoted by $C = \{C_1, C_2, C_3, \ldots, C_m\}$. The aim is to select the best alternative (failure mode) according to the hybrid failure modes and effects analysis model, which consists of CIFNs and a complex intuitionistic fuzzy decision matrix, when the criteria weight information is unknown. To collect the experts’ preference information for the alternatives with respect to the given criteria, a linguistic scale is introduced, which comprises the linguistic variables presented to the experts and
the CIFNs, which are used for the evaluation proper. The model steps are summarized as follows:
Step 1: Find several experts (E) with matching work experience and expertise in the subject engineering application to evaluate the reliability and safety of the system. In this case, the experts are to evaluate and identify the most critical failure modes that may cause high risk in the pressurized steam trap when used as a valve system in the engine room of the shipping vessel. To ensure the integrity of the evaluation process, the experts should come from both academia and industry.
Step 2: Design a template based on a set of failure modes specific to the subject engineering application, with some reliability-based criteria. Then ask the invited experts to evaluate the failure modes against the criteria using the linguistic scale, which has a CIFN equivalent as shown in Table 2.1.
Step 3: With the results from the experts' evaluations, construct a linguistic intuitionistic fuzzy decision matrix and the complex intuitionistic fuzzy decision matrix with the CIFNs of the individual experts. The mathematical form of the matrix with the CIFNs is given in Eq. (2.8) below.

$$Z = \left[ x_{ij} \right]_{m \times 3} =
\begin{bmatrix}
\left( (r_{11}, \omega_{r_{11}}), (k_{11}, \omega_{k_{11}}) \right) & \left( (r_{12}, \omega_{r_{12}}), (k_{12}, \omega_{k_{12}}) \right) & \left( (r_{13}, \omega_{r_{13}}), (k_{13}, \omega_{k_{13}}) \right) \\
\left( (r_{21}, \omega_{r_{21}}), (k_{21}, \omega_{k_{21}}) \right) & \left( (r_{22}, \omega_{r_{22}}), (k_{22}, \omega_{k_{22}}) \right) & \left( (r_{23}, \omega_{r_{23}}), (k_{23}, \omega_{k_{23}}) \right) \\
\vdots & \vdots & \vdots \\
\left( (r_{m1}, \omega_{r_{m1}}), (k_{m1}, \omega_{k_{m1}}) \right) & \left( (r_{m2}, \omega_{r_{m2}}), (k_{m2}, \omega_{k_{m2}}) \right) & \left( (r_{m3}, \omega_{r_{m3}}), (k_{m3}, \omega_{k_{m3}}) \right)
\end{bmatrix} \tag{2.8}$$
Table 2.1 Linguistic scale and its CIFN for data collection.

| Linguistic terms | CIFN |
|---|---|
| Less Serious/Low/Difficult [(LS), (L), (LD)] | ((0.385, 0.462), (0.354, 0.615)) |
| Serious/High/Difficult [(S), (H), (D)] | ((0.532, 0.615), (0.692, 0.769)) |
| Very Serious/High/Difficult [(VS), (VH), (VD)] | ((0.692, 0.769), (0.846, 0.923)) |
| Extremely Serious/High/Difficult [(ES), (EH), (ED)] | ((0.846, 0.923), (1.000, 1.000)) |
Step 4: From Step 3 above, aggregate the different results presented by the experts using the CIFBM operator. The aggregation result, which is termed the "comprehensive complex intuitionistic fuzzy decision matrix," is represented in the mathematical form $Z_C = \left[ y_{ij} \right]_{m \times n}$.
Step 5: With the comprehensive complex intuitionistic fuzzy decision matrix in place, the intuitionistic entropy method originally proposed in Ye [24] is adopted and applied to determine the weight vector (wv) of the criteria. The mathematical form of the intuitionistic entropy method for the weight vector of the criteria is given as

$$E(A) = \left\{ \sin\!\left( \frac{\pi \left[ 1 + \mu_{AL}(x_i) + pW_{\mu A}(x_i) - \nu_{AL}(x_i) - qW_{\nu A}(x_i) \right]}{4} \right) + \sin\!\left( \frac{\pi \left[ 1 - \mu_{AL}(x_i) - pW_{\mu A}(x_i) + \nu_{AL}(x_i) + qW_{\nu A}(x_i) \right]}{4} \right) - 1 \right\} \times \frac{1}{\sqrt{2} - 1} \tag{2.9}$$

where $p, q \in [0, 1]$ are two fixed values, $W_{\mu A}(x) = \mu_{AU}(x) - \mu_{AL}(x)$, and $W_{\nu A}(x) = \nu_{AU}(x) - \nu_{AL}(x)$.
Step 6: With the above information, determine the utility and regret values of the alternatives using Eqs. (2.3) and (2.4), respectively.
Step 7: With a contributing factor $v = 0.5$, determine the VIKOR index values of the alternatives (Eq. 2.7).
Step 8: Rank the alternatives using the utility, regret, or VIKOR index values.
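The entropy term of Eq. (2.9) can be evaluated numerically as in the minimal Python sketch below. It is an illustration under stated assumptions, not the author's implementation: the function name `ivif_entropy` and its argument names are hypothetical, the W terms are taken to be the widths of the membership and non-membership intervals, and the defaults p = q = 0.5 are arbitrary values inside [0, 1].

```python
import math


def ivif_entropy(mu_l, mu_u, nu_l, nu_u, p=0.5, q=0.5):
    """Entropy term of Eq. (2.9) for one interval-valued intuitionistic fuzzy value.

    mu_l, mu_u: lower and upper membership degrees
    nu_l, nu_u: lower and upper non-membership degrees
    p, q:       fixed values in [0, 1] (hypothetical defaults)
    """
    w_mu = mu_u - mu_l  # assumed: width of the membership interval
    w_nu = nu_u - nu_l  # assumed: width of the non-membership interval
    a = mu_l + p * w_mu - nu_l - q * w_nu
    total = math.sin(math.pi * (1 + a) / 4) + math.sin(math.pi * (1 - a) / 4) - 1
    return total / (math.sqrt(2) - 1)


# Two limiting cases: membership equal to non-membership, and a crisp value.
fully_uncertain = ivif_entropy(0.5, 0.5, 0.5, 0.5)
fully_certain = ivif_entropy(1.0, 1.0, 0.0, 0.0)
print(round(fully_uncertain, 6), round(fully_certain, 6))
```

In this reading, the entropy is largest when membership and non-membership balance out and vanishes for a crisp value, which is the behavior one would expect of an entropy used to weight criteria.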
3. Numerical illustration
In this section, the proposed hybrid failure modes and effects analysis model algorithm is applied to evaluate the reliability and safety of the highly pressurized steam trap by prioritizing the failure modes that could result in some of the most critical reliability issues in the entire system. In prioritizing the failure modes, it is assumed that all information related to the reliability of the steam trap is known, along with the deterministic connection between every decision and its corresponding outcome. It is also assumed that the evaluation of the steam trap is case-specific, although the model could be used for other related engineering problems. The identified failure modes of the steam trap system are described in Table 2.2.

3.1 Results and discussion
In applying the proposed model, several related criteria from the literature have been carefully selected and used for the reliability evaluation of the system, by prioritizing
Table 2.2 Failure modes of the steam trap system.

| No. | Failure modes and their codes | Description |
|---|---|---|
| 1 | Failure due to Blocked/low-temperature traps (FBT) | This type of failure results from a blocked condensate flow, such that drainage is prevented in the system. It could create major safety and operational reliability issues in the system. |
| 2 | Failure due to Coincident steam loss (FCL) | Coincident steam loss is a type of failure that occurs in a steam trap as a result of the failure or malfunctioning of the mechanism of other components of the steam system. It is often referred to as the highest-value steam. |
| 3 | Failure due to open or blow-by condition (FOC) | This is a type of failure in which the trap constantly passes out steam. Although it may not pose a major risk to the operations or to the safety of the plant personnel, if not checked it could have a very large financial impact on the management of the facility. |
| 4 | Failure due to pressure surges (FPS) | This failure occurs as a result of improper piping in the steam system, sudden steam valve openings, or trap misapplications, which cause water/steam hammer and damage the internal steam trap components. |
| 5 | Failure due to feeding water contamination (FWC) | The steam trap could fail easily if the feed water is contaminated. This contamination could result in corrosion, scale formation, and wear in the steam trap, and could lead to leakage in the system. |
| 6 | Failure due to improper steam trap sizing (FIS) | The steam trap is likely to fail when used for an application without proper sizing. It is common knowledge that "undersized steam traps are inclined to fail faster compared to correctly sized steam traps." |
the identified failure modes, to find the modes with the highest reliability risk concern. The selected criteria are as follows:
• Severity (CS) is based on the criticality of the effects of the particular failure on the system.
• Occurrence (CO) is the probability that such a failure will occur in the system.
• Maintainability (CM) is a measure of the ease with which the failure in the system can be remedied through maintenance.
In using the proposed model algorithm presented in the above section, and to ensure integrity in the evaluation process, a group of experts, two from academia and two from industry, was recruited to give their expert opinion on the reliability of the pressurized steam trap system using the failure modes and the selected criteria. The linguistic results of the evaluation, otherwise referred to as the linguistic intuitionistic fuzzy decision matrix, and their CIFN equivalents are used in the subsequent computations. Using the CIFBM operator, the individual opinions of the experts on the reliability of the high-pressure steam trap system are aggregated to obtain a comprehensive complex intuitionistic fuzzy decision matrix. The use of the CIFBM operator in the reliability evaluation allows some key quantitative reliability-based parameters to be included in the steam trap reliability evaluation model. The parameters considered in this study are the failure rate (p) and the mean time between failures (q). The integration of these parameters, which addresses one of the gaps identified in the literature review presented in this study, also provides robustness in the evaluation process. Assume that the steam trap has an average of 2000 operational hours and within this period records a total of seven (7) failures.
The reliability-based parameters, which are captured as p and q in the model, are therefore estimated to have the fixed values 12.6 and 158.7 min, respectively. Using these parameters for the reliability evaluation of the steam trap, the results presented in Table 2.3 were obtained. From Table 2.3, the weights of the selected criteria are determined using the intuitionistic entropy method proposed by Ye [24], and the resulting weights are then used to compute the utility and regret values for the failure modes. Finally, the VIKOR index values of the failure modes are determined. The results and the final ranking order are shown in Table 2.4. From Table 2.4, it is clear that "Failure due to Blocked/low-temperature traps (FBT)" is the failure mode with the highest risk and reliability
Table 2.3 Comprehensive complex intuitionistic fuzzy decision matrix.

| Failure mode | Severity (CS) | Occurrence (CO) | Maintainability (CM) |
|---|---|---|---|
| FBT | [(0.997, 0.998); (0.032, 0.040)] | [(0.996, 0.997); (0.023, 0.029)] | [(0.995, 0.997); (0.019, 0.026)] |
| FCL | [(0.994, 0.996); (0.017, 0.022)] | [(0.993, 0.994); (0.012, 0.017)] | [(0.994, 0.996); (0.017, 0.022)] |
| FOC | [(0.991, 0.993); (0.009, 0.014)] | [(0.995, 0.996); (0.020, 0.024)] | [(0.991, 0.993); (0.009, 0.014)] |
| FPS | [(0.994, 0.995); (0.009, 0.019)] | [(0.992, 0.994); (0.010, 0.016)] | [(0.996, 0.997); (0.023, 0.030)] |
| FWC | [(0.996, 0.997); (0.023, 0.030)] | [(0.995, 0.997); (0.019, 0.024)] | [(0.993, 0.994); (0.012, 0.017)] |
| FIS | [(0.992, 0.994); (0.010, 0.016)] | [(0.994, 0.996); (0.017, 0.022)] | [(0.993, 0.994); (0.011, 0.017)] |
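Table 2.4 Results for the VIKOR index values of the failure modes.

| Failure mode | Si | Ri | Qi | Ranking |
|---|---|---|---|---|
| FBT | 0.896 | 0.306 | 0.000 | 1 |
| FCL | 6.982 | 0.860 | 0.781 | 3 |
| FOC | 8.885 | 0.998 | 1.000 | 6 |
| FPS | 6.875 | 0.998 | 0.874 | 4 |
| FWC | 4.695 | 0.783 | 0.582 | 2 |
| FIS | 8.202 | 0.952 | 0.924 | 5 |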
concerns, while "Failure due to open or blow-by condition (FOC)" has the least risk and reliability concerns among the failure modes evaluated by the experts for the pressurized steam trap system. Given these results, the management of the pressurized steam trap is expected to prioritize, for maintenance and reliability inspection, the failure mode with the highest risk and reliability concerns, as it will likely be the first area to fail or the root cause of failure in the entire system. The table also shows that the criteria used for the reliability evaluation of the pressurized steam trap played a significant role in the prioritization of the failure modes, as the opinions and judgments of the experts are reflected in the ranking of the failure modes. It should also be noted that the quantitative parameters used in the reliability evaluation, namely the failure rate and the mean time between failures, were captured as p and q in the model and given the fixed values of 12.6 and 158.7 min, respectively, and also played a major role in the reliability evaluation process and the ranking of the failure modes.
The ranking results provide a more complete view of the reliability of the high-pressure steam line trap failure modes by looking at them from various scenarios, depending on the interest of the experts and how well each scenario fits the machine. The author believes the proposed model provides a better and novel alternative to existing reliability methods in the literature for the high-pressure steam line trap. To check the feasibility and rationality of the proposed model, the results from the study were compared with the results obtained when a similar model, the Complex Intuitionistic Fuzzy Technique for Order Preference by Similarity to the Ideal Solution (CIFTOPSIS), was used under the same conditions to evaluate the reliability of the pressurized steam trap system. The compared results, presented in Table 2.5, show the same ranking, which indicates that the model is feasible and rational and can be used to solve similar engineering problems under similar assumptions.

3.2 Observation from the model implementation
In the implementation of the model, the following observations have been made, which the author believes will aid the use and management of the model in future reliability evaluations.
• The model involves complex mathematical computation. Consultants, academics, or experts of some sort are therefore the most probable end users, and they may work with companies to aid its usage.
• The model may be ideal for a company that wants to ensure flexibility, adjustability, and agility in the management of its product reliability and decision-making process.
• Finally, the knowledge gained during the evaluation process can be stored in a versioned repository for future corporate reliability decision-making based on the pressurized steam trap evaluated.
Table 2.5 Ranking result for the high-pressure steam line trap failure modes.

| Failure mode | CIFPI solution | CIFNI solution | CCi | CIFTOPSIS ranking | Qi | Proposed model ranking |
|---|---|---|---|---|---|---|
| FBT | 0.00161 | 0.01781 | 0.91731 | 1 | 0.000 | 1 |
| FCL | 0.01231 | 0.00718 | 0.36864 | 3 | 0.781 | 3 |
| FOC | 0.01595 | 0.00349 | 0.17943 | 6 | 1.000 | 6 |
| FPS | 0.01245 | 0.00713 | 0.36412 | 4 | 0.874 | 4 |
| FWC | 0.00889 | 0.01058 | 0.54338 | 2 | 0.582 | 2 |
| FIS | 0.01523 | 0.00424 | 0.21776 | 5 | 0.924 | 5 |
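The rankings in Tables 2.4 and 2.5 can be cross-checked with a short Python sketch. It rests on two assumptions that are not stated explicitly in the text: that the VIKOR index uses the smallest and largest utility and regret values as its reference points, and that the CIFPI and CIFNI columns are distances to the positive and negative ideal solutions, so that the closeness coefficient is the usual TOPSIS ratio. The function names are hypothetical, and the input values are copied from the two tables.

```python
# Values copied from Tables 2.4 and 2.5.
modes = ["FBT", "FCL", "FOC", "FPS", "FWC", "FIS"]
S = [0.896, 6.982, 8.885, 6.875, 4.695, 8.202]               # utility values
R = [0.306, 0.860, 0.998, 0.998, 0.783, 0.952]               # regret values
d_pos = [0.00161, 0.01231, 0.01595, 0.01245, 0.00889, 0.01523]  # CIFPI column
d_neg = [0.01781, 0.00718, 0.00349, 0.00713, 0.01058, 0.00424]  # CIFNI column
cc_table = [0.91731, 0.36864, 0.17943, 0.36412, 0.54338, 0.21776]


def vikor_index(s_values, r_values, v=0.5):
    """VIKOR index with min/max of S and R as reference points (assumed)."""
    s_best, s_worst = min(s_values), max(s_values)
    r_best, r_worst = min(r_values), max(r_values)
    return [
        v * (s - s_best) / (s_worst - s_best)
        + (1 - v) * (r - r_best) / (r_worst - r_best)
        for s, r in zip(s_values, r_values)
    ]


def closeness(dist_pos, dist_neg):
    """TOPSIS closeness coefficient: distance to negative ideal over total distance."""
    return [dn / (dp + dn) for dp, dn in zip(dist_pos, dist_neg)]


def rank(values, reverse=False):
    """Rank 1 goes to the smallest value (or the largest when reverse=True)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


Q = vikor_index(S, R)
CC = closeness(d_pos, d_neg)
vikor_rank = rank(Q)                 # smaller Q means higher priority
topsis_rank = rank(CC, reverse=True)  # larger CC means higher priority

for name, q, cc, rv, rt in zip(modes, Q, CC, vikor_rank, topsis_rank):
    print(f"{name}: Q={q:.3f} rank={rv}  CC={cc:.5f} rank={rt}")
```

Under these assumptions, the recomputed Q values agree with the tabulated ones to three decimals, the recomputed closeness coefficients agree to about three decimals (the tabulated distances are themselves rounded), and both methods give the same ranking order.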
4. Conclusions
A hybrid failure modes and effects analysis model, which consists of a multicriteria decision-making tool that uses an IVIFN, has been proposed in this paper for the reliability and safety evaluation of a high-pressure steam line trap onboard a ship. The multicriteria decision-making tool is aimed at identifying the most critical failure modes that could cause high risk and reliability concerns in the pressure steam line trap when used as a valve in the engine room of the shipping vessel. The study contributes to the reliability evaluation literature by addressing shortcomings in the reviewed texts, including the need to account for uncertainties in the evaluation process, which has been achieved through the use of fuzzy-based numbers in the model. It also uses criteria such as severity and maintainability in the evaluation process and integrates parameters such as the failure rate and mean time between failures of the pressure steam line trap into the multicriteria decision-making model, which, to the best of the author's knowledge, has not been done or addressed in the existing literature. Results from the evaluation show that "Blocked/low-temperature traps (FBT)" is the failure mode with the highest risk and reliability concerns, while "Blow-by condition (FOC)" has the least risk and reliability concerns among the failure modes considered by the experts for the high-pressure steam trap system. As a management implication, the management of the high-pressure steam trap is expected to prioritize, for maintenance and reliability inspection, the failure mode with the highest risk and reliability concerns, as it will likely be the first area to fail or the root cause of failure in the entire system. Future work will consider the cost implications of failure due to the different failure modes, as well as their possible optimization.
References
[1] C. Shen, SAGD for heavy oil recovery, in: Enhanced Oil Recovery Field Case Studies, Elsevier Inc, 2013, pp. 413–445.
[2] K. Paffel, Removal of non-condensable gases, air is critical in a steam system, Plant Engineering 1–2 (2011).
[3] J.R. Risko, "My Steam Trap Is Good - Why Doesn't it Work?" Chemical Engineering Progress, American Institute of Chemical Engineers (AIChE), 2015, pp. 27–34.
[4] T. Bass, Steam Trap Monitoring Enables Predictive Maintenance, vols 1–2, InTech: ISA's Flagship Publications, 2018.
[5] J.R. Risko, Understanding steam traps, Chemical Engineering Progress 107 (2) (2011) 21–26.
[6] V. Bashan, H. Demirel, Evaluation of critical operational faults of marine diesel generator engines by using DEMATEL method, Journal of ETA Maritime Science 6 (2) (2018) 119–128, https://doi.org/10.5505/jems.2018.24865.
[7] H. Jeon, K. Park, J. Kim, Comparison and verification of reliability assessment techniques for fuel cell-based hybrid power system for ships, Journal of Marine Science and Engineering 8 (2) (2020) 74, https://doi.org/10.3390/jmse8020074.
[8] H.-C. Liu, L. Liu, N. Liu, Risk evaluation approaches in failure mode and effects analysis: a literature review, Expert Systems with Applications 40 (2) (2013) 828–838, https://doi.org/10.1016/j.eswa.2012.08.010.
[9] I. Lazakis, T. Osman, S. Aksu, Improving ship maintenance: a criticality and reliability approach, in: 11th International Symposium on Practical Design of Ships and Other Floating Structures, PRADS 2010 vol 2, 2010, pp. 1411–1420.
[10] G. Vizentin, G. Vukelic, L. Murawski, N. Recho, J. Orovic, Marine propulsion system failures - a review, Journal of Marine Science and Engineering 8 (9) (2020) 1–14, https://doi.org/10.3390/jmse8090662.
[11] M. Anantharaman, F. Khan, V. Garaniya, B. Lewarn, Reliability assessment of main engine subsystems considering turbocharger failure as a case study, TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation 12 (2) (2018) 271–276, https://doi.org/10.12716/1001.12.02.06.
[12] D.O. Aikhuele, F.M. Turan, S.M. Odofin, R.H. Ansah, Interval-valued intuitionistic fuzzy TOPSIS-based model for troubleshooting marine diesel engine auxiliary system, The International Journal of Microcircuits and Electronic Packaging A 159 (2016) 1–8, https://doi.org/10.3940/rina.ijme.2016.al.402.
[13] H.-W. Lo, J. Liou, J.-J. Yang, C.-N. Huang, Y.-H. Lu, An extended FMEA model for exploring the potential failure modes: a case study of a steam turbine for a nuclear power plant, Complexity 2021 (2021) 1–13.
[14] H. Ma, Z. Li, Z. Wang, R. Feng, G. Li, J. Xu, Research on measuring device and quantifiable risk assessment method based on FMEA of escalator brake, Advances in Mechanical Engineering 13 (3) (2021) 1–17, https://doi.org/10.1177/16878140211001963.
[15] D.O. Aikhuele, Intuitionistic fuzzy hamming distance model for failure detection in a slewing gear system, International Journal of System Assurance Engineering and Management 12 (5) (2021) 884–894, https://doi.org/10.1007/s13198-021-01132-9.
[16] D. Liang, F. Li, Z. Xu, A group-based FMEA approach with dynamic heterogeneous social network consensus reaching model for uncertain reliability assessment, Journal of the Operational Research Society (2022), https://doi.org/10.1080/01605682.2021.2020694.
[17] A.S. Alkouri, S. Abdul Razak, Complex intuitionistic fuzzy sets, AIP Conference Proceedings 1482 (2012) 464–470, https://doi.org/10.1063/1.4757515.
[18] K.T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets and Systems 20 (1) (1986) 87–96, https://doi.org/10.1016/S0165-0114(86)80034-3.
[19] A. Anwar, On certain products of complex intuitionistic fuzzy graphs, Journal of Function Spaces 2021 (515646) (2021) 1–9.
[20] Z. Ali, T. Mahmood, M. Aslam, R. Chinram, Another view of complex intuitionistic fuzzy soft sets based on prioritized aggregation operators and their applications to multiattribute decision making, Mathematics 9 (16) (2021), https://doi.org/10.3390/math9161922.
[21] H. Garg, D. Rani, Multi-criteria decision making method based on Bonferroni mean aggregation operators of complex intuitionistic fuzzy numbers, Journal of Industrial and Management Optimization 13 (5) (2017), https://doi.org/10.3934/jimo.2020069, 0–0.
[22] D.O. Aikhuele, G. Ijele-Aikhuele, Development of a hybrid reliability-centered model for escalator systems, International Journal of System Assurance Engineering and Management (2021) 1–11, https://doi.org/10.1007/s13198-021-01337-y.
[23] D.O. Aikhuele, D.E. Ighravwe, O. Onyisi, I.O. Fayomi, E.B. Omoniyi, Development of a TQM-based framework for product infant failure assessment, Covenant Journal of Engineering Technology 4 (1) (2020) 1–15, https://doi.org/10.47231/haur1032.
[24] J. Ye, Multicriteria fuzzy decision-making method using entropy weights-based correlation coefficients of interval-valued intuitionistic fuzzy sets, Applied Mathematical Modelling 34 (12) (2010) 3864–3870, https://doi.org/10.1016/j.apm.2010.03.025.
CHAPTER 3
Reliability and availability analysis of a standby system with activation time and varying demand
Reetu Malhotra
Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
Nomenclature
**/* Laplace–Stieltjes/Laplace transform symbol
Csa Cold standby unit under activation
D Demand
FR The failed unit under repair from an earlier state
Fr/Fw The failed unit under repair/waiting for repair
Operative/failed state
Regeneration point
Downstate
/© Stieltjes/Laplace convolution symbol
$G(t)$, $g(t)$ c.d.f. and p.d.f. of repair time
n1/n2 Production by one unit/two units
Op/Cs Operative/cold standby unit
$Q_{ij}(t)$, $q_{ij}(t)$ c.d.f. and p.d.f. of first passage time from regenerative state "i" to a regenerative/failed state "j" without visiting other regenerative states in (0, t]
$\lambda$ The constant failure rate of the operative unit
$\beta$ Activation rate of the cold standby unit
$q_{ij}^{(k)}(t)$, $Q_{ij}^{(k)}(t)$ p.d.f. and c.d.f. of first passage time from regenerative state "i" to a regenerative/failed state "j" visiting state "k" once in (0, t]
$p_{ij}$; $p_{ij}^{(k)}$ Probability of transition from regenerative state "i" to regenerative state "j" without/with visiting state "k" once in (0, t]
′ Derivative symbol
$\gamma_{11}$/$\gamma_{22}$ Rate of increase/decrease of demand when demand is at least equal to the production made by one unit and less than production by two units ($n_1 \le d < n_2$)
$\gamma_{12}$ Rate of decrease in demand when demand is less than production by one unit ($d < n_1$)
$\gamma_{21}$ Rate of further increase of demand when demand is at least equal to production by two units ($d \ge n_2$)
Engineering Reliability and Risk Assessment. ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00004-6
© 2023 Elsevier Inc. All rights reserved.

1. Introduction
Real-world industrial systems are becoming increasingly complicated as the demand for automation peaks. Examples of such systems include sustainable energy systems, transportation systems, wireless communication networks, robotic systems, control systems, power systems, mission tool area, and manufacturing plants [1–33]. Although technology improves every day, failure of devices and structures is unavoidable, and such failures may lead to heavy losses. One way to minimize these losses is to use redundancy [6]. Redundant systems are used in a variety of industrial and other setups. These systems are expensive and require high availability and reliability together with high productivity, which redundancy helps to achieve. In this concept, distinct devices operate online with a supplementary unit kept as a redundant (standby) unit, which takes over the operation whenever required. The three types of redundant units [34] are hot, warm, and cold. In hot standby, the redundant unit works simultaneously with the primary system. In warm standby, the process already resides in memory, and the redundant unit needs only milliseconds to operate. A cold redundant unit is an off-line, fully unloaded unit that cannot fail. Extensive research [35] has been carried out on reliability issues for redundant systems by numerous authors; for example, [36] and [37] evaluated the reliability and profit of a standby system using the master–slave concept and a repair facility. [38] developed a reliability model of a cold standby system with imperfect backup. [4,8–10,39,11,12,40] discussed several reliability problems of standby systems, dynamic parameter tuning, replacement policy, optimization, and inspections. [41,42] discussed modeling for cell migration and optimal treatment decisions. None of these existing works handles situations where the demand for the units produced [43] is not fixed, so there is a need to examine reliability measures under varying demand. [25,44–46] discussed stochastic models with variation in demand, but the concept of activation time with varying demand has not been considered so far. Malhotra and Taneja [47] also compared models with varying demand.
This chapter focuses on a two-unit cold redundant system by introducing activation time with variation in demand. As the proposed model is general, it can be used by any company where the same concept applies. It helps the system analyst or user to check the availability of the system: the higher the availability, the more profitable the system. Data have been collected from the "General Cable Energy System" to validate the results. The chapter is organized as follows: Section 2 deals with the assumptions of the proposed model. Section 3 includes nomenclature. A description of the model is given in Section 4, where the transition probabilities, steady-state probabilities, and mean sojourn times are also calculated. MTSF and different cases of availability depending upon demand are discussed in Section 5. Graphical interpretations and conclusions are drawn in Section 6.
2. Assumptions for proposed model
1. There are two similar units in a system.
2. Demand can be decreased only when two units are operative.
3. There is one repairman who is always available.
4. Each unit is as good as new after repair.
5. No activation time is required to put a repaired unit back into operation.
6. The standby unit gets priority for activation over the failed unit in the model.
7. During activation time, no other event takes place.
8. Random variables are taken independently.
9. Repair and failure times follow an exponential distribution with different rates.
3. Proposed system (model)
Fig. 3.1 illustrates the different states of a two-unit cold standby system (model). At the beginning (state S0), one unit is operative and the other is in cold standby. The epochs of entry into states S10, S7, S6, S5, S4, S3, S2, S1, and S0 are regeneration points. Thus, the states Si are regenerative for i = 0,1,2,3,4,5,6,7,10 and non-regenerative for i = 8,9,11,12,13. Further, the system is down in S2 and S4; these are the down states. The system is not working in the failed states S9, S12, and S13. To analyze the proposed semi-Markov process (SMP) model, the following quantities are required:
(i) Transition probabilities between different states 'i' and 'j'
(ii) Steady-state probabilities between different states 'i' and 'j'
(iii) Mean sojourn times ($\mu_i$) in state 'i'
The various states and the possible transitions from one state to another are described in Table 3.1.
Figure 3.1 State transition diagram.
Table 3.1 Description of states.

| State | Description of states |
|---|---|
| S0 = (Op, Cs) | Demand is less than the production by one unit (d < n1) |
| S1 = (Op, Csa) | Demand is at least equal to the production made by one unit and less than production by two units (n1 ≤ d < n2) |
| S2 = (Fw, Csa) | Demand is less than the production by one unit (d < n1) |
| S3 = (Op, Op) | Demand is at least equal to the production made by one unit and less than production by two units (n1 ≤ d < n2) |
| S4 = (Fw, Csa) | Demand is at least equal to the production made by one unit and less than production by two units (n1 ≤ d < n2) |
| S5 = (Fr, Op) | Demand is less than the production by one unit (d < n1) |
| S6 = (Op, Op) | Demand is at least equal to production by two units (d ≥ n2) |
| S7 = (Fr, Op) | Demand is at least equal to the production made by one unit and less than production by two units (n1 ≤ d < n2) |
| S8 = (FR, Op) | Demand is at least equal to the production made by one unit and less than production by two units (n1 ≤ d < n2) |
| S9 = (FR, Fw) | Demand is less than the production by one unit (d < n1) |
| S10 = (Fr, Op) | Demand is at least equal to production by two units (d ≥ n2) |
| S11 = (FR, Op) | Demand is at least equal to production by two units (d ≥ n2) |
| S12 = (FR, Fw) | Demand is at least equal to the production made by one unit and less than production by two units (n1 ≤ d < n2) |
| S13 = (FR, Fw) | Demand is at least equal to production by two units (d ≥ n2) |
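The state classification given in Section 3 and Table 3.1 can be captured in a small lookup structure, which is convenient when the state sets are reused in later computations. The Python sketch below is illustrative only: the names `REGENERATIVE`, `DEMAND_BAND`, `status`, and `describe` are hypothetical, and labeling every state that is neither down nor failed as "operative" is an assumption rather than a statement from the chapter.

```python
# State indices refer to S0 ... S13 as listed in Section 3 and Table 3.1.
ALL_STATES = set(range(14))
REGENERATIVE = {0, 1, 2, 3, 4, 5, 6, 7, 10}
NON_REGENERATIVE = {8, 9, 11, 12, 13}
DOWN = {2, 4}          # standby under activation while the other unit waits
FAILED = {9, 12, 13}   # both units failed

# Demand band of each state, following Table 3.1.
DEMAND_BAND = {
    0: "d < n1", 1: "n1 <= d < n2", 2: "d < n1", 3: "n1 <= d < n2",
    4: "n1 <= d < n2", 5: "d < n1", 6: "d >= n2", 7: "n1 <= d < n2",
    8: "n1 <= d < n2", 9: "d < n1", 10: "d >= n2", 11: "d >= n2",
    12: "n1 <= d < n2", 13: "d >= n2",
}


def status(state):
    """Return 'failed', 'down', or 'operative' for a state index."""
    if state in FAILED:
        return "failed"
    if state in DOWN:
        return "down"
    return "operative"  # assumption: every remaining state has a working unit


def describe(state):
    """One-line summary of a state: type, status, and demand band."""
    kind = "regenerative" if state in REGENERATIVE else "non-regenerative"
    return f"S{state}: {kind}, {status(state)}, {DEMAND_BAND[state]}"


for s in sorted(ALL_STATES):
    print(describe(s))
```

A quick consistency check on such a structure is that the regenerative and non-regenerative sets are disjoint and together cover all fourteen states.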
4. Description of model
Initially, in a two-unit cold redundant system, one unit is operative and the other is redundant (state S0). Demand is assumed to be less than the production by a single unit. The system moves to the down state S2 if the operative unit fails and the redundant unit needs to be activated. It may instead go to state S1 if demand increases to at least the production of a single unit but less than that produced by two units. After activation, the system moves from state S2 to state S5, where the failed unit is under repair and the redundant unit starts operating. From S5, the system goes to state S8 or S0 depending on whether the demand increases or decreases. If the other unit also fails, the system moves to state S9. From there, if the unit gets repaired, the system moves back to state S5; if demand increases, it moves to state S12. If demand increases further, the system goes to state S13. Similarly, the system moves from state S1 to state S3, where both units are operating. If the demand increases, the system moves to state S6; if one of the units fails, the system may move to state S7. If the unit gets repaired, the system moves back to state S3; otherwise, it moves to state S11 if demand increases further.
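4.1 Mean sojourn times and transition probabilities
The transition probabilities are time-dependent and describe the movement from one state to another in a single step. They are given as follows:

$q_{01}(t) = \gamma_{11} e^{-(\gamma_{11}+\lambda)t}$
$q_{02}(t) = \lambda e^{-(\gamma_{11}+\lambda)t}$
$q_{13}(t) = \beta e^{-(\beta+\lambda)t}$
$q_{14}(t) = \lambda e^{-(\beta+\lambda)t}$
$q_{25}(t) = \beta e^{-\beta t}$
$q_{30}(t) = \gamma_{12} e^{-(\gamma_{12}+\gamma_{21}+\lambda)t}$
$q_{36}(t) = \gamma_{21} e^{-(\gamma_{12}+\gamma_{21}+\lambda)t}$
$q_{37}(t) = \lambda e^{-(\gamma_{12}+\gamma_{21}+\lambda)t}$
$q_{47}(t) = \beta e^{-\beta t}$
$q_{50}(t) = g(t)\, e^{-(\gamma_{11}+\lambda)t}$
$q_{53}^{(8)}(t) = g(t) \left[ \gamma_{11} e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, e^{-(\gamma_{21}+\lambda)t} \right]$
$q_{55}^{(9)}(t) = g(t) \left[ \lambda e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, e^{-\gamma_{11} t} \right]$
$q_{5,6}^{(8,11)}(t) = g(t) \left[ \gamma_{11} e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, \gamma_{21} e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, e^{-\lambda t} \right]$
$q_{5,7}^{(8,12)}(t) = g(t) \left[ \gamma_{11} e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, \lambda e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, e^{-\gamma_{21} t} \right]$
$q_{5,7}^{(9,12)}(t) = g(t) \left[ \lambda e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, \gamma_{11} e^{-\gamma_{11} t} \,\text{©}\, e^{-\gamma_{21} t} \right]$
$q_{5,10}^{(8,11,13)}(t) = g(t) \left[ \gamma_{11} e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, \gamma_{21} e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, \lambda e^{-\lambda t} \,\text{©}\, 1 \right]$
$q_{5,10}^{(8,12,13)}(t) = g(t) \left[ \gamma_{11} e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, \lambda e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, \gamma_{21} e^{-\gamma_{21} t} \,\text{©}\, 1 \right]$
$q_{5,10}^{(9,12,13)}(t) = g(t) \left[ \lambda e^{-(\gamma_{11}+\lambda)t} \,\text{©}\, \gamma_{11} e^{-\gamma_{11} t} \,\text{©}\, \gamma_{21} e^{-\gamma_{21} t} \,\text{©}\, 1 \right]$
$q_{63}(t) = \gamma_{22} e^{-(\gamma_{22}+\lambda)t}$
$q_{6,10}(t) = \lambda e^{-(\gamma_{22}+\lambda)t}$
$q_{73}(t) = g(t)\, e^{-(\gamma_{21}+\lambda)t}$
$q_{7,12}(t) = G(t)\, \lambda e^{-(\gamma_{21}+\lambda)t}$
$q_{7,6}^{(11)}(t) = g(t) \left[ \gamma_{21} e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, e^{-\lambda t} \right]$
$q_{7,7}^{(12)}(t) = g(t) \left[ \lambda e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, e^{-\gamma_{21} t} \right]$
$q_{7,13}^{(11)}(t) = G(t) \left[ \gamma_{21} e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, \lambda e^{-\lambda t} \right]$
$q_{7,10}^{(11,13)}(t) = g(t) \left[ \gamma_{21} e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, \lambda e^{-\lambda t} \,\text{©}\, 1 \right]$
$q_{7,10}^{(12,13)}(t) = g(t) \left[ \lambda e^{-(\gamma_{21}+\lambda)t} \,\text{©}\, \gamma_{21} e^{-\gamma_{21} t} \,\text{©}\, 1 \right]$
$q_{10,6}(t) = g(t)\, e^{-\lambda t}$
$q_{10,10}^{(13)}(t) = g(t) \left[ \lambda e^{-\lambda t} \,\text{©}\, 1 \right]$
$q_{10,13}(t) = G(t)\, \lambda e^{-\lambda t}$

In the long run, the nonzero steady-state probabilities $p_{ij}$ are obtained as $p_{ij} = \lim_{s \to 0} q_{ij}^{*}(s)$:

$p_{01} = \dfrac{\gamma_{11}}{\gamma_{11}+\lambda}$, $p_{02} = \dfrac{\lambda}{\gamma_{11}+\lambda}$, $p_{13} = \dfrac{\beta}{\beta+\lambda}$, $p_{14} = \dfrac{\lambda}{\beta+\lambda}$, $p_{25} = 1$, $p_{30} = \dfrac{\gamma_{12}}{\gamma_{12}+\gamma_{21}+2\lambda}$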
For brevity, only a few probabilities are given; the other probabilities are calculated in a similar way. Using these probabilities, it can be verified that

$\sum_{j} p_{ij}^{(k)} = 1$

The mean sojourn time ($\mu_i$) in a specific state "i" is the expected time the system spends in that state before leaving it. The $\mu_i$ in the regenerative states $S_i$ (i = 10,7,6,5,4,3,2,1,0) are

$\mu_0 = \dfrac{1}{\gamma_{11}+\lambda}$, $\mu_1 = \dfrac{1}{\beta+\lambda}$, $\mu_2 = \dfrac{1}{\beta}$, $\mu_3 = \dfrac{1}{\gamma_{12}+\gamma_{21}+2\lambda}$, $\mu_4 = \dfrac{1}{\beta}$

$\mu_5 = \dfrac{1 - g^{*}(\gamma_{11}+\lambda)}{\gamma_{11}+\lambda}$, $\mu_6 = \dfrac{1}{\gamma_{22}+2\lambda}$, $\mu_7 = \dfrac{1 - g^{*}(\gamma_{21}+\lambda)}{\gamma_{21}+\lambda}$, $\mu_{10} = \dfrac{1 - g^{*}(\lambda)}{\lambda}$
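The unconditional mean time taken by the system to transit to state $j$, measured from the epoch of entry into state $i$, is given by
$$
m_{ij} = \int_0^{\infty} t\, q_{ij}(t)\, dt = -q_{ij}^{*\prime}(0)
$$
Thus
$$
\begin{aligned}
& m_{01} + m_{02} = \mu_0\\
& m_{13} + m_{14} = \mu_1\\
& m_{25} = \mu_2\\
& m_{30} + m_{36} + m_{37} = \mu_3\\
& m_{47} = \mu_4\\
& m_{50} + m_{53}^{8} + m_{55}^{9} + m_{56}^{(8,11)} + m_{57}^{(8,12)} + m_{57}^{(9,12)} + m_{5,10}^{(8,12,13)} + m_{5,10}^{(8,11,13)} + m_{5,10}^{(9,12,13)} = k_1\\
& m_{63} + m_{6,10} = \mu_6\\
& m_{73} + m_{76}^{11} + m_{7,13}^{11} + m_{7,12} = \mu_7\\
& m_{73} + m_{76}^{11} + m_{7,10}^{(11,13)} + m_{77}^{12} + m_{7,10}^{(12,13)} = k_1\\
& m_{10,6} + m_{10,13} = \mu_{10}\\
& m_{10,6} + m_{10,10}^{13} = k_1
\end{aligned}
$$
where $k_1 = \int_0^{\infty} t\, g(t)\, dt$.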
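The two identities for state 0 can be checked numerically. The sketch below is only an illustration: the rate values are hypothetical, the helper `integrate` is not part of the chapter, and $p_{ij}$ is computed as $\int_0^\infty q_{ij}(t)\,dt$, which is what the limit $s \to 0$ of the transform gives.

```python
import math

def integrate(f, upper, n=200_000):
    """Composite trapezoidal rule for the integral of f over [0, upper]."""
    h = upper / n
    total = 0.5 * (f(0.0) + f(upper))
    for k in range(1, n):
        total += f(k * h)
    return total * h

# Hypothetical rates, chosen only for illustration.
gamma11, lam = 0.8, 0.05

def q01(t):
    return gamma11 * math.exp(-(gamma11 + lam) * t)

def q02(t):
    return lam * math.exp(-(gamma11 + lam) * t)

T_MAX = 60.0  # the integrands are negligible beyond this horizon

p01 = integrate(q01, T_MAX)
p02 = integrate(q02, T_MAX)
m01 = integrate(lambda t: t * q01(t), T_MAX)
m02 = integrate(lambda t: t * q02(t), T_MAX)
mu0 = 1.0 / (gamma11 + lam)

print(p01 + p02)       # should be close to 1
print(m01 + m02, mu0)  # the two values should agree closely
```

The same pattern applies to any other state whose transition densities are purely exponential; states that involve $g(t)$ would additionally need an assumed repair-time density.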
4.2 Mean time to system failure (MTSF)
Let $\phi_i(t)$ ($i$ = 0, 1, 3, 6, 7, 10) be the c.d.f. of the first passage time from regenerative state $i$ to a failed state. Failed states are treated as absorbing states: once the system reaches an absorbing state, the probability of moving out of it is 0. The total time required to reach such absorbing states is computed to yield the MTSF. The following recursive relations for the various $\phi_i(t)$ are obtained by probabilistic arguments.
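Take the Laplace–Stieltjes transform (LST) of the relations given above and solve for $\phi^{**}(s)$. If the random variable $T$ represents the time to failure of a machine that starts in an operable state at $t = 0$, then the reliability $R(t)$ of the device is given by
$$
R(t) = \Pr[T > t] = 1 - \Pr[T \le t] = 1 - \phi(t)
$$
Taking the Laplace transform (LT) of the above relation gives
$$
R^{*}(s) = \frac{1-\phi^{**}(s)}{s}
$$
where $R^{*}(s)$ is the LT of $R(t)$. Thus,
$$
\mathrm{MTSF}\,(M) = \lim_{s \to 0} R^{*}(s) = \lim_{s \to 0} \frac{1-\phi^{**}(s)}{s} = N/D
$$
where
$$
\begin{aligned}
N ={}& (\mu_0 + p_{01}\mu_1)\left[\left(1 - p_{6,10}\,p_{10,6}\right)\left(1 - p_{37}\,p_{73}\right) - p_{63}\left(p_{36} + p_{37}\,p_{76}^{11}\right)\right]\\
&+ (\mu_3 + p_{37}\mu_{10})\,p_{01}\,p_{13}\left(1 - p_{6,10}\,p_{10,6}\right)
+ \left(\mu_6 + p_{6,10}\,\mu_{10}\right)\left(1 + p_{01}\,p_{13}\right)\left(p_{36} + p_{37}\,p_{76}^{11}\right)\\
D ={}& \left(1 - p_{6,10}\,p_{10,6}\right)\left(1 - p_{37}\,p_{73} - p_{01}\,p_{13}\,p_{30}\right) - p_{63}\left(p_{36} + p_{37}\,p_{76}^{11}\right)
\end{aligned}
$$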
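As a sanity check of the limit formula, consider a deliberately simple case that is not the system studied here: a single unit with a constant (hypothetical) failure rate $\lambda$ and no repair. The first passage time to failure is then exponential, and the limit reduces to the familiar mean of the exponential distribution.

```latex
\begin{aligned}
R(t) &= e^{-\lambda t}, \qquad \phi(t) = 1 - e^{-\lambda t}, \qquad
\phi^{**}(s) = \int_0^{\infty} e^{-st}\, d\phi(t) = \frac{\lambda}{s+\lambda},\\
R^{*}(s) &= \frac{1-\phi^{**}(s)}{s} = \frac{1}{s+\lambda}, \qquad
\mathrm{MTSF} = \lim_{s \to 0} R^{*}(s) = \frac{1}{\lambda} = \int_0^{\infty} R(t)\, dt .
\end{aligned}
```

The last equality shows why the limit works in general: $R^{*}(0)$ is the integral of the reliability function over $[0, \infty)$, which is the expected time to failure.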
4.3 Availability analysis
When a system is unavailable because of a breakdown, the priority is to put it back into operation with proper maintenance. In practice, availability is as important as reliability because of the extra costs and difficulty incurred when the system is not accessible. In general, the availability $A(t)$ is the probability that the system is in a nonfailed state at time $t$, although it may have failed previously and been restored to its normal operating condition. The steady-state availability $A$ is the limiting value of $A(t)$ as $t$ tends to infinity. The availability ($A$) of the proposed system is the sum of the availabilities evaluated in the following cases:
(i) A single unit is operative
(ii) Two units are operative
It is noteworthy that such an availability analysis with variation in demand has not been reported before in the literature.
4.3.1 A single unit is operative
When a single unit is operative, three subcases are identified on the basis of the variation in demand (shown in Fig. 3.1). The system may be available in any of the states mentioned below:
(i) Production made by a single unit is greater than the demand.
(ii) Demand production by a single unit and
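$$
\mu_{\tilde{A}}(x) =
\begin{cases}
\dfrac{x - a'}{b - a'}\,\mu_{\tilde{A}}, & \text{if } a' \le x \le b,\\[2mm]
\dfrac{c' - x}{c' - b}\,\mu_{\tilde{A}}, & \text{if } b \le x \le c',\\[2mm]
0, & \text{if } x < a',\ x > c'
\end{cases}
$$
$$
1 - \nu_{\tilde{A}}(x) =
\begin{cases}
\dfrac{x - a}{b - a}\,\left(1 - \nu_{\tilde{A}}\right), & \text{if } a \le x \le b,\\[2mm]
\dfrac{c - x}{c - b}\,\left(1 - \nu_{\tilde{A}}\right), & \text{if } b \le x \le c,\\[2mm]
0, & \text{if } x < a,\ x > c
\end{cases}
$$
where $a$, $a'$, $b$, $c'$, and $c$ are all real numbers, and $\mu_{\tilde{A}}$ and $\nu_{\tilde{A}}$ are the maximum and minimum degrees of membership and nonmembership, respectively, with the conditions $0 \le \mu_{\tilde{A}}, \nu_{\tilde{A}} \le 1$ and $0 \le \mu_{\tilde{A}} + \nu_{\tilde{A}} \le 1$. A TIFS $\tilde{A}$ is denoted by $\tilde{A} = \langle[(a', b, c'); \mu_{\tilde{A}}], [(a, b, c); \nu_{\tilde{A}}]\rangle$ and represented pictorially in Fig. 4.1.
2.3 Algebraic t-norm ($T_A$) and t-conorm ($S_A$)
The t-norm ($T$) and t-conorm ($S$) are binary functions defined as $T, S: [0,1] \times [0,1] \to [0,1]$ and satisfy four important conditions: commutativity, associativity, monotonicity,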
Figure 4.1 A TIFS $\tilde{A} = \langle[(a', b, c'); \mu_{\tilde{A}}], [(a, b, c); \nu_{\tilde{A}}]\rangle$.
and boundary conditions [28]. Each t-norm and its paired t-conorm satisfy the De Morgan duality
$$
S(x, y) = 1 - T(1 - x, 1 - y), \quad \forall (x, y) \in [0, 1]^2 .
$$
A variety of predefined t-norm and t-conorm pairs exist in the literature, such as the algebraic, Einstein, Hamacher, and Frank t-norms and t-conorms. The present study uses the algebraic t-norm ($T_A$) and t-conorm ($S_A$), which are defined as follows [28]:
(a) Algebraic t-norm ($T_A$)
$$
T_A(x, y) = x \cdot y \tag{4.4}
$$
(b) Algebraic t-conorm ($S_A$)
$$
S_A(x, y) = x + y - x \cdot y \tag{4.5}
$$
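Eqs. (4.4) and (4.5) and the De Morgan duality are easy to confirm numerically. The following is a minimal sketch; the function names and the test grid are illustrative choices, not part of the chapter.

```python
import itertools

def t_norm_algebraic(x, y):
    """Algebraic t-norm, Eq. (4.4)."""
    return x * y

def t_conorm_algebraic(x, y):
    """Algebraic t-conorm, Eq. (4.5)."""
    return x + y - x * y

# Check S(x, y) = 1 - T(1 - x, 1 - y) on a coarse grid of [0, 1]^2.
grid = [i / 10 for i in range(11)]
duality_holds = all(
    abs(t_conorm_algebraic(x, y) - (1 - t_norm_algebraic(1 - x, 1 - y))) < 1e-12
    for x, y in itertools.product(grid, repeat=2)
)

# Boundary conditions: 1 is neutral for T, 0 is neutral for S.
boundary_holds = all(
    abs(t_norm_algebraic(x, 1.0) - x) < 1e-12
    and abs(t_conorm_algebraic(x, 0.0) - x) < 1e-12
    for x in grid
)

print(duality_holds, boundary_holds)
```

In the later AND-node calculations, $T_A$ combines membership degrees and $S_A$ combines nonmembership degrees, so confidence in a conjunction falls as more events are combined.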
2.4 The fuzzy arithmetic operations defined on TIFS [29]
Let $\tilde{A}_1 = \langle[(a'_1, b_1, c'_1); \mu_{\tilde{A}_1}], [(a_1, b_1, c_1); \nu_{\tilde{A}_1}]\rangle$ and $\tilde{A}_2 = \langle[(a'_2, b_2, c'_2); \mu_{\tilde{A}_2}], [(a_2, b_2, c_2); \nu_{\tilde{A}_2}]\rangle$ be two TIFSs. The four basic fuzzy arithmetic operations based on the algebraic t-norm and t-conorm are summarized in Table 4.1 [29].
2.5 Failure probability evaluation for OR and AND nodes [6,30]
Let $\tilde{A}_1 = \langle[(a'_1, b_1, c'_1); \mu_{\tilde{A}_1}], [(a_1, b_1, c_1); \nu_{\tilde{A}_1}]\rangle$ and $\tilde{A}_2 = \langle[(a'_2, b_2, c'_2); \mu_{\tilde{A}_2}], [(a_2, b_2, c_2); \nu_{\tilde{A}_2}]\rangle$ be two TIFSs. The failure probability $F(\tilde{A}_1 \cup \tilde{A}_2)$ for an OR-node and $F(\tilde{A}_1 \cap \tilde{A}_2)$ for an AND-node can be defined as follows [6,30]:
$$
F(\tilde{A}_1 \cup \tilde{A}_2) = \mathrm{Max}\left(F(\tilde{A}_1), F(\tilde{A}_2)\right) \tag{4.6}
$$
Fuzzy attack tree analysis of security threat assessment
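Table 4.1 Four basic fuzzy arithmetic operations defined on two TIFSs $\tilde{A}_1$ and $\tilde{A}_2$ using the algebraic t-norm and t-conorm [29].

| Operation | Fuzzy expression |
|---|---|
| Addition | $\tilde{A}_1 \oplus_a \tilde{A}_2 = \langle[(a'_1 + a'_2,\ b_1 + b_2,\ c'_1 + c'_2);\ \mu_{\tilde{A}_1} + \mu_{\tilde{A}_2} - \mu_{\tilde{A}_1}\mu_{\tilde{A}_2}],\ [(a_1 + a_2,\ b_1 + b_2,\ c_1 + c_2);\ \nu_{\tilde{A}_1}\nu_{\tilde{A}_2}]\rangle$ |
| Multiplication | $\tilde{A}_1 \otimes_a \tilde{A}_2 = \langle[(a'_1 a'_2,\ b_1 b_2,\ c'_1 c'_2);\ \mu_{\tilde{A}_1}\mu_{\tilde{A}_2}],\ [(a_1 a_2,\ b_1 b_2,\ c_1 c_2);\ \nu_{\tilde{A}_1} + \nu_{\tilde{A}_2} - \nu_{\tilde{A}_1}\nu_{\tilde{A}_2}]\rangle$ |
| Subtraction | $\tilde{A}_1 \ominus_a \tilde{A}_2 = \langle[(a'_1 - c'_2,\ b_1 - b_2,\ c'_1 - a'_2);\ \mu_{\tilde{A}_1}\nu_{\tilde{A}_2}],\ [(a_1 - c_2,\ b_1 - b_2,\ c_1 - a_2);\ \nu_{\tilde{A}_1} + \mu_{\tilde{A}_2} - \nu_{\tilde{A}_1}\mu_{\tilde{A}_2}]\rangle$ |
| Complement | $1 \ominus_a \tilde{A}_1 = \langle[(1 - c'_1,\ 1 - b_1,\ 1 - a'_1);\ \nu_{\tilde{A}_1}],\ [(1 - c_1,\ 1 - b_1,\ 1 - a_1);\ \mu_{\tilde{A}_1}]\rangle$ |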
$$
F(\tilde{A}_1 \cap \tilde{A}_2) = F(\tilde{A}_1) \otimes F(\tilde{A}_2). \tag{4.7}
$$
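The addition, multiplication, and complement rows of Table 4.1 translate directly into code. The sketch below is one possible encoding under stated assumptions: the class name `TIFS`, its field names, and the two example sets `x` and `y` are hypothetical and are used only to show the mechanics; the AND-node of Eq. (4.7) is then simply the multiplication.

```python
from dataclasses import dataclass
from typing import Tuple

Triple = Tuple[float, float, float]

@dataclass(frozen=True)
class TIFS:
    inner: Triple  # (a', b, c'), carries the membership degree mu
    mu: float
    outer: Triple  # (a, b, c), carries the nonmembership degree nu
    nu: float

    def __add__(self, other: "TIFS") -> "TIFS":
        # Addition row of Table 4.1
        return TIFS(
            tuple(p + q for p, q in zip(self.inner, other.inner)),
            self.mu + other.mu - self.mu * other.mu,
            tuple(p + q for p, q in zip(self.outer, other.outer)),
            self.nu * other.nu,
        )

    def __mul__(self, other: "TIFS") -> "TIFS":
        # Multiplication row of Table 4.1 (used for AND-nodes)
        return TIFS(
            tuple(p * q for p, q in zip(self.inner, other.inner)),
            self.mu * other.mu,
            tuple(p * q for p, q in zip(self.outer, other.outer)),
            self.nu + other.nu - self.nu * other.nu,
        )

    def complement(self) -> "TIFS":
        # Complement row of Table 4.1: endpoints mirrored, mu and nu swapped
        a1, b, c1 = self.inner
        a, _, c = self.outer
        return TIFS((1 - c1, 1 - b, 1 - a1), self.nu,
                    (1 - c, 1 - b, 1 - a), self.mu)

# Hypothetical TIFSs, for illustration only.
x = TIFS((0.2, 0.3, 0.4), 0.8, (0.1, 0.3, 0.5), 0.1)
y = TIFS((0.1, 0.2, 0.3), 0.9, (0.05, 0.2, 0.35), 0.0)

and_node = x * y
total = x + y
not_x = x.complement()
print(and_node)
print(total)
print(not_x)
```

A frozen dataclass is used so that each operation returns a new set instead of mutating its operands, which keeps repeated products over many bottom events easy to reason about.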
3. Proposed FATA method
The following steps are executed in the proposed FATA method.
Step 1. Development of the attack tree diagram: Information about the system is gathered from available resources and reviewed with experts. The attack tree diagram is then prepared using OR/AND nodes, denoting series/parallel goals, according to the interaction among them. The attack tree can be understood by tracing the whole process back from the top goal to the bottom events.
Step 2. Collection of failure intervals for all possible bottom events: The failure interval for each bottom event is collected from available resources and reviewed with system experts. Based on their suggestions, the crisp uncertain failure data $q_i = b_i$, $i = 1, 2, \ldots, n$, are converted into TIFSs, denoted as $\tilde{q}_i = \langle[(a'_i, b_i, c'_i); \mu_{\tilde{q}_i}], [(a_i, b_i, c_i); \nu_{\tilde{q}_i}]\rangle$, $i = 1, 2, \ldots, n$.
Step 3. Computation of the top goal fuzzy failure probability: Let $\tilde{q}_i = \langle[(a'_i, b_i, c'_i); \mu_{\tilde{q}_i}], [(a_i, b_i, c_i); \nu_{\tilde{q}_i}]\rangle$ be the fuzzy failure probability of the $i$th bottom event as obtained in Step 2. Let $\tilde{q}_T = \langle[(a'_T, b_T, c'_T); \mu_{\tilde{q}_T}], [(a_T, b_T, c_T); \nu_{\tilde{q}_T}]\rangle$ be the top goal fuzzy failure probability. Then $\tilde{q}_T$ is obtained using the attack tree diagram, Eqs. (4.6) and (4.7), and the fuzzy arithmetic operations given in Table 4.1.
Step 4. Analysis of results and extraction of suggestions: The obtained results are discussed with experts, and critical suggestions are extracted. Based on the results, future courses of action can be planned.
4. An illustrative application
This section applies the proposed FATA method to analyze the fuzzy failure probability of an internet security system failure during an attack [6]. The attack tree of the internet security system failure is shown in Fig. 4.2; it contains the main goal, the subgoals, the subtasks, and the physical tasks. The complete details of this attack tree can be found in the work of Chang [6]. The description of the bottom events and their crisp and fuzzy failure data values are provided in Table 4.2. To show the effectiveness of the proposed FATA method, the computed results are plotted and compared with the results obtained from three existing methods, namely the traditional method [31], the posbist FTA method of Huang et al. [32], and the FATA method of Chang [6]. The detailed computation is given hereafter.
Figure 4.2 Attack tree diagram of the internet security system [6].
Table 4.2 The possible range of leaf node failure probabilities [6]. The TIFS value of $q_{pi}$ is denoted by $\tilde{q}_{pi} = \langle[(a'_i, b_i, c'_i); \mu_i], [(a_i, b_i, c_i); \nu_i]\rangle$.

| Bottom event | Cause description | Failure probability $q_{pi}$ | Crisp value of $q_{pi}$ | TIFS value of $q_{pi}$ |
|---|---|---|---|---|
| A | IP table configuration errors | $q_A$ | 0.35 | ⟨[(0.33, 0.35, 0.37); 0.8], [(0.30, 0.35, 0.40); 0.10]⟩ |
| B | Address translation failure | $q_B$ | 0.15 | ⟨[(0.12, 0.15, 0.18); 0.9], [(0.10, 0.15, 0.20); 0.00]⟩ |
| C | Authentication failure | $q_C$ | 0.10 | ⟨[(0.09, 0.10, 0.11); 0.9], [(0.08, 0.10, 0.12); 0.10]⟩ |
| D | Firewall daemon configuration errors | $q_D$ | 0.05 | ⟨[(0.04, 0.05, 0.06); 0.8], [(0.04, 0.05, 0.06); 0.10]⟩ |
| E1 | Scan failure | $q_{E1}$ | 0.13 | ⟨[(0.12, 0.13, 0.14); 0.9], [(0.11, 0.13, 0.15); 0.00]⟩ |
| E2 | Cleanse failure | $q_{E2}$ | 0.11 | ⟨[(0.10, 0.11, 0.12); 0.9], [(0.10, 0.11, 0.12); 0.00]⟩ |
| E3 | Audit failure | $q_{E3}$ | 0.05 | ⟨[(0.04, 0.05, 0.06); 0.9], [(0.03, 0.05, 0.07); 0.00]⟩ |
| E4 | Validation failure | $q_{E4}$ | 0.04 | ⟨[(0.03, 0.04, 0.05); 0.9], [(0.03, 0.04, 0.05); 0.00]⟩ |
| F | DNS configuration errors | $q_F$ | 0.10 | ⟨[(0.09, 0.10, 0.11); 0.8], [(0.08, 0.10, 0.12); 0.10]⟩ |
| G | Monitor service failure | $q_G$ | 0.05 | ⟨[(0.04, 0.05, 0.06); 0.9], [(0.04, 0.05, 0.06); 0.00]⟩ |
| H | Malicious access detected | $q_H$ | 0.40 | ⟨[(0.37, 0.40, 0.43); 0.8], [(0.35, 0.40, 0.45); 0.00]⟩ |
| I | Transport layer security configuration errors | $q_I$ | 0.14 | ⟨[(0.13, 0.14, 0.15); 0.8], [(0.12, 0.14, 0.16); 0.10]⟩ |
| J | Peer entity authentication | $q_J$ | 0.20 | ⟨[(0.16, 0.20, 0.24); 0.8], [(0.15, 0.20, 0.25); 0.00]⟩ |
| K | Security parameter negotiation | $q_K$ | 0.30 | ⟨[(0.28, 0.30, 0.32); 0.8], [(0.27, 0.30, 0.33); 0.10]⟩ |
| L | POP3 configuration errors | $q_L$ | 0.18 | ⟨[(0.17, 0.18, 0.19); 0.8], [(0.16, 0.18, 0.20); 0.10]⟩ |
| M | Entity authentication | $q_M$ | 0.25 | ⟨[(0.24, 0.25, 0.26); 0.8], [(0.22, 0.25, 0.28); 0.00]⟩ |
| N | Entry security parameter | $q_N$ | 0.40 | ⟨[(0.39, 0.40, 0.41); 0.8], [(0.38, 0.40, 0.42); 0.10]⟩ |
| O | Security parameter authentication | $q_O$ | 0.09 | ⟨[(0.08, 0.09, 0.10); 0.8], [(0.07, 0.09, 0.11); 0.00]⟩ |
| P | Key generation | $q_P$ | 0.07 | ⟨[(0.06, 0.07, 0.08); 0.8], [(0.05, 0.07, 0.09); 0.10]⟩ |
| Q | Data confidentiality | $q_Q$ | 0.04 | ⟨[(0.03, 0.04, 0.05); 0.9], [(0.03, 0.04, 0.05); 0.10]⟩ |
4.1 Results obtained from proposed FATA method
This section provides the results obtained from the proposed FATA method. Following the attack tree, the fuzzy failure probabilities of the subgoals (SSH, POP3, Transport Layer Security, DNS, Detection Service Failure, Firewall Daemon, and IP Table) are calculated as follows.
$$
\begin{aligned}
q_{and}(\text{SSH}) &= q_O \otimes_a q_P \otimes_a q_Q\\
&= \langle[(0.000144, 0.000252, 0.000400); 0.576], [(0.000105, 0.000252, 0.000495); 0.190]\rangle\\
q_{and}(\text{POP3}) &= q_L \otimes_a q_M \otimes_a q_N\\
&= \langle[(0.015912, 0.018000, 0.020254); 0.512], [(0.013376, 0.018000, 0.023520); 0.190]\rangle\\
q_{and}(\text{Transport Layer Security}) &= q_I \otimes_a q_J \otimes_a q_K\\
&= \langle[(0.005824, 0.008400, 0.011520); 0.512], [(0.004860, 0.008400, 0.013200); 0.190]\rangle\\
q_{or}(\text{DNS}) &= \max(q_F, q_G, q_H)\\
&= \langle[(0.37, 0.40, 0.43); 0.8], [(0.35, 0.40, 0.45); 0.1]\rangle\\
q_{and}(\text{Detection Service Failure}) &= q_{E1} \otimes_a q_{E2} \otimes_a q_{E3} \otimes_a q_{E4}\\
&= \langle[(0.0000144, 0.0000286, 0.0000504); 0.6561], [(0.0000099, 0.0000286, 0.0000630); 0]\rangle\\
q_{or}(\text{Firewall Daemon}) &= \max\left(q_D, q_{and}(\text{Detection Service Failure})\right)\\
&= \langle[(0.04, 0.05, 0.06); 0.8], [(0.04, 0.05, 0.06); 0.1]\rangle\\
q_{or}(\text{IP Table}) &= \max(q_A, q_B, q_C)\\
&= \langle[(0.33, 0.35, 0.37); 0.8], [(0.30, 0.35, 0.40); 0.1]\rangle
\end{aligned}
$$
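The three-event AND-node products above can be re-derived from the leaf values in Table 4.2. The sketch below is an illustration only: the dictionary `LEAVES`, the tuple layout, and the helper names are assumptions, and each leaf is stored as `((a', b, c'), mu, (a, b, c), nu)`.

```python
from functools import reduce

# Leaf values copied from Table 4.2: ((a', b, c'), mu, (a, b, c), nu)
LEAVES = {
    "I": ((0.13, 0.14, 0.15), 0.8, (0.12, 0.14, 0.16), 0.10),
    "J": ((0.16, 0.20, 0.24), 0.8, (0.15, 0.20, 0.25), 0.00),
    "K": ((0.28, 0.30, 0.32), 0.8, (0.27, 0.30, 0.33), 0.10),
    "L": ((0.17, 0.18, 0.19), 0.8, (0.16, 0.18, 0.20), 0.10),
    "M": ((0.24, 0.25, 0.26), 0.8, (0.22, 0.25, 0.28), 0.00),
    "N": ((0.39, 0.40, 0.41), 0.8, (0.38, 0.40, 0.42), 0.10),
    "O": ((0.08, 0.09, 0.10), 0.8, (0.07, 0.09, 0.11), 0.00),
    "P": ((0.06, 0.07, 0.08), 0.8, (0.05, 0.07, 0.09), 0.10),
    "Q": ((0.03, 0.04, 0.05), 0.9, (0.03, 0.04, 0.05), 0.10),
}

def and_pair(x, y):
    """Multiplication row of Table 4.1 applied to two leaves."""
    (ix, mx, ox, nx), (iy, my, oy, ny) = x, y
    inner = tuple(p * q for p, q in zip(ix, iy))
    outer = tuple(p * q for p, q in zip(ox, oy))
    return (inner, mx * my, outer, nx + ny - nx * ny)

def and_node(names):
    """Fold the pairwise product over all bottom events of an AND-node."""
    return reduce(and_pair, (LEAVES[name] for name in names))

ssh = and_node(["O", "P", "Q"])
pop3 = and_node(["L", "M", "N"])
tls = and_node(["I", "J", "K"])

for label, (inner, mu, outer, nu) in [("SSH", ssh), ("POP3", pop3), ("TLS", tls)]:
    print(label, [round(v, 6) for v in inner], round(mu, 4),
          [round(v, 6) for v in outer], round(nu, 4))
```

Because the product is associative, the order in which bottom events are folded does not affect the result; OR-nodes are left out here because they use the Max operation of Eq. (4.6) instead of a product.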
Finally, the fuzzy failure probability of the top goal, "Internet security system failure" during an attack, is estimated as follows.
$$
\begin{aligned}
q_{or}(\text{Internet Security System Failure}) &= \max\big(q_{and}(\text{SSH}),\ q_{and}(\text{POP3}),\ q_{and}(\text{Transport Layer Security}),\\
&\qquad\quad q_{or}(\text{DNS}),\ q_{or}(\text{Firewall Daemon}),\ q_{or}(\text{IP Table})\big)\\
&= \langle[(0.37, 0.40, 0.43); 0.512], [(0.35, 0.40, 0.45); 0.19]\rangle
\end{aligned}
$$
4.2 Comparative analysis and discussion
This section compares the results obtained from the proposed method with those of three existing methods and discusses them. With the proposed method, the probability of the top event "Internet security system failure" during an attack is obtained as the TIFS ⟨[(0.37, 0.40, 0.43); 0.512], [(0.35, 0.40, 0.45); 0.19]⟩. The results of the three existing methods, namely the traditional method [31], the posbist FTA method of Huang et al. [32], and the FATA method of Chang [6], are available with complete details in Chang [6]. The results are tabulated in Table 4.3 and plotted in Fig. 4.3. Comparing these results shows that the developed FATA method optimizes the confidence values of the top event fuzzy failure probability. The traditional [31] and Huang et al. [32] posbist methods use crisp values and do not consider uncertainty or experts' hesitation in data assessment. The Chang [6] FATA method considers both uncertainty and the experts' hesitation factor, but its min–min operator-based fuzzy arithmetic operations provide a compressed range of prediction of the fuzzy failure probability, which is sometimes not realistic. The proposed method overcomes this difficulty by adopting algebraic t-norm and t-conorm-based arithmetic operations, which provide an optimum range of fuzzy failure probability at each confidence level and appear more realistic. Thus, the proposed FATA gives optimized results in comparison with the other existing approaches under uncertainty and hesitation.
5. Conclusions and future scope
This chapter has developed a novel FATA method for evaluating security threat assessments in an internet security system utilizing available information. The method applies ATA to model the system and TIFS to quantify both the data uncertainty in the available information and the hesitation factor involved in experts' judgment, while algebraic t-norm and t-conorm-based arithmetic operations defined on TIFS are applied to evaluate the membership and nonmembership degrees of the top goal failure probability in terms of TIFS. The benefit of using algebraic t-norm and t-conorm-based arithmetic is that it involves the membership and nonmembership degrees of each bottom event in evaluating top goal
Figure 4.3 Fuzzy failure probability of internet security system. (Horizontal axis: failure probability; vertical axis: degree of membership; curves: Traditional [6], Huang [6], Chang [6], and the proposed method.)
Table 4.3 Comparison between the proposed and existing methods.

| Method applied | Input data type | Computed failure probability |
|---|---|---|
| Traditional method [6] | Crisp | 0.2359 |
| Huang et al. posbist method [6] | Crisp | 0.40 |
| Chang method [6] | TIFS | ⟨[(0.37, 0.40, 0.43); 0.8], [(0.35, 0.40, 0.45); 0.1]⟩ |
| Proposed method | TIFS | ⟨[(0.37, 0.40, 0.43); 0.512], [(0.35, 0.40, 0.45); 0.19]⟩ |
failure probability in terms of TIFS [15]. The method has been applied to evaluate security threat assessment in an internet security system. The results obtained from the proposed method are compared with three existing ATA/FATA methods, namely the traditional method [6], the posbist method of Huang et al. [6,32], and the Chang [6] method. The computed result indicates that the proposed method is more realistic than the other studied methods for evaluating security threats, as it relies on TIFS and on algebraic t-norm and t-conorm-based arithmetic operations. Based on the results, experts may suggest appropriate and affordable action plans for minimizing the security threats in an internet security system. Some of the future scopes of the proposed FATA method are as follows:
(a) The proposed FATA method can be applied to fault tree analysis of critical systems such as patient health monitoring systems, waste management systems, and computer security systems.
(b) The proposed FATA method may be further extended to other fuzzy environments and to other t-norms and t-conorms.
References
[1] A.P.H. de Gusmão, M.M. Silva, T. Poleto, L. Silva, C.E. Lúcio Camara, A.P.C.S. Costa, Cybersecurity risk analysis model using fault tree analysis and fuzzy decision theory, International Journal of Information Management 43 (2018) 248–260.
[2] J.J. Zhao, S.Y. Zhao, Opportunities and threats: a security assessment of state e-government websites, Government Information Quarterly 27 (1) (2010) 49–56.
[3] Y.L. Huang, A. Cárdenas, S. Amin, Z.S. Lin, H.Y. Tsai, S. Sastry, Understanding the physical and economic consequences of attacks on control systems, International Journal of Critical Infrastructure Protection 2 (3) (2009) 73–83.
[4] A.L. Opdahl, G. Sindre, Experimental comparison of attack trees and misuse cases for security threat identification, Information and Software Technology 51 (5) (2009) 916–932.
[5] K. Wu, S. Ye, An information security threat assessment model based on Bayesian network and OWA operator, Applied Mathematics and Information Sciences 8 (2) (2014) 833–838.
[6] K.-H. Chang, Security threat assessment of an Internet security system using attack tree and vague sets, The Scientific World Journal 2014 (2014). Article ID 506714, 9 pages.
[7] Z. Miao, Z. Wang, Fault-tree analysis on computer security system using intuitionistic fuzzy sets, in: Proceedings of 2009 4th International Conference on Computer Science & Education, IEEE, 2009, pp. 459–463.
[8] M.M. Silva, A.P.H. de Gusmão, T. Poleto, L.C.E. Silva, A.P.C.S. Costa, A multidimensional approach to information security risk management using FMEA and fuzzy theory, International Journal of Information Management 34 (6) (2014) 733–740.
[9] Komal, Fuzzy fault tree analysis for web access failure under uncertainty using a compensatory operator, in: Reliability Management and Engineering: Challenges and Future Trends, CRC Press, Taylor & Francis Group, 2020, pp. 149–174, https://doi.org/10.1201/9780429268922-7.
[10] Komal, Fuzzy fault tree analysis for controlling robot-related accidents involving humans in industrial plants: a case study, International Journal of Quality & Reliability Management 38 (6) (2021) 1342–1365.
[11] A.F.H. Marco, S.G.G. David, C.S. Mario, J.P.A. Rolando, Fuzzy reliability centered maintenance considering personnel experience and only censored data, Computers & Industrial Engineering 158 (2021) 107440.
[12] B. Schneier, Attack trees, Dr. Dobb's Journal 24 (12) (1999) 21–29.
[13] L.A. Zadeh, Fuzzy sets, Information and Control 8 (3) (1965) 338–353.
[14] K.T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets and Systems 20 (1) (1986) 87–96.
[15] P. Liu, L. Yang, Z. Gao, S. Li, Y. Gao, Fault tree analysis combined with quantitative analysis for high-speed railway accidents, Safety Science 79 (2015) 344–357.
[16] L. Abdullah, N. Zulkifli, H. Liao, E. Herrera-Viedma, A. Al-Barakati, An interval-valued intuitionistic fuzzy DEMATEL method combined with Choquet integral for sustainable solid waste management, Engineering Applications of Artificial Intelligence 82 (2019) 207–215.
[17] A.R. Mishra, A. Mardani, P. Rani, E.K. Zavadskas, A novel EDAS approach on intuitionistic fuzzy set for assessment of health-care waste disposal technology using new parametric divergence measures, Journal of Cleaner Production 272 (2020) 122807.
[18] R.R. Yager, A.M. Abbasov, Pythagorean membership grades, complex numbers, and decision making, International Journal of Intelligent Systems 28 (2013) 436–452.
[19] J. Yang, W. Zhou, S. Li, Similarity measure for multi-granularity rough approximations of vague sets, Journal of Intelligent and Fuzzy Systems 40 (1) (2021) 1609–1621.
[20] Y.A. Mahmood, A. Ahmadi, A.K. Verma, A. Srividya, U. Kumar, Fuzzy fault tree analysis: a review of concept and application, International Journal of System Assurance Engineering and Management 4 (1) (2013) 19–32.
[21] S.M. Chen, Analyzing fuzzy system reliability using vague sets theory, International Journal of Applied Science & Engineering 1 (2003) 82–88.
[22] J.R. Chang, K.H. Chang, S.H. Liao, C.H. Cheng, The reliability of general vague fault tree analysis on weapon systems fault diagnosis, Soft Computing 10 (2006) 531–542.
[23] M. Kumar, S.P. Yadav, The weakest t-norm based intuitionistic fuzzy fault-tree analysis to evaluate system reliability, ISA Transactions 51 (2012) 531–538.
[24] M.H. Shu, C.H. Cheng, J.R. Chang, Using intuitionistic fuzzy sets for fault-tree analysis on printed circuit board assembly, Microelectronics Reliability 46 (2006) 2139–2148.
[25] K.H. Chang, C.H. Cheng, Y.C. Chang, Reliability assessment of an aircraft propulsion system using IFS and OWA tree, Engineering Optimization 40 (2008) 907–921.
[26] K.H. Chang, C.H. Cheng, A novel general approach to evaluating the PCBA for components with different membership function, Applied Soft Computing 9 (3) (2009) 1044–1056.
[27] M. Kumar, Applying weakest t-norm based approximate fuzzy arithmetic operations on different types of intuitionistic fuzzy numbers to evaluate reliability of PCBA fault, Applied Soft Computing 23 (2014) 387–406.
[28] M. Xia, Z. Xu, B. Zhu, Some issues on intuitionistic fuzzy aggregation operators based on Archimedean t-conorm and t-norm, Knowledge-Based Systems 31 (2012) 78–88.
[29] S.P. Wan, Power average operators of trapezoidal intuitionistic fuzzy numbers and application to multi-attribute group decision making, Applied Mathematical Modelling 37 (2013) 4112–4126.
[30] R.R. Yager, OWA trees and their role in security modeling using attack trees, Information Sciences 176 (20) (2006) 2933–2959.
[31] A. Park, S.J. Lee, Fault tree analysis on handwashing for hygiene management, Food Control 20 (2009) 223–229.
[32] H.Z. Huang, X. Tong, M.J. Zuo, Posbist fault tree analysis of coherent systems, Reliability Engineering & System Safety 84 (2) (2004) 141–148.
CHAPTER 5
A new flexible extension to a lifetime distributions, properties, inference, and applications in engineering science
Tabassum Naz Sindhu1, Zawar Hussain2 and Anum Shafiq3
1Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan; 2Department of Statistics, Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur, Pakistan; 3School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
Nomenclature
Symbols
$g(t|\hat{\Delta})$ PDF
$G(t|\hat{\Delta})$ CDF
$R(t|\hat{\Delta})$ RF
$h(t|\hat{\Delta})$ HRF
$H(t|\hat{\Delta})$ CHRF
$Y(t|\hat{\Delta})$ Mills Ratio
$Q(q; \hat{\Delta})$ QF
Abbreviations
CDF Cumulative Distribution Function
CHRF Cumulative Hazard Rate Function
HRF Hazard Rate Function
LSE Least Square Estimation
MLE Maximum Likelihood Estimation
MSE Mean Square Error
MTTF Mean Time to Failure
MTTR Mean Time to Repair
Engineering Reliability and Risk Assessment ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00005-8
© 2023 Elsevier Inc. All rights reserved.
PDF Probability Density Function
QF Quantile Function
RF Reliability Function
TTF Time-to-Failure
1. Introduction
As a discipline of statistics, reliability and survival analysis has a variety of uses in domains such as engineering, actuarial science, biomedical investigations, demography, and industrial reliability. In the statistical literature, several lifetime models have been developed to model data in a variety of practical fields. The need for appropriate models and statistical distributions has become significant, as developing new distributions allows experimental and observational data to be interpreted and predicted more accurately; for more information, see Refs. [1–6]. Because of its memoryless property and mathematical tractability, the exponential distribution is widely employed in modeling real data. However, because it has only a constant hazard rate and a decreasing density function, its applicability is limited. Consequently, numerous researchers have proposed modified versions of the exponential distribution to increase the flexibility of the model. Some recent modifications of the exponential distribution are the exponentiated exponential [7], modified exponential [8], Marshall–Olkin logistic exponential [9], odd inverse Pareto exponential [10], heavy-tailed exponential [11], modified Kies exponential [12], log-logistic Lindley exponential [13], alpha power exponential [14], beta generalized exponential [15], and logistic exponential (LE) [16] distributions. Researchers are still motivated to seek out new distributions in this regard. This motivated us to introduce a model based on the logistic exponential-G (LE-G) family. Under the LE-G family, the inverse exponential (IE) distribution is employed here for the first time; to our knowledge, there are no previous studies on the subject. The purpose of this study is to propose a novel flexible family of distributions based on the logistic exponential distribution.
The new family is called the logistic exponential-G (LE-G) family, and its mathematical properties are described in detail. The approach increases the flexibility of the baseline distribution in modeling data. In particular, the capacity to model data with increasing, decreasing, and unimodal failure rates is the driving force behind the new LE-G family. Furthermore, the special models of this family are shown to produce better fits than competing models. Because of its flexibility in supporting different forms of the hazard function, the new family can be employed in a range of situations in modeling survival data. Let $G(t; \Lambda)$ and $g(t; \Lambda)$ be the baseline CDF and PDF of a random variable, respectively. Using the LE distribution and the T-X generator of Alzaatreh et al. [17], the CDF and PDF of the LE-G family are given (for $t > 0$) by
$$
F(t|\Lambda, m, \xi) = \int_0^{G(t;\Lambda)} \frac{m\xi\left(\exp\{\xi z\} - 1\right)^{m-1}\exp\{\xi z\}}{\left[\left(\exp\{\xi z\} - 1\right)^{m} + 1\right]^{2}}\, dz
= 1 - \frac{1}{\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m} + 1} \tag{5.1}
$$
and
$$
f(t|\Lambda, m, \xi) = \frac{m\xi\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m-1}\exp\{\xi G(t;\Lambda)\}\, g(t;\Lambda)}{\left[\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m} + 1\right]^{2}}, \tag{5.2}
$$
where $g(t; \Lambda) = dG(t; \Lambda)/dt$ and $\Lambda$ is the vector of parameters of the baseline CDF $G$. Henceforth, $T \sim \text{LE-G}(\Lambda, m, \xi)$ denotes a random variable with density (5.2). The reliability function (RF), hazard rate function (HRF), and cumulative hazard function (CHF) are, respectively,
$$
R(t|\Lambda, m, \xi) = \frac{1}{\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m} + 1}, \tag{5.3}
$$
$$
h(t|\Lambda, m, \xi) = \frac{m\xi\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m-1}\exp\{\xi G(t;\Lambda)\}\, g(t;\Lambda)}{\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m} + 1}, \tag{5.4}
$$
and
$$
H(t|\Lambda, m, \xi) = \log\left(\left(\exp\{\xi G(t;\Lambda)\} - 1\right)^{m} + 1\right). \tag{5.5}
$$
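Eqs. (5.1) to (5.5) depend on the baseline only through $G$ and $g$, so they can be coded once for any baseline. The sketch below makes that explicit. It is an illustration under stated assumptions: the factory name `le_g`, the unit-rate exponential baseline, and the parameter values are hypothetical choices, not part of the chapter.

```python
import math

def le_g(G, g, m, xi):
    """Return (cdf, pdf, rf, hrf, chf) of the LE-G family for a baseline CDF G and PDF g."""
    def core(t):
        # (exp{xi * G(t)} - 1)^m, the term shared by Eqs. (5.1)-(5.5)
        return math.expm1(xi * G(t)) ** m

    def cdf(t):   # Eq. (5.1)
        return 1.0 - 1.0 / (core(t) + 1.0)

    def pdf(t):   # Eq. (5.2)
        e1 = math.expm1(xi * G(t))
        return m * xi * e1 ** (m - 1.0) * (e1 + 1.0) * g(t) / (core(t) + 1.0) ** 2

    def rf(t):    # Eq. (5.3)
        return 1.0 / (core(t) + 1.0)

    def hrf(t):   # Eq. (5.4)
        e1 = math.expm1(xi * G(t))
        return m * xi * e1 ** (m - 1.0) * (e1 + 1.0) * g(t) / (core(t) + 1.0)

    def chf(t):   # Eq. (5.5)
        return math.log(core(t) + 1.0)

    return cdf, pdf, rf, hrf, chf

# Hypothetical baseline: unit-rate exponential distribution.
cdf, pdf, rf, hrf, chf = le_g(
    G=lambda t: 1.0 - math.exp(-t),
    g=lambda t: math.exp(-t),
    m=1.5,
    xi=2.0,
)

t0, step = 1.0, 1e-6
numeric_pdf = (cdf(t0 + step) - cdf(t0 - step)) / (2 * step)
print(pdf(t0), numeric_pdf)          # analytic and numerical densities
print(hrf(t0), pdf(t0) / rf(t0))     # hazard rate computed two ways
print(chf(t0), -math.log(rf(t0)))    # cumulative hazard computed two ways
```

The three printed pairs check the internal consistency of the formulas: the PDF is the derivative of the CDF, $h = f/R$, and $H = -\log R$.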
Furthermore, the primary motivations for employing the LE-G family in practice are as follows:
• It defines models that accommodate all common shapes of the hrf, so the resulting model can be used to study a variety of datasets.
• Since the PDF and CDF of the LE-G model have closed forms, it can be used to analyze censored data.
• It is more adaptable than popular models such as the inverse exponential model.
• It can be applied to data that are leptokurtic or platykurtic in shape.
• It can be used to model data that are either overdispersed or underdispersed.
• It can generate distributions that are reversed-J shaped, right-skewed, or symmetric.
The rest of the chapter is organized as follows. Section 2 presents one special model together with plots of its PDF and HRF. Section 3 derives some of its general reliability features, such as the cumulative hazard rate function, Mills ratio, mean time to failure, and mean time to repair. In Section 4, the model parameters are estimated using maximum likelihood estimation (MLE) and least square estimation (LSE), and simulation results are presented to assess the performance of the maximum likelihood method. To demonstrate the flexibility of the new family, three applications are presented in Section 5.
2. Special model
This section presents one special LE-G model. The PDF (5.2) is most tractable when $G(t)$ and $g(t)$ have simple analytic expressions.
2.1 The LE-inverse exponential (LE-IE) model
The LE-IE distribution is defined from Eq. (5.1) by taking $G(t|\lambda) = \exp\left(-\frac{\lambda}{t}\right)$, the CDF of the inverse exponential (IE) distribution with a positive scale parameter $\lambda$. Substituting the CDF of the IE distribution into Eq. (5.1) gives the CDF of the LE-IE model,
$$
F(t|\hat{\Delta}) = 1 - \frac{1}{\left(\exp\left\{\xi \exp\left(-\frac{\lambda}{t}\right)\right\} - 1\right)^{m} + 1}, \quad t > 0,\ \lambda > 0, \tag{5.6}
$$
and the corresponding PDF,
$$
f(t|\hat{\Delta}) = \frac{m\xi \dfrac{\lambda}{t^{2}} \exp\left(-\dfrac{\lambda}{t}\right) \exp\left\{\xi \exp\left(-\dfrac{\lambda}{t}\right)\right\} \left(\exp\left\{\xi \exp\left(-\dfrac{\lambda}{t}\right)\right\} - 1\right)^{m-1}}{\left[\left(\exp\left\{\xi \exp\left(-\dfrac{\lambda}{t}\right)\right\} - 1\right)^{m} + 1\right]^{2}}, \tag{5.7}
$$
where $\hat{\Delta} = (\lambda, m, \xi)$; $\lambda$ and $\xi$ are positive scale parameters, while $m$ is a positive shape parameter.
Fig. 5.1 shows several graphs of $f(t|\hat{\Delta})$ and $h(t|\hat{\Delta})$ for various parameter values. These plots demonstrate how the parameter vector $\hat{\Delta}$ affects the density and hrf of the LE-IE model. The parameter values were chosen arbitrarily so as to capture a wide variety of shapes. As the figure shows, the hrf of the LE-IE model can be increasing, bathtub-shaped, increasing-decreasing-constant, decreasing, or upside-down bathtub-shaped (unimodal). As a result, the LE-IE distribution can be employed to evaluate a wide range of data in a variety of applications.
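To see how the hrf of the LE-IE model responds to the parameters, Eqs. (5.6) and (5.7) can be evaluated on a grid. The sketch below is only an exploratory aid: the function names, the grid, and the three parameter sets are hypothetical and are not the values used for Fig. 5.1.

```python
import math

def le_ie_cdf(t, lam, m, xi):
    """CDF of the LE-IE model, Eq. (5.6)."""
    core = math.expm1(xi * math.exp(-lam / t)) ** m
    return 1.0 - 1.0 / (core + 1.0)

def le_ie_pdf(t, lam, m, xi):
    """PDF of the LE-IE model, Eq. (5.7)."""
    g = math.exp(-lam / t)      # baseline IE CDF
    e1 = math.expm1(xi * g)     # exp{xi * G} - 1
    return (m * xi * (lam / t**2) * g * (e1 + 1.0) * e1 ** (m - 1.0)
            / (e1 ** m + 1.0) ** 2)

def le_ie_hrf(t, lam, m, xi):
    """Hazard rate h = f / (1 - F)."""
    return le_ie_pdf(t, lam, m, xi) / (1.0 - le_ie_cdf(t, lam, m, xi))

grid = [0.2 + 0.1 * k for k in range(49)]  # t from 0.2 to 5.0

# Hypothetical parameter sets (lam, m, xi), for illustration only.
hazards = {}
for lam, m, xi in [(1.0, 0.5, 1.0), (1.0, 1.5, 2.0), (0.5, 3.0, 1.5)]:
    values = [le_ie_hrf(t, lam, m, xi) for t in grid]
    hazards[(lam, m, xi)] = values
    peak_t = grid[values.index(max(values))]
    print(f"lam={lam}, m={m}, xi={xi}: max hrf {max(values):.4f} at t={peak_t:.1f}")
```

`math.expm1` is used instead of `math.exp(...) - 1` so that the term stays accurate for small $t$, where the baseline CDF is close to zero.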
Figure 5.1 Plots of PDF and HRF of LE-IE distribution.
2.2 Quantile function (QF) Hyndman and Fan [18] first proposed the notion of a quantile function (QF). The quantile of LE-IE model is presented here, including a few usages. The quantile function of a distribution has several uses in both theoretical and applied statistics, including means for model parameter estimation, generating random data, studying skewness and kurtosis, ^ and computing some model features, among others. Suppose that F tjD be the CDF of LE-IE model at qth quantiles Qq . Then the qth quantile of the LE-IE Model is ^ l 8 19; 0 < q < 1: 0 Q q; D ¼ (5.8) > 1 > =
> x 1 q ; : ^ e T ¼ Q 0:5; D gives the median of T : The other partition values can be explained in a similar way. In specific, by putting q ¼ ð0:25; 0:75Þ in Eq. (5.8), the first and third quartiles are attained. The accompanying quantile density function is provided by the differentiation of f ðqÞ:
11 m q ðq 1Þ2 1 q f ðqÞ ¼ 8 8 9 8 92 8 9932 : > > > > > 1 > 1 > 1 > < = = < < < m m m == q q 1 q 6 7 log 1 þ m 1þ log 1 þ 4log 5 > > > > > > > > 1 q 1 q x 1 q : ; : ; ;; : : l
(5.9) The maps of the QF and quantile density function are given in Fig. 5.2. The Bowley [19] skewness, say S ^ , and Moors [20] kurtosis, say K ^ , of the LE-IE distribution can be D D calculated using Eq. (5.9) with the following two formulas: Based on partition measures, the assessment of variability of skewness and kurtosis of T can be studied. The Bowley skewness is: ^ ^ ^ Q 0:75jq; D 2Q 0:5jq; D þ Q 0:25jq; D ; (5.10) S^ ¼ D IQR and the Moors’ kurtosis is:
A new flexible extension to a lifetime distributions, properties, inference, and applications in engineering science
Figure 5.2 Plots of QF and quantile density function f ðqÞ of LE-IE model along with m and q at different levels of.x:
\[
K_{\hat{\Delta}} = \frac{Q(0.875; \hat{\Delta}) - Q(0.625; \hat{\Delta}) + Q(0.375; \hat{\Delta}) - Q(0.125; \hat{\Delta})}{\mathrm{IQR}}. \tag{5.11}
\]
$S_{\hat{\Delta}} = 0$ for symmetric distributions, and $S_{\hat{\Delta}} > 0$ or $S_{\hat{\Delta}} < 0$ for right- or left-skewed distributions, respectively. The tail of the model gets heavier as $K_{\hat{\Delta}}$ rises. Eqs. (5.10) and (5.11), with the QF specified in Eq. (5.8), are used to compute Bowley's (Galton's) skewness and Moors' kurtosis and to explore the influence of the parameter vector $\hat{\Delta}$ on the LE-IE distribution. Fig. 5.3 shows the skewness and kurtosis for the LE-IE model. The LE-IE model can be right-skewed or symmetric, as shown in Fig. 5.3. For a fixed value of $\lambda$, the median, skewness, and kurtosis are decreasing functions of $\xi$ and $\mu$. Smaller values of $\xi$ produce larger changes along $\mu$ in the median curve and in $S_{\hat{\Delta}}$. The median also takes lower values as $\xi$ approaches 2. The skewness changes markedly along $\mu$ for smaller values of $\xi$, but it stays nearly unchanged as $\xi$ increases. For smaller values of $\mu$ and a fixed level of $\lambda$, $K_{\hat{\Delta}}$ shifts quickly.
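The quantile-based measures in Eqs. (5.10) and (5.11) need only a quantile function, so they can be sketched generically. The following minimal Python sketch takes any quantile function as input; the exponential and logistic quantile functions used for the demonstration are stand-ins chosen because their quantiles are known in closed form, not the LE-IE model, and all function names are hypothetical.

```python
import math

def bowley_skewness(quantile):
    """Bowley skewness, Eq. (5.10): (Q(0.75) - 2 Q(0.5) + Q(0.25)) / IQR."""
    q1, q2, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)

def moors_kurtosis(quantile):
    """Moors kurtosis, Eq. (5.11), built from the octiles and the IQR."""
    iqr = quantile(0.75) - quantile(0.25)
    return (quantile(0.875) - quantile(0.625)
            + quantile(0.375) - quantile(0.125)) / iqr

def exponential_quantile(q, rate=1.0):
    """Stand-in quantile function: exponential distribution (right-skewed)."""
    return -math.log1p(-q) / rate

def logistic_quantile(q):
    """Stand-in quantile function: standard logistic distribution (symmetric)."""
    return math.log(q / (1.0 - q))

print("exponential:", bowley_skewness(exponential_quantile),
      moors_kurtosis(exponential_quantile))
print("logistic:   ", bowley_skewness(logistic_quantile),
      moors_kurtosis(logistic_quantile))
```

For the exponential stand-in the Bowley skewness equals $\log(4/3)/\log 3 \approx 0.262$ (right-skewed), while for the symmetric logistic stand-in it is zero, in line with the interpretation of $S_{\hat{\Delta}}$ given above. Passing the LE-IE quantile function of Eq. (5.8) instead would give the LE-IE values.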
3. Reliability measures
This section reviews some of the most common reliability measures. In practice, the most relevant measures for a particular device must be selected based on the specifics of the manufacturer and the intended usage.
3.1 Failure function
The failure function, a basic reliability measure, gives the probability that an item fails at or before operating time $t$ [21,22]. Based on the operating pattern
Figure 5.3 Fluctuations of the median, skewness, and kurtosis of the LE-IE model along $\mu$ and $\xi$ at different levels of $\lambda$.
and system usage, $t$ is used in a broad sense and can have units such as number of cycles, miles, flight hours, or number of landings. That is, the failure function is the probability that the time-to-failure random variable $T$ satisfies $T \le t$ (here, $t$ is the operational time). The failure function of the LE-IE reliability model is
\[
F(t \mid \hat{\Delta}) = 1 - \frac{1}{\left[ \exp\left( \xi \exp\left( -\frac{\lambda}{t} \right) \right) - 1 \right]^{\mu} + 1}. \tag{5.12}
\]
3.2 Reliability function
Reliability is the capability of an item to perform its intended function for a given period of time under specified operating conditions [23]. The reliability function $R(t)$ gives the probability that the device does not fail within the time $t$ under those conditions. If the time to failure (TTF) is the random variable and $F(t)$ denotes the failure function, then the reliability function is $R(t) = 1 - F(t)$, which for the LE-IE model is
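\[
R(t \mid \hat{\Delta}) = \frac{1}{\left[ \exp\left( \xi \exp\left( -\frac{\lambda}{t} \right) \right) - 1 \right]^{\mu} + 1}. \tag{5.13}
\]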
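3.3 Hazard function
In reliability theory, the failure rate function [24–26], also called the hazard function or hazard rate, is used to compare distinct designs. The hazard function measures how age affects the system's reliability: it quantifies the risk of failure as the system gets older. The hazard $h(t)$ is not a probability; rather, it is the limiting value of the conditional probability of failure per unit time, given survival up to time $t$. The hazard function of the LE-IE reliability model is
\[
h(t \mid \hat{\Delta}) = \frac{\mu \xi \dfrac{\lambda}{t^{2}} \exp\left( -\dfrac{\lambda}{t} \right) \exp\left( \xi \exp\left( -\dfrac{\lambda}{t} \right) \right) \left[ \exp\left( \xi \exp\left( -\dfrac{\lambda}{t} \right) \right) - 1 \right]^{\mu - 1}}{\left[ \exp\left( \xi \exp\left( -\dfrac{\lambda}{t} \right) \right) - 1 \right]^{\mu} + 1}. \tag{5.14}
\]
3.4 Mills ratio
The Mills ratio is not a common way to describe reliability, but it is noteworthy for its link to the failure rate [27]: it is the reciprocal of the hazard function.
\[
\Psi(t \mid \hat{\Delta}) = \frac{R(t \mid \hat{\Delta})}{f(t \mid \hat{\Delta})} = \frac{\left[ \exp\left( \xi \exp\left( -\dfrac{\lambda}{t} \right) \right) - 1 \right]^{\mu} + 1}{\mu \xi \dfrac{\lambda}{t^{2}} \exp\left( -\dfrac{\lambda}{t} \right) \exp\left( \xi \exp\left( -\dfrac{\lambda}{t} \right) \right) \left[ \exp\left( \xi \exp\left( -\dfrac{\lambda}{t} \right) \right) - 1 \right]^{\mu - 1}}. \tag{5.15}
\]
3.5 Cumulative hazard rate function
The cumulative hazard rate function (CHRF) is also called the integrated HRF. The CHRF is not a probability [28]. It is, however, a measure of risk: the higher the value of $H(t)$, the higher the risk of failure by time $t$ (Fig. 5.4).
\[
H(t) = \int_{0}^{t} h(y \mid \hat{\Delta})\, dy = -\log[S(t)], \tag{5.16}
\]
where $S(t) = R(t)$ denotes the survival (reliability) function. It is noted that
\[
S(t) = e^{-H(t)} \quad \text{and} \quad f(t) = h(t)\, e^{-H(t)}. \tag{5.17}
\]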
Figure 5.4 Fluctuations of the CHRF of the LE-IE distribution.
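Therefore,
\[
H(t \mid \hat{\Delta}) = \log\left\{ \left[ \exp\left( \xi \exp\left( -\frac{\lambda}{t} \right) \right) - 1 \right]^{\mu} + 1 \right\}. \tag{5.18}
\]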
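The measures of Sections 3.1 to 3.5 are tied together by a few identities: $R(t) + F(t) = 1$, $H(t) = -\log R(t)$, $h(t) = dH(t)/dt$, and the Mills ratio is $1/h(t)$. The sketch below evaluates them numerically. It assumes the forms of Eqs. (5.12) to (5.18) with the inner term $\exp(-\lambda/t)$, which is our reading of the formulas; the function names and the evaluation point are hypothetical choices for illustration.

```python
import math

def delta(t, lam, xi):
    """Bracketed term exp(xi * exp(-lam / t)) - 1 shared by Eqs. (5.12)-(5.18)."""
    return math.expm1(xi * math.exp(-lam / t))

def reliability(t, lam, xi, mu):
    """R(t), assumed form of Eq. (5.13)."""
    return 1.0 / (1.0 + delta(t, lam, xi) ** mu)

def failure(t, lam, xi, mu):
    """F(t) = 1 - R(t), assumed form of Eq. (5.12)."""
    return 1.0 - reliability(t, lam, xi, mu)

def hazard(t, lam, xi, mu):
    """h(t), assumed form of Eq. (5.14)."""
    u = math.exp(-lam / t)
    d = math.expm1(xi * u)
    numerator = mu * xi * (lam / t ** 2) * u * math.exp(xi * u) * d ** (mu - 1.0)
    return numerator / (1.0 + d ** mu)

def mills_ratio(t, lam, xi, mu):
    """Mills ratio R(t) / f(t) = 1 / h(t), Eq. (5.15)."""
    return 1.0 / hazard(t, lam, xi, mu)

def cumulative_hazard(t, lam, xi, mu):
    """H(t) = log(1 + delta^mu), assumed form of Eq. (5.18)."""
    return math.log1p(delta(t, lam, xi) ** mu)

# Illustrative evaluation point (one of the parameter settings used later in Section 4.3).
t, lam, xi, mu = 1.5, 2.0, 3.8, 1.4
print("R + F      :", reliability(t, lam, xi, mu) + failure(t, lam, xi, mu))
print("H, -log R  :", cumulative_hazard(t, lam, xi, mu),
      -math.log(reliability(t, lam, xi, mu)))
print("h, 1/Mills :", hazard(t, lam, xi, mu), 1.0 / mills_ratio(t, lam, xi, mu))
```

Checking the hazard against a finite-difference derivative of the cumulative hazard is a cheap way to catch sign or exponent slips when transcribing such formulas.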
3.6 Mean time to failure (MTTF) and mean time to repair (MTTR)
MTTF and MTTR are reliability terms that refer to methods for predicting a product's longevity. When deciding which product to buy for an application, users frequently need to consider reliability data. MTTF and MTTR summarize a set of data as a single number that quantifies, respectively, the expected time until a failure occurs and the expected time needed to restore performance. Furthermore, predicting the MTTF and MTTR is required in order to develop and produce a sustainable system [29].
If $T \sim \text{LE-IE}(\hat{\Delta})$, then the MTTF is expressed through the reliability function as
\[
\mathrm{MTTF} = \int_{0}^{+\infty} R(t)\, dt, \tag{5.19}
\]
where $R(t)$ is given in Eq. (5.13).
The mean time to repair (MTTR) is the average time it takes to repair a failed device, and its value is determined by maintenance conditions [30,31]. The mean life is a summary measure that no longer contains the time-dependent information carried by $f(t)$, $F(t)$, $R(t)$, and $h(t)$. If we assume that the repair time $t$ of the system follows the LE-IE distribution with parameters $\lambda$, $\mu$, and $\xi$, the MTTR is
\[
\mathrm{MTTR} = \int_{0}^{+\infty} t\, f(t)\, dt, \tag{5.20}
\]
where $f(t)$ is given in Eq. (5.2). The MTTR is difficult to compute, and it is usually established empirically from records of previous repairs [30,32].
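Integrals such as Eqs. (5.19) and (5.20) are usually evaluated numerically. The sketch below uses a composite Simpson rule on a truncated interval. It is illustrated with an exponential lifetime model as a stand-in, because its mean $1/\text{rate}$ is known exactly; the truncation point `upper` and all function names are assumptions of this sketch, and truncation is only meaningful when the integrand decays fast enough for the integral to converge.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0

def mttf(reliability, upper):
    """Eq. (5.19): integral of R(t) over [0, upper], upper standing in for infinity."""
    return simpson(reliability, 0.0, upper)

def mean_time(density, upper):
    """Eq. (5.20): integral of t * f(t) over [0, upper]."""
    return simpson(lambda t: t * density(t), 0.0, upper)

# Stand-in model: exponential lifetimes with rate 0.5, so the exact mean is 2.
rate = 0.5
exp_reliability = lambda t: math.exp(-rate * t)
exp_density = lambda t: rate * math.exp(-rate * t)

print("integral of R(t):    ", mttf(exp_reliability, 80.0))
print("integral of t * f(t):", mean_time(exp_density, 80.0))
```

Both integrals return the same value for the stand-in model, which reflects the general identity $\int_0^\infty R(t)\,dt = \int_0^\infty t f(t)\,dt$ for nonnegative lifetimes with a finite mean.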
4. Estimation inference via simulation
This section considers estimation for the LE-IE distribution when the parameter vector is unknown. The parameter vector is estimated by two well-known methods, maximum likelihood estimation (MLE) [33,34] and least square estimation (LSE). From now on, $t_1, t_2, \ldots, t_n$ denote $n$ observed values of $T$, and $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$ denote their values in ascending order.
4.1 Maximum likelihood estimation (MLE)
Let $t_1, t_2, \ldots, t_n$ be $n$ observed values from the LE-IE model and $\hat{\Delta}$ be the vector of unknown parameters. The MLEs of $\hat{\Delta}$ are obtained by maximizing, with respect to $\lambda$, $\xi$, and $\mu$, the likelihood function $L(t \mid \hat{\Delta}) = \prod_{i=1}^{n} f(t_i; \hat{\Delta})$ or, equivalently, the log-likelihood function
\[
\ell(t \mid \hat{\Delta}) = \ln \prod_{i=1}^{n} f(t_i; \hat{\Delta}), \tag{5.21}
\]
\[
\begin{aligned}
\ell(t \mid \hat{\Delta}) = {} & n \log(\lambda) + n \log(\xi) + n \log(\mu) - 2 \sum_{i=1}^{n} \log t_i - \lambda \sum_{i=1}^{n} t_i^{-1} + \xi \sum_{i=1}^{n} \exp\left( -\frac{\lambda}{t_i} \right) \\
& + (\mu - 1) \sum_{i=1}^{n} \log\left[ \exp\left( \xi \exp\left( -\frac{\lambda}{t_i} \right) \right) - 1 \right] - 2 \sum_{i=1}^{n} \log\left\{ 1 + \left[ \exp\left( \xi \exp\left( -\frac{\lambda}{t_i} \right) \right) - 1 \right]^{\mu} \right\}.
\end{aligned} \tag{5.22}
\]
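Maximum likelihood estimators of the parameters are obtained by maximizing the log-likelihood function, that is, by solving the nonlinear equations $\partial \ell(t \mid \hat{\Delta}) / \partial \lambda = 0$, $\partial \ell(t \mid \hat{\Delta}) / \partial \xi = 0$, and $\partial \ell(t \mid \hat{\Delta}) / \partial \mu = 0$, where
\[
\frac{\partial \ell(t \mid \hat{\Delta})}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} \frac{1}{t_i} - \xi \sum_{i=1}^{n} \frac{\exp(-\lambda / t_i)}{t_i} + (1 - \mu)\, \xi \sum_{i=1}^{n} \frac{a_i b_i}{t_i} + 2 \mu \xi \sum_{i=1}^{n} \frac{a_i \delta_i^{\mu - 1}}{t_i \left( 1 + \delta_i^{\mu} \right)}, \tag{5.23}
\]
\[
\frac{\partial \ell(t \mid \hat{\Delta})}{\partial \xi} = \frac{n}{\xi} + \sum_{i=1}^{n} \exp\left( -\frac{\lambda}{t_i} \right) + (\mu - 1) \sum_{i=1}^{n} a_i b_i - 2 \mu \sum_{i=1}^{n} \frac{a_i \delta_i^{\mu - 1}}{1 + \delta_i^{\mu}}, \tag{5.24}
\]
\[
\frac{\partial \ell(t \mid \hat{\Delta})}{\partial \mu} = \frac{n}{\mu} + \sum_{i=1}^{n} \log(\delta_i) - 2 \sum_{i=1}^{n} \frac{\delta_i^{\mu} \log(\delta_i)}{1 + \delta_i^{\mu}}, \tag{5.25}
\]
where
\[
a_i = \exp\left( -\frac{\lambda}{t_i} \right) \exp\left( \xi \exp\left( -\frac{\lambda}{t_i} \right) \right), \qquad b_i = \frac{1}{\delta_i}, \qquad \delta_i = \exp\left( \xi \exp\left( -\frac{\lambda}{t_i} \right) \right) - 1. \tag{5.26}
\]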
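Because hand-derived score equations are error-prone, it is useful to check them against finite differences of the log-likelihood. The sketch below codes the log-likelihood in the form of Eq. (5.22) and the three partial derivatives in the form of Eqs. (5.23) to (5.25), assuming the inner term $\exp(-\lambda/t_i)$. The data values, parameter values, and function names are hypothetical and serve only as an illustration.

```python
import math

def log_likelihood(data, lam, xi, mu):
    """Log-likelihood in the assumed form of Eq. (5.22)."""
    total = 0.0
    for t in data:
        u = math.exp(-lam / t)
        d = math.expm1(xi * u)  # delta_i of Eq. (5.26)
        total += (math.log(lam) + math.log(xi) + math.log(mu) - 2.0 * math.log(t)
                  - lam / t + xi * u + (mu - 1.0) * math.log(d)
                  - 2.0 * math.log1p(d ** mu))
    return total

def score(data, lam, xi, mu):
    """Partial derivatives of the log-likelihood with respect to (lam, xi, mu)."""
    n = len(data)
    d_lam, d_xi, d_mu = n / lam, n / xi, n / mu
    for t in data:
        u = math.exp(-lam / t)
        a = u * math.exp(xi * u)   # a_i of Eq. (5.26)
        d = math.expm1(xi * u)     # delta_i of Eq. (5.26)
        w = d ** mu
        common = (mu - 1.0) * a / d - 2.0 * mu * a * d ** (mu - 1.0) / (1.0 + w)
        d_lam += -1.0 / t - xi * u / t - xi * common / t
        d_xi += u + common
        d_mu += math.log(d) - 2.0 * w * math.log(d) / (1.0 + w)
    return d_lam, d_xi, d_mu

def numerical_score(data, lam, xi, mu, eps=1e-6):
    """Central finite differences of the log-likelihood, used only as a check."""
    f = log_likelihood
    return ((f(data, lam + eps, xi, mu) - f(data, lam - eps, xi, mu)) / (2 * eps),
            (f(data, lam, xi + eps, mu) - f(data, lam, xi - eps, mu)) / (2 * eps),
            (f(data, lam, xi, mu + eps) - f(data, lam, xi, mu - eps)) / (2 * eps))

# Hypothetical observations and parameter values, for illustration only.
sample = [0.8, 1.2, 1.9, 2.5, 3.1]
print(score(sample, 2.0, 3.8, 1.4))
print(numerical_score(sample, 2.0, 3.8, 1.4))
```

The analytic and numerical scores should agree to several digits; a root of the score (for example, found with a Newton-type iteration from a general-purpose optimizer) would then be a candidate MLE.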
These equations cannot be solved analytically, but they can be solved numerically with statistical software using an iterative approach such as Newton's method or fixed-point iteration. To determine related quantities such as the variance-covariance matrix and confidence intervals for the parameters, we require the information matrix, which can be obtained by taking the expectation of the negative second-order derivatives of the log-likelihood.
4.2 Least square estimation (LSE)
The LSE method was introduced in Ref. [35] to estimate the parameters of the beta model. Least square estimates are obtained by minimizing the following function.
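\[
\mathrm{LSE}(\hat{\Delta}) = \sum_{i=1}^{n} \left[ F(t_{(i)} \mid \hat{\Delta}) - \frac{i}{n+1} \right]^{2}, \tag{5.27}
\]
\[
\mathrm{LSE}(\hat{\Delta}) = \sum_{i=1}^{n} \left[ 1 - \frac{1}{\left[ \exp\left( \xi \exp\left( -\lambda / t_{(i)} \right) \right) - 1 \right]^{\mu} + 1} - \frac{i}{n+1} \right]^{2}. \tag{5.28}
\]
Minimizing $\mathrm{LSE}(\hat{\Delta})$ with respect to $\lambda$, $\xi$, and $\mu$ gives the following system of nonlinear equations:
\[
\frac{\partial\, \mathrm{LSE}(\hat{\Delta})}{\partial \lambda} = -2 \sum_{i=1}^{n} \frac{\mu \xi a_i \delta_i^{\mu - 1}}{t_{(i)} \left( 1 + \delta_i^{\mu} \right)^{2}} \left[ 1 - \frac{1}{1 + \delta_i^{\mu}} - \frac{i}{n+1} \right] = 0, \tag{5.29}
\]
\[
\frac{\partial\, \mathrm{LSE}(\hat{\Delta})}{\partial \xi} = 2 \sum_{i=1}^{n} \frac{\mu a_i \delta_i^{\mu - 1}}{\left( 1 + \delta_i^{\mu} \right)^{2}} \left[ 1 - \frac{1}{1 + \delta_i^{\mu}} - \frac{i}{n+1} \right] = 0, \tag{5.30}
\]
and
\[
\frac{\partial\, \mathrm{LSE}(\hat{\Delta})}{\partial \mu} = 2 \sum_{i=1}^{n} \frac{\delta_i^{\mu} \log \delta_i}{\left( 1 + \delta_i^{\mu} \right)^{2}} \left[ 1 - \frac{1}{1 + \delta_i^{\mu}} - \frac{i}{n+1} \right] = 0, \tag{5.31}
\]
where $a_i$ and $\delta_i$ are given in Eq. (5.26), evaluated at the ordered observations $t_{(i)}$. To obtain the estimates, this system of nonlinear equations can be solved numerically with any suitable software.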
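The least square criterion of Eq. (5.27) compares the fitted CDF at the ordered observations with the plotting positions $i/(n+1)$, and it can be coded for any CDF. The sketch below uses a one-parameter exponential CDF as a stand-in and a coarse grid search instead of solving Eqs. (5.29) to (5.31); the artificial data are constructed so that the objective is essentially zero at the true rate. All names and values are hypothetical.

```python
import math

def lse_objective(cdf, data):
    """Eq. (5.27): sum of squared gaps between F(t_(i)) and i / (n + 1)."""
    xs = sorted(data)
    n = len(xs)
    return sum((cdf(x) - i / (n + 1)) ** 2 for i, x in enumerate(xs, start=1))

def exponential_cdf(rate):
    """Stand-in model: CDF of an exponential distribution with the given rate."""
    return lambda t: 1.0 - math.exp(-rate * t)

# Artificial data placed exactly at the i / (n + 1) quantiles of the true model.
n, true_rate = 9, 2.0
data = [-math.log(1.0 - i / (n + 1)) / true_rate for i in range(1, n + 1)]

# Coarse grid search over candidate rates 0.5, 0.6, ..., 4.0.
candidates = [k / 10 for k in range(5, 41)]
best_rate = min(candidates, key=lambda r: lse_objective(exponential_cdf(r), data))
print("LSE rate estimate:", best_rate)
```

For the three-parameter LE-IE model, the same objective with the CDF of Eq. (5.12) would be passed to a multivariate numerical minimizer in place of the grid search.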
4.3 Simulation study
A brief simulation study evaluates the performance of the maximum likelihood and least square methods for estimating the parameters. The estimators are evaluated at two parameter settings, $(\lambda, \xi, \mu) = (2, 3.8, 1.4)$ and $(3, 2.5, 4.5)$. The density functions of the LE-IE model for these choices are shown in Fig. 5.5. The performance of the MLE and LSE approaches is evaluated for various sample sizes $n$ through the biases and MSEs of the parameter estimates. The simulation study follows these steps:
1. Using the LE-IE model, generate 1000 samples of size $n = 10, 20, \ldots, 900$ at each set of parameter values.
2. Compute the MLEs and LSEs for the 1000 samples, say $\hat{\theta}_j$ for $j = 1, 2, \ldots, 1000$.
3. Calculate the biases and MSEs using the following formulas:
\[
\mathrm{Bias}_{\theta}(n) = \frac{1}{1000} \sum_{j=1}^{1000} \left( \hat{\theta}_j - \theta \right), \tag{5.32}
\]
\[
\mathrm{MSE}_{\theta}(n) = \frac{1}{1000} \sum_{j=1}^{1000} \left( \hat{\theta}_j - \theta \right)^{2}, \tag{5.33}
\]
where $\theta = (\lambda, \xi, \mu)$. The simulation results are shown in Figs. 5.6–5.9. These empirical findings show that the proposed estimation methods do a good job of estimating
Figure 5.5 Fluctuation of theoretical and simulated densities under different parametric values.
Figure 5.6 Fluctuations of the bias of the estimates under MLE and LSE for parameter values $\lambda = 2$, $\xi = 3.8$, and $\mu = 1.4$.
the LE-IE distribution's model parameters. Because the bias tends to zero as $n$ increases, we can deduce that the estimators are asymptotically unbiased. The mean squared error, in turn, indicates consistency because the errors tend to zero as $n$ increases. From Figs. 5.6–5.9, the following observations can be made.
Figure 5.7 Fluctuations of the MSE of the estimates under MLE and LSE for parameter values $\lambda = 2$, $\xi = 3.8$, and $\mu = 1.4$.
• The bias of $\hat{\lambda}$, $\hat{\xi}$, and $\hat{\mu}$ decreases as $n$ increases for both estimation approaches.
• For both estimation approaches, the biases are generally positive, except the bias of $\hat{\xi}$ under the MLE approach.
Figure 5.8 Fluctuations of the bias of the estimates under MLE and LSE for parameter values $\lambda = 3$, $\xi = 2.5$, and $\mu = 4.5$.
• For the LSE approach, the bias of the parameters is higher than for the MLE approach (Figs. 5.6 and 5.8).
• Under the LSE approach, higher MSEs of $\hat{\xi}$ and $\hat{\mu}$ are observed (Figs. 5.7 and 5.9).
• In terms of bias, the LSE approach performs worst (Figs. 5.6 and 5.8).
• The estimates under the LSE approach are mostly overestimated.
Figure 5.9 Fluctuations of the MSE of the estimates under MLE and LSE for parameter values $\lambda = 3$, $\xi = 2.5$, and $\mu = 4.5$.
• The difference between the estimates and the assumed parameter values shrinks toward zero as the sample size increases under both estimation approaches.
• Compared with the LSE approach, the MLE approach performs better in terms of bias and MSE for all specified parameter values as the sample size grows.
The general conclusion from the previous figures is that, as the sample size increases, the bias and MSE curves for all parameters approach zero. This supports the accuracy of these estimation methods, as well as of the numerical computations, for the LE-IE distribution parameters.
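The simulation steps and Eqs. (5.32) and (5.33) translate into a short Monte Carlo loop. The sketch below uses an exponential model as a stand-in, because its rate has the closed-form MLE $n / \sum t_i$; for the LE-IE model, the sampling step and the estimator would be replaced by an LE-IE sampler and a numerical MLE or LSE routine. The function names, the seed, and the sample sizes are hypothetical choices.

```python
import random

def bias(estimates, true_value):
    """Eq. (5.32): average deviation of the estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def mse(estimates, true_value):
    """Eq. (5.33): average squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def simulate(n, replications=1000, rate=2.0, seed=0):
    """Steps 1-3 for a stand-in exponential model with the given rate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.expovariate(rate) for _ in range(n)]  # step 1: generate a sample
        estimates.append(n / sum(sample))                   # step 2: MLE of the rate
    return bias(estimates, rate), mse(estimates, rate)      # step 3: bias and MSE

for n in (10, 50, 200):
    b, m = simulate(n)
    print(f"n = {n:4d}   bias = {b:.4f}   MSE = {m:.4f}")
```

In this stand-in setting the MSE shrinks as $n$ grows, which is the same qualitative pattern that Figs. 5.6–5.9 report for the LE-IE estimators.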
5. Real data applications
In this section, we use three real-world datasets to demonstrate the empirical usefulness of the LE-IE distribution. The fit of the LE-IE distribution is compared with that of competing models: the inverse exponential (IEx), exponentiated exponential family (EExF) [36], generalized half logistic (GHL) [37], exponentiated exponential logistic (EExL) [38], exponentiated Weibull (EW) [39], and exponential (Ex) distributions. For each distribution, we report the MLEs of the parameters and goodness-of-fit statistics. The fitted distributions are compared using the Anderson–Darling (A*), Cramér–von Mises (W*), and Kolmogorov–Smirnov (K–S) statistics, the negative log-likelihood (−LL), and the p-value (PV) of the K–S test. In general, the lower the A*, W*, K–S, and −LL values and the higher the PV, the better the fit.
Data set I: The first dataset is from Ref. [40] and contains the failure times of 20 mechanical components; it was also investigated in Ref. [41].
Data set II: The second dataset is from Ref. [42] and comprises the relief times of 20 patients who were taking an analgesic.
Data set III: The third dataset gives the times between successive failures of the air-conditioning unit on the Boeing 720 aircraft-4 (see [43]).
Tables 5.1–5.3 provide the MLEs for each dataset, together with their standard errors (in brackets) and goodness-of-fit (GOF) measures. Tables 5.1–5.3 indicate that the LE-IE model provides the best overall fit among the models examined, with the largest PV for every dataset. For datasets I–III, some of the other models also work fairly well, but we look for the most appropriate model. Although the performance of the IEx model is nearly identical to that of the LE-IE model for dataset III, the LE-IE model has a slight advantage in GOF measures such as W* and PV; such minor differences can matter in survival analysis in engineering or the biosciences. We therefore recommend analyzing these data using the LE-IE model. Figs. 5.10–5.12 show the empirical CDF (ECDF), probability–probability (PP), and fitted PDF (FPDF) plots for datasets I, II, and III, respectively, which support the findings of Tables 5.1–5.3 and suggest that the datasets could have come from the LE-IE model. Figs. 5.13–5.15 show the profiles of the log-likelihood function (PLLF) for datasets I, II, and III. They indicate that the estimators have a unique solution, since the LL function profiles are unimodal.
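To make the comparison criteria concrete, the sketch below computes the Kolmogorov–Smirnov statistic, the Cramér–von Mises statistic, and the negative log-likelihood for a fitted model. These are the standard textbook forms; the A* and W* values reported in the tables may be modified versions of these statistics, which the text does not specify. The data and the exponential stand-in model, fitted by its sample mean, are hypothetical.

```python
import math

def ks_statistic(cdf, data):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

def cramer_von_mises(cdf, data):
    """Cramer-von Mises statistic in its standard (unmodified) form."""
    xs = sorted(data)
    n = len(xs)
    return 1.0 / (12.0 * n) + sum((cdf(x) - (2 * i - 1) / (2.0 * n)) ** 2
                                  for i, x in enumerate(xs, start=1))

def negative_log_likelihood(log_pdf, data):
    """-LL: minus the sum of the log-density over the observations."""
    return -sum(log_pdf(x) for x in data)

# Hypothetical lifetimes and a stand-in exponential model fitted by its MLE (the mean).
lifetimes = [0.4, 0.9, 1.3, 2.1, 3.6]
mean = sum(lifetimes) / len(lifetimes)
fitted_cdf = lambda t: 1.0 - math.exp(-t / mean)
fitted_log_pdf = lambda t: -math.log(mean) - t / mean

print("K-S :", ks_statistic(fitted_cdf, lifetimes))
print("CvM :", cramer_von_mises(fitted_cdf, lifetimes))
print("-LL :", negative_log_likelihood(fitted_log_pdf, lifetimes))
```

Repeating these computations for each candidate distribution, with its own fitted CDF and log-density, gives one row of a comparison table of the kind shown in Tables 5.1–5.3.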
Table 5.1 MLEs (SEs in brackets), −LL, and GOF statistics for dataset I (PV in brackets).

| Distribution | MLEs [standard errors] | W* | A* | K–S [PV] | −LL |
|---|---|---|---|---|---|
| LE-IE (λ, ξ, μ) | 0.03527, 0.988099, 11.94391 [0.04611, 0.45716, 15.75017] | 0.04735 | 0.29431 | 0.12979 [0.8976] | −38.68692 |
| IEx (α) | 0.10050 [0.02247] | 0.10167 | 0.77391 | 0.41642 [0.00194] | −23.24193 |
| EExF (α, β) | 13.82430, 27.74993 [8.38010, 6.11076] | 0.17579 | 1.25096 | 0.16029 [0.6831] | −32.97643 |
| GHL (λ, κ) | 8.23904, 29.19975 [4.64849, 5.87642] | 0.18003 | 1.27622 | 0.16063 [0.6805] | −32.64434 |
| EExL (α, β, δ) | 10.96276, 0.207398, 0.00810 [5.92904, 0.09467, 0.00312] | 0.18640 | 1.31389 | 0.1733 [0.5852] | −32.90148 |
| EW (α, β, δ) | 44.97143, 0.70626, 69.92967 [38.45480, 0.10542, 36.16434] | 0.12866 | 0.95692 | 0.16573 [0.642] | −34.93935 |
| Ex (β) | 0.12156 [0.02717] | 0.291196 | 1.902484 | 0.42374 [0.0015] | −22.14859 |
According to Table 5.4, datasets I and II are underdispersed (index of dispersion < 1), whereas dataset III is overdispersed (index of dispersion > 1). Furthermore, datasets I, II, and III are positively skewed, with leptokurtic or platykurtic shapes.
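The index of dispersion is the ratio of the variance to the mean, so the classification above can be reproduced directly from the means and variances reported in Table 5.4. The short sketch below does this; the function name is a hypothetical choice.

```python
def index_of_dispersion(mean, variance):
    """Index of dispersion: variance divided by the mean."""
    return variance / mean

# (mean, variance) pairs as reported in Table 5.4.
summary = {"I": (0.12, 0.0081), "II": (1.9, 0.49), "III": (121.27, 23799.23)}

for name, (mean, variance) in summary.items():
    iod = index_of_dispersion(mean, variance)
    label = "underdispersed" if iod < 1 else "overdispersed"
    print(f"Data set {name}: index of dispersion = {iod:.4f} ({label})")
```

The computed ratios reproduce the index of dispersion column of Table 5.4 up to rounding.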
6. Conclusion
The exponential distribution is one of the most popular lifetime distributions in reliability analysis. The constant hrf of this distribution, however, is a limitation in lifetime analysis. Empirical hazard rate curves frequently take nonmonotonic shapes in real-world applications, such as a bathtub or an upside-down bathtub (unimodal) shape. There is therefore good reason to explore extensions or modifications of the inverse exponential distribution that provide additional flexibility in lifetime modeling. In this work, a novel flexible three-parameter extension that generalizes the inverse exponential distribution is given. It can also be used to describe over- and underdispersed data. As a result, the new extension can be used to describe various types of data in a variety of disciplines. The MLE and LSE techniques have been used to estimate the model parameters. A simulation
Table 5.2 MLEs (SEs in brackets), −LL, and GOF statistics for dataset II (PV in brackets).

| Distribution | MLEs [standard errors] | W* | A* | K–S [PV] | −LL |
|---|---|---|---|---|---|
| LE-IE (λ, ξ, μ) | 0.20505, 0.7807516, 38.2585499 [1.46793, 0.66491, 274.04126] | 0.02152 | 0.13229 | 0.08277 [0.9992] | 15.68423 |
| IEx (α) | 1.72474 [0.38566] | 0.04898 | 0.28523 | 0.38725 [0.0050] | 32.66867 |
| EExF (α, β) | 36.42812, 2.23106 [24.97785, 0.43422] | 0.05435 | 0.31903 | 0.13453 [0.8621] | 16.26066 |
| GHL (λ, κ) | 19.83630, 2.27363 [13.04629, 0.42273] | 0.05542 | 0.32598 | 0.13442 [0.8628] | 16.32756 |
| EExL (α, β, δ) | 36.22527, 0.33586, 0.15076 [24.75523, 1.31071, 0.58734] | 0.05444 | 0.31961 | 0.13470 [0.8611] | 16.2615 |
| EW (α, β, δ) | 60.62907, 0.87010, 3.154597 [100.28760, 0.31871, 3.27778] | 0.04926 | 0.28789 | 0.13117 [0.8611] | 16.05575 |
| Ex (β) | 1.90003 [0.42487] | 0.105375 | 0.624352 | 0.43951 [0.0009] | 32.83708 |
Table 5.3 MLEs (SEs in brackets), −LL, and GOF statistics for dataset III (PV in brackets).

| Distribution | MLEs [standard errors] | W* | A* | K–S [PV] | −LL |
|---|---|---|---|---|---|
| LE-IE (λ, ξ, μ) | 139.69242, 8.47791, 0.41628 [80.34309, 7.31333, 0.27061] | 0.04444 | 0.31622 | 0.14885 [0.8938] | 84.83179 |
| IEx (α) | 40.22334 [10.38562] | 0.04569 | 0.31389 | 0.15282 [0.8749] | 84.28381 |
| EExF (α, β) | 0.92915, 0.00785 [0.31691, 0.00272] | 0.17299 | 1.00158 | 0.26636 [0.2376] | 86.94371 |
| GHL (λ, κ) | 0.69850, 0.00860 [0.22188, 0.00279] | 0.20804 | 1.18088 | 0.29412 [0.1492] | 87.96417 |
| EExL (α, β, δ) | 0.92591, 0.01608, 2.05379 [0.31745, 0.03794, 4.84856] | 0.17302 | 1.00177 | 0.26605 [0.2388] | 86.94664 |
| EW (α, β, δ) | 37.20388, 0.27073, 2.72696 [62.32158, 0.10922, 9.67769] | 0.07550 | 0.482257 | 0.16522 [0.8075] | 84.64164 |
| Ex (β) | 121.2651 [31.31015] | 0.172441 | 0.998935 | 0.27656 [0.2014] | 86.96988 |
Figure 5.10 The ECDF (left panel), PP (middle panel), and FPDF (right panel) plots for data set I.
Figure 5.11 The ECDF (left panel), PP (middle panel), and FPDF (right panel) plots for data set II.
Figure 5.12 The ECDF (left panel), PP (middle panel), and FPDF (right panel) plots for data set III.
Figure 5.13 The PLLF for data set I.
Figure 5.14 The PLLF for data set II.
Figure 5.15 The PLLF for data set III.
Table 5.4 Descriptive statistics for data sets I, II, and III.

| Data set | Mean | Median | Variance | Index of dispersion | Range | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| I | 0.12 | 0.1 | 0.0081 | 0.0675 | 0.42 | 3.32 | 10.72 |
| II | 1.9 | 1.7 | 0.49 | 0.2579 | 3 | 31.59 | 2.35 |
| III | 121.27 | 57 | 23,799.23 | 196.25 | 490 | 1.37 | 0.33 |
has been carried out with various sample sizes, and it showed that the MLE technique works well for estimating the parameters. In most situations, the MLE technique performed well in terms of bias and MSE. Finally, three data applications that demonstrate the new extension's versatility and its advantage over other models have been examined. In future studies, other methods, such as the maximum product spacing (MPS), percentile (PE), Cramér–von Mises (CME), Anderson–Darling (ADE), L-moment (LME), and Bayesian estimators, might be applied to estimate the parameters of the LE-IE distribution, including under different censoring schemes.
References
[1] H.S. Klakattawi, The Weibull-gamma distribution: properties and applications, Entropy 21 (5) (2019) 438.
[2] M.M. Mansour, M. Ibrahim, K. Aidi, N. Shafique Butt, M.M. Ali, H.M. Yousof, M.S. Hamed, A new log-logistic lifetime model with mathematical properties, copula, modified goodness-of-fit test for validation and real data modeling, Mathematics 8 (9) (2020) 1508.
[3] A.S. Yadav, H. Goual, R.M. Alotaibi, M.M. Ali, H.M. Yousof, Validation of the Topp-Leone-Lomax model via a modified Nikulin-Rao-Robson goodness-of-fit test with different methods of estimation, Symmetry 12 (1) (2020) 57.
[4] T.N. Sindhu, A. Atangana, Reliability analysis incorporating exponentiated inverse Weibull distribution and inverse power law, Quality and Reliability Engineering International 37 (6) (2021) 2399–2422.
[5] T.N. Sindhu, A. Shafiq, Q.M. Al-Mdallal, On the analysis of number of deaths due to Covid-19 outbreak data using a new class of distributions, Results in Physics 21 (2021) 103747.
[6] T.N. Sindhu, A. Shafiq, Q.M. Al-Mdallal, Exponentiated transformation of Gumbel Type-II distribution for modeling COVID-19 data, Alexandria Engineering Journal 60 (1) (2021) 671–689.
[7] R.D. Gupta, D. Kundu, Generalized exponential distribution: different method of estimations, Journal of Statistical Computation and Simulation 69 (4) (2001) 315–337.
[8] M. Rasekhi, M. Alizadeh, E. Altun, G.G. Hamedani, A.Z. Afify, M. Ahmad, The modified exponential distribution with applications, Pakistan Journal of Statistics 33 (5) (2017).
[9] M. Mansoor, M.H. Tahir, G.M. Cordeiro, S.B. Provost, A. Alzaatreh, The Marshall-Olkin logistic-exponential distribution, Communications in Statistics - Theory and Methods 48 (2) (2019) 220–234.
[10] M.A. Aldahlan, A.Z. Afify, A new three-parameter exponential distribution with applications in reliability and engineering, The Journal of Nonlinear Science and Applications 13 (2020) 258–269.
[11] A.Z. Afify, A.M. Gemeay, N.A. Ibrahim, The heavy-tailed exponential distribution: risk measures, estimation, and application to actuarial data, Mathematics 8 (8) (2020) 1276.
[12] A.A. Al-Babtain, M.K. Shakhatreh, M. Nassar, A.Z. Afify, A new modified Kies family: properties, estimation under complete and type-II censored samples, and engineering applications, Mathematics 8 (8) (2020) 1345.
[13] M. Alizadeh, A.Z. Afify, M.S. Eliwa, S. Ali, The odd log-logistic Lindley-G family of distributions: properties, Bayesian and non-Bayesian estimation with applications, Computational Statistics 35 (1) (2020) 281–308.
[14] A. Mahdavi, D. Kundu, A new method for generating distributions with an application to exponential distribution, Communications in Statistics - Theory and Methods 46 (13) (2017) 6543–6557.
[15] W. Barreto-Souza, A.H. Santos, G.M. Cordeiro, The beta generalized exponential distribution, Journal of Statistical Computation and Simulation 80 (2) (2010) 159–172.
[16] Y. Lan, L.M. Leemis, The logistic–exponential survival distribution, Naval Research Logistics 55 (3) (2008) 252–264.
[17] A. Alzaatreh, C. Lee, F. Famoye, A new method for generating families of continuous distributions, Metron 71 (1) (2013) 63–79.
[18] R.J. Hyndman, Y. Fan, Sample quantiles in statistical packages, The American Statistician 50 (4) (1996) 361–365.
[19] A.L. Bowley, Elements of Statistics, fourth ed., Charles Scribner, New York, 1920.
[20] J.J.A. Moors, A quantile alternative for kurtosis, Journal of the Royal Statistical Society: Series D (The Statistician) 37 (1) (1988) 25–32.
[21] R.Q. Grafton, Cumulative density function (CDF), in: A Dictionary of Climate Change and the Environment, Edward Elgar Publishing Limited, 2012.
[22] D. Kipping, Inverse Beta: Inverse Cumulative Density Function (CDF) of a Beta Distribution, Astrophysics Source Code Library, ascl-1403, 2014.
[23] H. Wang, D. Zhou, F. Blaabjerg, A reliability-oriented design method for power electronic converters, in: 2013 Twenty-Eighth Annual IEEE Applied Power Electronics Conference and Exposition (APEC), IEEE, March 2013, pp. 2921–2928.
[24] M.R. Lyu, Handbook of Software Reliability Engineering, IEEE Computer Society Press, CA, 1996, vol. 222.
[25] K.R. Hess, V.A. Levin, Getting more out of survival data by using the hazard function, Clinical Cancer Research 20 (6) (2014) 1404–1409.
[26] A.K. Sheikh, M. Ahmad, Z. Ali, Some remarks on the hazard functions of the inverted distributions, Reliability Engineering 19 (4) (1987) 255–261.
[27] A. Gasull, F. Utzet, Approximating Mills ratio, Journal of Mathematical Analysis and Applications 420 (2) (2014) 1832–1853.
[28] O. Kharazmi, S. Jahangard, A new family of lifetime distributions in terms of cumulative hazard rate function, Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics 69 (1) (2020) 1–22.
[29] Y.K. Lee, D.S. Hwang, A study on the techniques of estimating the probability of failure, Journal of the Chungcheong Mathematical Society 21 (4) (2008) 573–583.
[30] A. Goel, A New Approach to Electronic Systems Reliability Assessment, Rensselaer Polytechnic Institute, 2007.
[31] R.F. Drenick, The failure law of complex equipment, Journal of the Society for Industrial and Applied Mathematics 8 (4) (1960) 680–690.
[32] P. Ramachandran, Limitations of the MTTF Metric for Architecture-Level Lifetime Reliability Analysis (Doctoral Dissertation, University of Illinois at Urbana-Champaign), 2007.
[33] I.J. Myung, Tutorial on maximum likelihood estimation, Journal of Mathematical Psychology 47 (1) (2003) 90–100.
[34] S.R. Eliason, Maximum Likelihood Estimation: Logic and Practice (No. 96), Sage, 1993.
[35] J.J. Swain, S. Venkatraman, J.R. Wilson, Least-squares estimation of distribution functions in Johnson's translation system, Journal of Statistical Computation and Simulation 29 (1988) 271–297.
[36] R.D. Gupta, D. Kundu, Exponentiated exponential family: an alternative to gamma and Weibull distributions, Biometrical Journal: Journal of Mathematical Methods in Biosciences 43 (1) (2001) 117–130.
[37] J.I. Seo, S.B. Kang, Notes on the exponentiated half logistic distribution, Applied Mathematical Modelling 39 (21) (2015) 6491–6500.
[38] I. Ghosh, A. Alzaatreh, A new class of generalized logistic distribution, Communications in Statistics - Theory and Methods 47 (9) (2018) 2043–2055.
[39] M. Pal, M.M. Ali, J. Woo, Exponentiated Weibull distribution, Statistica 66 (2) (2006) 139–147.
[40] D.P. Murthy, M. Xie, R. Jiang, Weibull Models, John Wiley & Sons, 2004, vol. 505.
[41] R.B. Silva, G.M. Cordeiro, The Burr XII power series distributions: a new compounding family, Brazilian Journal of Probability and Statistics 29 (3) (2015) 565–589.
[42] A.J. Gross, V. Clark, Survival Distributions: Reliability Applications in the Biomedical Sciences, John Wiley & Sons, 1975.
[43] J.F. Lawless, Statistical Models and Methods for Lifetime Data, John Wiley & Sons, 2011, vol. 362.
CHAPTER 6
Markov and semi-Markov models in system reliability
Ameneh Farahani¹, Ahmad Shoja² and Hamid Tohidi³
¹Department of Industrial Engineering, Ooj Institute of Higher Education, Qazvin, Iran; ²Department of Mathematics and Statistics, Roudehen Branch, Islamic Azad University, Roudehen, Iran; ³Department of Industrial Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran
1. The reliability in systems
Each system has goals, and its components interact to achieve them. Many systems include multiple, interchangeable units that must be active simultaneously to meet the purpose of the system. A system that is used to achieve goals must be reliable. Reliability quantifies the correct performance of a system: it equals the probability that the system operates for a specified period under stated conditions. Reliability measures the ability of a system to provide the desired level of service despite deterioration and other shocks that affect its performance. Systems can be machines, human–machine systems, or humans. In the literature, however, reliability is mostly applied to machine systems and to engineered or man-made products. In the past, reliability was discussed mainly in industries such as the military, communications, oil, gas, and aerospace. In recent decades, civilian sectors such as the pharmaceutical, health, transportation, medical, aviation, and home electronics industries have also focused on it. Reliability is therefore one of the essential characteristics of any industrial system. In some systems, reliability affects security performance, so it is vital. Measuring the overall reliability of a system depends on the performance goals and expectations. In general, however, a reliable system performs continuously and performs correctly, according to its design goals. A system with a high degree of reliability operates as it should every time it is used. Reliability has two essential dimensions: time and working conditions. A system must perform its desired function throughout its useful life, despite stresses, working conditions, and other environmental factors.
As is clear from the definition above, reliability theory is concerned with equipment life and failure time. Therefore, if the life span of equipment is considered a random variable T,
Engineering Reliability and Risk Assessment ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00010-1
© 2023 Elsevier Inc. All rights reserved.
according to probability theory, the reliability, denoted by $R(t)$, can be calculated as
\[
R(t) = P\{T > t\} = \int_{t}^{\infty} f(x)\, dx. \tag{6.1}
\]
In the above formula, $f(x)$ is the failure density function and $t$ denotes time. The formula gives the probability that the equipment life span is longer than the specified value $t$. Like any other computed quantity, the reliability of a system depends on parameters and specifications; measuring each of them brings the estimate closer to the true reliability of an assembly or part. Assuming that equipment can be in only one of two states, working or failed (a so-called binary system), the probability of being healthy is denoted by $R$ and satisfies
\[
0 \le R \le 1. \tag{6.2}
\]
As a function of time and operating conditions, it can be written as
\[
R(t) = P(T > t \mid c_1, c_2, \ldots). \tag{6.3}
\]
In the above formula, $c_1, c_2, \ldots$ are the specified operating conditions for the equipment or system, which are usually omitted in reliability analysis. Reliability is therefore defined as [190]
\[
R(t) = P(T > t). \tag{6.4}
\]
The reliability $R$ of equipment is the probability that the equipment meets its required characteristics over a certain period under certain operating conditions. Conversely, the nonreliability $F$ is the probability that the equipment does not meet the required characteristics over that period under those conditions. Both reliability and nonreliability depend on time. At time zero, the reliability of equipment that starts working is 1; after some time it falls to 0.5, and it reaches 0 when the equipment has failed completely. Correspondingly, the nonreliability starts from 0 and increases, reaching 1 when the system breaks down. At any given time, the sum of reliability and nonreliability equals 1. In other words, reliability is the probability of success in a given period, which is the complement of the probability of failure of the equipment or process in that period.
Markov and semi-Markov models in system reliability
$$R(t) + F(t) = 1 \tag{6.5}$$
One of the primary uses of statistics in describing the behavior of random phenomena lies in reliability theory. Since this theory is mainly used in engineering and in determining the life of equipment, it is sometimes called reliability engineering. According to probability theory, the degree of reliability is expressed as follows:
Reliability = 1 − Probability of Failure
Failure rate analysis and survival analysis are the parts of reliability theory that deal with the life span and failure rate of components. The failure rate indicates the frequency of failure of a component. In general, the failure rate is not constant and changes over time: it may be an increasing or decreasing function of time, or it may follow a bathtub curve. Therefore, different probability functions may be used to analyze survival and to determine the lifetime distribution of each component or system. Since the life span is a nonnegative number, the random variable and its distribution must take only nonnegative values.
There are several methods for modeling reliability: methods based on statistical information obtained from performance records and failure counts, and methods based on the study of the physics of failure. In the black-box method, the system is examined as a whole, whereas in the transparent-box method, the system's structure and components are examined. In calculating the reliability of multicomponent systems, the system is decomposed into components, and the system reliability is expressed in terms of the reliability of its components. To calculate the reliability of each component, a model for the failure rate is selected, and its parameters are estimated from available statistical data, by simulation, or from engineering knowledge and expert experience. In engineering methods, identifying and modeling the failure mechanism is very important.
The two main functions in investigating failure behavior are the failure density function (f) and the hazard rate (z). The failure density function indicates the overall rate at which failures occur, whereas the hazard rate can be considered the instantaneous rate of failure for equipment that has survived up to that time. The hazard rate is usually modeled as constant, linear, or polynomial. The constant hazard rate is sufficient in many applications and commonly represents random failure behavior (mid-life failures). In this model, the reliability function and the failure density are exponential. However, this model is not suitable for the aging period of a system, in which the hazard rate increases. In that case, the linear hazard model is used, and the failure density follows a Rayleigh distribution. The third model (polynomial hazard) is the more general and more accurate case. Here, the failure density follows a Weibull distribution, which is an important distribution in reliability. The first two models are, in fact, special cases of this model. The hazard rate is obtained from the following relation:
$$z(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)} \tag{6.6}$$
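The relation between the three hazard models can be illustrated with a short sketch. It assumes the standard two-parameter Weibull form with shape and scale parameters (the parameter values and function names are illustrative assumptions) and evaluates Eq. (6.6) directly as the ratio f(t)/R(t).

```python
import math

def weibull_density(t, shape, scale):
    """Weibull failure density f(t) for t > 0."""
    u = t / scale
    return (shape / scale) * u ** (shape - 1) * math.exp(-(u ** shape))

def weibull_reliability(t, shape, scale):
    """Weibull reliability R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def hazard_rate(t, shape, scale):
    """Hazard rate z(t) = f(t) / R(t), as in Eq. (6.6)."""
    return weibull_density(t, shape, scale) / weibull_reliability(t, shape, scale)

scale = 100.0  # illustrative scale parameter in hours (assumption)
cases = [
    (1.0, "constant hazard (exponential)"),
    (2.0, "linear hazard (Rayleigh)"),
    (3.5, "polynomial hazard (general Weibull)"),
]
for shape, label in cases:
    rates = [hazard_rate(t, shape, scale) for t in (10.0, 50.0, 90.0)]
    print(label, [f"{r:.5f}" for r in rates])
```

With shape 1 the hazard does not depend on t, and with shape 2 it grows linearly in t, which is the sense in which the first two models are special cases of the third.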
The hazard function is defined to analyze the life distribution using historical failure data and online condition-monitoring data. It is based on the failure rate, which is a function of the age and condition of the equipment. If past data on system failures are available, Gaussian mixture models and Weibull mixture models can be used to describe the time-series data. The failure of a component affects the system in two ways: the function of the failed component is lost, and the remaining system components must be reconfigured. A multicomponent system consists of several independent components, each with a lifetime distribution, and the system's behavior depends on the behavior of the components. Component failure can cause a subsystem or the entire system to fail. Systems may have functional components that can be repaired or replaced. After working for a while, the components are in various states of health due to different working conditions. In these systems, the system balance is determined by the performance levels of the components. In reliability models, there is usually a stochastic dependence between multiple system failure modes. In multicomponent systems, the probability distribution of the performance of each component must be estimated. When a system operates under unreliable conditions, several of its components may be involved. Different types of systems, with one component or more, have been proposed in the reliability literature. In multistate systems, the components of each subsystem can have different performance rates with specific probabilities. A multicomponent system can be a system with k statistically independent components with the same distribution. Each component is exposed to a common stochastic stress, and the system fails if the stress exceeds a defined level in at least a specified number of its components. The reliability of a system depends on all its components.
Different life distributions can be defined for components, leading to different failure rates. Systems can usually be thought of as series, parallel, or combined structures. Series systems contain components that are all essential to the operation of the system, so the failure of any component causes the failure of the entire system. For series systems, the total reliability depends on the reliability of every component. Because the reliability of each component is smaller than one, the reliability of the whole system decreases as the number of components increases. In parallel systems, each component can operate separately, and the system fails only if all components fail. Consequently, as the number of components n increases, the reliability of the system increases. However, adding parallel components is not always an efficient way to increase reliability, because the gain from each additional component becomes negligible after the first few. Hybrid systems include a combination of components in series and in parallel. In addition to these systems, selectable systems can be used for emergency stop systems. In these systems, the entire system starts operating as soon as at least one component performs, which is used to reduce
the likelihood of system failure. In standby systems, redundant components are arranged in parallel with the operating components, are maintained in standby mode, and take over if the normal operation of the system is disrupted. Standby systems include warm and cold standby systems. In warm standby systems, as soon as a component breaks down, the parallel warm standby component replaces it after a certain period of time. In cold standby systems, a component must first be brought to the warm standby state and then activated. Systems may contain one or more components, and if a system consists of several components, the components can be homogeneous or nonhomogeneous. A system can contain repairable components, nonrepairable components, or a mixture of both. Systems can first be subdivided into subsystems; the reliability of each subsystem can be obtained separately and then combined according to the system structure to give the reliability of the whole system. In sequential systems, the components work independently with probabilities that may not be the same. Systems are designed to perform different tasks in different environmental conditions. Systems can operate at different levels of efficiency; in other words, the performance rate of a system changes over time and with environmental conditions. Any system with a finite number of performance rates is called a Multistate System (MSS). The simplest systems have two operating states: a perfectly functioning state and a complete failure state. Systems usually act as multistate systems, consisting of a component or a set of components. Linear (circular) consecutive-k-out-of-n: F systems consist of n components arranged in a line (a circle). These systems fail if at least k consecutive components fail. The m-consecutive-k-out-of-n: F systems contain n components in a line and fail when there are at least m nonoverlapping runs of k consecutive failed components.
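The series, parallel, and linear consecutive-k-out-of-n: F structures described above can be compared with a small sketch. It assumes statistically independent components and uses plain state enumeration for the consecutive system, which is only practical for small n; the function names and the component reliabilities are illustrative assumptions.

```python
from itertools import product
from math import prod

def series_reliability(rels):
    """Series system: every component must work, so R = product of R_i."""
    return prod(rels)

def parallel_reliability(rels):
    """Parallel system: fails only if all components fail, so R = 1 - product of (1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in rels)

def linear_consecutive_k_out_of_n_f(rels, k):
    """Linear consecutive-k-out-of-n: F system, evaluated by enumerating all 2**n states.

    The system fails when at least k consecutive components are failed.
    """
    total = 0.0
    for state in product((True, False), repeat=len(rels)):  # True means "works"
        run = longest = 0
        for works in state:
            run = 0 if works else run + 1
            longest = max(longest, run)
        if longest < k:  # no run of k consecutive failures: system works
            p = 1.0
            for works, r in zip(state, rels):
                p *= r if works else 1.0 - r
            total += p
    return total

rels = [0.9, 0.9, 0.9]  # illustrative component reliabilities (assumption)
print("series:", series_reliability(rels))
print("parallel:", parallel_reliability(rels))
print("consecutive-2-out-of-3: F:", linear_consecutive_k_out_of_n_f(rels, 2))
```

The two extreme cases serve as a consistency check: k = 1 reproduces the series system and k = n reproduces the parallel system.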
A series-parallel system can consist of two identical units that are connected in series. An additional component can be placed in parallel with the main component to increase the reliability of each unit, so that each unit is considered as two fault-resistant components. If at least one of the series units fails, the whole system will fail. The redundancy technique is often used in the system design phase to improve the reliability of series-parallel (hybrid) systems. The use of additional components is subject to constraints on the system. By placing similar subsections of a system in parallel, the overall performance of the system in the event of a fault can be maintained. Increased reliability leads to longer life and reduced maintenance costs. Component mixing, redundancy levels, and redundancy strategies are used to maximize reliability. In the redundancy allocation problem, the goal is to identify the design that achieves the best system indicators under given constraints. The principles of redundancy and diversity can be used to guard against stochastic failures when deploying highly reliable equipment. In this way, the system is more reliable than the
components. Warm standby is one of the techniques used to reduce the risk of failure while the system is active. Switching between the active and standby components is a way to balance component degradation and increase system life. Continuous improvement of system performance is therefore very effective, but system failures remain inevitable. For a system to work without failure, every subsystem must work reliably, so that the system is available over a predefined period. Uncertainties are a significant challenge for system reliability, especially since reliability must be calculated dynamically. Reliability indicators usually include the reliability function, the distribution of the time to first failure, the mean time between failures (MTBF), the mean time to failure (MTTF), the failure and repair frequency functions, the point availability, the interval availability, and the system availability. In view of these indicators, characterizing the system failure process is very important for determining reliability: if the failure process of a system can be determined, its reliability can be predicted.
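Some of the indicators listed above have simple closed forms in the special case of a single repairable component with constant failure rate and constant repair rate. The sketch below assumes that case; the rates `lam` and `mu` and the function names are illustrative assumptions, and the MTTF is obtained by integrating the reliability function numerically.

```python
import math

def mttf_from_reliability(reliability, upper, steps=200_000):
    """MTTF = integral of R(t) from 0 to infinity, truncated at `upper` (trapezoidal rule)."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h

def point_availability(t, lam, mu):
    """A(t) for one component with constant failure rate lam and repair rate mu,
    assuming the component is working at time 0."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def steady_state_availability(lam, mu):
    """Long-run availability, equal to MTTF / (MTTF + MTTR) for constant rates."""
    return mu / (lam + mu)

lam, mu = 0.002, 0.1  # illustrative failure and repair rates per hour (assumptions)
mttf = mttf_from_reliability(lambda t: math.exp(-lam * t), upper=20_000.0)
print(f"MTTF = {mttf:.3f} h")
print(f"A(10 h) = {point_availability(10.0, lam, mu):.5f}")
print(f"steady-state availability = {steady_state_availability(lam, mu):.5f}")
```

For nonexponential lifetimes the same `mttf_from_reliability` routine applies, but the availability formulas no longer hold and a semi-Markov or simulation model is needed.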
2. Failure process of systems
In assessing reliability, modeling the degradation process is an important issue. The accuracy of the reliability indicators obtained from the models depends on the quality and accuracy of the failure model considered. Systems are prone to two types of failure: deterioration due to intrinsic degradation and catastrophic failure due to shock. A system is usually exposed to internal failures and external shocks that may be stochastic. The deterioration process and the shock process can be interrelated. Shocks can cause a sudden increase in damage, which accelerates the deterioration process. The severity of a shock depends on the number of shocks received (the distance between the shocks). A system failure process is usually defined as a random process and can be discrete or continuous. Component failure rates can be fixed or variable. A shock may damage several units simultaneously. Shock models can be defined in several ways: the system fails when the accumulated damage exceeds a certain level, when local damage is caused by successive shocks, or when both happen simultaneously. Shocks can be independent of each other, or their occurrences can be interrelated. In some models, the system fails when the time interval between two consecutive shocks is less than a predetermined level. As the degree of deterioration in shock environments increases, the failure process accelerates. Shocks may be of the same magnitude, but they become more severe for the system as it deteriorates over time. Shock and wear processes usually depend on the external environment; shocks are commonly assumed to arrive according to a Poisson process, and both the arrival rate and the magnitude of damage in a random environment should be considered. The shock model can be
considered in multicomponent systems in such a way that the shock potentially affects each of the components, and the system breaks down when the cumulative damage reaches a certain level. A pure jump Lévy process can model the intrinsic degradation of any component. In complex systems, the failure of a component may cause a momentary, transient shock to the system. The shock effect can be modeled as a stochastic increase in the level of deterioration of each component. Reliability analysis deals with random failures, the uncertainties associated with these failures, and failure modeling. The most common assumption in reliability is that failures are independent and identically distributed. These two hypotheses are often unrealistic, because the times between failures are correlated and not identically distributed. Finding the source of failure for each component can be used to assess the state of the system and to obtain the failure probability function. Failures of hybrid system components are usually analyzed using the Failure Mode and Effect Analysis (FMEA) method. Small failure probabilities in reliability problems can be obtained from an integral over a high-dimensional space of uncertain parameters. Usually, the deterioration of the system is not self-announcing and must be identified through inspection. The effects of inspection errors on reliability can also be considered in the modeling. The failure of one component can cause extensive damage to the system and lead to total system failure. Sometimes the failure of a component called the trigger component makes other components, called dependent components, inaccessible; if any of the dependent components fails before the trigger component, failure propagation occurs and the whole system fails. Different models have been used in the literature to examine the failure process and system life in terms of reliability.
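The cumulative shock idea described above can be sketched for the simplest possible case. The example assumes shocks that arrive as a homogeneous Poisson process and that each add the same fixed amount of damage, so the system fails at the n-th shock, where n is the number of shocks needed to reach the damage threshold. These simplifications, the parameter values, and the function names are assumptions made for illustration; random damage sizes or intrinsic degradation would require a richer model.

```python
import math

def shocks_to_failure(threshold, damage_per_shock):
    """Smallest number of shocks whose accumulated damage reaches the threshold."""
    return math.ceil(threshold / damage_per_shock)

def cumulative_shock_reliability(t, rate, threshold, damage_per_shock):
    """R(t) = P(N(t) < n), where N(t) is Poisson with mean rate * t
    and n is the number of shocks needed to cause failure."""
    n = shocks_to_failure(threshold, damage_per_shock)
    mean = rate * t
    term = math.exp(-mean)  # P(N(t) = 0)
    total = 0.0
    for k in range(n):
        total += term
        term *= mean / (k + 1)  # turns P(N(t) = k) into P(N(t) = k + 1)
    return total

rate = 0.5        # illustrative shock arrival rate per hour (assumption)
threshold = 10.0  # illustrative damage level that causes failure (assumption)
damage = 2.5      # illustrative fixed damage per shock (assumption)
for t in (0.0, 4.0, 10.0, 20.0):
    print(t, round(cumulative_shock_reliability(t, rate, threshold, damage), 5))
```

When a single shock is enough to cause failure, the expression reduces to the exponential reliability function exp(-rate * t).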
Markov and semi-Markov models have been widely used in modeling. In Markov models, the life span of equipment is considered exponential, and since the life span of some equipment is nonexponential, semi-Markov processes are used to consider the life span of equipment as nonexponential.
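The difference between the two model classes can be illustrated by simulation. The sketch below assumes a very simple semi-Markov process in which the system moves from a healthy state to a degraded state and then to a failed state, with Weibull-distributed (hence nonexponential) sojourn times; the state sequence, the parameter values, and the function names are illustrative assumptions. Reliability is estimated by Monte Carlo as the fraction of simulated lifetimes that exceed t.

```python
import random

def simulate_lifetime(rng, sojourn_samplers):
    """One sample of the time to failure: the sum of the sojourn times in successive states."""
    return sum(sample(rng) for sample in sojourn_samplers)

def estimate_reliability(t, sojourn_samplers, samples=20_000, seed=1):
    """Monte Carlo estimate of R(t) = P(lifetime > t)."""
    rng = random.Random(seed)
    survived = sum(simulate_lifetime(rng, sojourn_samplers) > t for _ in range(samples))
    return survived / samples

# Sojourn times in the healthy and degraded states (illustrative Weibull parameters).
# random.weibullvariate(alpha, beta) takes the scale alpha first and the shape beta second.
sojourn_samplers = [
    lambda rng: rng.weibullvariate(200.0, 2.0),  # healthy -> degraded
    lambda rng: rng.weibullvariate(50.0, 1.5),   # degraded -> failed
]

for t in (50.0, 150.0, 300.0):
    print(t, estimate_reliability(t, sojourn_samplers))
```

Setting both shape parameters to 1 makes every sojourn time exponential, in which case the process is an ordinary Markov chain.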
3. Markov and semi-Markov models in systems reliability
Reliability modeling is based on the time-dependent Markov approach. The time-dependent behavior of a system that passes through healthy, defective, and failed states is usually described by a Markov process. The degradation process in systems is usually defined in terms of a Markov process in which the degradation level increases over time. Markov and semi-Markov models have been widely used for reliability in various fields. In Markov models, the life span of equipment is exponentially distributed. Unlike the Markov-based approach, which is limited to the exponential distribution, semi-Markov models are used in systems that change state at certain times and whose sojourn time in each state follows a general statistical distribution. Therefore, to handle nonexponential equipment lifetimes, semi-Markov processes can be used, assuming stochastic independence of system components. In semi-Markov models, transition rates are used instead of transition probabilities. Transition rates depend on the process time and the sojourn time in each state. Markov models can readily describe the behavior of a system with diagrams that show the system's failure states. Multidimensional Markov chains can be used to model the stochastic behavior of systems. In repairable systems, stiff Markov chains arise because of the large difference between failure and repair rates. If system components have constant failure and repair rates, system reliability can be modeled with a continuous-time homogeneous Markov process. Multistate homogeneous Markov models (MHMMs) are well suited to approximating failure time distributions. However, not all distributions that are linear combinations of exponential terms can be represented exactly with multistate Markov models. Fatigue-sensitive components can be modeled with McGill-Markov and closure-lognormal stochastic processes. The Markov renewal process technique can model a system that includes the main standby redundancy and subsystems. Markov reward models and Markov models are used to analyze system reliability. A repairable system in a changing environment can be treated as an intermittent renewal process, and its reliability is obtained using Markov reward theory. In repairable systems, the degradation process can be described using continuous-time Markov chains or semi-Markov models that divide the finite state space into a set of up states and a set of down states.
Markov renewal processes and nonhomogeneous Markov processes are commonly used for system reliability. Partially observable Markov decision processes (POMDPs) are also used for sequential decision-making to analyze system behavior and reliability. Markov models are used for both individual components and systems with multiple dependent components. Markov chains are widely used to analyze nonfunctional and performance criteria of systems. Markov models are suitable when the reliability function follows an exponential distribution, which implies constant hazard and repair rates. Complex systems in which degradation phenomena are associated with uncertainty are usually modeled using Markov decision processes. Markov renewal process modeling is used to predict system failure. When the degradation process can be observed only indirectly through condition monitoring, it can be modeled using a hidden Markov model. In such a model, the system states are the healthy, defective, and failed states, of which only the failed state is directly visible. The whole system can be modeled using a multidimensional continuous-time Markov chain to evaluate the reliability dynamics.
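A continuous-time Markov chain with healthy, defective, and failed states can be evaluated with a few lines of code. The sketch below assumes constant transition rates and treats the failed state as absorbing, so that the reliability at time t is the probability of not yet being in the failed state. It uses uniformization, a standard numerical method for transient CTMC analysis, chosen here only for illustration; the generator matrix, its rates, and the function names are assumptions.

```python
import math

def transient_distribution(Q, p0, t, tol=1e-12, max_terms=100_000):
    """State probabilities p(t) = p0 * exp(Q t) of a CTMC, computed by uniformization."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))  # uniformization rate
    if rate == 0.0 or t == 0.0:
        return list(p0)
    # One-step matrix of the uniformized discrete-time chain: P = I + Q / rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)] for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson probability of zero jumps
    vec = list(p0)
    result = [weight * v for v in vec]
    accumulated = weight
    for k in range(1, max_terms):
        if 1.0 - accumulated <= tol:  # remaining Poisson mass is negligible
            break
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        accumulated += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result

def ctmc_reliability(Q, p0, t, failed_states):
    """Probability of being in a non-failed state at time t (failed states absorbing)."""
    p = transient_distribution(Q, p0, t)
    return sum(p[s] for s in range(len(p)) if s not in failed_states)

# States: 0 = healthy, 1 = defective, 2 = failed (absorbing). Rates per hour are illustrative.
Q = [
    [-0.01, 0.01, 0.00],   # healthy -> defective
    [0.02, -0.07, 0.05],   # defective -> healthy (repair) or defective -> failed
    [0.00, 0.00, 0.00],    # failed: no way out
]
for t in (10.0, 100.0, 500.0):
    print(t, round(ctmc_reliability(Q, [1.0, 0.0, 0.0], t, {2}), 6))
```

Uniformization works well while the product of the largest exit rate and t stays moderate; for very stiff chains or long horizons, a dedicated matrix-exponential routine would be the safer choice.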
Markov renewal theory is considered for redundant systems in which components are prone to failure and, in case of failure, can be replaced with new ones. Dependencies are important to system reliability and affect the distribution of system failure. There are some classic types of random processes with different dependency relationships, including Markov processes, renewal processes, and Markov renewal processes. Modeling techniques commonly used in reliability problems include the Reliability Block Diagram (RBD), the Markov birth-death probabilistic approach, Petri nets, Bayesian networks, Markov chains, Fault Tree Analysis (FTA), clustering techniques, flow networks, Failure Mode and Effects Analysis (FMEA), and Event Tree Analysis (ETA). In the following, the literature on reliability is reviewed with a focus on Markov and semi-Markov models. Bobbio et al. [1] investigate system reliability by modeling each component with a multistate homogeneous Markov model (MHMM). Singh et al. [2] propose a new Markov approach for calculating production system reliability indices. Johnson [3] presents Markov models for a public availability model and a public reliability model. Rao and Balasubramanya [4] apply the Markov technique to predict the reliability, availability, and maintenance of a typical repairable Dual-VHF Omni Range (Dual-VOR). Lindqvist [5] describes Markov models for single components and for systems whose components are possibly dependent. Constantinescu [6] presents a standby system for industrial applications that require programmable fault-tolerant control instead of relay logic. System reliability analysis is performed with a decomposition/aggregation technique and with two homogeneous Markov models, one with discrete state and one with continuous state. Pullum et al. [7] propose a new Markov chain method for calculating the reliability of multipath switching networks.
Constantinescu [8] develops new closed-form solutions for the reliability, performance, and computational availability of homogeneous degrading processor arrays. The processes are described by discrete-time and continuous-time homogeneous Markov models. Cao [9] studied a human-machine system that operates in a changing environment governed by a two-state Markov process; the reliability is obtained using Markov renewal theory. Manzoul and Suliman [10] propose a new approach to reliability analysis based on neural networks and the discrete-time Markov model of a nonredundant digital system, the simplex system with repair. Gil et al. [11] used Markov models to study the safety, reliability, and availability of a watchdog processor. Bulleit [12] describes the development and use of a Markov model to estimate the lifetime reliability of structural systems. Shao and Lamberson [13] provide a complete Markov model for analyzing the reliability and availability of k-out-of-n: G systems with the Built-In-Test (BIT) technique. Lassen [14] describes a model of probabilistic cumulative fatigue damage based on a simple Markov chain approach to evaluate the reliability of welded joints in marine steel structures.
Limnios and Cocozza-Thivent [15] studied two types of stochastic reservoir, namely uncontrolled reservoirs and reservoirs fully controlled by computers, to assess their reliability criteria. The stochastic behavior of the reservoirs is modeled with two Markov chains, in three and five dimensions, respectively. Senegacnik and Tuma [16] present a Markov model for power plant operations and show that this model is a successful tool for determining time-dependent reliability. Tan et al. [17] describe a computerized reliability model based on evolving Markov processes. Guo and Cao [18] examine the reliability of a multistate repairable system in a changing environment governed by a two-state Markov process. Singh and Sharma [19] studied analytical techniques for assessing reliability, availability, and combined performance and reliability metrics. They also calculated the state probabilities of the Markov model. Stavrianidis [20] studied the performance evaluation of Programmable Electronic Systems (PES) in safety applications and used Markov modeling techniques to develop a reliability model. Yamada et al. [21] studied a software reliability growth model that considers imperfect debugging; the model is described by a semi-Markov process. Sculli and Choy [22] tested the reliability of pump sets that supply water to boilers in large power plants. Failure and repair rates are estimated from historical data, and the system is modeled as a Markov process to calculate its reliability. El-Damcese [23] presents a generalized Markov model to test the reliability of multiple systems consisting of n identical, nonrepairable components. Csenki [24] studied the semibrand reliability models of repairable systems. Moustafa [25] studied Markov models for the transient analysis of reliability, with and without repair, for K-out-of-N: G systems subject to two failure modes.
Farhangdoost and Provan [26] studied reliability analyses to predict the life of fatigue-sensitive components, presenting McGill-Markov models and closure-lognormal stochastic processes. Yoo et al. [27] proposed a Markov process approach for analytically deriving the mission reliability of an automated fault-tolerant control system with distributed Built-In-Test. Chandra and Kumar [28] introduced a Markov reliability model for a transputer-based fail-safe and fault-tolerant node for use in a network of safety-critical distributed railway signaling systems. Mustafa [29] studied a Markov model for analyzing the reliability of K-out-of-N: G systems that are prone to dependent failures with incomplete coverage and concluded that failure rate dependencies significantly reduce system reliability. Ouhbi and Limnios [30] estimated the reliability and availability of a turbo generator rotor, which is modeled by a semi-Markov process. Mustafa [31] studied Markov models for the transient analysis of reliability, with and without repair, for K-out-of-N: G systems subject to M failure modes. Chang et al. [32] considered an (n, f, k) system consisting of n components arranged in a line or a cycle. This system fails if at least f components fail or at least k consecutive components fail. They obtain system reliability formulas for linear and circular systems with differing component reliabilities using the Markov chain method. Beshir et al. [33] present a new framework for assessing probabilistic reliability. Seven operating modes are considered for power system conditions, and a Markov model defines these modes. Lisnianski and Jeager [34] propose a redundant system in which the task of the whole system is a sequence of n phases and the whole task must be completed in a limited time. A suitable model is presented for the reliability of the system, with a semi-Markov process used as the mathematical technique. Whittaker et al. [35] used a Markov chain model to predict the reliability of multibuild software. Knegtering and Brombacher [36] offer a method that greatly reduces the computational effort required to obtain the quantitative safety and reliability assessments used to determine safety integrity levels for applications in the process industry. The method combines the advantages of Markov modeling with the practical advantages of Reliability Block Diagrams. Yan and Wang [37] analyzed the reliability of a train's automatic protection system with the aim of achieving high reliability and safety. The failure rate is first divided into several parts, and the reliability and safety are then analyzed using a Markov model. Tokuno and Yamada [38] provide a software reliability model that assumes there are two types of software failure. They use a Markov process to formulate this model and derive several quantitative criteria for evaluating software reliability. Becker et al. [39] studied a nonhomogeneous semi-Markov process for modeling the reliability properties of components or small systems with complex experiments. Lassen and Sørensen [40] studied stochastic models to analyze the fracture reliability of welded steel plate joints.
A Markov chain is defined that represents discrete damage states related to the selected crack depth at the joints. Zupei and Xiangrui [41] studied the application of the Goal-Oriented (GO) method to a repairable system described by a Markov model. Ouhbi and Limnios [42] defined reliability and availability estimators for semi-Markov processes and showed that they are uniformly strongly consistent. Grabski [43] presents the properties of the reliability function of an object whose failure rate is modeled by a semi-Markov process defined on an at most countable state space. Bouissou and Bon [44] proposed a modeling formalism called Boolean logic Driven Markov Processes (BDMP). This formalism has two advantages over the conventional models used in reliability evaluation: it can define complex dynamic models, and it remains as easy to build as fault trees. Goda [45] proposes a strong Markov process-based reliability model for unidirectional composites with fibers in a hexagonal array. Wu and Patton [46] suggest using Markov models to analyze the reliability of fault-tolerant control systems. Tokuno and Yamada [47] discuss a software reliability model for an incomplete debugging environment in which detected errors are not always
corrected and removed from the system. Given the cumulative number of corrected errors, the uncertainty of the debugging activities is related to the increasing complexity of the corrected errors. A Markov process describes the random behavior of the error correction phenomenon with incomplete debugging. Several quantitative criteria are derived from this model to evaluate software reliability. The level of reliability of power systems varies over time due to weather conditions, power demand, and accidental faults. It is essential to obtain an estimate of the reliability of the system under all environmental and operational conditions [48]. Tanrioven et al. [48] used fuzzy logic in the Markov model to describe both transition rates and temperature-based seasonal variations. El-Gohary [49] presents maximum likelihood and Bayesian estimates of the parameters of a three-state semi-Markov reliability model. Wang [50] introduces two reliability criteria, Markov-chain Distributed Program Reliability (MDPR) and Markov-chain Distributed System Reliability (MDSR), for the reliability modeling of a Distributed Computing System (DCS). A discrete-time Markov chain with an absorbing state is constructed for this problem. Dobias et al. [51] provide a reliability assessment method for TTP/C-based distributed systems that is based on a Markov reliability model. Sadek and Limnios [52] studied nonparametric statistical inference problems for the reliability/survival, availability, and failure rate of continuous-time Markov processes, assuming that the state space is finite. Azaron et al. [53], using shortest path analysis in stochastic networks, proposed a new approach for the reliability function of time-dependent redundancy systems. The shortest path distribution of the newly constructed network is obtained using continuous-time Markov processes.
Bai [54] presents a Bayesian Markov network developed for software reliability prediction with operational specifications. Wang et al. [55] extended the white-box approach to an architecture-based approach, in which a state machine is built from a discrete-time Markov model and used to calculate the reliability of the software. Li and Zhao [56] used stochastic modeling to investigate the reliability assessment of Fault-Tolerant Control Systems (FTCS). System errors are described by a Markov chain, while the Fault Detection and Isolation (FDI) and system operations for reliability assessment are described by two semi-Markov chains. Ajah et al. [57] present a hierarchical modeling approach that overcomes the exponential explosion in the size of the Markov model as the number of components increases; it is used for modeling deterioration and system repair and for the screening and accurate analysis of the reliability and availability of components of such infrastructure systems. Kucera et al. [58] calculated the reliability of a turbine protection system used in Siemens Industrial Turbomachinery products. The Markov model of the system is introduced, and its analysis is presented. Tanrioven and Alam [59] provide a method for modeling and calculating the reliability of Proton Exchange Membrane Fuel Cell Power Plants (PEM FCPP). This method involves the development of a state-space method for calculating the reliability of the standalone PEM FCPP using a Markov model. Platis [60] studied a general form of the performance criterion defined for the reliability of fault-tolerant systems and presents various formulas using a homogeneous Markov chain and a cyclic nonhomogeneous Markov chain, together with their asymptotic expressions. Csenki [61] considers Markov reliability models whose finite state space is divided into a set of up states and a set of down states. Fei and Hong-yue [62] described a set of Markov stochastic processes with finite state space to represent the FDI decision in active FTCS; system stability and reliability are also discussed. Dominguez-Garcia et al. [63] proposed an integrated method for the reliability and dynamic performance analysis of fault-tolerant systems. This method uses a behavioral model of the system dynamics. Markov chains are used to model the random process associated with the different configurations that a system can adopt in the event of a failure. Li et al. [64] studied a reliability monitoring scheme for active FTCSs. A semi-Markov model is developed to assess reliability based on the safety behavior of each regime model. El-Nashar [65] studied the inclusion of equipment reliability in the optimal design of cogeneration systems for power and desalination. Design optimization is performed using thermoeconomic theory, and equipment reliability is evaluated using the state-space method for Markov processes.
Chiquet and Limnios [66] studied the time evolution of an increasing stochastic process governed by a first-order stochastic differential system. They consider a Markov renewal process (MRP) associated with the piecewise deterministic Markov process (PDMP) and its Markov renewal equation (MRE), which is solved to obtain a closed-form solution for the PDMP transition function. It is then used in the context of survival analysis to evaluate the reliability function of a given system. Sisworahardjo et al. [67] present a method for modeling and calculating the reliability and availability of low-power portable direct methanol fuel cells (DMFCs). A state-space method is proposed to calculate system availability using the Markov model (MM). Guo and Yang [68] present a new technique for automatically creating Markov models to evaluate the reliability of safety instrumented systems. Many safety-related factors, such as failure modes, self-diagnostics, repairs, common cause, and voting, are included in the Markov models. A framework is first created based on voting, failure modes, and self-diagnosis. Then, repairs and failures are included in the framework to build a complete Markov model. Ehsani et al. [69] propose an analytical probabilistic model to assess the reliability of competitive electricity markets. A Markov state-space diagram is used to
assess market reliability. Since the market is a system that operates continuously, the concept of absorbing states is applied to evaluate reliability. Pil et al. [70] describe the reliability assessment of reliquefaction systems for boil-off gas (BOG) in LNG carriers with a focus on redundancy optimization and maintenance strategies. Reliability modeling is based on the time-dependent Markov approach. Do Van et al. [71] present the development of Differential Importance Measures (DIM), which have recently been proposed for use in risk-based decision-making, in the context of Markov reliability models. Soszynska [72] presents a semi-Markov model of the system operation process and uses it, together with multistate system reliability, to assess the reliability and risk of a port oil pipeline transportation system. Veeramany and Pandey [73,74] provide a general model for evaluating the rupture frequencies and reliability of the piping system in a nuclear power plant based on semi-Markov process theory. Wei et al. [75] propose a Markov-based model for estimating the reliability of a hierarchical architectural system. This model is intended to replace the traditional Reliability Block Diagram (RBD). Comparative studies between the Markov-based model and the RBD have been performed. Experimental results show that Markov techniques can improve the prediction of system reliability. Jiang et al. [76] suggest a reliable Markov model for effectively reducing human-related abnormal events in the high-security digital main control room of a nuclear power plant (NPP). Veeramany and Pandey [73,74] have also studied the reliability analysis of nuclear component cooling water (NCCW) systems. A semi-Markov process model is used in the analysis. Mathew et al. [77] present the modeling and reliability analysis of a two-unit continuous casting (CC) plant using semi-Markov processes and the regenerative point technique.
Grabski [78] considers the failure rate as a random process with nonnegative, right-continuous trajectories, and the reliability function is defined as the expectation of a functional of that random process. In particular, the failure rate is defined by semi-Markov processes. Theorems on the renewal equations for the conditional reliability functions with a semi-Markov process as the failure rate are presented. Gupta and Dharmaraja [79] propose an analytical reliability model for VoIP. The reliability model is analyzed using a semi-Markov process, which shows the effects of the non-Markovian nature of the time spent in different system states. Liu and Rausand [80] examined this classification by studying the safety instrumented system's reliability for different demand rates, demand durations, and test intervals. This approach is based on Markov models. Haghifam and Manbachi [81] propose a reliability model based on the state-space and continuous Markov method for combined heat and power (CHP) systems with power generation, fuel distribution, and heat generation subsystems. Yuan and Meng [82] studied a warm standby system consisting of two dissimilar units and a repairman. Using Markov process theory and the Laplace transform, they obtain some important reliability indicators and some steady-state indicators for this system. Hosseini et al. [83] discuss the effects of considering the reliability of equipment for thermal
analysis of a combined and multistage flash water desalination plant. Equipment reliability is introduced into the thermoeconomic analysis to improve cost values using the state-space and continuous Markov method. Cai et al. [84] examine the effects of stack configuration and mounting types on subsea blowout preventer (BOP) systems from a reliability perspective. A model based on the Markov method for performance evaluation is presented. The availability and reliability of the system are assessed by merging the independent Markov models with the Kronecker product approach. Rosset et al. [85] have developed reliability models for a group membership protocol designed for TDMA networks such as FlexRay. The models are based on time-based Markov chains. Hosseini et al. [86] studied multiobjective optimization for the design of a combined gas turbine and multistage flash desalination plant. They also incorporate equipment reliability into the optimization approach using the state-space and continuous Markov method. Gilvanejad et al. [191] propose a new Markov model for the reliability analysis of fuse cutouts, which considers the hidden failures of the fuses. Wang and Liu [87] developed a Markov model of an automated air traffic control system (ATCAS) based on 36 months of ATCAS failure data. A method for predicting s1, s2, and s3 of the ATCAS is based on the Markov chain, and it confirms the reliability of the ATCAS according to the inference theory of reliability. Lisnianski et al. [88] present a multistate Markov model for a coal power generating unit. The proposed multistate Markov model is used to calculate important reliability indicators such as the Forced Outage Rate (FOR) and the Expected Energy Not Supplied (EENS) to consumers.
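To illustrate how a multistate Markov model of a generating unit can yield such indicators, the following minimal sketch solves the steady-state balance equations of a three-state unit and evaluates the expected energy not supplied against a constant demand. The capacity levels, transition rates, demand, and function names are hypothetical and are not taken from [88]; the constant-demand EENS expression is a simplifying assumption.

```python
def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Balance equations are the rows of Q transposed; the last one is
    # redundant and is replaced by the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def expected_energy_not_supplied(pi, capacities, demand, hours):
    """EENS for a constant demand: hours * sum_i pi_i * max(demand - capacity_i, 0)."""
    return hours * sum(p * max(demand - c, 0.0) for p, c in zip(pi, capacities))


# Hypothetical unit: state 0 = full output, 1 = derated, 2 = complete outage.
capacities = [50.0, 30.0, 0.0]          # MW (assumed values)
Q = [
    [-0.020, 0.015, 0.005],             # full -> derated, full -> outage (per hour)
    [0.100, -0.110, 0.010],             # derated -> full, derated -> outage
    [0.050, 0.000, -0.050],             # outage -> full (repair)
]
pi = steady_state(Q)
eens = expected_energy_not_supplied(pi, capacities, demand=40.0, hours=8760.0)
print("state probabilities:", pi)
print("EENS over one year (MWh):", eens)
```

For a two-state unit with failure rate λ and repair rate μ, the same routine returns the familiar long-run unavailability λ/(λ + μ).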
Selwyn and Kesavan [89] used the concept of Markov analysis (MA) to model the failure characteristics of the main components and to calculate the probability, availability, and reliability of different modes of wind turbine (WT) systems with capacities of 225 kW, 250 kW, and 400 kW. Cao et al. [90] present a Markov model to examine the quantitative relationship between software testing and software reliability in the presence of imperfect debugging. Zhang et al. [91] propose a Markov modeling approach for system reliability analysis. Wang et al. [92] provide a redundant design of the BCHP (Building Cooling, Heating, and Power) system and its mode of operation. The state-space method is combined with probabilistic analysis of the Markov model and is used to analyze the reliability of three forms of energy supply, namely electricity, heat, and cold, in the BCHP system. Montoro-Cazorla and Perez-Ocon [93] studied an n-component system, with one unit online and the rest in standby and repairable, to assess reliability by considering the Markov process governing the system, as well as Markovian failure shocks and repair times. Liu et al. [94] propose a reliability model to evaluate the reliability and life span of laser diodes in a space radiation environment. The degradation process is divided into discrete states, and the reliability model is subsequently developed based on the Markov process. Karami-Horestani et al. [95] analyzed the role of different SVC (Static VAR
Compensator) components in system availability and proposed a reliability model for a typical SVC. The Markov process is used to analyze the proposed model. They present an equivalent three-state model for SVC. Guilani et al. [96] evaluated the reliability of nonrepairable three-state systems and proposed a new method based on the Markov process for which a suitable state definition is provided. Malefaki et al. [97] considered a semi-Markov setting to study the main criteria for the reliability of a repairable continuous-time system under the hypothesis that the time evolution of its components is described by a continuous-time semi-Markov process. Mattrand and Bourinet [98] studied the reliability of cracked components under stochastic amplitude loads modeled by discrete-time Markov processes. Galateanu and Avasilcai [99] studied the reliability analysis of a business ecosystem based on the probabilities of its elements. This analysis is based on Markov chain theory. Soleymani et al. [100] studied wind farm modeling in assessing the reliability of power systems. The mechanical behavior of each wind turbine generator (WTG) is modeled by the sequential Monte Carlo method, and the Markov model is used for wind farm output power. Aval et al. [101] used the Markov method based on state-space analysis to investigate the effects of BCHP system performance on the reliability of power systems. Wu and Hillston [102] studied two types of mission reliability for mission systems that do not require routine work throughout the mission. The first type is related to the mission requirement that the system must be continuously active for a minimum period of time during a certain mission period, while the second type is related to the mission requirement that the total operating time of the system in the mission time window must be more than a certain value. Based on the Markov renewal properties, matrix integral equations for semi-Markov systems are obtained. 
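The two types of mission reliability described for [102] can be made concrete with a small Monte Carlo sketch. It does not reproduce the matrix integral equations of that work; it replaces them with simulation of a two-state alternating (up/down) semi-Markov process. The Weibull up-time and exponential repair-time parameters, the assumption that the system starts in the up state, and all function names are hypothetical.

```python
import random


def simulate_path(T, rng, up_scale=100.0, up_shape=1.5, repair_rate=0.1):
    """Simulate one up/down path on [0, T].

    Returns (longest uninterrupted up interval, total up time) within the window.
    Up times are Weibull, down times exponential (both assumed for illustration).
    """
    t, longest, total = 0.0, 0.0, 0.0
    while t < T:
        up = min(rng.weibullvariate(up_scale, up_shape), T - t)  # truncate at T
        longest = max(longest, up)
        total += up
        t += up
        if t >= T:
            break
        t += rng.expovariate(repair_rate)                         # repair period
    return longest, total


def mission_reliability(T, tau, w, runs=20000, seed=1):
    """Estimate both mission reliabilities over the window [0, T].

    Type 1: some uninterrupted up interval lasts at least tau.
    Type 2: the total up time exceeds w.
    """
    rng = random.Random(seed)
    hits1 = hits2 = 0
    for _ in range(runs):
        longest, total = simulate_path(T, rng)
        hits1 += longest >= tau
        hits2 += total > w
    return hits1 / runs, hits2 / runs


r1, r2 = mission_reliability(T=200.0, tau=60.0, w=150.0)
print("type 1 (continuous operation):", r1)
print("type 2 (cumulative operation):", r2)
```

Because the up times are not exponential, the process is semi-Markov rather than Markov; tightening either requirement (larger tau or w) can only lower the corresponding estimate.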
Petroni and Prattico [103] studied the issue of power production by wind turbines using a semi-Markov chain as a wind speed model and calculated some of the main reliability criteria. Fazlollahtabar et al. [104] propose an integrated Markovian and back-propagation neural network approach to calculate the reliability of a system. Timashev and Bushinskaya [105] described pipeline degradation (the simultaneous growth of many corrosion defects and the reduction of residual pipe resistance, i.e., burst pressure) by Markov processes of pure birth and pure death type, respectively. Based on Markov models, a method has been suggested for assessing the probability of failure (POF) and the reliability of a defective pipeline cross section and of a pipeline as a distributed system. Pham et al. [106] present a reliability modeling scheme for component-based software systems whose models are automatically converted to Markov models for reliability prediction by the reliability prediction tool. Wan et al. [107] propose a stochastic process prediction model for estimating the thermal reliability of an electronic system based on Markov theory. A stochastic model of thermal reliability analysis and prediction for the whole electronic system is constructed based on the Markov process. Ossai et al. [108] described how to use Markov
modeling and Monte Carlo simulation to determine the reliability of internally corroded pipelines. Shariatkhah et al. [109] propose an analytical method for modeling the dynamic behavior of thermal loads in MCEB reliability analyses. The proposed method is based on the Markov chain and integrates thermodynamic equations. Honamore and Rath [110] propose a Hidden Markov Model (HMM) and fuzzy logic prediction model to predict web service reliability. Honamore et al. [111] used the HMM and an Artificial Neural Network (ANN) to model web service failures and predict the reliability of web services. Adefarati and Bansal [112] studied the effects of renewable energy resources (RERs) on a microgrid system. Stochastic properties of the main components of RERs and their effects on the reliability of a power system have been presented using the Markov model. Zhu et al. [113] studied an m-consecutive-k, l-out-of-n system with nonhomogeneous Markov-dependent components. Using the probability generating function method, closed-form formulas are obtained for the reliability of the m-consecutive-k, l-out-of-n system. Li et al. [114] studied the development of reliability criteria for a repairable multistate system operating under discrete-time dynamic regimes. The process of regime change is governed by one Markov chain, and the system operation process follows another Markov chain with different transition probability matrices under different regimes. Du et al. [115] present a model based on aggregated stochastic process theory to describe history-dependent behavior and the effect of neglected failures on Markov history-dependent repairable systems. Based on this model, the instantaneous and steady-state availabilities are obtained to determine the reliability of the system. Zeygolis and Bourdeau [116] propose a new method for evaluating the reliability of the internal stability of reinforced soil walls, taking into account the very high strength redundancy of these structures.
Redundancy is formulated based on transition probabilities and Markov random processes. Kabashkin and Kundler [117] present a Markov model for analyzing sensor node reliability in wireless sensor networks. Snipas et al. [118] offer solutions for Markov chain reliability models with a large state space. Li et al. [119] propose a multistate decision diagram algorithm and a multistate decision diagram model for Phased Mission Systems (PMS) to model nonrepairable multistate reliability. Based on the semi-Markov process, a method using the Markov renewal equation is presented to deal with nonexponential multistate components. For reliability design, Ye et al. [120] introduce a systematic approach to modeling the random process of system failures and repairs as a continuous-time Markov chain, which incorporates the maintenance effect to find the optimal choice of parallel units. Rebello et al. [121] present a new method for estimating and predicting the functional reliability of a system using system functional indicators and component condition indicators. The proposed method uses both hidden Markov models and a dynamic Bayesian network to estimate and predict the operational reliability of the system.
Sonal et al. [122] consider the reliability study of series/parallel vibrating systems under multicomponent random excitations. An experimental protocol, based on subset simulation with Markov chain splitting, is proposed to estimate the failure probability with a relatively small number of samples, thus reducing the test time. Montoro-Cazorla and Perez-Ocon [123] studied a reliability system exposed to shocks, internal failures, and inspections, and developed a Markov process for this system. Yi et al. [124] address, for the first time, two new reliability criteria for discrete-time semi-Markovian systems: the availability of point coverage and the availability of interval coverage. Kabir and Papadopoulos [125] provide an overview of fuzzy set theory methods used in safety and reliability engineering, including fuzzy Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), Event Tree Analysis (ETA), fuzzy Bayesian networks, fuzzy Markov chains, and fuzzy Petri nets. Gunduz and Jayaweera [126] studied the power system reliability assessment of an integrated physical-cyber system operation with multiple photovoltaic (PV) system configurations, combining Markov chain transitions for PV system components. Ardakan and Rezyan [127] propose a two-objective reliability-redundancy allocation problem that uses a Markov-based approach to calculate exact values of reliability. Alizade and Sriramula [128] introduce a new reliability model for redundant safety-related systems using the Markov analysis technique. Su et al. [129] studied a systematic method for assessing the reliability of natural gas pipeline networks. The supply capacity of a pipeline network depends on the unit states and network structure, both of which change randomly. Thus, a random capacity network model is developed based on Markov modeling and graph theory. Steurer et al. [130] propose a new Systems Modeling Language (SysML)-based method for the analysis of the reliability of unmanned aerial vehicles (UAVs). This method consists of three main stages, the third of which is the Dual-Graph Error Propagation Model (DEPM)-based evaluation of system reliability criteria using Markov chain models and advanced techniques for probabilistic model evaluation. Cheng et al. [192] present a Markov model to mimic a solar-generating system with series inverters, unreliable bypass switches, and common cause malfunctions. They obtain quantitative reliability models of the system using the renewal point analysis technique and semi-Markov process theory. Gao et al. [131] studied a new repairable balanced k-out-of-n: F system with m sectors. The characteristic of this balanced system is that the number of active components in the m sectors must remain constant at all times. The Markov imbedding method is used to obtain system reliability and availability. Zhao et al. [132] introduce a Markov model of hybrid DC circuit breaker reliability to calculate steady-state availability, failure rate, and mean time between failures. Agrawal et al. [133] present a Markov model for the reliability analysis of an EPBTBM (Earth Pressure Balance Tunnel Boring Machine) used in an irrigation tunnel. Wang et al. [134] integrated the Maximum Entropy Markov Model (MEMM) with time series motifs to obtain a new prediction model to address the problem of predicting
component system reliability in a dynamic and uncertain environment. Yang et al. [135] provide a framework of the Markov/Cell-to-Cell Mapping Technique (CCMT) search engine platform for dynamic system modeling and reliability analysis. Honarmand et al. [136] propose a new mathematical model for evaluating the reliability of distribution networks integrated with process-oriented intelligent monitoring systems. This model uses the Markov method and considers the effect of process failure factors on the overall reliability of the system. Che et al. [137] created a piecewise-deterministic Markov process model that can combine machine degradation and human error to evaluate system reliability. Wang et al. [138] propose a Markov model with a Reliability Block Diagram method to evaluate reliability and operational availability (OA). Zhao et al. [139] propose the overlapping finite Markov chain method for the first time to derive the system reliability and the expected shock length. Kabashkin [140] presents a way to prolong the autonomous operating time of sensors in cluster-based wireless sensor networks (WSNs) using battery redundancy and proposes a Markov model for analyzing sensor node reliability in cluster-based WSNs with the proposed method. Wang et al. [141–143] studied the problem of multiobjective optimization by considering the optimal redundancy strategy, whether active or cold standby. The exact reliability of cold standby redundant subsystems with an imperfect detector/switch is determined by a continuous-time Markov-chain-based approach. Son et al. [144] present an integrated reliability assessment model of the reactor protection system in nuclear power plants based on the Markov model. Zhao et al. [145] studied multistate balanced systems and derived the corresponding probability functions and some other reliability indices using the two-stage finite Markov chain imbedding approach.
Cheng and Yang [146] propose a reliability model for multistate phased mission systems (MS-PMS) with common bus performance sharing, taking into account transmission loss and performance storage. They present a new Markov model for multistate components to solve the problem of interdependence between phases. To demonstrate the importance of an aggregated battery energy storage system (ABESS) in a microgrid (MG), Pham et al. [147] examine the reliability of its operation and present an analytical approach based on Markov models to evaluate the reliability of the overall operation of the ABESS. Liu et al. [148] proposed a modified one-dimensional Markov chain model for stratigraphic boundary uncertainty (SBU) modeling in slope reliability analysis considering soil spatial variability. Chakraborty et al. [149] introduced a new coverage-reliability index, CORE, to quantify application-specific coverage-based reliability. CORE measures the ability of a sensor network with multistate nodes to satisfy the application-specific coverage area requirement by delivering reliable data to the mobile sink. Wu and Cui [150] propose two Markov renewal shock models with multiple failure
mechanisms. The dependence between shock sizes and interarrival times is studied, assuming that the magnitudes and interarrival times of shocks are governed by an absorbing Markov chain with finite state space. Methods for calculating reliability functions and other reliability indices are presented under the two proposed Markov renewal shock models. Jiang et al. [151] provide a framework that emphasizes quantitative analysis and improved reliability of kick detection sensor networks. They use a Markov chain to capture the decline in the reliability of measurement sensors over time. Wu and Cui [152] investigated the reliability of multistate systems under Markov renewal shock models with multiple failure levels. In these models, the interarrival times and the magnitudes of shocks are governed by an ergodic Markov chain and an absorbing Markov chain. Chiachico et al. [153] studied a prognostic method that relies on a stochastic degradation model based on Markov chains, which is embedded in a sequential state estimation framework to predict remaining useful life and estimate time-dependent reliability. Raghuwanshi and Arya [154] offer two methods for evaluating the reliability indicators of a standalone hybrid photovoltaic (PV) energy system. Markov model and frequency-duration (F-D) reliability techniques have been used to evaluate the reliability indicators. Guilani et al. [155] applied heterogeneous components in a subsystem. A mathematical model based on the continuous-time Markov chain (CTMC) method has been developed, which allows the system reliability to be accurately calculated because it is sensitive to component sequencing. Anand et al. [156] proposed a new probabilistic approach to evaluate the impact of electric vehicles on the performance of power distribution systems. A dynamic hidden Markov model is used to capture the movements of an electric vehicle in the traffic layer.
Postnikov and Stennikov [157] performed probabilistic modeling of the states of district heating systems (DHS) based on a Markov random process under conditions that can be specified for real systems. This method is based on the Markov theory of stochastic processes and the basic principles of probability theory. Li et al. [158] developed a Markov-based dynamic fuzzy fault tree analysis method to handle the uncertainty of the failure rate in the hydraulic system of wind turbines and to model reliability by considering the dynamic failure characteristics. Wang et al. [141–143] proposed a multistate reliability modeling method based on the performance degradation process of the pipeline system considering failure interaction. A Markov model of the degradation process was also proposed to determine the relationship between the probability of occurrence of each state and the degradation path. Zhou et al. [159] propose an integrated supply reliability assessment method for multiproduct pipeline systems, using a discrete-time Markov process to describe stochastic failure and the Monte Carlo method to simulate the transition of system states. Wang et al. [141–143] proposed a Markov process imbedding approach for analyzing the reliability of balanced systems. Roy and Sarma [160] analyzed the
performance in terms of energy efficiency, throughput, and reliability of a synchronous duty-cycled reservation-based MAC protocol named the Ordered Contention MAC (OCMAC) protocol by modeling the OCMAC queuing behavior with a Markov chain process. Wu et al. [161] considered a performance-based balanced system of n components with performance sharing over a common bus. The system fails when it goes out of balance after performance sharing or when the performance of the whole system falls below a predetermined value, whichever comes first. A Markov process is used to describe changes in the performance of each component, and the universal generating function technique is used to obtain an analytical solution for system reliability. Wang [162] presents a method for evaluating the time-dependent reliability of aging structures. In this regard, a closed-form solution for structural reliability has been developed that models the sequence of load effects as a Markov process. Wang et al. [163–165] considered a component reassignment problem (CRP) for a balanced system with multistate components operating in a shock environment. In such a balanced system, the balance of the system is defined based on the performance levels of the components. Some reliability indicators have been obtained with the two-stage Markov chain imbedding approach. Based on practical systems, Fang and Cui [166] developed an automated balancing mechanism for an innovative multistate balanced system consisting of two subsystems for theoretical and practical demands. Aggregated stochastic processes and semi-Markov processes are used to obtain reliability criteria. Peirayi et al. [167] created a continuous-time Markov chain model for mixed and K-mixed strategies. The proposed model estimates reliability under different redundancy strategies more effectively and in a simple way. Tarineiad et al. [168] first propose a method for modeling the self-healing behavior of a component using a Markov chain. Then, with different combinations of Taylor series expansion and self-repair, several criteria are proposed to evaluate the reliability of a software system. Yi et al. [169] developed a belief reliability analysis method for transportation systems based on traffic performance margin. They developed an uncertain percolation semi-Markov (UPSM) model to describe the essential physical properties of traffic accidents, taking into account random and epistemic uncertainties. Yi et al. [170] introduced two new reliability indicators, multipoint-bounded coverage availability and multidistance-bounded coverage availability, for discrete-time systems. Their explicit formulas are derived for first- and second-order discrete aggregated semi-Markov systems, whose state spaces are divided into three subsets of perfect states, imperfect states, and failure states, respectively. Wang et al. [164] propose a new reliability method by combining the relevance vector machine and Markov-chain-based importance sampling (RVM-MIS). Jun et al. [171] provide an integrated Markov/CCMT analysis platform for the dynamic reliability of a digital process control system.
Farzin et al. [172] present a Markov model that shows the different effects of harmonics on the reliability of overcurrent relays. Jiang and Li [173] developed a computational framework and multistate physics modeling to evaluate the dynamic reliability of a multicrack structure through the implementation of piecewise deterministic Markov processes (PDMP). Wu and Cui [174] developed a reliability and maintenance model for a periodically inspected system with competing risks exposed to a randomly changing operating environment, which is modeled as a continuous-time homogeneous absorbing Markov process. Tahmasebzadehbaie and Sayyaadi [175] studied the water–energy–environment nexus in a city for flare gas recovery while considering downstream installation reliability. A new integrated model for evaluating various proposed scenarios has been developed with the Markov technique. Jagtap et al. [176] provide a framework for reliability, availability, and maintainability (RAM) analysis to evaluate the performance of a water circulation system (WCS) used in a coal-fired power plant (CFPP). WCS performance is assessed using the Reliability Block Diagram (RBD), Fault Tree Analysis (FTA), and the Markov probabilistic birth–death approach. Mathebula and Saha [177] modeled the reliability and availability of multimode IEC-61850-based Substation Communication Networks (SCN) using the structure function and a Markov process to incorporate imperfect repairs and diagnostic coverage of the system. Li et al. [178] studied a Markov Regenerative Process (MRGP) model to consider the effect of shocks on PMS reliability modeling. Wang et al. [163–165] present a mathematical model for the reliability of a repairable (k1, k2)-out-of-n: G system consisting of two different types of components. The repair time of each failed component is exponential, and the repairman's break time follows a phase-type distribution.
The system works if at least k1 components of type 1 and k2 components of type 2 work simultaneously. System reliability criteria are analyzed using Markov process theory and the matrix analytical method. Haghgoo and Damchi [179] propose a new Markov model for assessing Capacitor Voltage Transformer (CVT) reliability. To obtain the model, first, a Markov model is presented for each CVT subsystem. Then, by merging these models, a 10-state Markov model is obtained, and finally, by combining similar states, they obtain a three-state model comprising healthy, low-quality, and fault states. Yin et al. [180] studied models for linear and circular k-out-of-n systems with common components consisting of subsystems. Using the Markov chain technique, the n-step transition rate matrix is obtained. Finally, system reliability is obtained by summing the reliability of all items with the finite Markov chain approach. Wu and Ding [181] provide a reliability model for systems that are exposed to multiple dependent competing failure processes under the influence of Markovian environments. Markov reward models are analytical tools for stochastic processes. Systems that ultimately suffer irrecoverable failure are modeled with absorbing Markov chains, while repairable systems are modeled with irreducible chains. In the first case, the state space
is divided into two parts: a set of up states and a set of down states of the system. In repairable systems, transient behavior is described by the values of reliability and of point and interval availability. Markov models can be used for transient reliability analysis, with and without repair, of k-out-of-n:G systems. The Built-In-Test technique with a Markov model is used to analyze the reliability and availability of k-out-of-n:G systems. Methods for solving reliability problems mainly include analytical techniques and stochastic simulation, each with its own advantages and disadvantages. One of the important analytical techniques for solving reliability problems is the Markov method. Markov analysis is a powerful and flexible technique for measuring system reliability. In the Markov method, a state space is used to represent the states that the system may occupy, and an appropriate definition of the system states must be chosen. The state space of the process can be general, discrete or continuous, and finite or infinite. Because the size of a Markov model grows exponentially with the number of components (the curse of dimensionality), this solution method may be limited, and it is usually time-consuming. There are several ways to overcome this limitation when using the Markov method for complex systems. One of them is to simplify the model by reducing the number of states or by merging states. In Markov methods, state transitions are not deterministic; they are described probabilistically. In semi-Markov systems, the system states are usually divided into healthy, defective, and failure states, where the failure state is absorbing and the system cannot leave it after entering it. In semi-Markov processes, future behavior depends on the current state as well as on the residence time in that state.
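The ideas above (an absorbing failure state, merged states, and transient analysis of a k-out-of-n:G system with and without repair) can be sketched in a few lines. The example below assumes identical units with exponential failure and repair times and a single repairman, merges the 2^n component configurations into states that only count the failed units, and computes the reliability by uniformization. The function names, the rates, and the choice of uniformization as the solver are illustrative assumptions rather than the procedure of any cited work.

```python
import math


def k_out_of_n_generator(n, k, lam, mu):
    """Generator over merged states i = number of failed units, i = 0..n-k+1.

    State n-k+1 (fewer than k units working) is absorbing, so the probability
    of not being in it at time t is the system reliability.
    """
    m = n - k + 2
    Q = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):                 # the last state has no exits
        Q[i][i + 1] = (n - i) * lam        # one of the n - i working units fails
        if i > 0:
            Q[i][i - 1] = mu               # single repairman restores one unit
        Q[i][i] = -sum(Q[i])
    return Q


def transient_probs(Q, p0, t, tol=1e-12):
    """State probabilities at time t by uniformization (Poisson-weighted DTMC steps)."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)           # Poisson probability of 0 jumps
    vec = list(p0)
    out = [weight * v for v in vec]
    acc, k = weight, 0
    while 1.0 - acc > tol and k < 100000:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k             # Poisson probability of k jumps
        acc += weight
        out = [o + weight * v for o, v in zip(out, vec)]
    return out


def reliability(n, k, lam, mu, t):
    """Probability that a k-out-of-n:G system has not failed by time t."""
    Q = k_out_of_n_generator(n, k, lam, mu)
    p0 = [1.0] + [0.0] * (len(Q) - 1)      # all units working at t = 0
    return sum(transient_probs(Q, p0, t)[:-1])


print("2-out-of-3, no repair  :", reliability(3, 2, 0.01, 0.0, 50.0))
print("2-out-of-3, with repair:", reliability(3, 2, 0.01, 0.5, 50.0))
```

Without repair, the 2-out-of-3 result can be checked against the closed form 3e^(-2λt) - 2e^(-3λt). Counting failed units instead of tracking each unit individually reduces the model from 2^n states to n - k + 2, which is a simple instance of state merging.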
Markov reliability models can account for statistical dependencies between failure events in dynamic systems. System reliability is often a function of component failures, some of which are independent and others interdependent. Fault Tree (FT) analysis requires the faults to be independent of each other. If the faults are interdependent, Markov analysis must be used, although the performance of a Markov model depends largely on its size. The Markov model is not suitable when the state space is large or when the sojourn time distribution is not exponential. Semi-Markov approaches take into account the length of time the process stays in each state. Hidden semi-Markov models are used to obtain the transition probabilities between states and the length of stay in each state. In semi-Markov reliability models, the state space is divided into a set of up states and a set of down states. The transition probabilities can be constant or can change with age. Renewal equations are used for reliability functions in semi-Markov processes and conditionally in discrete state space. Non-Markovian models can be approximated by phase-type distributions. Markovian complex systems can also be analyzed. In system
analysis, time is an important and influential factor in system capability and Markov models are suitable for analyzing these systems. Reliability theory uses various mathematical and statistical tools for calculations. These tools include total positivity, majorization and Schur functions, renewal theory, Bayesian statistics, isotonic regression, Markov and semi-Markov processes, stochastic comparisons and bounds, convexity theory, rearrangement inequalities, and optimization theory Boland and Proschan [182]. In reliability theory, the replacement of items in case of failure is defined by the renewal process. The renewal function is defined as the expected number of replacements at a given moment. When it is assumed that the lifetime distribution of items is in one of the reliability classes of lifetime distributions, lower and upper bounds can be obtained for the renewal function. Reliability in standby systems that are prone to failure and are replaced by a new system after failure can be calculated using Markov renewal theory. Each system has subsystems that have different levels of fault tolerance. The subsystems have dependencies. Each subsystem has several multistate components whose behavior can be analyzed using Markov models. To analyze the reliability of multicomponent systems, the components are not considered separately and are examined by considering the effect of component failure on each other. System status is assessed based on the failure level of all components. The system fails when the failure level of a component reaches a critical value or the difference in the deterioration level of the two components in symmetrical positions is higher than a certain limit. The occurrence of system failures in transition and stationary regimes can be obtained by using the matrix analytical method and probabilistic properties of phase-type distribution and Markov process theory. 
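The renewal function mentioned above, M(t), satisfies the renewal equation M(t) = F(t) + ∫ M(t - x) dF(x) over [0, t], where F is the lifetime distribution function. As a rough illustration, the sketch below discretizes that equation on a uniform grid. The grid size and the exponential test case are assumptions made here; the scheme is first-order and slightly underestimates M(t), so it is a teaching aid rather than a production solver.

```python
import math

def renewal_function(cdf, t_end, n=1000):
    """Approximate the renewal function M on the grid 0, h, ..., n*h.

    Discretizes M(t) = F(t) + integral_0^t M(t - x) dF(x), placing the
    probability mass of each grid interval at its right endpoint.
    """
    h = t_end / n
    F = [cdf(i * h) for i in range(n + 1)]
    dF = [F[i] - F[i - 1] for i in range(1, n + 1)]  # dF[j-1]: mass on ((j-1)h, jh]
    M = [0.0] * (n + 1)
    for k in range(1, n + 1):
        # A first failure in interval j leaves (k - j) * h of remaining time.
        M[k] = F[k] + sum(M[k - j] * dF[j - 1] for j in range(1, k + 1))
    return M

# Check against the one case with a simple closed form:
# exponential lifetimes with rate lam give M(t) = lam * t.
lam = 0.5
M = renewal_function(lambda t: 1.0 - math.exp(-lam * t), t_end=10.0)
expected_replacements = M[-1]
```

For lifetime distributions without a closed-form renewal function (for example Weibull), only the `cdf` argument needs to change.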
The more complex the structure of systems and their multistate characteristics, the more difficult it is to assess their reliability. Among the various techniques available for analyzing multistate systems, simulation is the only one applicable to such complex systems. However, simulation requires enumerating the possible system states and defining the cut sets associated with each state. In large complex systems, this enumeration makes the simulation problem difficult and requires a detailed understanding of the failure mechanism. Each component in the system can be defined as a semi-Markov random process, and the system performance can be imitated through discrete simulation.
4. Conclusions and future research
System reliability can be defined as the ability of a system to operate continuously over a specified period of time at a certain level of performance. Some authors use the term functional reliability instead of system reliability, because reliability actually shows the level
of system performance. Reliability is also defined in terms of the frequency of system failure, so reliability analysis usually depends on the failure rate. Besides time, environmental and operational conditions also change the reliability of systems. Many systems deteriorate, through both natural degradation and stochastic shocks, and either can cause the system to fail. Degradation and stochastic shocks are usually correlated, and both contribute to deterioration. The time between two consecutive shocks can follow any distribution. A continuous phase-type distribution can be a suitable choice. In discrete time, the times between consecutive shocks can be taken as geometric, which corresponds to shocks occurring according to a binomial process. In continuous time, the shock process can be described by a Poisson process. Shock models with self-healing mechanisms can be used. Cumulative shock models and delta shock models are also used for shock modeling. Over the life of equipment, different periods can usually be distinguished in which the system configuration and the failure criteria change. There are several models for evaluating lifetime in terms of reliability, including models based on Markov reward theory and continuous phase-type distributions. Equipment life can also be predicted using time series. Equipment lifetime is generally assumed to be exponential. However, the lifetime of many systems follows nonexponential distributions, such as the Weibull, in which case Markov modeling cannot be used directly for the system behavior. The Weibull distribution is one of the most important distributions in system reliability. For these systems, the semi-Markov process is used, in which the lifetime of the components in dynamic systems follows a nonexponential distribution. A phase-type distribution can also be suitable for equipment life; it increases the flexibility of the model.
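To make the contrast between exponential and Weibull lifetimes concrete, the reliability and hazard functions can be written out. The symbols β (shape) and η (scale) are notation assumed here for illustration and do not come from the chapter.

```latex
R(t) = \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right],
\qquad
h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1},
\qquad t \ge 0 .
```

For β = 1 the hazard reduces to the constant 1/η, which is the exponential case that a Markov model assumes. For β ≠ 1 the hazard depends on the time already spent in the state, which is why a semi-Markov or phase-type description is needed.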
Markov chains are commonly used to model the degradation process. In Markov models, exponential distributions are used for failure times, but the number of modes in complex problems can be reduced by assuming the Weibull distribution for failure times. Negative exponential distribution and geometric distribution can also be used for failure times. The gamma process can be used to describe the random behavior of degeneration. Thus natural degradation behavior can be explained by the gamma process. In reliability models, it can be assumed that components break down based on a multivariate gamma process with dependence on the Levy copula. The Wiener process can be used to model component degradation. The failure rate in the equipment is proportional to the number of faults in the system and the failure correction rate has two components, either an error is corrected at a certain rate or an error is corrected, but a new error is created with a new correction rate. Markov models are suitable for the exponential frequency distribution of the reliability function. The exponential distribution considers the hazard rate and repair to be constant. However, the Weibull model is one of the most widely used distributions to describe the frequency distribution in reliability and failure. Weibull distribution can be approximated to exponential distribution by linearization. Therefore, even if
the Weibull distribution is considered for the reliability function, Markov models can still be used by applying this approximation. The pattern of degradation in systems is usually nonlinear. In some systems, transient regeneration events can affect the rate of degradation. For the degradation process, a Bayesian inference exponential model is usually used, but it ignores transient behaviors and local fluctuations and therefore does not perform well. A general model is needed that considers both transient behaviors and the pattern of gradual degradation. A model based on the Poisson process can capture transient behaviors. The rank regression method is used to determine the most appropriate failure distribution function. Continuous-time semi-Markov processes with Weibull-distributed transition times can also be used to model system failure. Dynamic Bayesian networks are also used to model degradation and reliability when the state space is discrete and finite. They integrate conditional sojourn time distributions with multiple dynamic degradation processes. Hidden Markov models are also used in reliability analysis, and Bayesian inference can be used to reveal their unknown dimension. Uncertain parameters are usually entered as random variables in reliability models, and their distributions must be specified. Past data are usually used to estimate the distribution parameters. Real time-between-failures data can be used to infer model parameters with the Markov chain Monte Carlo (MCMC) simulation method. For mechanical systems with a continuous-state degradation process, the gamma process with a nonconstant degradation rate is often used, and a matrix-based approximation method is used to calculate the mean residual life and the residual life distribution.
It can be assumed that the system fails when the level of deterioration reaches a critical threshold. Sudden shocks arrive at random times following a nonhomogeneous Poisson process, and the system also fails when a sudden shock occurs. A nonhomogeneous continuous-time Markov chain is used to describe the state evolution process, and the nonhomogeneous Poisson process is used to model the number of external shocks. The probability of staying in any state of the Markov process is obtained through a recursive approach. Reliability should be defined quantitatively, which can be done using regenerative point techniques and semi-Markov process theory. Statistical methods based on the Bayesian approach can be used to evaluate the failure process quantitatively. To estimate the reliability of a system, indicators should be used to check the status of each component. The Chapman–Kolmogorov differential equations are used to obtain reliability measures. In systems that are prone to deterioration or shocks, self-healing has received little attention, although systems with self-healing mechanisms are used in many areas. Assessing reliability with the self-healing effect of components taken into account is therefore an interesting topic.
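A nonhomogeneous Poisson shock process of the kind described above can be simulated by thinning: candidate points are drawn from a homogeneous process at a bounding rate and each is kept with probability rate(t)/rate_max. The sketch below is illustrative only. The linear intensity 0.5 + 0.1t, the horizon, and the assumption that the first shock is fatal are all hypothetical choices made here.

```python
import math
import random

def nhpp_event_times(rate, rate_max, t_end, rng):
    """Sample event times of a nonhomogeneous Poisson process on (0, t_end].

    rate_max must be an upper bound of rate(t) on the whole interval.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)      # next candidate point
        if t > t_end:
            return times
        if rng.random() < rate(t) / rate_max:
            times.append(t)                 # candidate accepted as a shock

def shock_rate(t):
    # Hypothetical increasing shock intensity (shocks per unit time).
    return 0.5 + 0.1 * t

rng = random.Random(1)
runs = [nhpp_event_times(shock_rate, 1.5, 10.0, rng) for _ in range(2000)]

# Mean number of shocks; theory gives the integral of the rate, here 10.
mean_count = sum(len(r) for r in runs) / len(runs)

# If the first shock is fatal, R(t) = exp(-integral of the rate up to t).
t_check = 2.0
survival_est = sum(1 for r in runs if not r or r[0] > t_check) / len(runs)
survival_exact = math.exp(-(0.5 * t_check + 0.05 * t_check ** 2))
```

A degradation threshold could be added by accumulating a damage increment per accepted shock, which turns this into a cumulative shock model.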
Because the size of a Markov model grows exponentially with the number of components (the curse of dimensionality), the Markov method may be limited in solving large, complex problems. There are several ways to overcome this limitation. The model can be simplified to reduce the number of states, or states can be merged. Korolyuk's method is one way of merging the state space of Markov models. Another solution is hierarchical modeling, which contains the growth in the number of states. Classical Markov methods or simulation techniques can also be used, as can the Lz-transform. Reinforcement learning (RL) is used to solve Markov decision processes with large state spaces. State transitions in Markov methods are not deterministic; they are governed by probabilities, and fuzzy logic can be used in Markov models to determine the transition rates. Most failures cannot be represented by Markov chain methods and the Poisson process, since these methods are not suitable for analyzing non-Poisson failures. The Event Tree (ET)/Fault Tree (FT) method is one of the approaches used to assess reliability. Simple reliability problems are solved easily and quickly using fault tree models. In more complex problems, however, the state space increases exponentially and these models become error-prone, so the approach has not worked efficiently for the reliability of dynamic systems. A dynamic FT, which defines additional gates called dynamic gates, can solve complex systems using Markov models. Stochastic reward networks, which are a type of stochastic Petri net, can be used for these models. The FT is commonly used to specify how the failure of individual components can lead to system failure.
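The effect of merging states can be seen on a small k-out-of-n:G example. The sketch below is not Korolyuk's method; it only assumes n identical, independent, nonrepairable components (a hypothetical setting chosen because both computations are easy to verify) and compares the full state space of 2^n up/down vectors with the n + 1 merged states that record only how many units are up.

```python
from itertools import product
from math import comb, exp

def reliability_full(n, k, p):
    """Sum the probability of every up/down vector with at least k units up."""
    total = 0.0
    for state in product((0, 1), repeat=n):   # 2**n states
        ups = sum(state)
        if ups >= k:
            total += p ** ups * (1.0 - p) ** (n - ups)
    return total

def reliability_lumped(n, k, p):
    """Same quantity over the n + 1 merged states 'number of units up'."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))

# Hypothetical 7-out-of-10:G system with constant failure rate per unit.
n, k, lam, t = 10, 7, 0.02, 10.0
p = exp(-lam * t)                 # probability that one unit is still up at t

full = reliability_full(n, k, p)
lumped = reliability_lumped(n, k, p)
full_states, lumped_states = 2 ** n, n + 1
```

Merging is exact here only because the components are statistically identical; with nonidentical units the merged process is generally no longer Markov and an approximation is needed.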
The dynamic FT shows the probabilistic dependencies between the components and how these probabilities change over time. Reliability is often complex to assess because of system dynamics and existing uncertainties, and statistical methods are usually used to calculate it. Statistical methods focus on past system data and do not consider system dynamics and change over time. Markov reliability models can account for statistical dependencies between failure events in dynamic systems. Markovian and non-Markovian models are used for dynamic equipment reliability. Markov models are not usable for fault detection and identification (FDI) in fault-tolerant control systems (FTCS) because of the inherent memory of these tests, but semi-Markov models can be used instead. Markov chains can be used to describe system faults. The metamodel method is also used in reliability analysis; its limitation is the difficulty of quantifying the uncertainty of the model. Numerical algorithms are another technique used to solve reliability problems. The Markov chain Monte Carlo simulation method has been widely used in solving reliability problems. This method is suitable for modeling the dynamic behavior of the system but is stochastic. The Hamiltonian Monte Carlo (HMC) method, which is a
simulation method driven by Hamiltonian dynamics rather than random-walk proposals, can be used to solve problems. Riemannian manifold Hamiltonian Monte Carlo based subset simulation (RMHMC-SS) is used to overcome the limitations of Monte Carlo approaches to reliability problems in highly curved non-Gaussian spaces. Markov chain simulation is used with the Metropolis–Hastings algorithm to estimate reliability. Bayesian methods can be incorporated into Markov chain Monte Carlo simulations to facilitate inference. Shortest path analysis in stochastic networks can be used in reliability where systems are time-dependent. Regenerative point techniques are also used for analysis. Universal generating function (UGF) techniques, such as the Lz-transform, and recursive algorithms are used in systems with many components, although UGF techniques are commonly used to analyze steady-state reliability. Traditional approaches such as simulation, Markov chains, and probability techniques are usually not capable of analyzing the reliability of complex systems. Petri net theory can be used to analyze reliability. Because of their visual nature, Petri nets can provide insight into the system being modeled, but they also suffer from state-space explosion as model complexity increases. Several types of stochastic Petri nets can be used as modeling tools for system reliability. Stochastic Petri Nets (SPNs) and Generalized Stochastic Petri Nets (GSPNs) are used for continuous-time Markov chains, Extended Stochastic Petri Nets (ESPNs) for semi-Markov processes, and Deterministic and Stochastic Petri Nets (DSPNs) for Markov regenerative processes. Stochastic Petri nets can be used for this modeling (Choi et al. [183]). This approach calculates the transition probabilities instead of estimating or assuming them. The Markov renewal process can be used to increase the analytical power of Petri nets.
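The Metropolis–Hastings step used in Markov chain simulation can be sketched in a few lines. The example below is a toy, not the procedure of any cited study: it assumes a one-dimensional standard normal variable and a hypothetical limit state g(x) = 1.5 - x, with failure when g(x) < 0, so that the estimate can be compared with the exact tail probability.

```python
import math
import random

def metropolis_hastings(log_density, x0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x, lp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_density(cand)
        # Accept with probability min(1, pi(cand) / pi(x)).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        samples.append(x)
    return samples

def log_std_normal(x):
    # Log-density up to an additive constant; the constant cancels in the ratio.
    return -0.5 * x * x

rng = random.Random(42)
chain = metropolis_hastings(log_std_normal, 0.0, 50_000, 2.4, rng)
chain = chain[1_000:]                      # discard burn-in

# Hypothetical limit state: failure when g(x) = 1.5 - x is negative.
p_fail_est = sum(1 for x in chain if 1.5 - x < 0.0) / len(chain)
p_fail_exact = 0.5 * math.erfc(1.5 / math.sqrt(2.0))
```

For rare failure events a plain chain like this needs very long runs, which is the motivation for subset simulation and the Hamiltonian variants mentioned in the text.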
There are limitations to reliability calculations in complex systems. One is the curse of dimensionality: the set of states grows exponentially with the number of components. Another is the curse of history: decision trees grow exponentially with the number of decision steps. Uncertainty is a further limitation, arising from the inherently stochastic nature of system change. In complex systems there are also stochastic constraints related to resource scarcity (Andriotis and Papakonstantinou [184]). Partially Observable Markov Decision Processes (POMDP) and multiagent Deep Reinforcement Learning (DRL) are used in reliability. Eigenvectors and eigenvalues can also be used to analyze system reliability. The reliability function can be obtained using the Laplace transform. In semi-Markov systems with finite state space, algebraic calculus in convolution algebra can be used for reliability analysis. Differential equations can be used to solve reliability problems. Bayesian estimation of system reliability can use the Lindley approximation. Reliability models can be solved analytically in the form of Laplace transforms. Closed-form probability solutions are also used to obtain the reliability and the mean time to failure.
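A short worked case shows how the Laplace transform leads to a closed-form mean time to failure (MTTF). The example assumes two identical, independent, nonrepairable units in active parallel with constant failure rate λ; this configuration and the notation R*(s) for the transform are chosen here for illustration only.

```latex
R(t) = 2e^{-\lambda t} - e^{-2\lambda t},
\qquad
R^{*}(s) = \int_{0}^{\infty} e^{-st} R(t)\,dt
         = \frac{2}{s+\lambda} - \frac{1}{s+2\lambda},
\qquad
\mathrm{MTTF} = \int_{0}^{\infty} R(t)\,dt = R^{*}(0)
              = \frac{2}{\lambda} - \frac{1}{2\lambda} = \frac{3}{2\lambda}.
```

Evaluating the transform at s = 0 gives the MTTF directly, without inverting back to the time domain.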
Bayesian networks can be used in systems reliability. The growing use of Bayesian networks is due to the advantages that Bayesian networks offer compared to other classical methods of reliability analysis such as Markov chains. Bayesian models have been successful in analyzing complex systems. These models have good ability to predict and detect failure. Information can also be updated using the Bayesian network based on the observations obtained. Bayesian inference techniques are effective in assessing the reliability of complex systems. In this technique, using previous system performance data and other information, different probabilities of reliability hypotheses can be obtained. Due to the fact that real failure data may not be available, it is necessary to perform test designs at the subsystem level. The Bayesian hypothesis test can be used to determine the number of tests that can be performed to show that a reliability target is estimated in a system with a certain level of probability. Update on Bayesian networks is a powerful tool for quantifying model uncertainty using new observations. Bayesian models also have the ability to use multimodal variables. Bayesian networks also have the ability to accurately calculate the probability of an event occurring, which is one of the advantages of these networks in solving reliability problems. Today, in industrial systems, there is a set of components with dependencies between components and environmental conditions. System reliability can be achieved by influencing external variables on system degradation modes based on Bayesian dynamic networks. To effectively model the dependence between component degradation modes, a hierarchical structure must be defined. Dynamic programming is also used in solving reliability models. Traditional methods can also be used dynamically. 
Dynamic FTs, which take the characteristics of dynamic failure into account, can replace the traditional Markov-based FT, which may suffer from state explosion. The algebra of logic can be used to evaluate the reliability of large multicomponent systems when differential-equation approaches or semi-Markov processes are impractical. As described above, classical approaches to reliability problems are limited by the diversity and behavioral uncertainty of existing systems and data, as well as by the functional dependence between components and multiple failure modes. Solution approaches can be developed that analyze the reliability of a system even when performance specifications and failure data are not available. Statistical solution methods and Markov reliability models need to be integrated with modern problem-solving methods such as machine learning, artificial intelligence, and artificial neural networks. Intelligent monitoring systems, which indicate how exposed system components are to failure, can be used in smart networks to improve reliability. FT analysis and reliability block diagrams are static methods that cannot capture the redundancy and dynamics of the system. Markov models are dynamic, but they are not suitable for very large, intractable models, and the sojourn time in each state is exponential. Most failures cannot be demonstrated by Markov
chain methods and the Poisson process. These methods are not suitable for analyzing non-Poisson failures. Markov modeling can be combined with the advantages of other modeling techniques such as reliability block diagrams. When a human operator works with a machine, the operator also affects the deterioration of the machine. The deterioration of the machine in turn causes fatigue and human error, which accelerate the deterioration further, so manpower, environmental conditions, and raw materials must also be considered when examining machine deterioration. By considering the human–machine system as a whole, its reliability can be analyzed under a Markov process. There are usually few statistical data on component failure for probabilistic risk assessment (PRA). To overcome this limitation, fuzzy set theory is used, and the solution approaches used for probabilistic risk assessment should take fuzziness into account. Reliability models must be flexible to stay up to date. Bayesian networks are a powerful tool that can adapt to a variety of factors in complex problems. Bayesian methods, including Bayesian networks and dynamic Bayesian networks, are used to analyze the reliability of complex systems. Dynamic Bayesian networks (DBNs) are used in multicomponent systems to determine the deterioration dependence between system components and the complex structural behavior of the system. The redundancy allocation problem (RAP) appears in many reliability optimization problems. Component redundancy and component maintenance both play an important role in improving system reliability. Redundancy, or standby, is a technique used to improve reliability in system design. Reliability and availability analysis must be performed to decide on redundancy in the event of equipment failure. Systems can be interconnected and multistate, and demand in systems is usually random. Auxiliary resources can be used to improve system reliability.
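A tiny instance of the redundancy allocation problem can be solved by exhaustive search, which makes the structure of the problem visible. The sketch below assumes a series system of subsystems, each made of identical active-parallel components; the component reliabilities, unit costs, budget, and the cap `max_per_stage` are hypothetical numbers. Because the RAP is NP-hard, enumeration like this is only practical for very small cases.

```python
from itertools import product

def system_reliability(counts, rel):
    """Series system of stages; stage i has counts[i] identical parallel units."""
    r = 1.0
    for n, p in zip(counts, rel):
        r *= 1.0 - (1.0 - p) ** n      # stage works if at least one unit works
    return r

def solve_rap(rel, cost, budget, max_per_stage=4):
    """Enumerate all allocations and keep the most reliable feasible one."""
    best_r, best_counts = 0.0, None
    for counts in product(range(1, max_per_stage + 1), repeat=len(rel)):
        total_cost = sum(n * c for n, c in zip(counts, cost))
        if total_cost <= budget:
            r = system_reliability(counts, rel)
            if r > best_r:
                best_r, best_counts = r, counts
    return best_r, best_counts

# Hypothetical three-stage system.
rel = [0.90, 0.80, 0.95]    # reliability of one component in each stage
cost = [2, 3, 1]            # cost of one component in each stage
budget = 12

best_r, best_counts = solve_rap(rel, cost, budget)
baseline_r = system_reliability((1, 1, 1), rel)
```

Extending the sketch to standby or repairable components would replace the stage formula with a Markov or Markov renewal model of each subsystem.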
The reliability of multimode systems with redundancy should be studied. Standby system components are usually assumed to be statistically identical and independent. While in practical applications, not all components in standby mode can be considered the same because the failure rate and repair rate are not the same. A system that includes the main standby redundancy and subsystems can be considered and modeled with the Markov renewal process technique. There are many studies on the RAP, but few of them consider designing systems that work for a certain period of time. In these systems, checking reliability in infinite time will not be useful. In the redundancy allocation problem, nonrepairable components including standby, cold or warm subsystems are usually used. Repairable components can be used in the RAP. Also, in the RAP, subsystems containing homogeneous components are generally included in the system. Nonhomogeneous components in a subsystem can be used to develop the RAP. Since RAP is in the NP-hard class of optimization problems, new solution approaches to this problem can be proposed. Traditional reliability estimation methods, including reliability prediction methods, do not consider the impact of fault detection on the system. Models for estimating
reliability can be proposed based on new approaches that take into account the impact of fault detection on the system. In calculating the failure rate, the repairman repair operation can also be considered as an effective factor in the equipment failure rate after repair. Failure and repair rates change with the operating age of the equipment, so the transfer rate also changes with the age of the equipment. This rate can be obtained based on fuzzy set theory or using the knowledge of experts. In systems, the failure rate can be defined independently of time and based on operational and environmental conditions. Probability modeling for Bayesian inference and quantification of uncertainty is one of the problems in determining the reliability of systems. New approaches can be proposed in this field. Hybrid processes can be modeled in which components that act as discrete parts interact with continuous components. In evaluating high-dimensional systems, these systems can be hierarchically decomposed into several levels, and each level can be described by one of the reliability approaches, such as FT, Reliability Block Diagram, and state transfer diagram. Systems are usually considered statically for analysis. Systems described by hybrid reliability models are usually dynamic. In dynamic systems, there is a functional interdependence between components. Reliability models often focus on fundamental faults while the reliability of transient faults must also be considered in predicting reliability. The numerical solution of a set of linear differential equations can be used simultaneously to calculate the transient state of repairable systems. For nonrepairable systems, transient probabilities for reliability can be obtained from closed-form solutions. Most reliability models assume that all failures are detectable and correctable. A reliability model in which not all failures can be identified and corrected can be analyzed. 
During a software debugging operation, an additional fault may be introduced into the program while an existing fault is being removed. Perfect debugging is therefore an ideal assumption that does not hold in practice, and models can be provided that treat the debugging operation as imperfect. The Internet of Things (IoT) connects a large number of objects to the Internet, which can be used to achieve a higher level of control in terms of reliability. A challenge in assessing reliability is the lack of failure data. The nonlinear least-squares regression-based method uses past degradation data to predict lifetime. The Bayesian method can be used to estimate useful life. The binary assumption of reliability theory (completely failed or completely intact) is not acceptable for many systems. In such cases, a fuzzy state assumption must be adopted for the state of the system and taken into account when calculating the reliability function, the lifetime, and the failure rate. Built-In-Test techniques are used in standby systems. This technique can be used with the Markov model in systems with nonidentical components to calculate reliability. The bivariate gamma process and the fuzzy process can be used for stochastic modeling and reliability.
The decision to replace faulty components is made based on the condition of the components and the system, and it is constrained by the inventory level. Both maintenance and replenishment decisions affect system performance. Integrated models can be proposed that consider reliability and inventory decisions together in multicomponent systems. Models can also be provided for the reliability of systems with regard to environmental conditions. Grey Markov chains can be an interesting and new approach to system reliability.
References
[1] A. Bobbio, A. Premoli, O. Saracco, Multi-state homogeneous Markov models in reliability analysis, Microelectronics Reliability 20 (6) (1980) 875–880.
[2] C. Singh, S. Asgarpoor, A.D. Patton, Markov method for generating capacity reliability evaluation including operating considerations, International Journal of Electrical Power & Energy Systems 6 (3) (1984) 161–168.
[3] L.E. Johnson, Eigenvalue-eigenvector solutions for two general Markov models in reliability, Microelectronics Reliability 26 (5) (1986) 917–933.
[4] P.K. Rao, K.S. Balasubramanya, RAM analysis of a navigational system by Markov technique: a case study, Microelectronics Reliability 26 (3) (1986) 447–451.
[5] B.H. Lindqvist, Monotone Markov models, Reliability Engineering 17 (1) (1987) 47–58.
[6] C. Constantinescu, Fault-tolerant, microcomputer based industrial programmable controller by redundancy, IFAC Proceedings Volumes 20 (5) (1987) 27–32.
[7] G.G. Pullum, M. Chown, G.A. Whitmore, Reliability analysis of a multipath switching network, Microelectronics Reliability 28 (4) (1988) 619–633.
[8] C. Constantinescu, Effect of transient faults on gracefully degrading processor arrays, Microprocessing and Microprogramming 26 (1) (1989) 23–30.
[9] J. Cao, Stochastic behaviour of a man-machine system operating under changing environment subject to a Markov process with two states, Microelectronics Reliability 29 (4) (1989) 529–531.
[10] M.A. Manzoul, M. Suliman, Neural network for the reliability analysis of simplex systems, Microelectronics Reliability 30 (4) (1990) 795–800.
[11] P.J. Gil, J.J. Serrano, R. Ors, V. Santonja, Dependability evaluation of watchdog processors, in: Safety of Computer Control Systems 1990 (Safecomp'90) (1990) 89–94.
[12] W.M. Bulleit, Experiences with a Markov model for structural systems with time variant member resistances, Structural Safety 7 (2–4) (1990) 209–218.
[13] J. Shao, L.R. Lamberson, Markov model for k-out-of-n: G systems with built-in-test, Microelectronics Reliability 31 (1) (1991) 123–131.
[14] T. Lassen, Markov modelling of the fatigue damage in welded structures under in-service inspection, International Journal of Fatigue 13 (5) (1991) 417–422.
[15] N. Limnios, C. Cocozza-Thivent, Reliability modeling of uncontrolled and controlled reservoirs, Reliability Engineering & System Safety 35 (3) (1992) 201–208.
[16] A. Senegacnik, M. Tuma, Operating reliability of a conventional power plant, Reliability Engineering & System Safety 37 (3) (1992) 211–215.
[17] B. Tan, S. Yeralan, S. Babu, B. Osborn, Computer aided reliability modeling and applications in semiconductor manufacturing, Computers & Industrial Engineering 23 (1–4) (1992) 169–172.
[18] T. Guo, J. Cao, Reliability analysis of a multistate one-unit repairable system operating under a changing environment, Microelectronics Reliability 32 (3) (1992) 439–443.
[19] S.K. Singh, G.C. Sharma, Analysis of a composite performance reliability evaluation for Markovian queueing systems, Microelectronics Reliability 32 (3) (1992) 319–321.
[20] P. Stavrianidis, Reliability and uncertainty analysis of hardware failures of a programmable electronic system, Reliability Engineering & System Safety 39 (3) (1993) 309–324.
[21] S. Yamada, K. Tokuno, S. Osaki, Software reliability measurement in imperfect debugging environment and its application, Reliability Engineering & System Safety 40 (2) (1993) 139–147.
[22] D. Sculli, S.K. Choy, Power plant boiler feed system reliability: a case study, Computers in Industry 21 (1) (1993) 93–99.
[23] M.A. El-Damcese, Analytical evaluation of reliability models for multiplex systems, Microelectronics Reliability 35 (6) (1995) 981–983.
[24] A. Csenki, A new approach to the cumulative operational time for semi-Markov models of repairable systems, Reliability Engineering & System Safety 54 (1) (1996) 11–21.
[25] M.S. Moustafa, Transient analysis of reliability with and without repair for K-out-of-N: G systems with two failure modes, Reliability Engineering & System Safety 53 (1) (1996) 31–35.
[26] K. Farhangdoost, J.W. Provan, A stochastic systems approach to fatigue reliability: an application to Ti-6Al-4V, Engineering Fracture Mechanics 53 (5) (1996) 687–706.
[27] W.J. Yoo, K.H. Jung, Mission reliability of an automatic control system integrated with distributed intelligent built-in-test systems, Computers & Industrial Engineering 33 (3–4) (1997) 753–756.
[28] V. Chandra, K.V. Kumar, Reliability and safety analysis of fault tolerant and fail safe node for use in a railway signalling system, Reliability Engineering & System Safety 57 (2) (1997) 177–183.
[29] M.S. Moustafa, Reliability analysis of K-out-of-N: G systems with dependent failures and imperfect coverage, Reliability Engineering & System Safety 55 (1) (1997) 15–17.
[30] B. Ouhbi, N. Limnios, Reliability estimation of semi-Markov systems: a case study, Reliability Engineering & System Safety 58 (3) (1997) 201–204.
[31] M.S. Moustafa, Transient analysis of reliability with and without repair for K-out-of-N: G systems with M failure modes, Reliability Engineering & System Safety 59 (3) (1998) 317–320.
[32] G.J. Chang, L. Cui, F.K. Hwang, Reliabilities for (n, f, k) systems, Statistics & Probability Letters 43 (3) (1999) 237–242.
[33] M.J. Beshir, A.S. Farag, T.C. Cheng, New comprehensive reliability assessment framework for power systems, Energy Conversion and Management 40 (9) (1999) 975–1007.
[34] A. Lisnianski, A. Jeager, Time-redundant system reliability under randomly constrained time resources, Reliability Engineering & System Safety 70 (2) (2000) 157–166.
[35] J.A. Whittaker, K. Rekab, M.G. Thomason, A Markov chain model for predicting the reliability of multi-build software, Information and Software Technology 42 (12) (2000) 889–894.
[36] B. Knegtering, A.C. Brombacher, A method to prevent excessive numbers of Markov states in Markov models for quantitative safety and reliability assessment, ISA Transactions 39 (3) (2000) 363–369.
[37] J. Yan, X. Wang, Reliability and safety analysis of automatic train protection system, IFAC Proceedings Volumes 33 (9) (2000) 615–619.
[38] K. Tokuno, S. Yamada, An imperfect debugging model with two types of hazard rates for software reliability measurement and assessment, Mathematical and Computer Modelling 31 (10–12) (2000) 343–352.
[39] G. Becker, L. Camarinopoulos, G. Zioutas, A semi-Markovian model allowing for inhomogenities with respect to process time, Reliability Engineering & System Safety 70 (1) (2000) 41–48.
[40] T. Lassen, J.D. Sørensen, A probabilistic damage tolerance concept for welded joints. Part 1: data base and stochastic modelling, Marine Structures 15 (6) (2002) 599–613.
[41] S. Zupei, W. Yao, H. Xiangrui, A quantification algorithm for a repairable system in the GO methodology, Reliability Engineering & System Safety 80 (3) (2003) 293–298.
[42] B. Ouhbi, N. Limnios, Nonparametric reliability estimation of semi-Markov processes, Journal of Statistical Planning and Inference 109 (1–2) (2003) 155–165.
[43] F. Grabski, The reliability of an object with semi-Markov failure rate, Applied Mathematics and Computation 135 (1) (2003) 1–16.
[44] M. Bouissou, J.L. Bon, A new formalism that combines advantages of fault-trees and Markov models: Boolean logic driven Markov processes, Reliability Engineering & System Safety 82 (2) (2003) 149–163.
[45] K. Goda, A strength reliability model by Markov process of unidirectional composites with fibers placed in hexagonal arrays, International Journal of Solids and Structures 40 (24) (2003) 6813–6837.
[46] N.E. Wu, R.J. Patton, Reliability and supervisory control, IFAC Proceedings Volumes 36 (5) (2003) 137–142.
123
124
Engineering Reliability and Risk Assessment
[47] K. Tokuno, S. Yamada, Markovian software reliability measurement with a geometrically decreasing perfect debugging rate, Mathematical and Computer Modelling 35 (11e13) (2003) 1443e1451. [48] M. Tanrioven, Q.H. Wu, D.R. Turner, C. Kocatepe, J. Wang, A new approach to real-time reliability analysis of transmission system using fuzzy Markov model, International Journal of Electrical Power & Energy Systems 26 (10) (2004) 821e832. [49] A. El-Gohary, Bayesian estimations of parameters in a three-state reliability semi- Markov models, Applied Mathematics and Computation 154 (1) (2004) 53e67. [50] J.L. Wang, Markov-chain based reliability analysis for distributed systems, Computers & Electrical Engineering 30 (3) (2004) 183e205. [51] R. Dobias, P. Grillinger, S. Racek, A17: using Markov models for evaluation of single event upsets in TTP/C systems, IFAC Proceedings Volumes 37 (20) (2004) 100e104. [52] A. Sadek, N. Limnios, Nonparametric estimation of reliability and survival function for continuoustime finite Markov processes, Journal of Statistical Planning and Inference 133 (1) (2005) 1e21. [53] A. Azaron, H. Katagiri, M. Sakawa, M. Modarres, Reliability function of a class of time-dependent systems with standby redundancy, European Journal of Operational Research 164 (2) (2005) 378e386. [54] C.G. Bai, Bayesian network based software reliability prediction with an operational profile, Journal of Systems and Software 77 (2) (2005) 103e112. [55] W.L. Wang, D. Pan, M.H. Chen, Architecture-based software reliability modeling, Journal of Systems and Software 79 (1) (2006) 132e146. [56] H. Li, Q. Zhao, Reliability evaluation of fault tolerant control systems with a semi- Markov FDI model, IFAC Proceedings Volumes 39 (13) (2006) 1306e1311. [57] A.N. Ajah, P.M. Herder, J. Grievink, M.P.C. 
Weijnen, Hierarchical markov reliability/availability models for energy & industrial infrastructure systems conceptual design, In Computer aided chemical engineering 21 (2006) 1753e1758 (Elsevier). [58] P. Kucera, O. Hyncica, J. Cidl, J. Vasatko, Realibility model of TMR system with fault detection, IFAC Proceedings Volumes 39 (21) (2006) 468e472. [59] M. Tanrioven, M.S. Alam, Reliability modeling and analysis of stand-alone PEM fuel cell power plants, Renewable Energy 31 (7) (2006) 915e933. [60] A. Platis, A generalized formulation for the performability indicator, Computers & Mathematics with applications 51 (2) (2006) 239e246. [61] A. Csenki, Joint interval reliability for Markov systems with an application in transmission line reliability, Reliability Engineering & System Safety 92 (6) (2007) 685e696. [62] G. Fei, Z. Hong-yue, Analysis of observer-based Fault tolerant control systems with markovian parameters, in: Fault Detection, Supervision and Safety of Technical Processes 2006, Elsevier Science Ltd, 2007, pp. 552e556. [63] A.D. Dominguez-Garcia, J.G. Kassakian, J.E. Schindall, J.J. Zinchuk, An integrated methodology for the dynamic performance and reliability evaluation of fault-tolerant systems, Reliability Engineering & System Safety 93 (11) (2008) 1628e1649. [64] H. Li, Q. Zhao, Z. Yang, Reliability monitoring of fault tolerant control systems, IFAC Proceedings Volumes 41 (2) (2008) 6920e6925. [65] A.M. El-Nashar, Optimal design of a cogeneration plant for power and desalination taking equipment reliability into consideration, Desalination 229 (1e3) (2008) 21e32. [66] J. Chiquet, N. Limnios, A method to compute the transition function of a piecewise deterministic Markov process with application to reliability, Statistics & Probability Letters 78 (12) (2008) 1397e1403. [67] N.S. Sisworahardjo, M.S. Alam, G. Aydinli, Reliability and availability analysis of low power portable direct methanol fuel cells, Journal of Power Sources 177 (2) (2008) 412e418. 
[68] H. Guo, X. Yang, Automatic creation of Markov models for reliability assessment of safety instrumented systems, Reliability Engineering & System Safety 93 (6) (2008) 829e837. [69] A. Ehsani, A.M. Ranjbar, A. Jafari, M. Fotuhi-Firuzabad, Reliability evaluation of deregulated electric power systems for planning applications, Reliability Engineering & System Safety 93 (10) (2008) 1473e1484.
Markov and semi-Markov models in system reliability
[70] C.K. Pil, M. Rausand, J. Vatn, Reliability assessment of reliquefaction systems on LNG carriers, Reliability Engineering & System Safety 93 (9) (2008) 1345e1353. [71] P. Do Van, A. Barros, C. Berenguer, From differential to difference importance measures for Markov reliability models, European Journal of Operational Research 204 (3) (2010) 513e521. [72] J. Soszynska, Reliability and risk evaluation of a port oil pipeline transportation system in variable operation conditions, International Journal of Pressure Vessels and Piping 87 (2e3) (2010) 81e87. [73] A. Veeramany, M.D. Pandey, Reliability analysis of nuclear component cooling water system using semi-Markov process model, Nuclear Engineering and Design 241 (5) (2011) 1799e1806. [74] A. Veeramany, M.D. Pandey, Reliability analysis of nuclear piping system using semi-Markov process model, Annals of Nuclear Energy 38 (5) (2011) 1133e1139. [75] Y. Wei, L. Wang, M. Wang, Software reliability analysis of Hierarchical architecture based on Markov model, Procedia Engineering 15 (2011) 2857e2861. [76] J.J. Jiang, L. Zhang, Y.Q. Wang, Y.Y. Peng, K. Zhang, W. He, Markov reliability model research of monitoring process in digital main control room of nuclear power plant, Safety Science 49 (6) (2011) 843e851. [77] A.G. Mathew, S.M. Rizwan, M.C. Majumder, K.P. Ramachandran, Reliability modelling and analysis of a two unit continuous casting plant, Journal of the Franklin Institute 348 (7) (2011) 1488e1505. [78] F. Grabski, Semi-Markov failure rates processes, Applied Mathematics and Computation 217 (24) (2011) 9956e9965. [79] V. Gupta, S. Dharmaraja, Semi-Markov modeling of dependability of VoIP network in the presence of resource degradation and security attacks, Reliability Engineering & System Safety 96 (12) (2011) 1627e1636. [80] Y. Liu, M. Rausand, Reliability assessment of safety instrumented systems subject to different demand modes, Journal of Loss Prevention in the Process Industries 24 (1) (2011) 49e56. 
[81] M.R. Haghifam, M. Manbachi, Reliability and availability modelling of combined heat and power (CHP) systems, International Journal of Electrical Power & Energy Systems 33 (3) (2011) 385e393. [82] L. Yuan, X.Y. Meng, Reliability analysis of a warm standby repairable system with priority in use, Applied Mathematical Modelling 35 (9) (2011) 4295e4303. [83] S.R. Hosseini, M. Amidpour, A. Behbahaninia, Thermoeconomic analysis with reliability consideration of a combined power and multi stage flash desalination plant, Desalination 278 (1e3) (2011) 424e433. [84] B. Cai, Y. Liu, Z. Liu, X. Tian, Y. Zhang, J. Liu, Performance evaluation of subsea blowout preventer systems with common-cause failures, Journal of Petroleum Science and Engineering 90 (2012) 18e25. [85] V. Rosset, P.F. Souto, P. Portugal, F. Vasques, Modeling the reliability of a group membership protocol for dual-scheduled time division multiple access networks, Computer Standards & Interfaces 34 (3) (2012) 281e291. [86] S.R. Hosseini, M. Amidpour, S.E. Shakib, Cost optimization of a combined power and water desalination plant with exergetic, environment and reliability consideration, Desalination 285 (2012) 123e130. [87] X. Wang, W. Liu, Research on air traffic control automatic system software reliability based on Markov chain, Physics Procedia 24 (2012) 1601e1606. [88] A. Lisnianski, D. Elmakias, D. Laredo, H.B. Haim, A multi-state Markov model for a short-term reliability analysis of a power generating unit, Reliability Engineering & System Safety 98 (1) (2012) 1e6. [89] T.S. Selwyn, R. Kesavan, Reliability measures of constant pitch constant speed wind turbine with markov analysis at high uncertain wind, Procedia Engineering 38 (2012) 932938. [90] P. Cao, Z. Dong, K. Liu, K.Y. Cai, Quantitative effects of software testing on reliability improvement in the presence of imperfect debugging, Information Sciences 218 (2013) 119132. [91] C.W. Zhang, T. Zhang, N. Chen, T. 
Jin, Reliability modeling and analysis for a novel design of modular converter system of wind turbines, Reliability Engineering & System Safety 111 (2013) 86e94.
125
126
Engineering Reliability and Risk Assessment
[92] J.J. Wang, C. Fu, K. Yang, X.T. Zhang, G.H. Shi, J. Zhai, Reliability and availability analysis of redundant BCHP (building cooling, heating and power) system, Energy 61 (2013) 531e540. [93] D. Montoro-Cazorla, R. Perez-Ocon, A redundant n-system under shocks and repairs following Markovian arrival processes, Reliability Engineering & System Safety 130 (2014) 69e75. [94] Y. Liu, S. Zhao, S. Yang, Y. Li, R. Qiang, Markov process based reliability model for laser diodes in space radiation environment, Microelectronics Reliability 54 (12) (2014) 27352739. [95] A. Karami-Horestani, M.E.H. Golshan, H. Hajian-Hoseinabadi, Reliability modeling of TCR-FC type SVC using Markov process, International Journal of Electrical Power & Energy Systems 55 (2014) 305e311. [96] P.P. Guilani, M. Sharifi, S.T.A. Niaki, A. Zaretalab, Reliability evaluation of non- reparable threestate systems using Markov model and its comparison with the UGF and the recursive methods, Reliability Engineering & System Safety 129 (2014) 29e35. [97] S. Malefaki, N. Limnios, P. Dersin, Reliability of maintained systems under a semi- Markov setting, Reliability Engineering & System Safety 131 (2014) 282e290. [98] C. Mattrand, J.M. Bourinet, The cross-entropy method for reliability assessment of cracked structures subjected to random Markovian loads, Reliability Engineering & System Safety 123 (2014) 171e182. [99] E. Galateanu, S. Avasilcai, Business ecosystem "reliability", Procedia-Social and Behavioral Sciences 124 (2014) 312e321. [100] S. Soleymani, M.E. Mosayebian, S. Mohammadi, A combination method for modeling wind power plants in power systems reliability evaluation, Computers & Electrical Engineering 41 (2015) 28e39. [101] S.M.M. Aval, A. Ahadi, H. Hayati, Adequacy assessment of power systems incorporating building cooling, heating and power plants, Energy and Buildings 105 (2015) 236e246. [102] X. Wu, J. 
Hillston, Mission reliability of semi-Markov systems under generalized operational time requirements, Reliability Engineering & System Safety 140 (2015) 122e129. [103] F. Petroni, F. Prattico, Reliability measures for indexed semi-Markov chains applied to wind energy production, Reliability Engineering & System Safety 144 (2015) 170e177. [104] H. Fazlollahtabar, M. Saidi-Mehrabad, J. Balakrishnan, Integrated Markov-neural reliability computation method: a case for multiple automated guided vehicle system, Reliability Engineering & System Safety 135 (2015) 34e44. [105] S.A. Timashev, A.V. Bushinskaya, Markov approach to early diagnostics, reliability assessment, residual life and optimal maintenance of pipeline systems, Structural Safety 56 (2015) 6879. [106] T.T. Pham, X. Defago, Q.T. Huynh, Reliability prediction for component-based software systems: dealing with concurrent and propagating errors, Science of Computer Programming 97 (2015) 426e457. [107] Y. Wan, H. Huang, D. Das, M. Pecht, Thermal reliability prediction and analysis for high-density electronic systems based on the Markov process, Microelectronics Reliability 56 (2016) 182e188. [108] C.I. Ossai, B. Boswell, I.J. Davies, Application of markov modelling and Monte Carlo simulation technique in failure probability estimationda consideration of corrosion defects of internally corroded pipelines, Engineering Failure Analysis 68 (2016) 159e171. [109] M.H. Shariatkhah, M.R. Haghifam, M. Parsa-Moghaddam, P. Siano, Evaluating the reliability of multi-energy source buildings: a new analytical method for considering the dynamic behavior of thermal loads, Energy and Buildings 126 (2016) 477e484. [110] S. Honamore, S.K. Rath, A web service reliability prediction using HMM and fuzzy logic models, Procedia Computer Science 93 (2016) 886e892. [111] S. Honamore, K. Dev, R. Honmore, Reliability prediction of web services using HMM and ANN models, Procedia Computer Science 93 (2016) 41e47. [112] T. Adefarati, R.C. 
Bansal, Reliability and economic assessment of a microgrid power system with the integration of renewable energy resources, Applied Energy 206 (2017) 911e933. [113] X. Zhu, M. Boushaba, D.W. Coit, A. Benyahia, Reliability and importance measures for mconsecutive-k, l-out-of-n system with non-homogeneous Markov-dependent components, Reliability Engineering & System Safety 167 (2017) 1e9. [114] Y. Li, L. Cui, C. Lin, Modeling and analysis for multi-state systems with discrete- time Markov regime-switching, Reliability Engineering & System Safety 166 (2017) 41e49. [115] S. Du, Z. Zeng, L. Cui, R. Kang, Reliability analysis of Markov history-dependent repairable systems with neglected failures, Reliability Engineering & System Safety 159 (2017) 134142.
Markov and semi-Markov models in system reliability
[116] I.E. Zevgolis, P.L. Bourdeau, Reliability and redundancy of the internal stability of reinforced soil walls, Computers and Geotechnics 84 (2017) 152e163. [117] I. Kabashkin, J. Kundler, Reliability of sensor nodes in wireless sensor networks of cyber physical systems, Procedia Computer Science 104 (2017) 380e384. [118] M. Snipas, V. Radziukynas, E. Valakevicius, Numerical solution of reliability models described by stochastic automata networks, Reliability Engineering & System Safety 169 (2018) 570e578. [119] X.Y. Li, H.Z. Huang, Y.F. Li, E. Zio, Reliability assessment of multi-state phased mission system with non-repairable multi-state components, Applied Mathematical Modelling 61 (2018) 181e199. [120] Y. Ye, I.E. Grossmann, J.M. Pinto, S. Ramaswamy, Markov chain MINLP model for reliability optimization of system design and maintenance, In Computer Aided Chemical Engineering 44 (2018) 1483e1488 (Elsevier). [121] S. Rebello, H. Yu, L. Ma, An integrated approach for system functional reliability assessment using Dynamic Bayesian Network and Hidden Markov Model, Reliability Engineering & System Safety 180 (2018) 124e135. [122] S.D. Sonal, S. Ammanagi, O. Kanjilal, C.S. Manohar, Experimental estimation of time variant system reliability of vibrating structures based on subset simulation with Markov chain splitting, Reliability Engineering & System Safety 178 (2018) 55e68. [123] D. Montoro-Cazorla, R. Perez-Ocon, Constructing a Markov process for modelling a reliability system under multiple failures and replacements, Reliability Engineering & System Safety 173 (2018) 34e47. [124] H. Yi, L. Cui, J. Shen, Y. Li, Stochastic properties and reliability measures of discrete-time semiMarkovian systems, Reliability Engineering & System Safety 176 (2018) 162e173. [125] S. Kabir, Y. Papadopoulos, A review of applications of fuzzy sets to safety and reliability engineering, International Journal of Approximate Reasoning 100 (2018) 29e55. [126] H. Gunduz, D. 
Jayaweera, Reliability assessment of a power system with cyber- physical interactive operation of photovoltaic systems, International Journal of Electrical Power & Energy Systems 101 (2018) 371e384. [127] M.A. Ardakan, M.T. Rezvan, Multi-objective optimization of reliability- redundancy allocation problem with cold-standby strategy using NSGA-II, Reliability Engineering & System Safety 172 (2018) 225e238. [128] S. Alizadeh, S. Sriramula, Impact of common cause failure on reliability performance of redundant safety related systems subject to process demand, Reliability Engineering & System Safety 172 (2018) 129e150. [129] H. Su, J. Zhang, E. Zio, N. Yang, X. Li, Z. Zhang, An integrated systemic method for supply reliability assessment of natural gas pipeline networks, Applied Energy 209 (2018) 489501. [130] M. Steurer, A. Morozov, K. Janschek, K.P. Neitzke, Model-based dependability analysis of faulttolerant inertial navigation system: a practical experience report, IFAC-PapersOnLine 52 (12) (2019) 394e399. [131] H. Gao, L. Cui, H. Yi, Availability analysis of k-out-of-n: F repairable balanced systems with m sectors, Reliability Engineering & System Safety 191 (2019) 106572. [132] S. Zhao, X. Yan, B. Wang, E. Wang, L. Ma, Research on reliability evaluation method of DC circuit breaker based on Markov model, Electric Power Systems Research 173 (2019) 1e5. [133] A.K. Agrawal, V.M.S.R. Murthy, S. Chattopadhyaya, Investigations into reliability, maintainability and availability of tunnel boring machine operating in mixed ground condition using Markov chains, Engineering Failure Analysis 105 (2019) 477e489. [134] H. Wang, H. Fei, Q. Yu, W. Zhao, J. Yan, T. Hong, A motifs-based maximum entropy markov model for realtime reliability prediction in system of systems, Journal of Systems and Software 151 (2019) 180e193. [135] J. Yang, B. Zou, M. 
Yang, Bidirectional implementation of Markov/CCMT for dynamic reliability analysis with application to digital I&C systems, Reliability Engineering & System Safety 185 (2019) 278e290. [136] M.E. Honarmand, M.S. Ghazizadeh, V. Hosseinnezhad, P. Siano, Reliability modeling of processoriented smart monitoring in the distribution systems, International Journal of Electrical Power & Energy Systems 109 (2019) 20e28.
127
128
Engineering Reliability and Risk Assessment
[137] H. Che, S. Zeng, J. Guo, Reliability assessment of man-machine systems subject to mutually dependent machine degradation and human errors, Reliability Engineering & System Safety 190 (2019) 106504. [138] J. Wang, Q. Zhang, S. Yoon, Y. Yu, Reliability and availability analysis of a hybrid cooling system with water-side economizer in data center, Building and Environment 148 (2019) 405416. [139] X. Zhao, L. Sun, M. Wang, X. Wang, A shock model for multi-component system considering the cumulative effect of severely damaged components, Computers & Industrial Engineering 137 (2019) 106027. [140] I. Kabashkin, Reliability of cluster-based nodes in wireless sensor networks of cyber physical systems, Procedia Computer Science 151 (2019) 313e320. [141] W. Wang, M. Lin, Y. Fu, X. Luo, H. Chen, Multi-objective optimization of reliability-redundancy allocation problem for multi-type production systems considering redundancy strategies, Reliability Engineering & System Safety 193 (2020) 106681. [142] Y. Wang, X. Hou, P. Zhang, G. Qin, Reliability assessment of multi-state reconfiguration pipeline system with failure interaction based on Cloud inference, Process Safety and Environmental Protection 137 (2020) 116e127. [143] X. Wang, X. Zhao, C. Wu, C. Lin, Reliability assessment for balanced systems with restricted rebalanced mechanisms, Computers & Industrial Engineering 149 (2020) 106801. [144] K.S. Son, S.H. Seong, H.G. Kang, G.S. Jang, Development of state-based integrated dependability model of RPS in NPPs considering CCF and periodic testing effects at the early design phase, Reliability Engineering & System Safety 193 (2020) 106645. [145] X. Zhao, S. Wang, X. Wang, Y. Fan, Multi-state balanced systems in a shock environment, Reliability Engineering & System Safety 193 (2020) 106592. [146] C. Cheng, J. Yang, L. 
Li, Reliability assessment of multi-state phased mission systems with common bus performance sharing considering transmission loss and performance storage, Reliability Engineering & System Safety 199 (2020) 106917. [147] T.T. Pham, T.C. Kuo, D.M. Bui, Reliability evaluation of an aggregate battery energy storage system in microgrids under dynamic operation, International Journal of Electrical Power & Energy Systems 118 (2020) 105786. [148] L.L. Liu, Y.M. Cheng, Q.J. Pan, D. Dias, Incorporating stratigraphic boundary uncertainty into reliability analysis of slopes in spatially variable soils using one-dimensional conditional Markov chain model, Computers and Geotechnics 118 (2020) 103321. [149] S. Chakraborty, N.K. Goyal, S. Mahapatra, S. Soh, A Monte-Carlo Markov chain approach for coverage-area reliability of mobile wireless sensor networks with multistate nodes, Reliability Engineering & System Safety 193 (2020) 106662. [150] B. Wu, L. Cui, Reliability of multi-state systems under Markov renewal shock models with multiple failure levels, Computers & Industrial Engineering 145 (2020) 106509. [151] Q. Jiang, D.C. Gao, L. Zhong, S. Guo, A. Xiao, Quantitative sensitivity and reliability analysis of sensor networks for well kick detection based on dynamic Bayesian networks and Markov chain, Journal of Loss Prevention in the Process Industries 66 (2020) 104180. [152] B. Wu, L. Cui, Reliability evaluation of Markov renewal shock models with multiple failure mechanisms, Reliability Engineering & System Safety 202 (2020) 107051. [153] J. Chiachio, M.L. Jalon, M. Chiachio, A. Kolios, A Markov chains prognostics framework for complex degradation processes, Reliability Engineering & System Safety 195 (2020) 106621. [154] S.S. Raghuwanshi, R. Arya, Reliability evaluation of stand-alone hybrid photovoltaic energy system for rural healthcare centre, Sustainable Energy Technologies and Assessments 37 (2020) 100624. [155] P.P. Guilani, M.N. Juybari, M.A. Ardakan, H. 
Kim, Sequence optimization in reliability problems with a mixed strategy and heterogeneous backup scheme, Reliability Engineering & System Safety 193 (2020) 106660. [156] M.P. Anand, B. Bagen, A. Rajapakse, Probabilistic reliability evaluation of distribution systems considering the spatial and temporal distribution of electric vehicles, International Journal of Electrical Power & Energy Systems 117 (2020) 105609.
Markov and semi-Markov models in system reliability
[157] I. Postnikov, V. Stennikov, Modifications of probabilistic models of states evolution for reliability analysis of district heating systems, Energy Reports 6 (2020) 293e298. [158] Y. Li, F.P. Coolen, C. Zhu, J. Tan, Reliability assessment of the hydraulic system of wind turbines based on load-sharing using survival signature, Renewable Energy 153 (2020) 766776. [159] X. Zhou, P.H.A.J.M. van Gelder, Y. Liang, H. Zhang, An Integrated Methodology for the Supply Reliability Analysis of Multi-Product Pipeline Systems under Pumps Failure, vol. 204, Reliability Engineering & System Safety, 2020, p. 107185. [160] A. Roy, N. Sarma, A synchronous duty-cycled reservation based MAC protocol for underwater wireless sensor networks, Digital Communications and Networks 7 (3) (2021) 385e398. [161] C. Wu, X. Zhao, X. Wang, S. Wang, Reliability analysis of performance-based balanced systems with common bus performance sharing, Reliability Engineering & System Safety 215 (2021) 107865. [162] C. Wang, Estimation of time-dependent reliability of aging structures under correlated load and autocorrelation in resistance deterioration, Applied Mathematical Modelling 94 (2021) 272284. [163] G. Wang, L. Hu, T. Zhang, Y. Wang, Reliability modeling for a repairable (k1, k2)- out-of-n: G system with phase-type vacation time, Applied Mathematical Modelling 91 (2021) 311321. [164] S. Wang, X. Zhao, Z. Tian, M.J. Zuo, Optimum component reassignment for balanced systems with multi-state components operating in a shock environment, Reliability Engineering & System Safety 210 (2021) 107514. [165] Y. Wang, B. Xie, E. Shiyuan, Adaptive Relevance Vector Machine Combined with Markov-ChainBased Importance Sampling for Reliability Analysis, Reliability Engineering & System Safety, 2021, p. 108287. [166] C. Fang, L. Cui, Reliability evaluation for balanced systems with auto-balancing mechanisms, Reliability Engineering & System Safety 213 (2021) 107780. [167] A. Peiravi, M. Nourelfath, M.K. 
Zanjani, Redundancy Strategies Assessment and Optimization of KOut-Of-N Systems Based on Markov Chains and Genetic Algorithms, Reliability Engineering & System Safety, 2021, p. 108277. [168] A. Tarinejad, H. Izadkhah, M.M. Ardakani, K. Mirzaie, Metrics for assessing reliability of self-healing software systems, Computers & Electrical Engineering 90 (2021) 106952. [169] H. Yi, L. Cui, N. Balakrishnan, New reliability indices for first-and second-order discrete-time aggregated semi-Markov systems with an application to TT&C system, Reliability Engineering & System Safety 215 (2021) 107882. [170] Y. Yi, H. Siyu, C. Haoran, W. Meilin, G. Linhan, C. Xiao, W. Liu, Belief Reliability Analysis of Traffic Network: An Uncertain Percolation Semi-markov Model, Journal of the Franklin Institute, 2021. [171] Y. Jun, J. Chenyu, X. Zhihui, L. Mengkun, Y. Ming, Markov/CCMT: towards an integrated platform for dynamic reliability and risk analysis, Process Safety and Environmental Protection 155 (2021) 498e517. [172] H. Farzin, M. Monadi, M. Fotuhi-Firuzabad, M. Savaghebi, A reliability model for overcurrent relays considering harmonic-related malfunctions, International Journal of Electrical Power & Energy Systems 131 (2021) 107093. [173] S. Jiang, Y.F. Li, Dynamic reliability assessment of multi-cracked structure under fatigue loading via multi-state physics model, Reliability Engineering & System Safety 213 (2021) 107664. [174] B. Wu, L. Cui, Reliability analysis of periodically inspected systems with competing risks under Markovian environments, Computers & Industrial Engineering 158 (2021) 107415. [175] M. Tahmasebzadehbaie, H. Sayyaadi, Regional management of flare gas recovery based on waterenergy-environment nexus considering the reliability of the downstream installations, Energy Conversion and Management 239 (2021) 114189. [176] H.P. Jagtap, A.K. Bewoor, R. Kumar, M.H. Ahmadi, M.E.H. Assad, M. 
Sharifpur, RAM analysis and availability optimization of thermal power plant water circulation system using PSO, Energy Reports 7 (2021) 1133e1153. [177] V.C. Mathebula, A.K. Saha, Multi-state IEC-61850 substation communication network based on markov partitions and symbolic dynamics, Sustainable Energy, Grids and Networks 26 (2021) 100466. [178] X.Y. Li, H.Z. Huang, Y.F. Li, X. Xiong, A Markov regenerative process model for phased mission systems under internal degradation and external shocks, Reliability Engineering & System Safety 215 (2021) 107796.
129
130
Engineering Reliability and Risk Assessment
[179] O. Haghgoo, Y. Damchi, Reliability modelling of capacitor voltage transformer using proposed Markov model, Electric Power Systems Research 202 (2022) 107573. [180] J. Yin, L. Cui, Y. Sun, N. Balakrishnan, Reliability modelling for linear and circular k-out-of-n: F systems with shared components, Reliability Engineering & System Safety 219 (2022) 108172. [181] B. Wu, D. Ding, A gamma process based model for systems subject to multiple dependent competing failure processes under Markovian environments, Reliability Engineering & System Safety 217 (2022) 108112. [182] P.J. Boland, F. Proschan, 10 the impact of reliability theory on some branches of mathematics and statistics, Handbook of Statistics 7 (1988) 157e174. [183] H. Choi, V.G. Kulkarni, K.S. Trivedi, Markov regenerative stochastic Petri nets, Performance Evaluation 20 (1e3) (1994) 337e357. [184] C.P. Andriotis, K.G. Papakonstantinou, Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints, Reliability Engineering & System Safety 212 (2021) 107551. [185] P. Bucci, J. Kirschenbaum, L.A. Mangan, T. Aldemir, C. Smith, T. Wood, Construction of eventtree/fault-tree models from a Markov approach to dynamic system reliability. Reliability Engineering & System Safety 93 (11) (2008) 1616e1627. [186] J. Cheng, Y. Tang, M. Yu, The reliability of solar energy generating system with inverters in series under common cause failure, Applied Mathematical Modelling 68 (2019) 509e522. [187] Pergamon, M. Gilvanejad, H.A. Abyaneh, K. Mazlumi, Fuse cutout allocation in radial distribution system considering the effect of hidden failures, International Journal of Electrical Power & Energy Systems 42 (1) (2012) 575e582. [188] H. Kim, P. Kim, Reliability models for a nonrepairable system with heterogeneous components having a phase-type time-to-failure distribution, Reliability Engineering & System Safety 159 (2017) 37e46. [189] M. Modarres, M.P. Kaminskiy, V. 
Krivtsov, Reliability Engineering and Risk Analysis: A Practical Guide, CRC press, 2009. [190] M. Modarres, M.P. Kaminskiy, V. Krivtsov. Reliability Engineering and Risk Analysis: A Practical Guide, Marcel Dekker, 1999. [191] M. Gilvanejad, H.A. Abyaneh, K. Mazlumi, Fuse cutout allocation in radial distribution system considering the effect of hidden failures, International Journal of Electrical Power & Energy Systems 42 (1) (2012) 575e582. [192] J. Cheng, Y. Tang, M. Yu, The reliability of solar energy generating system with inverters in series under common cause failure, Applied Mathematical Modeling 68 (2019) 509e522.
CHAPTER 7
Emerging trends and future directions in software reliability growth modeling
Vishal Pradhan, Ajay Kumar and Joydip Dhar
ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India
1. Introduction
The current era has become a technological era because of our heavy dependence on a wide range of complex equipment and machinery. The operations of this equipment and these systems are performed and controlled through software, and software systems are part of everyday life. Essential services such as safety, security, transportation, and communication depend on them. Their use is not limited to daily life: software is used extensively in diverse areas, including industrial processes, real-time sensor networks, air traffic control, design, defense, and nuclear plants [1,2]. This wide applicability is possible because of the functionality that software provides. Complicated and critical applications need more functionality, and as functionality increases, the software's complexity and size increase accordingly. Because of this massive dependence on software systems and their growing complexity, software quality has become an essential metric [3]. There are many instances of critical losses caused by software whose quality was not up to the mark. Examples include the Mexicana Airlines Boeing 727, the Therac-25 radiation therapy machine that overdosed cancer patients in Texas, Air France's new A320, the Space Shuttle Columbia, the Ariane-5 rocket, the Patriot missile, and the London ambulance service [1]. Each of these failures cost lives, money, or both. When lives and money depend on a product, everyone expects good quality. Various quality metrics are used to assess software quality (see Fig. 7.1). As everyday life relies increasingly on software systems, strong guarantees of reliability, security, and performance have become a significant demand on those systems. Among these, reliability is a broadly accepted measure of software quality [4].
Both stakeholders want high-quality software: the customer wants to receive it, and the developer wants to deliver it. Software is developed in multiple phases: analysis, design, coding, testing, and operation [5]. A reliability measure depends essentially on the total number of faults and the number of faults remaining in the system. Half of the total faults are detected during the coding and testing phases. In the testing phase, verification and validation are performed [1].
Engineering Reliability and Risk Assessment ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00011-3
© 2023 Elsevier Inc. All rights reserved.
Now, the most important question is how to evaluate software reliability during the testing phase. Numerous software reliability models have been developed for assessing software reliability over the past four decades [6–12]. These models are categorized as deterministic or probabilistic. Different kinds of probabilistic software reliability models are present in the literature (see Fig. 7.2) [5]; they treat failure occurrence and fault removal as probabilistic events. Error seeding models use multistage sampling to estimate the number of faults. Failure rate models study the failure rate per fault at the failure intervals; the Jelinski–Moranda (JM) model is one of the earliest models of this group [1]. In a Markov model, the future behavior of the process depends only on its current state and is independent of its past history [4]. The nonhomogeneous Poisson process (NHPP) provides an analytical framework for describing the software failure phenomenon. One of the current methods to assess software reliability and assure software quality is to apply NHPP-based software reliability growth models (SRGMs). Software reliability measurements can be used for planning and controlling testing resources during software development. Moreover, software reliability relates directly to the operation, rather than the design, of the software. Software reliability differs from hardware reliability because software has no wear-out phenomenon [1]. This chapter focuses on the development of NHPP-based SRGMs over the past four decades. The next section describes the NHPP, some essential models, and their evolution. Model development consists of several phases, and the issues that arise in each phase are discussed in a later section. A further section presents a few recent models with various associated factors in the development process and describes them in detail.
Figure 7.1 Software quality metrics.
Figure 7.2 Types of software reliability models.
2. Software reliability growth models
Several kinds of reliability models, based on different assumptions, are present in the literature for estimating software reliability. This chapter considers NHPP-based SRGMs, so we first discuss the NHPP.
2.1 Nonhomogeneous Poisson process
To estimate software reliability, we need to know the total number of faults and the residual number of faults in the software. The NHPP is a counting process that provides a mathematical framework for describing the software failure phenomenon during testing. In the homogeneous Poisson process the failure rate is constant, while in the NHPP it is a nonconstant function. For software reliability modeling, the failure rate is a time-dependent function $\lambda(t)$. The number of failures experienced by time $t$ is represented as an NHPP $\{N(t), t \ge 0\}$ [1,4]. The key part of an NHPP is determining the time-dependent mean value function (MVF), which represents the expected number of failures experienced up to a particular point in time. The MVF changes with the assumptions, i.e., different assumptions yield different forms of the MVF. The NHPP model is formulated under the following assumptions:
i. The failure process has independent increments.
ii. $P\{N(t+\Delta t) - N(t) = 1\} = \lambda(t)\,\Delta t + o(\Delta t)$.
iii. $P\{N(t+\Delta t) - N(t) \ge 2\} = o(\Delta t)$.
iv. $N(0) = 0$.
Based on the above assumptions, the probability of exactly $n$ failures occurring during the time interval $(0, t)$ for the NHPP is
$$P\{N(t) = n\} = \frac{[u(t)]^{n}}{n!}\, e^{-u(t)}, \qquad n = 0, 1, 2, \ldots$$
where $u(t) = E[N(t)] = \int_0^t \lambda(s)\,ds$, so the MVF is a nondecreasing function. In 1979, Goel and Okumoto extended the JM model and introduced the first NHPP-based SRGM of this class.
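The relation between the failure intensity, the MVF, and the failure-count distribution can be illustrated with a short sketch. The intensity used here, $\lambda(t) = ab\,e^{-bt}$ with $a = 50$ and $b = 0.1$, is only an assumed example chosen for illustration, and the function names are hypothetical; the MVF is obtained by numerical integration so that any other intensity could be substituted.

```python
import math

def intensity(t, a=50.0, b=0.1):
    # Assumed example intensity lambda(t) = a*b*exp(-b*t); a and b are illustrative values
    return a * b * math.exp(-b * t)

def mean_value(t, lam=intensity, steps=2000):
    # u(t) = integral of lambda(s) over (0, t), composite trapezoidal rule
    if t <= 0:
        return 0.0
    h = t / steps
    total = 0.5 * (lam(0.0) + lam(t))
    for i in range(1, steps):
        total += lam(i * h)
    return total * h

def prob_n_failures(n, t, lam=intensity):
    # P{N(t) = n} = u(t)^n / n! * exp(-u(t)), evaluated in log space to avoid overflow
    u = mean_value(t, lam)
    if u == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(u) - u - math.lgamma(n + 1))
```

For example, `prob_n_failures(30, 10.0)` gives the probability of observing exactly 30 failures in the first 10 time units under the assumed intensity.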
2.1.1 Goel–Okumoto model (GO model)
This is a well-known basic model for software reliability estimation [13]. The authors proposed this NHPP-based model under very simple assumptions:
i. From the detection point of view, all faults are mutually independent.
ii. The number of faults detected at any time is proportional to the number of faults remaining in the program.
iii. The isolated faults are fixed prior to further test occasions.
iv. Detected faults are fixed immediately, and no new faults are introduced.
Based on these assumptions, the model is formulated as
$$\frac{du(t)}{dt} = b\,[a - u(t)]$$
where $a$ is the total number of faults and $b$ is the fault detection rate (FDR). This model is developed in a perfect debugging environment. Next, we discuss recent SRGMs based on various associated factors.
2.2 SRGMs development with various associated factors
2.2.1 Perfect and imperfect debugging environment
Based on their assumptions, all models are classified as belonging to either a perfect or an imperfect debugging environment (see Table 7.1). The author's assumption about the fault debugging process during model formulation determines the debugging environment. In the perfect debugging environment, models assume that all faults are removed perfectly and no new faults are introduced during debugging [13]. Imperfect debugging models take three forms: fault removal efficiency, error generation, and the two combined [17,18]. Under fault removal efficiency, not all faults are removed perfectly; a fraction of them remains in the system. Under error generation, new faults may be introduced during debugging, i.e., the total number of faults may change. Under combined fault removal efficiency and error generation, faults may not all be fixed perfectly, and a few new faults may be introduced during correction [11,19].
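The difference between the two debugging environments can be seen by integrating the model equation numerically. The sketch below is a minimal illustration, assuming the general form $du/dt = b(t)\,[a(t) - u(t)]$ with $u(0) = 0$, a hand-written fourth-order Runge–Kutta step, and made-up parameter values; the helper name `solve_mvf` and the error-generation form $a(t) = a + \omega u(t)$ are assumptions for the example, not a prescribed implementation.

```python
import math

def solve_mvf(rate, fault_content, t_end, steps=1000):
    """Integrate du/dt = rate(t) * (fault_content(t, u) - u), u(0) = 0, with RK4."""
    def f(t, u):
        return rate(t) * (fault_content(t, u) - u)

    h = t_end / steps
    t, u = 0.0, 0.0
    for _ in range(steps):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h * k1 / 2)
        k3 = f(t + h / 2, u + h * k2 / 2)
        k4 = f(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return u

a, b, omega = 100.0, 0.05, 0.1   # illustrative values only
t_end = 40.0

# Perfect debugging (GO model): constant fault content a(t) = a
u_perfect = solve_mvf(lambda t: b, lambda t, u: a, t_end)

# Error generation (assumed form): a(t) = a + omega * u(t)
u_imperfect = solve_mvf(lambda t: b, lambda t, u: a + omega * u, t_end)

# Closed-form GO solution u(t) = a * (1 - exp(-b t)) for comparison
u_closed = a * (1.0 - math.exp(-b * t_end))
```

With these values the imperfect-debugging run reports more expected failures than the perfect-debugging run, because every correction adds a fraction of a new fault to the fault content.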
2.2.2 Fault detection rate
In the literature, various types of FDR are considered to make the model more realistic. The FDR is constant in the GO model, which is not very practical. Therefore, many time-dependent FDRs have been proposed to estimate reliability better (see Table 7.2).
2.2.3 SRGMs with environmental factors
The software development environment is the most critical aspect of determining software reliability. Several environmental factors are involved during the whole software
Table 7.1 Perfect and imperfect debugging fault content function.

| Debugging environment | Fault content function | Mean value function | Comment |
|---|---|---|---|
| Perfect debugging | $a(t) = a$ | $u(t) = a\left(1 - e^{-bt}\right)$ | Fixed and constant fault content function [13]. |
| Imperfect debugging | $a(t) = a e^{\omega t}$ | $u(t) = \frac{ab}{b+\omega}\left(e^{\omega t} - e^{-bt}\right)$ | Consider the exponential fault content function [14]. |
| | $a(t) = a(1 + \omega t)$ | $u(t) = \frac{a}{\gamma e^{-bt} + 1}\left[\left(1 - e^{-bt}\right)\left(1 - \frac{\omega}{b}\right) + \omega t\right]$ | Consider constant fault introduction rate [15]. |
| | $a(t) = \alpha + a\left(1 - e^{-\omega t}\right)$ | $u(t) = \frac{1}{\gamma e^{-bt} + 1}\left[(\alpha + a)\left(1 - e^{-bt}\right) - \frac{ab}{b-\omega}\left(e^{-\omega t} - e^{-bt}\right)\right]$ | Consider exponential fault introduction rate [16]. |
| | $a(t) = a + \omega\,u(t)$ | $u(t) = \frac{a}{1-\omega}\left[1 - e^{-(1-\omega)\int_0^t b(s)\,ds}\right]$ | Consider constant fault introduction rate [14]. |
Table 7.2 Several fault detection rate and mean value function.

| Detection model | Fault detection rate | Mean value function | Comment |
|---|---|---|---|
| Constant detection rate | $b(t) = b$ | $u(t) = a\left(1 - e^{-bt}\right)$ | Constant FDR [13]. |
| Inflection S-shaped | $b(t) = \frac{b}{1 + \gamma e^{-bt}}$ | $u(t) = \frac{a\left(1 - e^{-bt}\right)}{1 + \gamma e^{-bt}}$ | Consider the inflection factor $\gamma$ in FDR [17]. |
| Delayed S-shaped | $b(t) = \frac{b^{2}t}{1 + bt}$ | $u(t) = a\left[1 - (1 + bt)e^{-bt}\right]$ | Consider delayed S-shaped FDR [17]. |
| Yamada exponential | $b(t) = b\,\psi\,\omega\,e^{-\omega t}$ | $u(t) = a\left[1 - e^{-\psi b\left(1 - e^{-\omega t}\right)}\right]$ | Consider exponential FDR [20]. |
| Yamada Rayleigh | $b(t) = b\,\psi\,\omega\,t\,e^{-\omega t^{2}/2}$ | $u(t) = a\left[1 - e^{-\psi b\left(1 - e^{-\omega t^{2}/2}\right)}\right]$ | Consider Rayleigh FDR [20]. |
development. These factors have either positive or negative impacts on reliability assessment. Therefore, many SRGMs with environmental factors have been developed in the last two decades to estimate software reliability accurately (see Table 7.3). Zhang et al. [27] surveyed 32 environmental factors and analyzed which of them most significantly affect software development and reliability estimation. Recent SRGMs that incorporate various kinds of environmental factors achieve better predictive performance. The testing coverage function, the testing effort function, the fault reduction factor, and the uncertainty of the operating environment are among the factors used to model SRGMs.
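As a rough illustration of how an environmental factor enters the MVF, the sketch below assumes the common form $u(t) = a\,[1 - e^{-b\,G(t)}]$, where $G(t)$ is the factor function integrated over $(0, t)$. Two factor functions are shown: an exponential testing-effort function and an increasing fault reduction factor. All function names and parameter values are hypothetical choices for the example.

```python
import math

def mvf_with_factor(t, a, b, cumulative_factor):
    """u(t) = a * (1 - exp(-b * G(t))), with G(t) the factor integrated over (0, t)."""
    return a * (1.0 - math.exp(-b * cumulative_factor(t)))

def exponential_effort(t, alpha=60.0, beta=0.08):
    # Assumed exponential testing-effort rate w(t) = alpha*beta*exp(-beta*t)
    return alpha * beta * math.exp(-beta * t)

def cumulative_exponential_effort(t, alpha=60.0, beta=0.08):
    # Integral of w(s) over (0, t): alpha * (1 - exp(-beta*t))
    return alpha * (1.0 - math.exp(-beta * t))

def increasing_frf(t, u0=0.3, k=0.2):
    # Assumed increasing fault reduction factor B(t) = 1 - (1 - u0)*exp(-k*t)
    return 1.0 - (1.0 - u0) * math.exp(-k * t)

def cumulative_increasing_frf(t, u0=0.3, k=0.2):
    # Integral of B(s) over (0, t): t + (u0 - 1)*(1 - exp(-k*t))/k
    return t + (u0 - 1.0) * (1.0 - math.exp(-k * t)) / k

a, b = 100.0, 0.05   # illustrative total fault count and detection rate
u_effort = [mvf_with_factor(t, a, b, cumulative_exponential_effort) for t in (0, 10, 20, 40)]
u_frf = [mvf_with_factor(t, a, b, cumulative_increasing_frf) for t in (0, 10, 20, 40)]
```

Passing the cumulative factor as a function keeps the MVF code independent of the particular factor, so another testing-effort or coverage function can be swapped in without touching `mvf_with_factor`.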
3. Method of model formulation
Various methods exist to develop SRGMs. Here we describe a simple process for formulating an SRGM (see Fig. 7.3), divided into three steps: (I) model development; (II) data collection and parameter estimation; and (III) optimal release policy and analysis. The optimal release policy in the third step is optional: it is carried out only if the optimal release time is needed; otherwise, the second step is followed directly by the analysis.
I. Model development: We first study the existing literature extensively and then formulate a new model or extend an existing SRGM. Next, we decide the debugging environment of the model, i.e., perfect or imperfect debugging. We then incorporate environmental factors to make the model more realistic. Other essential factors that can be considered during model development are the change-point (CP) and the uncertainty of the operating environment. Nowadays, most software is released in versions, so the developed model can be extended to multiple releases. One critical gap in the literature is that most models do not consider the time lag between the detection and correction processes. Considering these factors can yield a better model for reliability prediction. After model development, the most crucial task is parameter estimation.
II. Data collection and parameter estimation: Failure data are needed to estimate the model parameters, and various techniques are used for the estimation. The literature of the past four decades shows that least-squares estimation (LSE) and maximum likelihood estimation (MLE) were applied in the first two decades [2]. In the last two decades, the trend has shifted to meta-heuristic techniques.
After estimating the parameters, we substitute the estimated values into the model and compute the predicted value of u(t). The next task is to determine whether the model fits well, so we calculate goodness-of-fit (GOF) values for the proposed and competing models. The GOF values are used to compare the descriptive and predictive power of the models. If the proposed model gives a
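Table 7.3 Several environmental factors and mean value function.

| Environmental factors | Factor functions | Mean value function | Comment |
|---|---|---|---|
| Testing coverage function | $c(t) = 1 - e^{-(at)^{b}}$ | $u(t) = a\left[1 - \left(\frac{\theta}{\theta + (at)^{b}}\right)^{\rho}\right]$ | Nonnegative and concave function [21]. |
| | $c(t) = 1 - e^{1 - a^{t^{b}}}$ | $u(t) = a\left[1 - \left(\frac{\theta}{\theta + a^{t^{b}} - 1}\right)^{\rho}\right]$ | Loglog distribution [22]. |
| | $c(t) = \frac{A\left(1 - e^{-bt}\right)}{1 + \gamma e^{-bt}}$ | $u(t) = \frac{a}{p - \omega}\left[1 - \left(\frac{1 + \gamma e^{-bt}}{(1 - A) + (\gamma + A)e^{-bt}}\right)^{\omega - p}\right]$ | Inflection S-shaped function [19]. |
| Testing effort function | $w(t) = \alpha\beta e^{-\beta t}$ | $u(t) = a\left[1 - e^{-b\alpha\left(1 - e^{-\beta t}\right)}\right]$ | Exponential TEF [23]. |
| | $w(t) = b\,t^{k}$ | $u(t) = a\left[1 - e^{-\frac{b\,t^{k+1}}{1+k}}\right]$ | Power function [24]. |
| | $w(t) = \frac{bN(1+z)e^{-bt}}{\left(1 + z e^{-bt}\right)^{2}}$ | $u(t) = a\left[1 - e^{-\frac{bN\left(1 - e^{-bt}\right)}{1 + z e^{-bt}}}\right]$ | Inflected S-shaped TEF [25]. |
| Fault reduction factor | $B(t) = 1 - (1 - U_0)e^{-kt}$ | $u(t) = a\left[1 - e^{-b\left(\frac{(U_0 - 1)\left(1 - e^{-kt}\right)}{k} + t\right)}\right]$ | Increasing function [26]. |
| | $B(t) = U_0 e^{-kt}$ | $u(t) = a\left[1 - e^{-b\,\frac{U_0\left(1 - e^{-kt}\right)}{k}}\right]$ | Decreasing function [26]. |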
Figure 7.3 Working method of SRGM formulation.
better GOF value, it is accepted for reliability analysis. The models are then ranked using various model selection criteria.
III. Optimal release policy and analysis: In this step, we either construct a new optimal release model or use an existing cost and reliability model to determine the optimal release time [14]. Depending on the requirement, these models are cost-based, reliability-based, or based on a trade-off between the two. Several techniques, such as multicriteria decision-making and fuzzy approaches, are used for release time determination. Another part of this step is parameter sensitivity analysis, which identifies the most sensitive parameters in the model. Finally, the process concludes with an analysis report and conclusions based on it.
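Step II and the GOF comparison can be sketched end to end for the GO model. The example below is a minimal illustration under stated assumptions: the failure data are synthetic and noise-free (generated from $a = 100$, $b = 0.05$, not taken from any real project), the estimator is the classical MLE for exact failure times solved by bisection, and the GOF formulas (mean square error, bias, R-square, predictive ratio risk, predictive power) follow definitions commonly used in the SRGM literature rather than ones stated in this chapter. The function names `go_mle` and `gof` are hypothetical.

```python
import math

def go_mvf(t, a, b):
    # GO mean value function u(t) = a * (1 - exp(-b t))
    return a * (1.0 - math.exp(-b * t))

def go_mle(failure_times, t_end):
    """MLE of (a, b) for the GO model from failure times observed on (0, t_end]."""
    n = len(failure_times)
    s = sum(failure_times)
    if s >= n * t_end / 2:
        raise ValueError("no finite MLE: the data show no reliability growth")

    def score(b):
        # d(log-likelihood)/db after substituting a = n / (1 - exp(-b*t_end))
        x = b * t_end
        return n / b - s - n * t_end * math.exp(-x) / (-math.expm1(-x))

    lo, hi = 1e-6 / t_end, 1.0 / t_end
    while score(hi) > 0:            # expand until the root is bracketed
        hi *= 2
    for _ in range(200):            # bisection on the score equation
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = n / (1.0 - math.exp(-b * t_end))
    return a, b

def gof(observed, predicted, n_params):
    """A few commonly used goodness-of-fit criteria (assumed definitions)."""
    n = len(observed)
    residuals = [p - y for y, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    sst = sum((y - mean_obs) ** 2 for y in observed)
    return {
        "mse": sse / (n - n_params),
        "bias": sum(residuals) / n,
        "r_square": 1.0 - sse / sst,
        "prr": sum((r / p) ** 2 for r, p in zip(residuals, predicted)),
        "pp": sum((r / y) ** 2 for r, y in zip(residuals, observed)),
    }

# Synthetic, noise-free failure times generated from a = 100, b = 0.05 (not real data)
times = [-math.log(1.0 - (i - 0.5) / 100.0) / 0.05 for i in range(1, 87)]
a_hat, b_hat = go_mle(times, t_end=40.0)

observed = list(range(1, len(times) + 1))            # cumulative failure counts
predicted = [go_mvf(t, a_hat, b_hat) for t in times]
scores = gof(observed, predicted, n_params=2)
```

Computing the same `scores` dictionary for each competing model gives the values that the model selection criteria then rank. A meta-heuristic estimator could replace `go_mle` without changing the GOF part.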
4. Emerging trends
Some recent emerging trends in software reliability growth models are discussed below.
1. In recent literature, many researchers have focused on the uncertainty of the operating environment (UOE) [28–30]. They consider several environmental factors simultaneously together with the UOE. Models developed with the UOE factor rest on a reasonable assumption: the testing environment is controlled, whereas the operating environment after release is random. The UOE is therefore incorporated during model formulation. Earlier, it was used as a calibrating factor, in which the estimate from the previous release helps the current release.
2. Another trend is entropy-based models. Several authors have proposed entropy-based SRGMs in recent literature. They combine the entropy approach with reliability evaluation methods to account for the uncertainty present during software reliability evaluation. Uncertainty arises when the source code changes, and this uncertainty is quantified through entropy-based measures. Recent work has also developed two-dimensional entropy-based models to measure the complexity or uncertainty present in the system source code [31,32].
3. The change-point (CP) factor is very useful for better prediction because, during testing, changes occur at specific points. The behavior of the model before and after a CP can differ, and the model parameter values change accordingly. This helps considerably with large datasets. Single- and multiple-CP SRGMs have been developed and perform better in terms of prediction accuracy [33,34].
4. Most authors assume a homogeneous combination before and after CPs, which is not a practical case. Therefore, several hybrid models have been developed to evaluate software reliability accurately [35]. A few models adopt a heterogeneous approach to fault prediction with a single CP [36]: different failure distributions are used before and after the CP. Heterogeneous single-CP models predict faults better than homogeneous CP models.
5. Debugging time delay is an important and practical aspect that has been considered in very few articles. Recent work accounts for this delay and models SRGMs through various time-dependent functions. During testing, the fault identification team identifies and reports the faults, and the debugging team fixes them. This process takes time, known as the debugging time lag [37]. Fault dependency and fault severity are also important factors in understanding failure behavior.
6. Multirelease model development has become a trend because most software firms currently develop software in versions [31]. In such a model, the next version accounts for the faults of the current version, the faults remaining to be corrected from the previous version, and the errors reported by users as feedback. Reliability growth models for open-source software are also gaining attention in recent literature [38,39].
7. The most important task after model development is parameter estimation. Earlier, model parameters were estimated through maximum likelihood estimation (MLE) or least-squares estimation (LSE). In recent literature, parameter estimation has shifted from MLE/LSE to nature-inspired algorithms, expectation-maximization, neural networks, etc. Several recent articles estimate model parameters through the genetic algorithm, particle swarm optimization, gray wolf optimization, the gravitational search algorithm, etc. Expectation-maximization has also been applied in several recent articles [40,41].
8. After model development and parameter estimation, the next step is evaluation against GOF criteria. Many GOF criteria are available to evaluate model performance. Mean square error, bias, the Akaike information criterion, predictive power, predictive ratio risk, R-square, etc., are used frequently. The predictive sum of squared errors is gaining attention in recent literature because it gives more information about the predictive power of the model [36].
9. Model performance is evaluated through several comparison criteria, and some models perform better on some criteria while other models perform better on others. Therefore, several multicriteria decision-making methods have been developed to rank the models, including the distance-based approach (DBA), entropy-DBA, and normalized criteria distance (NCD) [42]. The NCD method is the most frequently used in the existing literature [6,43].
10. The optimal release policy helps the manager decide the release time. Reliability, cost, and schedule planning are the most important attributes for moving the software from the testing phase to the operation phase. Most researchers have used multiattribute utility theory for optimal release time determination [7]. Fuzzy environment approaches are also considered in several recent studies for release time analysis.
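A reliability-based release rule from item 10 can be sketched as follows. The example assumes the standard NHPP conditional reliability $R(x \mid T) = e^{-[u(T+x) - u(T)]}$, the probability of no failure in $(T, T+x]$ after testing up to $T$, which is a well-known result for NHPP models but is not written out in this chapter. It further assumes a GO-type MVF with illustrative parameters, for which $R(x \mid T)$ is nondecreasing in $T$, and searches by bisection for the earliest $T$ that meets a target reliability. The names `reliability` and `release_time` are hypothetical.

```python
import math

def reliability(x, t, mvf):
    """R(x | t) = exp(-[u(t + x) - u(t)]) for an NHPP with mean value function mvf."""
    return math.exp(-(mvf(t + x) - mvf(t)))

def release_time(mvf, x, target, t_max=1e4, tol=1e-8):
    """Earliest testing time T with R(x | T) >= target, assuming R is nondecreasing in T."""
    if reliability(x, 0.0, mvf) >= target:
        return 0.0
    if reliability(x, t_max, mvf) < target:
        raise ValueError("target reliability is not reached before t_max")
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reliability(x, mid, mvf) >= target:
            hi = mid
        else:
            lo = mid
    return hi

a, b = 100.0, 0.05                      # illustrative GO parameters
go = lambda t: a * (1.0 - math.exp(-b * t))

# Testing time needed so that one further time unit is failure-free with probability 0.9
t_release = release_time(go, x=1.0, target=0.9)

# Closed form for the GO model, used only as a cross-check
t_closed = math.log(a * (1.0 - math.exp(-b * 1.0)) / (-math.log(0.9))) / b
```

A cost-based or multiattribute policy would replace the single reliability constraint with a cost or utility function of $T$, but the same search structure applies.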
5. Future direction
This chapter paves the way for more interpretive and exploratory research in the future, in terms of both model development and evaluation. Imperfect debugging environment models are the best choice for better predictability. Models that assume all faults have the same severity level are not really practical; a model with varying severity levels will be more effective for estimating reliability. One of the most critical environmental factors, the percentage of reused modules, has a significant impact on model development and can therefore be considered in future research. Entropy-based model development is also a good option for future work. Fuzzy techniques are helpful for handling uncertainty. In release time determination, a randomized cost budget may be incorporated to make the release policy robust. Multiattribute utility theory is also beneficial for cost-reliability-based optimal release models. The quality of reliability models could be improved by assuming a time lag between detection and correction. Selecting a suitable model for a specific application is critical in software reliability engineering. The best model selection methods are CODAS, CODAS-E, TOPSIS, and VIKOR [44].
6. Conclusions
SRGMs play a critical role in developing resilient software and ensuring that it fulfills defined quality criteria. Many measures are used to assess the quality of a software system. The presence of faults decreases software quality and raises development costs, so reducing the number of defects improves software quality. This chapter summarizes numerous published works. It begins with the fundamentals of software reliability modeling and progresses to more recent developments, such as predicting the number of errors. Because it covers a wide range of topics, it can be a helpful study resource for newcomers to this field.
Acknowledgments The authors acknowledge MOE and ABV-Indian Institute of Information Technology and Management Gwalior, India, for providing necessary facilities for this work.
References
[1] H. Pham, System Software Reliability, Springer Science & Business Media, 2007.
[2] M.R. Lyu, Handbook of Software Reliability Engineering, vol. 222, IEEE Computer Society Press, CA, 1996.
[3] M. Xie, Software reliability models - a selected annotated bibliography, Software Testing, Verification and Reliability 3 (1) (1993) 3–28.
[4] H. Pham, in: H. Pham (Ed.), Handbook of Reliability Engineering, vol. 1, Springer, London, 2003.
[5] H. Pham (Ed.), Springer Handbook of Engineering Statistics, Springer Science & Business Media, 2006.
[6] S. Chatterjee, A. Shukla, A unified approach of testing coverage-based software reliability growth modelling with fault detection probability, imperfect debugging, and change point, Journal of Software: Evolution and Process 31 (3) (2019) e2150.
[7] B. Pachauri, A. Kumar, J. Dhar, Software reliability growth modeling with dynamic faults and release time optimization using GA and MAUT, Applied Mathematics and Computation 242 (2014) 500–509.
[8] H. Pham, X. Zhang, NHPP software reliability and cost models with testing coverage, European Journal of Operational Research 145 (2) (2003) 443–454.
[9] X. Zhang, X. Teng, H. Pham, Considering fault removal efficiency in software reliability assessment, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 33 (1) (2003) 114–120.
[10] V. Pradhan, A. Kumar, J. Dhar, Modelling software reliability growth through generalized inflection S-shaped fault reduction factor and optimal release time, Proceedings of the Institution of Mechanical Engineers - Part O: Journal of Risk and Reliability 236 (1) (2022) 18–36.
[11] P.K. Kapur, H. Pham, S. Anand, K. Yadav, A unified approach for developing software reliability growth models in the presence of imperfect debugging and error generation, IEEE Transactions on Reliability 60 (1) (2011) 331–340.
[12] V. Pradhan, A. Kumar, J. Dhar, Enhanced growth model of software reliability with generalized inflection S-shaped testing-effort function, Journal of Interdisciplinary Mathematics 25 (1) (2022) 137–153.
[13] A.L. Goel, K. Okumoto, Time-dependent error-detection rate model for software reliability and other performance measures, IEEE Transactions on Reliability 28 (3) (1979) 206–211.
[14] S. Yamada, K. Tokuno, S. Osaki, Imperfect debugging models with fault introduction rate for software reliability assessment, International Journal of Systems Science 23 (12) (1992) 2241–2252.
[15] H. Pham, L. Nordmann, Z. Zhang, A general imperfect-software-debugging model with S-shaped fault-detection rate, IEEE Transactions on Reliability 48 (2) (1999) 169–175.
[16] H. Pham, X. Zhang, An NHPP software reliability model and its comparison, International Journal of Reliability, Quality and Safety Engineering 4 (03) (1997) 269–282.
[17] M. Ohba, Software reliability analysis models, IBM Journal of Research and Development 28 (4) (1984) 428–443.
[18] M. Ohba, X.M. Chou, Does imperfect debugging affect software reliability growth?, in: Proceedings of the 11th International Conference on Software Engineering, 1989, pp. 237–244.
[19] Q. Li, H. Pham, A testing-coverage software reliability model considering fault removal efficiency and error generation, PLoS One 12 (7) (2017) e0181524.
[20] S. Yamada, M. Ohba, S. Osaki, S-shaped software reliability growth models and their applications, IEEE Transactions on Reliability 33 (4) (1984) 289–292.
[21] I.H. Chang, H. Pham, S.W. Lee, K.Y. Song, A testing-coverage software reliability model with the uncertainty of operating environments, International Journal of Systems Science: Operations & Logistics 1 (4) (2014) 220–227.
[22] H. Pham, Loglog fault-detection rate and testing coverage software reliability models subject to random environments, Vietnam Journal of Computer Science 1 (1) (2014) 39–45.
[23] S. Yamada, H. Ohtera, H. Narihisa, Software reliability growth models with testing-effort, IEEE Transactions on Reliability 35 (1) (1986) 19–23.
[24] F. Li, Z.L. Yi, A new software reliability growth model: multigeneration faults and a power-law testing-effort function, Mathematical Problems in Engineering 2016 (2016) 9276093, https://doi.org/10.1155/2016/9276093.
[25] C. Jin, S.W. Jin, Parameter optimization of software reliability growth model with S-shaped testing-effort function using improved swarm intelligent optimization, Applied Soft Computing 40 (2016) 283–291.
[26] C.J. Hsu, C.Y. Huang, J.R. Chang, Enhancing software reliability modeling and prediction through the introduction of time-variable fault reduction factor, Applied Mathematical Modelling 35 (1) (2011) 506–521.
[27] M. Zhu, H. Pham, Environmental factors analysis and comparison affecting software reliability in development of multi-release software, Journal of Systems and Software 132 (2017) 72–84.
[28] V. Pradhan, J. Dhar, A. Kumar, A. Bhargava, An S-shaped fault detection and correction SRGM subject to gamma-distributed random field environment and release time optimization, in: Decision Analytics Applications in Industry, Springer, Singapore, 2020, pp. 285–300.
[29] K.Y. Song, I.H. Chang, H. Pham, A software reliability model with a Weibull fault detection rate function subject to operating environments, Applied Sciences 7 (10) (2017) 983.
[30] Q. Li, H. Pham, A generalized software reliability growth model with consideration of the uncertainty of operating environments, IEEE Access 7 (2019) 84253–84267.
[31] V.B. Singh, M. Sharma, H. Pham, Entropy based software reliability analysis of multi-version open-source software, IEEE Transactions on Software Engineering 44 (12) (2017) 1207–1223.
[32] P.K. Kapur, S. Panwar, V. Kumar, O. Singh, Entropy-based two-dimensional software reliability growth modeling for open-source software incorporating change-point, International Journal of Reliability, Quality and Safety Engineering 27 (05) (2020) 2040009.
[33] S. Khurshid, A.K. Shrivastava, J. Iqbal, Effort based software reliability model with fault reduction factor, change point and imperfect debugging, International Journal of Information Technology 13 (1) (2021) 331–340.
[34] I. Saraf, J. Iqbal, Generalized multi-release modelling of software reliability growth models from the perspective of two types of imperfect debugging and change point, Quality and Reliability Engineering International 35 (7) (2019) 2358–2370.
[35] A.K. Shrivastava, R. Sharma, Developing a hybrid software reliability growth model, International Journal of Quality & Reliability Management 39 (5) (2021) 1209–1225, https://doi.org/10.1108/IJQRM-02-2021-0039.
[36] V. Nagaraju, L. Fiondella, T. Wandji, A heterogeneous single changepoint software reliability growth model framework, Software Testing, Verification and Reliability 29 (8) (2019) e1717.
[37] A.K. Shrivastava, P.K. Kapur, Development of software reliability growth models with time lag and change-point and a new perspective for release time problem, Mathematics Applied in Information Systems 2 (2018) 34–52.
[38] M. Zhu, H. Pham, A multi-release software reliability modeling for open-source software incorporating dependent fault detection process, Annals of Operations Research 269 (1) (2018) 773–790.
[39] V. Pradhan, A. Kumar, J. Dhar, Modeling multi-release open source software reliability growth process with generalized modified Weibull distribution, in: Evolving Software Processes, 2022, pp. 123–133.
[40] A. Choudhary, A.S. Baghel, O.P. Sangwan, An efficient parameter estimation of software reliability growth models using gravitational search algorithm, International Journal of System Assurance Engineering and Management 8 (1) (2017) 79–88.
[41] K. Sharma, M. Bala, An ecological space-based hybrid swarm-evolutionary algorithm for software reliability model parameter estimation, International Journal of System Assurance Engineering and Management 11 (1) (2020) 77–92.
[42] A. Gupta, N. Gupta, R.K. Garg, Implementing weighted entropy-distance based approach for the selection of software reliability growth models, International Journal of Computer Applications in Technology 57 (3) (2018) 255–266.
[43] Q. Li, H. Pham, NHPP software reliability model considering the uncertainty of operating environments with imperfect debugging and testing coverage, Applied Mathematical Modelling 51 (2017) 68–85.
[44] R. Garg, S. Raheja, R.K. Garg, Decision support system for optimal selection of software reliability growth models using a hybrid approach, IEEE Transactions on Reliability 71 (1) (2021) 149–161, https://doi.org/10.1109/TR.2021.3104232.
CHAPTER 8
Reliability and profit analysis of a Markov model having cost-free warranty with waiting repair facility
Ram Niwas
Department of Statistics, Goswami Ganesh Dutta Sanatan Dharma College, Chandigarh, India
1. Introduction
The reliability of a system or item plays an important part in its consistent performance over its expected life span. Indeed, failure-free operation is an important requirement of complex industrial and engineering systems. Systems or components may fail because of inadequate formulation, inappropriate manufacturing, inaccurate testing, or improper maintenance and installation. Users need safety against early or unforeseen component failures and assurance that the item they purchase will operate properly for a certain span. A cost-free warranty provides such protection, since it specifies that the item must be repaired during the warranty period free of cost to the user. In this direction, Ref. [1] discussed a repairable system that was replaced by a new one after its first failure and received minimal repairs for its successive failures within the warranty period. Also, Ref. [2] developed a system in which the failed item was minimally repaired or replaced with a new one free of cost to the user during the warranty period. The economic benefit of a repairable system is also affected, directly or indirectly, by the repairman. In repairable system models, there are two possibilities: (i) the repairman always remains with the system; and (ii) the repairman does not remain with the system but arrives whenever required. If the repairman remains with the system, failed systems or components do not have to wait for repair. The repairman might also take a sequence of vacations during idle time or take on other assigned jobs, which can greatly influence system performance [3–6]. But if the repairman is not with the system, failed components have to wait for the repairman, which can affect the availability and performance of the system [7].
Moreover, administrative delay in handling a failed component leads to further delays in expected production and more complaints from customers, which makes it more difficult for organizations to serve their customers [8].
Engineering Reliability and Risk Assessment ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00007-1
© 2023 Elsevier Inc. All rights reserved.
A variety of methods for repairable and nonrepairable system models with warranty policies have been developed using approaches such as the Markov process and the supplementary variable technique; these are summarized in the following sections.
2. Background and literature review
This section gives a brief review of the literature on methods and strategies for warranty analysis, including cost-free warranty, two-dimensional warranty, Markov modeling, extended warranty, replacement policy, maintenance, and inspection. The gaps found in the literature are also addressed.
2.1 Concept of warranty
A warranty can be defined as a basic assurance provided by the manufacturer to consumers or users that the product they buy will perform satisfactorily up to a specific span, which is the warranty period. In other words, a warranty is a written agreement offered by a manufacturer to the users/buyers to replace or repair a failed/defective product free of cost up to a specific span. A warranty builds the goodwill of the manufacturer and establishes trust in the mind of the consumers regarding the quality of the product/item.
2.1.1 Role of warranty
2.1.1.1 Role to consumer/customer
From the consumer's point of view, a warranty provides assurance and protection against unforeseen and early failures of a product/system. It gives the customer peace of mind regarding the maintenance of the product and adds to customer satisfaction with the purchase. A major added advantage is that the customer saves on the cost of repair in case of a breakdown.
2.1.1.2 Role to manufacturer
From the manufacturer's point of view, a warranty provides protection by limiting the manufacturer's liability in case of breakdown due to inappropriate and incautious use or misuse of the product by the customer. It also functions as an advertising/promotional tool that helps the manufacturer compete effectively in the market and increases sales of the product.
2.2 Warranty cost analysis
Blischke [9] was the first to review papers on warranties and cost analysis by using mathematical models.
Different matters related to cost-free warranty are discussed and reviewed in detail in Refs. [10,11,12]. Chun [13] considered free and modified warranty policies to determine the optimal number of periodic preventive
maintenance (PM) operations. A two-dimensional warranty is a natural extension of a one-dimensional warranty, and Ref. [14] discussed a system model to carry out the cost analysis of different two-dimensional failure-free warranty policies. Ref. [15] presented a series-parallel reliability system design considering maintenance and warranty. However, most research studies on the cost-free warranty policy have mainly emphasized cost analysis for repairable and nonrepairable system models. In that direction, Ref. [16] analyzed two-attribute warranty schemes and computed the expected warranty cost per component/item by using the 2D policy. Moreover, various types of warranty approaches for estimating warranty cost through mathematical models are presented in Ref. [17]. Bai and Pham [18] discussed free repair and pro-rata warranty policies by considering two types of discounting methods. A two-dimensional warranty approach that takes both time and usage into account was presented by Ref. [19]. Ref. [20] analyzed a system model to study a renewing free replacement warranty (RFRW) for multistate deteriorating repairable items. Further, Ref. [21] analyzed mathematically the reliability and profit of a repairable system with warranty by using the supplementary variable technique (SVT). Refs. [22] and [23] examined a repairable system with PM, inspection, and degradation having cost-free warranty. PM is implemented by both manufacturers and users to increase a product's reliability and extend product life during and beyond the warranty (BW) period [24]. Further, Ref. [25] presented a new warranty scheme wherein the user invests in the PM cost within the product's life cycle to lessen the losses from production downtime. Based on the Markov process [26,27,28], Ref. [29] discussed a mathematical model in which the repairman always remains with the system, in order to analyze the reliability and expected profit of the system.
Ref. [30] developed an extended warranty plan with a limited number of repairs during the warranty period. A nonperiodic PM strategy was implemented by Ref. [31] to minimize the cost of the manufacturer as well as the buyer under a nonhomogeneous Poisson process framework. Ref. [6] studied a performance-based warranty for products subject to competing hard and soft failures. Ref. [32] discussed the maintenance problem for warranty products with multiple failure modes. Ref. [33] suggested three types of warranty policies with a random failure threshold based on the degradation model. Ref. [34] developed n-component optimal maintenance models for warranted coherent systems.
2.3 Shortcoming and overcoming of the literature
From the above discussion, we find that no work has considered the reliability and profit analysis of a repairable system based on the cost-free warranty policy with a waiting repair facility. Keeping in mind the importance of cost-free warranty, we propose, by using the Markov process, a repairable system with a cost-free warranty policy in which the repairman does not always remain with the system and takes a little time to arrive whenever required. The objectives of the proposed study are:
(i) To propose a repairable system model with a cost-free warranty policy, which simultaneously protects the users and the manufacturers: it provides assurance and protection to the user/customer against unforeseen and early failures of a product/system within the warranty (WW) period and saves the cost of repairs in case of a breakdown, while also functioning as an advertising/promotional tool that helps the manufacturer compete effectively in the market and increases sales of the product.
(ii) To compute different system performance measures by using the Markov process technique.
(iii) To examine the system's reliability at various levels of failure rate with respect to (w.r.t.) time (t).
(iv) Finally, to observe the impact of repair cost on the profit function w.r.t. time (t).
The rest of the paper is organized as follows. The background and literature review on warranty cost analysis are discussed in Section 2. Section 3 gives the description of the system, including the assumptions of the model, state specifications, and notations. Section 4 presents the model analysis, in which different system performance measures are computed, such as availability of the system, reliability, mean time to failure (MTTF), busy period of the repairman BW, and profit function. The numerical results for system reliability and profit function, in the form of tables and graphs, and their interpretations are shown in Section 5. Section 6 presents concluding remarks. Future research directions are discussed in Section 7.
3. Description of the system
3.1 Assumptions
(1) The system consists of a single machine; the machine is new and starts to work at the initial time $t = 0$.
(2) At the failure of the machine, we call for the repairman to arrive at the system, and he needs some time to arrive.
(3) The system waits for inspection and repair of the failed machine until the repairman is available.
(4) In the WW period, the repairman inspects the failed machine to examine whether it is out of the warranty period or not.
(5) All repairs are free of cost to the consumers in the WW period, provided failures are not due to unauthorized modifications.
(6) The repairman returns after repairing the failed machine at the system.
(7) The machine works as good as new after its repair.
(8) All the rates follow negative exponential distributions.
3.2 State specifications
Let $\{N(t);\ t \ge 0\}$ be a stochastic process with state space $E = \{S_0, S_1, \ldots, S_6\}$. The states are defined as follows:
$S_0/S_5$: The system is operating in the WW/BW period.
$S_1$: The system has failed and is waiting for inspection in the WW period because the repairman is not available with the system.
$S_2$: The failed machine is under inspection in the WW period to examine whether it is out of the warranty period or not.
$S_3/S_4$: The failed machine is under repair in the WW/BW period.
$S_6$: The failed machine is waiting for repair in the BW period.
3.3 Notations
$\lambda$: Constant failure rate of the machine.
$\alpha$: Constant arrival rate of the repairman.
$\beta$: Constant inspection rate of the machine.
$\mu$: Constant repair rate of the failed machine.
$p/q$: P[the warranty of the system will not terminate/terminate].
$p_0(t)/p_5(t)$: Probability density that at time $t$, the machine is in the working state in the WW/BW period.
$p_1(t)$: Probability density that at time $t$, the machine has failed and is waiting for inspection in the WW period.
$p_2(t)$: Probability density that at time $t$, the failed machine is under inspection in the WW period to examine whether it is out of the warranty period or not.
$p_3(t)/p_4(t)$: Probability density that at time $t$, the system is under repair in the WW/BW period.
$p_6(t)$: Probability density that at time $t$, the system is in the failed state and waiting for repair in the BW period.
$p(s)$: Laplace transform of the function $p(t)$.
LT: Laplace transform.
4. Analysis of the system
The Markov repairable system model contains a single machine with a failure-free warranty policy. There is a single repair facility that does not always remain with the system and takes a little time to arrive whenever required. If the system fails, it waits for inspection of the failed machine until the repairman is available. As soon as the repairman arrives, he starts the inspection to examine whether the machine failed due to unauthorized modifications or not. If it did, the system warranty is automatically void and the consumers have to pay for the repairs. Otherwise, all the repairs are free of cost to the consumers. After repairing the failed machine, the repairman returns. It is also assumed that the failure, inspection, and repair times of the machine and the arrival time of the repairman follow negative exponential distributions. The transition diagram of the system is shown in Fig. 8.1.
Figure 8.1 Transition diagram of the model.
4.1 Mathematical formulation of the model
On the basis of the above description and diagram, the difference-differential equations can be formulated as:
\[ \left(\frac{d}{dt} + \lambda\right) p_0(t) = \mu\, p_3(t) \tag{8.1} \]
\[ \left(\frac{d}{dt} + \alpha\right) p_1(t) = \lambda\, p_0(t) \tag{8.2} \]
\[ \left(\frac{d}{dt} + \beta\right) p_2(t) = \alpha\, p_1(t) \tag{8.3} \]
\[ \left(\frac{d}{dt} + \mu\right) p_3(t) = p\beta\, p_2(t) \tag{8.4} \]
\[ \left(\frac{d}{dt} + \mu\right) p_4(t) = q\beta\, p_2(t) + \alpha\, p_6(t) \tag{8.5} \]
\[ \left(\frac{d}{dt} + \lambda\right) p_5(t) = \mu\, p_4(t) \tag{8.6} \]
\[ \left(\frac{d}{dt} + \alpha\right) p_6(t) = \lambda\, p_5(t) \tag{8.7} \]
Initial conditions:
\[ p_i(0) = \begin{cases} 1 & \text{for } i = 0 \\ 0 & \text{for } i \neq 0 \end{cases} \tag{8.8} \]
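The system (8.1)-(8.7) with the initial condition (8.8) can also be integrated numerically as a cross-check on the analytical solution. The following is a minimal sketch, not the authors' method: it uses a fixed-step fourth-order Runge-Kutta scheme, the function names and step count are illustrative assumptions, and the parameter values in the usage note are the ones listed with Table 8.2.

```python
# Sketch: numerical integration of Eqs. (8.1)-(8.7) from the initial state (8.8).
# Function names and the step count are assumptions made for illustration.

def derivatives(p, lam, mu, alpha, beta, pw, qw):
    """Right-hand sides of Eqs. (8.1)-(8.7) for p = [p0, ..., p6]."""
    p0, p1, p2, p3, p4, p5, p6 = p
    return [
        -lam * p0 + mu * p3,                     # (8.1)
        -alpha * p1 + lam * p0,                  # (8.2)
        -beta * p2 + alpha * p1,                 # (8.3)
        -mu * p3 + pw * beta * p2,               # (8.4)
        -mu * p4 + qw * beta * p2 + alpha * p6,  # (8.5)
        -lam * p5 + mu * p4,                     # (8.6)
        -alpha * p6 + lam * p5,                  # (8.7)
    ]


def rk4_step(p, h, *rates):
    """Advance the state vector p by one Runge-Kutta step of size h."""
    k1 = derivatives(p, *rates)
    k2 = derivatives([x + 0.5 * h * k for x, k in zip(p, k1)], *rates)
    k3 = derivatives([x + 0.5 * h * k for x, k in zip(p, k2)], *rates)
    k4 = derivatives([x + h * k for x, k in zip(p, k3)], *rates)
    return [x + h * (a + 2 * b + 2 * c + d) / 6.0
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def state_probabilities(t_end, lam, mu, alpha, beta, pw, qw, steps=2000):
    """Return [p0(t_end), ..., p6(t_end)] starting from p0(0) = 1."""
    p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # initial condition (8.8)
    h = t_end / steps
    for _ in range(steps):
        p = rk4_step(p, h, lam, mu, alpha, beta, pw, qw)
    return p
```

For example, `state_probabilities(5.0, 0.1, 0.7, 0.8, 0.9, 0.5, 0.5)` returns the seven state probabilities at t = 5; they should sum to one because p + q = 1 makes the right-hand sides of (8.1)-(8.7) cancel.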
4.2 Solution of the equations
To solve Eqs. (8.1)-(8.7), we take the LT using the initial condition given in Eq. (8.8) and get
\[ (s + \lambda)\, p_0(s) = 1 + \mu\, p_3(s) \tag{8.9} \]
\[ (s + \alpha)\, p_1(s) = \lambda\, p_0(s) \tag{8.10} \]
\[ (s + \beta)\, p_2(s) = \alpha\, p_1(s) \tag{8.11} \]
\[ (s + \mu)\, p_3(s) = p\beta\, p_2(s) \tag{8.12} \]
\[ (s + \mu)\, p_4(s) = q\beta\, p_2(s) + \alpha\, p_6(s) \tag{8.13} \]
\[ (s + \lambda)\, p_5(s) = \mu\, p_4(s) \tag{8.14} \]
\[ (s + \alpha)\, p_6(s) = \lambda\, p_5(s) \tag{8.15} \]
From Eq. (8.10), we get
\[ p_1(s) = \frac{\lambda}{(s + \alpha)}\, p_0(s) \tag{8.16} \]
Using Eq. (8.16) in Eq. (8.11) and solving, we get
\[ p_2(s) = A(s)\, p_0(s) \tag{8.17} \]
where
\[ A(s) = \frac{\alpha\lambda}{(s + \alpha)(s + \beta)} \tag{8.18} \]
Using Eq. (8.17) in Eq. (8.12), we get
\[ p_3(s) = \frac{p\beta\, A(s)}{(s + \mu)}\, p_0(s) \tag{8.19} \]
Similarly, from Eq. (8.14), we get
\[ p_5(s) = \frac{\mu}{(s + \lambda)}\, p_4(s) \tag{8.20} \]
After using Eq. (8.20), Eq. (8.15) becomes
\[ p_6(s) = B(s)\, p_4(s) \tag{8.21} \]
where
\[ B(s) = \frac{\mu\lambda}{(s + \lambda)(s + \alpha)} \tag{8.22} \]
Using Eqs. (8.21) and (8.17) in Eq. (8.13), we get
\[ p_4(s) = C(s)\, p_0(s) \tag{8.23} \]
where
\[ C(s) = \frac{q\beta\, A(s)}{\{(s + \mu) - \alpha B(s)\}} \tag{8.24} \]
Using Eq. (8.23) in Eq. (8.20), we get
\[ p_5(s) = \frac{\mu\, C(s)}{(s + \lambda)}\, p_0(s) \tag{8.25} \]
Similarly, using Eq. (8.23) in Eq. (8.21), we get
\[ p_6(s) = B(s)\, C(s)\, p_0(s) \tag{8.26} \]
Now, using Eq. (8.19) in Eq. (8.9), we get
\[ p_0(s) = \frac{1}{T(s)} \tag{8.27} \]
where
\[ T(s) = \frac{(s + \lambda)(s + \mu) - \mu\, p\beta\, A(s)}{(s + \mu)} \tag{8.28} \]
Hence, we find the relation
\[ \sum_{i=0}^{6} p_i(s) = \frac{1}{s} \tag{8.29} \]
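The closed-form transforms can be checked against the normalization relation (8.29) at any test point s > 0. The short sketch below evaluates Eqs. (8.16)-(8.28) directly; the function name and the test values of s are assumptions chosen for illustration, and p + q = 1 is assumed as in the notation.

```python
# Sketch: evaluate the Laplace-domain state probabilities of Eqs. (8.16)-(8.28)
# and confirm that they add up to 1/s, as stated in Eq. (8.29).

def laplace_state_probabilities(s, lam, mu, alpha, beta, pw, qw):
    """Return [p0(s), ..., p6(s)] from the closed-form expressions."""
    a_s = alpha * lam / ((s + alpha) * (s + beta))                  # (8.18)
    b_s = mu * lam / ((s + lam) * (s + alpha))                      # (8.22)
    c_s = qw * beta * a_s / ((s + mu) - alpha * b_s)                # (8.24)
    t_s = ((s + lam) * (s + mu) - mu * pw * beta * a_s) / (s + mu)  # (8.28)
    p0 = 1.0 / t_s                                                  # (8.27)
    p1 = lam / (s + alpha) * p0                                     # (8.16)
    p2 = a_s * p0                                                   # (8.17)
    p3 = pw * beta * a_s / (s + mu) * p0                            # (8.19)
    p4 = c_s * p0                                                   # (8.23)
    p5 = mu * c_s / (s + lam) * p0                                  # (8.25)
    p6 = b_s * c_s * p0                                             # (8.26)
    return [p0, p1, p2, p3, p4, p5, p6]


if __name__ == "__main__":
    for s in (0.05, 0.3, 2.0):
        total = sum(laplace_state_probabilities(s, 0.1, 0.7, 0.8, 0.9, 0.5, 0.5))
        print(s, total, 1.0 / s)
```

The two printed columns should agree to rounding error for every s, which is a quick way to catch a sign or factor slip when transcribing A(s), B(s), C(s), or T(s).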
4.3 Reliability of the system $R(t)$ [29]
Reliability $R(t)$ is the probability that the system/machine works well over a specified period of time. The difference-differential equation for reliability is:
\[ \left(\frac{d}{dt} + \lambda\right) p_0(t) = 0 \tag{8.30} \]
Using the initial conditions and taking the LT of the above equation, we get
\[ (s + \lambda)\, p_0(s) = 1 \tag{8.31} \]
By using Eq. (8.31), the LT of $R(t)$ is
\[ R(s) = p_0(s) = \frac{1}{(s + \lambda)} \tag{8.32} \]
Taking the inverse LT of the above equation, we obtain
\[ R(t) = \exp(-\lambda t) \tag{8.33} \]
Now, based on Eq. (8.33), the MTTF is defined as:
\[ \mathrm{MTTF} = \int_0^{\infty} R(t)\, dt = \frac{1}{\lambda} \tag{8.34} \]
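Eqs. (8.33) and (8.34) are simple enough to evaluate directly. The sketch below computes R(t) for the failure rates used in Table 8.1 and compares the closed-form MTTF with a trapezoidal approximation of the integral; the function names, the truncation point of the integral, and the number of panels are illustrative assumptions.

```python
import math

# Sketch: reliability R(t) = exp(-lambda t) of Eq. (8.33) and MTTF of Eq. (8.34).

def reliability(t, lam):
    """System reliability at time t for a constant failure rate lam."""
    return math.exp(-lam * t)


def mttf_closed_form(lam):
    """MTTF = 1 / lambda, Eq. (8.34)."""
    return 1.0 / lam


def mttf_numeric(lam, horizon_factor=40.0, panels=100000):
    """Trapezoidal estimate of the integral of R(t) over [0, horizon_factor/lam]."""
    upper = horizon_factor / lam
    h = upper / panels
    total = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    for k in range(1, panels):
        total += reliability(k * h, lam)
    return total * h


if __name__ == "__main__":
    for lam in (0.1, 0.2, 0.3):
        row = [round(reliability(t, lam), 6) for t in range(1, 8)]
        print(lam, row, mttf_closed_form(lam))
```

With these inputs, the printed rows give R(t) for t = 1, ..., 7 and can be compared with the columns of Table 8.1.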
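4.4 Availability of the system $A_v(t)$
The availability $A_v(t)$ is the probability that the system/machine is working satisfactorily at time $t$. By using Eqs. (8.24), (8.25), (8.27) and (8.28), the LT of $A_v(t)$ is
\[ A_v(s) = p_0(s) + p_5(s) = \frac{s^6 + a_5 s^5 + a_4 s^4 + a_3 s^3 + a_2 s^2 + a_1 s + a_0}{s\,(s^6 + b_5 s^5 + b_4 s^4 + b_3 s^3 + b_2 s^2 + b_1 s + b_0)} \tag{8.35} \]
where
\[ a_5 = (\lambda + 2\mu + 2\alpha + \beta) \]
\[ a_4 = \{\lambda\mu + (\alpha+\beta)(\mu+\lambda) + \alpha\beta\} + (\alpha+\mu)(\alpha+\beta+\lambda+\mu) + \alpha\mu \]
\[ a_3 = \{\lambda\mu(\alpha+\beta) + \alpha\beta(\mu+\lambda) - \alpha\lambda\mu\} + \{\lambda\mu + (\alpha+\beta)(\mu+\lambda) + \alpha\beta\}(\mu+\alpha) + (\mu+\lambda+\alpha+\beta)\,\alpha\mu \]
\[ a_2 = q\alpha\lambda\mu\beta + (\mu+\alpha)\{\lambda\mu(\alpha+\beta) + \alpha\beta(\mu+\lambda) - \alpha\lambda\mu\} + \alpha\mu\{\lambda\mu + (\alpha+\beta)(\mu+\lambda) + \alpha\beta\} \]
\[ a_1 = (\mu+\alpha)\, q\alpha\lambda\mu\beta + \alpha\mu\{\lambda\mu(\alpha+\beta) + \alpha\beta(\mu+\lambda) - \alpha\lambda\mu\} \]
\[ a_0 = \alpha^2\mu^2\beta q\lambda \]
\[ b_5 = (2\lambda + 2\alpha + 2\mu + \beta) \]
\[ b_4 = (\alpha\lambda + \alpha\mu + \lambda\mu) + (\lambda+\alpha+\mu)(\mu+\lambda+\alpha+\beta) + \{\lambda\mu + (\alpha+\beta)(\mu+\lambda) + \alpha\beta\} \]
\[ b_3 = (\mu+\lambda+\alpha+\beta)(\alpha\lambda + \alpha\mu + \lambda\mu) + (\lambda+\alpha+\mu)\{\lambda\mu + (\alpha+\beta)(\mu+\lambda) + \alpha\beta\} + \{\lambda\mu(\alpha+\beta) + \alpha\beta(\mu+\lambda)\} \]
\[ b_2 = \{\lambda\mu + (\alpha+\beta)(\mu+\lambda) + \alpha\beta\}(\alpha\lambda + \alpha\mu + \lambda\mu) + (\lambda+\alpha+\mu)\{\lambda\mu(\alpha+\beta) + \alpha\beta(\mu+\lambda)\} + q\lambda\mu\alpha\beta \]
\[ b_1 = (\alpha\lambda + \alpha\mu + \lambda\mu)\{\lambda\mu(\alpha+\beta) + \alpha\beta(\mu+\lambda)\} + (\lambda+\alpha+\mu)\, q\lambda\mu\alpha\beta \]
\[ b_0 = q\lambda\mu\alpha\beta\,(\alpha\lambda + \alpha\mu + \lambda\mu) \]
After taking the inverse LT of Eq. (8.35), we obtain
\[ A_v(t) = A_0 + \sum_{i=1}^{6} A_i \exp(s_i t) \tag{8.36} \]
where
\[ A_0 = \frac{a_0}{s_1 s_2 s_3 s_4 s_5 s_6}, \qquad A_i = \frac{s_i^6 + a_5 s_i^5 + a_4 s_i^4 + a_3 s_i^3 + a_2 s_i^2 + a_1 s_i + a_0}{s_i \prod_{j=1,\, j \neq i}^{6} (s_i - s_j)}, \quad i = 1, 2, \ldots, 6 \tag{8.37} \]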
and $s_1, s_2, \ldots, s_6$ are the roots of the equation $s^6 + b_5 s^5 + b_4 s^4 + b_3 s^3 + b_2 s^2 + b_1 s + b_0 = 0$.
4.5 Busy period of the repairman BW period
Here $p_4(t)$ is the probability that the repairman is busy repairing the failed machine in the BW period. Consider the system's warranty period to be $(0, w)$. In the BW period, the repairman therefore remains busy during the interval $(w, t)$ [35], and the busy period $B(t)$ of the repairman BW period in the interval $(w, t)$ is given by
\[ B(t) = \int_w^t p_4(t)\, dt \tag{8.38} \]
Using Eqs. (8.23), (8.24), (8.27) and (8.28), we get
\[ p_4(s) = \frac{q\lambda\alpha\beta\,(s^3 + c_2 s^2 + c_1 s + c_0)}{s\,(s^6 + b_5 s^5 + b_4 s^4 + b_3 s^3 + b_2 s^2 + b_1 s + b_0)} \tag{8.39} \]
Taking the inverse Laplace transform of Eq. (8.39), we get
\[ p_4(t) = B_0 + \sum_{i=1}^{6} B_i \exp(s_i t) \tag{8.40} \]
where
\[ B_0 = \frac{q\lambda\alpha\beta\, c_0}{s_1 s_2 s_3 s_4 s_5 s_6}, \qquad B_i = \frac{q\lambda\alpha\beta\,(s_i^3 + c_2 s_i^2 + c_1 s_i + c_0)}{s_i \prod_{j=1,\, j \neq i}^{6} (s_i - s_j)}, \quad i = 1, 2, \ldots, 6 \tag{8.41} \]
and $s_1, s_2, \ldots, s_6$ are the roots of the equation $s^6 + b_5 s^5 + b_4 s^4 + b_3 s^3 + b_2 s^2 + b_1 s + b_0 = 0$. Now, using Eq. (8.40) in Eq. (8.38) and solving, we get
\[ B(t) = B_0\,(t - w) + \sum_{i=1}^{6} \frac{B_i}{s_i}\, \{\exp(s_i t) - \exp(s_i w)\} \tag{8.42} \]
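The polynomial forms (8.35) and (8.39) can be cross-checked numerically against the transforms of Section 4.2. The sketch below re-derives the coefficients from Eqs. (8.17)-(8.28) rather than copying them from a printed list, and it assumes p + q = 1. The coefficients $c_2, c_1, c_0$ are not listed in the text; the values used here, $c_2 = \lambda+\mu+\alpha$, $c_1 = \lambda\mu+\lambda\alpha+\mu\alpha$, $c_0 = \lambda\mu\alpha$, are what the same derivation gives, since the numerator of $p_4(s)$ factors as $q\lambda\alpha\beta\,(s+\lambda)(s+\mu)(s+\alpha)$. All function names are illustrative.

```python
# Sketch: compare the rational forms (8.35) and (8.39) with p0(s) + p5(s) and
# p4(s) computed from Eqs. (8.18)-(8.28). Coefficients are re-derived here and
# the c-coefficients are an assumption that follows from that derivation.

def horner(coeffs, s):
    """Evaluate a polynomial whose coefficients are given highest degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * s + c
    return acc


def coefficients(lam, mu, alpha, beta, qw):
    """Return coefficient lists (a, b, c), highest degree first."""
    k = qw * alpha * lam * mu * beta
    e2 = lam * mu + (alpha + beta) * (mu + lam) + alpha * beta
    e3 = lam * mu * (alpha + beta) + alpha * beta * (mu + lam)
    x = e3 - alpha * lam * mu
    c3 = mu + lam + alpha + beta
    q1 = lam + alpha + mu
    q0 = alpha * lam + alpha * mu + lam * mu
    a = [1.0, c3 + alpha + mu, e2 + (alpha + mu) * c3 + alpha * mu,
         x + e2 * (mu + alpha) + c3 * alpha * mu,
         k + (mu + alpha) * x + alpha * mu * e2,
         (mu + alpha) * k + alpha * mu * x, alpha * mu * k]
    b = [1.0, q1 + c3, q0 + q1 * c3 + e2, c3 * q0 + q1 * e2 + e3,
         e2 * q0 + q1 * e3 + k, q0 * e3 + q1 * k, k * q0]
    c = [1.0, q1, q0, lam * mu * alpha]
    return a, b, c


def polynomial_transforms(s, lam, mu, alpha, beta, qw):
    """Av(s) and p4(s) from the rational forms (8.35) and (8.39)."""
    a, b, c = coefficients(lam, mu, alpha, beta, qw)
    denom = s * horner(b, s)
    return horner(a, s) / denom, qw * lam * alpha * beta * horner(c, s) / denom


def direct_transforms(s, lam, mu, alpha, beta, qw):
    """Av(s) and p4(s) from A(s), B(s), C(s), T(s) of Section 4.2."""
    pw = 1.0 - qw
    a_s = alpha * lam / ((s + alpha) * (s + beta))
    b_s = mu * lam / ((s + lam) * (s + alpha))
    c_s = qw * beta * a_s / ((s + mu) - alpha * b_s)
    p0 = (s + mu) / ((s + lam) * (s + mu) - mu * pw * beta * a_s)
    return p0 + mu * c_s / (s + lam) * p0, c_s * p0
```

Calling both functions with the same arguments, for example `s = 0.37` and the Table 8.2 parameters, should give matching pairs. As a further sanity check, $a_0/b_0$ reduces to $\alpha\mu/(\alpha\lambda+\alpha\mu+\lambda\mu)$, the long-run fraction of time spent in $S_5$ within the BW cycle $S_5 \to S_6 \to S_4$.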
4.6 Profit analysis of the system
Let $R_1$ be the revenue per unit time and $R_2$ be the repair cost per unit time. Since all repairs are free of cost to the users during the warranty period, the busy period of the repairman within the warranty period has no effect. Therefore, the expected profit of the system $E_p(t)$ is given by:
\[ E_p(t) = R_1 \int_0^t A_v(t)\, dt - R_2\, B(t) \tag{8.43} \]
that is, $E_p(t)$ = Total revenue $-$ Total cost/expenditure in the BW period. By using Eqs. (8.36) and (8.42) and solving, we get
\[ E_p(t) = R_1 \left[ A_0 t + \sum_{i=1}^{6} \frac{A_i}{s_i}\, \{\exp(s_i t) - 1\} \right] - R_2 \left[ B_0\,(t - w) + \sum_{i=1}^{6} \frac{B_i}{s_i}\, \{\exp(s_i t) - \exp(s_i w)\} \right] \tag{8.44} \]
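Eq. (8.43) can also be evaluated without finding the roots $s_i$, by integrating Eqs. (8.1)-(8.7) numerically and accumulating the two integrals along the way. The sketch below does this with a Runge-Kutta step and the trapezoidal rule. It is an illustration under stated assumptions, not the procedure used for Table 8.2: the warranty length `w`, the step size `h`, and all function names are hypothetical, and `w` is assumed to be a multiple of `h`.

```python
# Sketch: expected profit of Eq. (8.43) by direct numerical integration.
# The warranty length w and step size h are assumptions for illustration.

def derivatives(p, lam, mu, alpha, beta, pw, qw):
    """Right-hand sides of Eqs. (8.1)-(8.7)."""
    p0, p1, p2, p3, p4, p5, p6 = p
    return [-lam * p0 + mu * p3,
            -alpha * p1 + lam * p0,
            -beta * p2 + alpha * p1,
            -mu * p3 + pw * beta * p2,
            -mu * p4 + qw * beta * p2 + alpha * p6,
            -lam * p5 + mu * p4,
            -alpha * p6 + lam * p5]


def rk4_step(p, h, *rates):
    """One fourth-order Runge-Kutta step of size h."""
    k1 = derivatives(p, *rates)
    k2 = derivatives([x + 0.5 * h * k for x, k in zip(p, k1)], *rates)
    k3 = derivatives([x + 0.5 * h * k for x, k in zip(p, k2)], *rates)
    k4 = derivatives([x + h * k for x, k in zip(p, k3)], *rates)
    return [x + h * (a + 2 * b + 2 * c + d) / 6.0
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def expected_profit(t_end, w, r1, r2, lam, mu, alpha, beta, pw, qw, h=0.001):
    """R1 * integral_0^t Av dt  -  R2 * integral_w^t p4 dt, Eq. (8.43)."""
    p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # initial condition (8.8)
    uptime = 0.0    # integral of Av(t) = p0(t) + p5(t)
    busy_bw = 0.0   # integral of p4(t) over (w, t), Eq. (8.38)
    for k in range(int(round(t_end / h))):
        p_next = rk4_step(p, h, lam, mu, alpha, beta, pw, qw)
        uptime += 0.5 * h * (p[0] + p[5] + p_next[0] + p_next[5])
        if k * h >= w - 1e-12:  # only count repair time beyond warranty
            busy_bw += 0.5 * h * (p[4] + p_next[4])
        p = p_next
    return r1 * uptime - r2 * busy_bw
```

For instance, `expected_profit(5.0, 1.0, 1000, 300, 0.1, 0.7, 0.8, 0.9, 0.5, 0.5)` gives the profit at t = 5 for an assumed one-year warranty. Lowering `r2` with everything else fixed raises the result, which is the qualitative behavior reported in Section 5; the absolute figures depend on `w` and need not match Table 8.2.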
5. Numerical results
The values of the system's reliability $R(t)$ and expected profit $E_p(t)$ are computed for different values of the failure rate ($\lambda$) and repair cost ($R_2$) w.r.t. time ($t$) and are shown in Tables 8.1 and 8.2, respectively.
5.1 Interpretations of the numerical results
To examine the performance of the system model, we performed an analysis in which the failure rate ($\lambda$) and the repair cost ($R_2$) were varied w.r.t. time ($t$). Table 8.1 shows that as we increase $\lambda$ from 0.1 to 0.2 and then further to 0.3 at a particular time, say 5 years, the
Table 8.1 Impact of failure rate ($\lambda$) on system's reliability $R(t)$ [29].
Year (t)    R(t), λ = 0.1    R(t), λ = 0.2    R(t), λ = 0.3
1           0.904837         0.818731         0.740818
2           0.818731         0.67032          0.548812
3           0.740818         0.548812         0.40657
4           0.67032          0.449329         0.301194
5           0.606531         0.367879         0.22313
6           0.548812         0.301194         0.165299
7           0.496585         0.246597         0.122456
Table 8.2 Impact of repair cost ($R_2$) on expected profit $E_p(t)$.
Parameters for all columns: λ = 0.1, μ = 0.7, α = 0.8, β = 0.9, p = q = 0.5, R1 = 1000.
Year (t)    Ep(t) for R2 = 300    Ep(t) for R2 = 200    Ep(t) for R2 = 100
1           1190                  1200                  1210
2           2579                  2611                  2643
3           4000                  4054                  4108
4           5408                  5484                  5561
5           6801                  6899                  6997
6           8538                  8658                  8778
7           9610                  9752                  9893
system reliability decreases by about 39.35% each time (from 0.606531 to 0.367879 and then to 0.22313). The complete variations of the impact of $\lambda$ on $R(t)$ w.r.t. time are summarized in Fig. 8.2. On the other hand, Table 8.2 highlights the behavior of the expected profit: with the other parameters fixed, the expected profit $E_p(t)$ increases as the repair cost ($R_2$) decreases. For instance, if we decrease $R_2$ from 300 to 200 and then further to 100, then $E_p(t)$ increases from 6801 to 6899 and then further to 6997 at a particular time, say 5 years. The complete variations of the impact of $R_2$ on $E_p(t)$ w.r.t. time are summarized in Fig. 8.3.
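As a worked check on the reliability figures, Eq. (8.33) implies that raising the failure rate by 0.1 at a fixed time $t$ scales $R(t)$ by $\exp(-0.1\,t)$, so the relative drop depends only on $t$ and not on the starting value of $\lambda$. The evaluation below uses nothing beyond Eq. (8.33) and the entries of Table 8.1:

```latex
\[
\frac{R(t;\lambda) - R(t;\lambda + 0.1)}{R(t;\lambda)} = 1 - e^{-0.1\,t},
\qquad
t = 4:\ 1 - e^{-0.4} \approx 0.3297,
\qquad
t = 5:\ 1 - e^{-0.5} \approx 0.3935 .
\]
```

In Table 8.1, the row for year 5 gives $1 - 0.367879/0.606531 \approx 0.3935$, and the row for year 4 gives $1 - 0.449329/0.67032 \approx 0.3297$.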
6. Conclusion
This paper proposed a Markov repairable system model with a cost-free warranty, in which a single repair facility does not stay with the system and takes
Figure 8.2 Effect of failure rate $\lambda$ on reliability of the system $R(t)$.
Figure 8.3 Effect of repair cost $R_2$ on expected profit $E_p(t)$.
some time to arrive at the system whenever required. The implemented failure-free warranty provides assurance to the consumers against early failures because repairs are free of cost to the consumers in the WW period. The proposed system model therefore aims to protect the users by reducing unnecessary expenditure on repair of the failed machine in the WW period, to analyze the system's reliability at different levels of failure
rate, and to compute various system performance measures by using the Markov process technique. The results show that, at any given time (t), the reliability of the system increases as the failure rate decreases. Moreover, the effect of repair cost on the expected profit of the system has been presented in the form of numerical results, which indicate that, with the other parameters fixed, the expected profit increases w.r.t. time (t) whenever the repair cost is decreased. Hence, the study reveals that a Markov process repairable system with a waiting repair facility will be economical to use when free-of-cost repair service is provided to the users for a fixed span and the repair cost is decreased. The proposed model is therefore more appropriate and advanced than the existing models.
7. Future research directions
Although we have focused on the one-unit Markov model with a waiting repair facility and a cost-free warranty policy, the general idea presented here is also applicable to many other systems, such as series, parallel, standby, k-out-of-n, and multistate systems. A preventive maintenance strategy could also be considered for better functioning of the system. Moreover, the proposed approach will be extended in future by using other types of warranty policies, such as the two-dimensional warranty [19,36].
Acknowledgment The author is thankful to the reviewers for their valuable suggestions/comments that led to an improved presentation of this paper.
References
1. K. Rinsaka, H. Sandoh, A stochastic model on an additional warranty service contract, Computers and Mathematics with Applications 51 (2006) 179-188.
2. A.E. Jahromi, H. Vahdani, Replacement-repair policy based on a simulation model for multi-state deteriorating products under warranty, Transaction E: Industrial Engineering 16 (1) (2009) 26-35.
3. J. Anand, Geeta, A Review: Reliability Modeling of a Computer System with Hardware Repair and Software Up-Gradation Subject to Server Vacation (Book Chapter), Communication and Computing Systems, CRC Press, 2019.
4. S. Liu, L. Hu, Z. Liu, Y. Wang, Reliability of a retrial system with mixed standby components and Bernoulli vacation, Quality Technology & Quantitative Management 18 (2) (2020) 248-265.
5. R. Niwas, M.S. Kadyan, Stochastic analysis of a single-unit system with repairman having multiple vacations, International Journal of Computer Applications 8 (1) (2018) 137-147.
6. G. Wang, L. Hu, T. Zhang, Y. Wang, Reliability modeling for a repairable (k1, k2)-out-of-n: G system with phase-type vacation time, Applied Mathematical Modelling 91 (2021) 311-321.
7. M. Ram, S.B. Singh, V.V. Singh, Stochastic analysis of a standby system with waiting repair strategy, IEEE Transactions on Systems, Man, and Cybernetics: Systems 43 (3) (2013) 698-707.
8. M.S. El-Sherbeny, Z.M. Hussien, Reliability and sensitivity analysis of a repairable system with warranty and administrative delay in repair, Journal of Mathematics 2021 (2) (2021) 1-9.
9. W.R. Blischke, Mathematical models for analysis of warranty policies, Mathematical and Computer Modelling 13 (7) (1990) 1-16.
10. W.R. Blischke, D.N.P. Murthy, Warranty Cost Analysis, Marcel Dekker, New York, 1994.
11. W.R. Blischke, D.N.P. Murthy, Product warranty management-I: a taxonomy for warranty policies, European Journal of Operational Research 62 (1992) 127-148.
12. W.R. Blischke, D.N.P. Murthy, Product warranty management-III: a review of mathematical models, European Journal of Operational Research 63 (1992) 1-34.
13. Y.H. Chun, Optimal number of periodic preventive maintenance operations under warranty, Reliability Engineering & System Safety 37 (5) (1992) 223-225.
14. D.N.P. Murthy, B.P. Iskandar, R.J. Wilson, Two dimensional failure-free warranty policies: two dimensional point process models, Operations Research 43 (2) (1995) 356-366.
15. A. Monga, M.J. Zuo, Optimal system design considering maintenance and warranty, Computers & Operations Research 25 (9) (1998) 691-705.
16. H.G. Kim, B.M. Rao, Expected warranty cost of two-attribute free-replacement warranties based on a bivariate exponential distribution, Computers and Industrial Engineering 38 (2000) 425-434.
17. D.N.P. Murthy, I. Djamaludin, New product warranty: a literature review, International Journal of Production Economics 79 (2002) 231-260.
18. J. Bai, H. Pham, Repair-limit risk-free warranty policies with imperfect repair, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 35 (6) (2005) 765-772.
19. Y. Huang, C. Yen, A study of two-dimensional warranty policies with preventive maintenance, IIE Transactions 41 (2009) 299-308.
20. H. Vahdani, S. Chukova, H. Mahlooji, On optimal replacement-repair policy for multi-state deteriorating products under renewing free replacement warranty, Computers and Mathematics with Applications 61 (2011) 840-850.
21. M.S. Kadyan, Ramniwas, Cost benefit analysis of a single-unit system with warranty for repair, Applied Mathematics and Computation 223 (2013) 346-353.
22. R. Niwas, M.S. Kadyan, J. Kumar, Probabilistic analysis of two reliability models of a single-unit system with preventive maintenance beyond warranty and degradation, Eksploatacja i Niezawodnosc-Maintenance and Reliability 17 (4) (2015) 535-543.
23. R. Niwas, M.S. Kadyan, J. Kumar, MTSF (mean time to system failure) and profit analysis of a single-unit system with inspection for feasibility of repair beyond warranty, International Journal of System Assurance Engineering and Management 7 (2016) 198-204.
24. Y. Huang, C. Huang, J. Ho, A customized two-dimensional extended warranty with preventive maintenance, European Journal of Operational Research 257 (3) (2017) 971-978.
25. S. Mo, J. Zeng, W. Xu, A new warranty policy based on a buyer's preventive maintenance investment, Computers and Industrial Engineering 111 (2017) 433-444.
26. G.S. Bura, Transient solution of an M/M/N queue with catastrophes, Communication in Statistics-Theory & Methods 48 (2018) 3439-3450.
27. G.S. Bura, S. Gupta, Time dependent analysis of an M/M/2 queue with catastrophes, Journal of Reliability Theory & Applications 14 (2019) 79-86.
28. N.K. Jain, G.S. Bura, M/M/2/N queue subject to modified binomially distributed catastrophic intensity with restoration time, Journal of the Indian Statistical Association 49 (2) (2011) 135-147.
29. R. Niwas, H. Garg, An approach for analyzing the reliability and profit of an industrial system based on the cost free warranty policy, Journal of the Brazilian Society of Mechanical Sciences and Engineering 40 (2018) 1-9.
30. F. Hooti, J. Ahmadi, M. Longobardi, Optimal extended warranty length with limited number of repairs in the warranty period, Reliability Engineering & System Safety 203 (2020), https://doi.org/10.1016/j.ress.2020.107111.
31. A. Salmasnia, A. Shahidian, M. Seivandian, B. Abdzadeh, Bi-objective optimization of non-periodic preventive maintenance strategy by considering time value of money, Scientia Iranica 27 (6) (2020) 3305-3321.
32. P. Liu, G. Wang, P. Su, Optimal replacement strategies for warranty products with multiple failure modes after warranty expiry, Computers & Industrial Engineering 153 (2021), https://doi.org/10.1016/j.cie.2020.107040.
33. T. Li, S. He, X. Zhao, Optimal warranty policy design for deteriorating products with random failure threshold, Reliability Engineering & System Safety 218 (2022), https://doi.org/10.1016/j.ress.2021.108142.
34. M. Hashemi, M. Asadi, M. Tavangar, Optimal maintenance strategies for coherent systems: a warranty dependent approach, Reliability Engineering & System Safety 217 (2022), https://doi.org/10.1016/j.ress.2021.108027.
35. R. Niwas, Reliability analysis of a maintenance scheduling model under failure free warranty policy, Journal of Reliability Theory & Applications 13 (3) (2018) 49-65.
36. Y. Wang, Y. Liu, A. Zhang, Preventive maintenance optimization for repairable products considering two-dimensional warranty and customer satisfaction, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability 233 (2018) 553-566.
CHAPTER 9
Semi-Markov modeling applications in system availability analysis
Girish Kumar¹, M.K. Loganathan² and Om Yadav³
¹Delhi Technological University, Delhi, India; ²Kaziranga University, Jorhat, Assam, India; ³North Dakota University, Fargo, United States
1. System availability
Greater availability of production systems is a primary requirement for any manufacturing industry seeking to enhance plant productivity [1]. System availability, an effective performance measure, quantifies the probability that a system is able to operate at a given point in time. Availability is, in essence, the amount of time that a system is functional when desired. It may also be expressed as the percentage of time a system is functioning over a stipulated time interval [2]. Availability assessment is the best way to measure the performance of a system that can be reinstated to operation after failure; that is, system performance is expressed as availability when the system encounters both failures (reliability) and repairs (maintainability). System reliability means that the system will not fail for a certain period of time, whereas system maintainability shows how successfully the system is restored after a failure has occurred. When both metrics are considered, availability is the metric needed to ensure that the system is operational at a given point in time. Availability is the performance measure for any repairable system that accounts for both reliability and maintainability. The definition can be elaborated as the probability that a system has neither failed nor is undergoing restoration when it needs to be in an operational state. For instance, if a car has 98% availability, then 20 times out of 1000 the driver will need the car and find that it is not in running condition, either because it has broken down or because it is under repair. Availability is thus the measure that combines reliability and maintainability. Fig. 9.1 indicates that system availability (uptime and downtime) is a function of system reliability (failures) and system maintainability (repairs).
The reliability measure can be conceived as the probability that a system performs its intended functions during a given period of time, whereas availability is the probability that the system is currently in a functioning state, although it might have failed earlier and been restored to its normal working condition. Maintainability, on the other hand, represents the probability of a system being repaired and returned to its working state from a failed state. Hence, it is required that reliability and maintainability
Engineering Reliability and Risk Assessment ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00012-5
Figure 9.1 System availability.
be included when the system is modeled for availability. This means that system availability is evaluated by obtaining the failure and repair time distributions. A highly available system is one whose operations remain normal on most occasions. Any failure, which is inevitable, will affect this, and it is therefore crucial to devise a plan to eliminate or reduce potential problems before they originate. High-availability systems are typically specified at levels such as 99.98% or 99.999%. The availability of a system may be increased by minimizing downtime through appropriate and adequate testing methods, better diagnostics, and improved repair methodologies. For achieving high levels of availability in complex systems such as aircraft propulsion, the use of redundancy may be the best option to reduce the system failure rate. As redundancy increases reliability, it also increases availability.
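The figures quoted above can be turned into simple arithmetic. The sketch below is an illustration, not part of the chapter's method: it computes the expected number of unavailable demands (the car example), the annual downtime implied by a specified availability, and the availability of redundant units in parallel. The parallel formula assumes independent, identical units with the system up when at least one unit is up; that assumption and all function names are introduced here for illustration.

```python
# Sketch: basic availability arithmetic. The independence assumption behind
# parallel_availability() is ours and is not stated in the chapter.

HOURS_PER_YEAR = 8760.0  # 365 days x 24 hours


def expected_unavailable_demands(availability, demands):
    """Expected number of demands that find the system not operational."""
    return (1.0 - availability) * demands


def annual_downtime_hours(availability):
    """Expected downtime per year implied by a steady availability level."""
    return (1.0 - availability) * HOURS_PER_YEAR


def parallel_availability(unit_availability, n_units):
    """Availability of n independent identical units in parallel."""
    return 1.0 - (1.0 - unit_availability) ** n_units


if __name__ == "__main__":
    print(expected_unavailable_demands(0.98, 1000))  # car example in the text
    print(annual_downtime_hours(0.9998), annual_downtime_hours(0.99999))
    print(parallel_availability(0.98, 2))
```

Under these assumptions, duplicating a unit with 98% availability raises the system figure to 99.96%, which illustrates why redundancy is attractive when very high availability is specified.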
2. Motivation

Mechanical systems form an essential constituent of industrial systems, from air-conditioning and manufacturing to aviation. Increasing technological interdependence and the evolution of hybrid technologies that combine electromechanical, electronic, and software elements have led to a high degree of integration, which increases the complexity of these systems. Failure is inevitable in such complex systems. The performance of a complex mechanical system is one of the key factors in enhancing plant productivity [3]. In most industrial sectors, system availability remains a major unsolved issue. Although the maintenance function helps to enhance system availability [4,5], poor maintenance practices are regarded as the dominant factor that degrades availability performance, which in turn affects production capacity, product quality, and profitability. However, most of the availability issues that adversely impact plant performance cannot be attributed to poor maintenance alone; design deficiencies, poor standardization and interchangeability, and inaccessibility due to increased system complexity also contribute to poor system maintainability. Hence, availability has received widespread attention recently and has become a part of process/service specification. It is also becoming an integral part of product design, life cycle costing, inventory management, and maintenance system decision-making. The availability measures help engineers in optimally choosing suitable maintenance methods
Semi-Markov modeling applications in system availability analysis
and enhancing the system performance. This leads to a reduction in operation and maintenance costs and an overall increase in the profitability and productivity of the plant. The elements of complex mechanical systems exhibit decreasing or increasing failure and repair rates, which makes availability analysis difficult. Moreover, estimation based on conventional methods [6], which evaluate availability using the Markov method or intuitive knowledge, is rendered ineffective for these systems. The Markov model is based on the assumption of constant failure and repair rates, an assumption that rarely holds in practice. To relax this assumption and to quantify availability at a more realistic value, the semi-Markov model-based approach is adopted for practical systems [7]. This work presents system availability assessment methods using the semi-Markov model. The system structural information is first obtained by tracking its constituent elements, i.e., subsystems, assemblies, or components, and their states, i.e., nonfunctional, operating, etc., are then derived. This helps to develop the semi-Markov model for the system. The developed model is then solved analytically or by simulation to obtain the system availability. Some examples are demonstrated for availability assessment of critical mechanical systems that are employed in various industrial units.
3. Availability assessment

Availability has different meanings, and the method of computing it depends on its use. As mentioned earlier, both the failure and repair probability distributions must be considered to evaluate system availability. Availability accounts for the factors that contribute to the system functioning as well as for the nonfunctioning periods and delays pertaining to the system. There are different forms of availability that one can choose for system performance analysis; this chapter focuses on availability assessment in steady-state conditions. The steady-state availability of a system is defined as the limit of the point availability as time tends to infinity. The point availability reaches the steady state after a certain period of time. It is also called the asymptotic or long-run availability [2] and is given as:

\[
A(\infty) = \lim_{t \to \infty} A(t) \tag{9.1}
\]
where point (i.e., instantaneous) availability is the probability of a system being functional at any random time, t. The instantaneous availability function reaches the steady state after a time period of approximately three to four times the mean time to failure.
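As an illustration of how the point availability settles to its steady-state limit, the sketch below evaluates the standard closed-form A(t) of a two-state model with constant failure rate f and constant repair rate r, starting in the operating state. The constant-rate (Markov) assumption and the numerical rates are hypothetical choices made only for this example; they are not data from the chapter.

```python
import math


def point_availability(t, f, r):
    """A(t) for a two-state system with constant failure rate f and
    constant repair rate r, given that it is operating at t = 0."""
    steady_state = r / (f + r)
    transient = (f / (f + r)) * math.exp(-(f + r) * t)
    return steady_state + transient


f = 1.0 / 1000.0  # hypothetical: mean time to failure of 1000 h
r = 1.0 / 10.0    # hypothetical: mean time to repair of 10 h

for t in (0.0, 10.0, 50.0, 4000.0):
    print(f"A({t:g} h) = {point_availability(t, f, r):.6f}")
print(f"A(inf)   = {r / (f + r):.6f}")
```

The transient term decays exponentially, so A(t) approaches the limit of Eq. (9.1) monotonically from its initial value of 1.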
4. Availability assessment methods

Repairable systems are state-dependent systems: the system passes through various states during its cycle of working and nonoperating conditions. When a repairable system fails, it is generally restored to its normal operating condition by some activity such as component replacement or adjustment of system settings. State-based modeling of repairable systems remains of research interest [8–11], and such models are extensively used to assess the availability of repairable systems. In general, state-based models are employed to assess system availability in two ways: simulation approaches and analytical methods [12]. Both model the system in terms of random variables (say, failure or repair times) defined over the states of the underlying items, be it the system or otherwise. Simulation methods determine the system availability over a time interval iteratively, by drawing a value for each random variable. Analytical methods, in contrast, apply probability theory to quantify system performance measures in steady-state or transient conditions. Both approaches deal with state-dependent systems, and the commonly used ones are the Markov and semi-Markov approaches.

4.1 Markov method

The Markov method models state-dependent repairable systems on the basis of the Markov process, in which the probability of the next state depends only on the current state and not on the past history. This property is known as memorylessness. The process is represented by a Markov chain showing the various states of the repairable system and their transitions, as shown in Fig. 9.2. The transition rates are constant, and the transition times are exponentially distributed.
A Markov model requires identification of the possible system states, their transition paths, and the rate parameters of the transitions. Each state represents a different condition that the system undergoes. State-to-state transitions take place with exponentially distributed failure and repair times. In Fig. 9.2, the states (A and B) are represented by circles and the transitions by arrows. Initially, the system is at state A, say, an operating state, which is shown by the green circle. Eventually, the system moves to state B,
Figure 9.2 Markov chain representing Markov process for two-state system.
for instance, a failed state that is represented by the blue circle. The transition from state A to state B is labeled “f” (i.e., the failure rate), and the system is restored from the failed state B to the operating state A through “r” (i.e., the repair rate). In Fig. 9.2, the numerical value associated with each arrow indicates the probability of the system moving from one state to another. For example, if the system is at state A, then the probability of transition to state B is 0.8, while the probability that the system remains at state A is 0.2. To evaluate the state probabilities, the Markov chain is converted into a transition probability matrix and solved analytically. The Markov model is widely used in many applications, including the evaluation of reliability and availability performance.

4.2 Semi-Markov method

Semi-Markov models have been widely applied for system modeling in recent times [13–16]. However, their application to repairable systems under various maintenance scenarios has not been explored much. This chapter presents the application of the semi-Markov model to steady-state availability analysis of repairable mechanical systems. A semi-Markov process (SMP) model is also a state-space model in which transitions are governed by a transition probability matrix. However, in an SMP the time between transitions may follow a nonexponential distribution, unlike a Markov process, where it is exponential [17]. For mechanical components and systems, the sojourn (stay) time in most states, whether governed by degradation/failure behavior or by maintenance, follows a nonexponential distribution [18,19]; a Weibull distribution is more appropriate for such systems. The transition epochs nevertheless possess the Markovian property, which is what fits such systems into the semi-Markov framework.
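The difference between the two methods comes down to whether the transition rate is constant. As a minimal sketch, the Weibull hazard (instantaneous failure rate) can be compared with the exponential case, which is the Weibull distribution with shape parameter 1. The shape and scale values used below are those quoted later in Table 9.2 for transition 1–3; the function name is illustrative.

```python
def weibull_hazard(t, beta, theta):
    """Hazard rate h(t) = (beta / theta) * (t / theta) ** (beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)


theta = 11000.0  # scale parameter, as listed for transition 1-3 in Table 9.2

for t in (100.0, 1000.0, 5000.0):
    constant = weibull_hazard(t, 1.0, theta)    # exponential: memoryless
    increasing = weibull_hazard(t, 1.5, theta)  # shape > 1: wear-out behavior
    print(f"t = {t:6.0f} h  exponential: {constant:.3e}  Weibull(1.5): {increasing:.3e}")
```

With shape 1 the rate is the same at every age, which is the Markov assumption; with shape 1.5 it grows with age, which a Markov model cannot represent but an SMP can.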
5. System availability modeling and analysis

The SMP model is a state-based model that shows all possible states of the system as nodes, connected by edges that represent the corresponding failure and repair time distributions. The system states are derived in terms of subsystem/component states. Both simulation and analytical methods are available for solving the SMP model. Analytical methods provide closed-form solutions, which are accurate and are obtained quickly in a single iteration under the assumptions of the model [20,21]. System availability is obtained by summing the probabilities of the operating states [2]. In the analytical approach, the system state probability expressions are derived in terms of the transition distribution parameters. The two-stage method presented in Ref. [22] is used to solve the developed SMP model for steady-state system availability analysis. The first stage evaluates the one-step transition probability matrix of the Embedded Markov Chain (EMC) of the SMP
model. This matrix is subsequently applied to determine the steady-state probabilities of the EMC. The second stage calculates the mean sojourn time (MST) for each state, which reflects how long the system remains in that state. The steady-state probabilities of the SMP model are then obtained from the steady-state probabilities of the EMC and the sojourn time values, and are used for the system availability evaluation. The detailed method is discussed in the following section.

5.1 Steady-state solution

The steady-state solution for availability analysis using semi-Markov models has two stages: evaluation of the EMC state probabilities and evaluation of the SMP state probabilities. These are described below.

5.1.1 Stage 1: EMC state probabilities

The kernel matrix, K(t), which describes the semi-Markov process [23], is written as:

\[
K(t) =
\begin{bmatrix}
k_{11}(t) & k_{12}(t) & k_{13}(t) & \cdots & k_{1N}(t) \\
k_{21}(t) & k_{22}(t) & k_{23}(t) & \cdots & k_{2N}(t) \\
k_{31}(t) & k_{32}(t) & k_{33}(t) & \cdots & k_{3N}(t) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_{N1}(t) & k_{N2}(t) & k_{N3}(t) & \cdots & k_{NN}(t)
\end{bmatrix}
\tag{9.2}
\]
The matrix K(t) is of order N × N, where N is the total number of states in the system. The element $k_{ij}(t)$ is the conditional probability that the system enters state j at a regeneration that occurs at or before time t, given that it was in state i just after the preceding regeneration, i.e., $k_{ij}(t) = P\{X_1 = j,\ S_1 \le t \mid X_0 = i\}$, where i and j are states of the SMP model and $S_1$, $X_0$, and $X_1$ are the time of the first transition, the initial state of the system, and the state after the first transition, respectively. Following this definition, the elements $k_{ij}(t)$ are expressed [23] as:

\[
k_{ij}(t) =
\begin{cases}
0 & \text{no state is reachable from state } i \text{ during the transition time } t, \\[4pt]
F_{ij}(t) & \text{a single state } j \text{ is reachable from state } i \text{ during the transition time } t, \\[4pt]
\displaystyle\int_0^t \bar{F}_{ik}(x)\,\bar{F}_{im}(x)\,dF_{ij}(x) & \text{two or more states (say, } j, k, m\text{) are reachable from state } i \text{ during the transition time } t.
\end{cases}
\tag{9.3}
\]

In the third case, with three states reachable from state i, the ith row of K(t) has three nonzero elements. Here $F_{ij}(t)$ is the CDF of the time to transition from state i to state j, and $\bar{F}_{ij}(t) = 1 - F_{ij}(t)$ is its complement.

The one-step transition probability matrix Z of the EMC is evaluated from the kernel matrix K(t) as t → ∞. When the matrix elements are evaluated, the elements in each row of Z must sum to 1. The matrix Z is expressed as [22]:

\[
Z = K(\infty) =
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & \cdots & p_{1N} \\
p_{21} & p_{22} & p_{23} & \cdots & p_{2N} \\
p_{31} & p_{32} & p_{33} & \cdots & p_{3N} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{N1} & p_{N2} & p_{N3} & \cdots & p_{NN}
\end{bmatrix}
\tag{9.4}
\]
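The competing-transition case of Eq. (9.3), taken to the limit of Eq. (9.4), can be sketched numerically as follows. This is only an illustration under stated assumptions: all outgoing transitions are taken to be Weibull with shape parameter at least 1, the integral is evaluated with a composite Simpson rule up to a finite upper limit, and the two competing transitions and their parameters are hypothetical. With both shapes equal to 1 (exponential), the result can be checked against the closed form a/(a + b) for rates a and b.

```python
import math


def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def weibull_pdf(x, beta, theta):
    return (beta / theta) * (x / theta) ** (beta - 1.0) * math.exp(-((x / theta) ** beta))


def weibull_sf(x, beta, theta):
    """Survival function 1 - F(x)."""
    return math.exp(-((x / theta) ** beta))


def emc_row(transitions, upper):
    """p_ij = integral over [0, upper] of f_ij(x) * prod_{k != j} (1 - F_ik(x)) dx
    for every destination j; transitions maps j -> (beta, theta), beta >= 1."""
    row = {}
    for j, (bj, tj) in transitions.items():
        def integrand(x, j=j, bj=bj, tj=tj):
            others = 1.0
            for k, (bk, tk) in transitions.items():
                if k != j:
                    others *= weibull_sf(x, bk, tk)
            return weibull_pdf(x, bj, tj) * others
        row[j] = simpson(integrand, 0.0, upper)
    return row


# Hypothetical state with two competing exponential transitions (shape 1).
competing = {2: (1.0, 1000.0), 3: (1.0, 3000.0)}
row = emc_row(competing, upper=50000.0)
print(row, "row sum =", sum(row.values()))
```

The row-sum check mirrors the requirement stated above that each row of Z sums to 1; the upper limit only needs to be large enough for the joint survival probability to be negligible.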
The following equation is used for evaluating the EMC state probabilities [22]:

\[
H = H\,K(\infty), \qquad \sum_{i=1}^{N} h_i = 1, \quad i \in U \tag{9.5}
\]

where $H = [h_1, h_2, \ldots, h_N]$ is the row vector that gives the steady-state probability of each state of the EMC, and U denotes the set of all N states. These values are subsequently used for evaluating the steady-state probabilities of the SMP model in stage 2.

5.1.2 Stage 2: SMP state probabilities

In this stage, the mean sojourn time $s_i$ and the steady-state probability $P_i$ are evaluated for each state i. The mean sojourn time $s_i$ is defined as [22]:

\[
s_i =
\begin{cases}
0 & \text{no state is reachable from state } i \text{ within the transition time } t, \\[4pt]
\displaystyle\int_0^{\infty} \bar{F}_{ij}(t)\,dt & \text{a single state } j \text{ is reachable from state } i \text{ within the transition time } t, \\[4pt]
\displaystyle\int_0^{\infty} \bar{F}_{ij}(t)\,\bar{F}_{ik}(t)\,\bar{F}_{im}(t)\,dt & \text{two or more states (say, } j, k, m\text{) are reachable from state } i \text{ during the transition time } t.
\end{cases}
\tag{9.6}
\]
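Stage 1 ends with the linear system of Eq. (9.5). A minimal sketch of solving it is given below, using plain Gaussian elimination with the last balance equation replaced by the normalization condition. The matrix used is the four-state Z of Illustration 1, filled with the values reported in Table 9.4; the solver itself is an illustrative choice and not the method prescribed by the chapter. A direct solve is used rather than repeated multiplication because this particular chain alternates between state 1 and the other states, so power iteration would not settle.

```python
def stationary_distribution(Z):
    """Solve H = H Z with sum(H) = 1 for a row-stochastic matrix Z."""
    n = len(Z)
    # Rows of A are the transposed balance equations (Z^T - I) H^T = 0 ...
    A = [[Z[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # ... with the last one replaced by the normalization sum(H) = 1.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    H = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][c] * H[c] for c in range(i + 1, n))
        H[i] = (b[i] - tail) / A[i][i]
    return H


# One-step transition matrix of Illustration 1 (values from Table 9.4).
Z = [
    [0.0, 0.010511, 0.003198, 0.986291],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
H = stationary_distribution(Z)
print([round(h, 6) for h in H])
```

The result can be compared with the EMC probabilities listed in Table 9.5.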
The steady-state probability $P_i$ of the ith state of the system is obtained from the EMC state probabilities and the mean sojourn times as per Eq. (9.7) [17]:

\[
P_i = \frac{h_i\, s_i}{\sum_{j \in U} h_j\, s_j}, \quad i \in U \tag{9.7}
\]

The steady-state availability is given by the following equation [2]:

\[
A_s = \sum_{j \in W} P_j \tag{9.8}
\]

where W is the set of functional states and $P_j$ is obtained using Eq. (9.7).
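Eqs. (9.7) and (9.8) amount to a sojourn-time-weighted renormalization of the EMC probabilities. The short sketch below applies them to the EMC probabilities and mean sojourn times reported for Illustration 1 in Tables 9.5 and 9.6, with state 1 as the only functional state; the function and variable names are illustrative.

```python
def smp_probabilities(h, s):
    """Eq. (9.7): P_i = h_i * s_i / sum_j (h_j * s_j)."""
    weights = [hi * si for hi, si in zip(h, s)]
    total = sum(weights)
    return [w / total for w in weights]


def steady_state_availability(P, functional_states):
    """Eq. (9.8): sum of P_j over the functional states (0-based indices)."""
    return sum(P[j] for j in functional_states)


h = [0.500000, 0.005256, 0.001599, 0.493145]     # EMC probabilities, Table 9.5
s = [238.488104, 9.526239, 12.241110, 1.622633]  # mean sojourn times, Table 9.6

P = smp_probabilities(h, s)
A_s = steady_state_availability(P, functional_states=[0])  # state 1 only
print([round(p, 6) for p in P], round(A_s, 6))
```

The resulting value should agree with the system availability of 0.992758 quoted for this example.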
6. Application of SMP for engineering systems

In this section, the following practical examples of mechanical systems are considered for system availability assessment.
• Pumping system under Preventive Maintenance (PM)
• Vertical milling center under Run-To-Failure Maintenance (RTFM)
• Pumping system under Condition Based Maintenance (CBM)
• Pumping system under Opportunistic Maintenance (OM)
Since maintenance plays an important role in enhancing system availability, these maintenance scenarios are illustrated with a view to increasing the system availability.

6.1 Illustration 1: pumping system under preventive maintenance

PM is planned, time-based maintenance that is regularly performed on a system to avoid catastrophic failure [24]. PM actions include adjustments, inspection, lubrication, and replacement of components. The interval between two preventive maintenance actions is crucial to the availability of the engineering system. A long PM interval may lead to random failures, which decrease system availability. On the other hand, a short PM interval leads to frequent shutdowns of the equipment and may cause maintenance-induced failures that increase the downtime and hence decrease system availability. Therefore, there should be an optimal PM interval that maximizes the system availability. This section demonstrates the selection of such an optimal PM interval.

6.1.1 System availability modeling and analysis

The selected system consists of two pumps in series configuration. The system schematic is given in Fig. 9.3. A pump is stopped at the time of PM, and, because of the series arrangement, PM on either pump results in shutdown of the whole system. In such a situation it is better to perform the maintenance on both pumps simultaneously. Industrial experts are
Figure 9.3 Schematic diagram: pumping system.
consulted for the failure and repair data of the pumps. For a particular value of the PM interval, the system steady-state availability is assessed using the method described in Section 5.1. The golden section search algorithm is employed to find the PM interval for optimal availability, treating the system availability as the objective function. For availability assessment, a state model is developed. In state-based models, the states are identified first. There are four possible states for the system, as shown in Fig. 9.4. In state 1, both pumps are functional and the system is working. In states 2 and 3, the first and the second pump have failed, respectively. Since this is a series system, states 2 and 3 are “Unavailable.” In state 4, preventive maintenance is carried out on both pumps at fixed time intervals [25]. The transitions among the states and their corresponding distributions and parameters are given in Table 9.1. Failure and repair times are assumed to follow Weibull and lognormal distributions, respectively. In this model, the PM interval (T1) is fixed.
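Figure 9.4 System model.

Table 9.1 Transition distributions.

| Transition | Distribution (parameters) |
|---|---|
| 1–2 | Weibull ($\beta_{12}$, $\theta_{12}$) |
| 1–3 | Weibull ($\beta_{13}$, $\theta_{13}$) |
| 1–4 | Deterministic or fixed interval ($T_1$) |
| 2–1 | Log-normal ($\mu_{21}$, $\sigma_{21}$) |
| 3–1 | Log-normal ($\mu_{31}$, $\sigma_{31}$) |
| 4–1 | Log-normal ($\mu_{41}$, $\sigma_{41}$) |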
The repair and failure time distribution parameters and the initial value of the PM interval (T1) are decided after discussion with plant experts. The Cumulative Distribution Function (CDF) expressions for the respective transition distributions are given in terms of the distribution parameters (see Table 9.2). The CDFs are then used to find the elements of the kernel using Eq. (9.3). For the system model of Fig. 9.4, the derived kernel matrix elements are listed in Table 9.3. The one-step transition probability elements are derived for the steady state using Eq. (9.4), and their values are evaluated (Table 9.4). Next, the steady-state probabilities of the EMC states and the mean sojourn times are calculated using Eq. (9.5) and Eq. (9.6), respectively (see Tables 9.5 and 9.6). The steady-state probabilities of the SMP states are calculated using Eq. (9.7), and the values obtained are given in Table 9.5. The system availability is assessed as the sum of the probabilities of the “Available” states using Eq. (9.8). For this system model, state 1 is the only “Available” state, so the system availability for the initial PM interval of T1 = 240 h is As = 0.992758. The same procedure is repeated to calculate the availability at different PM interval values. After discussion with domain experts, the search range of the PM interval for maximizing availability is fixed between 4500 and 5500 h. Using the golden section search algorithm, the optimal PM interval is obtained as 4979 h, with a maximum system availability of 0.9985.

6.2 Vertical milling center under run-to-failure maintenance

In the era of Industry 4.0, failures in a system are inevitable, be it a cyber-physical system or otherwise [26], and complex production systems are no exception. Reliable and maintainable production systems are conceived at the design stage [27,28].
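Table 9.2 Distribution parameters.

| Transition | Distribution | CDF expression | Parameter values |
|---|---|---|---|
| 1–2 | Weibull ($\beta_{12}$, $\theta_{12}$) | $1 - e^{-(t/\theta_{12})^{\beta_{12}}}$ | $\beta_{12} = 1.1$, $\theta_{12} = 15{,}000$ |
| 1–3 | Weibull ($\beta_{13}$, $\theta_{13}$) | $1 - e^{-(t/\theta_{13})^{\beta_{13}}}$ | $\beta_{13} = 1.5$, $\theta_{13} = 11{,}000$ |
| 1–4 | D($T_1$) | $U(t - T_1) = 0$ for $t < T_1$; $U(t - T_1) = 1$ for $t \ge T_1$ | $T_1 = 240$ (hours) |
| 2–1 | Log-normal ($\mu_{21}$, $\sigma_{21}$) | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(x) - \mu_{21}}{\sqrt{2}\,\sigma_{21}}\right)$ | $\mu_{21} = 2.08$, $\sigma_{21} = 0.59$ |
| 3–1 | Log-normal ($\mu_{31}$, $\sigma_{31}$) | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(x) - \mu_{31}}{\sqrt{2}\,\sigma_{31}}\right)$ | $\mu_{31} = 2.3$, $\sigma_{31} = 0.64$ |
| 4–1 | Log-normal ($\mu_{41}$, $\sigma_{41}$) | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(x) - \mu_{41}}{\sqrt{2}\,\sigma_{41}}\right)$ | $\mu_{41} = 0.4$, $\sigma_{41} = 0.41$ |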
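Table 9.3 Elements of Kernel matrix K(t).

| $k_{ij}$ | $k_{ij}(t)$ | Derived expressions of $k_{ij}(t)$ |
|---|---|---|
| $k_{12}$ | $\int_0^t \bar{F}_{14}\,\bar{F}_{13}\,dF_{12}$ | $\frac{\beta_{12}}{\theta_{12}^{\beta_{12}}} \int_0^t \bar{U}(x - T_1)\, x^{(\beta_{12}-1)}\, e^{-\left[\left(\frac{x}{\theta_{12}}\right)^{\beta_{12}} + \left(\frac{x}{\theta_{13}}\right)^{\beta_{13}}\right]} dx$ |
| $k_{13}$ | $\int_0^t \bar{F}_{12}\,\bar{F}_{14}\,dF_{13}$ | $\frac{\beta_{13}}{\theta_{13}^{\beta_{13}}} \int_0^t \bar{U}(x - T_1)\, x^{(\beta_{13}-1)}\, e^{-\left[\left(\frac{x}{\theta_{13}}\right)^{\beta_{13}} + \left(\frac{x}{\theta_{12}}\right)^{\beta_{12}}\right]} dx$ |
| $k_{14}$ | $\int_0^t \bar{F}_{12}\,\bar{F}_{13}\,dF_{14}$ | $\int_0^t e^{-\left[\left(\frac{x}{\theta_{12}}\right)^{\beta_{12}} + \left(\frac{x}{\theta_{13}}\right)^{\beta_{13}}\right]} d\big(U(x - T_1)\big)$ |
| $k_{21}$ | $F_{21}(t)$ | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(x) - \mu_{21}}{\sqrt{2}\,\sigma_{21}}\right)$ |
| $k_{31}$ | $F_{31}(t)$ | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(x) - \mu_{31}}{\sqrt{2}\,\sigma_{31}}\right)$ |
| $k_{41}$ | $F_{41}(t)$ | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(x) - \mu_{41}}{\sqrt{2}\,\sigma_{41}}\right)$ |

An overbar denotes the complement, e.g., $\bar{F}_{12} = 1 - F_{12}$ and $\bar{U} = 1 - U$.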
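Table 9.4 Elements of one step transition probability matrix, Z.

| $p_{ij}$ | Derived expressions for matrix Z elements, $p_{ij} = k_{ij}(\infty)$ | $p_{ij}$ values |
|---|---|---|
| $p_{12}$ | $\frac{\beta_{12}}{\theta_{12}^{\beta_{12}}} \int_0^{T_1} x^{(\beta_{12}-1)}\, e^{-\left[\left(\frac{x}{\theta_{12}}\right)^{\beta_{12}} + \left(\frac{x}{\theta_{13}}\right)^{\beta_{13}}\right]} dx$ | 0.010511 |
| $p_{13}$ | $\frac{\beta_{13}}{\theta_{13}^{\beta_{13}}} \int_0^{T_1} x^{(\beta_{13}-1)}\, e^{-\left[\left(\frac{x}{\theta_{13}}\right)^{\beta_{13}} + \left(\frac{x}{\theta_{12}}\right)^{\beta_{12}}\right]} dx$ | 0.003198 |
| $p_{14}$ | $1 - [k_{12}(\infty) + k_{13}(\infty)]$ | 0.986291 |
| $p_{21}$ | 1 | 1 |
| $p_{31}$ | 1 | 1 |
| $p_{41}$ | 1 | 1 |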
Table 9.5 Steady-state probability values: EMC/SMP states.

| $n_i$ | Probability values (EMC) | $p_i$ | Probability values (SMP) |
|---|---|---|---|
| $n_1$ | 0.500000 | $p_1$ | 0.992758 |
| $n_2$ | 0.005256 | $p_2$ | 0.000417 |
| $n_3$ | 0.001599 | $p_3$ | 0.000163 |
| $n_4$ | 0.493145 | $p_4$ | 0.006662 |
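Table 9.6 Mean sojourn time.

| $s_i$ | Definition | Derived expressions | Values |
|---|---|---|---|
| $s_1$ | $\int_0^{\infty} \bar{F}_{12}\,\bar{F}_{13}\,\bar{F}_{14}\,dx$ | $\int_0^{T_1} e^{-\left[\left(\frac{x}{\theta_{12}}\right)^{\beta_{12}} + \left(\frac{x}{\theta_{13}}\right)^{\beta_{13}}\right]} dx$ | 238.488104 |
| $s_2$ | $\int_0^{\infty} \bar{F}_{21}\,dx$ | $e^{\mu_{21} + \frac{1}{2}\sigma_{21}^2}$ | 9.526239 |
| $s_3$ | $\int_0^{\infty} \bar{F}_{31}\,dx$ | $e^{\mu_{31} + \frac{1}{2}\sigma_{31}^2}$ | 12.241110 |
| $s_4$ | $\int_0^{\infty} \bar{F}_{41}\,dx$ | $e^{\mu_{41} + \frac{1}{2}\sigma_{41}^2}$ | 1.622633 |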
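The Illustration 1 calculation summarized in Tables 9.2 to 9.6 can be sketched end to end as follows. This is a minimal sketch and not the authors' implementation: it takes the parameter values from Table 9.2, evaluates the integrals of Tables 9.4 and 9.6 with a composite Simpson rule (an assumption, since the chapter does not state how they were integrated), and uses the fact that states 2, 3, and 4 all return to state 1 with probability 1 to write the EMC probabilities directly as 1/2 and p1j/2. The golden section routine and all function names are illustrative.

```python
import math

B12, TH12 = 1.1, 15000.0   # Weibull shape/scale, transition 1-2 (Table 9.2)
B13, TH13 = 1.5, 11000.0   # Weibull shape/scale, transition 1-3 (Table 9.2)
LOGNORMAL = {2: (2.08, 0.59), 3: (2.3, 0.64), 4: (0.4, 0.41)}  # (mu, sigma)


def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def survival(x):
    """Probability that neither pump has failed by time x."""
    return math.exp(-((x / TH12) ** B12 + (x / TH13) ** B13))


def hazard(x, beta, theta):
    return (beta / theta) * (x / theta) ** (beta - 1.0)


def availability(T1):
    """Steady-state availability for PM interval T1 via Eqs. (9.3)-(9.8)."""
    p12 = simpson(lambda x: hazard(x, B12, TH12) * survival(x), 0.0, T1)
    p13 = simpson(lambda x: hazard(x, B13, TH13) * survival(x), 0.0, T1)
    p14 = 1.0 - p12 - p13
    s = {j: math.exp(mu + 0.5 * sigma ** 2) for j, (mu, sigma) in LOGNORMAL.items()}
    s[1] = simpson(survival, 0.0, T1)
    h = {1: 0.5, 2: 0.5 * p12, 3: 0.5 * p13, 4: 0.5 * p14}
    return h[1] * s[1] / sum(h[i] * s[i] for i in h)


def golden_section_max(func, lo, hi, tol=1.0):
    """Golden section search for the maximizer of func on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while (b - a) > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


print(f"A(T1 = 240 h) = {availability(240.0):.6f}")
T_opt = golden_section_max(availability, 4500.0, 5500.0)
print(f"Best T1 in [4500, 5500] h: about {T_opt:.0f} h, A = {availability(T_opt):.4f}")
```

The availability curve is quite flat near its maximum, so the interval returned by this sketch depends on the integration accuracy and stopping tolerance and need not coincide exactly with the 4979 h reported in the text, although the availability value should be close to the reported 0.9985.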
However, availability, which is a function of reliability and maintainability, becomes a key measure of operational effectiveness and efficiency once the system is in operation. A failed production system needs to be repaired soon after it fails to ensure uninterrupted production flow. Although planned maintenance helps to improve production machine availability, run-to-failure maintenance (RTFM), or breakdown maintenance, is unavoidable on the shop floor. Availability modeling of complex production systems under RTFM is therefore a subject of interest. Numerous published articles state that complex systems such as the Vertical Milling Center (VMC) exhibit decreasing or increasing failure and repair rates [29,30,32], which cannot be modeled using the conventional Markov method. To address this issue, the availability analysis of the VMC is carried out by employing the semi-Markov method. The VMC selected for the case study is used for machining engine cylinder blocks. To assess availability, the complex VMC is decomposed into its elements, namely subsystems, assemblies, and components. The states (failed, operating, etc.) are then identified for each element and used to develop semi-Markov models for the VMC. These models can be solved either analytically or with simulation tools to calculate the steady-state probabilities, from which the availability of the VMC is determined. The VMC consists of numerous elements, each of which resides in one of several states, such as operational, nonfunctional, or standby. State changes, say from operational to nonfunctional or vice versa, take place at varying transition rates.
The semi-Markov process can efficiently represent these changes in the system and is therefore suitable for the availability assessment of the VMC. The analysis is restricted to the mechanical elements of the selected VMC.

6.2.1 System description

The selected VMC is a twin-spindle machine that comprises various subunits. The components of the mechanical section include ball screws (2 nos), axis drives (X, Y, and Z axes), spindle assemblies (2 nos), tool magazines, and gripper arms (see Fig. 9.5). The
Figure 9.5 Vertical machining center.
mechanical subsystem is actuated and controlled by electrical and CNC control systems. Two separate spindle motors drive spindle assemblies 1 and 2. The X- and Y-axis drives move the work table in two coordinates, whereas the Z-axis drive provides the upward and downward movement of the spindles. A tool magazine holds the tools that are preselected for the subsequent machining operations. A gripper arm transfers the tools by picking them from the tool magazine and returning them to it. The components of the VMC are operated continuously at high speed due to production constraints. Hence, they are subjected to heavy loads and stresses, which eventually lead to an increased failure rate even before they reach the end of their design life. This also calls for more repair actions. Operational availability assessment is therefore a key issue for such production machines. The machine normally operates in load-sharing mode, meaning that the machining load is distributed between two spindles working simultaneously to meet high production requirements. However, this may vary with production demand: the production engineer may put one spindle in standby mode and the other in functional mode if the production is halved. The spindles therefore operate in one of two modes, load sharing or standby. In this chapter, the availability of the subsystem is evaluated considering these two modes of operation.

6.2.2 Illustration

Fig. 9.6 represents the mechanical subsystem of the VMC. Based on the procedure described in Section 5, the semi-Markov model shown in Fig. 9.7 is created for the mechanical subsystem of the VMC. It has three operational states (1, 2, and 3) and four failed states (4, 5, 6, and 7). In Fig. 9.7, the nodes represent states and the arrows represent the
Figure 9.6 Mechanical subsystem of VMC.
Figure 9.7 Semi-Markov model of mechanical subsystem of VMC.
transition. In this example, the failure time is assumed to conform to the Weibull distribution and the repair time to the lognormal distribution. The failure and repair times of the VMC were collected from the ERP (Enterprise Resource Planning) system of a car production plant. However, the data were found to be incomplete. Therefore, the missing values were suitably assumed, keeping in mind the probability of failure and repair. The assumed failure and repair times are used to calculate the distribution parameters, i.e., the shape and scale parameters (β, θ) of the Weibull distribution and the mean and standard deviation (μ, σ) of the lognormal distribution. The well-established rank regression method is used for this parameter estimation. After evaluating the parameters of the selected distributions (i.e., Weibull and lognormal), the steady-state availability of the mechanical subsystem of the machine in load-sharing mode is determined. The model has three operating states (1, 2, and 3) for the mechanical subsystem, and hence the equation for estimating the steady-state availability of the mechanical subsystem is given as,
\[
A_{\mathrm{MSVMC}} = P_1 + P_2 + P_3 \tag{9.9}
\]
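The rank regression step used for the Weibull parameters in this illustration can be sketched as follows. This is a minimal version under stated assumptions: median ranks are approximated with Bernard's formula, the regression is ordinary least squares of ln(-ln(1 - F)) on ln(t), the sample is complete (no censoring), and the failure times in the example are hypothetical because the chapter does not list the VMC data. The function name is illustrative.

```python
import math


def weibull_rank_regression(times):
    """Estimate Weibull shape (beta) and scale (theta) from complete
    failure-time data using median rank regression."""
    t_sorted = sorted(times)
    n = len(t_sorted)
    xs, ys = [], []
    for i, t in enumerate(t_sorted, start=1):
        median_rank = (i - 0.3) / (n + 0.4)          # Bernard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                                  # slope of the fitted line
    intercept = y_mean - beta * x_mean                # equals -beta * ln(theta)
    theta = math.exp(-intercept / beta)
    return beta, theta


# Hypothetical times to failure in hours (not data from the chapter).
failure_times = [410.0, 890.0, 1300.0, 1750.0, 2400.0, 3100.0]
beta_hat, theta_hat = weibull_rank_regression(failure_times)
print(f"shape = {beta_hat:.3f}, scale = {theta_hat:.1f} h")
```

The same linearization idea applies to the lognormal repair times, with ln(t) regressed against standard normal quantiles of the median ranks.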
By substituting into Eq. (9.9) the steady-state probabilities of the operating states, evaluated as per the steps discussed in Section 5.1, the steady-state availability of the mechanical subsystem of the VMC in load-sharing mode is evaluated as 0.979. The steady-state availability of the mechanical subsystem in standby mode is evaluated along similar lines.

6.3 Pumping system under condition-based maintenance

A condition-based maintenance (CBM) program recommends maintenance decisions based on the health of the system, as assessed from data collected through condition monitoring. It consists of three steps: data acquisition, data processing, and maintenance decision-making. Diagnostics and prognostics are two important features of a CBM program. CBM is performed after one or more condition monitoring parameters indicate that the equipment is going to fail or that its performance is deteriorating. An optimal condition monitoring interval is decided by maximizing the system availability. This case is illustrated for a centrifugal pump.

6.3.1 System modeling with condition-based maintenance

A centrifugal pump is continuously monitored for its health using various health parameters. Condition monitoring (CM) is applied to detect degradation or failure of the pump so that a maintenance action can be taken in time to avoid failure of the pump and increase its availability. To develop the pump availability model, degradation, random failures, condition monitoring, and repair of the pump are considered. There are a total of 13 states in the model [31]. For degradation, it is assumed that there is one original operating state, i.e., the “as good as new” state (D1), and two degraded states (D2 and D3). It is expected that CM will not allow the pump to fail. However, the pump can still fail randomly, due to either inherent design problems or common causes such as voltage fluctuation, high voltage, etc. Therefore, it may enter a random failure state (Fr). 
It is assumed that CM is performed at scheduled periodic intervals of T1 for state D1, T2 for state D2, and T3 for state D3. These three CM states are denoted Dx1, Dx2, and Dx3. Maintenance activities are performed depending on the condition of the pump. Minor repair is considered at each stage of degradation, with the three minor repair states being mi1, mi2, and mi3. Major repair states, ma2 and ma3, are considered at the second and third stages of degradation. An imperfect repair state, im13, is envisaged at stage 3 of degradation. For the 13 states in the model, the possible transitions and their distribution parameters are selected after discussion with domain experts. Weibull and exponential distributions are assumed for degradation and random failures. CM is performed at fixed intervals, and the interval is decided depending on the degradation stage of the pump. The CM checks take less time for the initial degradation states, while
they take more time for the later states, as the later or higher degradation states may require monitoring of additional CM parameters. The time to perform CM at a degraded state is assumed to follow an exponential distribution. As CM time increases with degradation, the exponential distribution parameter value is expected to decrease for the subsequent degraded states. It is suggested that the parameter value be decided based on experience in the plant and in consultation with the vendor or manufacturer. The lognormal distribution, being more appropriate for the repair of mechanical systems, is employed in the model; its parameter values are likewise decided based on plant experience and in consultation with the vendor. The pump availability model developed is shown in Fig. 9.8.

6.3.2 Analytical solution

Following the two-stage analytical procedure detailed in Section 5.1, the pump SMP model with CBM is solved by taking three CM intervals, T1 = 240 h, T2 = 168 h, and T3 = 96 h, corresponding to system states D1, D2, and D3, respectively. The system availability value is obtained as 0.99714. Further, the optimum CM intervals that maximize the system availability are obtained by a Genetic Algorithm approach. The results obtained are T1 = 9650 h, T2 = 9600 h, and T3 = 9520 h, and the maximum availability achieved is A = 0.999446816 [31].
Figure 9.8 System SMP model with CBM.
6.4 Pumping system under opportunistic maintenance

This section illustrates the role of opportunistic maintenance (OM) in increasing system availability. In OM, maintenance that is either due or previously deferred is carried out on one or more subsystems while a repair is being performed on a failed subsystem or while the system is shut down for some other reason [33]. This strategy increases the system availability because it either avoids subsystem failures or reduces their number. The strategy is widely applied in industry. However, the literature on availability modeling with opportunistic maintenance is limited.

6.4.1 System modeling with opportunistic maintenance

For this case study, a system with a pump and a transmission subassembly in series is considered. The system is used round the clock in a centralized air-conditioning unit that supports emergency services in a health center. The OM strategy is followed to enhance the availability of the system. For system modeling, multistate degradation and corrective maintenance combined with opportunistic maintenance are considered. When any one of the subsystems degrades to an unacceptable level, planned maintenance is undertaken. Under planned maintenance, the repair can be perfect, imperfect, or minimal, depending on the level of restoration of the subsystem. As long as that subsystem remains under planned maintenance, the other subsystems that are operable but partially degraded can also be maintained, provided that this maintenance task is finished before the planned maintenance is completed. In the system model development for each subsystem, state D1 is considered as the original operating state, i.e., the “as good as new” state, and the other states Dj as degraded states. In the proposed OM model, there are three levels of degraded states, i.e., D2, D3, and D4. 
A maintenance action, i.e., perfect, imperfect, or minimal repair is taken, which restores the subsystem from its degraded state D4 to the previous states i.e., D1, D2, or D3. The states D1, D2, and D3 of the subsystem are considered as working states; while D4 is the “repair state” due to performance below the unacceptable level. Let a system be considered with its two subsystems in series, with each subsystem having one original operating and three states of degradation. It is assumed that both the subsystems cannot attain an unacceptable level at the same time, i.e., state D4. In total there are 15 states feasible states out of 16 possible states for the system (Refer Table 9.7). For the degradation and failures, Weibull distribution is employed while for repairs, lognormal distribution is assumed. The distribution parameters are selected after discussion with plant experts. The rationale to decide for planned or/and opportunistic maintenance is explained here. Table 9.7 shows system states for two subsystems “1” and “2” in series, each
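Table 9.7 Feasible states for two subsystems in series.

| S. No. (#1) | Status (#2) | State of subsystem “1” (#3) | State of subsystem “2” (#4) | Planned maintenance (#5) | Opportunistic maintenance (#6) |
|---|---|---|---|---|---|
| 1 | Working | D1 | D1 | – | – |
| 2 | Working | D2 | D1 | – | – |
| 3 | Working | D3 | D1 | – | – |
| 4 | Under repair | D4 | D1 | Yes (subsystem 1) | No |
| 5 | Working | D1 | D2 | – | – |
| 6 | Working | D2 | D2 | – | – |
| 7 | Working | D3 | D2 | – | – |
| 8 | Under repair | D4 | D2 | Yes (subsystem 1) | Yes (subsystem 2) |
| 9 | Working | D1 | D3 | – | – |
| 10 | Working | D2 | D3 | – | – |
| 11 | Working | D3 | D3 | – | – |
| 12 | Under repair | D4 | D3 | Yes (subsystem 1) | Yes (subsystem 2) |
| 13 | Under repair | D1 | D4 | Yes (subsystem 2) | No |
| 14 | Under repair | D2 | D4 | Yes (subsystem 2) | Yes (subsystem 1) |
| 15 | Under repair | D3 | D4 | Yes (subsystem 2) | Yes (subsystem 1) |

Columns #1 to #4 describe the system state; columns #5 and #6 give the maintenance possibility.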
with three levels of degradation (D2, D3, and D4) and D1 as the original state. The last two columns of the table give the possibility of planned maintenance and/or opportunistic maintenance for subsystems “1” and “2.” This is decided only for the system states whose status (Column 2) is “Under repair,” i.e., S. Nos. 4, 8, and 12 to 15. The logic behind each “yes” or “no” decision is illustrated with the first “Under repair” state in Table 9.7, i.e., S. No. 4 (Column 1). In this state, subsystems “1” and “2” (Columns 3 and 4) are in D4 and D1, respectively. Subsystem “1” is in D4, the most degraded state, so planned maintenance is the best choice: its maintenance task needs considerable time, and the resources required are also expected to be on the higher side. The decision for planned maintenance of subsystem “1” is therefore “yes.” With subsystem “1” under maintenance, one must check whether opportunistic maintenance of the other subsystem, i.e., subsystem “2,” can be undertaken. For this, one needs to check its
state, which is D1, i.e., the original operating state, so it does not need maintenance. Hence, the decision for opportunistic maintenance (Column 6) is “no.” The “yes” and “no” decisions for the other “Under repair” system states, 8 and 12–15, are made in the same way. Kumar et al. [30] developed six models for the system: three employ planned maintenance (perfect, minimal, or imperfect repair) together with opportunistic maintenance, while the other three employ planned maintenance without opportunistic maintenance.

6.4.2 System model: planned perfect repair with opportunistic maintenance

In the model of Fig. 9.9, OM is considered along with planned perfect repair. For a system with two subsystems in series, 6 of the 15 feasible system states are “under repair” states and the remaining 9 are “working” states. OM is not possible in states 4 and 13 because the other subsystem is still in good condition (state D1). In Fig. 9.9, the planned perfect repair for these states is represented by edges 4→1 and 13→1 for subsystems “1” and “2,” respectively, which restore the subsystem from state D4 to D1. For states 8, 12, 14, and 15, OM is possible because the other subsystem is partially degraded (D2 or D3). Opportunistic maintenance with planned perfect repair is shown by edges 8–1, 12–1, 14–1, and 15–1 (see Fig. 9.9). These four transitions represent planned perfect repair, which restores the subsystem under repair from state D4 to D1. The edges
Figure 9.9 System SMP model: planned perfect repair with opportunistic maintenance.
Table 9.8 Steady-state availability – OM model (analytical results).

| Repair level | Planned maintenance | Planned maintenance with opportunistic maintenance |
|---|---|---|
| Minimal | 0.99875 | 0.99917 |
| Imperfect | 0.99930 | 0.99946 |
| Perfect | 0.99938 | 0.99953 |
12–1 and 15–1 depict the restoration of the subsystem undergoing OM from state D3 to D1, and the edges 8–1 and 14–1 show its restoration from state D2 to D1. The developed system model, which considers subsystem degradation and planned perfect repair with OM, is shown in Fig. 9.9. The pump SMP model with planned perfect repair and OM is solved by following the two-stage analytical procedure detailed earlier in the chapter, and the system availability obtained is 0.99953. The other five models are developed along similar lines and solved using the same two-stage analytical method. The results for the six models are shown in Table 9.8. They show that the steady-state availability of the pump and transmission subassembly of the air-conditioning system increases when planned and opportunistic maintenance are performed together, at all three repair levels (minimal, imperfect, and perfect); for perfect repair, for example, it rises from 0.99938 to 0.99953. The SMP model thus quantifies the increase in availability obtained when a system under planned maintenance alone is shifted to combined planned and opportunistic maintenance.
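The state space of Table 9.7 and the planned/opportunistic decision rule described in Section 6.4.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `feasible_states` and `maintenance_decision` are hypothetical, and the rule encoded here is simply the one read off Table 9.7 (the subsystem in D4 receives planned maintenance; the other subsystem receives OM only if it is in D2 or D3). It does not solve the semi-Markov model itself.

```python
from itertools import product

LEVELS = ("D1", "D2", "D3", "D4")  # D1: as good as new; D4: repair state


def feasible_states():
    """Return the feasible (subsystem 1, subsystem 2) states in Table 9.7 order."""
    states = []
    # Subsystem 2 is the slow index and subsystem 1 the fast one, as in Table 9.7.
    for s2, s1 in product(LEVELS, repeat=2):
        if s1 == "D4" and s2 == "D4":
            continue  # modeling assumption: both subsystems are never in D4 at once
        states.append((s1, s2))
    return states


def maintenance_decision(s1, s2):
    """Return (status, planned, opportunistic) for one system state."""
    if "D4" not in (s1, s2):
        return ("Working", "-", "-")
    # Exactly one subsystem is in D4; it receives planned maintenance.
    failed, other, other_level = (1, 2, s2) if s1 == "D4" else (2, 1, s1)
    planned = f"Yes (subsystem {failed})"
    # The other subsystem gets OM only if it is partially degraded (D2 or D3).
    if other_level in ("D2", "D3"):
        opportunistic = f"Yes (subsystem {other})"
    else:
        opportunistic = "No"
    return ("Under repair", planned, opportunistic)


if __name__ == "__main__":
    for number, (s1, s2) in enumerate(feasible_states(), start=1):
        print(number, s1, s2, *maintenance_decision(s1, s2))
```

Running the script prints one row per system state, which can be compared line by line with Table 9.7.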
7. Conclusion

This chapter demonstrates the extensive applications of semi-Markov models for steady-state availability analysis of repairable mechanical systems. Various maintenance scenarios are considered in the analysis to illustrate that there is a significant improvement in the system availability. The steady-state availability of a serially connected two-pump system under fixed and optimal PM intervals of 4979 h has been evaluated as 0.9985. The mechanical subsystem of the VMC, which is run to failure, shows a steady-state availability of 0.979 in load-sharing mode. A centrifugal pump that undergoes degradation, random failures, condition monitoring, and repair actions yields a steady-state availability of 0.99714, and with the optimal CM interval the availability achieved is 0.999446816. The serially connected pump and transmission subassembly of a central air-conditioning unit shows, at each repair level, a higher availability with combined planned and opportunistic maintenance than with planned maintenance alone. In all examples, failures and repairs follow Weibull and lognormal distributions, respectively, which represent the behavior of real systems that often create uncertainties
about their states and their state transitions; these can be modeled effectively for steady-state availability using the semi-Markov method. The application of semi-Markov models can be extended to any complex engineering system that is critical to production industries, where steady-state availability remains the key performance measure for monitoring and enhancing plant productivity.
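To put the availability values quoted above into perspective, a steady-state availability A can be converted into an expected downtime of (1 − A) × T over an operating horizon T. The short Python sketch below does this for the figures reported in this chapter. The helper name `expected_downtime_hours` and the 8760 h horizon (round-the-clock operation over a non-leap year) are assumptions made for illustration only; they are not part of the case studies.

```python
HOURS_PER_YEAR = 8760  # assumption: continuous operation, non-leap year


def expected_downtime_hours(availability, horizon_hours=HOURS_PER_YEAR):
    """Expected downtime over the horizon for a given steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * horizon_hours


# Steady-state availabilities as reported in the chapter.
reported = {
    "Two-pump system, PM": 0.9985,
    "VMC mechanical subsystem, load sharing": 0.979,
    "Centrifugal pump, CBM": 0.99714,
    "Centrifugal pump, CBM, optimal CM interval": 0.999446816,
}

# Table 9.8: (planned maintenance only, planned with opportunistic maintenance).
table_9_8 = {
    "Minimal": (0.99875, 0.99917),
    "Imperfect": (0.99930, 0.99946),
    "Perfect": (0.99938, 0.99953),
}

if __name__ == "__main__":
    for name, a in reported.items():
        print(f"{name}: {expected_downtime_hours(a):.2f} h/year")
    for level, (pm_only, pm_with_om) in table_9_8.items():
        saved = expected_downtime_hours(pm_only) - expected_downtime_hours(pm_with_om)
        print(f"{level} repair: OM saves about {saved:.2f} h/year")
```

Seen this way, availability differences in the fourth decimal place correspond to hours of downtime per year, which is why they matter for a system that supports emergency services.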
References

[1] V. Narula, G. Kumar, M.K. Loganathan, Improving the productivity of the machining process of a manufacturing company: a six sigma case study, Interdisciplinary Research in Technology and Management (2021) 310–314.
[2] C.E. Ebeling, An Introduction to Reliability and Maintainability Engineering, Tata McGraw-Hill Publishing Company Limited, New Delhi, India, 2010.
[3] M.K. Loganathan, M.S. Gandhi, O.P. Gandhi, Functional cause analysis of complex manufacturing systems using structure, Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 229 (3) (2015) 533–545.
[4] T. Markeset, U. Kumar, Design and development of product support and maintenance concepts for industrial systems, Journal of Quality in Maintenance Engineering 9 (4) (2003) 376–392.
[5] A. Zutshi, A. Sohal, Adoption and maintenance of environmental management systems, Management of Environmental Quality: An International Journal 15 (4) (2004) 399–419.
[6] J.K. Muppala, M. Malhotra, K.S. Trivedi, Markov dependability models of complex systems: analysis techniques, in: S. Özekici (Ed.), Reliability and Maintenance of Complex Systems. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 154, Springer, Berlin, Heidelberg, 1996, https://doi.org/10.1007/978-3-662-03274-9_24.
[7] K. Trivedi, A. Bobbio, Reliability and Availability Engineering: Modeling, Analysis, and Applications, Cambridge University Press, Cambridge, 2017, https://doi.org/10.1017/978131616304.
[8] H. Garg, An approach for analyzing the reliability of industrial system using fuzzy Kolmogorov’s differential equations, Arabian Journal for Science and Engineering 40 (2015) 975–987, https://doi.org/10.1007/s13369-015-1584-2.
[9] D. Feng, H. Shuai, Multi-state reliability analysis of rotor system using Semi-Markov model and UGF, Journal of Vibroengineering 20 (5) (2018) 2060–2072, https://doi.org/10.21595/jve.2018.19292.
[10] H. Garg, Performance analysis of an industrial system using soft computing based hybridized technique, Journal of the Brazilian Society of Mechanical Sciences and Engineering 39 (2017) 1441–1451, https://doi.org/10.1007/s40430-016-0552-4.
[11] M. Ram, S. Tyagi, A. Kumar, N. Goyal, Analysis of signature reliability of ring-shaped network system, International Journal of Quality & Reliability Management (2021), https://doi.org/10.1108/IJQRM-04-2021-0113. Vol. ahead-of-print No. ahead-of-print.
[12] R. Dekker, W. Groenendijk, Availability assessment methods and their application in practice, Microelectronics Reliability 35 (Nos 9–10) (1995) 1257–1274.
[13] O. Ivanchenko, V. Kharchenko, B. Moroz, L. Kabak, I. Blindyuk, K. Smoktii, Semi-Markov availability model for infrastructure as a service cloud considering energy performance, in: V. Kharchenko, Y. Kondratenko, J. Kacprzyk (Eds.), Green IT Engineering: Social, Business and Industrial Applications. Studies in Systems, Decision and Control, vol. 171, Springer, Cham, 2019, https://doi.org/10.1007/978-3-030-00253-4_7.
[14] A. Veeramany, M.D. Pandey, Reliability analysis of nuclear piping system using semi-Markov process model, Annals of Nuclear Energy 38 (5) (2011) 1133–1139, https://doi.org/10.1016/j.anucene.2010.12.012.
[15] X. Wu, P. Shi, Y. Tang, S. Mao, F. Qian, Stability analysis of semi-Markov jump stochastic nonlinear systems, in: IEEE Transactions on Automatic Control, 2021, https://doi.org/10.1109/TAC.2021.3071650.
[16] J. Yu, Q. Li, W. Xing, X. Yuan, Y. Shi, Event-based consensus tracking for nonlinear multi-agent systems under semi-Markov jump topology, in: IEEE Access, vol. 9, 2021, pp. 135868–135878, https://doi.org/10.1109/ACCESS.2021.3116253.
[17] V.G. Kulkarni, Modeling and Analysis of Stochastic Systems, Chapman and Hall, London, UK, 2009.
[18] A. Dennis, Application of MCS system reliability analysis, in: Proceedings of TAMU, 20th International Pump Users Symposium, Department of Mechanical Engineering, Turbomachinery Laboratory, Texas A & M University, College Station, Houston, Texas, USA, March 17–20, 2003, pp. 91–94.
[19] P. Jager, B. Bertsche, A new approach to gathering failure behaviour information about mechanical components based on expert knowledge, in: Proceedings of Annual Reliability and Maintainability Symposium, Los Angeles, USA, January 26–29, IEEE, 2004, pp. 90–95.
[20] R. Denning, A. Wood, “Applied R&M Manual for Defence Systems”, Part-D Supporting Theory, Chapter 4- Monte Carlo Simulation, 2012, pp. 1–8. GR-77 Issue 2012.
[21] M.S. Rao, V.N.A. Naikan, A Markov system dynamics based availability and reliability analysis of a process industry, Communications in Dependability and Quality Management 12 (3) (2009) 29–48.
[22] V.P. Koutras, A.N. Platis, Semi-Markov availability modeling of a redundant system with partial and full rejuvenation actions, in: Proceedings of 3rd International Conference on Dependability of Computer Systems, Szklarska Poreba, Poland, June 26–28, IEEE, 2008, pp. 127–134.
[23] C.L. Tomasevicz, S. Asgarpoor, Preventive maintenance using continuous-time semi-Markov processes, in: Proceedings of the 38th North American Power Symposium, Carbondale, Illinois, USA, September 17–19, IEEE, 2006, pp. 3–8.
[24] S.H. Sim, J. Endrenyi, Optimal preventive maintenance with repair, Journal of IEEE Transactions on Reliability 37 (1) (1988) 92–96.
[25] G. Kumar, P.V. Joel, Optimum preventive maintenance policy for a mechanical system using Semi-Markov approach and Golden section search technique, in: 2018 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Bangkok, Thailand, Dec. 16–19, 2018, 2019, pp. 232–236.
[26] M.K. Loganathan, P. Goswami, B. Bhagawati, Failure evaluation and analysis of mechatronics-based production systems during design stage using structural modeling, Applied Mechanics and Materials 852 (2016) 799–805. 10.4028/www.scientific.net/AMM.852.799.
[27] M.K. Loganathan, M.S. Gandhi, O.P. Gandhi, Functional cause analysis of complex manufacturing systems using structure, in: Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, vol. 229, 2015, pp. 533–545, 3.
[28] M.K. Loganathan, O.P. Gandhi, Reliability evaluation and analysis of CNC camshaft grinding machine, Journal of Engineering, Design and Technology 13 (1) (2015) 37–73, https://doi.org/10.1108/JEDT-10-2012-0042.
[29] G. Kumar, V. Jain, O.P. Gandhi, Availability analysis of repairable mechanical systems using analytical Semi-Markov approach, Quality Engineering 25 (2) (2013) 97–107.
[30] G. Kumar, V. Jain, O.P. Gandhi, Steady-state availability analysis of repairable mechanical systems with opportunistic maintenance by using Semi-Markov Process, International Journal of System Assurance and Management 5 (4) (2014) 664–678.
[31] G. Kumar, V. Jain, O.P. Gandhi, Availability analysis of mechanical systems with condition-based maintenance using semi-Markov and evaluation of optimal condition monitoring interval, Journal of Industrial Engineering International 14 (1) (2018) 119–131.
[32] M.K. Loganathan, G. Kumar, O.P. Gandhi, Availability evaluation of manufacturing systems Semi-Markov model, International Journal of Computer Integrated Manufacturing 29 (7) (2016) 720–735, https://doi.org/10.1080/0951192X.2015.1068454, 107.
[33] T. Bedford, B.M. Alkali, Competing risks and opportunistic informative maintenance, in: Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, vol. 223, 2009, pp. 363–372, 4.
CHAPTER 10

An α-cut interval-based similarity aggregation method to evaluate fault tree events for system safety under fuzzy environment

Mohit Kumar
Department of Basic Sciences, Institute of Infrastructure Technology Research and Management, Ahmedabad, India
Engineering Reliability and Risk Assessment. ISBN 978-0-323-91943-2, https://doi.org/10.1016/B978-0-323-91943-2.00002-2
© 2023 Elsevier Inc. All rights reserved.

1. Introduction

Fault tree analysis (FTA) is a logical risk assessment method that is widely used to determine the possible causes and the occurrence probability of an unexpected event, called the top event (system failure) [1,2]. The top event is positioned at the top of the tree and is followed by primary binary events. The causes of each event are subdivided further and, on the basis of this assessment, the basic root causes (basic events) are identified. The basic events (BEs) in FTA are statistically independent and are connected with each other by Boolean logic gates. System safety assessment involves both qualitative and quantitative evaluation of the system fault tree. In qualitative FTA, the BEs and their logical relationships are determined to find a logical expression for the top event. Quantitative FTA is then carried out to estimate the probability of the top event from the BE probabilities. Therefore, precise BE probabilities result in a precise estimate of the top event probability in quantitative FTA [3].

Because of insufficient knowledge and changing operating environments, collecting precise quantitative failure data for system components is very challenging, and for some new components such data may not be available at all. To overcome these difficulties and limitations, fuzzy set theory [4] has often been employed in FTA. Mahmood et al. [5] presented a literature study on fuzzy fault tree analysis (FFTA), which reveals the strengths and weaknesses of FFTA approaches along with their applications. An FFTA model considers the past failure experience of its BEs for the risk and safety analysis of a system. In the absence of precise quantitative failure data for BEs, qualitative data such as expert opinions can be used to evaluate the BE probabilities. Several investigators have successfully applied qualitative FFTA in different fields [6–9]. Furthermore, it has been confirmed that experts are more comfortable providing the failure possibilities of components in terms of linguistic expressions [10].

A qualitative FFTA model for fire and explosion of crude oil tanks was presented by Wang et al. [11]. Purba et al. [12] utilized qualitative failure data to predict the component failure probabilities of a nuclear power plant. Rajakarunakaran et al. [13] generated the failure probabilities of the components of an LPG refueling facility by using FFTA methodology and expert judgments. Mohsendokht [14] also employed an expert elicitation approach with FFTA to assess the risk of uranium hexafluoride release from a uranium conversion facility. Cheliyan and Bhattacharyya [15] developed an FFTA of oil and gas leakage in a subsea production system using an expert elicitation method. Hu et al. [16] proposed an FFTA of the failure of an above-ground walled storage system by using qualitative knowledge collected through expert elicitation and estimated the occurrence possibilities of the fault tree BEs. Several other research studies [17–22] have shown the importance of qualitative data-based FFTA for assessing system safety and reliability in the absence of quantitative failure data for components.

It can be seen that fuzzy set theory–based methodologies have been employed very effectively to assess the failure probabilities of system components by processing the subjective judgments of experts. However, subjective judgments may introduce error, which can be reduced by assigning weights to experts based on their profession, working experience, and knowledge level. The subjective judgments of different experts are aggregated to obtain a single opinion about the BE possibility. Many techniques are available in the literature for aggregating the distinct linguistic opinions of experts. Among them, only the similarity aggregation method (SAM) considers both the expert’s importance and a consensus index to aggregate the opinions of experts effectively. Many researchers have utilized SAM to determine risk factors and to estimate the failure probabilities of systems through FFTA. Unver et al. [23] discussed a case study of crankcase explosion in a two-stroke marine diesel engine by using the concept of SAM in FFTA and determined all root causes of the crankcase explosion. Yin et al. [24] presented a SAM-based FFTA for the safety analysis of a natural gas storage tank. Kumar et al. [25] utilized SAM in intuitionistic fuzzy FTA to evaluate the system failure probability and performed a case study of an oil tank fire and explosion accident to illustrate the applicability of the developed approach. Guo et al. [26] proposed an improved SAM-based FBN model for the risk analysis of a storage tank accident and obtained more reliable results. Similarly, Kuzu et al. [27] investigated the risk of manifold leakage of ammonia by employing SAM-based system FFTA.

The objective of this chapter is to evaluate the fault tree BEs by employing a novel similarity aggregation method. Once the BE probabilities have been evaluated, the top event probability can be evaluated.
2. Preliminaries of fuzzy set theory

Some basic definitions and concepts related to the present study are introduced in this section.

2.1 Fuzzy set [28]

The concept of a fuzzy set was introduced by Zadeh [4] as an extension of the classical set. A fuzzy set $\tilde{A}$ on a universe of discourse $X$ is defined as

$$\tilde{A} = \left\{ \left( x, \mu_{\tilde{A}}(x) \right) : x \in X \right\} \tag{10.1}$$

where the function $\mu_{\tilde{A}} : X \to [0, 1]$ is called the membership function. The value $\mu_{\tilde{A}}(x)$ denotes the membership grade to which element $x$ belongs to the fuzzy set $\tilde{A}$ and is associated with each element $x$ of the universal set $X$.

2.2 Fuzzy number [28]

A fuzzy number is a fuzzy set $\tilde{A}$ on the real line $\mathbb{R}$ that satisfies the following properties:
(i) $\tilde{A}$ is a normal fuzzy set, i.e., $\max_{x \in X} \mu_{\tilde{A}}(x) = 1$;
(ii) for each $\alpha \in (0, 1]$, the α-cut $A_\alpha$ of the fuzzy set $\tilde{A}$ is a closed interval, where $A_\alpha = \{ x \in \mathbb{R} : \mu_{\tilde{A}}(x) \ge \alpha \}$;
(iii) the support $S(\tilde{A})$ of the fuzzy set $\tilde{A}$ is bounded, where $S(\tilde{A}) = \{ x \in \mathbb{R} : \mu_{\tilde{A}}(x) > 0 \}$.

A trapezoidal fuzzy number $\tilde{A} = (a_1, a_2, a_3, a_4)$, $a_1 \le a_2 \le a_3 \le a_4$, is a fuzzy set with membership function

$$\mu_{\tilde{A}}(x) = \begin{cases} \dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x < a_2 \\ 1, & a_2 \le x \le a_3 \\ \dfrac{a_4 - x}{a_4 - a_3}, & a_3 < x \le a_4 \\ 0, & \text{otherwise} \end{cases} \tag{10.2}$$

A triangular fuzzy number is a particular trapezoidal fuzzy number: if $a_2 = a_3$, then $\tilde{A}$ is called a triangular fuzzy number and is denoted by $(a_1, a_2, a_4)$.

2.3 α-Cut interval-based arithmetic operations on fuzzy numbers [28]

Consider two fuzzy numbers $\tilde{A}$ and $\tilde{B}$ with α-cut intervals $A_\alpha = [a_\alpha^l, a_\alpha^u]$ and $B_\alpha = [b_\alpha^l, b_\alpha^u]$, respectively (Fig. 10.1). The α-cut interval-based arithmetic operations on the fuzzy numbers $\tilde{A}$ and $\tilde{B}$ are defined by Eqs. (10.3)–(10.7).

Addition:

$$A_\alpha + B_\alpha = \left[ a_\alpha^l + b_\alpha^l,\; a_\alpha^u + b_\alpha^u \right] \tag{10.3}$$
Figure 10.1 Fuzzy numbers $\tilde{A}$ and $\tilde{B}$ with their α-cut intervals.
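Subtraction:

$$A_\alpha - B_\alpha = \left[ a_\alpha^l - b_\alpha^u,\; a_\alpha^u - b_\alpha^l \right] \tag{10.4}$$

Multiplication:

$$A_\alpha \times B_\alpha = \left[ a_\alpha^l b_\alpha^l,\; a_\alpha^u b_\alpha^u \right] \tag{10.5}$$

Division:

$$A_\alpha \div B_\alpha = \left[ a_\alpha^l / b_\alpha^u,\; a_\alpha^u / b_\alpha^l \right] \tag{10.6}$$

Scalar multiplication:

$$k A_\alpha = \begin{cases} \left[ k a_\alpha^l,\; k a_\alpha^u \right], & k > 0 \\ \left[ k a_\alpha^u,\; k a_\alpha^l \right], & k < 0 \end{cases} \tag{10.7}$$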
< 0; if 0 < < 1 or 0
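The definitions of Section 2 can be made concrete with a small Python sketch that evaluates the trapezoidal membership function of Eq. (10.2), derives the α-cut of a trapezoidal fuzzy number, and applies the interval operations of Eqs. (10.3)–(10.7). The names `Interval`, `trapezoid_membership`, and `alpha_cut` are hypothetical and chosen only for this illustration. Two assumptions are made explicit in the code: the α-cut bounds are obtained by solving the two linear branches of Eq. (10.2) for x, and the multiplication and division rules are applied only to intervals on the positive side of the real line, which is the setting in which the endpoint formulas of Eqs. (10.5) and (10.6) hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi], used here as an alpha-cut."""
    lo: float
    hi: float

    def __add__(self, other):       # Eq. (10.3)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):       # Eq. (10.4)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):       # Eq. (10.5); assumes non-negative intervals
        if self.lo < 0 or other.lo < 0:
            raise ValueError("endpoint rule assumes non-negative intervals")
        return Interval(self.lo * other.lo, self.hi * other.hi)

    def __truediv__(self, other):   # Eq. (10.6); assumes a positive divisor
        if self.lo < 0 or other.lo <= 0:
            raise ValueError("endpoint rule assumes a positive divisor interval")
        return Interval(self.lo / other.hi, self.hi / other.lo)

    def scale(self, k):             # Eq. (10.7); k = 0 gives the interval [0, 0]
        if k >= 0:
            return Interval(k * self.lo, k * self.hi)
        return Interval(k * self.hi, k * self.lo)


def trapezoid_membership(x, a1, a2, a3, a4):
    """Membership grade of x in the trapezoidal fuzzy number (a1, a2, a3, a4)."""
    if a1 <= x < a2:
        return (x - a1) / (a2 - a1)
    if a2 <= x <= a3:
        return 1.0
    if a3 < x <= a4:
        return (a4 - x) / (a4 - a3)
    return 0.0


def alpha_cut(a1, a2, a3, a4, alpha):
    """Alpha-cut of a trapezoidal fuzzy number for alpha in (0, 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Solve (x - a1)/(a2 - a1) = alpha and (a4 - x)/(a4 - a3) = alpha for x.
    return Interval(a1 + alpha * (a2 - a1), a4 - alpha * (a4 - a3))


if __name__ == "__main__":
    A = alpha_cut(1, 2, 3, 4, alpha=0.5)
    B = alpha_cut(2, 3, 4, 6, alpha=0.5)
    print(A, B, A + B, A - B, A * B, A / B, A.scale(-2))
```

Repeating the computation over a grid of α values in (0, 1] traces out the membership function of the result one level at a time, which is how α-cut arithmetic is typically used in practice.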