RELIABILITY AND PROBABILISTIC SAFETY ASSESSMENT IN MULTI-UNIT NUCLEAR POWER PLANTS
C. SENTHIL KUMAR
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2023 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-819392-1

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Charlotte Cockle
Acquisitions Editor: Megan R. Ball
Editorial Project Manager: Michelle Fisher
Production Project Manager: Surya Narayanan Jayachandran
Cover Designer: Mark Rogers
Typeset by Aptara, New Delhi, India
To my parents
To my wife and two lovely daughters
Contents

About the Author
Preface

1. Reliability modeling
   1.1 Reliability mathematics
   1.2 Probability theory
   1.3 Probability distributions
   1.4 System reliability
   Further readings

2. Introduction to probabilistic safety assessment
   2.1 Safety approach in NPPs—defense-in-depth
   2.2 Need for PSA
   2.3 Regulatory decision making with PSA insights
   2.4 Approaches for regulatory decisions
   2.5 Quality assurance
   2.6 Standardization of PSA
   2.7 PSA methodology
   2.8 Initiating event frequency
   2.9 Component data
   2.10 Human reliability
   2.11 Dependence analysis
   2.12 Passive systems
   2.13 Software reliability
   2.14 Uncertainty analysis
   2.15 Sensitivity analysis
   2.16 Importance measures
   2.17 Applications of PSA
   Further readings

3. Risk assessment
   3.1 Background
   3.2 Objective and scope
   3.3 Qualitative and quantitative methods of risk assessment
   Further readings

4. Site safety goals
   4.1 Multi-unit considerations
   4.2 Site safety goals
   4.3 Site safety goals—international scenario
   4.4 Multi-criteria analysis for risk metrics
   4.5 Communication of risk information to public and their perception
   Further readings

5. Challenges in risk assessment of multiunit site
   5.1 Key issues
   5.2 Methods for integrated risk assessment
   5.3 Seismic PSA for multiunit site
   5.4 MUPSA for Level 2
   5.5 MUPSA for Level 3
   Further readings

6. Risk aggregation
   6.1 Unit level
   6.2 Site level
   6.3 Aspects to be considered in risk aggregation
   6.4 Risk aggregation and its effect on risk metric
   6.5 Mathematical aspects of risk aggregation
   6.6 Interpretation of results
   6.7 Risk aggregation for risk-informed decisions
   Further readings

7. Human reliability
   7.1 Introduction
   7.2 Types of human errors
   7.3 Human error in nuclear power plants
   7.4 Human reliability models
   7.5 HRA generations
   7.6 Human cognitive architecture
   7.7 HRA in the context of multiunit PSA
   Further readings

8. Common cause failures and dependency modeling in single and multiunit NPP sites
   8.1 Dependent failures
   8.2 Common cause failures
   8.3 CCF models
   8.4 Impact vector method to estimate the alpha factors
   8.5 Approach for interunit CCF in multiunit sites
   Further readings

9. International studies related to multiunit PSA: A review
   9.1 Seabrook PSA
   9.2 Byron and Braidwood PSA
   9.3 Research work at Maryland University, United States
   9.4 Korea Atomic Energy Research Institute
   9.5 CANDU Owners Group
   9.6 Multiunit PSA studies at EDF France
   9.7 Fukushima Daiichi experience
   9.8 MUPSA research in India
   9.9 MUPSA approach in the United Kingdom
   9.10 Site risk model development in Hungary
   9.11 Other countries
   9.12 Summary of international experience on MUPSA
   Further readings

10. Multiunit risk assessment for small modular reactors
    10.1 Introduction
    10.2 Small modular reactors
    10.3 Multiunit risk in chemical industries
    Further readings

11. Summary
    11.1 Case study and conclusions
    11.2 Different approaches for MUPSA
    11.3 Application of MUPSA methodology
    11.4 Insights and lessons learnt in MUPSA
    11.5 Closure
    Further readings

Index
About the Author
Dr. C. Senthil Kumar has over 30 years of research experience in the areas of reliability engineering, risk assessment studies, probabilistic safety assessment, software reliability, seismic safety, statistical analysis, and related fields. After completing his Ph.D. in reliability at Anna University in 2005, Dr. Kumar did post-doctoral research in Sweden in 2009–10 in the areas of software reliability for computer-based systems in safety-critical operations, real-time scheduling for adaptive fault tolerance in multiprocessor systems, software testing, fault injection, and mutation studies. He has guided five Ph.D. students, two MS students, and several research fellows. He is a reviewer of journal articles for Elsevier publications and has authored or co-authored more than 32 international peer-reviewed journal publications and several national publications.
Preface
Accidents at more than one unit of a multi-unit nuclear site are a serious concern for regulators, especially after the Fukushima accident. Safety experts are making continuous efforts to develop guidance documents and establish risk assessment methodologies for multi-unit sites, as there is growing concern about the impact of dependencies among the nuclear power plants co-located at a site. Various methods are being developed to model these dependencies and identify the critical contributors to site risk. Methods are also being developed to aggregate the risk associated with each unit and to provide a holistic approach that addresses the factors important to risk assessment of multiple units. Further, the International Atomic Energy Agency is strengthening and increasing its peer review and advisory missions to member states with the aim of harmonizing the different approaches adopted for multi-unit risk assessment.

The aim of this book is to give an overview of the various aspects involved in multi-unit risk assessment, with emphasis on new developments, general trends, and points of interest regarding multiple accidents at a nuclear site. Probabilistic safety assessment (PSA) is a systematic methodology used to evaluate risks and to obtain insights into the weak links in the design and operation of a nuclear power plant; it is also a very useful tool to identify and prioritize safety improvements. A detailed discussion of the PSA method and risk assessment strategies is therefore presented before the multi-unit PSA methodologies are introduced. Internationally, considerable expertise is available on the application of PSA to single units, yet it is challenging to identify the unique factors to be modeled in multi-unit PSA. Moreover, at a multi-unit site, the operation of different types of reactors, site-specific internal and external hazards, different combinations of reactor operating states, common and shared safety systems, and interaction effects are some of the major issues to be considered; it is therefore difficult to arrive at a consensus on a uniform multi-unit PSA approach. The international developments in the above areas are compiled to give an overview of their importance in the evaluation of site-level risk, integrating the various risk contributions from different radiological sources into the multi-unit risk assessment.

Though the focus of this book is the multi-unit risks at sites with multiple large-scale nuclear reactors, its importance for small modular reactor sites and for chemical and petrochemical industries is also highlighted. It is expected that the content of the book will be useful for both academics and experts in safety-critical industries with multi-unit operations.

C. Senthil Kumar
CHAPTER 1
Reliability modeling

Contents
1.1 Reliability mathematics
    1.1.1 Set theory
    1.1.2 Fundamentals of Boolean algebra
1.2 Probability theory
    1.2.1 Conditional probability
    1.2.2 Bayes theorem
1.3 Probability distributions
    1.3.1 Discrete probability distribution
    1.3.2 Continuous probability distribution
1.4 System reliability
Further readings
The fundamental concepts of reliability and PSA are derived from and built upon reliability mathematics. The elements of system and component reliability modeling involve the theory of statistical distributions, which is explained in this chapter.
1.1 Reliability mathematics
1.1.1 Set theory
A set is a collection of objects called elements. For example, the numbers 1, 2, 3 can form a set A = {1, 2, 3}. A set can be finite or infinite; the set of positive integers and the set of natural numbers are examples of infinite sets. An empty set is a set with no elements, φ = {}.
1.1.2 Fundamentals of Boolean algebra
Boolean algebra is widely used in the field of reliability. In Boolean algebra, the possible values for the variables are either 1 or 0. Similarly, in reliability, systems and components can be present in either a success or a failure state. Consider a variable X that denotes the state of a component and assume 1 represents success and 0 represents failure. Then, the probability that X is equal to 1, P(X = 1), is called the reliability of that particular component. Depending upon the configuration of the system, it will also have a success or failure state. Based on this binary state assumption, Boolean algebra can be conveniently used.
Boolean algebra provides a set of rules used to simplify or minimize a given logic expression without changing its functionality.

Law: Properties
Identity: A + 0 = A; 1 · A = A
Complement: If A = 1, then Ā = 0; A + Ā = 1; A · Ā = 0
Idempotent: A + A = A; A · A = A
Zero property: A · 0 = 0
Distributive: A · (B + C) = A · B + A · C; A + (B · C) = (A + B) · (A + C)
Commutative: A + B = B + A; A · B = B · A
Associative: A + (B + C) = (A + B) + C; A · (B · C) = (A · B) · C
Absorption: A + AB = A; A · (A + B) = A
Multiplicative: A · (Ā + B) = A · B
De Morgan: the complement of A + B is Ā · B̄; the complement of A · B is Ā + B̄
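These identities can be confirmed by brute force over the two possible values of each variable. The short Python sketch below is an illustration (not part of the original text) that checks the absorption and De Morgan laws:

```python
from itertools import product

def NOT(a):
    # Complement of a binary variable.
    return 1 - a

def check(name, lhs, rhs):
    # Verify a two-variable Boolean identity over all input combinations.
    ok = all(lhs(a, b) == rhs(a, b) for a, b in product((0, 1), repeat=2))
    print(f"{name}: {'holds' if ok else 'fails'}")

check("Absorption A + AB = A", lambda a, b: a | (a & b), lambda a, b: a)
check("De Morgan, complement of (A+B)", lambda a, b: NOT(a | b), lambda a, b: NOT(a) & NOT(b))
check("De Morgan, complement of (A·B)", lambda a, b: NOT(a & b), lambda a, b: NOT(a) | NOT(b))
```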
Logic gates: Logic gates are the building blocks of any digital system. A logic gate has one or more inputs but only one output. The types of logic gates are as follows:
a. Basic gates: AND gate, OR gate, NOT gate
b. Universal gates: NAND gate, NOR gate
c. Special gates: XOR gate, XNOR gate

AND gate: An AND gate has two or more inputs but only one output, which is the product of all the inputs. To relate the inputs to the output, a truth table is used.

Truth table for AND gate:
A  B  Y = A · B
0  0  0
0  1  0
1  0  0
1  1  1
OR gate: Like an AND gate, an OR gate also has two or more inputs but only one output, which is the logical sum of all the inputs.

Truth table for OR gate:
A  B  Y = A + B
0  0  0
0  1  1
1  0  1
1  1  1
NOT gate: A NOT gate is also called an inverter gate. It has a single input and single output and the output is the complement of the input.
Truth table for NOT gate:
A  Y = Ā
0  1
1  0
A universal gate is a gate that can be used to implement any other gate without using any other type of gate.
NAND gate: It has two or more inputs but only one output. The output is the complement of the product of the inputs. In simple terms, a NAND gate is a combination of a NOT and an AND gate.

Truth table for NAND gate (Y = complement of A · B):
A  B  Y
0  0  1
0  1  1
1  0  1
1  1  0
NOR gate: It has two or more inputs but only one output. The output is the complement of the sum of the inputs. In simple terms, a NOR gate is a combination of a NOT and an OR gate.
Truth table for NOR gate (Y = complement of A + B):
A  B  Y
0  0  1
0  1  0
1  0  0
1  1  0
Special gates are used in particular digital circuits.
XOR gate: It has two or more inputs and only one output. XOR is also called the exclusive OR gate.
Y = A ⊕ B = AB̄ + ĀB

Truth table for XOR gate:
A  B  Y = AB̄ + ĀB
0  0  0
0  1  1
1  0  1
1  1  0
XNOR gate: It has two or more inputs and only one output. XNOR is also called the exclusive NOR gate and is the complement of the XOR gate.
Y = AB + ĀB̄ (the complement of A ⊕ B)

Truth table for XNOR gate:
A  B  Y = AB + ĀB̄
0  0  1
0  1  0
1  0  0
1  1  1
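As an illustration only (not from the original text), the gates above can be written as small Python functions and the truth tables generated directly; the output reproduces the tables shown:

```python
from itertools import product

def AND(a, b):  return a & b
def OR(a, b):   return a | b
def NOT(a):     return 1 - a
def NAND(a, b): return NOT(AND(a, b))
def NOR(a, b):  return NOT(OR(a, b))
def XOR(a, b):  return (a & NOT(b)) | (NOT(a) & b)   # A·B' + A'·B
def XNOR(a, b): return NOT(XOR(a, b))                # complement of XOR

print("A B | AND OR NAND NOR XOR XNOR")
for a, b in product((0, 1), repeat=2):
    row = (AND(a, b), OR(a, b), NAND(a, b), NOR(a, b), XOR(a, b), XNOR(a, b))
    print(a, b, "|", *row)
```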
1.2 Probability theory
In probability theory, an experiment is an operation that results in a well-defined outcome. Experiments are of two types: deterministic and probabilistic (random). If an experiment, when repeated under identical conditions, results in the same outcome every time, it is called a deterministic experiment. If an experiment, when repeated under identical conditions, does not result in the same outcome every time but the result is one of several possible outcomes, it is called a probabilistic or random experiment.
Sample space: The set of all possible outcomes of a random experiment is called the sample space of the experiment, denoted S. If a coin is tossed, S = {H, T}. Consider an experiment in which you measure the thickness of a material. The possible values for thickness depend on the resolution of the measuring instrument, and they also depend on upper and lower bounds for thickness. It is easiest to define the sample space as the positive real line S = {x | x > 0} because a negative value cannot occur. If it is known that the thickness will be between 10 and 15 mm, the sample space could be S = {x | 10 < x < 15}.
If the objective of the analysis is to consider only whether a particular part is low, medium, or high for thickness, the sample space might be taken
to be the set of three outcomes: S = {low, medium, high}. If the objective of the analysis is to consider only whether or not a particular part conforms to the required specifications, the sample space might be simplified to the set of two outcomes that indicate whether or not the part meets the requirement: S = {yes, no}.
A sample space is discrete if it consists of a finite or countably infinite set of outcomes. A sample space is continuous if it contains an interval (either finite or infinite) of real numbers.
Event: An event is a subset of the sample space associated with a random experiment. It is denoted by E or A, B, C, etc. If S = {1, 2, 3, 4, 5, 6}, then E1 can be {2, 4, 5} and E2 = {1, 3}.
Axioms of probability: Probability is a number assigned to each member of a collection of events from a random experiment that satisfies the following properties. If S is the sample space and E is any event in a random experiment:
1. P(S) = 1
2. 0 ≤ P(E) ≤ 1
3. For two events E1 and E2 with E1 ∩ E2 = ∅, P(E1 ∪ E2) = P(E1) + P(E2)
The property that 0 ≤ P(E) ≤ 1 is equivalent to the requirement that a relative frequency must be between 0 and 1. The property that P(S) = 1 is a consequence of the fact that an outcome from the sample space occurs in every trial of an experiment; consequently, the relative frequency of S is 1. Property 3 implies that if the events E1 and E2 have no outcomes in common, the relative frequency of the combined outcomes is the sum of the relative frequencies of the outcomes in E1 and E2.
Elementary event: If a random experiment is performed, each of its outcomes is known as an elementary event. An event that contains only one element is called an elementary event.
Impossible event: A null set (φ) is a subset of every set. An event E = φ or {} is called an impossible event. If a coin is tossed, an event that results in neither head nor tail, E = {}, is an impossible event.
Compound event: An event that contains more than one element is called a compound event.
Probability is used to quantify the likelihood, or chance, that an outcome of a random experiment will occur. "The chance of rain today is 30%" is a statement that quantifies our feeling about the possibility of rain. The likelihood of an outcome is quantified by assigning a number from the interval [0, 1] to the outcome (or a percentage from 0% to 100%). Higher numbers indicate that the outcome is more likely than lower numbers. A probability of 0 indicates an outcome will not occur. A probability of 1 indicates an outcome will occur with certainty. Different individuals will no doubt assign different probabilities to the same outcomes.
Another interpretation of probability is based on the conceptual model of repeated replications of the random experiment. The probability of an outcome is interpreted as the limiting value of the proportion of times the outcome occurs in n repetitions of the random experiment as n increases beyond all bounds. For example, if we assign probability 0.2 to the outcome that there is a corrupted pulse in a digital signal, we might interpret this assignment as implying that, if we analyze many pulses, approximately 20% of them will be corrupted. This example provides a relative frequency interpretation of probability: the proportion, or relative frequency, of replications of the experiment that result in the outcome is 0.2. When the model of equally likely outcomes is assumed, the probabilities are chosen to be equal: whenever a sample space consists of N possible outcomes that are equally likely, the probability of each outcome is 1/N.
A random experiment can result in one of the outcomes {k, l, m, n} with probabilities 0.2, 0.1, 0.4, and 0.3, respectively. Let A denote the event {k, l}, B the event {l, m, n}, and C the event {n}. Then,
P(A) = 0.2 + 0.1 = 0.3
P(B) = 0.1 + 0.4 + 0.3 = 0.8
P(C) = 0.3
P(Ā) = 0.7, P(B̄) = 0.2, and P(C̄) = 0.7
P(A ∩ B) = 0.1
P(A ∪ B) = 1 and P(A ∩ C) = 0
Mutually exclusive events: Two events are said to be "mutually exclusive" if the occurrence of one event prevents the occurrence of the other. When two events cannot happen simultaneously, they are said to be mutually exclusive. If events A and B are mutually exclusive, then the probability of A and B occurring together is zero, that is, P(A and B) = 0.
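The arithmetic of the example above can be reproduced with a few lines of Python (an illustrative sketch only); note that events A and C have no common outcome and are therefore mutually exclusive, so P(A ∩ C) = 0.

```python
# Outcome probabilities from the example
p = {"k": 0.2, "l": 0.1, "m": 0.4, "n": 0.3}

A, B, C = {"k", "l"}, {"l", "m", "n"}, {"n"}

def prob(event):
    # Probability of an event = sum of the probabilities of its outcomes.
    return sum(p[o] for o in event)

print(round(prob(A), 2), round(prob(B), 2), round(prob(C), 2))   # 0.3 0.8 0.3
print(round(1 - prob(A), 2), round(1 - prob(B), 2))              # complements: 0.7 0.2
print(round(prob(A & B), 2), round(prob(A | B), 2), round(prob(A & C), 2))  # 0.1 1.0 0.0
```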
Independent events: Two events are said to be "independent" if the occurrence of one does not affect the probability of occurrence of the other. Event A is said to be independent of event B if the probability of occurrence of A is not affected by the occurrence of B. Two events A and B are independent if and only if P(A ∩ B) = P(A) × P(B); that is, the probability that A and B occur together is simply the product of the probabilities that A and B occur individually.
Problem: The following circuit operates only if there is a path of functional devices from left to right; each of the two devices (top and bottom) functions with probability 0.95. Assume that the devices fail independently. What is the probability that the circuit operates?
Let T and B denote the events that the top and bottom devices operate, respectively. There is a path if at least one device operates. The probability that the circuit operates is
P(T or B) = 1 − P(not (T or B)) = 1 − P(T̄ and B̄)
P(T̄ and B̄) = P(T̄) P(B̄) = (1 − 0.95)² = 0.05² = 0.0025
and so P(T or B) = 1 − 0.0025 = 0.9975.
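A one-line numerical check of this calculation (an illustrative sketch, not from the book):

```python
p = 0.95                      # probability that each device functions
p_fail_both = (1 - p) ** 2    # both independent devices fail: 0.05^2 = 0.0025
p_path = 1 - p_fail_both      # at least one path works
print(p_path)                 # 0.9975
```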
1.2.1 Conditional probability
The concept of conditional probability is the most important in all of probability theory. Given two events A and B associated with a random experiment, P(A|B) is defined as the conditional probability of A given that B has occurred: P(A|B) = P(A ∩ B)/P(B). If A and B are independent events, then P(A|B) = P(A); that is, when A and B are independent, the probability that A and B occur together is simply the product of the probabilities that A and B occur individually.
In a production process, 10% of the items contain flaws, and 25% of the items with flaws are functionally defective. However, only 5% of items without flaws are defective. Let D denote the event that an item is defective and F the event that it has a flaw; then the probability that an item is defective given that it has a flaw is P(D|F) = 0.25. F̄ denotes the event that an item does not have a flaw, and because 5% of the items without flaws are defective, P(D|F̄) = 0.05.
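Although it is not computed in the text, the overall probability of a defective item follows from these numbers by the law of total probability. The sketch below is only an illustration using the 10%, 25%, and 5% figures given above:

```python
P_F = 0.10            # item contains a flaw
P_D_given_F = 0.25    # defective given flawed
P_D_given_noF = 0.05  # defective given no flaw

# Law of total probability: P(D) = P(D|F)P(F) + P(D|F')P(F')
P_D = P_D_given_F * P_F + P_D_given_noF * (1 - P_F)
print(P_D)            # 0.07
```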
1.2.2 Bayes theorem
In many situations, we may know one conditional probability but need to calculate a different one. This is the basis of Bayes theorem. In reliability terms, if we know the prior distribution of a component parameter and obtain new evidence or information about it, Bayes theorem allows a posterior distribution to be obtained by updating the prior with the new evidence. Consider a semiconductor for which we know the probability of failure for various levels of contamination in manufacturing:

Level of contamination: Probability of failure
High: 0.10
Medium: 0.01
Low: 0.001

If the semiconductor chip in the product fails, what is the probability that the chip was exposed to a high level of contamination? From the definition of conditional probability,
P(A ∩ B) = P(A|B)P(B) = P(B ∩ A) = P(B|A)P(A)
Considering the second and last terms in the expression above, we can write
P(A|B) = P(B|A)P(A)/P(B), for P(B) > 0.
This is a useful result that enables us to solve for P(A|B) in terms of P(B|A). If P(B) in the above equation is expressed in its full form (as the total probability over all the events that can produce B), we obtain the Bayes equation.

Another example: In a tire manufacturing factory, machines A, B, and C produce 20%, 30%, and 50% of the tires, respectively. Of the manufactured tires, 4%, 5%, and 2% are defective from machines A, B, and C, respectively. If a tire drawn at random is found to be defective, what is the probability that it was manufactured by machine B?
Let E1: tire manufactured by machine A, P(E1) = 20/100 = 0.20
E2: tire manufactured by machine B, P(E2) = 30/100 = 0.30
E3: tire manufactured by machine C, P(E3) = 50/100 = 0.50
The conditional defective probabilities are also known:
P(defective | E1) = 4/100, P(defective | E2) = 5/100, P(defective | E3) = 2/100
The probability that a defective tire was manufactured by machine B is
P(E2 | defective) = P(E2)·P(defective | E2) / [P(E1)·P(defective | E1) + P(E2)·P(defective | E2) + P(E3)·P(defective | E3)]
= (0.30 × 5/100) / (0.20 × 4/100 + 0.30 × 5/100 + 0.50 × 2/100) = 0.45
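The tire calculation generalizes directly to any number of machines; the short Python sketch below (an illustration only) reproduces the posterior of 0.45:

```python
# Prior probabilities of the producing machine and conditional defect rates
prior  = {"A": 0.20, "B": 0.30, "C": 0.50}
defect = {"A": 0.04, "B": 0.05, "C": 0.02}

# Bayes theorem: P(machine | defective) = P(defective | machine) P(machine) / P(defective)
p_defective = sum(prior[m] * defect[m] for m in prior)            # total probability = 0.033
posterior = {m: prior[m] * defect[m] / p_defective for m in prior}

print(round(posterior["B"], 2))   # 0.45
```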
Random variables
Because the particular outcome of an experiment is not known in advance, the resulting value of a variable measured on that outcome is not known in advance. For this reason, the variable that associates a number with the outcome of a random experiment is referred to as a random variable. A random variable is a function that assigns a real number to each outcome in the sample space of a random experiment. A random variable is denoted by an uppercase letter such as X; after an experiment is conducted, the measured value of the random variable is denoted by a lowercase letter, such as x = 8 inches.
Random variables can be classified into two categories, namely, discrete and continuous random variables. A random variable is said to be discrete if its sample space is countable. The number of power outages in a plant in a specified time is a discrete random variable. A discrete random variable is a random variable with a finite (or countably infinite) range; other examples are the number of scratches on a surface, the proportion of defective parts among 1000 tested, and the number of transmitted bits received in error. Whenever the measurement is limited to discrete points on the real line, the random variable is said to be a discrete random variable.
If the elements of the sample space are infinite in number and the sample space is continuous, the random variable defined over such a sample space is known as a continuous random variable. If the data are countable, they are represented with a discrete random variable; if the data are a measurable quantity, they are represented with a continuous random variable. Sometimes a measurement (such as the current in a copper wire or the length of a product) can assume any value in an interval of real numbers; the random variable that represents this measurement is said to be a continuous random variable. If a variable can take a value between any two values, it is a continuous variable; otherwise it is a discrete variable. Examples are electrical current, length, pressure, temperature, time, voltage, and weight.
A random variable X is said to have a probability density function (pdf) if the derivative dF(x)/dx = f(x) exists for all x. f(x) is called the density function; f(x) ≥ 0 for all values of x and

\int_{-\infty}^{\infty} f(x)\, dx = 1

The cumulative distribution function (cdf) F(x) for a continuous random variable is

F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\, dy

with P(X > x) = 1 − F(x) and P(a ≤ X ≤ b) = F(b) − F(a). So

P(a \le X \le b) = \int_{a}^{b} f(x)\, dx

defines the probability that a value lies in some interval a to b. However, the probability at a single point (a = b) is zero. For example, the probability that a component will fail between 10 and 50 hours can be answered, but the question of how quickly the component is failing at 15 hours cannot; that is why the instantaneous hazard rate function evolved.

Failure (hazard) rate function, h(t): the rate of failure at time t, that is, how fast units/components are failing. The probability of failure in the range t to t + Δt is

P\{t \le T \le t + \Delta t\} = R(t) - R(t + \Delta t)

The hazard rate h(t) is the instantaneous failure rate, obtained by taking the limit Δt → 0:

h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t\, R(t)}

The probability density function can be written as

f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T \le t + \Delta t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t}

Therefore,

h(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}
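For instance, for an exponential time to failure the hazard rate is constant and equal to λ, since f(t) = λe^{-λt} and R(t) = e^{-λt}. The short numerical check below is an illustrative sketch with an assumed value of λ:

```python
import math

lam = 2.0e-4                              # assumed failure rate per hour (illustrative)
f = lambda t: lam * math.exp(-lam * t)    # pdf of the exponential model
R = lambda t: math.exp(-lam * t)          # reliability (survivor) function

for t in (10.0, 1000.0, 5000.0):
    print(t, f(t) / R(t))                 # h(t) = f(t)/R(t) equals lam at every t
```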
1.3 Probability distributions
Probability distributions are grouped into two types: discrete probability distributions and continuous probability distributions. The choice depends on whether the probabilities are associated with a discrete variable or a continuous variable.
Reliability: The probability that an item will perform a required function for a given period of time under stated operating conditions.
Mean time between failures (MTBF): The average time between failures of a repairable system; MTBF = MTTF + MTTR.
Repair rate (RR): The average number of times the system is likely to be repaired in a given period of time.
Mean time to repair (MTTR): The average time required to repair a system; MTTR = 1/RR.
Availability: The probability that a system will be functioning correctly at any given point of time; A = MTTF/(MTTF + MTTR).

Lifetime distribution
Define a continuous random variable T to be the time to failure of the component/system, T > 0.
The cumulative distribution function (CDF) is defined as F(t) = Pr{T < t}; it gives the probability that a randomly selected unit will fail by time t. The corresponding probability density function (PDF) f(t) is defined over the range t = 0 to infinity:

f(t) = \frac{dF(t)}{dt}, \qquad F(t) = \int_0^t f(x)\, dx

Reliability function: the reliability at time t is R(t) = Pr{T ≥ t}, the probability that the component is still functional at time t.
R(0) = 1: at t = 0 the component is functional.
lim_{t→∞} R(t) = 0: at some time t, the component will fail.

The mean time to failure (MTTF) is the mean of the failure-time distribution:

MTTF = \int_0^{\infty} t\, f(t)\, dt

Since f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt},

MTTF = -\int_0^{\infty} t\, \frac{dR(t)}{dt}\, dt

Using integration by parts,

MTTF = -t\,R(t)\Big|_0^{\infty} + \int_0^{\infty} R(t)\, dt = \int_0^{\infty} R(t)\, dt
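The identity MTTF = ∫R(t)dt can be checked numerically. The sketch below is an illustration with an assumed constant failure rate: it integrates R(t) = e^{-λt} with the trapezoidal rule and recovers 1/λ.

```python
import math

lam = 1.0e-3                        # assumed constant failure rate per hour (illustrative)
R = lambda t: math.exp(-lam * t)    # reliability function of the exponential model

# Trapezoidal integration of R(t) over a horizon long enough for R(t) to be ~0
dt, horizon = 1.0, 20_000.0
ts = [i * dt for i in range(int(horizon / dt) + 1)]
mttf = sum((R(a) + R(a + dt)) / 2 * dt for a in ts[:-1])

print(mttf, 1 / lam)                # both close to 1000 hours
```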
1.3.1 Discrete probability distribution
A discrete probability distribution describes the probability of occurrence of each value of a discrete random variable. For example, the possible values for the random variable X that represents the number of heads that can occur when a coin is tossed once are 0 and 1, and the probability of getting a head or a tail is 0.5. The equation or formula used to describe a discrete distribution is called a probability mass function (pmf). Some of the discrete probability distributions are described below.
Bernoulli distribution: When an event has only two possible outcomes, say true or false, it follows a Bernoulli distribution, irrespective of whether one outcome is more likely to occur than the other. The random variable X is said to be a Bernoulli random variable if X can take the value 1 for success and 0 for failure. The probability mass function of X is given by P(X = 1) = p and P(X = 0) = 1 − p, where p is the probability that the event is a success.
Binomial distribution: Suppose we carry out n such trials in a row, each of which results in a success with probability p and in a failure with probability 1 − p. If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters n and p. The probability mass function of a binomial random variable is given by

P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n

Poisson distribution: This distribution describes the number of events that occur in a fixed time interval, for example, the number of telephone calls received per hour. The pmf is given by

P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}
where λ is the average number of times the event occurs in the period of interest and x is the number of occurrences of interest.
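Both pmfs are available in closed form and can be evaluated directly; the sketch below uses purely illustrative numbers (2 failures in 10 demands with p = 0.1, and 3 events where 2 are expected on average):

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    # P(X = x) = C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lam) lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

print(binomial_pmf(2, 10, 0.1))   # ~0.194
print(poisson_pmf(3, 2.0))        # ~0.180
```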
1.3.2 Continuous probability distribution
As defined earlier, the probability that a continuous random variable X lies within an infinitesimal interval (x − dx/2, x + dx/2) is described by an equation or formula called a probability density function (pdf). The pdf of X satisfies

P\left(x - \frac{dx}{2} \le X \le x + \frac{dx}{2}\right) = f(x)\, dx

where f(x) is a continuous function of x and the curve y = f(x) is the probability curve. Since the continuous random variable is defined over a continuous range of values, the graph of the density function is also continuous over that range. The area bounded by the curve of the pdf and the x-axis is equal to 1 when computed over the domain of the variable:

\int_{-\infty}^{\infty} f(x)\, dx = 1

The probability that the random variable assumes a value between x − dx/2 and x + dx/2 is equal to the area under the density function bounded by x − dx/2 and x + dx/2. It should be noted that the probability of a continuous random variable at a single point is zero, P(X = x) = 0. Some of the continuous probability distributions used in reliability and risk assessment are discussed below.

Uniform distribution
Let X be a random variable with a pdf f that is constant over a finite interval [a, b]:

f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}

X is said to be uniformly distributed over [a, b], with notation X ∼ U[a, b].
Exponential distribution
In reliability, the exponential distribution is most widely used and has only one parameter, lambda (λ). The exponential distribution is used to model the useful life of a product/component. In the Poisson distribution, X is the random variable representing the number of, say, defects in a length of pipe or flaws along a length of wire; the distance between defects/flaws is another random variable of interest. The random variable X that equals the distance between successive counts of a Poisson process with mean λ > 0 is an exponential random variable with parameter λ. The pdf of the distribution is

f(x) = \lambda e^{-\lambda x} \ \text{for } 0 \le x < \infty; \qquad f(x) = 0 \ \text{for } x < 0

(Figure: the exponential pdf f(x) plotted against x for λ = 0.5, 1.0, and 1.5.)

Cumulative distribution function:

F(x) = P(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0

If a random variable X has an exponential distribution with parameter λ, then

\mu = E(X) = \frac{1}{\lambda}, \qquad \sigma^2 = V(X) = \frac{1}{\lambda^2}
The probability of success or survival of a component up to a particular time of interest is defined by the reliability function. The reliability function for the exponential distribution is

R(x) = e^{-\lambda x}

If the failure rate λ of a component is known, the probability of success over a time t can be estimated. In a Poisson process, events occur uniformly throughout the interval of observation and there is no clustering of events; thus, our starting point for observation does not matter. This is due to the fact that the number of events in an interval of a Poisson process depends only on the length of the interval and not on its location. In the Poisson distribution, we assumed an interval could be partitioned into small intervals that are independent. Therefore, knowledge of previous results does not affect the probabilities of events in future intervals. This important property is called the memoryless property.
Memoryless property: for an exponential random variable X, P(X < t1 + t2 | X > t1) = P(X < t2).
Suppose, for example, that the time until the next logon to a computer network is exponentially distributed with a mean of 30 minutes. If no logon has occurred in the last 40 minutes, what is the probability that a logon occurs in the next 5 minutes? This is P(X < 45 | X > 40). From the definition of conditional probability,
P(X < 45 | X > 40) = P(40 < X < 45)/P(X > 40)
where P(40 < X < 45) = F(45) − F(40) = [1 − e^{−45/30}] − [1 − e^{−40/30}] = 0.04 and P(X > 40) = 1 − F(40) = e^{−40/30} = 0.26.
Therefore, P(X < 45 | X > 40) = 0.04/0.26 = 0.15.
Even after waiting 40 minutes without a logon, the probability that a logon occurs in the next 5 minutes is the same as the probability of a logon in the 5 minutes immediately after starting the computer, P(X < 5) = 1 − e^{−5/30} = 0.15. The fact that we have waited 40 minutes without a logon does not change the probability of a logon in the next 5 minutes.
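A quick numerical check of the logon example (an illustrative sketch):

```python
import math

mean = 30.0                              # minutes between logons
F = lambda x: 1 - math.exp(-x / mean)    # exponential cdf

p_cond = (F(45) - F(40)) / (1 - F(40))   # P(X < 45 | X > 40)
p_5min = F(5)                            # P(X < 5)
print(round(p_cond, 4), round(p_5min, 4))   # both ~0.1535, the memoryless property
```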
Normal distribution The normal distribution is also known as the Gaussian distribution and has a bell-shaped curve. The shape of a normal distribution curve is determined by its mean and standard deviation values.
While the exponential distribution is suitable for modeling the useful life of a component, the normal distribution can be used to model the reliability of items that experience wear-out failures. The pdf of the normal distribution is

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma > 0

E(X) = μ and V(X) = σ², and the notation is N(μ, σ²). The reliability function for the normal distribution is

R(x) = 1 - F(x) = \int_x^{\infty} f(x)\, dx
If the mean life of a component is μ = 700 hours with a standard deviation of σ = 200 hours, the reliability at 500 hours is R(500) = 0.84.
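The value R(500) = 0.84 can be verified with the standard normal cdf, for example using Python's statistics.NormalDist (an illustrative sketch):

```python
from statistics import NormalDist

life = NormalDist(mu=700, sigma=200)   # mean life 700 h, standard deviation 200 h
R_500 = 1 - life.cdf(500)              # reliability at 500 hours
print(round(R_500, 2))                 # 0.84
```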
Log-normal distribution
The log-normal distribution is one of the most commonly used life distribution models in reliability applications. The distribution is based on a multiplicative growth model and is used to model wear-out failures: at any instant of time, the process undergoes a random increase of degradation that is proportional to its current state, and the multiplicative effect of all these random independent growths accumulates to trigger failure. Therefore, the distribution is often used to model parts or components that fail primarily due to aging caused by stress or fatigue. This distribution is useful for modeling data that are symmetric or skewed to the right. Also, in reliability work the log-normal distribution is often more appropriate than the normal because a failure distribution is defined only for positive values of t. The probability density function is given by
f(x) = \frac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0

where μ and σ are the location and shape parameters. The log-normal reliability function is

R(x) = 1 - \Phi\left(\frac{\ln x - \mu}{\sigma}\right)
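Because R(x) = 1 − Φ((ln x − μ)/σ), the log-normal reliability can be computed with the same standard normal cdf. The sketch below is an illustration with assumed parameter values:

```python
from math import log
from statistics import NormalDist

mu, sigma = 6.0, 0.5      # assumed location and shape parameters (illustrative)
phi = NormalDist().cdf    # standard normal cdf

def lognormal_reliability(x):
    # R(x) = 1 - Phi((ln x - mu) / sigma)
    return 1 - phi((log(x) - mu) / sigma)

print(round(lognormal_reliability(300.0), 3))   # reliability at 300 hours
```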
If X is a random variable, the function defined for all real x by F(x) = P(X ≤ x) = P{ω : X(ω) ≤ x}, −∞ < x < ∞, is the distribution function of the random variable X. A distribution function is also called the cumulative distribution function. It is denoted by F(x), and the domain of the distribution function of X is (−∞, ∞). The reliability function is R(x) = 1 − F(x).
1.4 System reliability
A system is a collection of elements, components, and subsystems arranged in a specific design in order to achieve desired functions with acceptable performance and reliability. The types of components, their quantities, their qualities, and the manner in which they are arranged within the system have a direct effect on the system's reliability. To accomplish a specific system reliability, in addition to the reliability of the components, the relationship between these components is also considered, and decisions as to the choice of components can be made to improve or optimize the overall system reliability, maintainability, and/or availability. This reliability relationship is usually expressed using logic diagrams such as reliability block diagrams (RBDs) and fault trees (FTs).

Reliability models
System reliability analysis refers to the evaluation of the reliability of a system based on the reliabilities of its elements. Generally, two major categories of modeling techniques are used for the reliability analysis of a system: combinatorial and state-based. Combinatorial or non-state-space models include FTs (which are the most widely used), event trees, RBDs, reliability graphs (binary decision diagrams [BDDs]), and so on. Combinatorial models are used to model how the failure of basic components propagates into serious system failures. For example, an FT is used to model component-based failures, such as a valve or pump failure or a pipeline leakage, while an event tree is used to model plant-wide events such as core damage. State-based models include Markov chains, stochastic Petri nets, and so on. Brief descriptions of the reliability models are given below.

a. Reliability block diagrams
Block diagrams are used to describe the interrelation between the components and to define the system. An RBD is a graphical representation of the components of the system and how they are reliability-wise related
(connected). (Note: One can also think of an RBD as a logic diagram for the system based on its characteristics). It should be noted that this may differ from how the components are physically connected. After defining the properties of each block in a system, the blocks can then be connected in a reliability-wise manner to create a RBD for the system. The RBD provides a visual representation of the way the blocks are reliability-wise arranged. This means that a diagram will be created that represents the functioning state (i.e., success or failure) of the system in terms of the functioning states of its components. In other words, this diagram demonstrates the effect of the success or failure of a component on the success or failure of the system. For example, if all components in a system must succeed in order for the system to succeed, the components will be arranged reliability-wise in series. If one of two components must succeed in order for the system to succeed, those two components will be arranged reliability-wise in parallel.
Series–parallel combinations

Type of connection: System reliability
Series: RS = RA × RB
Parallel: RS = 1 − (1 − RA)(1 − RB)
Series–parallel: RS = [1 − (1 − RA)(1 − RB)] × [1 − (1 − RC)(1 − RD)]
Parallel–series: RS = 1 − (1 − RA RB)(1 − RC RD)
Complex: Evaluated by decomposition
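The expressions in the table can be composed from two small helper functions. The sketch below is an illustration that evaluates each configuration for assumed block reliabilities RA = RB = RC = RD = 0.9:

```python
from functools import reduce

def series(*rs):
    # All blocks must work: product of reliabilities.
    return reduce(lambda x, y: x * y, rs)

def parallel(*rs):
    # At least one block must work: 1 - product of unreliabilities.
    return 1 - reduce(lambda x, y: x * y, (1 - r for r in rs))

RA = RB = RC = RD = 0.9   # assumed block reliabilities (illustrative)
print(series(RA, RB))                              # series:          0.81
print(parallel(RA, RB))                            # parallel:        0.99
print(series(parallel(RA, RB), parallel(RC, RD)))  # series-parallel: 0.9801
print(parallel(series(RA, RB), series(RC, RD)))    # parallel-series: 0.9639
```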
System reliability evaluation using decomposition or conditional probability
The reliability of a complex system can be evaluated using the decomposition method. Suppose component E is a critical (keystone) component. The system can be split into two cases: one in which component E is good and treated as completely reliable, and one in which component E is bad and treated as completely unreliable. The system reliability is then the sum of the two conditional reliabilities weighted by the corresponding probabilities:
Rs = P(system works | component E is good) P(component E is good) + P(system works | component E is bad) P(component E is bad)
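As a concrete illustration of the decomposition idea, consider the classic five-component bridge network (this specific layout is an assumed example, not the figure from the book). Conditioning on the bridge component E: if E works, the network reduces to (A or C) in series with (B or D); if E fails, it reduces to (A and B) in parallel with (C and D).

```python
def parallel(r1, r2): return 1 - (1 - r1) * (1 - r2)
def series(r1, r2):   return r1 * r2

def bridge_reliability(rA, rB, rC, rD, rE):
    # Decomposition (conditioning) on the keystone component E.
    r_given_E_good = series(parallel(rA, rC), parallel(rB, rD))
    r_given_E_bad  = parallel(series(rA, rB), series(rC, rD))
    return r_given_E_good * rE + r_given_E_bad * (1 - rE)

print(bridge_reliability(0.9, 0.9, 0.9, 0.9, 0.9))   # assumed component reliabilities
```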
b. Fault tree
Originating in the aerospace industry in the early 1960s, FTs have become one of the most important logic and probabilistic techniques used in reliability assessment and probabilistic risk assessment today. Among the existing methods, FT analysis is the most widely used due to its expressive power, applicability to complex systems, and wide tool support. However, developing an FT is a cumbersome task even for experts and requires a great amount of attention and caution to represent a system correctly. An FT is a graphical method of describing the combinations of events leading to a defined system failure. In FT terminology, the system failure mode is known as the top event. An FT involves essentially three types of logic gates, where the inputs below a gate represent failures and the output (at the top) of the gate represents a propagation of failure depending on the nature of the gate. The three types are:
• The OR gate, whereby any input causes the output to occur;
• The AND gate, whereby all inputs need to occur for the output to occur;
• The voted gate (k-out-of-n gate), similar to the AND gate, whereby two or more inputs are needed for the output to occur.
The AND gate models the redundant case and is thus equivalent to the parallel block diagram. The OR gate models the series case, whereby any failure causes the top event. In probability terms, the AND gate involves multiplying probabilities of failure and the OR gate involves the addition rules. Note that although the k-out-of-n gate can be modeled as a combination of AND and OR gates, it is still defined as a gate type because it provides a concise model of a typical logic structure, the k-out-of-n system. In practice, a k-out-of-n system is a frequently used redundancy technique in an attempt to achieve high system reliability.
While block diagrams model paths of success, the FT models the paths of failure to the top event. Since an FT can be used to specify the failure model of a large and complex system in a relatively easy way, many reliability analysts adopt FTs as their analytical models. Due to its strong modeling ability, an FT model is especially helpful for both qualitative and quantitative reliability analysis. The qualitative evaluations basically transform the FT logic into logically equivalent forms that provide more focused information. The principal qualitative results obtained are the minimal cut sets (MCSs) of the top event. A cut set is a combination of basic events that can cause the top event. An MCS is the smallest combination of basic events that results in the top event.
Figure 1.1 Typical fault tree.
The basic events are the bottom events of the FT. Hence, an MCS helps to determine which combinations of basic components might lead to unsafe or failure states so that appropriate preventive measures can be taken. The set of MCSs for the top event represents all the ways that the basic events can cause the top event. Quantitative analysis is done by determining the probability of occurrence of the top event given the probabilities of occurrence of the basic components. Several methods exist for the evaluation of the probability of occurrence of the top event from MCSs; among them, the sum of disjoint products algorithm is simple and efficient. An advantage of FT models is that very large systems can easily be described and solved. A disadvantage of the model is its inability to handle complex behavior scenarios, because the FT model is a Boolean and static model and cannot handle dynamic aspects of system behavior. The set of cut sets for the sample FT in Fig. 1.1 is {a, c}, {b, c}, and {c, d}.
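Given these cut sets, the top-event probability can be approximated by the rare-event approximation (the sum of cut-set probabilities) or computed exactly by inclusion–exclusion; both are standard alternatives to the sum-of-disjoint-products algorithm mentioned above. The sketch below is an illustration with assumed basic-event probabilities:

```python
from itertools import combinations

# Assumed basic-event failure probabilities (illustrative only)
q = {"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.01}
cut_sets = [{"a", "c"}, {"b", "c"}, {"c", "d"}]

def p_cut(events):
    # Independent basic events: probability of a cut set is the product of its event probabilities.
    prod = 1.0
    for e in events:
        prod *= q[e]
    return prod

# Rare-event approximation: sum of cut-set probabilities
rare = sum(p_cut(cs) for cs in cut_sets)

# Exact inclusion-exclusion over the three cut sets
exact = 0.0
for k in range(1, len(cut_sets) + 1):
    for combo in combinations(cut_sets, k):
        union = set().union(*combo)
        exact += (-1) ** (k + 1) * p_cut(union)

print(rare, exact)
```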
Dynamic fault tree: The current FT model has evolved from a static FT model to a dynamic FT model. New types of gates were introduced in dynamic FT analysis in order to capture dynamic behavior. With the development of FT models, a variety of new or more efficient techniques have emerged in the past decades, such as BDDs, Markov models, and the modular approach.

c. Binary decision diagram
Figure 1.2 BDD for fault tree in Fig. 1.1.
A BDD is an alternative approach to the FT-based solution. BDDs can be interpreted as the representation of Boolean functions encoded with an if-then-else structure (Bryant, 1992). Several definitions of BDDs are given in Bryant (1986). A BDD is defined as a directed acyclic graph having two leaf nodes labeled "0" and "1", representing the Boolean functions 0 and 1. Each nonleaf node represents a disjoint combination of component failures and operating states. If the leaf node for a path is labeled "1", the path leads to system failure; if the leaf is labeled "0", the path represents an operational system configuration. Moreover, each nonleaf node has two outgoing edges: the edge labeled "1" represents the occurrence of failure of that component, and the edge labeled "0" represents the component in its operational state. Both qualitative and quantitative analysis of an FT can be achieved from the traversal of its BDD. Qualitatively, we can generate the cut sets by traversing the BDD from the root node to leaf node "1". Each path from the root node to leaf node "1" corresponds to one cut set; the elements of each cut set are the components whose outgoing edges labeled "1" lie on the path. For instance, the path (ABC1) in Fig. 1.2 generates the cut set {B, C}.
Quantitative analysis involves the calculation of the probability that system failure will occur, that is, the calculation of the probability that the top event of the FT will occur. By employing the BDD solution, the probability of top event occurrence is evaluated, as in an FT, by calculating the sum of the probabilities for all the cut sets. This calculation can be achieved by associating each edge labeled "1" with the probability of node failure Q_node; for the edge labeled "0", the associated probability is P_node = (1 − Q_node). Thus, the symbolic representation of the probability of top event occurrence (system failure) modeled in Fig. 1.2 is given by (QA QC + PA QB QC + PA PB QC QD).
While the FT approach also provides qualitative and quantitative analysis, highlights the most significant failure combinations, and shows where design changes can eliminate or reduce undesirable combinations, the BDD approach provides an exact calculation of the top event probability. The exact probability is useful when many high-probability events appear in the model. The BDD approach is also the most efficient approach for calculating probabilities: because the minimal paths generated in the BDD approach are disjoint, the calculation of importances and sensitivities can be done in an efficient and exact manner. For very large FTs having many AND and OR gates, in which many MCSs can be generated, the FT approach must often truncate the lowest-probability MCSs to calculate the probability of the top event in a relatively short time. The result of a BDD calculation is generally accurate to at least two significant figures, which is typically more accurate than the basic event probabilities once their uncertainties are considered. The BDD approach is thus more efficient and precise in quantifying probabilities and importances; however, the most information is provided by using both approaches. One major disadvantage of the BDD-based solution is that it is inadequate for capturing sequence dependencies in dynamic systems.

d. Markov models
In system reliability analysis, a combination of models is often used to simplify the calculation and to properly address the dynamics of the system behavior. For complex systems such as nuclear systems, dependencies in any subsystem are better handled by Markov analysis, and the repair process can be appropriately analyzed using a Markov model. A Markov model depicts the lifetime behavior of the system in a state-time space. The Markov modeling technique starts by representing the system as a number of distinct system states, each corresponding to a certain combination of component states. Transitions between these system states are governed by
events such as component failure or repair, common cause failures of components (for example, due to loss of offsite power), environmental factors, and so on. These transitions bring the time factor into the model. At any instant of time, the system is allowed to change its state in accordance with the competing processes that are appropriate for that plant state. In this way, the Markov model is able to model the system dynamically. The state probabilities of the system P(t) in Markov analysis are obtained by the solution of a coupled set of first-order, constant-coefficient differential equations, dP/dt = M·P(t), where M is the matrix of coefficients whose off-diagonal elements are the transition rates and whose diagonal elements are such that each of the matrix columns sums to zero. The sequence of random variables in which the future variables are determined by the present variables only, and not by how the evolution up to the present state has taken place, is characterized by the memoryless property of Markov chains.
Consider a repairable system made up of n components, where each component can be either in an operating state or a failed state. The system may require a certain number of components in the operating state for it to function as desired. To evaluate the system reliability, the following steps are adopted:
• Specification of the states the system can be in. That is, list all the system states as either operating or failed states. A component is in a failed state if there is a fault and in an operating state if the fault is repaired.
• Specification of the rates at which transitions between states take place. That is, list all possible transitions between the different states and identify the causes of all transitions. A transition could be due to a failure of a component or a repair made to a component.
• Computation of the solutions to the model. Calculate the probability of being in a state during a certain period of time.
This model is used to obtain numerical measures related to the probability of a given state, and the reliability and availability of a system or part of a system. Markov analysis is performed when dependencies between the failures of multiple components are expected. During such dependent failures, when the failure rates cannot be easily represented using a combination of FTs and standard time-to-failure and time-to-repair distributions, Markov analysis is carried out. Specific examples of application areas are standby redundancy configurations as well as common cause failures. Markov models provide great flexibility in modeling the timing of events. They can be applied when simple parametric time-based models, such as exponential or Weibull time-to-failure models, are not sufficient to
describe the dynamic aspects of a system's reliability or availability behavior. A sample application is given below.
In any industrial application, normal class IV power supply from the grid is backed up by a class III power system. Dedicated power from standby diesel generators (DGs) should be available to the class III buses on loss of normal supply from the class IV buses. In nuclear power plants, a simultaneous failure of both class IV and class III supply is called a station blackout (SBO) scenario. In order to determine the necessity of providing an alternate power source to meet the contingency arising out of simultaneous failure of class IV and class III power supply, that is, under SBO conditions, the frequencies of SBOs of different durations must be evaluated. Such a study, in turn, needs the reliability analysis of the systems concerned with blackout, that is, the class IV and class III power systems. Class IV reliability can be estimated from the grid failure frequency and associated data on component failures. Class III unavailability is better modeled using Markov chains. In order to calculate the SBO frequency as a function of time, one should know the probability that the DGs will be unavailable for a given interval of time. For a single DG, this can happen either because the DG failed to start and was not repaired in the given time interval, or because it started and then failed while running and was not repaired in the given time interval. For a specific DG configuration, say 1/2 or 2/4 (one out of two or two out of four DGs required), there will be many such possibilities that lead to the unavailability of the DGs.
Consider a discrete-state, continuous-time stochastic process {X(t), t ∈ T}. For any 0 < t1 < t2 < … < tn < t, if P{X(t) = x | X(t1) = x1, X(t2) = x2, …, X(tn) = xn} = P{X(t) = x | X(tn) = xn}, then the process {X(t), t > 0} is called a Markov process. That is, a Markov process is a stochastic process in which the probability law at time t depends only on the present state X(tn) = xn and is independent of the past history X(t1) = x1, X(t2) = x2, …, X(tn−1) = xn−1. This is the memoryless property.
A time-homogeneous, discrete-state, continuous-time Markov process is considered for DG unavailability. The state transition diagram developed to represent all possible states of the 1/2 DG configuration is shown in Fig. 1.3. The Markov process is homogeneous because the transition rates are independent of time. In Fig. 1.3, the number inside a circle denotes the number of DGs that have not failed and the number inside a square denotes the system state. The transition rates are denoted near the arrows.
Figure 1.3 Markov state transition diagram of the 1/2 DG configuration.
The notations used are as follows:
λf = failure rate of a DG
λr = repair rate of a DG
λccfr = CCF repair rate
λfij = failure rate of i out of j DGs due to CCF
The probability that the system is in state i at time 0 is represented as P(i, 0):
P(1, 0) = Pfts²; P(2, 0) = 2(1 − Pfts)Pfts; P(4, 0) = Pfts22; P(3, 0) = 1 − Σ over i ≠ 3 of P(i, 0)
where Pfts is the DG fail-to-start probability and Pftsij is the probability of failure to start of i out of j DGs due to CCF. The time evolution of the state probabilities is governed by the following differential equation:

\frac{dP}{dt} = A \times P

where A is the transition matrix,

A = \begin{bmatrix} -2\lambda_r & \lambda_f & 0 & 0 \\ 2\lambda_r & -\lambda_r - \lambda_f & 2\lambda_f & 0 \\ 0 & \lambda_r & -2\lambda_f - \lambda_{f22} & \lambda_{ccfr} \\ 0 & 0 & \lambda_{f22} & -\lambda_{ccfr} \end{bmatrix}

Once P(i, t) is obtained by solving the differential equation, many reliability parameters of interest can be computed from the knowledge of P(i, t). For example, the DG unavailability in the 1/2 mode at any time t is computed as UDG(t) = P(1, t) + P(4, t).
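The coupled equations dP/dt = A·P can be solved numerically. The sketch below is an illustration only: it uses simple forward-Euler integration of the four-state 1/2 DG model with assumed rates (the state labels are inferred from the transition matrix) and reports U_DG(t) = P1(t) + P4(t).

```python
# States (inferred): 1 = both DGs failed independently, 2 = one DG failed,
#                    3 = both DGs available, 4 = both DGs failed by CCF.
lam_f, lam_r = 1.0e-3, 5.0e-2        # assumed failure and repair rates (per hour)
lam_f22, lam_ccfr = 5.0e-5, 2.0e-2   # assumed CCF rate and CCF repair rate

A = [[-2*lam_r,  lam_f,            0.0,                  0.0],
     [ 2*lam_r, -(lam_r + lam_f),  2*lam_f,              0.0],
     [ 0.0,      lam_r,           -(2*lam_f + lam_f22),  lam_ccfr],
     [ 0.0,      0.0,              lam_f22,             -lam_ccfr]]

P = [0.0, 0.0, 1.0, 0.0]     # assume both DGs available at t = 0
dt, t_end = 0.01, 24.0       # hours

t = 0.0
while t < t_end:
    dP = [sum(A[i][j] * P[j] for j in range(4)) for i in range(4)]
    P = [P[i] + dt * dP[i] for i in range(4)]
    t += dt

print("U_DG(24 h) =", P[0] + P[3])
```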
32
Reliability and probabilistic safety assessment in multi-unit nuclear power plants
Figure 1.4 Markov state transition diagram for 2/4 DG configuration.
The probability that both DGs are not available for a time duration Tsb or more is evaluated using the critical operating states method. The probability that the DGs in 1/2 mode are not available for a duration of Tsb or more, given a power failure interval of Tlop, is given by

$$U_{DG12}(T_{sb}) = e^{-2\lambda_{r} T_{sb}}\,P(1,0) + P(4,0)\,e^{-\lambda_{ccfr} T_{sb}} + \int_{0}^{T_{lop}-T_{sb}} \left[ P(2,t)\,\lambda_{f}\,e^{-2\lambda_{r} T_{sb}} + P(3,t)\,\lambda_{f22}\,e^{-\lambda_{ccfr} T_{sb}} \right] dt$$
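The duration-dependent unavailability U_DG12(Tsb) can likewise be evaluated numerically. The self-contained sketch below implements the expression above by quadrature over the state probabilities; all parameter values (rates, Tlop, Tsb) are illustrative assumptions, not data from the text.

```python
# Sketch of U_DG12(T_sb): probability that the 1/2 DG system stays unavailable
# for at least T_sb hours within a loss-of-offsite-power interval T_lop.
import numpy as np
from scipy.integrate import solve_ivp, quad

lam_f, lam_r = 1.0e-3, 1.0e-1       # DG failure / repair rates (per hour), assumed
lam_f22, lam_ccfr = 1.0e-5, 5.0e-2  # 2-of-2 CCF rate / CCF repair rate, assumed
P_fts, P_fts22 = 1.0e-2, 1.0e-4     # fail-to-start probabilities, assumed
T_lop, T_sb = 24.0, 2.0             # power failure interval and blackout duration (h), assumed

A = np.array([[-2*lam_r,  lam_f,            0.0,                 0.0],
              [ 2*lam_r, -(lam_r + lam_f),  2*lam_f,             0.0],
              [ 0.0,      lam_r,          -(2*lam_f + lam_f22),  lam_ccfr],
              [ 0.0,      0.0,              lam_f22,            -lam_ccfr]])
P0 = np.array([P_fts**2, 2*(1 - P_fts)*P_fts, 0.0, P_fts22])
P0[2] = 1.0 - P0.sum()

sol = solve_ivp(lambda t, p: A @ p, (0.0, T_lop), P0, dense_output=True)

# States that are already unavailable at t = 0 and remain unrepaired for T_sb
u = P0[0]*np.exp(-2*lam_r*T_sb) + P0[3]*np.exp(-lam_ccfr*T_sb)

# Transition into an unavailable state at time t, followed by no repair for T_sb
def integrand(t):
    p = sol.sol(t)
    return (p[1]*lam_f*np.exp(-2*lam_r*T_sb)        # last running DG fails
            + p[2]*lam_f22*np.exp(-lam_ccfr*T_sb))  # CCF of both running DGs

u += quad(integrand, 0.0, T_lop - T_sb)[0]
print(f"U_DG12(T_sb = {T_sb} h) ~ {u:.3e}")
```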
Fig. 1.4 gives the state transition diagram for the 2/4 DG configuration. The starting state for the 2/4 DG configuration is not indicated in the diagram. The states involving common cause failures (e.g., 2CF, 1RF means two DGs have failed due to a common cause and one due to an independent cause) are indicated. The initial conditions P(i, 0) for the 2/4 configuration are as follows:

$$P(1,0) = P_{fts}^{4}; \quad P(2,0) = 4\,P_{fts}^{3}(1-P_{fts}); \quad P(3,0) = 6\,P_{fts}^{2}(1-P_{fts})^{2}; \quad P(4,0) = 4\,P_{fts}(1-P_{fts})^{3};$$
$$P(5,0) = 1 - \sum_{i \ne 5} P(i,0); \quad P(6,0) = P_{fts44}; \quad P(7,0) = 6\,P_{fts24}(1-P_{fts})^{2}; \quad P(8,0) = 12\,P_{fts24}\,P_{fts}(1-P_{fts});$$
$$P(9,0) = 6\,P_{fts24}\,P_{fts}^{2}; \quad P(10,0) = P_{fts}\,P_{fts34}; \quad P(11,0) = 4\,P_{fts34}(1-P_{fts}); \quad P(12,0) = 6\,P_{fts24}^{2}.$$
The transition matrix A for the 2/4 DG configuration is constructed in the same way from the state transition diagram of Fig. 1.4; its diagonal elements are

$$a_{33} = -(2\lambda_{f} + \lambda_{f22} + 2\lambda_{r}), \qquad a_{44} = -(3\lambda_{f} + 3\lambda_{f23} + \lambda_{f33} + \lambda_{r}),$$
$$a_{55} = -(4\lambda_{f} + 6\lambda_{f24} + 4\lambda_{f34} + \lambda_{f44}), \qquad a_{77} = -(\lambda_{ccfr} + 2\lambda_{f} + \lambda_{f22}),$$
$$a_{88} = -(\lambda_{ccfr} + \lambda_{f} + \lambda_{r}), \qquad a_{10,10} = -(\lambda_{ccfr} + \lambda_{r}), \qquad a_{11,11} = -(\lambda_{ccfr} + \lambda_{f}).$$

Once the transition matrix is known, one can proceed in the same fashion as in the 1/2 DG case and obtain P(i, t). The unavailability at time t is computed as

$$U_{2/4}(t) = 1 - \sum_{i=3,4,5,7} P(i,t).$$
The probability that the DGs in 2/4 mode are not available for a duration of Tsb or more, given a power failure interval of Tlop, is obtained in a similar way:

$$U_{DG24}(T_{sb}) = e^{-3\lambda_{r} T_{sb}}\,P(2,0) + P(8,0)\,e^{-(\lambda_{ccfr}+\lambda_{r}) T_{sb}} + \big[P(6,0) + P(10,0) + P(11,0) + P(9,0)\big]\,e^{-\lambda_{ccfr} T_{sb}} + P(12,0)\,e^{-2\lambda_{ccfr} T_{sb}}$$
$$\quad + \int_{0}^{T_{lop}-T_{sb}} \Big[ P(3,t)\big(2\lambda_{f}\,e^{-3\lambda_{r} T_{sb}} + \lambda_{f22}\,e^{-\lambda_{ccfr} T_{sb}}\big) + P(4,t)\big(3\lambda_{f23}\,e^{-(\lambda_{ccfr}+\lambda_{r}) T_{sb}} + \lambda_{f33}\,e^{-\lambda_{ccfr} T_{sb}}\big)$$
$$\qquad + P(5,t)\big(\lambda_{f44}\,e^{-\lambda_{ccfr} T_{sb}} + 4\lambda_{f34}\,e^{-\lambda_{ccfr} T_{sb}}\big) + P(7,t)\big(2\lambda_{f}\,e^{-(\lambda_{ccfr}+\lambda_{r}) T_{sb}} + \lambda_{f22}\,e^{-2\lambda_{ccfr} T_{sb}}\big) \Big]\,dt$$
d. Failure modes and effects analysis
Another methodology for reliability analysis is failure modes and effects analysis (FMEA). FMEA is a systematic, analytical, bottom-up approach used to plan for defect prevention and mistake proofing. It is a technique for identifying and focusing on those areas of the design and manufacturing process where nonconformances in a system can be prevented, reduced, or eliminated. The final result of the analysis is a risk priority number (RPN) for each component of the system and remedial actions for those components, particularly for the failure modes with unacceptably high RPNs. FMEA is generally performed before a fault tree (FT) analysis to identify the potential failure modes to be modeled.

Reliability terms
Failure rate: The number of failures per unit of stress. The stress can be expressed in various units, most commonly time. Failure rate is usually denoted by λ.
Repair rate: The average number of times the system is likely to be repaired in a given period of time. Repair rate is generally denoted by μ (λr earlier in this chapter).
MTBF: Mean time between failures; for a constant failure rate λ and negligible repair time, MTBF ≈ 1/λ. When applied to repairable products, this is the average time that a system will operate until the next failure.
MTTR: Mean time to repair. This is the average elapsed time between a unit failing and its being repaired and returned to service.
MTTF: Mean time to fail. This is the average time taken for a unit to fail.
Denoting the successive uptime intervals by a and the downtime (repair) intervals by b:
MTTF = average of the time values a; MTTR = average of the time values b; MTBF = MTTF + MTTR.
The distinction between MTBF and MTTF does not arise unless the reliability of a system subject to repair is considered.
Availability: The proportion of time a system is operable. This is only relevant for systems that can be repaired and is given by

$$\text{Availability} = \frac{\text{Uptime}}{\text{Total time}} = \frac{\text{Average of }(a)}{\text{Average of }(a) + \text{Average of }(b)} = \frac{MTTF}{MTTF + MTTR}.$$
Maintainability: The ability to restore a component/system to perform its intended function within a specified period of time.
Reliability: The probability that an item will perform a required function for a given period of time under stated operating conditions. For a specific component, the following are required:
1. a clear description of the function,
2. a clear description of all possible failures (not performing the defined function),
3. a clear description of the operating conditions, and
4. a clear definition of the units of the time period.
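The terms defined above can be illustrated with a small worked computation. The numbers below are made up purely for illustration; the interval labels a and b correspond to the uptimes and downtimes referred to in the definitions.

```python
# Toy illustration of the reliability terms above, using made-up uptime (a)
# and downtime (b) intervals in hours; the numbers are purely illustrative.
uptimes   = [120.0, 95.0, 150.0, 110.0]   # intervals 'a' between repair and next failure
downtimes = [4.0, 6.0, 5.0, 5.0]          # intervals 'b' spent under repair

mttf = sum(uptimes) / len(uptimes)        # mean time to fail
mttr = sum(downtimes) / len(downtimes)    # mean time to repair
mtbf = mttf + mttr                        # mean time between failures
failure_rate = 1.0 / mttf                 # constant-failure-rate approximation, lambda
availability = mttf / (mttf + mttr)       # = uptime / total time

print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, MTBF = {mtbf:.1f} h")
print(f"lambda = {failure_rate:.4f} per h, Availability = {availability:.4f}")
```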
Further readings
[1] H. Pham, Handbook of Reliability Engineering, Springer-Verlag, London, 2003.
[2] W.R. Blischke, D.N.P. Murthy, Case Studies in Reliability and Maintenance, John Wiley & Sons, Inc., New Jersey, 2003.
[3] W.R. Blischke, D.N.P. Murthy, Reliability: Modeling, Prediction and Optimization, Wiley, New York, 2000.
[4] R.R. Fullwood, Probabilistic Safety Assessment in the Chemical and Nuclear Industries, Elsevier, Boston, 2000.
[5] T.T. Soong, Fundamentals of Probability and Statistics for Engineers, John Wiley & Sons, Ltd., New York, 2004.
[6] S.M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, Wiley, New York, 1987.
[7] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, third ed., John Wiley & Sons, Inc., USA, 2002.
[8] A.K. Verma, A. Srividya, D.R. Karanki, Reliability and Safety Engineering, Springer-Verlag, London, 2010.
[9] L.S. Srinath, Concepts in Reliability Engineering, East West Press (P) Ltd., 1985.
[10] L.S. Srinath, Reliability Engineering, third ed., East West Press (P) Ltd., 1991.
[11] K.B. Misra, New Trends in System Reliability Evaluation, Elsevier Science, 1993.
[12] E. Zio, Reliability engineering: old problems and new challenges, Reliab. Eng. Syst. Saf. 94 (2009) 125–141.
[13] IAEA, Procedure for Conducting Probabilistic Safety Assessment of Nuclear Power Plants (Level 1), Safety Series No. 50-P-4, International Atomic Energy Agency, Vienna, 1992.
[14] A. Pages, M. Gondran, System Reliability: Evaluation and Prediction in Engineering, North Oxford Academic, 1985.
[15] J.D. Andrews, T.R. Moss, Reliability and Risk Assessment, Longman Scientific and Technical, UK, 1993.
[16] C.L. Atwood, et al., Evaluation of Loss of Offsite Power Events at Nuclear Power Plants: 1980–1996, NUREG/CR-5496, INEEL/EXT-97-00887, USNRC, USA, 1998.
[17] R.E. Barlow, F. Proschan, Importance of system components and fault tree events, Stochast. Process. Applic. 3 (1975) 153–173.
[18] P.W. Baranowsky, Evaluation of Station Blackout Accidents at Nuclear Power Plants, NUREG-1032, USNRC, USA, 1988.
[19] R.E. Battle, D.J. Campbell, Reliability of Emergency AC Power Systems at Nuclear Power Plants, NUREG/CR-2989, Oak Ridge National Laboratory, 1983.
[20] R.E. Bryant, Symbolic Boolean manipulation with ordered binary-decision diagrams, ACM Comput. Surv. 24 (3) (1992).
[21] R.E. Bryant, Graph-based algorithms for Boolean function manipulation, IEEE Trans. Comput. C-35 (8) (1986) 677–691.
[22] D.T. Gillespie, Markov Processes: An Introduction for Physical Scientists, Academic Press, California, 1992.
[23] J.B. Dugan, S.J. Bavuso, M.A. Boyd, Dynamic fault tree models for fault tolerant computer systems, IEEE Trans. Reliab. 41 (3) (1992) 363–377.
[24] J.B. Dugan, S.J. Bavuso, M.A. Boyd, Fault trees and Markov models for reliability analysis of fault tolerant systems, Reliab. Eng. Syst. Saf. 39 (1993) 291–307.
CHAPTER 2
Introduction to probabilistic safety assessment

Contents
2.1 Safety approach in NPPs—defense-in-depth
2.2 Need for PSA
2.3 Regulatory decision making with PSA insights
   2.3.1 Probabilistic safety goals/criteria
2.4 Approaches for regulatory decisions
   2.4.1 Risk-informed approach
   2.4.2 Risk-based approach
   2.4.3 Risk-informed, performance-based approach
2.5 Quality assurance
   2.5.1 Management of QA activities
   2.5.2 Structure of QA program
2.6 Standardization of PSA
2.7 PSA methodology
   2.7.1 Data
   2.7.2 Failure mode selection
   2.7.3 External events
   2.7.4 Computer code
   2.7.5 A graded approach to risk evaluation
   2.7.6 Level-1 PSA
   2.7.7 Level-2 PSA
   2.7.8 Level-3 PSA
2.8 Initiating event frequency
2.9 Component data
   2.9.1 Component reliability models
2.10 Human reliability
2.11 Dependence analysis
   2.11.1 Complete dependency
   2.11.2 High dependency
   2.11.3 Moderate dependency
   2.11.4 Low dependency
   2.11.5 Zero dependency
2.12 Passive systems
   2.12.1 Category A
   2.12.2 Category B
   2.12.3 Category C
   2.12.4 Category D
2.13 Software reliability
   2.13.1 Black-box reliability models
   2.13.2 White-box reliability models
2.14 Uncertainty analysis
2.15 Sensitivity analysis
2.16 Importance measures
   2.16.1 Risk achievement worth
   2.16.2 Risk reduction worth
   2.16.3 Birnbaum importance
2.17 Applications of PSA
   2.17.1 Design of NPPs
   2.17.2 Operation of NPPs
Further readings
Probabilistic safety assessment (PSA) is a systematic and comprehensive tool to evaluate risk in a complex engineered technology such as a nuclear power plant (NPP), a chemical plant, or any safety-critical application in general. The main focus of this book is on applications in the nuclear industry, and hence PSA is discussed in the context of NPPs. Though many small-scale risk assessments had been carried out in various countries, the reactor safety study reported in WASH-1400 was significant because it was the first large-scale, comprehensive PSA, performed in the 1970s in the USA with the aim of examining the individual risks from a large, complex nuclear facility. The results were compared with risks from other natural and man-made sources. The WASH-1400 study set a standard for risk assessments of NPPs that is still being followed. Two important aspects were highlighted by the study: first, a PSA requires a multidisciplinary team so that important safety information is not inadvertently omitted, and second, the report requires a thorough review by a multidisciplinary team. These aspects are the key factors in reducing uncertainties to a large extent. Other benefits of PSA include identification of plant strengths and weaknesses in design and operation. In addition, it provides inputs to decisions on design and backfitting, plant operation, safety analysis, and regulatory issues. PSA has been accepted all over the world as an important tool to assess the safety of a facility and to aid in ranking safety problems by order of importance. A major advantage of PSA is that it allows for the quantification of uncertainties in safety assessments, together with the quantification of expert opinion and/or judgment. PSA is considered to complement the deterministic analysis for design basis events; for beyond design basis accidents that consider multiple failures,
including operator errors and low-probability, high-consequence events, it supplements that information. Although PSA was expected to form a basis for arriving at probabilistic acceptance criteria, quantitative goals as a regulatory requirement were not universally accepted, owing to uncertainties in scope and to the subjective judgment involved in treating common cause failures (CCFs), human reliability, software reliability, and passive system reliability. Nevertheless, it was universally accepted that PSA is a tool that provides important insights into the strengths and weaknesses of the design and operation of the NPP under investigation and identifies possible ways to improve the safety of the plant.

The confidence in PSA results depends on the uncertainties in data and models, CCFs, human reliability, and software reliability. For example, if the plant is designed with adequate separation between systems, many causes of dependence, interaction, and CCF are eliminated and the systems can be treated as independent. This simplifies the model and reduces the vulnerability of the plant, systems, and components to dependent failures. However, if such provisions are not made in the design, the vulnerability to common cause failures and complex system interactions would be large and difficult to assess. Similarly, the influences of human factors and software reliability are the most difficult to quantify. In proven designs, the influence of human error is substantially reduced by automation and an improved man–machine interface, but the assessment of software reliability is not universally accepted.

Internationally, regulatory bodies require a technical basis to establish safety goals and prescribe acceptance criteria so that an acceptable level of risk to the public and the environment can be ensured. Safety goals are formulated in both qualitative and quantitative terms. The qualitative objective in the design and operation of an NPP is that the risk to an average individual near a plant must be much less than the risks from all other sources to which the individual is exposed. The quantitative objective is to limit the risk from the NPP to less than one tenth of that from other causes. Although there are concerns about the uncertainty in input data and models, there is a strong belief that PSA can serve as a useful tool to identify dominant accident sequences, compare different plant states, and provide a systematic way of verifying the operating and maintenance procedures.

The PSA methodology integrates information on plant design, component reliability, operating practices and history, human behavior, and postulated initiating events (PIEs); it develops the PIEs into accident sequences and finally identifies potential environmental and health effects. This helps in focusing on issues such as deficiencies and plant vulnerabilities, risk contributors, sensitivity of governing parameters, and uncertainties of numerical results. PSA also provides information on plant-specific risks and ranks them by severity. The benefits that accrue
from PSA outweigh its limitations, and PSA has proven to be an important tool in the safety assessment of nuclear reactors throughout the world. Many regulatory bodies use it within a risk-informed approach to decision making on many safety issues. Over the years, methodology improvements have been made and advanced applications of PSA have emerged. Some of them are: Living PSA/Risk Monitor, Technical Specifications Optimization, Reliability-Centered Maintenance, and Risk-Based In-Service Inspection.
2.1 Safety approach in NPPs—defense-in-depth
The fundamental safety approach in NPPs is the defense-in-depth concept, with multiple layers of safety. Defense-in-depth is a comprehensive approach to ensure, with high confidence, that the public and the environment are protected from any hazards posed by the use of nuclear power for the generation of electricity. As stated in safety principles 61 and 62 of INSAG-3, safety involves the prevention or reduction of potential exposure and other risks. The basic concept of the defense-in-depth approach is to prevent accidents and, if prevention fails, to limit the potential consequences of accidents and prevent their evolution to more serious conditions. It ensures not only that a plant is designed, constructed, and operated to function safely during normal operation, but also that possible accident scenarios are taken care of. The different levels of defense-in-depth of a nuclear reactor are:
– Level 1: Prevention of abnormal operation and failures
– Level 2: Control of abnormal conditions and detection of failures
– Level 3: Control of accidents within the design basis
– Level 4: Control of severe plant conditions, including prevention of accident progression and mitigation of the consequences of severe accidents
– Level 5: Mitigation of radiological consequences of significant releases of radioactive materials
The first level is achieved by conservative design and high quality of components in construction and operation, redundancy, separation, testing, and operating practices. The second level of defense-in-depth is provided by inherent plant features, which include automatic functions and control systems that can bring the plant to a safe operating mode in a short time. The third level is achieved by engineered safety features (ESFs) and accident procedures to prevent severe accidents and to confine radioactive materials within the containment; the aim here is to prevent core damage. At the fourth level, in addition to engineered provisions to reduce the risk and severity of accidents, guidelines for severe accident management are
made available. The aim is to have a comprehensive management of severe accidents in addition to the first three levels of defense. The likelihood of an accident involving severe core damage, and the magnitude of the radioactive releases in such an unlikely event, are kept as low as reasonably achievable. The consequences of an accident are mitigated by functions that protect the containment, such as containment cooling, penetration control, and hydrogen management. Level 5 of defense-in-depth is the offsite emergency plan to limit the consequences of radiation in the event of radioactive releases. In spite of all these measures, radioactive releases may occur, though with a low probability of occurrence. Emergency plans are prepared to meet such an eventuality and to limit the public exposure to radiation. Safety analysis should assess whether all levels of defense-in-depth have been provided and are preserved. The deterministic safety analysis should formally assess the performance of the plant under various plant states against applicable acceptance criteria. PSA essentially aims at identifying the events and their combination(s) that can lead to severe accidents, assessing the probability of occurrence of each combination, and evaluating the consequences.
2.2 Need for PSA
The impact of nuclear accidents has been a point of debate ever since nuclear reactors were first constructed. Whatever technical measures are adopted to minimize the risk of an accident, failures are inevitable. Several nuclear accidents have taken place, with varied impact on the public and the environment. The causes of failure include equipment failure, maintenance issues, improper use, aging, human errors, organizational issues, etc. In order to reduce the likelihood of failures and minimize the risk of an accident, a systematic methodology is required to understand why these failures occurred and to prevent such occurrences in the future. As nuclear reactors are required to be operated for a long period of time with complex systems, the risk associated with the plant needs to be assessed. Every component has a lifecycle, with associated hazards at each stage described in the form of a bathtub curve (Fig. 2.1). The hazards in the component lifecycle can be categorized into three regions. The first region has a decreasing failure rate, known as early failures (defective parts, manufacturing defects, fabrication errors, poor quality control, etc.). The second region has a constant failure rate; this region is also called the useful life of the component, as failures are due only to random causes (environment, random loads, etc.). The third region has an increasing failure rate due to component wear-out processes (fatigue, corrosion, friction, cyclic loading, etc.).
Figure 2.1 Bathtub curve.
Early failures can be reduced by 100% testing, screening, burn-in tests, etc. Random failures can be reduced by providing redundancy, designing in excess strength, etc., and wear-out failures can be reduced by derating, replacement of parts, preventive maintenance, etc.
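One common way to reproduce the bathtub shape of Fig. 2.1 is to superpose a decreasing early-failure hazard, a constant random-failure hazard, and an increasing wear-out hazard. The brief sketch below does this with three Weibull hazard terms whose parameters are chosen purely for illustration.

```python
# A minimal sketch of a bathtub-shaped hazard curve, modelled here (as one common
# assumption) as the sum of three Weibull hazards: a decreasing early-failure term
# (shape < 1), a constant random-failure term (shape = 1), and an increasing
# wear-out term (shape > 1). Parameter values are illustrative only.
import numpy as np

def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

t = np.linspace(0.1, 20.0, 200)                          # operating time (arbitrary units)
h_early   = weibull_hazard(t, shape=0.5, scale=2.0)      # infant mortality, decreasing
h_random  = weibull_hazard(t, shape=1.0, scale=10.0)     # constant useful-life rate
h_wearout = weibull_hazard(t, shape=4.0, scale=15.0)     # wear-out, increasing

h_total = h_early + h_random + h_wearout                 # bathtub-shaped overall hazard
for i in (0, 50, 100, 150, 199):
    print(f"t = {t[i]:5.1f}  h(t) = {h_total[i]:.3f}")
```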
2.3 Regulatory decision making with PSA insights
PSA is increasingly used all over the world as part of the decision-making process in NPPs, and there is consensus among international regulatory bodies that deterministic analysis should be complemented with PSA, as part of the defense-in-depth concept, in safety assessment and in making decisions about the safety of the plant. The current state of the art in PSA has matured sufficiently, at least with regard to Level-1 PSA, and is consistently used to balance actions during NPP operation so as to minimize the risk contribution. Effective changes to the plant, such as optimal system configurations, procedural changes, and design modifications, are made using PSA as a tool. To indicate the plant risk dynamically at any given time, the concepts of Living PSA and Risk Monitoring came into use. While a Living PSA indicates the safety status of the plant dynamically, taking into account system changes, a risk monitor indicates the impact on safety in real time if a system/component is unavailable. The main goal of a Living PSA is to provide the live safety status, and it can be executed with a given periodicity. The risk monitor, on the other hand, is a tool for the operator aimed at assisting maintenance strategies, aging management, and system configuration decisions.
The need for using PSA/PRA technology in regulatory matters is widely appreciated. The USNRC, in its PRA policy statement (60 FR 42622, Aug 16, 1995), states in part: "The use of PRA technology should be increased in all regulatory matters to the extent supported by the state of the art in PRA methods and data and in a manner that complements the NRC's deterministic approach and supports the NRC's traditional defense-in-depth philosophy." The practices of different regulatory bodies regarding the requirement for PSA at different stages/activities of new as well as operating nuclear power plants vary considerably. For example, UK regulations require all three levels of PSA, along with the preliminary safety analysis report, for "construction clearance" of new NPPs. Proposals for design modifications and technical specification changes in plant operations, particularly with regard to AOT and STI, require PSA backing as a mandatory requirement in countries such as South Africa, the Netherlands, Finland, and Switzerland. The USNRC requires that all NPPs carry out an individual plant examination (IPE), which is PSA Level 2 plus. Although Japan does not mandate a PSA for licensing of a new plant or reauthorization of an operating plant, it requires an accident management plan based on PSA insights. Decisions with PSA insights require a strong technical basis and depend on the level of detail adopted in the PSA model in terms of quality, consistency, completeness, use of operating experience, uncertainties in the results, use of deterministic insights, and the review process. The uncertainties in modeling, the assumptions made, the variability of the input data, and their propagation to the final PSA results require detailed analysis and quantification, as these influence the decision-making process. Establishing attributes and targets/reference values, called probabilistic safety goals or criteria, based on PSA insights is also an important step.
2.3.1 Probabilistic safety goals/criteria
In PSA, different probabilistic safety goals/criteria (PSG/PSC) are adopted for different decision-making processes. For Level-1 PSA, these are specified at the level of system reliability, the contribution of accident sequences to core damage, and the core damage frequency (CDF); for Level-2 PSA, the safety criteria address the frequency of radioactivity release from the containment and the reliability of the containment and associated ESFs; and for Level-3 PSA, the radiological consequences to individuals and the public form the safety criteria. INSAG-6 recommends the use of PSG/PSC-based acceptance criteria for safety decisions, as criteria set at the safety function or system level are useful to check the adequacy of safety features (redundancy, diversity, fail-safe criteria). INSAG-3 proposes targets for CDF, which is the most common measure of risk in NPPs, at least at
the Level-1 PSA stage, that is, 10⁻⁴ per reactor-year for existing plants and 10⁻⁵ per reactor-year for new plants. INSAG-3 also proposes targets for a large release of radioactive material, one with severe implications for society that would require offsite emergency responses: 10⁻⁵ per reactor-year for existing plants and 10⁻⁶ per reactor-year for new plants. With regard to new plants, a target of eliminating accident sequences that could lead to a large early radioactive release is proposed in INSAG-12. Severe accidents that could imply late containment failure would be considered in the design process with realistic assumptions and best estimate analyses, so that their offsite radiological impact would necessitate only protective measures limited in area and time. In some countries, the goal for the individual risk of mortality is taken to be 10⁻⁶ per reactor-year for members of the public.
2.4 Approaches for regulatory decisions
There are two main approaches to regulatory decision making with PSA insights: the risk-informed and risk-based approaches.
2.4.1 Risk-informed approach
When regulatory decisions are based primarily on deterministic inputs (design, operations, defense-in-depth, etc.) and supported by the quantified estimates obtained from risk assessment, the approach is called risk-informed. A risk-informed approach considers other applicable factors, such as cost–benefit, the remaining life of the plant, dose to workers, and operating experience, to establish requirements that better focus licensee and regulatory attention on design and operational issues. In other words, a risk-informed approach enhances the deterministic approach and reduces unnecessary conservatism. In applications where there is no clear way to identify the dominant contributor to risk, or where there is a need to compare various design options, plant configurations, testing strategies, etc., PSA is used for relative ranking of attributes without the need for a reference value. However, when an application requires comparison with a reference value, this approach can be used to (a) judge whether a calculated risk value is acceptable, (b) assess the acceptability of a proposed change to the plant that would produce a calculated increase in risk, or (c) assess the need for a change in design or operational practices to reduce the level of risk. The structured process of the risk-informed approach recognizes the mandatory requirements, insights from deterministic and probabilistic analyses, and any other relevant factors (Fig. 2.2).
Figure 2.2 Elements of NRC integrated decision-making process (Source: IAEA 2005).
Mandatory requirements include the legal and regulatory requirements prescribed to keep the risk as low as reasonably achievable. Deterministic requirements include the conservative rules and traditional defense-in-depth principles, with sufficient safety margins and the single failure criterion. Probabilistic requirements typically include the risk metrics and safety criteria on CDF and large early release frequency. Other requirements pertain mainly to economic considerations to balance the cost–benefit ratio, considering the remaining life of the plant, radiation dose to workers, etc. All four elements are properly assessed and weighed, depending on the issue, to arrive at a decision. The final step is to monitor the effectiveness of the decision. The weights assigned to the deterministic and probabilistic elements portray the confidence that a regulatory body has in the probabilistic risk assessment. With a high degree of expertise in probabilistic assessment, and with a comprehensive methodology of adequate quality that addresses the associated uncertainties to arrive at realistic estimates, probabilistic risk insights can be used effectively in the regulatory decision-making process. In the context of multiunit risk assessment, probabilistic evaluation helps to identify the challenges of inter-unit dependencies at a site and provides both qualitative and quantitative assessment of the risks to enhance safety. Neither the traditional process that focuses on deterministic requirements
nor a risk-based approach alone is sufficient; there is a need for a risk-informed decision-making process that uses the best attributes of both the deterministic and risk-based processes.
2.4.2 Risk-based approach
When regulatory decisions are based purely on the quantified estimates obtained from risk assessment, the approach is called risk-based. It relies more heavily on risk assessment results than is currently practicable for reactors. The use of probabilistic estimates provides regulators with useful insights for decisions on dose limits. This approach helps to identify the major risks and to prioritize efforts to minimize or eliminate them. Decisions are made by comparison with reference values, overriding deterministic considerations, and the reference values are taken as the PSC.
2.4.3 Risk-informed, performance-based approach
Apart from the two approaches mentioned above, some countries also follow a risk-informed, performance-based approach to assess the performance of NPPs. This approach relies on measurable results, or on performance, rather than on a prescriptive process. Performance criteria are established by identifying parameters and their allowable margins based on deterministic analysis, risk insights, and operating experience. It is believed that a risk-informed, performance-based approach is more suitable for regulation, as the focus is on performance objectives; it combines risk insights, engineering analysis, and judgment, including the principle of defense-in-depth, with consideration of safety margins and performance history. Further, it allows for more innovation without compromising safety. The expected outcomes are verified through simulations.
2.5 Quality assurance
In the early days, carrying out a PSA project was itself considered a difficult task, and quality assurance (QA) was an unplanned activity; even when included, it was not systematically organized. Much later, it was realized that QA is critical to ensuring the correctness and adequacy of a PSA model. Moreover, for a PSA of an NPP, multidisciplinary teamwork with participants having expertise in highly specialized areas is required to provide thorough details of the plant design, operations, and appropriate PSA techniques. Selection of staff, communication, computer software configuration control, and document
control are crucial to the effectiveness and quality of a PSA. Therefore, it is recommended that a detailed QA program be established and made effective in every PSA program, as it enhances confidence in the PSA models and results. An effective QA program ensures that the assumptions and modeling techniques are thoroughly checked and documented and that the approximations and assumptions in the modeling will not lead to erroneous results. Establishing a QA program is an essential aspect of good management and is fundamental to achieving a quality PSA.
2.5.1 Management of QA activities
When a PSA project is planned in an organization, it is essential to have a detailed QA program that includes complete details of all the activities, starting from the organizational hierarchy, information flow, functional responsibilities, and levels of authority for those managing and performing the required actions. These details should be accompanied by a clear description of the objectives and scope of the project, the approach to be adopted, documents on the plant-specific configuration, interfaces between activities, appropriate training for the staff involved in the PSA activity, and the review processes.
2.5.2 Structure of QA program
A QA program needs to have a clear description of how the PSA will benefit the organizational process. It should also outline the evaluation procedure in terms of completeness, consistency, accuracy, document control, and configuration control, along with the requirements for the risk assessment software. As explained in IAEA TECDOC-1101, the QA program should be developed and documented to cover the QA program description, management documents, and working documents. In any PSA activity, nonconformances can be present in various elements such as input data, modeling, target criteria (as set by regulatory bodies), minimal cut-set combinations, design aspects, operational aspects, etc. Applicable procedures and working instructions should be used to handle nonconforming work. Results not conforming to the design or operating framework, deviations in data modeling (e.g., failure data, CCFs, HRA), and deviations from the probabilistic safety criteria/goals set by analysts/regulatory bodies should be appropriately resolved and documented. PSA document and information control should cover the scope of the work proposed, a general description of the plant, the extent of the study, and a specification of the output products, that is, the methodology, initiating events, event tree development, fault tree
analysis, and the integration of the various parts or stages of the analysis; all of these need to be defined and included in the documentation. In addition, regular updates such as refinements of assumptions, revisions of input data, and modifications to PSA models, data, information, and results, including changes to requirements, scope, objectives, etc., need to be subjected to QA. The reasons for such modifications also need to be documented. If significant changes are carried out, a new version or an update of the previous PSA version may be created. Along with detailed QA documents, a comprehensive multitier review process is required. As a minimum, the review process should include an in-house review by experts and an independent regulatory review.
2.6 Standardization of PSA
Another important aspect of the QA of PSA is standardization. Considering the various sources of uncertainty, such as lack of component failure information, the definition of system boundary conditions and the associated unavailability information, system modeling, event sequences for all possible situations, and the uncertainties in modeling CCFs, HRA, source terms, and release consequences, standardization is imperative for aspects such as the use of proper failure data and models (including CCFs and human error probabilities), the methodology for quantification of end states/consequences, the overall uncertainty analysis of the results, and the computer codes used. Only this can make the PSA insights more meaningful in the risk-informed decision-making process.
2.7 PSA methodology
Any modeling technique used for risk assessment should be versatile and adaptable to incorporating modifications. The system modeling and event sequence modeling should allow easy integration of modules and produce realistic results with low uncertainties. It should be possible to model multiple core damage categories as per the defined acceptance criteria. In PSA, common cause and human reliability analyses are very important, and therefore the modeling techniques should take care of these aspects appropriately. β-factor modeling should be limited to redundancies of up to three components, and the independent failure probability should be taken as the highest of the values of these components. Where the redundancy is greater (≥4), the Multiple Greek Letter (MGL) model or the α-factor model is preferable. It is recommended to consider human error probabilities (HEP) in the range of
1E-2 to 1E-4. A value in between can be assigned in FT/ET quantification using a simple human error matrix: jobs involving humans can be classified as simple, moderately complex, or complex; human error attributes as less experienced, experienced, or highly experienced; and the time available for completion of the job as more than half an hour, 10–30 minutes, or less than 10 minutes. If the minimal cut sets indicate a human error event as a dominant contributor, detailed human error modeling needs to be done to arrive at more realistic values. Appropriate modeling of shared systems and passive features should include the requirements of the technical specifications for the operation of the unit. These requirements for shared systems include maintenance and surveillance aspects, conditions for sharing between units based on the system/component capability, human actions required, etc. The functional requirements for passive system operation include the physical phenomena, driving force, environmental conditions, application of margins, etc. Support systems/components and human actions should be modeled in the fault trees so that the simpler approach of large fault tree and small event tree analysis can be followed. If a support system has the potential to impact more than one mitigating system, it can be placed as a header in the event tree before the mitigating systems.
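As a simple illustration of the β-factor treatment mentioned above, the sketch below splits an assumed component failure rate into an independent part and a common cause part for a 1-out-of-2 redundant pair, and compares the resulting system failure probability with the value obtained when CCF is ignored. All numbers are illustrative assumptions.

```python
# Sketch of the beta-factor common cause treatment: a fraction beta of each
# component's total failure rate is attributed to a common cause event that
# fails all redundant components together. Values are illustrative assumptions.
import math

lam_total = 1.0e-3   # total failure rate of one component (per hour), assumed
beta = 0.10          # assumed beta factor: fraction of the rate due to common cause
t_mission = 24.0     # mission time (hours), assumed; 1-out-of-2 redundancy

lam_ind = (1.0 - beta) * lam_total   # independent part of each component's rate
lam_ccf = beta * lam_total           # common cause rate failing both components

q_ind = 1.0 - math.exp(-lam_ind * t_mission)   # failure prob. of one component (independent causes)
q_ccf = 1.0 - math.exp(-lam_ccf * t_mission)   # probability of the common cause event

q_system = q_ind**2 + q_ccf          # rare-event approximation for the 1-out-of-2 system
q_no_ccf = (1.0 - math.exp(-lam_total * t_mission))**2   # if CCF were ignored

print(f"1oo2 failure probability with beta factor : {q_system:.3e}")
print(f"1oo2 failure probability ignoring CCF     : {q_no_ccf:.3e}")
```

The comparison shows why dependent failures typically dominate the unreliability of highly redundant trains: the common cause term, although only a small fraction of the total rate, is first order rather than second order.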
2.7.1 Data
It is preferable to use plant-specific data for carrying out a PSA. However, such data are not always available, and failure data from similar plants and generic data are then used with discretion and expert judgment. Statistical techniques such as the Bayesian approach are recommended to update the existing data.
2.7.2 Failure mode selection
Generally, failure modes and effects analysis (FMEA) is adopted to identify the potential failure modes of a component that may impact the safety of the plant. The unavailability model for each component (repairable, nonrepairable, tested, standby, etc.) should be chosen as per the technical specifications and plant layout.
2.7.3 External events
While considering external events in PSA, the simultaneous occurrence of two external events, or one external event inducing another, should be taken into account. Some external events (e.g., a flood) may disrupt the evacuation routes
and reduce the impact of the radioactivity release/contamination and the collective dose in the public domain. Such aspects of an opposing nature should be addressed suitably in the PSA model.
2.7.4 Computer code
The computer code used for PSA should be verified, validated, and well documented. It is preferable to use a state-of-the-art, internationally recommended code with user-friendly features.
2.7.5 A graded approach to risk evaluation
The main goal of a graded approach to reliability and risk evaluation is to have a structured method commensurate with the risk associated with the application. In this context, PSA also adopts a graded approach and is performed at three levels, Level-1, Level-2, and Level-3 PSA, each requiring a large amount of information depending on the scope of the analysis. It is an efficient tool used to assess the safety and risk at NPPs and to identify the weak links and vulnerabilities. Level-1 PSA requires information from the safety analysis report; piping, electrical, and instrumentation drawings; descriptive information for the various systems of the plant; procedures related to testing, maintenance, and operation of the plant components; and generic and plant-specific data on PIEs, component failures, and human errors. For Level-2 PSA, additional detailed design information on the containment and associated ESFs is needed. Level-3 PSA requires site-specific meteorological data for the radioactivity transport calculations, local population densities, evacuation plans, and health-effect models for risk evaluation. If external events are to be analyzed, considerably more information will be needed, depending on the external events to be included. Information about the compartmentalization of the plant is necessary to analyze susceptibility to fires and floods. These levels of PSA are explained in various publications. For completeness, the levels of PSA are explained briefly below (Fig. 2.3).
2.7.6 Level-1 PSA
Level-1 PSA is also called systems analysis; it provides an assessment of plant design and operation, with the main focus on the accident sequences that can lead to core damage. It can provide major insights into design strengths and weaknesses, and ways to prevent core damage in an NPP.
Figure 2.3 Schematic of levels of PSA.
Level-1 PSA calculations are performed with the objective of investigating and determining the event sequences that could lead to severe core damage and could release a large quantity of radioactivity to the environment. It involves identification of PIEs, construction of event and fault trees, and quantification of the accident sequences using the methods of reliability analysis. External as well as internal initiating events, with the reactor operating at various power levels, can be considered in a Level-1 analysis. The PIEs and dominant accident sequences are reactor specific, and it is essential to identify them for the particular reactor system. Before identifying the PIEs, it is necessary to make a list of all sources of radioactivity from which accidental releases could be postulated during the different operational states of the plant (i.e., full power operation, low power operation, shutdown stage, etc.), depending on the scope of the PSA being performed. The end result of a Level-1 PSA is the CDF.
Subsequently, an exhaustive list of accident initiators, namely the PIEs, is identified. Several approaches are available for this task. The aim is to produce a list of PIEs and to group them appropriately so as to ensure the list is as complete as possible and includes bounding cases. Generally, three sources are used for the preparation of the list of PIEs:
– Engineering evaluation: All systems and components are reviewed to determine whether any of their failure modes (e.g., failure to open, disruption, spurious operation) could lead, directly or in combination with other failures, to core damage.
– Operating experience: It is useful to refer to lists of PIEs from the operating experience of similar plants and from accident analysis reports.
– Deductive analysis: With core damage as the top event, the event is broken down into all possible categories of events that could lead to its occurrence.
The different elements of a Level-1 PSA are:
– Plant familiarization and information gathering
– Selection and grouping of initiating events
– Accident sequence modeling
– System modeling
– Data acquisition and assessment
– Accident sequence quantification
a) Plant familiarization and information gathering
This is the most difficult but basic input for the analysis. Various aspects of the plant are gathered from the plant design and operational experience.
b) Selection and grouping of initiating events
To comprehensively cover all possible event occurrences, initiating events are derived from all possible sources, such as engineering evaluation, operational experience, references from the operation of other similar plants, deductive analysis, etc. From the comprehensive list, events that demand the successful operation of the same frontline systems, pose similar challenges to the operator, and result in similar consequences are grouped together.
c) Accident sequence modeling
Each initiating event group is expanded as an event tree, and the plant response to the event is modeled and expressed in terms of the success or failure of the safety functions and human actions. The safety functions form the headings in the event tree, representing the relevant mitigating systems and support systems. The event trees also display some of the functional dependencies between the systems. The end state of an event tree is a safe state or a failure state, depending on the behavior of the mitigating systems in the sequence.
d) System modeling
Once the mitigating systems in the event tree are modeled in the event sequence, the details of each of the systems can be analyzed in the form of fault trees, state space models, reliability block diagrams, etc. A good understanding of the system functions and the operation of its components is necessary for system modeling. Techniques such as FMEA help in identifying the potential failure modes of the components to be included in the system model. For system modeling, the fault tree is the most widely adopted method. It is a deductive analysis, carried out to find all credible ways in which the undesired state can occur. The model is extended down to the level of basic component unavailabilities. While constructing a system fault tree, the system boundaries, logic symbols, event coding, and the extent of human errors and CCFs should be taken into account to maintain consistency. Unavailability due to outages for testing and maintenance, and human errors associated with switching operations following testing and maintenance, should also be taken into account (a simple quantification sketch is given after the shared system modeling discussion below).
2.7.6.1 Shared system modeling
If a system is shared between units, it may be unavailable because it is in demand by the other unit during the mission time of the unit under consideration, or the system itself may fail. This can be modeled with a top gate of the OR type. For such shared systems, relevant human actions (if any), CCFs, and spurious actuations should be considered in the unavailability.
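The sketch below illustrates the fault tree quantification step described under system modeling: hypothetical basic events and minimal cut sets (not taken from any actual plant model) are combined with the rare-event approximation to obtain a top event probability.

```python
# Toy fault tree quantification sketch: basic event probabilities are combined
# through minimal cut sets using the rare-event approximation.
# Event names and values are hypothetical, for illustration only.
basic_events = {
    "PUMP_A_FAILS":   1.0e-3,
    "PUMP_B_FAILS":   1.0e-3,
    "VALVE_V1_STUCK": 5.0e-4,
    "CCF_PUMPS_AB":   1.0e-4,   # common cause failure of both pumps
    "OPERATOR_ERROR": 1.0e-2,
}

# Minimal cut sets of a hypothetical top event "no emergency injection"
minimal_cut_sets = [
    ("PUMP_A_FAILS", "PUMP_B_FAILS"),
    ("CCF_PUMPS_AB",),
    ("VALVE_V1_STUCK", "OPERATOR_ERROR"),
]

def cut_set_probability(cut_set, probs):
    """Product of basic event probabilities in one minimal cut set (independence assumed)."""
    p = 1.0
    for event in cut_set:
        p *= probs[event]
    return p

# Rare-event approximation: top event probability ~ sum of cut set probabilities
top = sum(cut_set_probability(cs, basic_events) for cs in minimal_cut_sets)
for cs in minimal_cut_sets:
    print(f"{' * '.join(cs):35s} {cut_set_probability(cs, basic_events):.2e}")
print(f"Top event probability (rare-event approx.) ~ {top:.2e}")
```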
2.7.6.2 Passive system modeling
Passive systems are intended to add reliability by functioning without depending on an external source for their operation. These systems are based on the inherent properties of physical phenomena and the driving forces of nature. Though there is a possibility of deviations in the physical processes that may lead to failure of a passive system, the probability of failure is lower than that of active systems. However, these deviations in the physical processes, and the environmental conditions that may affect the passive systems, should be properly accounted for.
2.7.6.3 Markov analysis
There could be scenarios in which the system states change with time, such as those with periodic testing and maintenance. To estimate the unavailability of such time-dependent systems, the Markov modeling technique can be adopted to represent the various states of the system and the paths along which the system can change from one state to another. A state space diagram can be represented by a set of simultaneous differential equations describing the change with time of the probabilities of the states. Such an analysis can be performed for specific systems and integrated with the fault trees.
For PSA in NPPs, the use of the combined event tree and fault tree method is recommended. Within this combined method, depending on the event resolution, or the depth at which event sequence modeling stops and system modeling begins, two approaches are acceptable: the small event tree/large fault tree approach and the large event tree/small fault tree approach. Though both approaches produce equivalent results, it is the ease of modeling that helps to decide on a suitable approach.
In the small event tree/large fault tree approach, dependencies between the mitigating systems and the support systems do not appear in the event trees. Event trees with safety functions as headings are developed and then expanded into event trees with the mitigating systems as headings. The mitigating system fault tree models are developed up to a predetermined boundary condition. Subsequently, the support system fault trees are developed separately and integrated into the mitigating system models. This approach generates event trees that are very compact and allows for a simpler and easier view of the accident sequences. Moreover, it is easier to model if suitable computer codes are available. However, large fault trees are difficult to handle and result in high computational cost.
In the large event tree/small fault tree approach, dependencies between mitigating systems and support systems are included in the event trees. The top events of the fault trees have associated boundary conditions that include the assumption that the support system is in a particular state appropriate to the event sequence being evaluated. Separate fault trees must be used for a given system for each set of boundary conditions. These separate fault trees can be produced from a single fault tree that includes the support system and that, before being associated with a particular sequence, is "conditioned" on the support system state associated with that sequence. This approach generates large event trees that explicitly represent the existing dependencies. Since they are associated with small fault trees (i.e., mitigating systems without support systems), they are less demanding in terms of computer resources. However, a major difficulty is the size of the event tree, as it increases with the number of support systems and their states depicted in the event tree.
e) Data acquisition and assessment
Plant-specific initiating event frequencies are acquired along with the component failure, repair, maintenance, CCF, and human error data. If plant-specific data are insufficient, statistical techniques such as Bayesian updating are adopted.
f) Accident sequence quantification
All the accident sequences of the event trees are quantified using the data, and the CDF with the contributions of the various accident sequences is obtained.
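A schematic illustration of the accident sequence quantification step is given below: each sequence frequency is the product of an assumed initiating event frequency and the failure probabilities of the mitigating systems that fail along that branch, and the core damage contribution is their sum. The labels and numbers are hypothetical.

```python
# Schematic accident sequence quantification: sequence frequency =
# initiating event frequency * product of failed-system probabilities,
# and the CDF contribution is the sum over sequences. Values are hypothetical.
ie_frequency = 1.0e-2   # e.g., a transient initiating event, per reactor-year (assumed)

# (sequence label, probabilities of the mitigating systems that fail along the branch)
sequences_to_core_damage = [
    ("IE * auxiliary cooling fails * backup injection fails", [1.0e-3, 1.0e-2]),
    ("IE * auxiliary cooling fails * operator action fails",  [1.0e-3, 5.0e-2]),
]

cdf = 0.0
for label, branch_probs in sequences_to_core_damage:
    freq = ie_frequency
    for p in branch_probs:
        freq *= p
    print(f"{label:55s} {freq:.2e} /ry")
    cdf += freq
print(f"CDF contribution from this initiating event group   {cdf:.2e} /ry")
```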
2.7.7 Level-2 PSA
Level-2 PSA is called containment analysis; the accident sequences identified in Level 1 are analyzed further to assess the integrity of the containment and to quantify the magnitude and frequency of the radioactive release to the environment. Subsequent to the identification of the dominant accident sequences, a combination of probabilistic and/or deterministic approaches is used to determine the release of radioactivity from the containment building to the outside environment. The deterministic part of the analysis focuses on calculating the release magnitude from the core, the physical processes during accident progression (timing and magnitude of the radioactivity release), the response of the containment system, and the estimation of the source term. A probabilistic evaluation is carried out to study the progression of accidents and their impact on the containment behavior using containment event trees (CETs). CETs are used to characterize the progression of severe
accidents and the containment failure modes that lead to the release of fission products outside the containment. The CET includes evaluation of the containment, containment ESF performance, and containment failure modes. The large number of end states of the containment event trees is grouped into a more manageable set of release categories for which distinct source terms are estimated. The source term assessment provides information about the characteristics of the release categories in terms of the composition of the release and the time of release. This phase (also called the Level-2 PSA) broadly involves calculations in the following areas:
– Core fission product inventory
– Core degradation
– Core thermal hydraulics
– Fission product release from the degraded core
– Fission product transport in the primary/main heat transport system
– Containment thermal hydraulics and fission product transport in the containment
– Containment integrity and leakage
– Source term evaluation and fission product release to the environment
2.7.7.1 Core fission product inventory
It is essential to obtain an estimate of the actual fission product inventory in the core at the time of the accident. This involves reactor physics calculations for the core configuration of the reactor, using the appropriate fuel configuration and burnup data.
2.7.7.2 Core degradation
Core degradation includes the early-phase degradation involving various phenomena such as fuel rod ballooning and subsequent failure, clad oxidation, fuel rod heat-up, molten mixture candling, etc., and then the late-phase degradation such as corium accumulation within the core channels and formation of blockages, corium slump into the lower head, and corium behavior in the lower head until the pressure vessel fails. Details of the pressure vessel geometry and core components would be required to carry out this analysis.
2.7.7.3 Core thermal hydraulics
The primary heat transport system thermal hydraulic calculations under severe accident conditions play a major role in the overall consequence analysis. The type of analysis would depend on the type of reactor and may
include LOCAs involving various break sizes, together with the nonavailability of various safety systems. The mass and energy discharge data from any break location form the input data for the containment thermal hydraulic transient calculations.
2.7.7.4 Fission product release from degraded core
The release of volatile, semivolatile, and nonvolatile fission products from the degraded core to the coolant needs to be estimated. Models for intragranular diffusion of species, evaporation of semivolatiles into porosities, fuel volatilization, release during candling, release from the liquid pool, etc. are required. The input data for these calculations can be obtained from the core fission product inventory calculations.
2.7.7.5 Fission product transport in PHT/MHT system
Not all the fission products released from the core find their way to the containment. Hence, calculations are required to simulate the transport of fission product vapors and aerosols in the PHT/MHT system and their subsequent release into the containment. These calculations use the information generated in the earlier steps, that is, the core fission product inventory and core fission product release calculations.
2.7.7.6 Containment thermal hydraulics and fission product transport in containment
These calculations involve estimating the pressure and temperature transients, the radioactivity transport within the containment building, its removal through various modes, and the determination of leakages through various paths contributing to ground level and stack level releases. These calculations can be carried out with or without the availability of the various ESFs. Iodine is one of the main volatile fission products that determine the dose to the public.
2.7.7.7 Containment integrity and leakage
An appropriate model to estimate the structural integrity and the leakage flow through cracks, penetrations, and the permeability of concrete needs to be developed.
2.7.7.8 Source term evaluation and fission product release to environment
Fission products accumulated within the containment building are released over a period of time, through various paths via the stack. Besides, ground level
leakages due to flow through cracks and penetrations also occur. These data form the input for the plume dispersion calculations required as part of the consequence analysis. The quantity, timing, duration, and magnitude of the radioactive releases to the environment during a severe reactor accident define the source term. Thus, Level-2 PSA provides insights into the weaknesses and strengths of onsite accident mitigation and management measures.
2.7.8 Level-3 PSA
Level-3 PSA is called consequence analysis, as it analyzes the dispersion of radionuclides in the surrounding environment and evaluates the potential environmental and health effects. The analysis consists of studies related to the transport of radionuclides in the environment (subsequent to the release from the containment through various paths), the assessment of public health risks and, if required, the economic consequences of a postulated severe accident. The issues to be addressed are outlined below (a simplified atmospheric dispersion sketch is given after this subsection).
a) Transport of radionuclides in the environment
The dispersion of radionuclides in the environment occurs by various pathways. For the evaluation of radiation dose, the following pathways are normally considered.
External irradiation from radioactive materials:
i. in the passing plume or cloud of radioactive materials (referred to as cloud shine);
ii. deposited on the ground (referred to as ground shine);
iii. deposited on clothing and skin, generally called contamination.
Internal irradiation from radioactive materials:
i. inhaled directly from the passing plume;
ii. inhaled following resuspension of ground deposits;
iii. ingested following the direct or indirect contamination of foodstuffs or drinking water by radioactive material deposited from the plume.
Radionuclide release data for the particular accident scenario, obtained from the earlier calculations, are used as input. Moreover, meteorological data for the particular site, such as wind speed and direction, atmospheric stability category, precipitation rate, etc., over a significant period are also required.
b) Assessment of public health risks
The following early health effects are generally considered in the consequence analysis:
– Irradiation of bone marrow, lung, gastrointestinal tract, and skin. The occurrence of a wide variety of radiation-induced effects, such as vomiting, diarrhea, hypothyroidism, temporary sterility, and mental retardation, is also investigated.
– Dose conversion factors, risk conversion factors, and population and agriculture data would be required at this stage.
c) Assessment of economic consequences
For the assessment of economic consequences, relevant economic data would be required. Thus, the analysis considers meteorological conditions, demographic data, land-use/land-cover, vegetation, and topography, and identifies various exposure pathways to estimate the health effects on the public and the associated societal risks.
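The atmospheric transport step described above is commonly illustrated with a Gaussian plume model. The sketch below is a minimal illustration, not the formulation used by any particular Level 3 consequence code: it computes the ground-level air concentration at a downwind receptor for an assumed constant release rate, using Briggs-type dispersion coefficients for a single stability class; the release rate, wind speed, and stack height are hypothetical.

```python
import math

def sigma_y(x_m):
    # Briggs-type horizontal dispersion coefficient (m), Pasquill class D, open country
    return 0.08 * x_m / math.sqrt(1.0 + 0.0001 * x_m)

def sigma_z(x_m):
    # Briggs-type vertical dispersion coefficient (m), Pasquill class D, open country
    return 0.06 * x_m / math.sqrt(1.0 + 0.0015 * x_m)

def ground_level_concentration(q_bq_per_s, u_m_per_s, x_m, y_m, h_m):
    """Gaussian plume ground-level air concentration (Bq/m^3) with ground reflection."""
    sy, sz = sigma_y(x_m), sigma_z(x_m)
    lateral = math.exp(-y_m ** 2 / (2.0 * sy ** 2))
    vertical = 2.0 * math.exp(-h_m ** 2 / (2.0 * sz ** 2))  # reflection doubles the term
    return q_bq_per_s / (2.0 * math.pi * u_m_per_s * sy * sz) * lateral * vertical

# Hypothetical case: 1e10 Bq/s stack release, 5 m/s wind, 100 m release height,
# receptor on the plume axis 2 km downwind
chi = ground_level_concentration(1e10, 5.0, 2000.0, 0.0, 100.0)
print(f"Air concentration at 2 km: {chi:.3e} Bq/m^3")
```

In a full Level 3 analysis this concentration would then be combined with dose conversion factors for the cloud shine, ground shine, and inhalation pathways listed above.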
2.8 Initiating event frequency
A generic list of initiating events considered in various types of reactors is given in several IAEA TECDOCs. The broad categories of IE groups are:
– Loss of regulation accident
– Loss of coolant accident
  – Small LOCA
  – Medium LOCA
  – Large LOCA
– Loss of flow accident
– Loss of heat sink
– Loss of pressure control
– Reduction in primary system coolant flow
– Loss of or reduction in feedwater flow
– Failures in electric power supply and I&C systems
– Loss of instrument air
– Loss of service water
– Reactivity transients
Engineering evaluation, previous PSAs, and review of operating experience are some of the techniques recommended to evaluate the completeness of the list of initiating events. The initiating event frequency is estimated from the number of occurrences of the event over a period of time. It is appropriate to use plant-specific data for this purpose. Plant-specific data can be from the log books,
significant occurrence records, near-miss events, event records, etc. Where plant-specific data are lacking, data from similar plants or generic databases are used. Initiating events with more than zero but fewer than ten recorded occurrences are classified as rare events.
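As a minimal sketch of how an initiating event frequency can be estimated from such records, the snippet below applies a Bayesian update of a Poisson occurrence rate using the noninformative Jeffreys prior; the event counts and observation period are hypothetical.

```python
def ie_frequency_jeffreys(n_events, t_reactor_years):
    """Posterior mean of a Poisson occurrence rate using the Jeffreys prior."""
    # Posterior is Gamma(n + 0.5, T); its mean is (n + 0.5) / T
    return (n_events + 0.5) / t_reactor_years

# Hypothetical plant-specific record: 2 loss-of-feedwater events in 35 reactor-years
print(f"Estimated IE frequency: {ie_frequency_jeffreys(2, 35.0):.2e} /reactor-year")

# A rare event with zero recorded occurrences still gets a nonzero estimate
print(f"Zero-occurrence estimate: {ie_frequency_jeffreys(0, 35.0):.2e} /reactor-year")
```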
2.9 Component data
As in initiating event frequency estimation, the first option is to collect the component data from operating experience. Bayesian techniques can be adopted to combine plant-specific data with generic data, taking into account the differences between plants. The boundary conditions and the mode of failure for the component have to be clearly defined. The mode of failure is given as an undesirable state of component performance (e.g., a closed motorized valve does not open when required owing to a mechanical failure of the valve prior to the demand). Component unavailability is estimated from testing and maintenance data.
2.9.1 Component reliability models
In component reliability modeling, the way in which the probability that a component will not perform its intended function for a required period of time is estimated depends on the mode of operation. Each reliability model has one or more parameters defined. A brief description of the models is given below:
a) Repairable component
Components which can be repaired and put back into operation without replacing the entire system are categorized as repairable. The metrics for a repairable component include the mean time to failure and the downtime.
b) Nonrepairable component
Nonrepairable components are treated differently from repairable components. The metric for a nonrepairable component is the mean time to failure.
c) Standby system/component (tested/untested)
The unavailability of these components is a function of the standby time. If the component is tested periodically, then the average unavailability during the period of analysis is the average unavailability during the period between tests. The time-dependent feature allows the influence of the frequency of periodic testing to be included in the model. Depending on how a component is tested, we can distinguish three types of components of standby systems.
d) Continuously monitored components Some components are continuously monitored for their health condition and any failure is detectable as soon as it occurs. The parameters required are the failure rate and repair rate.
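A minimal numerical sketch of the reliability models listed above, assuming constant failure and repair rates; all parameter values are hypothetical.

```python
import math

FAILURE_RATE = 1.0e-5   # per hour (hypothetical)
REPAIR_RATE = 0.1       # per hour (hypothetical; mean time to repair = 10 h)
TEST_INTERVAL = 720.0   # hours between periodic tests (hypothetical)
MISSION_TIME = 24.0     # hours (hypothetical)

# Nonrepairable component: probability of failing during the mission time
q_nonrepairable = 1.0 - math.exp(-FAILURE_RATE * MISSION_TIME)

# Repairable / continuously monitored component: steady-state unavailability
q_monitored = FAILURE_RATE / (FAILURE_RATE + REPAIR_RATE)

# Periodically tested standby component: average unavailability over the test interval
# (the familiar lambda*T/2 approximation, valid for lambda*T << 1)
q_standby = FAILURE_RATE * TEST_INTERVAL / 2.0

print(f"Nonrepairable (24 h mission): {q_nonrepairable:.2e}")
print(f"Continuously monitored:       {q_monitored:.2e}")
print(f"Tested standby (720 h test):  {q_standby:.2e}")
```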
2.10 Human reliability
Human performance and human reliability play an important role in the design, operation, testing, and maintenance activities of NPPs. Human reliability analysis (HRA) is an integral part of PSA used to estimate the human error probability (HEP); it is applied in PSA primarily to identify human errors that have a significant effect on overall safety and to quantify the probability of their occurrence. Over the past three decades, researchers have made many efforts to develop HRA methods. The application of HRA in PSA requires quality data, and the systematic collection and classification of human error and human reliability data are essential for quantifying the human error probability. Recovery, dependency, uncertainty, and sensitivity analyses are supporting tools that, properly applied along with the basic HEP (BHEP), lead to a more realistic HEP. The main objective of treating human reliability in a PSA is to ensure that the key human interactions and dependencies are systematically incorporated into the assessment, because risk metrics such as CDF and large early release frequency in NPPs can be significantly underestimated if the potential dependencies are not addressed properly. The aim is also to make HRA as realistic as possible by taking into account the emergency procedures, the man-machine interface, the training program, and the knowledge and experience of the crews. It should be noted that PSA by itself cannot fully address all human reliability and human factors issues relevant to nuclear safety; for example, some aspects of management and organization are generally excluded. While HRA models aim at systematically recognizing and investigating the roots, consequences, and contributions of human failures in sociotechnical systems, the treatment of human reliability in a PSA is still evolving owing to the complexity of human behavior and a general lack of relevant data. There is a growing consensus, however, on the need for, usefulness of, and modeling of HRA in PSA, both explicit and implicit. A detailed discussion on HRA is given in Chapter 6.
2.11 Dependence analysis
While carrying out a PSA study, all dependencies should be listed separately and modeled in fault trees and event trees in order to evaluate their impact on the level of risk. The degree of dependency among human actions, or between a person and events, ranges along a continuum from complete dependency to zero dependency, as expressed in THERP. According to THERP, five levels of dependency can be assigned: complete dependency (CD), high dependency (HD), moderate or medium dependency (MD), low dependency (LD), and zero dependency (ZD). The details of the dependency levels are as follows:
2.11.1 Complete dependency
Complete dependency is assigned when a skilled operator performs the actions from memory without using any written procedure, consecutive actions in the sequence follow immediately one after another, and the actions are performed by the same person. Omission errors among operators are generally modeled with CD. Complete dependency implies a 100% error probability for the subsequent actions, so an error probability of 1.0 is assigned when CD is considered.
2.11.2 High dependency
High dependency can be assigned when the operator's stress level is high during an abnormal event, when the time available for the subsequent task is very short (e.g., 15 minutes) so that one task substantially affects another, or when the interaction between operators in the control room is high. The error probability estimated under high dependency is (1 + BHEP)/2.
2.11.3 Moderate dependency
Moderate dependency is used only when the stress level is moderate and there is an obvious relationship between the person performing one task and the other task. The equation for adjusting the error probability under moderate dependency is (1 + 6 × BHEP)/7.
2.11.4 Low dependency
Low dependency generally results in a substantial change in the conditional HEP (CHEP); the change is much smaller when the preceding task has been performed successfully.
The equation for calculating the error probability under low dependency is (1 + 19 × BHEP)/20.
2.11.5 Zero dependency
Performance of one task has no effect on the performance of the subsequent task, and all human actions are treated as completely independent. Since there is no dependence between consecutive steps, the actual BHEP is taken as the overall HEP for the system.
Further, dependencies are categorized as functional, physical, and interaction:
a) Functional dependencies
These dependencies exist among systems, trains, subsystems, or components due to the sharing of hardware or to process coupling. Shared hardware refers to the dependence of multiple systems, trains, subsystems, or components on the same equipment. In process coupling, the function of one system, train, subsystem, or component depends directly or indirectly on the function of another. A direct dependence exists when the output of one system, train, subsystem, or component constitutes an input to another. An indirect dependence exists when the functional requirements of one system, train, subsystem, or component depend on the state of another. Possible direct process couplings between systems, trains, subsystems, or components include electrical, hydraulic, pneumatic, and mechanical connections.
b) Physical dependencies
There are two types of physical dependencies:
– Those that cause an initiating event and also, possibly, failure of plant mitigating systems due to the same influence, for example, external hazards and internal events. Such events include certain transients, earthquakes, fires, floods, etc. They require special treatment and will be discussed in subsequent chapters.
– Those that increase the probability of multiple system failures. Often, they are associated with extreme environmental stresses created by the failure of one or more systems after an initiating event, or by the initiating event directly. Examples are fluid jets and environmental effects caused by LOCAs. It should be emphasized that proximity is not the only "environmental" coupling inducing physical dependence. A ventilation duct, for example, might create an environmental coupling among systems, trains, subsystems, or
components located in seemingly decoupled locations. Radiation coupling and electromagnetic coupling are two other forms not directly associated with a common spatial domain. c) Human interaction dependence Two types of dependence introduced by human actions can be distinguished: those based on cognitive behavioral processes and those based on procedural behavioral processes. Cognitive human errors can result in multiple faults once an event has been initiated. Dependencies due to procedural human errors include multiple maintenance errors that result in dependent faults with effects that may not be immediately apparent (e.g., miscalibration of redundant components).
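A minimal sketch of how the THERP dependency levels described in this section adjust a basic HEP into a conditional HEP for a subsequent task; the BHEP value is hypothetical.

```python
def conditional_hep(bhep, level):
    """THERP conditional HEP for a subsequent task, given the dependency level."""
    formulas = {
        "ZD": bhep,                     # zero dependency
        "LD": (1 + 19 * bhep) / 20,     # low dependency
        "MD": (1 + 6 * bhep) / 7,       # moderate dependency
        "HD": (1 + bhep) / 2,           # high dependency
        "CD": 1.0,                      # complete dependency
    }
    return formulas[level]

bhep = 0.003  # hypothetical basic HEP
for level in ("ZD", "LD", "MD", "HD", "CD"):
    print(f"{level}: conditional HEP = {conditional_hep(bhep, level):.3f}")
```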
2.12 Passive systems
To gain public confidence and to address concerns about nuclear reactor safety, several efforts have been made to improve the designs of new-generation nuclear reactors for enhanced safety under stringent regulatory policies. The International Nuclear Safety Group has recommended a CDF of less than 10⁻⁵ per reactor-year for new reactors, as against the present target of 10⁻⁴ per reactor-year for existing reactors. Similarly, the goal for the large early release frequency is tightened to 10⁻⁶ per reactor-year, as against the present goal of 10⁻⁵ per reactor-year. To meet these stringent targets and achieve the required high reliability, new reactor designs include inherent safety features that enhance the capability of the reactor to return to a safe condition in the event of accidents due to internal or external hazards. One such safety feature is the inclusion of passive safety systems that rely on the intelligent use of natural phenomena, such as gravity, conduction, and radiation, to perform essential safety functions. Since passive systems do not depend on external sources, they are believed to be more reliable than active systems. However, the strong dependence of passive safety systems on inherent physical principles makes them much more sensitive to changes in the surroundings than active ones. This concern is significant for passive systems that depend on natural circulation of fluids. In addition, the operators cannot control passive systems the way they can control the performance of active systems. The major source of unreliability of active systems is the external energy required for them to perform their function. IAEA TECDOC 626 defines a passive safety system as “A system that is composed entirely of passive components and structures or a system, which uses active components in a very limited way to initiate subsequent
passive operation.” As per IAEA-TECDOC-626, passive safety systems can be grouped into four categories, as described below.
2.12.1 Category A
In this category, passive systems do not have any external signal inputs of intelligence, and there are no external power sources or forces. There are no moving mechanical components or parts, nor any moving working fluids. Some examples of this category of passive safety features are physical barriers against the release of fission products, such as fuel cladding and pressure boundary components and systems; core cooling systems relying on heat transfer by radiation, convection, and conduction from the nuclear fuel to outer structural parts; and static components of safety-related passive systems such as tubes, accumulators, surge tanks, etc.
2.12.2 Category B Unlike category A, these systems have moving working fluids. They do not need external power sources for their actuation and do not have moving mechanical components or parts. The fluid movement is only due to thermal-hydraulic conditions when the safety function is activated. Examples of this category of passive safety features are systems operating on natural circulation, emergency cooling systems based on air or water natural circulation in heat exchangers immersed in water pools for decay heat removal.
2.12.3 Category C In this category, passive systems can have moving mechanical parts. The systems may or may not have moving working fluids. These systems do not depend on any external power sources and any external signal for activation. Some of the examples of this category of passive systems are venting by relief valves or rupture discs to prevent overpressure; emergency injection systems consisting of accumulators and check valves, filtered venting systems of containments activated by rupture disks.
2.12.4 Category D
This category of passive systems is characterized by passive execution and active initiation of their operation. That means an external source of intelligence is required to initiate the process; the operation that follows initiation is executed by passive means. This category draws a border
between active and passive systems. Some examples of this category of passive safety features are emergency core cooling systems (ECCS), which are activated by electro-pneumatic valves and are based on gravity-driven flow of water, and emergency shutdown systems based on gravity- or static-pressure-driven control rods, which are activated by fail-safe trip logic.
The concept of passive function failure, borrowed from reliability physics, was introduced by Burgazzi in 2003. It describes “failure” in terms of a “stress” exerted on the components during the performance of the system and the “resistance” of these components to withstand that stress. The methodology for evaluating functional reliability is described by Michel et al. PSAs of innovative nuclear reactor projects typically account only for failures of the passive system components, not for failure of the physical phenomena on which the system is based, such as natural circulation. The treatment of this aspect and the integration of passive system functional reliability into PSA models is a difficult and challenging task. Reliability evaluation of passive safety systems (REPAS) and reliability methods for passive safety functions (RMPS) are two popular methodologies for the reliability evaluation of passive systems. The REPAS method involves the following steps:
– Identification of relevant parameters (design and critical) connected with the thermal hydraulic phenomenon
– Definition of nominal values, ranges of variation, and probability distributions for the parameters identified in step 1
– Selection of system status based on engineering judgment and a Monte Carlo procedure
– Definition of failure criteria for the system performance and identification of the accident scenario
– Use of a best estimate thermal hydraulic code for the system model
– Simulation for propagation of the uncertain parameters considered and estimation of the probability of failure of the passive function
– Sensitivity analysis
– Quantification
The RMPS method was developed by inheriting the features of REPAS and addressing its shortcomings. The RMPS method involves the following steps:
– Characterization of operation modes such as start-up, shutdown, etc.
– Definition of physical failure criteria in terms of process variables such as temperature, pressure, flow rate, and power exceeding specified threshold limits
– Identification of related and root causes of failure through procedures such as the analytic hierarchy process and fault trees
– Decomposition of the processes in terms of relevant system parameters
– Ranking of the most important parameters
– Screening in/out of the parameters through expert judgment and proper decision analysis methods
– Identification of parameters and their dependencies
– Modeling of the effects of the relevant parameters and nodalization
– Deterministic evaluation for the best estimate calculation
– Modeling of the most important processes and parameters and validation of the results
– Assignment of probability distributions for the parameters identified
– Probability propagation, uncertainty and sensitivity analysis, and investigation using response surface and neural network models
– Quantitative evaluation of reliability
Several applications of REPAS and RMPS are reported in the literature for reference.
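The uncertainty-propagation step shared by REPAS and RMPS can be sketched as a Monte Carlo simulation: uncertain parameters are sampled from assumed distributions, pushed through a system model, and checked against the failure criterion. In the sketch below a deliberately simplified algebraic function stands in for the best estimate thermal hydraulic code, and all parameter distributions and the failure criterion are hypothetical.

```python
import random

random.seed(1)

N_TRIALS = 100_000
DECAY_POWER_MW = 8.0        # hypothetical decay heat load to be removed

def removed_power(delta_t, loss_coeff, fouling):
    """Toy stand-in for a best estimate natural-circulation calculation (MW).
    Removed power grows with the driving temperature difference and is degraded
    by the loop pressure-loss coefficient and heat-exchanger fouling."""
    flow = (delta_t / loss_coeff) ** 0.5        # illustrative buoyancy-driven flow
    return 2.2 * flow * (1.0 - fouling)

failures = 0
for _ in range(N_TRIALS):
    # Sample the uncertain (critical) parameters from assumed distributions
    delta_t = max(random.gauss(35.0, 5.0), 1.0)    # core-to-sink temperature difference, K
    loss_coeff = random.lognormvariate(0.0, 0.3)   # normalized loop loss coefficient
    fouling = random.uniform(0.0, 0.3)             # heat-exchanger fouling fraction
    if removed_power(delta_t, loss_coeff, fouling) < DECAY_POWER_MW:
        failures += 1

print(f"Estimated passive-function failure probability: {failures / N_TRIALS:.2e}")
```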
2.13 Software reliability
Any PSA study at a nuclear plant is incomplete if the reliability of software and computer-based systems is not included in the estimation of risk. Software reliability is one of the important parameters of software quality and system dependability. It is defined as the probability of failure-free software operation in a specified environment for a specified period of time. A software failure occurs when the behavior of the software departs from its specifications; it is the result of a software fault, a design defect, being activated by a certain input to the code during its execution. In the nuclear context, the combination of software, hardware, and instrumentation is referred to as the instrumentation and control (I&C) system. The increasing use of computer-based systems for safety-critical operations demands a systematic way of estimating the reliability of I&C systems in NPPs. The high reliability requirements of safety-critical I&C systems make this task imperative. Although a wide variety of techniques are used to ensure system dependability, such as fault tolerance, fault prevention, etc., it is necessary to be able to reason in a more formal way about dependability characteristics in order to assess the sensitivity of the system to different execution conditions, usage profiles, or architecture changes. This provides a strong case for methods for quantitative evaluation of systems'
dependability attributes. Moreover, from the perspective of software engineering guidelines for structured and predictive development of software systems, it is of great importance to have adequate methods for estimating and quantifying as many quality characteristics as possible. In this respect, development of and research on techniques for formal specification and quantification of quality factors are necessary. Many studies have proposed different software life cycle models to ensure the quality of the software. A software life cycle development process involves planning, implementation, testing, documenting, deployment, and maintenance. Among the many software life cycle models proposed, the classic waterfall model of software development is widely adopted in NPPs. The phases of the waterfall model include requirement specification (requirement analysis), software design, integration, testing (or validation), deployment, and maintenance. The refined waterfall model tailored to fulfill the safety requirements for NPPs is called the V-model (Fig. 2.4), which introduces verification and validation at the end of every stage. For requirements analysis, formal verification is a method of proving certain properties of the designed algorithm written in mathematical language/notation. Approaches to formal verification include formal proof and model checking. A formal proof is a finite sequence of steps which proves or disproves a certain property of the software, whereas model checking achieves the same through exhaustive search of the state space of the model. Unfortunately, it is not always feasible to ensure complete formal verification of software owing to difficulties such as state space explosion and the effort involved in applying formal methods. Also, a major assumption in formal verification is that the requirements specification captures all the desired properties correctly. If this assumption is violated, the formal verification becomes invalid. Reliability estimates based on software testing have been recommended and have been adopted for decades. Repeated failure-free execution of the software provides a certain level of confidence in the reliability estimate. However, software testing can only indicate the presence of faults, not their absence. Regulatory bodies issue guidelines on best practices in software requirement analysis, defense-in-depth in design, safe programming practices, verification and validation processes, etc. Deterministic analyses such as hazard analysis and formal methods are a generalization of the design basis accident methodology used in the nuclear industry. Probabilistic analysis is considered more appropriate, as software faults are by definition design faults.
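The limited confidence provided by failure-free testing can be quantified with the classical zero-failure result (cf. Miller et al. in the further readings): if N independent, operationally representative runs all succeed, the per-demand failure probability p satisfies p ≤ 1 − (1 − C)^(1/N) at confidence level C. A minimal sketch with hypothetical targets:

```python
import math

def upper_bound_p(n_tests, confidence):
    """Upper confidence bound on the failure probability after n failure-free tests."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)

def tests_needed(p_target, confidence):
    """Number of failure-free tests needed to demonstrate p <= p_target."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_target))

print(f"3000 failure-free runs, 95% confidence: p <= {upper_bound_p(3000, 0.95):.2e}")
print(f"Runs needed to claim p <= 1e-4 at 95% confidence: {tests_needed(1e-4, 0.95)}")
```

The rapid growth of the required number of tests with the reliability target is one reason why testing alone cannot demonstrate the very low failure probabilities expected of safety-critical I&C software.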
Figure 2.4 Software life cycle “V” model.
Table 2.1 Classification of software reliability growth models.
Finite failure models
– Exponential class: Jelinski-Moranda, Shooman, Musa basic execution time, Goel-Okumoto, Schneidewind
– Weibull & Gamma class: S-shaped reliability growth, Weibull
Infinite failure models (exponential, geometric): Musa-Okumoto logarithmic, Duane
Bayesian models: Littlewood-Verrall
Software system reliability can be assessed with software testing methods using black-box and white-box models. However, there is no universally accepted method to evaluate reliability using these models. The group of white-box models consists of several kinds of models that estimate the reliability of software systems based on knowledge of their internal structure and the processes going on within them. This knowledge may be expressed by different means, such as architecture models, test case models, etc. On the other hand, the group of black-box models encompasses a much larger number of methods that treat the software as a monolithic whole, that is, as a black box.
2.13.1 Black-box reliability models
During the testing process of software, when a bug or fault is detected it is rectified; assuming that faults are fixed without introducing new ones, software reliability increases. If the failure data are recorded in terms of the number of failures observed over a given period of time, statistical models can be used to identify the trend in the failure data, reflecting the growth in reliability. Such models are called software reliability growth models (SRGMs). They are used both to predict failures and to estimate software reliability. All SRGMs are of the black-box type, since they consider only the failure data and not the internal structure or software architecture. They use past failure information to predict future failures, reflecting the growth of reliability. A broad classification of SRGMs is given in Table 2.1. SRGMs are classified under three major groups: finite and infinite, based on the total number of failures expected in infinite time, and Bayesian models. Tools such as CASRE and SMERFS are available for analyzing SRGMs. These models depend only on the number of failures observed or the time between failures. SRGMs have been in use since the early 1970s. Three models
that represent the different groups of SRGMs and are found more suitable for safety-critical applications are discussed here. Jelinski-Moranda is one of the basic models and assumes an exponential failure rate. The Musa-Okumoto model assumes that the software is never fault free and is recommended for safety-critical applications. Littlewood-Verrall is applicable when there are no failures during testing or when failure data are not available; moreover, this model accounts for fault introduction during the error correction process. These three models are discussed because they represent each family of the black-box model group and are suitable for safety-critical applications.
The main idea in the Jelinski-Moranda model is that the failure occurrence rate is proportional to the number of faults remaining in the system, and the failure rate remains constant between failures and is reduced by the same amount after each fault is removed. The assumptions of the J-M model are:
– The rate of failure detection is proportional to the current fault content of the program.
– All faults are equal and are corrected immediately.
– The testing environment is the same as the operational environment.
– Faults are corrected and no new faults are introduced during error correction.
The model requires data either on the times between failure occurrences or on the total duration of the failure occurrences.
The Musa-Okumoto model falls in the category of infinite failure models. This model assumes that the failure intensity decreases exponentially with the expected number of failures experienced. The model is also called logarithmic because the expected number of failures over time is a logarithmic function. As in other black-box models, this model also requires either the times between failures or the actual times at which the software failed.
The Littlewood-Verrall model is an example of a Bayesian approach to reliability estimation and assumes that the times between successive failures are conditionally independent exponential random variables with a parameter λi, which itself has a gamma distribution with parameters ψ(i) and α. This model is used when failure data are not available and a judgment is made on the unknown data (prior). The data required in the model are either the times between failures or the times of failure occurrences.
Since the black-box models rely on failure data, the reliability estimate obtained depends on various factors that can bring in uncertainty. These factors can be grouped into one of the following:
– Test coverage
– Number of failures
– Time between failures
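A minimal sketch of how a black-box SRGM is fitted to failure data: for the Jelinski-Moranda model the i-th inter-failure time is exponential with rate φ(N − i + 1), where N is the initial number of faults and φ is the per-fault failure rate. The code below estimates N and φ by a simple grid search over the log-likelihood rather than by solving the usual maximum-likelihood equations; the inter-failure times are hypothetical.

```python
import math

# Hypothetical inter-failure times (hours) observed during testing
times = [12, 15, 22, 28, 35, 51, 62, 88, 130, 190]

def jm_log_likelihood(n_faults, phi, t):
    """Log-likelihood of the Jelinski-Moranda model for inter-failure times t."""
    ll = 0.0
    for i, ti in enumerate(t, start=1):
        rate = phi * (n_faults - i + 1)
        if rate <= 0:
            return float("-inf")
        ll += math.log(rate) - rate * ti
    return ll

best = None
for n in range(len(times), len(times) + 50):        # candidate initial fault counts
    for phi in [k * 1e-4 for k in range(1, 200)]:    # candidate per-fault rates (1/h)
        ll = jm_log_likelihood(n, phi, times)
        if best is None or ll > best[0]:
            best = (ll, n, phi)

ll, n_hat, phi_hat = best
remaining = n_hat - len(times)
print(f"Estimated initial faults N = {n_hat}, per-fault rate = {phi_hat:.1e}/h")
print(f"Estimated remaining faults = {remaining}, "
      f"current failure rate = {phi_hat * remaining:.2e}/h")
```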
2.13.2 White-box reliability models
According to the way the internal structure of software systems is represented, white-box models can be divided into three main subgroups: state-based models, path-based models, and additive models. There are usually four basic steps in the assessment of a system's reliability: (1) identification of the modules (components) constituting the software system; (2) construction of an architectural model; (3) definition of the components' failure behavior; and (4) combination of the architectural model (obtained at step 2) with the failure behavior (obtained at step 3) of the components. According to the particular approach used to combine failure behavior with architecture, there are three classes of component-based reliability models:
– State-based models use a finite state machine (FSM) model to represent the architecture of the system. They assume that at any given moment there is only one active component in the system and try to take into account all possible traces of component execution within the system.
– Path-based models are similar to state-based models but consider only a finite number of component execution traces, which usually correspond to system test cases.
– Additive models are not concerned with the actual architectural configuration of the software system. Instead, they assume a specific distribution process for component failure behavior and on that basis infer formulae for the calculation of reliability.
Based on their importance, judged by the number of publications and references to the models, a short overview of the most significant white-box software reliability models is given here. Wide research has been carried out on architecture-based software reliability in the last few decades; a short summary of these models follows.
Cheung's model is one of the oldest component-oriented software reliability models and is highly significant because most of the subsequent models are based on it. It presents the architecture of the system as a Markov chain (i.e., a directed graph), where every state represents a component in the system and the arrows represent probabilities of control transfer from one component to another. The model calculates system reliability using an extended transition probability matrix (TPM). It takes into account that an infinite number of component executions may occur until termination of the application execution, for example, in the case of loops.
The model of Reussner uses Markov chains to model the system architecture. This method expresses component reliability not as an absolute value but as a function of the input profile of the component. The Markov chains are constructed in a hierarchical manner, and states include calls to component services in addition to the usual component executions. Services may invoke different methods, which may be either internal or external to the component. The reliability of the component is calculated from the reliabilities of the methods it uses, which in turn depend on the operational profile.
Path-based models are significantly less developed, judging by the number of known publications and research works on them. The most typical of these is the model of Krishnamurthy. It considers the system architecture as a component-call graph, in which states represent constituent components and the arrows represent possible transitions between them. It estimates the reliability of the system by considering different sequences of component executions in that graph, called path traces. First the reliabilities of single path traces are calculated, and then the system reliability is calculated as the average of the reliabilities of all considered path traces. The Hamlet model is also regarded as path-based, as it considers the actual execution traces of component execution given the mapping from input to output profile. This model tries to address the issue of the unavailability of a component's usage profile in early system development phases. To do so, it does not assume fixed numeric values for reliability but provides model mappings from a particular input profile to reliability parameters. Different input profiles are represented by dividing the input domain of the component into subdomains and assigning a probability of occurrence to each subdomain. This model does not explicitly consider the architecture of the system. Instead, it calculates the output profile of a component, which is in turn the input for the next component and is used to calculate the latter's reliability.
A recent extension of the Hamlet model is the Zhang model, in which an attempt has been made to improve the applicability of the model to real-case software systems. Instead of using a difficult-to-define mathematical expression of the function performed by system components, an easy-to-apply combination with state-based approaches is proposed, based on the expectation of different component input domains occurring.
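A minimal sketch of the idea behind Cheung's state-based model: the architecture is a Markov chain over components, each visit to a component succeeds with that component's reliability, and the system reliability is the probability of reaching correct termination. Instead of building the extended transition probability matrix explicitly, the sketch solves the equivalent linear equations by fixed-point iteration; the three-component architecture and all numbers are hypothetical.

```python
# Component reliabilities (probability that a single execution of the component is correct)
R = {"A": 0.999, "B": 0.995, "C": 0.990}

# Control-transfer probabilities between components; "EXIT" denotes correct termination.
P = {
    "A": {"B": 0.6, "C": 0.4},
    "B": {"A": 0.2, "C": 0.5, "EXIT": 0.3},
    "C": {"B": 0.3, "EXIT": 0.7},
}

def system_reliability(start, n_iter=200):
    """Probability of reaching EXIT without any component failure, starting at `start`.
    Solves S_i = R_i * (P[i][EXIT] + sum_j P[i][j] * S_j) by fixed-point iteration."""
    S = {c: 0.0 for c in R}
    for _ in range(n_iter):
        S = {
            c: R[c] * (P[c].get("EXIT", 0.0)
                       + sum(p * S[j] for j, p in P[c].items() if j != "EXIT"))
            for c in R
        }
    return S[start]

print(f"System reliability starting at component A: {system_reliability('A'):.4f}")
```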
By considering the software testing models described and the factors that bring uncertainty into the reliability values given by these models, the following approach may be adopted:
– In order to get a better estimate of the reliability of safety-critical software systems, more than one black-box software reliability model is recommended.
– Give a formal definition and measurement for each of the factors influencing the models selected in the previous step.
– Combine the results of all models into a single interval (either for each component or for the system) for a realistic reliability estimate.
– In case architectural information is available, select a white-box model, which should be modified to take into account not only the reliability values but also their interval bounds.
– Apply the selected white-box model to get a realistic reliability estimate for the system.
In summary, when I&C systems perform functions important to safety, these systems must be demonstrated to be safe and reliable with an appropriate degree of confidence, and a combined deterministic and probabilistic analysis is required. In the classical approach of PSA, for integrating software reliability into the estimation of risk, the following four questions need to be answered:
– What can go wrong? Identification of the software-related failures in the system.
– What are the potential consequences if the software failure occurs? The modeling approaches to be adopted.
– How likely is the failure and its consequence? Quantify the software reliability.
– What is the confidence level? Perform uncertainty analysis.
2.14 Uncertainty analysis
Any PSA study needs to include uncertainty and sensitivity analysis as an integral part. Uncertainties are inevitably introduced during the process of assessment. Uncertainties associated with parameters, models, phenomena, assumptions, completeness, etc. limit the use of PSA in NPPs. Knowing the sources of uncertainty is important, because treating them appropriately, according to their contribution, results in more effective decision making. The major sources of uncertainty are classified into aleatory and epistemic. Aleatory uncertainties are due to randomness that is inherent in any assessment and are irreducible; they are addressed by adopting probability models for initiating events and component failures. Epistemic uncertainties, due to models, parameters, and completeness, have an impact on the results and can be minimized. They are briefly explained below. Modeling uncertainties are introduced by inadequacies in the model characterization and comprehensiveness. These are uncertainties introduced by the relative inadequacy of the conceptual models, the
mathematical models, the numerical approximations, the coding errors, and the computational limits. A consensus model combined with expert reviews can minimize modeling uncertainties. Sensitivity studies that evaluate the change in risk under different assumptions also help in addressing modeling uncertainties. Parameter uncertainties result from imprecise parameter values introduced by the scarcity of data, assumptions, and boundary conditions. Statistical distributions are the appropriate means of addressing parameter uncertainties. Completeness uncertainties arise when significant phenomena, processes, or relationships are omitted from the model. Though some factors are omitted deliberately through qualitative/quantitative screening, it is difficult to justify whether all pertinent risks are included. The principles of safety margins and defense-in-depth are used to reduce completeness uncertainties. Sampling methods such as simple random sampling, Monte Carlo simulation with Latin hypercube sampling, and moment propagation methods are the generally adopted techniques for propagation of uncertainties.
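A minimal sketch of parameter uncertainty propagation by simple random sampling: lognormal distributions (a common choice for basic event probabilities) are assumed for a small illustrative top-event expression, and the resulting distribution of the top-event probability is summarized. All medians and error factors are hypothetical.

```python
import math
import random
import statistics

random.seed(42)
N = 50_000

def sample_lognormal(median, error_factor):
    """Lognormal sample specified by its median and error factor (95th/50th percentile)."""
    sigma = math.log(error_factor) / 1.645
    return random.lognormvariate(math.log(median), sigma)

samples = []
for _ in range(N):
    # Hypothetical basic event probabilities for a small illustrative top-event expression
    qa = sample_lognormal(1e-4, 3)   # single-failure cutset
    qb = sample_lognormal(1e-2, 5)   # first element of a double cutset
    qc = sample_lognormal(2e-2, 5)   # second element of a double cutset
    samples.append(qa + qb * qc)     # rare-event approximation of the top event

samples.sort()
p05, p50, p95 = (samples[int(N * f)] for f in (0.05, 0.50, 0.95))
print(f"Top event probability: mean={statistics.fmean(samples):.2e}, "
      f"5th={p05:.2e}, median={p50:.2e}, 95th={p95:.2e}")
```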
2.15 Sensitivity analysis
Sensitivity analysis is performed to understand the impact of contributors to risk metrics such as CDF, release frequency, and consequence frequency. The purpose of sensitivity analysis is twofold: (i) to address modeling assumptions suspected of having a significant impact on the results; and (ii) to determine the sensitivity of the CDF, system reliability, or other risk metric to possible dependencies among component failures and among human errors. It is carried out for one assumption or one parameter at a time. While carrying out sensitivity studies, the acceptable increase in risk level is generally referenced as a maximum 10% increase in system unavailability, a 1% increase in CDF, and/or a 0.1% increase in release frequency. For risk-based evaluations, values one order of magnitude lower may be considered, unless targets for risk-informed decisions are otherwise specified by the regulatory body. The level of confidence in the insights gained from the PSA is obtained from sensitivity analysis.
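A one-parameter-at-a-time sensitivity study can be sketched as re-evaluating the risk metric with a single assumption changed and comparing the relative change against screening values such as those quoted above; the toy CDF model and all numbers below are hypothetical.

```python
def cdf_model(q_diesel, q_pump, ie_freq):
    """Toy CDF model (per year): IE frequency times a small cutset sum."""
    return ie_freq * (q_diesel ** 2 + q_pump * q_diesel)

base = cdf_model(q_diesel=2e-2, q_pump=1e-3, ie_freq=0.1)

# One assumption changed: a higher diesel unavailability (e.g., a pessimistic CCF treatment)
case = cdf_model(q_diesel=3e-2, q_pump=1e-3, ie_freq=0.1)

change_pct = 100.0 * (case - base) / base
print(f"Base CDF = {base:.2e}/yr, case CDF = {case:.2e}/yr, change = {change_pct:.0f}%")
print("Exceeds the 1% CDF screening value" if change_pct > 1.0 else "Within the screening value")
```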
2.16 Importance measures
Importance measures are used to assess the contribution of a component, basic event, or cutset to the overall risk of the plant and help to interpret the results of PSA. They can also be used in optimization of the plant
design by analyzing the risk increase if a component is removed and the risk reduction if a component is added to the system. By changing the testing and maintenance strategy, the effect on risk can be analyzed and the maintenance schedules thus optimized. The quantitative estimates obtained from importance measures provide valuable insights into the relative ranking of the safety significance of PSA elements such as basic components, human actions, functions, etc. Importance measures used in Level 1 PSA include Fussell-Vesely (FV) importance, risk achievement worth (RAW), risk reduction worth (RRW), and Birnbaum importance (BI).
Fussell-Vesely importance: It represents the probability that a certain basic event, component, or cutset is the cause when the top event occurs. This importance measure can also be applied to groups of cutsets. It is calculated as the ratio of the probability of the top event due only to the cutsets of interest to the probability of the top event. If the cutsets of a fault tree are A + B·C + B·D + C·E·F, the probability of the top event is
\[ P = P_A + P_B P_C + P_B P_D + P_C P_E P_F \]
where \(P_i\) denotes the failure probability of component \(i\). The Fussell-Vesely measure for component A is
\[ FV_A = \frac{P_A}{P_A + P_B P_C + P_B P_D + P_C P_E P_F} \]
The Fussell-Vesely measure for components B and C taken together (all cutsets containing B or C) is
\[ FV_{BC} = \frac{P_B P_C + P_B P_D + P_C P_E P_F}{P_A + P_B P_C + P_B P_D + P_C P_E P_F} \]
In general,
\[ FV_{\text{component } i} = \frac{\text{sum of MCS containing component } i}{\text{system unavailability}} \]
2.16.1 Risk achievement worth
Risk achievement worth measures the increase in risk when a component is assumed to be unavailable. RAW is defined as the ratio of the probability of the top event with the component failure probability set to 1 to the probability of the top event. If the cutsets of a fault tree are A + B·C + B·D + C·E·F, the probability of the top event is
\[ P = P_A + P_B P_C + P_B P_D + P_C P_E P_F \]
The RAW measure for component A is
\[ RAW_A = \frac{1 + P_B P_C + P_B P_D + P_C P_E P_F}{P_A + P_B P_C + P_B P_D + P_C P_E P_F} \]
The RAW measure for component B is
\[ RAW_B = \frac{P_A + 1 \cdot P_C + 1 \cdot P_D + P_C P_E P_F}{P_A + P_B P_C + P_B P_D + P_C P_E P_F} \]
In general,
\[ RAW_{\text{component } i} = \frac{\text{system unavailability with } P_{\text{component } i} = 1}{\text{system unavailability}} \]
2.16.2 Risk reduction worth
Risk reduction worth measures the decrease in risk when a component is assumed to be perfectly available. RRW is defined as the ratio of the probability of the top event to the probability of the top event with the component failure probability set to 0. If the cutsets of a fault tree are A + B·C + B·D + C·E·F, the probability of the top event is
\[ P = P_A + P_B P_C + P_B P_D + P_C P_E P_F \]
The RRW measure for component A is
\[ RRW_A = \frac{P_A + P_B P_C + P_B P_D + P_C P_E P_F}{0 + P_B P_C + P_B P_D + P_C P_E P_F} \]
The RRW measure for component B is
\[ RRW_B = \frac{P_A + P_B P_C + P_B P_D + P_C P_E P_F}{P_A + 0 + 0 + P_C P_E P_F} \]
In general,
\[ RRW_{\text{component } i} = \frac{\text{system unavailability}}{\text{system unavailability with } P_{\text{component } i} = 0} \]
2.16.3 Birnbaum importance
Birnbaum importance is a measure of the change in total risk resulting from a change in the failure probability of an individual basic component. BI ranks the components by comparing the risk when a component is failed with the risk when the component is operable. It is the difference between the probability of the top event with the component failure probability set to 1 and the probability of the top event with the component failure probability set to 0.
If the cutsets of a fault tree are A + B·C + B·D + C·E·F, the probability of the top event is
\[ P = P_A + P_B P_C + P_B P_D + P_C P_E P_F \]
The BI measure for component A is
\[ BI_A = (1 + P_B P_C + P_B P_D + P_C P_E P_F) - (0 + P_B P_C + P_B P_D + P_C P_E P_F) = 1 \]
The BI measure for component B is
\[ BI_B = (P_A + 1 \cdot P_C + 1 \cdot P_D + P_C P_E P_F) - (P_A + 0 \cdot P_C + 0 \cdot P_D + P_C P_E P_F) = P_C + P_D \]
In general,
\[ BI_{\text{component } i} = \text{system unavailability with } P_{\text{component } i} = 1 \; - \; \text{system unavailability with } P_{\text{component } i} = 0 \]
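The four importance measures can be reproduced numerically for the cutset expression A + B·C + B·D + C·E·F used above, by evaluating the rare-event top expression with each component's failure probability at its nominal value, set to 1, and set to 0; the basic event probabilities below are hypothetical.

```python
P = {"A": 1e-4, "B": 2e-3, "C": 3e-3, "D": 4e-3, "E": 5e-2, "F": 6e-2}

def top(p):
    """Rare-event top-event probability for the cutsets A + B.C + B.D + C.E.F."""
    return p["A"] + p["B"] * p["C"] + p["B"] * p["D"] + p["C"] * p["E"] * p["F"]

def importance(component):
    base = top(P)
    up = top({**P, component: 1.0})     # component assumed failed
    down = top({**P, component: 0.0})   # component assumed perfectly reliable
    fv = (base - down) / base           # fraction of risk from cutsets containing it
    raw = up / base
    rrw = base / down
    bi = up - down
    return fv, raw, rrw, bi

for name in P:
    fv, raw, rrw, bi = importance(name)
    print(f"{name}: FV={fv:.3f}  RAW={raw:.1f}  RRW={rrw:.2f}  BI={bi:.2e}")
```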
2.17 Applications of PSA
PSA is vital from the regulator's viewpoint, as risk-informed decision making relies on the output and insights of PSA. The PSA methodology is structured and systematic, exploring the risk significance of various aspects of an application during the design and operation stages. It is also used to evaluate the impact of design changes or modifications to operating procedures, to assess abnormal events that occur, to establish allowed outage times and optimum surveillance test intervals for safety-related equipment, to improve emergency operating procedures, to support risk-based indicators for controlling plant configuration during maintenance, and to develop maintenance strategies for systems. In view of its comprehensive evaluation and wide area of applications, PSA is used in various fields. Some of the applications of PSA are discussed below.
2.17.1 Design of NPPs
PSA can be used for different applications during the design stage of an NPP. During design evaluation these include optimization of the design of a new plant, identification of weak links, achieving balance in the design, assessing the effectiveness of redundancy, diversity, and safety logics in protective systems, and assessment of accident management measures. By assessing the level of safety, PSA can also be used to upgrade, backfit, optimize maintenance schedules, and improve overall management to meet the safety goals. The PSA provides insight into the strengths and weaknesses of the design of the
plant and helps to achieve a balanced design. It is used to examine the risk from various external hazards and internal events. It also allows the designer to analyze the risk from various single and multiple failures in the plant. It also facilitates the study of various intersystem and interunit dependencies to enhance the safety of the plant and site. Finally, it is used to verify the target values set by the regulatory organization. Important research/gap areas to support the design can be identified and appropriate solutions arrived at. Proper planning of protection against internal and external hazards, CCFs, correlated hazards, etc. can be worked out.
2.17.2 Operation of NPPs
During the operation phase of a nuclear plant, PSA can provide support on a day-to-day basis through a "living PSA" for performance monitoring and for maintaining the required level of safety. It is also used to evaluate optimized allowed outage times, surveillance test intervals, and testing strategies for various components and systems of the plant. PSA can also help in improving the training programs for operating and maintenance personnel. Several other applications during operation make PSA a mandatory tool for operators and regulators. Some of the important applications include analysis of safety margins against correlated site hazards, periodic safety review, and risk-informed support for online maintenance and technical specification optimization. In periodic safety reviews, PSA is used to ensure that plants built to older standards are sufficiently safe and to support upgrades and backfitting activities that further enhance their safety. Finally, it is also used for evaluation of operating experience, training programs for operators, and strategies for accident management and emergency planning. IAEA TECDOC 1804 discusses various PSA applications in detail.
Further readings
[1] IAEA, Procedures for Conducting Probabilistic Safety Assessment of Nuclear Power Plants (Level 1), Safety Series No. 50-P-4, International Atomic Energy Agency, Vienna, 1992.
[2] IAEA, Procedures for Conducting Probabilistic Safety Assessments of Nuclear Power Plants (Level 2): Accident Progression, Containment Analysis and Estimation of Accident Source Terms, Safety Series No. 50-P-8, IAEA, Vienna, 1992.
[3] IAEA, Deterministic Safety Analysis for Nuclear Power Plants, Specific Safety Guide No. SSG-2, IAEA, Vienna, 2009.
[4] IAEA, Training Course on Safety Assessment of NPPs to Assist Decision Making, Safety Analysis: Event Classification.
[5] IAEA, Procedures for Conducting Probabilistic Safety Assessments of Nuclear Power Plants (Level 3): Off-Site Consequences and Estimation of Risks to the Public: A Safety Practice, Safety Series No. 50-P-12, IAEA, Vienna, 1996.
[6] AERB, Probabilistic Safety Assessment Guidelines, AERB Safety Guide, AERB, Mumbai, 2002.
[7] A.D. Swain, H.E. Guttmann, Handbook of Human Reliability Analysis With Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, USNRC, Washington, DC, 1983.
[8] A. Swain, Accident Sequence Evaluation Program: Human Reliability Analysis Procedure, NUREG/CR-4772, USNRC, Washington, DC, 1987.
[9] IAEA, A Framework for a Quality Assurance Programme for PSA, IAEA-TECDOC-1101, IAEA, Vienna, 1999.
[10] IAEA, Probabilistic Safety Assessment, Safety Series No. 75-INSAG-6, IAEA, Vienna, 1992.
[11] N.C. Rasmussen, United States Nuclear Regulatory Commission, Washington, DC, 1975.
[12] IAEA, Development and Application of Level 1 Probabilistic Safety Assessment for Nuclear Power Plants, Specific Safety Guide No. SSG-3, IAEA, Vienna, 2010.
[13] IAEA, Defining Initiating Events for the Purpose of Probabilistic Safety Assessment, IAEA-TECDOC-719, IAEA, Vienna, 1993.
[14] IAEA, Procedures for Conducting Common Cause Failure Analysis in Probabilistic Safety Assessment, IAEA-TECDOC-648, IAEA, Vienna, 1992.
[15] IAEA, Treatment of External Hazards in Probabilistic Safety Assessment for Nuclear Power Plants, Safety Series No. 50-P-7, IAEA, Vienna, 1995.
[16] IAEA, Treatment of Internal Fires in Probabilistic Safety Assessment for Nuclear Power Plants, Safety Reports Series No. 10, IAEA, Vienna, 1998.
[17] IAEA, Seismic Hazards in Site Evaluation for Nuclear Installations, IAEA Safety Standards Series, Safety Guide No. SSG-9, IAEA, Vienna, 2010.
[18] IAEA, Regulatory Review of Probabilistic Safety Assessment (PSA) – Level 1, IAEA-TECDOC-1135, IAEA, Vienna, 2000.
[19] IAEA, Determining the Quality of Probabilistic Safety Assessment for Applications in Nuclear Power Plants, IAEA-TECDOC-1511, IAEA, Vienna, 2006.
[20] IAEA, Procedures for Conducting Common Cause Failure Analysis in Probabilistic Safety Assessment, IAEA-TECDOC-648, IAEA, Vienna, 1992.
[21] L. Burgazzi, Reliability evaluation of passive systems through functional reliability assessment, Nucl. Technol. 144 (2) (2003) 145–151.
[22] M. Marques, J.F. Pignatel, P. Saignes, F. D'Auria, L. Burgazzi, Methodology for the reliability evaluation of a passive system and its integration into a probabilistic safety assessment, Nucl. Eng. Des. 235 (2005) 2612–2631.
[23] F. D'Auria, F. Bianchi, L. Burgazzi, M.E. Ricotti, The REPAS study: reliability evaluation of passive safety systems, in: Proceedings of the 10th International Conference on Nuclear Engineering, ICONE 10-22414, Arlington, Virginia, USA, April 14–18, 2002.
[24] A. John Arul, C.K. Senthil, S. Atmalingam, O.P. Singh, S. Rao, Reliability analysis of safety grade decay heat removal system on Indian fast breeder reactor, Ann. Nucl. Energy 33 (2006) 180–188.
[25] T. Sajith Mathews, M. Ramakrishnan, U. Parthasarathy, A. Arul John, C.K. Senthil, Functional reliability analysis of safety grade decay heat removal system of Indian 500 MWe PFBR, Nucl. Eng. Des. 238 (9) (2008) 2369–2376.
[26] F. Bianchi, L. Burgazzi, F. D'Auria, M. Ricotti, The REPAS approach to the evaluation of passive system reliability, in: Proceedings of the OECD International Workshop on Passive System Reliability—A Challenge to Reliability Engineering and Licensing of Advanced Nuclear Power Plants, Cadarache, France, March 2002.
[27] M.E. Ricotti, E. Zio, F. D'Auria, G. Caruso, Reliability methods for passive systems (RMPS) study strategy and results, in: Proceedings of the NEA CSNI/WGRISK Workshop on Passive Systems Reliability—A Challenge to Reliability, Engineering and Licensing of Advanced Nuclear Power Plants, Cadarache, France, March 2002.
[28] N. Kasinathan, et al., Design Specification of Operational Grade Decay Heat Removal System, PFBR/47000/DN/001, Internal Report, IGCAR, 2001.
[29] S. Athmalingam, P.M. Vijayakumaran, Operation Note for Safety Grade Decay Heat Removal Circuit, PFBR/3400/ON/1001, Internal Report, IGCAR, 2000.
[30] IAEA, Safety Related Terms for Advanced Nuclear Power Plants, IAEA-TECDOC-626, IAEA, Vienna, 1991.
[31] S. Govindarajan, Temperature Limits for Fuel Elements during Transients, PFBR/31110/DN/1012/Rev-A, Internal Report, IGCAR, 1995.
[32] I.K. Andre, A.C. John, Response Surfaces – Designs and Analyses, Marcel Dekker Inc., New York, 1996, p. 31.
[33] NUREG/CR-5497, Common-Cause Failure Parameter Estimations.
[34] T. Wierman, S. Eide, C. Gentillon, Common-cause failure analysis for reactor protection system reliability studies, INEEL/CON-99-0045.
[35] S.A. Eide, M.B. Calley, Generic component failure data base, in: Probabilistic Safety Assessment International Topical Meeting, vol. 2, Clearwater Beach, FL, USA, 1993, p. 1175.
[36] A. Avižienis, J.-C. Laprie, B. Randell, Basic concepts and taxonomy of dependable and secure computing, IEEE Trans. Dependable Secure Comput. 1 (1) (2004).
[37] Atomic Energy Regulatory Board, Design Safety Guide AERB/SG/D25 – Computer Based Systems of Pressurized Heavy Water Reactors, 2001.
[38] T.T. Soong, Fundamentals of Probability and Statistics for Engineers, John Wiley & Sons Ltd., New York, 2004. ISBN 0-470-86913-7.
[39] H. Pham, Handbook of Reliability Engineering, Springer-Verlag, London, 2003. ISBN 1-85233-453-3.
[40] C. Cornelis, M. De Cock, E. Kerre, Representing reliability and hesitation in possibility theory: a general framework, in: L. Ahmad, J. Garibaldi (Eds.), Applications and Science in Soft Computing, Advances in Soft Computing, Springer-Verlag, Berlin, Heidelberg, 2004, pp. 127–132.
[41] A. Dimov, S. Punnekkat, On the estimation of software reliability of component-based dependable distributed systems, in: Proceedings of the First International Conference on the Quality of Software Architectures (QoSA 2005), LNCS 3712, Erfurt, Germany, Springer-Verlag, 2005, pp. 171–187.
[42] W. Farr, Software reliability modeling survey, in: M.R. Lyu (Ed.), Handbook of Software Reliability Engineering, McGraw-Hill, New York, 1996, pp. 71–117.
[43] S. Gokhale, Architecture-based software reliability analysis: overview and limitations, IEEE Trans. Dependable Secur. Comput. 4 (1) (2007) 32–40.
[44] S. Gokhale, W. Wong, J. Horgan, K.S. Trivedi, An analytical approach to architecture-based software performance and reliability prediction, Perform. Eval. 58 (4) (2004) 391–412.
[45] IEC, Nuclear Power Plants – Instrumentation and Control Systems Important to Safety – Software Aspects for Computer-Based Systems Performing Category A Functions, IEC 60880, 2006.
[46] K. Miller, L. Morell, R. Noonan, S. Park, D. Nicol, B. Murrill, J. Voas, Estimating the probability of failure when testing reveals no failures, IEEE Trans. Softw. Eng. 18 (1) (1992) 33–43.
[47] A. Nikora, Computer Aided Software Reliability Estimation User's Guide (CASRE), Version 3.0, Jet Propulsion Laboratory, CA, 2002.
[48] X. Zhang, H. Pham, An analysis of factors affecting software reliability, J. Syst. Softw. 50 (1) (2000) 43–56.
CHAPTER 3
Risk assessment
Contents
3.1 Background 86
3.2 Objective and scope 87
    3.2.1 Definition 87
3.3 Qualitative and quantitative methods of risk assessment 88
    3.3.1 Risk assessment methods for individual units 89
    3.3.2 Importance of multiunit PSA 95
    3.3.3 Screening in multiunit PSA 96
    3.3.4 Internal and external hazards 97
    3.3.5 Correlated hazards 99
    3.3.6 Shared connections 100
    3.3.7 Human dependencies 100
    3.3.8 Common cause failures 100
    3.3.9 Combination of initiating events 101
Further readings 101
Probabilistic safety assessment (PSA) is widely accepted in many industries as a means of quantifying risk. The insights gained from the results of PSA are used to support safety-related decisions. The various systems and processes in an application are represented in PSA models and analyzed to provide estimates of risk metrics. These risk assessment models are built on a set of assumptions and boundary conditions to represent the phenomena in the system or process. Risk assessment is the total process of improving safety and comprises risk estimation, risk evaluation, and risk management. It is an important step in the implementation of an organization's safety policy and helps to explore new ways of increasing safety. Generally, safety is assured by the use of deterministic criteria. Safety margins are built into any system, and conservative designs are made to compensate for any undetected deficiencies and for unpredictable events. Deterministic criteria are defined in such a way that it is very improbable that the safety limits will be exceeded. Still, failures are not impossible. Therefore, in nuclear plants, risks must be assessed for two main reasons: systems have become so large and the potential consequences of an accident so significant; and systems are so complex that operating experience alone is not sufficient to enable designers to foresee all possible
high impact events. Moreover, the quality of the risk assessment depends on the accuracy and level of detail available for the assessment. The focus of this book is to highlight the importance of risks at multiunit NPP sites. This chapter introduces the international status of multiunit sites and the general practice of single-unit risk assessment, which ignores hidden risk, especially when there are shared resources among the plants. The importance of integrated risk assessment at multiunit sites and the factors to be modeled are addressed. Multiunit risk assessment is site specific, and identification of potential internal and external hazards and their impact on multiple units at a site is required. In nuclear power plants (NPPs), it is common practice to consider the assessment of risk and the evaluation of its acceptability on an individual reactor basis. The safety goals established are expressed on a unit-reactor basis (per reactor-year). This practice provides detailed insight and information about the risk and the plant vulnerability for accidents such as loss of coolant, transients, seismic events, fire events, etc. However, in reality, most countries have more than one nuclear power reactor at a site. The distribution of the number of operating units per site around the world, given in Fig. 3.1, indicates that out of 189 nuclear sites worldwide, 135 (∼71%) have more than one reactor. It is therefore imperative that risk decisions be based on a good understanding of the total risk of an activity at the site rather than at a single unit. The Fukushima nuclear accident in 2011 clearly revealed the importance of assessing potential natural hazards and their combined effects on multiple units. An integrated risk assessment at a site requires explicit modeling of the interactions and shared resources among all the units present. Such integrated risk assessments, also called multiunit risk or site risk assessments, have been reported, and several approaches are followed. Although site risk involves assessment of all radiation installations present at the site, risk assessment methods are evolving in a phased manner, and presently multiple NPPs at a site are being addressed the world over. Until recently, almost all efforts went into deterministic and probabilistic safety assessments (PSAs) that looked at a single reactor at a time. It is optimistically assumed that all the other units are safe while only the unit under consideration is analyzed. This is a nonconservative assumption, because some initiating events (IEs) may challenge more than one reactor and there is a significant risk of multiunit accidents. International operating experience data indicate that there have been many precursors to multiunit accidents. The flooding event at the Oconee power
Figure 3.1 Nuclear power plants—international scenario.
The flooding event at the Oconee power plant in 1970, the external flooding event at the Blayais site in France in 1999 that caused great distress on multiple units, and the seismic event at Kashiwazaki-Kariwa in 2007, which led to the re-evaluation of seismic safety and the inclusion of potential interaction between large seismic events and accelerated ageing in future inspection programs, are some examples. One important aspect and technical challenge in multiunit risk assessment is the treatment of common cause failures of redundant components that are often replicated across units; studies indicate that some fraction of common cause failures actually involve multiple failures on different units at the same time. Other technical challenges are the lack of experience with deterministic and probabilistic analysis of multiunit events; nonlinear dose–response models for early health effects (especially when there are simultaneous radioactive releases from multiple units in a short period of time); limited resources and accident management procedures inhibited by site contamination; the need for new end states and risk metrics; the proliferation of accident sequences; site-based versus reactor-based safety goals; and the need to augment definitions and strategies for defense-in-depth in the multiunit context.
3.1 Background
The basic nuclear safety objectives and requirements stipulate the need for considering risk when making regulatory decisions. Risk assessment in any safety-critical application is essential to identify the weak links present in the system and suggest ways to improve the safety of the plant. Risk assessment provides useful insights on critical safety issues and serves as an effective means of refining and improving safe design, operation, and regulation. In view of this, regulatory bodies depend on various risk assessment techniques to support the decision-making process. Risk assessment techniques vary depending on the industry. Traditionally, NPPs have been designed, constructed, and operated mainly based on deterministic safety analysis (DSA). In recent years, the probabilistic approach has gained importance and is increasingly used to complement the deterministic approach. The insights obtained from both assessments are used for regulatory decision making.
A general framework is needed to integrate the risk contributions from single and multiunit NPPs and to aggregate the risks that may arise due to a range of applicable hazards and operating states across all nuclear plants at a given site. A review of practices adopted in different countries indicates that a site-level PSA deals with the dependencies that may exist between the units on that site. The dependencies may be due to multiunit interactions, the same environmental stresses during external hazards, the sharing of systems at site level, identical systems in each unit, the sharing of human resources, etc. In an actual nuclear plant design, there could be many systems shared among units at a site, and their impact on safety may not be systematically modeled in safety analysis, as the events (both internal and external) are postulated to challenge a single unit. The site-level core damage frequency (CDF) cannot be a simple aggregation of the risk estimated for the individual units at that site. It is also recognized that systems that affect accident progression need not be safety systems or safety-related systems; nonsafety systems may also have a significant impact on accident progression. Although sharing of systems can usually be identified as providing some additional redundancy for each unit, such sharing also introduces an increased potential for the spreading of accident conditions to other units, which may result in a multiunit accident. The shared systems, structures and components (SSCs), depending on the design, typically include the switchyard, power supply systems, control rooms, turbine building, containment structures, air conditioning and ventilation systems, spent fuel storage systems, cooling water intake and outtake structures, other on-site infrastructure, etc.
For a multiunit PSA, the identification of shared SSCs and the evaluation of their responses to different IEs at different stages of the accident progression become very important. This assessment also needs the identification and treatment of all types of dependencies due to the sharing of SSCs, the sharing of human resources, and the sharing of physical phenomena such as seismic events. In addition, the site PSA also takes into consideration the site-level response to beyond design basis accidents, including severe accident conditions for one or more units. The Seabrook PSA clearly demonstrated that the site-level risk is slightly lower than the sum of the risks at the unit level. Nevertheless, this cannot be generalized, as the integrated risk is site specific. By addressing all possible accident scenarios that can result from different hazards, the risk for a multiunit NPP site is quantified and the risk metric, viz., site CDF, is evaluated. The framework can be extended to optimize the shared resources effectively at multiunit sites using sensitivity analysis. An extension of this analysis during the design stage, with a comparison of risk at multiple units, will provide an input to decide the optimum number of NPPs at a site, the optimal distance between two NPPs, layout diversity, the configuration of shared systems, etc. Some of the gap areas in multiunit risk assessment are the inclusion of risk from non-NPP facilities at the site, such as spent fuel storage and reprocessing facilities, and a comprehensive method to integrate human and organizational dependencies.
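As noted above, the site-level CDF is not a simple sum of the individual unit CDFs, and the Seabrook study found the site risk to be slightly lower than that sum. A minimal two-unit sketch of this effect is given below; the initiating event frequency, the failure probabilities, and the shared diesel generator are assumptions chosen purely for illustration and are not taken from any actual PSA.

```python
# Minimal two-unit sketch (illustrative numbers only): each unit's core damage
# requires an initiating event plus failure of its own safety train OR of a
# diesel generator shared by both units.

ie_freq = 1.0e-1       # initiating event frequency per site-year (assumed)
p_train = 1.0e-3       # unit-specific safety train failure probability (assumed)
p_shared = 5.0e-4      # shared diesel generator failure probability (assumed)

# Per-unit conditional core damage probability (rare-event approximation)
p_cd_unit = p_train + p_shared

cdf_unit = ie_freq * p_cd_unit       # single-unit CDF, per reactor-year
sum_of_units = 2 * cdf_unit          # naive aggregation over two units

# Frequency of at least one core damage on the site: the shared failure damages
# both cores at once and must not be double counted.
p_any_cd = p_shared + (1 - p_shared) * (1 - (1 - p_train) ** 2)
site_cdf = ie_freq * p_any_cd

# Frequency of a multiunit (two-core) accident, dominated by the shared system.
p_both_cd = p_shared + (1 - p_shared) * p_train ** 2
multiunit_cdf = ie_freq * p_both_cd

print(f"unit CDF            : {cdf_unit:.2e} /ry")
print(f"sum over two units  : {sum_of_units:.2e} /site-year")
print(f"site CDF (>=1 core) : {site_cdf:.2e} /site-year")
print(f"multiunit CDF       : {multiunit_cdf:.2e} /site-year")
```

The naive sum double counts sequences in which the shared system fails both units at once, which is why the site CDF comes out slightly lower than the sum while the multiunit contribution remains significant.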
3.2 Objective and scope
The objective here is to assimilate the main challenges in multiunit risk assessment and discuss the technical issues to address the challenges regarding the interactions and dependencies among the units at a site. The scope of the book is restricted to addressing sites with multiple NPP units; risk associated with nonreactor radiological sources is beyond the scope of this book. The development of multiunit PSA by experts in other countries and the methodologies proposed are also discussed. The scope of the proposed multiunit PSA methodology is limited to level-1 internal and external events for full power operation states of all NPPs at the site. The focus will be on the development of regulatory safety goals and risk metrics.
3.2.1 Definition
Site core damage frequency: The site CDF of a site with multiple NPP units can be defined as the overall risk associated with the site, obtained by integrating the risk of core damage in more than one unit at the site.
Figure 3.2 Risk vs. frequency.
In other words, it is the frequency of at least one core damage per site per year, with consideration of the various interdependencies.
3.3 Qualitative and quantitative methods of risk assessment
Any safety assessment in safety-critical applications is concerned with risk, and it addresses three major questions, called the risk triplet:
• What can go wrong?
• How likely is it?
• What are the consequences?
The primary objective of risk assessment is to focus efforts on events which are less frequent but pose high risk. The reason for this is that these are rare events, but when they occur the outcome is detrimental. High-frequency/high-risk events are avoided by good design and engineering practices. The risk treatment plan shown in Fig. 3.2 helps the organization to choose whether to mitigate, avoid, accept, or manage risk. Risk assessments in NPPs enhance safety by drawing attention to the most vulnerable equipment and the most sensitive human actions, and they ensure a robust framework for regulatory decision making. Risk insights such as the estimation of CDF and large early release frequency, the identification of dominant accident sequences, and importance measures of structures
and components provide valuable inputs for numerous regulatory processes that improve both the efficiency and the effectiveness of regulatory requirements. Generally, risk assessments in NPPs are carried out using two approaches, viz., deterministic and probabilistic. Both approaches are systematic and are aimed at reducing the risk to workers and members of the public. A brief description of both approaches, along with their associated strengths and limitations, is given in the subsequent sections.
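As a rough illustration of the frequency-consequence reasoning behind Fig. 3.2 and the risk triplet, the sketch below classifies hypothetical events into broad treatment regions. The threshold values, event names, frequencies, and consequence units are assumptions for illustration only and do not represent any actual acceptance criteria.

```python
# Hypothetical frequency/consequence thresholds; actual treatment boundaries
# depend on the organization's risk acceptance criteria.
FREQ_THRESHOLD = 1.0e-4     # events per year (assumed)
CONSEQ_THRESHOLD = 100.0    # arbitrary consequence units (assumed)

def risk_treatment(frequency: float, consequence: float) -> str:
    """Map an event onto the broad treatment regions of a risk matrix."""
    if frequency >= FREQ_THRESHOLD and consequence >= CONSEQ_THRESHOLD:
        return "avoid (design out: high frequency, high consequence)"
    if frequency < FREQ_THRESHOLD and consequence >= CONSEQ_THRESHOLD:
        return "mitigate (rare but severe: main focus of risk assessment)"
    if frequency >= FREQ_THRESHOLD and consequence < CONSEQ_THRESHOLD:
        return "manage (frequent but minor: operational controls)"
    return "accept (rare and minor)"

events = {
    "large LOCA": (1.0e-5, 1000.0),
    "small process leak": (1.0e-2, 10.0),
}
for name, (f, c) in events.items():
    print(f"{name}: risk = {f * c:.2e}, treatment = {risk_treatment(f, c)}")
```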
3.3.1 Risk assessment methods for individual units
To maintain a balanced approach to nuclear safety, different risk assessment approaches are adopted.
3.3.1.1 Deterministic method
In the conventional DSA for regulation, the high-level deterministic principles ensure defense-in-depth and large safety margins; the low-level principles ensure the single failure criterion, equipment qualification, etc. The DSA studies the behavior of the plant under various operational states and accident conditions identified through comprehensive engineering evaluations. Many of the accident conditions assessed in deterministic assessments may never be experienced in reality, but deterministic assessment is still carried out for a complete understanding of the plant behavior under these conditions. In addition, the deterministic evaluation aids in arriving at the requirements for engineering margin and quality assurance in design, manufacture, and construction. The basic principle of this approach is to assume that the plant encounters adverse situations and to establish a specific set of design basis events (DBEs; i.e., what can go wrong?). The DSA for an NPP predicts the response of the plant to various postulated IEs. While carrying out the analysis, a specific set of rules and acceptance criteria are applied. The scope of events analyzed is a predetermined worst case, and the systems are assumed to encounter the worst conditions. The deterministic evaluation focuses on the ability of engineering principles such as safety margins, redundancy, and diversity to prevent and reduce the impact of catastrophic failures. The selection of specific accidents to be analyzed as DBEs is not quantified. The extent of failures of protective systems against design basis accidents has an upper bound. DSA establishes a design that includes safety systems capable of preventing and/or mitigating the consequences (i.e., what are the consequences?) of those DBEs in order to protect public health and safety.
On the basis of this analysis, the design basis for items important to safety is established and confirmed. For example, we can consider partial blockage in a fuel subassembly of a nuclear reactor as a "cause," and by carrying out suitable analytical modeling and computations one can determine the maximum clad temperature as a function of time or as a function of blockage. The clad temperature would be the "effect" and, when related to prescribed limits, provides us with a "safety margin." These safety margins are required in licensing applications. Such a DSA is usually carried out by a designer as part of the design and construction process, by the utility to confirm the design, and by the regulatory organization to regulate and ensure nuclear safety.
As specified in IAEA Specific Safety Guide No. SSG-2, 2009, there are three ways of carrying out DSA for various anticipated operational occurrences (AOOs) and design basis accidents (DBAs). The first is conservative analysis, in which conservative computer codes with conservative initial and boundary conditions are used. The second is combined analysis, in which best estimate computer codes are used in combination with conservative initial and boundary conditions. The third is best estimate analysis, in which best estimate computer codes with conservative and/or realistic input data are used, and the uncertainties in the calculation results are also evaluated by accounting for both the uncertainties in the input data and the uncertainties associated with the models of the best estimate computer code. Among the three approaches, best estimate analysis together with an evaluation of the uncertainties is the most popularly used nowadays, for several reasons. First, the use of conservative assumptions can lead to an incorrect prediction of the progression of events, an inaccurate estimation of the timescales, or the exclusion of some critical physical phenomenon. Also, the use of a conservative approach often results in reduced operational flexibility. In contrast, the best estimate approach provides more profound information about the plant's behavior, aids in the identification of the most significant safety parameters, and provides greater insight into the existing margins between the calculated results and the acceptance criteria, thereby facilitating better operational flexibility.
The following points describe the importance of DSA (IAEA Specific Safety Guide No. SSG-2, 2009; Dave, Nuclear Power Plant Safety-Nuclear Engineering-301; Gianni Petrangeli, 2006):
• It is used for developing plant protection and control systems, set points, and control parameters.
• It is used for developing the technical specifications of the plant.
• It is used to demonstrate that various AOOs and DBAs can be safely managed by the automatic response of safety systems in combination with appropriate operator actions.
• It aids in establishing a set of DBEs and further facilitates analyses of their consequences through various subsequent computations.
• It demonstrates the effectiveness and robustness of various equipment and the engineered safety systems deployed to prevent the escalation of AOOs and DBAs to severe accidents. It is also used to design mitigation strategies for the resulting severe accidents.
• It demonstrates that the safety systems can:
◦ Cause shutdown of the reactor and maintain it in a safe shutdown state during and after a DBA.
◦ Efficiently remove the decay heat from the reactor core after shutdown for all operational states and DBA conditions.
◦ Ensure that the release of radioactivity following a DBA is below the acceptable limit.
• For normal operation of the plant (IAEA Training Course on Safety Assessment of NPPs, Safety Analysis: Event Classification), DSA:
◦ Ensures that normal operation is safe and that plant parameters do not exceed operating limits, with radiological doses and releases of radioactivity within the acceptable limits.
◦ Helps in ensuring that the doses from the operation of the plant follow the principle of ALARA. "ALARA is an acronym for 'As Low As (is) Reasonably Achievable,' which means making every reasonable endeavor to minimize the exposure of ionizing radiation below the dose limits as low as possible."
◦ Establishes the conditions and limitations for safe operation of the reactor, which include safety limits for reactor protection and control and other engineered safety systems, reference settings and operational limits for the control system, and procedural constraints for the operation of various processes.
Thus, a deterministic analysis explicitly addresses only two questions of the risk triplet. The deterministic requirements are defined in several regulatory codes and guides. Uncertainties in the deterministic evaluation are addressed by conservative assumptions in models and data to demonstrate that sufficient margins exist.
3.3.1.2 Advantages of deterministic approach
The deterministic approach is well developed and large experience is available. It is a conservative approach and provides a way to account for uncertainties in the performance of equipment.
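The best-estimate-plus-uncertainty evaluation described in Section 3.3.1.1 can be illustrated with a minimal Monte Carlo sketch that echoes the clad temperature example above. The toy temperature correlation, the input distributions, and the acceptance limit are all hypothetical assumptions; an actual analysis would use a qualified thermal-hydraulic code with validated input data.

```python
import random

# Hypothetical best-estimate model: peak clad temperature (deg C) as a simple
# function of blockage fraction and coolant inlet temperature. Illustrative only.
def peak_clad_temperature(blockage_fraction, inlet_temp_c):
    return inlet_temp_c + 400.0 + 900.0 * blockage_fraction

ACCEPTANCE_LIMIT_C = 1200.0   # assumed acceptance criterion for illustration
N_SAMPLES = 10_000

random.seed(1)
results = []
for _ in range(N_SAMPLES):
    # Input uncertainties (assumed distributions)
    blockage = random.uniform(0.1, 0.4)      # blockage fraction
    inlet = random.gauss(290.0, 5.0)         # inlet temperature, deg C
    model_bias = random.gauss(0.0, 30.0)     # model uncertainty, deg C
    results.append(peak_clad_temperature(blockage, inlet) + model_bias)

results.sort()
best_estimate = sum(results) / N_SAMPLES
upper_95 = results[int(0.95 * N_SAMPLES)]    # simple 95th percentile

print(f"best-estimate peak clad temperature : {best_estimate:6.1f} C")
print(f"95th percentile                     : {upper_95:6.1f} C")
print(f"margin to acceptance limit          : {ACCEPTANCE_LIMIT_C - upper_95:6.1f} C")
```

The 95th percentile, rather than the mean, is compared with the acceptance limit so that the quoted margin accounts for the propagated input and model uncertainties.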
3.3.1.3 Disadvantages of deterministic approach
Generally, it looks at rare events (such as large LOCAs) rather than more frequent events (such as small LOCAs), which are found to be significant contributors to risk. This approach does not provide a balanced design, as it considers IE frequencies and component failure probabilities only in an approximate way. The approach assumes that human actions are always effective and does not consider multiple equipment/procedure failures. The approach does not integrate results in a manner that allows the overall safety impact of postulated IEs to be assessed, and it does not identify sensitive components.
3.3.1.4 Probabilistic approach
In PSA, the full spectrum of accident scenarios is analyzed in conjunction with their likelihood of occurrence. In addition to DBEs, risk contributors due to beyond-design-basis events are also addressed, and there is no limit on the number and type of failures considered, as realistic prediction is the main objective of this approach. PSA is a systematic and comprehensive methodology to evaluate the risks associated with complex engineered technological entities like NPPs, oil and gas facilities, chemical and process industries, etc. In general, it can be said that it is a conceptual tool for deriving numerical estimates of risk for nuclear plants and industrial installations, and also for evaluating the uncertainties in these estimates. It differs from DSA in that it facilitates systematic identification of the accident sequences that can arise from a wide range of events, including design basis and beyond design basis events. It includes logical determination of accident frequencies and consequences, and component and human data for arriving at a realistic estimate of risk. The probabilistic evaluation integrates plant operation and response and includes the risk contributions of human errors and common cause failures as appropriate. All the components in the plant are considered to have their own failure characteristics. As this approach quantifies the risk metrics, it is also capable of evaluating how likely something is to go wrong in the plant and thus addresses all three questions of the risk triplet. The probabilistic evaluation addresses uncertainty in data, in models, and in the evaluation. Based on the experience with NPP design and operation, numerical values are proposed that could be achieved. It defines a "threshold limit" above which the risk is not acceptable. In the USA, acceptance criteria are given for design or operation changes that would lead to a change in risk. In recent years, the PSA has emerged as an increasingly popular analytical
tool. It addresses three basic questions: "(i) What can go wrong with the entity under study? (ii) What and how severe are the potential detriments or consequences that the entity under study may be subjected to? and (iii) How likely are these undesirable consequences to occur?" Thus, PSA in the nuclear domain provides insight into the strengths and weaknesses of the design of the NPP and helps to achieve a balanced design of the plant. The objective of PSA is to identify issues that are important to safety, and to demonstrate that the plant is capable of meeting authorized limits on the release of radioactive material and on the potential exposure to radiation for each plant state. Since DSA alone does not demonstrate the overall safety of the plant, it should be complemented by probabilistic safety analysis. While deterministic analysis is typically used to verify that acceptance criteria are met, PSA is generally used to estimate the probability of damage for each barrier (Dave, Nuclear Power Plant Safety-Nuclear Engineering-301).
3.3.1.5 Advantages of probabilistic approach
A high level of maturity has been achieved in adopting the probabilistic approach in NPPs. Probabilistic evaluation helps to identify design deficiencies that can challenge plant safety during the operation phase. It also helps in reducing unnecessary conservatism associated with regulatory requirements, license commitments, codes, and guides. The evaluation starts with a comprehensive list of initiators and progresses to identify all the fault sequences that lead to core damage or large early release, and the risk is quantified based on component reliability data. Sensitivity and uncertainty studies are carried out to identify risk contributors and to achieve a balanced design. These additional features help in achieving improvements in design and operation. It also helps in comparing the relative risks of different design configurations. Furthermore, this approach, which is applicable to any type of reactor, satisfies the regulatory requirement of establishing a technology-neutral framework.
3.3.1.6 Disadvantages of probabilistic approach
The main disadvantage of the probabilistic approach is that it is not possible to demonstrate completeness in PSA, and therefore it may not be justified to use PSA alone in the decision-making process. Further, quantification in PSA and the final predictions depend on the component reliability data used. These data are very scarce and often require engineering judgment. In some areas of PSA, such as modeling human errors, software reliability,
Figure 3.3 Complementary approach of DSA and PSA.
passive system reliability, etc., a universally acceptable method of evaluation is difficult to implement. Pure probability-based approaches to risk and uncertainty analysis are often challenged due to the limited knowledge or lack of data on high-consequence risk problems.
Earlier, the deterministic approach was found suitable for making decisions on safety issues, as it helps to build a design with appropriate safety barriers to prevent the consequences of postulated worst-case accident scenarios. Safety margins are implemented with highly conservative regulations on system design and operation. In recent times, the probabilistic approach has evolved into an effective way of analyzing safety. The approach is not limited to consideration of worst-case conditions but examines all feasible scenarios and their related consequences, with an element of probability associated with all events. While regulatory decisions seek protection by adding conservatisms and performing traditional deterministic risk assessment to bound the uncertainties and "unknown unknowns," the insights from the probabilistic approach complement those provided by the deterministic approach. Thus, it has become the choice of regulators to adopt a combined approach taking into account the insights provided by the deterministic evaluation and those from the probabilistic method. The complementary aspect of both DSA and PSA is depicted in Fig. 3.3. The consideration of multiple failures and CCFs in the probabilistic approach compensates for the limitation of considering only a single failure in the deterministic approach, while the conservative treatment of data and hypotheses in the deterministic approach provides margins in the design that may compensate for potentially optimistic data in the "as realistic as possible" approach of the PSA.
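As a minimal illustration of how a PSA quantifies an accident sequence from an initiating event frequency and component failure probabilities (Section 3.3.1.4), the sketch below evaluates two hypothetical minimal cut sets with the rare-event approximation. The event names, probabilities, and initiating event frequency are assumed values for illustration only.

```python
# Hypothetical basic-event failure probabilities (illustrative values only).
basic_events = {
    "pump_A_fails": 1.0e-2,
    "pump_B_fails": 1.0e-2,
    "valve_fails": 5.0e-3,
    "operator_fails_to_start_backup": 1.0e-1,
}

# Hypothetical minimal cut sets for one accident sequence: each cut set is a
# combination of basic events whose joint occurrence leads to core damage.
minimal_cut_sets = [
    ("pump_A_fails", "pump_B_fails"),
    ("valve_fails", "operator_fails_to_start_backup"),
]

ie_frequency = 1.0e-2   # initiating event frequency per reactor-year (assumed)

def cut_set_probability(cut_set):
    """Probability of a cut set, assuming independent basic events."""
    p = 1.0
    for event in cut_set:
        p *= basic_events[event]
    return p

# Rare-event approximation: sum the cut set probabilities.
conditional_cd_probability = sum(cut_set_probability(cs) for cs in minimal_cut_sets)
sequence_cdf = ie_frequency * conditional_cd_probability

print(f"conditional core damage probability: {conditional_cd_probability:.2e}")
print(f"sequence CDF: {sequence_cdf:.2e} per reactor-year")
```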
3.3.2 Importance of multiunit PSA
While noting the inherent strength of the design, operating practices, and regulatory practices being followed in NPPs, it is essential to periodically reconfirm safety and identify areas for further enhancement. In this context, following the Fukushima accident, most countries have taken steps to take stock of the safety and adequacy of severe accident management guidelines following an extreme external event. The possibility of destruction of assisting facilities, both inside the plant site (affecting multiple units) and in the surroundings, has been re-assessed. This re-assessment has enhanced the capability of the plants to perform safety functions under an extended station blackout (SBO) or an extended loss of ultimate heat sink, and has helped assess the need for increasing the capability of existing provisions for continued heat removal. Some of the measures incorporated based on the assessments include the provision of portable diesel generators, battery-operated devices for plant status monitoring, additional hook-up points for make-up water to spent fuel storage pools, and alternate provisions for core cooling and cooling of reactor components.
Different approaches are adopted for safety enhancement measures to avoid multiunit accidents. One such approach is performing multiunit PSA. At a multiunit site, the nuclear reactors and the co-located radiological facilities generally share common infrastructure such as the electrical grid and the ultimate heat sink. Every site has specific hazards that have the potential to initiate a sequence of events that challenge the safety functions and cause accidents in more than one unit. A holistic way of looking at a multiunit site from a risk perspective is to formulate and develop a comprehensive framework that integrates the various risk contributions from different radiological sources, hazard groups, and plant operating states, taking into account the interactions and dependencies. The challenge is to develop a logical structure to address the complexity in modeling specific issues pertaining to multiunit sites, such as shared connections, human dependencies, common cause failures, combinations of different initiators triggered at different units due to a single hazard, etc. However, the quantitative target set for single-unit CDF must remain applicable for multiunit CDF, as the target is decided taking into consideration the possible environmental impact from the site. Further, the quantitative target for multiunit CDF must be less than or equal to that for single-unit CDF. In general, external hazards seem to dominate the multiunit risk, but several studies indicate that this is not always true, and site characteristics, design configurations, etc. decide the contributions. Some of the outcomes of multiunit PSA are the creation of an onsite emergency facility to respond to emergency events
at the site, strengthening of severe accident management guidelines for the site, establishment of multiple hook-up provisions for charging water into spent fuel pools, provision of mobile DGs, development of appropriate intervention levels to minimize public impact, development of linear no-threshold principles, etc.
3.3.3 Screening in multiunit PSA
Even for a twin-unit site, a large number of multiunit scenarios are expected. To have a manageable and realistic analysis, as in any other PSA, some screening principles are needed. This concept is similar to the qualitative and quantitative screening in fire PSA or in a failure mode and effects analysis (FMEA) study. Qualitative screening classifies combinations/dependencies as potentially significant or insignificant; combinations of hazards, events, and plant operating states that have little potential to contribute to site risk can be screened out. Quantitative screening uses probabilistic criteria to screen out insignificant combinations. The screening process can undergo a multitier review to confirm that no potential combination of hazards or events is missed in the analysis. Further, the screening of hazards and IEs is site specific; if a single-unit PSA is available for a site, its screening process can serve as a first-level input for the screening process of the multiunit study.
The first step in the screening process is to identify and list all potential internal and external hazards (both natural and human-induced) that may impact a site. Some guidance on the list of hazards is available in several documents, such as the PRA Procedures Guide, SSG-3, etc. For each hazard, the screening criteria are applied to determine whether the hazard has the potential to affect multiple units. If needed, a plant walkdown can be performed to support the decision. If, even after the walkdown, the hazard cannot be screened out, a bounding analysis can be performed to estimate the risk metric, and by comparing the result with the acceptance criteria, a decision can be arrived at. The model for bounding analysis of internal events can be modified for external hazards. Multiunit and multisource aspects are to be included in the bounding analysis, both for internal and external hazards. If the impact of internal or external hazards from one unit on another unit cannot be excluded with a high degree of confidence, the hazard should be screened in for the multiunit assessment. Further, if the shared or interconnected systems configured to provide safety functions to more than one unit have a probability of hazard impact, these safety systems are to be considered unavailable for the relevant sequences. It is therefore necessary to study all the interdependencies between the units at the site. In this context, evaluation of
deterministic safety with respect to multiunit impact analysis and a dedicated FMEA study may be considered important. Further, specific requirements for each of the hazard groups are required to be followed. For example, requirements for internal events such as internal floods and fires may be entirely different from those for external hazards such as seismic events, high winds, external floods, etc., and even within the internal and external events, requirements may differ. Some hazards may be considered important depending on the location of the site. As there are no strict guidelines on the bounding assessment and screening criteria for multiunit PSA, expert judgments supported by inputs from design, operation, and maintenance are recommended. Moreover, if any expert judgments are made during the screening or bounding analysis, conservative failure assumptions are required for shared system failures and the propagation of consequential failures. The screening process is an essential step in multiunit PSA, and hence it can undergo a multitier review to confirm that no potential combination of hazards or events is missed in the analysis.
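A minimal sketch of the qualitative and quantitative screening logic described above is given below. The hazard list, frequencies, multiunit impact flags, and the screening threshold are hypothetical assumptions; actual screening criteria are site specific and subject to the multitier review discussed in this section.

```python
# Hypothetical site hazard list with rough frequencies (per year) and a flag
# for potential multiunit impact; values are assumptions for illustration only.
hazards = {
    "seismic":             {"frequency": 1.0e-4, "multi_unit_impact": True},
    "external_flood":      {"frequency": 5.0e-5, "multi_unit_impact": True},
    "aircraft_crash":      {"frequency": 1.0e-8, "multi_unit_impact": True},
    "internal_fire_unit1": {"frequency": 2.0e-3, "multi_unit_impact": False},
}

# Assumed quantitative screening threshold on the bounding contribution.
SCREENING_FREQUENCY = 1.0e-7   # per year (assumed criterion)

def screen_for_multiunit_psa(hazard_data):
    """Return hazards retained for detailed multiunit analysis."""
    retained = {}
    for name, data in hazard_data.items():
        if not data["multi_unit_impact"]:
            continue                  # qualitative screening: single-unit scope only
        if data["frequency"] < SCREENING_FREQUENCY:
            continue                  # quantitative screening: negligible frequency
        retained[name] = data
    return retained

for name in screen_for_multiunit_psa(hazards):
    print("retained for multiunit assessment:", name)
```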
3.3.4 Internal and external hazards
Risk assessment at a multiunit site is complete only if we consider all possible hazards that can affect the site. The hazards are generally categorized as internal and external hazards. Internal hazards are those that are triggered from within the plant; some examples are loss of coolant, primary system failures, major leaks, etc. External hazards are those that are triggered from outside the plant and create extreme environments common to several plant systems. International experience indicates that external hazards can make a significant contribution to plant risk, and it is widely recognized that such hazards should be included in plant PSA studies. External hazards are significant since they are ideal candidates for CCFs. External events can be considered at any level of PSA, depending on the scope and objectives of the study. External hazards include rainfall, storm surge, tsunami, earthquake, cyclone, high winds, etc. It is necessary, from a risk management perspective, to assess the total risk at a site, encompassing the impacts of both internal and external hazards on all the nuclear plants at the site.
3.3.4.1 Internal hazards
For the evaluation of internal hazards, sequences of events which go on to challenge vital safety functions and cause accidents involving one or more units are considered. Internal IEs are caused by human error and
hardware faults. For the evaluation of external hazards, IAEA Safety Standards Series No. GSR Part 4 (Rev. 1), Safety Assessment for Facilities and Activities, states:
"4.36A. For sites with multiple facilities or multiple activities, account shall be taken in the safety assessment of the effects of external events on all facilities and activities, including the possibility of concurrent events affecting different facilities and activities, and of the potential hazards presented by each facility or activity to the others."
"4.36B. For facilities on a site that would share resources (whether human resources or material resources) in accident conditions, the safety assessment shall demonstrate that the required safety functions can be fulfilled at each facility in accident conditions."
3.3.4.2 External hazards
The probability that external hazards trigger accidents simultaneously on multiple units is very high, but unfortunately there is a lack of analysis of multiunit hazards and fragility under external events. The PSA methodology for external hazards envisages the employment of qualitative and quantitative screening criteria to focus the analysis on the most risk-significant hazards. More attention should be given in this respect to the consistent application of screening criteria in PSAs for external hazards; in particular, external hazards should not be screened out if a similar hazard of lesser intensity has been observed in the region of the site. Also, external hazards should not be screened out prior to consideration of potentially correlated hazards and their combined impact on the plant components and engineered safety features. The behavior of external hazards at a multiunit site challenges emergency preparedness and requires information on many aspects such as geology, structural engineering, drainage capabilities, and weather conditions. In this context, the combined effects of an external event on all units at the site need to be considered, along with the combined effect of hazards at one unit affecting other units. One example of a structural failure affecting multiple units could be a seismic event leading to the collapse of a large turbine building complex that is common to multiple units at a site. Also, during a seismic event, the interunit correlation of structure, system, and component fragilities is a challenge in multiunit PSA (MUPSA). Such modeling of correlations for other external events/hazards is also essential. The Fukushima accident clearly demonstrated the impact of an external hazard on the entire site. It is also necessary to consider complex scenarios for accident mitigation measures and recovery actions. The design, procedural, operational, and human aspects are to be integrated while modeling the external hazards. External hazards PSAs at a site should be based on justifiable
frequencies for hazards of relatively high magnitude that may never have been observed in the past in the plant vicinity. The frequency assessment should take into account all events that have occurred in the immediate vicinity of the plant, in wider regions around the plant, and around the world. An analysis of all available information has to be performed in order to determine the level of applicability of the observed events to the conditions of the specific plant site. Statistical correlation analysis of event occurrence data can be used as part of this process.
3.3.4.3 Earthquakes
Correlations and dependencies within a unit are well established in single-unit seismic PSA. Depending on the demand and capacity of the component/system, a simpler approach in single-unit seismic PSA is to consider structures, systems, and components to be either fully correlated or fully independent. If the SSCs are in the same building, at the same level, or oriented in the same direction, they are assumed to be fully correlated with a correlation probability of 1; otherwise, they are assumed to be independent with a correlation probability of 0. To determine the correlation factor in MUPSA, the capacity of the component with respect to its mounting location is required to be evaluated. A detailed seismic correlation analysis supported by a walkdown is needed.
3.3.4.4 Meteorological hazards (high wind, extreme snow, extreme frost, external floods)
The possibility of each individual unit being in a different plant operating state at the time of a multiunit IE also needs to be considered. For this purpose, a representative set of combinations of plant operating states, consistent with site-specific and plant-specific operations and maintenance practices, is required.
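The bounding effect of the fully correlated versus fully independent assumption described in Section 3.3.4.3 can be shown with a small numerical sketch; the conditional failure probability used below is an assumed fragility value chosen only for illustration.

```python
# Assumed conditional failure probability of a component given a seismic event
# of a particular intensity (a point from a fragility curve; illustrative only).
p_fail = 0.05

# Two similar components, one in each unit, whose simultaneous failure would
# contribute to a multiunit accident sequence.

# Fully independent assumption (correlation = 0): joint failure is the product.
p_joint_independent = p_fail * p_fail

# Fully correlated assumption (correlation = 1): if one fails, both fail.
p_joint_correlated = p_fail

print(f"joint failure probability, independent : {p_joint_independent:.2e}")
print(f"joint failure probability, correlated  : {p_joint_correlated:.2e}")
print(f"ratio (correlated / independent)       : {p_joint_correlated / p_joint_independent:.0f}x")
```

The same choice applies to similar components located in different units, which is why a detailed interunit seismic correlation analysis supported by a walkdown is needed.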
3.3.5 Correlated hazards
When the occurrence of one hazard has the potential to cause a second hazard simultaneously or closely in time, they are called correlated hazards; for example, an earthquake followed by a tsunami. Such correlated hazards, and the multiunit IEs that could be triggered as a result, must also be modeled in MUPSA. Though the frequency of occurrence of such correlated hazards may be low, the impact at the site might be more severe compared to that on a single unit. A thorough analysis of the potential occurrence of several hazards simultaneously should be carried out. During
the analysis, the focus needs to be on the identification of all possible correlation mechanisms, sources of correlated hazards, phenomenology, duration, induced hazards, etc.
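A minimal sketch of quantifying a correlated hazard pair is given below; the earthquake frequency and the conditional tsunami probability are assumed values for illustration, not data from this book.

```python
# Assumed frequency of the primary hazard (e.g., an earthquake above a given
# intensity at the site), per year.
f_earthquake = 1.0e-4

# Assumed conditional probability that the earthquake also generates a tsunami
# exceeding the site design basis (illustrative value).
p_tsunami_given_eq = 0.1

# Frequency of the correlated hazard pair challenging all units on the site.
f_correlated = f_earthquake * p_tsunami_given_eq
print(f"frequency of correlated earthquake+tsunami hazard: {f_correlated:.1e} per year")
```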
3.3.6 Shared connections
Shared systems are those that can support several units simultaneously, or independent systems that cater to one unit at a time but can be connected to any other unit as and when required. A third category of shared systems comprises systems that are shared between units but are kept in a standby state.
3.3.7 Human dependencies
Compared with single units, human actions at a multiunit site are more challenging during critical situations. Multiunit accidents add more stress to the operators, and the challenge depends on the degree of interdependence between the individual units. The methodology for human reliability analysis should be enhanced to consider the impact of accident scenarios in which information/indications are either not available or not reliable, and in which decisions have to be made and actions performed in unfavorable environmental conditions. In such cases, it might be reasonable not to credit the success of human actions, or higher failure probabilities should be assigned to the associated human errors. Even if the actions required are not identical, there will be some level of human dependency at a multiunit site. Many factors that are important in the single-unit dependency model, such as training, common procedures, and organizational management, are applicable across units, whereas unit-dependent factors such as crew, timing, and environment are not. Only in rare cases, where there is a dedicated team for each unit for all human actions, can interunit dependency be ignored. A more comprehensive and less optimistic analysis should be performed for all operator actions in order to account for the impact of external hazards on operator performance and the associated human errors.
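One commonly used scheme for adjusting human error probabilities for dependence is the THERP dependence model; the sketch below applies its standard dependence equations to an assumed nominal human error probability. The nominal value is illustrative, and THERP is shown here only as an example of how interunit dependence could be credited, not necessarily the method adopted in this book.

```python
# THERP dependence equations for the conditional human error probability (HEP)
# of a subsequent action, given failure of a preceding (or parallel) action.
def conditional_hep(nominal_hep: float, dependence: str) -> float:
    p = nominal_hep
    if dependence == "zero":
        return p
    if dependence == "low":
        return (1 + 19 * p) / 20
    if dependence == "moderate":
        return (1 + 6 * p) / 7
    if dependence == "high":
        return (1 + p) / 2
    if dependence == "complete":
        return 1.0
    raise ValueError(f"unknown dependence level: {dependence}")

# Assumed nominal HEP for an operator action repeated at a second unit.
nominal = 1.0e-3
for level in ("zero", "low", "moderate", "high", "complete"):
    print(f"{level:9s}: conditional HEP = {conditional_hep(nominal, level):.3e}")
```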
3.3.8 Common cause failures
Common cause failures are the main risk contributors in single-unit PSA. Generally, within a unit, common cause groups are developed for identical components that carry out the same functions. Appropriate parameters are used for CCF models depending on the number of components in a CCF group. The same CCF modeling technique is necessary in multiunit PSA to take care of interunit failures. In addition to the single-unit CCF considerations, components in different units may have the common factors of same
design, same manufacturer, etc., and therefore the CCF principles applied within a unit are applicable between units as well. There are instances where failures have occurred on similar components in different reactors. If such a CCF occurs in combination with a multiunit IE affecting more than one unit, it might lead to a catastrophic situation. The main problem is the CCF data and their applicability across units. Since there is a possibility of multiple units having components of different vintages, expert judgment is required to decide whether to include or exclude components of different ages in a CCF group, or what fraction of the CCF data is applicable across units.
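A minimal beta-factor sketch of intraunit and interunit CCF is given below. The beta-factor model is only one common parametric CCF model; the total failure probability and the intraunit and interunit beta values are assumed for illustration, with the interunit fraction standing in for the kind of expert judgment discussed above.

```python
# Beta-factor common cause failure (CCF) sketch: a fraction beta of a
# component's failure probability is attributed to a common cause that fails
# all members of the CCF group together. Numbers are illustrative only.

q_total = 1.0e-3     # total failure probability of one diesel generator (assumed)
beta_intra = 0.10    # CCF fraction within a unit (assumed)
beta_inter = 0.02    # assumed smaller CCF fraction across units (expert judgment)

# Two redundant diesel generators within one unit:
q_independent = (1 - beta_intra) * q_total
q_ccf_intra = beta_intra * q_total
p_both_in_unit = q_independent ** 2 + q_ccf_intra

# Same component type replicated in two units (same design, same manufacturer):
q_ccf_inter = beta_inter * q_total
p_same_component_both_units = ((1 - beta_inter) * q_total) ** 2 + q_ccf_inter

print(f"P(both DGs fail within one unit)      : {p_both_in_unit:.2e}")
print(f"P(same component fails in both units) : {p_same_component_both_units:.2e}")
```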
3.3.9 Combination of initiating events
Compiling the list of multiunit IEs to be considered for analysis is an important step, as it plays a vital role in determining the level and depth of the analysis. The basis for inclusion of a particular IE, both for internal and external hazards, needs a multitier screening review so that important combinations of IEs are not missed and unrealistic combinations can be ignored. Starting from the list of IEs considered in single-unit PSA, the screening process should take into account factors such as site-specific characteristics, the likelihood of the same or different IEs being triggered at multiple units, and events at multiple units triggered due to an impact on shared systems. For a realistic risk assessment at a multiunit site, one should look not only at the negative interactions between the units but also at the positive interactions, because one unit can use resources from another unit.
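A minimal sketch of enumerating combinations of IEs across units for a single shared hazard is given below; the hazard frequency, the IE states and their conditional probabilities, the independence simplification, and the screening cutoff are all hypothetical assumptions for illustration.

```python
from itertools import product

# Hypothetical site: two units and a single external hazard (frequency per year).
hazard_frequency = 1.0e-4   # assumed

# Assumed conditional probabilities of the IE state induced at each unit by the
# hazard (purely illustrative).
ie_states = {
    "no_IE": 0.50,
    "loss_of_offsite_power": 0.45,
    "loss_of_ultimate_heat_sink": 0.05,
}
units = ("unit1", "unit2")

SCREENING_FREQUENCY = 1.0e-8   # assumed cutoff for retaining a combination

# Enumerate combinations of IE states across the two units, assuming the unit
# responses are independent given the hazard (a simplification).
for combo in product(ie_states, repeat=len(units)):
    p_combo = 1.0
    for state in combo:
        p_combo *= ie_states[state]
    freq = hazard_frequency * p_combo
    if all(state == "no_IE" for state in combo):
        continue                      # no challenge to either unit
    if freq < SCREENING_FREQUENCY:
        continue                      # screened out quantitatively
    print(dict(zip(units, combo)), f"frequency = {freq:.1e} /yr")
```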
Further readings
[1] IAEA, Determining the Quality of Probabilistic Safety Assessment (PSA) for Applications in Nuclear Power Plants, International Atomic Energy Agency, Vienna, 2006.
[2] K.N. Fleming, On the issue of integrated risk: a PRA practitioner's perspective, in: Proceedings of the ANS International Topical Meeting on Probabilistic Safety Analysis, San Francisco, CA, 2005.
[3] EPRI Technical Update 3002003116, An Approach to Risk Aggregation for Risk-Informed Decision-Making, 2015.
[4] G. Apostolakis, The concept of probability in safety assessments of technological systems, Science 250 (4986) (1990) 1359–1364.
[5] T. Bjerga, T. Aven, E. Zio, An illustration of the use of an approach for treating model uncertainties in risk assessment, Reliab. Eng. Syst. Saf. 125 (2014) 46–53.
[6] J. Vecchiarelli, K. Dinnie, J. Luxat, COG-13-9034 R0, Development of a Whole-Site PSA Methodology, CANDU Owners Group Inc., Canada, 2014.
[7] IAEA Specific Safety Guide No. SSG-3, Development and Application of Level 1 Probabilistic Safety Assessment for Nuclear Power Plants, 2010.
[8] P. Lowe, I. Garrick, Seabrook Station Probabilistic Safety Assessment, Section 13.3, Risk of Two Unit Station, PLG-0300, prepared for Public Service Company of New Hampshire, 1983.
[9] T.D. Le Duy, D. Vasseur, E. Serdet, Probabilistic safety assessment of twin-unit nuclear sites: methodological elements, Reliab. Eng. Syst. Saf. 145 (2016) 250–261.
[10] T. Bani-Mustafa, Z. Zeng, E. Zio, D. Vasseur, A framework for multi-hazards risk aggregation considering risk model maturity levels, in: 2017 Second International Conference on System Reliability and Safety (ICSRS), 2017, pp. 429–433.
[11] IAEA Specific Safety Guide No. SSG-2, Deterministic Safety Analysis for Nuclear Power Plants, 2009.
[12] IAEA Safety Series No. 92, Consideration of External Hazards in Probabilistic Safety Assessment for Single Unit and Multi-unit Nuclear Power Plants, 2018.
[13] International Atomic Energy Agency, Safety Assessment for Facilities and Activities, IAEA Safety Standards Series No. GSR Part 4 (Rev. 1), IAEA, Vienna.
[14] IAEA Webinar on "Expanding NPP Risk Assessment to Multi-Unit Context: Achievements and Challenges", 2020.
[15] IAEA TECDOC-1804, Attributes of Full Scope Level 1 PSA for Applications in NPPs, 2016.
[16] J.W. Hickman, et al., PRA Procedures Guide: A Guide to the Performance of Probabilistic Risk Assessments for Nuclear Power Plants, NUREG/CR-2300, NRC, Washington, DC, 1983.
[17] International Atomic Energy Agency, Evaluation of Seismic Safety for Existing Nuclear Installations, IAEA Safety Standards Series No. NS-G-2.13, IAEA, Vienna, 2009.
[18] International Atomic Energy Agency, Probabilistic Safety Assessment for Seismic Events, IAEA-TECDOC-724, IAEA, Vienna, 1993.
[19] IAEA, Risk-Informed Regulation of Nuclear Facilities: Overview of the Current Status, IAEA-TECDOC-1436, IAEA, Vienna, 2005.
CHAPTER 4
Site safety goals
Contents
4.1 Multi-unit considerations 104
4.2 Site safety goals 104
4.3 Site safety goals—international scenario 107
4.4 Multi-criteria analysis for risk metrics 109
4.5 Communication of risk information to public and their perception 111
Further readings 112
Safety goals for nuclear power plants (NPPs) are developed with the fundamental safety objective to protect people and the environment from the harmful effects of ionizing radiation. Various defense-in-depth principles with multiple layers of protection are applied throughout the life cycle of an NPP. To prevent and mitigate accidents, defenses in terms of independence, redundancy, diversity, physical barriers, interlocks, and access controls are relied upon. Further, to assess and maintain the required level of safety in an NPP, the risks associated with the operation of various systems are required to be assessed and controlled. Such a risk assessment process helps to develop and establish safety goals to minimize the risk to an acceptable level. Though the concept of safety goals has been in use for a long period, it was formally introduced for NPPs by the IAEA in 2011. However, there are no guidelines on how to develop a set of safety goals in areas other than NPPs. Even for NPPs, different countries adopt their national standards to decide on qualitative and quantitative safety goals. A consistent and coherent set of safety goals that is universally accepted for NPPs and other nuclear installations has yet to evolve. However, safety goals are needed to demonstrate and communicate the safety of NPPs to the larger community. It is well understood that a single safety goal for a nuclear installation is not sufficient, and a set of safety goals is required. Safety goals are generally qualitative or quantitative. Some countries only have qualitative safety goals, with criteria expressed in terms of whether the risk from the NPP is high or low compared to that of other applications. Quantitative safety goals are deterministic or probabilistic. Probabilistic safety assessment (PSA) is a systematic approach to assessing the risk in a probabilistic manner. The hierarchical levels of safety goals are explained
in detail in International Atomic Energy Agency (IAEA) TECDOC-1874. The highest-level safety goals are usually qualitative and consistent with the national safety policy and societal needs. Lower-level goals are quantitative, with design and operating performance objectives, and are in line with the high-level qualitative safety goals. These safety goals are recommended by regulators in the form of directives and in safety documents.
4.1 Multi-unit considerations
The Fukushima accident highlighted the importance of multi-unit effect considerations in PSAs. Interactions between the units at a site arise from the sharing of infrastructure facilities such as shared cable trenches, ventilation ducts, spatial interactions, and common buildings, and simultaneous source term development can arise from simultaneous core and containment damage in multiple units. Such multi-unit accidents have much more severe radiological consequences and have the potential to impact the mitigation strategies of the other units. The timing of simultaneous accidents at a multi-unit site can challenge shared structures, systems and components, and the resources available for accident management and emergency response.
4.2 Site safety goals
The traditionally used safety goals, core damage frequency and large early release frequency, are normally expressed in units of events per reactor-year and are associated with each reactor on the site. Establishing safety goals is important as it concerns the protection of the public and the environment. Safety goals are often perceived as the targets to be achieved by the designers and operators of an NPP to minimize the radiation risk to the population. Different qualitative and quantitative criteria are adopted in establishing and using the safety goals to maintain effective defenses against radiological hazards. Qualitative criteria are arrived at through a formal process without assigning a numerical value to the risk, whereas quantitative criteria are arrived at by evaluating the product of the likelihood of occurrence of an accident and its consequences. These criteria differ from country to country, and therefore there is no consensus on a common safety goal. In some countries, safety goals are defined in the national regulations and are made mandatory. In recent times, safety goals have been linked to risk-informed regulatory decisions, and PSA results are considered to be a valuable tool to provide input for determining the goals. Internationally, PSA is considered a useful tool to
evaluate risk in NPPs and support regulatory decision making. The benefits of PSA are manifold. It provides insights on the weak links in the system and identifies plant vulnerabilities that may exist. PSA is used in the design stage to compare different design options, evaluate risk, and arrive at a balanced design. During operation, it provides insights on maintenance-related issues. Effectively, PSA demonstrates the concept of "how safe is safe enough". It is also important to note that PSA has certain limitations, as the results depend on the quality of the input data and the uncertainties involved in the models. There are other limitations, such as the modeling of human errors, quantifying residual risks, modeling the aging of components, etc. In many countries, the risk criteria are not the same for existing and new plants. The basis for defining the risk criteria considers the level of risk in society, risks from other sources, and the technical expertise available. However, in some countries, safety goals are provided as guidelines to designers and plant operators, in view of the large number of varied uncertainties in PSA. One apparent intention of using safety goals as mere guidelines is to have an open-minded assessment of plant safety and not just to fulfil the safety goal requirement.
The existing safety goals and numerical targets are developed for single reactor units. When more than one unit is present at a site, unit-based safety goals and risk metrics are no longer valid, and site-based safety goals are required. Multi-unit site safety goals are defined based on the type of facilities present. Many experts are developing a common site safety goal to support risk-informed regulation. While it is believed that safety goals are the reflection and interpretation of PSA results, it is also understood that strict adherence only to probabilistic criteria is not realistic. One of the primary reasons is the large uncertainties involved in a PSA model. These models depend on the level of detail and the boundary conditions used. The problems multiply when a large number of units are present at a site. The purpose of safety goals is also to demonstrate the safety of nuclear plants to the public. Hence, the generally applied target for the individual risk from a nuclear plant is 100 times lower than the risk due to other societal causes such as traffic accidents, air crashes, illness, etc. Several countries apply the ALARA principle. Though there are varying targets adopted in different countries, the common objective of risk criteria is to reduce societal risk and to minimize the environmental consequences due to accidental radioactive release. Both qualitative and quantitative safety goals are expressed in terms of protection of the public and environment. Qualitative goals contain abstract concepts in terms of reducing harmful effects of ionizing radiation such as
basic principles for nuclear and radiation safety, the management principles, etc. Quantitative goals are in terms of offsite consequences. They are defined in terms of probabilistic risk criteria and are broadly grouped into four categories: core damage frequency (CDF), large release frequency (LRF), containment failure frequency, and frequency of radioactive dose release. The target values for each of these categories are different in different countries and depend on national regulatory policies. For multi-unit site safety goals, a consistent approach is difficult to achieve due to the differences in the scope of PSA and the different methods used by PSA experts. In the context of multi-unit PSA, several questions arise for both site CDF and site LRF. As there is no prescribed safety goal for a multi-unit site, the targets specified for a single NPP are considered applicable for a multi-unit site also, irrespective of the number of units at the site. The target for plant CDF is derived based on a comparison with risk from all other sources and with the objective of keeping the doses as low as reasonably achievable for the public and the environment. Therefore, it is justified to adopt the same target for a multiple-unit site as adopted for a single unit. The site CDF (SCDF) is the aggregation of the frequencies of one or more core damages at a site:

SCDF = \sum_{i=1}^{n} \sum_{j=1}^{5} \sum_{k=1}^{m} CDF(i, j, k)

where
• i denotes the number of simultaneous core damages,
• n denotes the number of units at the site,
• j denotes the category of hazard or event,
• k denotes the type of hazard in the jth category, and
• m denotes the total number of types of hazard in the jth category.
Therefore, CDF(i, j, k) denotes the frequency of i simultaneous core damages due to the jth category of hazard of type k, where
• j = 1 refers to external hazards for the site that have the potential to impact all the plants at the site;
• j = 2 refers to external hazards for the site that have the potential to impact all the plants at the site only under certain conditions;
• j = 3 refers to internal events for the site that have the potential to impact all the plants at the site;
• j = 4 refers to internal events for the site that have the potential to impact all the plants at the site only under certain conditions;
• j = 5 refers to internal independent events in each of the units.
SCDF is the multi-unit level 1 risk metric.
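A minimal computational sketch of this aggregation is given below; the CDF(i, j, k) contributions are hypothetical placeholders used only to show how the triple sum is evaluated, not results from an actual multi-unit PSA.

```python
# Hypothetical multi-unit CDF contributions, keyed by
# (i = number of simultaneous core damages, j = hazard category, k = hazard type).
# Frequencies are per site-year and are illustrative placeholders only.
cdf_contributions = {
    (1, 5, 1): 8.0e-6,   # single core damage, internal independent event, type 1
    (1, 3, 1): 1.5e-6,   # single core damage, site-wide internal event, type 1
    (2, 1, 1): 4.0e-7,   # two simultaneous core damages, external hazard, type 1
    (2, 1, 2): 1.0e-7,   # two simultaneous core damages, external hazard, type 2
}

# SCDF = sum over i, j, k of CDF(i, j, k)
scdf = sum(cdf_contributions.values())

# Frequency of multi-unit (i >= 2) core damage, a useful complementary metric.
multiunit_cdf = sum(f for (i, j, k), f in cdf_contributions.items() if i >= 2)

print(f"site CDF (SCDF)        : {scdf:.2e} per site-year")
print(f"multi-unit CDF (i >= 2): {multiunit_cdf:.2e} per site-year")
```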
The safety goal for LRF for a single unit is presently based on the release frequency and its consequences. The aggregated frequencies of all accident sequences leading to radioactive release in multiple units at a site need to be included in the estimation of the release to the environment for arriving at the LRF at the site. As a defense-in-depth policy, the SCDF safety goal is one order of magnitude greater than the site LRF safety goal. While the SCDF is limited to the site boundary, the site LRF (SLRF) has the potential to impact the surrounding environment and therefore needs special considerations such as the population, the impact of land or water contamination, long-term effects, etc. Depending on the severity and consequences resulting from simultaneous radioactive releases from multiple units at a site, appropriate revision is necessary in the guidelines and emergency preparedness plans devised for single units. Effectively, the site LRF safety goal is required to ensure the safety of the general public from both short-term and long-term health effects. Such a site safety goal takes input from atmospheric dispersion analysis insights for the potential release and from the quantitative risk criteria for the site LRF to the general public relative to other societal risks. SLRF is the multi-unit level 2 risk metric. While the SLRF is calculated using the source terms and the release frequencies from combinations of accident scenarios at a multi-unit site, the level 3 risk metric can also be calculated by grouping the source terms from different units and estimating risk measures such as effective dose, individual risks, and societal risk. In the context of multi-unit PSA, various levels of safety goals are required, as suggested in CANDU Owners Group report COG-13-9034 R0. Further, analysts involved in multi-unit risk assessment have suggested providing a metric to represent the level of dependency among the units, in addition to the site CDF and site LRF. With these metrics, the conditional probability of a multi-unit accident, which describes the dependency, can be evaluated. A graded approach is needed, starting from specific criteria for deterministic analysis, probabilistic analysis, severe accident management principles, and emergency preparedness, and finally risk-informed decisions, to achieve the ultimate goal of safety of the public and the environment.
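As a sketch of the dependency metric mentioned above, the conditional probability of a multi-unit accident given a core damage at the site can be formed from the aggregated frequencies; the two frequencies below are illustrative placeholders only, not results from an actual PSA.

```python
# Illustrative frequencies (per site-year); placeholders only.
scdf = 1.0e-5            # frequency of at least one core damage at the site
multiunit_cdf = 5.0e-7   # frequency of core damage in two or more units

# Conditional probability that, given a core damage somewhere on the site,
# more than one unit is involved: a simple measure of inter-unit dependency.
conditional_multiunit_probability = multiunit_cdf / scdf
print(f"P(multi-unit accident | core damage at site) = {conditional_multiunit_probability:.2f}")
```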
4.3 Site safety goals—international scenario
Not all countries have a clear definition of qualitative safety goals. The general nuclear safety objective indicated in INSAG-12 is the basis for safety goals in many countries, such as India, Canada, and France.
Figure 4.1 Level-1 probabilistic safety assessment (PSA) numerical criteria for core damage frequency (from: VTT Technical Research Centre of Finland Ltd).
the regulatory framework. The possibility of certain events or combinations of events occurring is considered to have been practically eliminated if it is physically impossible for the conditions to occur or if the events can be considered extremely unlikely. Practical elimination is achieved by implementing a rugged design process, followed by inspection and surveillance during manufacturing, construction, commissioning, and operation, and further by preventing the conditions that could lead to an early or large radioactive release. As for the quantitative safety goals, some countries have clear numerical targets for core damage frequency and large early release frequency. CDF targets range between 1E-4 and 1E-5 per reactor year, and LRF targets range between 1E-5 and 1E-6 per year, based on data collected from 14 countries, as shown in Figs. 4.1 and 4.2. However, the scope of the numerical objectives varies from one country to another: some are defined in law or regulations and are mandatory, some are defined by the regulatory authority, and some are defined by designers and licensees. Based on the literature survey, it is noticed that there is no consensus on uniform safety goals or risk metrics for a multi-unit site, but the development of site-based goals is in progress. At certain multi-unit sites, nonreactor radiological sources such
Figure 4.2 Level-2 probabilistic safety assessment (PSA) numerical criteria for large release frequency (from: VTT Technical Research Centre of Finland Ltd).
as spent fuel pools, dry fuel storage facilities, etc. are co-located with NPPs, and it is not appropriate to have CDF as the site-level risk metric. Instead, fuel damage frequency is being considered as an alternative risk metric in some countries. The technical challenges associated with site-level safety goals were identified during the international workshop on multi-unit PSA in Ottawa, Canada. These challenges include:
– Aggregating risk contributions across different reactor units and facilities, different hazard groups and operating states, single unit and multi-unit, level of modeling detail, treatment of uncertainty, CCF and human actions, etc.
– Methodology for aggregating risk at a site consisting of old and new reactors.
– Lack of multi-unit based acceptance criteria for evaluating the integrated risk from a multi-unit site PSA.
– Methodology for comparing estimated risk against existing and new site-based safety goals.
– Need for international consensus on site safety goals.
4.4 Multi-criteria analysis for risk metrics
Many countries refer to the fundamental safety objective of protecting the public and the environment as a qualitative safety goal. These safety goals are established
Table 4.1 Numerical targets for Level 1 probabilistic safety assessment (PSA) (single unit).

Country            Frequency     Remarks
Canada             1.0 E-04/ry   Limit for existing plants
                   1.0 E-05/ry   Limit for new plants
Chinese Taipei     1.0 E-05/ry   Limit
Czech Republic     1.0 E-04/ry   Objective for existing plants
                   1.0 E-05/ry   Objective for new plants
Finland            1.0 E-05/ry
France             1.0 E-06/ry
Hungary            1.0 E-04/ry   Limit for existing plants
                   1.0 E-05/ry   Limit for new plants
India              1.0 E-04/ry   Limit for existing plants
                   1.0 E-05/ry   Limit for new plants
Netherlands        1.0 E-04/ry   Limit for existing plants
                   1.0 E-06/ry   Limit for new plants
Slovak Republic    1.0 E-04/ry   Limit for existing plants
                   1.0 E-05/ry   Limit for new plants
Slovenia           1.0 E-04/ry   Limit for existing plants
                   1.0 E-05/ry   Limit for new plants
Sweden             1.0 E-05/ry
Switzerland        1.0 E-05/ry
United Kingdom     1.0 E-04/ry
United States      1.0 E-04/ry   Limit

From: IAEA Fundamental Safety Principles, 2006.
on a single-unit basis, and there are no separate safety goals for sites. A comparison of the Level 1 numerical criteria in different countries is given in Table 4.1 and Fig. 4.1. Generally, multi-criteria analysis is adopted in the field of operations research when there are multiple criteria for evaluation. The main aim of multi-criteria analysis is to arrive at the best feasible solution considering all the different objectives and targets. When there are multiple facilities at a site, the risk metric may differ from facility to facility. Suppose there are n different facilities with n different risk metrics, say RM1, RM2, …, RMn; then, for the overall evaluation of the site safety goal, we need a mathematical model with appropriate weights assigned to each risk metric RMi, i = 1 to n, and these weights may vary for different scenarios. The risk metrics for the various internal and external hazards, different plant states, hazard potential, and site characteristics are combined to generate a site-level safety goal. For a realistic site safety goal, a robust procedure is required to identify the requirements, identify the alternatives, and develop criteria with appropriate
weights for the various targets. For other radiological hazards such as fresh fuel, irradiated fuel storage, etc., a separate risk assessment is required and must be integrated into the site safety goals. Considering the complexity of risk aggregation across different facilities and the uncertainties involved, a methodology needs to be evolved for aggregating the risk contributions at a multi-unit site. In the absence of a consensus on a unified approach for arriving at site safety goals, the current practice of unit-level safety goals and their targets is to be considered as the site safety goals. For multi-criteria evaluations, the analytic hierarchy process (AHP) is a commonly used technique, as it has the capability of both qualitative and quantitative evaluation of attributes. The procedure by which the weights are produced follows the logic developed by Saaty in 1977 under the AHP, which is utilized to determine the relative importance of the criteria in a specified decision-making problem. Each key parameter considered for the multi-unit risk is assigned a weight to indicate its importance in the decision.
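As a rough illustration of how AHP-style weights might be produced and applied, the sketch below derives criterion weights from a pairwise comparison matrix using the row geometric-mean approximation of the principal eigenvector and then forms a weighted site-level indicator. The criteria names, comparison values, and risk-metric frequencies are hypothetical, and the weighted sum is only one of several possible aggregation rules, not a prescription from the text.

```python
import numpy as np

# Hypothetical pairwise comparison matrix (Saaty 1-9 scale) for three
# site-level risk metrics; the values below are illustrative only.
criteria = ["Reactor CDF", "SFP damage frequency", "Waste facility release"]
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

# Approximate the principal eigenvector by the row geometric means,
# then normalize so the weights sum to one.
geo_means = np.prod(A, axis=1) ** (1.0 / A.shape[0])
weights = geo_means / geo_means.sum()

# Consistency check: the consistency ratio (CR) should typically be < 0.10.
lambda_max = np.linalg.eigvals(A).real.max()
n = A.shape[0]
CI = (lambda_max - n) / (n - 1)   # consistency index
RI = 0.58                         # random index for n = 3 (Saaty)
CR = CI / RI

# Weighted aggregation of per-facility risk metrics (placeholder values, per year).
risk_metrics = np.array([2.0e-5, 8.0e-6, 1.0e-6])
site_indicator = float(np.dot(weights, risk_metrics))

for c, w in zip(criteria, weights):
    print(f"{c}: weight = {w:.3f}")
print(f"Consistency ratio = {CR:.3f}")
print(f"Weighted site risk indicator = {site_indicator:.2e} per year")
```

In an actual application, the comparison matrix and the set of risk metrics would be developed and justified by the analysis team for the specific site.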
4.5 Communication of risk information to public and their perception
One of the most challenging aspects of the nuclear industry is risk communication to the public. In many countries, the public is sceptical of any risk information communicated by the utilities or the regulatory organization. Further, communicating the fact that the frequency of serious accidents is very low while the consequences are relatively high is difficult. Generally, the public is more concerned with the risk at a nuclear site and the consequences of a radioactive release. It is also the international experience that the public finds it easier to interpret risk criteria expressed in terms of radioactive release rather than as a numerical risk frequency, for example, 1E-6 per reactor year. Technical and safety experts in NPPs depend on risk assessment tools, and the communication of their results can lead to misunderstanding and misinterpretation. To prevent the misinterpretation of PSA results, it is important to clearly present the results and criteria in terms of the safety enhancements in the plant and the risk reductions achieved through PSA. It is even more challenging to communicate the results of multi-unit PSA. Developmental work on methods for addressing internal and external hazards up to Level 2, and event combinations and correlations for all operating states at multi-unit sites,
is being actively pursued. It is also recognized that risk aggregation for site-specific aspects such as multi-unit and multi-source considerations is necessary. Events like Chernobyl, Three Mile Island, and Fukushima have created a negative impression in the minds of the public, mainly due to the fear of long-term impacts. It is also common for the public to accept the risks associated with natural hazards more readily than those from man-made sources. Long-term planning and coordination, along with good communication, are vital to gain public trust and confidence. The societal benefits gained from nuclear applications need to be communicated to the public periodically in an organized manner. Further, openness and transparency in the communication of factual, timely, and easily understandable information to the public improve public confidence and risk perception regarding nuclear applications. It is essential to spread awareness among the public about the defense-in-depth principles built into a reactor and to maintain the role of nuclear power in the energy mix as a sustainable, safe, and low-carbon source.
Further readings
[1] J.S. Kim, M.C. Kim, Consistency issues in quantitative safety goals of nuclear power plants in Korea, Nucl. Eng. Technol. 51 (7) (2019).
[2] L. Bengtsson, J.-E. Holmberg, J. Rossi, M. Knochenhauer, Probabilistic safety goals for nuclear power plants; phases 2–4: final report, Nordic Nuclear Safety Research (NKS), Roskilde, 2011.
[3] International Atomic Energy Agency (IAEA), Fundamental safety principles, IAEA Safety Standards Series No. SF-1, IAEA, Vienna, 2006.
[4] Canadian Nuclear Safety Commission, Summary report of the international workshop on multi-unit probabilistic safety assessment, Ottawa, Ontario, Canada, 2014.
[5] International Atomic Energy Agency (IAEA), Hierarchical structure of safety goals for nuclear installations, IAEA-TECDOC-1874, IAEA, Vienna, 2019.
[6] J. Vecchiarelli, D. Keith, J. Luxat, Development of a Whole-Site PSA Methodology, COG-13-9034 R0, CANDU Owners Group Inc., February 2014.
[7] INSAG-12, Basic safety principles for nuclear power plants 75-INSAG-3 Rev. 1, IAEA, Vienna, 1999.
[8] International Atomic Energy Agency (IAEA), Safety assessment for facilities and activities, IAEA Safety Standards Series No. GSR Part 4 (Rev. 1), IAEA, Vienna, 2016.
[9] International Atomic Energy Agency (IAEA), Defence in depth in nuclear safety, INSAG-10, A Report by the International Nuclear Safety Advisory Group, IAEA, Vienna, 1996.
[10] International Atomic Energy Agency (IAEA), Basic safety principles for nuclear power plants 75-INSAG-3 Rev. 1, INSAG-12, A Report by the International Nuclear Safety Advisory Group, IAEA, Vienna, 1999.
[11] CNSC, Summary report of the international workshop on multi-unit probabilistic safety assessment, Canadian Nuclear Safety Commission, Ottawa, Ontario, 2015.
[12] L. Bengtsson, J.-E. Holmberg, J. Rossi, M. Knochenhauer, Probabilistic safety goals for nuclear power plants; phases 2–4: final report, Research Report, NKS, May 2011.
[13] J.-E. Holmberg, M. Knochenhauer, Guidance for the definition and application of probabilistic safety criteria, SSM Research Report 2010:36 (2011).
[14] National Research Council, Lessons learned from the Fukushima nuclear accident for improving safety of US nuclear plants, The National Academies Press, Washington, DC, 2014.
[15] P. Hessel, J.-E. Holmberg, M. Knochenhauer, A. Amri, Status and experience with the technical basis and use of probabilistic risk criteria for nuclear power plants, Proceedings of PSAM 10, International Probabilistic Safety Assessment & Management Conference, 7–11 June 2010, Paper 47, Seattle, Washington, 2010.
[16] K.S. Dinnie, Experience with the application of risk-based safety goals, 25th Annual Conference of the Canadian Nuclear Society, Canadian Nuclear Society, 2004.
[17] J.K. Vaurio, Safety-related decision making at a nuclear power plant, Nucl. Eng. Des. 185 (1998) 335–345.
[18] The role of probabilistic safety assessment and probabilistic safety criteria in nuclear power plant safety, Safety Series No. 106, International Atomic Energy Agency, Vienna, 1992.
CHAPTER 5
Challenges in risk assessment of multiunit site

Contents
5.1 Key issues 116
5.1.1 Shared systems or connections 116
5.1.2 Identical components 118
5.1.3 Human dependencies 118
5.1.4 Proximity dependencies 118
5.1.5 Modeling site level response 119
5.2 Methods for integrated risk assessment 120
5.2.1 Identification of multiunit initiating events 120
5.2.2 Hazard categorization method 123
5.2.3 Event sequence MUPSA method 134
5.2.4 Master event tree method 136
5.3 Seismic PSA for multiunit site 138
5.3.1 Site-specific seismic hazard assessment 140
5.3.2 Safety analysis 141
5.3.3 Component fragility 142
5.3.4 Plant fragility 144
5.3.5 Seismic core damage frequency 144
5.4 MUPSA for Level 2 146
5.5 MUPSA for Level 3 147
Further readings 148
The first and foremost challenge in the risk assessment of multi-unit sites is the identification of multi-unit initiating events. A systematic approach is required to identify the types of multi-unit IEs; it includes reference to similar studies carried out internationally, a list of external events applicable to the site, insights from single-unit PSAs, and a combination of the different operating modes of the reactors at the site. A guideline for arriving at a comprehensive list of multi-unit IEs includes:
– Failure of shared systems
– Effect of cross-ties between units
– Loss of common instrument air
– Cascading events
– Common cause failures
– External events, in addition to seismic, flood, and fire, specific to the site and the units
– Combination of events
– Proximity events
– Missiles
– Combination of operating states
– Common crews
– Organizational influences
– Independent events
It is also recommended to include initiating events of very low frequency. If a single-unit PSA has been carried out for any of the units at the site, events that were screened out are to be reviewed for possible inclusion for multi-units. Some of the key issues that require special attention in multiunit risk assessment include interunit dependencies such as shared systems/components, common cause failures (CCFs) of identical components, interaction effects due to proximity, etc. [1]. These interunit dependencies are modeled appropriately at various stages: in IE development, in accident sequence modeling, and in fault trees. Further, issues such as the appropriate use of mission time for IEs and system functions, and cliff-edge effects, especially during external hazards, are also important. Modeling of these key issues is explained in Section 5.1.
5.1 Key issues
5.1.1 Shared systems or connections
a) Structures, systems, and components (SSCs) shared between the units: In a multiunit site, for various economic and logistic reasons, sharing of SSCs is common. Sharing exists for diesel engines, compressors, substations, headers, pipelines, the stack, etc. For the purpose of multiunit risk assessment, the shared systems/components are modeled with the same identity in the fault trees/event trees of the corresponding units (Fig. 5.1).
b) Standby system sharing: At some sites, sharing of a resource between units is based on preference. Such systems are modeled by assigning a preference probability of the component/system for a particular unit [1]. For example, if a common DG is shared between two units and the first unit is assigned a preference probability of 0.75, the DG unavailability for unit 1 (DGu1) is estimated as

DGu1 = (1 − Pfu1) + Pfu1 × PDG
Figure 5.1 Modeling of common shared system between two units.
and the DG unavailability for unit 2 (DGu2) is

DGu2 = Pfu1 + (1 − Pfu1) × PDG

where Pfu1 is the preference probability for unit 1 and PDG is the probability of DG failure.
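A minimal numerical sketch of this preference-probability model is given below; the preference probability of 0.75 follows the example in the text, while the DG demand failure probability is an assumed placeholder.

```python
# Unavailability of a shared standby diesel generator (DG) allocated by preference.
# P_fu1: probability that the DG is preferentially allotted to unit 1 on demand.
# P_DG : probability that the DG itself fails on demand (illustrative value).
P_fu1 = 0.75
P_DG = 2.0e-2

# Unit 1 sees the DG as unavailable if it is allotted to unit 2 (1 - P_fu1),
# or if it is allotted to unit 1 but then fails (P_fu1 * P_DG).
DG_u1 = (1 - P_fu1) + P_fu1 * P_DG

# Unit 2 sees the DG as unavailable if it is allotted to unit 1 (P_fu1),
# or if it is allotted to unit 2 but then fails ((1 - P_fu1) * P_DG).
DG_u2 = P_fu1 + (1 - P_fu1) * P_DG

print(f"DG unavailability for unit 1: {DG_u1:.3f}")
print(f"DG unavailability for unit 2: {DG_u2:.3f}")
```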
5.1.2 Identical components
Identical components are those that are of the same type, design, and manufacturer and have the same operating conditions. Such identical components in different units are grouped for CCFs, for example, shutdown cooling pumps, emergency core cooling pumps, diesel generators, and emergency process sea water pumps. The grouping of identical components and the values of the factors for the CCF contribution are decided based on the nature and severity of the hazard.
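The text does not prescribe a particular CCF parameterization; as one common choice, a simple beta-factor sketch for a pair of identical pumps grouped across two units could look like the following. The failure rate, beta factor, and mission time are all assumed values, and the rare-event approximation is used.

```python
import math

# Beta-factor CCF sketch for identical components grouped across units
# (e.g., emergency core cooling pumps of the same design in units 1 and 2).
lambda_total = 1.0e-5   # total failure rate of one pump, per hour (assumed)
beta = 0.05             # assumed fraction of failures that are common cause
mission_time = 24.0     # hours (internal events mission time, as in Section 5.1.5.1)

lambda_ccf = beta * lambda_total         # common cause failure rate
lambda_ind = (1 - beta) * lambda_total   # independent failure rate

# Probability that both pumps fail within the mission time (rare-event
# approximation): both fail independently, or a common cause event occurs.
p_ind = 1 - math.exp(-lambda_ind * mission_time)
p_ccf = 1 - math.exp(-lambda_ccf * mission_time)
p_both_fail = p_ind ** 2 + p_ccf

print(f"Independent failure probability (one pump): {p_ind:.2e}")
print(f"CCF contribution:                           {p_ccf:.2e}")
print(f"Probability both pumps fail in {mission_time:.0f} h:   {p_both_fail:.2e}")
```

The CCF contribution dominates the product of independent failures, which is why interunit CCF groups matter so much in multiunit models.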
5.1.3 Human dependencies
Wherever there is a human action associated with an operation, human dependencies are to be considered for multiple units, just as in single-unit risk assessment, since there is a possibility of a manual action affecting normal operation in multiple units [2]. Human actions in nuclear plant operation are classified under two categories, viz., preinitiating event and postinitiating event actions. Actions related to maintenance, testing, calibration, etc. fall under the preinitiator category, while human actions that are expected after an event are classified as postinitiating actions. In the multiunit context, a common crew generally exists for actions at multiple units, and erroneous actions could lead to a grave situation. Some examples of preinitiating event errors are miscalibrations, misalignments, missed surveillances, errors during testing/maintenance, etc. Examples of postinitiating event errors include an erroneous response after the failure of an automatic action, a delayed response to an emergency situation, etc. Post Fukushima, along with human factors, organizational factors are also considered a source of dependency between units and need to be appropriately modeled in multiunit risk assessment.
5.1.4 Proximity dependencies
When an operating condition or environment has the potential to impact more than one unit at a site, the components in that environment are modeled for proximity dependencies. Examples are systems/components located in an enclosure that may be exposed to fire, explosion, or external events, or
the failure of rotating equipment impacting other components located nearby, etc.
5.1.5 Modeling site level response
5.1.5.1 Mission time
For multiunit risk assessment, the mission time for the accident sequences of the various hazards is decided based on the nature and severity of the hazard. A mission time of 72 hours is taken for external hazards, that is, earthquake, tsunami, clogging, etc., whereas a mission time of 24 hours is selected for all the internal events.
5.1.5.2 Cliff-edge effect
A cliff-edge effect in a nuclear power plant is an instance of severely abnormal plant behavior caused by an abrupt transition from one plant status to another following a small deviation in a plant parameter [3]. While this is true for an individual unit and for internal events, it is more important for some extreme events in which the risk may grow significantly with slight variations in the external event; hence it is imperative to evaluate the cliff-edge margin in multiunit risk assessment. Identifying hazard-related cliff-edge factors at a multiunit site is therefore key to avoiding a major accident. Sensitivity studies are required to identify cliff-edge factors. For example, during external flooding due to a tsunami, if the flood level exceeds the height of a component, the flood fragility of all the components located below the flood level is taken as unity, and for those components above that level the fragility is zero.
5.1.5.3 Concept of site CDF
The use of the term "core damage (CD)" is somewhat subjective, and several drastically different definitions associated with the reactor technology are available [4]. The IAEA states that CD for a light water reactor is often defined as exceeding the design basis limit of any of the fuel parameters [5]. The NRC's SPAR models, on the other hand, define CD as the uncovery and heat-up of the reactor core to the point where "severe" fuel damage is anticipated [6]. The Indian Atomic Energy Regulatory Board defines CD as the state of the reactor brought about by accident conditions with loss of core geometry or resulting in the crossing of design basis limits or acceptance criteria limits for one or more parameters: fuel clad strain, fuel clad temperature, primary and secondary system pressures, clad oxidation, amount of fuel failure, radiation dose, etc. For a PHWR type reactor, CD is defined as loss of
structural integrity of multiple fuel channels [7]. Very precise definitions of CD, such as the local fuel cladding temperature exceeding 1204 °C (the ECCS acceptance limit for light-water reactors), are given in 10 CFR 50.46(b)(1) [8]. For multiunit safety assessment, the concept of site core damage frequency (SCDF) is considered: the frequency of multiple CDs occurring simultaneously at a site per year is calculated. The integrated approach will also aid in identifying SSCs important for severe accident progression and in Level 2 PSA studies [9].
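Returning to the cliff-edge treatment of external flooding described in Section 5.1.5.2, the step (unity/zero) fragility and the associated sensitivity sweep can be sketched as below. The component names, elevations, and flood levels are hypothetical and serve only to show how a small change in the hazard parameter produces an abrupt change in the failed-equipment set.

```python
# Step ("cliff-edge") flood fragility: a component is assumed to fail with
# certainty once the flood level exceeds its installed elevation, and to be
# unaffected below it. All elevations are illustrative.
components = {
    "ECCS pump motor": 2.0,     # metres above grade
    "Switchgear room": 3.2,
    "DG air intake": 4.5,
}

def flood_fragility(flood_level_m: float, component_elevation_m: float) -> float:
    """Return 1.0 if the flood level exceeds the component elevation, else 0.0."""
    return 1.0 if flood_level_m > component_elevation_m else 0.0

# Sensitivity sweep over flood levels to expose the cliff-edge behaviour.
for level in (1.5, 2.5, 3.5, 5.0):
    failed = [name for name, elev in components.items()
              if flood_fragility(level, elev) == 1.0]
    print(f"Flood level {level:.1f} m -> failed components: {failed or 'none'}")
```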
5.2 Methods for integrated risk assessment
Recent studies [1,4,10-14] have recommended ideas for dealing with different aspects of multiunit risk assessment through a probabilistic approach. The USNRC endorsed an integrated risk analysis using the PSA approach in 2005 to quantify the risk of all units on a reactor site [15]. A general framework is needed to integrate the risk contributions from single and multiunit NPPs and to aggregate the risks that may arise from the range of applicable hazards and operating states across all nuclear plants at a given site [1]. A review of practices adopted in different countries indicates that a site-level PSA deals with the dependencies that may exist between the units on that site. The dependencies may be due to multiunit interactions, the same environmental stresses during external hazards, sharing of systems at the site level, identical systems in each unit, sharing of human resources, etc. In an actual nuclear plant design, many systems could be shared among the units at a site, and their impact on multiunit safety may not be systematically modeled in safety analysis, as the events (both internal and external) are postulated to challenge a single unit. The Level 1 MUPSA framework is given in Fig. 5.2.
5.2.1 Identification of multiunit initiating events
The first step is to identify the potential multiunit initiating events (MUIEs). An MUIE is an event that results from an external hazard or from a malfunction in an operating unit and has the potential to cause CD in one or more units. For the identification of MUIEs (Fig. 5.3), the following three inputs are used:
– Level 1 PSA of single units
– Identification of site-specific hazards (flooding, seismic, tsunami)
– Expert judgment
There may be situations where an MUIE affects only a subset of the units and not all units at a site. Therefore, it is important to identify the MUIE and the combination of units that will be affected.
Figure 5.2 MUPSA framework.
Independent internal events
Internal events are abnormal conditions that occur within the plant due to the failure or incorrect operation of plant components through random failures, human errors, faulty signals, etc. Independent internal events (IIE)
Figure 5.3 Identification of initiating events.
are those events that occur in and affect only a single unit, for example, loss of coolant accidents, transients, etc. This group of initiators is required in order to comprehensively cover all possible initiators at a multi-unit site.
Internal events affecting multiple units
The internal initiating events that have the potential to affect multiple units are called definite internal initiating events (DIIE). Those internal events which affect multiple units only under certain circumstances are called conditional internal initiating events (CIIE).
Site specific external hazards
External hazards are both natural and human-made; they originate outside the plant and create extreme environmental conditions at the site. They are always site-specific and design dependent. As a first step in the multi-unit risk assessment, all possible site-specific external hazards that can affect the multiple units of a nuclear plant site need to be identified. External hazards such as seismic events, tsunamis, external floods, etc. that always affect multiple units are called definite external hazards (DEH). Hazards such as aircraft crashes, missiles, explosions, etc. that may affect multiple units only under certain circumstances are called conditional external hazards (CEH).
Insights from Level 1 PSA of single unit
The identification of multi-unit initiating events will be incomplete if the initiators of the Level 1 PSA are not taken into account. Screening out low-risk sequences based on insights from the original single-unit PSA model can significantly reduce the size of the MUPSA model. Insights from the single-unit PSA will also help in identifying the impact of initiators due to the support systems, shared systems, interconnecting systems, etc. that are site specific and configuration specific. This category of initiating events also includes reviews of, and inputs from, operating experience. Multiple screening criteria are required for the initiating events identified at a multiunit site, based on their contribution to the site CDF. As the number of multi-unit
initiating events is expected to increase exponentially with the number of units at the site, appropriate grouping and screening are required. In this context, a hazard or an event that does not result in a trip or a degraded condition in multiple units can be screened out as a multi-unit initiator. While shortlisting the multi-unit initiating events, it is also important to include the combinations of events arising from different plant operating states. The following three methods for multiunit risk assessment are described in the subsequent sections:
– Hazard categorization method
– Event sequence MUPSA method
– Master event tree method
5.2.2 Hazard categorization method
For an integrated risk assessment at a multiunit site, hazards are categorized as definite and conditional [1,6,17]. The hazards that will always affect multiple units are called definite hazards, and those which affect multiple units only under certain circumstances are called conditional hazards. The schema proposed by Schroer and Modarres [1] is further developed, and an integrated approach to address both external and internal events that can affect single or multiple units at a site is described. A pictorial representation of the hazard categorization method is given in Fig. 5.4. Hazards in the multiunit context are categorized as definite external hazard (DEH), conditional external hazard (CEH), definite internal initiating event (DIIE), conditional internal initiating event (CIIE), and internal initiating event (IIE). Examples of IEs in each of these categories are given in Table 5.1 [16,18-20]. After the initiating events for external hazards and internal events are identified and categorized, event tree/fault tree models are developed for each hazard category for further analysis. The key issues that need to be addressed while modeling event trees and fault trees for a multiunit site safety assessment are given in Section 5.1. These issues account for dependencies between the units arising from shared physical links, similarity in the design, installation, and operational approach for a component/system, the same or a related environment in which the systems are positioned, and the associated dependencies for various human interactions.
5.2.2.1 Methodology for definite external hazards
In the case of a DEH, such as tsunami, seismic, etc., the hazard-induced initiating events are identified. Event trees for the postulated initiating events due to the DEHs are developed for each unit, taking into account the shared systems/components (Fig. 5.5). Initiating events for the units can also arise
Figure 5.4 Schematic of method for multiunit risk assessment.
Table 5.1 Examples of IEs in hazard categorization method.
DEH: Seismic events; Tsunami; External floods, fires; High winds; Turbine missile
CEH: Aircraft crash; Offsite explosion
DIIE: Loss of offsite power; Loss of ultimate heat sink
CIIE: Loss of emergency service water; Loss of feed water; Station blackout; Loss of instrument air
IIE: All IEs considered in single unit PSA
Figure 5.5 Schematic of definite external hazard for multiunit site.

Table 5.2 Boolean expressions obtained for definite external hazard.
Unit 1: H1(D111·BExp111), H1(D121·BExp121), H1(D131·BExp131), H1(D141·BExp141)
Unit 2: H1(D112·BExp112), H1(D122·BExp122), H1(D132·BExp132), H1(D142·BExp142)
Unit 3: H1(D113·BExp113), H1(D123·BExp123), H1(D133·BExp133)
Unit 4: H1(D114·BExp114), H1(D124·BExp124), H1(D134·BExp134)
indirectly, as a secondary effect, due to the failure of shared SSCs. The Boolean expression for CD due to a DEH is represented as Hi(Dijk·BExpijk), where Hi denotes the frequency of (definite) external hazard i, Dijk denotes the probability of initiating event j due to DEH i for unit k, and BExpijk denotes the Boolean expression for the jth initiating event due to DEH i for unit k. For DEHs, if we assume there are four initiating events that affect units 1 and 2 and three initiating events that affect units 3 and 4, the Boolean expressions are as given in Table 5.2.
1. Four simultaneous CDs for the site can be obtained as the sum of {Boolean expression (CD of unit 1 by any of its initiating events) ∗ Boolean expression (CD of unit 2 by any of its initiating events) ∗
Boolean expression (CD of unit 3 by any of its initiating events) ∗ Boolean expression (CD of unit 4 by any of its initiating events)}.
Total number of ways four simultaneous CDs for the site can occur = C(4,1) × C(4,1) × C(3,1) × C(3,1) = 144.
2. Three simultaneous CDs for the site is the sum of the following four expressions:
A. Sum of {Boolean expression (CD of unit 1) ∗ Boolean expression (CD of unit 2) ∗ Boolean expression (CD of unit 3)}; total number of such cases = C(4,1) × C(4,1) × C(3,1) = 48
B. Sum of {Boolean expression (CD of unit 1) ∗ Boolean expression (CD of unit 2) ∗ Boolean expression (CD of unit 4)}; total number of such cases = C(4,1) × C(4,1) × C(3,1) = 48
C. Sum of {Boolean expression (CD of unit 1) ∗ Boolean expression (CD of unit 3) ∗ Boolean expression (CD of unit 4)}; total number of such cases = C(4,1) × C(3,1) × C(3,1) = 36
D. Sum of {Boolean expression (CD of unit 2) ∗ Boolean expression (CD of unit 3) ∗ Boolean expression (CD of unit 4)}; total number of such cases = C(4,1) × C(3,1) × C(3,1) = 36
Therefore, the number of ways three simultaneous CDs for the site can occur = 2 × C(4,1) × C(4,1) × C(3,1) + 2 × C(4,1) × C(3,1) × C(3,1) = 168.
3. Similarly, the number of ways two simultaneous CDs for the site can occur = C(4,1) × C(4,1) + 4 × C(4,1) × C(3,1) + C(3,1) × C(3,1) = 73.
4. And the number of single CDs for the site = C(4,1) + C(4,1) + C(3,1) + C(3,1) = 14.
After simplification of the Boolean expressions for the cases of single and multiple CDs and quantification of the hazard and the failure probabilities of structures, systems, and components, the site risk due to the specific DEH is obtained. Repeating this process and summing the CDFs for all DEHs of varying intensity, the cumulative risk at the site due to DEHs is obtained. The probability of multiple DEHs occurring simultaneously is very low and hence need not be considered.
5.2.2.2 Methodology for conditional external hazards
As in the case of DEHs, the initiating events for each CEH, such as aircraft crash, offsite explosions, etc., are identified, and the corresponding event trees and fault trees for each of the units are modeled together, that is, with the same identity for shared systems/components (Fig. 5.6). If Cij denotes the probability that a CEH i directly affects unit j, then Cijk denotes the probability that it also affects unit k (k = 1, 2, 3, …, n and k ≠ j).
Figure 5.6 Schematic of single conditional external hazard at multiunit site.
Each CEH that impacts multiple units is assumed to cause one direct initiating event and several indirect initiating events. Repeating this process and summing the CDFs for all CEHs of varying intensity, the cumulative risk at the site due to CEHs is obtained.
Case 1: Single conditional external hazard
1. Four simultaneous core damages for the site due to a conditional external hazard = sum of all possible combinations {Boolean expression (core damage of all 4 units by the conditional external hazard)}. The total number of ways four simultaneous core damages can occur at the site due to a conditional external hazard = 4. If two hazards are considered, four simultaneous core damages can occur at the site in 4 × 2 = 8 ways.
2. Three simultaneous core damages for the site due to a conditional external hazard = sum of all possible combinations {Boolean expression (core damage of any three units by the conditional external hazard)}. The total number of ways three simultaneous core damages can occur at the site due to a conditional external hazard = 4 × 3 = 12 per hazard.
3. Two simultaneous core damages for the site due to a conditional external hazard = sum of all possible combinations {Boolean expression (core damage of any two units by the conditional external hazard)}. The total number of ways two simultaneous core damages can occur at the site due to a conditional external hazard = 4 × 3 = 12 per hazard.
4. The total number of ways a single core damage can occur at the site due to a conditional external hazard = 4 per hazard.
Case 2: Two simultaneous conditional external hazards
If we consider the case of two conditional external hazards, such as an aircraft crash and an offsite explosion, then each hazard is considered as an initiating event, and the Boolean expressions for the CD of each of the units caused by the two conditional external hazards are evaluated.
1. Four simultaneous core damages for the site due to the two conditional external hazards = sum of all possible combinations {Boolean expression (core damage of all 4 units by the two conditional external hazards)}. This can occur through all possible combinations: one CD from the first hazard and three CDs from the second hazard, two CDs from the first hazard and two CDs from the second hazard, or three CDs from the first hazard and one CD from the second hazard. The total number of ways four simultaneous core damages can occur at the site due to the two conditional external hazards = 2 × (4 × 3 + 4 × 3 × 2 + 4 × 3) = 2 × (12 + 24 + 12) = 96.
2. Three simultaneous core damages for the site due to the two conditional external hazards = sum of all possible combinations {Boolean expression (core damage of any three units by the two conditional external hazards)}. This can occur through all possible combinations: one CD from the first hazard and two CDs from the second hazard, or two CDs from the first hazard and one CD from the second hazard. The total number of ways three simultaneous core damages can occur at the site due to the two conditional external hazards = 2 × (4 × 6 + 4 × 3 × 2) = 2 × (24 + 24) = 96.
3. Two simultaneous core damages for the site = sum of all possible combinations {Boolean expression (core damage of any two units by the two conditional external hazards)}. This can occur through one CD from the first hazard and one CD from the second hazard. The total number of ways two simultaneous core damages can occur at the site due to the two conditional external hazards = 2 × (4 × 3) = 24.
A single core damage due to two simultaneous conditional external hazards is not possible. After simplification of the Boolean expressions for all possible ways of double, triple, and quadruple core damage and quantification of the external hazards and SSC failures, the SCDF of a multiunit site due to conditional external hazards is obtained.
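The combination counts used in these enumerations are straightforward to cross-check by direct enumeration. The short sketch below reproduces the counts quoted for the definite external hazard example in Section 5.2.2.1 (144, 168, 73, and 14); the per-unit numbers of initiating events follow that example, and only the Python standard library is used.

```python
from itertools import combinations
from math import prod

# Number of hazard-induced initiating events per unit, as in the DEH example:
# four IEs each for units 1 and 2, three IEs each for units 3 and 4.
ies_per_unit = {1: 4, 2: 4, 3: 3, 4: 3}

def ways(n_simultaneous_cd: int) -> int:
    """Count the ways n units can be damaged, each through any one of its IEs."""
    total = 0
    for units in combinations(ies_per_unit, n_simultaneous_cd):
        total += prod(ies_per_unit[u] for u in units)
    return total

for n in (4, 3, 2, 1):
    print(f"{n} simultaneous core damage(s): {ways(n)} combinations")
# Prints 144, 168, 73, and 14, matching the counts given in the text.
```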
Cases 1 and 2 describe the analysis for a single CEH and for two simultaneous CEHs, respectively, occurring at a site. Table 5.3 presents the Boolean expressions obtained for one hazard; the same structure is repeated for the other hazard.
Table 5.3 Boolean expressions for CDF due to a conditional external hazard in each of the units.
Unit 1: H1·C11(BExp11), H1·C121(BExp121), H1·C131(BExp131), H1·C141(BExp141)
Unit 2: H1·C112(BExp112), H1·C12(BExp12), H1·C132(BExp132), H1·C142(BExp142)
Unit 3: H1·C113(BExp113), H1·C123(BExp123), H1·C13(BExp13), H1·C143(BExp143)
Unit 4: H1·C114(BExp114), H1·C124(BExp124), H1·C134(BExp134), H1·C14(BExp14)
Note: Hi denotes the frequency of (conditional) external hazard i.
Figure 5.7 Schematic of definite internal initiating events at multiunit site.
5.2.2.3 Methodology for definite internal initiating events for the site
All DIIEs, such as loss of heat sink, loss of offsite power, etc., are to be modeled and analyzed together. The event trees and fault trees are developed for these initiating events in the same manner as for the initiating events in the case of DEHs (Fig. 5.7).
Table 5.4 Boolean expressions for CDF of each of the units.
DIIE 1, affecting units 1 and 2: Unit 1: IE1(BExp11); Unit 2: IE1(BExp12)
DIIE 2, affecting units 3 and 4: Unit 3: IE2(BExp13); Unit 4: IE2(BExp14)
Note: IEi denotes the ith initiating event.
Figure 5.8 Schematic of conditional internal initiating events at multiunit site.
If we consider one definite initiating event affecting units 1 and 2 and one definite initiating event affecting units 3 and 4, the Boolean expressions are obtained as shown in Table 5.4. Repeating this process and summing the CDFs for all DIIEs, the site risk due to DIIEs is obtained. Further, if a single initiating event is considered, two simultaneous CDs can occur in two ways and no other combination of CD is possible. The simultaneous occurrence of multiple DIIEs affecting multiple units is not considered, as it is an extremely rare event.
5.2.2.4 Methodology for conditional internal initiating events for the site
All conditional initiating events, such as loss of instrument air, loss of feed water, loss of a DC bus, etc., are modeled together for all units (Fig. 5.8). The Boolean expressions for single and multiple CDs are analyzed with CIIEs under both scenarios, that is, one CIIE occurring at the site and more than one CIIE occurring simultaneously at the site. As done earlier, for illustration purposes, let us consider units 1 and 2 to be identical and to have some sharing of resources (e.g., instrument air and feed
Table 5.5 Boolean expressions for CDF of each of the units.
Conditional initiating event 1 for units 1 and 2: Unit 1: p11·IE1(BExp11), p121·IE1(BExp121); Unit 2: p112·IE1(BExp112), p12·IE1(BExp12)
Conditional initiating event 2 for units 1 and 2: Unit 1: p21·IE2(BExp21), p221·IE2(BExp221); Unit 2: p212·IE2(BExp212), p22·IE2(BExp22)
Conditional initiating event 3 for units 3 and 4: Unit 3: p33·IE3(BExp33), p343·IE3(BExp343); Unit 4: p334·IE3(BExp334), p34·IE3(BExp34)
water), and units 3 and 4 to be identical and to have sharing of resources (e.g., a DC bus). Case 1 describes the analysis for a single CIIE and Case 2 the analysis for multiple CIIEs occurring simultaneously at a site. The corresponding Boolean expressions are shown in Table 5.5.
Case 1: Single conditional internal initiating event occurring at the site
1. Four simultaneous core damages at the site due to a single conditional internal initiating event are not possible, as one initiating event affects a maximum of two units only.
2. Similarly, three simultaneous core damages at the site due to a single conditional internal initiating event are also not possible.
3. Two simultaneous core damages for the site can occur in the following six ways:
A. Sum of all possible combinations {Boolean expression (core damage of unit 1 by the single conditional internal initiating event) ∗ Boolean expression (core damage of unit 2 by the single conditional internal initiating event)}; total number of combinations = 4.
B. Sum of all possible combinations {Boolean expression (core damage of unit 3 by the single conditional internal initiating event) ∗ Boolean expression (core damage of unit 4 by the single conditional internal initiating event)}; total number of combinations = 2.
4. A single core damage at the site due to a single conditional internal initiating event can occur in 6 combinations.
After simplification of the Boolean expressions for the cases of single, double, triple, and quadruple core damage and quantification of the internal initiating events and SSC failures, the cumulative risk at the site due to conditional internal initiating events is obtained.
Case 2: Multiple conditional internal initiating events occurring simultaneously at the site
If all three IEs occur simultaneously, then:
1. Four simultaneous core damages is the sum of all possible combinations {Boolean expression (core damage of all 4 units by the respective conditional initiating events)}. The total number of ways four simultaneous core damages can occur for the site = 6 × 2 = 12.
2. Three simultaneous core damages for the site due to the three conditional internal initiating events is the sum of all possible combinations {Boolean expression (core damage of any three units by the three conditional initiating events)}. This can occur through all possible combinations: one CD from the first/second IE and two CDs from the third IE, or two CDs from the first/second IE and one CD from the third IE. The total number of ways three simultaneous core damages can occur for the site due to the three conditional internal events = 4 × 2 + 6 × 2 = 20.
3. Two simultaneous core damages for the site due to the three conditional internal initiating events is the sum of all possible combinations {Boolean expression (core damage of any two units by the conditional initiating events)}. This can occur through all possible combinations: two CDs from the first/second IE, or two CDs from the third IE, or one CD from the first/second IE and one CD from the third IE. The total number of ways two simultaneous core damages can occur for the site due to the three conditional internal events = 6 + 2 + 4 × 2 = 16.
4. The total number of ways a single core damage can occur for the site due to the three conditional internal events = 6.
After simplification of the Boolean expressions for the cases of single, double, triple, and quadruple core damage and quantification of the internal initiating events and SSC failures, the cumulative risk at the site due to multiple conditional internal initiating events is obtained.
5.2.2.5 Methodology for internal independent events
The event trees and corresponding fault trees developed for the internal events Level 1 PSA are used, and the Boolean expressions are obtained to evaluate the single CD frequency only, since the occurrence of multiple internal independent events is an extremely rare possibility. The total number of ways a single CD can occur at the site due to internal independent events in all units = sum of all the Boolean expressions in Table 5.6 = 14.
Table 5.6 Boolean expressions for CDF of each of the units due to internal independent events.
Unit 1: IE11(BExp11), IE21(BExp21), IE31(BExp31), IE41(BExp41)
Unit 2: IE12(BExp12), IE22(BExp22), IE32(BExp32), IE42(BExp42)
Unit 3: IE13(BExp13), IE23(BExp23), IE33(BExp33)
Unit 4: IE14(BExp14), IE24(BExp24), IE34(BExp34)
Note: IEij denotes the ith initiating event for unit j.
Figure 5.9 Overall schematic for multiunit safety assessment.
Complete expression for site core damage frequency: The integrated approach explained in the earlier sections for multiunit safety assessment, considering all categories of hazards, is depicted in Fig. 5.9. An extended mission time, as appropriate, may be used for external hazards, and the mission times used in the internal events PSA may be adopted for internal events. Thus, the integrated approach presented in this section leads to the formulation of the site CD frequency as follows:

Site CD frequency, SCDF = \sum_{i=1}^{n} \sum_{j=1}^{5} \sum_{k=1}^{m} \mathrm{CDF}(i, j, k)    (5.1)

where i denotes the number of simultaneous CDs; n denotes the number of units at the site;
j denotes the category of hazard or event; k denotes the type of hazard in the jth category; m denotes the total number of types of hazard in the jth category. Therefore, CDF(i, j, k) denotes the frequency of i simultaneous CDs due to the jth category of hazard, type k:
j = 1 refers to DEHs for the site;
j = 2 refers to CEHs for the site;
j = 3 refers to definite internal events for the site;
j = 4 refers to conditional internal events for the site;
j = 5 refers to internal independent events considered for all units.
SCDF accounts for both single and multiple CDs occurring at the site.
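As a minimal illustration of Eq. (5.1), the sketch below sums hypothetical CDF(i, j, k) contributions into a site CDF and separates out the multi-unit share. All frequencies and the example hazard assignments are placeholders, not results from the text.

```python
# Aggregation of site core damage frequency (SCDF) per Eq. (5.1).
# Keys: (i, j, k) = (simultaneous core damages, hazard category, hazard type).
# j: 1 = definite external, 2 = conditional external, 3 = definite internal,
#    4 = conditional internal, 5 = independent internal events.
cdf = {
    (2, 1, 1): 3.0e-7,   # e.g., seismic causing two simultaneous core damages
    (1, 1, 1): 1.2e-6,   # seismic, single core damage
    (2, 3, 1): 5.0e-8,   # a definite internal event affecting two units
    (1, 5, 1): 8.0e-6,   # independent internal events, single unit
}   # all values are illustrative placeholders (per year)

scdf = sum(cdf.values())

# It is often useful to report the multi-unit (i >= 2) share separately.
multi_unit = sum(f for (i, j, k), f in cdf.items() if i >= 2)

print(f"SCDF = {scdf:.2e} per year")
print(f"Multi-unit contribution = {multi_unit:.2e} per year "
      f"({100 * multi_unit / scdf:.1f}% of SCDF)")
```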
5.2.3 Event sequence MUPSA method
Evaluating all combinations of accident sequences in the hazard categorization method can be tedious and, in many cases, is not required, as most of the combinations are rare and do not contribute significantly to the multiunit accident probability. A simpler approach is adopted in the event sequence MUPSA method to carry out the integrated risk assessment at multiunit sites. In this method, after the identification of MUIEs using the method described in Fig. 5.3, event trees are developed for the whole site. If event trees have already been developed for the single-unit PSA, they serve as a useful input both in the development and in the verification of the multiunit event sequence progression. The event tree for unit 1 is developed first. The end states of the sequences are classified into two categories, safe or unsafe. Generally, unsafe states are categorized based on the severity of the CD. For brevity, in the multiunit event tree development let us assume there is no subcategorization of the unsafe state and all unsafe states are CD states. The event tree for unit 2 is subsequently developed only for the unsafe states/sequences of unit 1. The safe states of unit 1 cannot lead to a multiunit accident sequence at all and hence are termed NMU, indicating that the sequence is not a multiunit sequence and will not be evaluated in the event sequence MUPSA model. This way of modeling is continued for the rest of the units at the site. To demonstrate the model for a twin unit site, let us consider loss of offsite power as the initiating event. The single-unit event trees are as shown in Figs. 5.10 and 5.11 for unit #1 and unit #2, respectively. The multiunit event tree in the event sequence MUPSA model is developed as shown in Fig. 5.12. When both unit 1 and unit 2 do not
Figure 5.10 Single unit LOOP (Unit #1).
Figure 5.11 Single unit LOOP (Unit #2).
Figure 5.12 Twin unit LOOP (Unit #1 and Unit #2).
lead to CD, the sequences are labeled with the end state SAFE; when both unit 1 and unit 2 lead to CD, the sequences are labeled 2CD, indicating two CDs; and when only one of the units leads to CD, the sequences are labeled NMU, indicating that the sequence is not a multiunit CD. As can be seen, the unit 2 sequences are developed in the twin unit LOOP event tree only for the unsafe sequences of unit 1. The safe sequences 1, 2, and 3 of the single unit cannot lead to a multiunit CD. However, if the single CD frequency
is also to be estimated, then the safe sequences of the single unit are further developed, as shown in Fig. 5.13. The multiunit event tree in Fig. 5.13 is comprehensive but becomes complicated for initiating events with many functional events. The twin CD sequences 7, 8, 12, and 13 of Fig. 5.12 are effectively the same as sequences 19, 20, 24, and 25 of Fig. 5.13, respectively. If there are common systems between two units, the multiunit accident sequences in the event tree model will depend on how the common systems are configured at the site to serve multiple units. For example, a common system may be capable of serving both units (2 × 100%) simultaneously on demand, or may be allocated on a first-come, first-served basis, or by preferential allotment considering the age of the plant, etc. The multiunit event tree for the case of a common fire water system with 2 × 100% capacity, for the example given in Fig. 5.12, can be constructed as given in Fig. 5.14.
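The end-state bookkeeping of the event sequence MUPSA method (SAFE, NMU, 2CD) can be illustrated by combining the end states of two single-unit event trees, as in the sketch below. The sequence probabilities and the LOOP frequency are hypothetical and are not taken from Figs. 5.10 to 5.14, and, for simplicity only, the two units are treated as independent given the initiating event; in a real model, shared systems, CCFs, and common crews would couple the branch probabilities.

```python
from itertools import product

# Hypothetical single-unit LOOP end states and conditional probabilities
# (given the initiating event); "CD" marks an unsafe (core damage) end state.
unit1_sequences = {"SAFE": 0.995, "CD": 0.005}
unit2_sequences = {"SAFE": 0.993, "CD": 0.007}

loop_frequency = 5.0e-2  # assumed LOOP frequency per year (placeholder)

# Combine the two units' end states. In the event sequence MUPSA method only
# the unsafe branches of unit 1 are developed further for unit 2, but the
# resulting classification of end states is the same.
end_states = {"SAFE": 0.0, "NMU": 0.0, "2CD": 0.0}
for (s1, p1), (s2, p2) in product(unit1_sequences.items(), unit2_sequences.items()):
    if s1 == "CD" and s2 == "CD":
        label = "2CD"     # multi-unit core damage
    elif s1 == "CD" or s2 == "CD":
        label = "NMU"     # single-unit damage; not a multi-unit sequence
    else:
        label = "SAFE"
    end_states[label] += loop_frequency * p1 * p2

for label, freq in end_states.items():
    print(f"{label}: {freq:.2e} per year")
```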
5.2.4 Master event tree method
The master event tree method is similar to the hazard categorization method explained in Section 5.2.2 and is easier to apply when a single-unit PSA model is available. The master event tree method consists of three steps. In the first step, the potential initiating events are identified for each unit, and the corresponding accident sequences leading to CD in each unit are developed. The second step is to combine all the unsafe states of the single-unit IEs and develop a combined fault tree with CD pertaining to the IE of each unit as the top event. This procedure is repeated for all IEs of the individual units. Upon completion of the second step, one fault tree for each IE of a unit is available. The interunit dependencies and correlations are modeled within the fault trees. The third and final step is to develop the master event tree with the multiunit IEs. Each multiunit IE will have as its functional events the fault trees developed in the second step for the unit-level IEs. The procedure is illustrated in Fig. 5.15. To reduce the complexity of the fault tree modeling, accident sequences leading to CD that have a low risk contribution may be neglected while developing the corresponding fault tree. The end states of the master event tree will effectively cover all possible combinations of CDs at the site. An MUIE may trigger different IEs in the individual units, and all such potential single-unit IEs are included as functional events in the master event tree. For example, a seismic event at a site might trigger a loss of offsite power in one unit and a general transient in another unit, and so on. Expert judgment is
Figure 5.13 Twin unit LOOP (U #1 and U #2) with single and double CD states.
Figure 5.14 Twin unit LOOP with a common fire water system.
preferred to identify the relevant functional events (single-unit CDs) in the master event tree. By combining the unit-level CD sequences in the master event tree, comprehensive modeling of the entire site is achieved, and the combinations of single and multiple unit CDs are estimated.
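To give a feel for what the master event tree quantification produces, the sketch below takes assumed unit-level conditional CD probabilities for one MUIE and computes the frequency of end states with exactly k units in core damage. The MUIE frequency and the per-unit probabilities are placeholders, and the units are treated as independent given the MUIE purely for illustration; in the actual method the interunit dependencies are carried inside the combined fault trees.

```python
from itertools import product

# Conditional core damage probabilities given one multiunit initiating event
# (MUIE), as would come from the unit-level fault trees (placeholder values).
muie_frequency = 1.0e-3   # per year (assumed)
p_cd = {"Unit 1": 2.0e-2, "Unit 2": 2.0e-2, "Unit 3": 1.0e-2, "Unit 4": 1.0e-2}

# Frequency of end states with exactly k units in core damage.
end_state_freq = {k: 0.0 for k in range(len(p_cd) + 1)}
for outcome in product([False, True], repeat=len(p_cd)):
    prob = 1.0
    for damaged, p in zip(outcome, p_cd.values()):
        prob *= p if damaged else (1.0 - p)
    end_state_freq[sum(outcome)] += muie_frequency * prob

for k, freq in end_state_freq.items():
    print(f"{k} unit(s) in core damage: {freq:.3e} per year")
```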
5.3 Seismic PSA for multiunit site
As the seismic hazard has the potential to directly impact all the facilities at a multiunit site, it needs special attention. The common methodology adopted for seismic PSA of a single unit involves five major stages (Fig. 5.16). The first stage is the site-specific seismic hazard assessment. The second stage covers the safety analysis, which includes characterization of initiating events, accident sequence analysis by means of a logic tree for each initiating event, and development of fault trees for each primary and support system identified in the logic tree of the accident sequences. The impact of human errors is modeled in this stage. One of the outcomes of this stage is the identification of the seismic structures, systems, and components whose seismic capacity is to be determined. As-built information, along with the operating history and plant walkdowns, provides the necessary input for the activities of this stage. The next stage involves the determination of the components' seismic fragility and capacity, either by the direct method or by the indirect method. The direct method applies an analytical approach and testing, while the indirect method is an experience-based method of in-plant evaluation with the help of plant walkdowns, from which easy fixes are established. In the fourth stage, the system model analysis is carried out to calculate the plant seismic fragility and/or capacity from the component fragilities or capacities [18]. Finally, the plant seismic risk assessment is performed in the fifth stage to derive the seismic core damage frequency (SCDF) of the reactor by
Figure 5.15 Master event tree method.
Figure 5.16 Five stages of seismic PSA.
convolving the plant seismic fragility with the seismic hazard curve of the site.
5.3.1 Site-specific seismic hazard assessment
Seismic hazard analysis plays an important role in seismic PSA. For a site-specific assessment, hazard contributions are integrated over all magnitudes and distances for all source zones around the site. The seismic input to be utilized within the scope of a seismic PSA is derived from probabilistic
considerations. Seismic hazard curves are developed to characterize the seismic exposure of a given site to the primary seismic effect (vibratory ground motion). To derive these, historical earthquake reports and instrumental records, as well as the geology of the region, including physical evidence of past seismicity, are used. Traditionally, hazard parameters such as the peak ground acceleration or the response spectra as a function of earthquake magnitude and distance are used to estimate the damage potential. The hazard curve gives the annual frequency of exceeding a given level of the ground motion estimator (PGA). Since hazard analysis is subject to a relatively high level of uncertainty, it is common practice to give, for each estimator of ground motion, the median, the mean, the 85th percentile, and the 15th percentile hazard curves. The expected number of exceedances of ground motion level z during a specified time period, usually 1 year, is

E(Z) = \sum_{i=1}^{N} \alpha_i \int_{m_0}^{m_u} \int_{r=0}^{\infty} f_i(m)\, f_i(r)\, P(Z > z \mid m, r)\, \mathrm{d}r\, \mathrm{d}m
where αi is the mean annual rate of occurrence of earthquakes between the lower and upper bound magnitudes (m0 and mu) for the ith source, m is the earthquake magnitude, r is the distance to the site, fi represents the probability density function of the distance to the source for various locations (r) and magnitudes (m) for source i, and P(·) denotes the probability that a given earthquake of magnitude m and epicentral distance r will exceed the ground motion level z. Based on the above approach, a typical hazard curve is derived, as given in Fig. 5.17.
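A numerical sketch of the double integral for a single source zone is given below. It assumes a truncated Gutenberg-Richter magnitude density, a uniform distance distribution, and a highly simplified lognormal ground-motion model; all parameter values and the attenuation form are invented for illustration and are not from the text. NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import norm

# Expected annual number of exceedances of a PGA level z for a single source
# zone: E(Z) = alpha * double integral of f(m) f(r) P(Z > z | m, r) dm dr.
alpha = 0.05                 # mean annual rate of earthquakes with m0 <= m <= mu (assumed)
m0, mu_mag = 4.0, 7.5        # lower and upper bound magnitudes (assumed)
b_value = 1.0                # Gutenberg-Richter b-value (assumed)
r_min, r_max = 10.0, 200.0   # km; distances assumed uniformly distributed

beta = b_value * np.log(10.0)

def f_m(m):
    """Truncated exponential (Gutenberg-Richter) magnitude density."""
    return beta * np.exp(-beta * (m - m0)) / (1.0 - np.exp(-beta * (mu_mag - m0)))

def f_r(r):
    """Uniform distance density over [r_min, r_max] (purely illustrative)."""
    return np.full_like(r, 1.0 / (r_max - r_min))

def p_exceed(z_g, m, r):
    """P(Z > z | m, r) from a toy lognormal ground-motion model (assumed form)."""
    ln_median = -3.5 + 0.8 * m - 1.1 * np.log(r)   # ln of median PGA in g
    sigma = 0.6
    return norm.sf((np.log(z_g) - ln_median) / sigma)

def annual_exceedance(z_g, n_m=200, n_r=200):
    m = np.linspace(m0, mu_mag, n_m)
    r = np.linspace(r_min, r_max, n_r)
    mm, rr = np.meshgrid(m, r, indexing="ij")
    integrand = f_m(mm) * f_r(rr) * p_exceed(z_g, mm, rr)
    dm, dr = m[1] - m[0], r[1] - r[0]
    return alpha * integrand.sum() * dm * dr   # simple Riemann-sum integration

for z in (0.05, 0.1, 0.2, 0.5):   # PGA levels in g
    print(f"P(PGA > {z:.2f} g) ~ {annual_exceedance(z):.3e} per year")
```

Plotting the results against the PGA levels reproduces the general shape of a hazard curve such as the one in Fig. 5.17.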
5.3.2 Safety analysis
The seismic initiating events are characterized in two steps: the first is to identify the initiating events, and the second is to assign a frequency of occurrence or the corresponding fragility parameters to each of them. Two approaches are available for characterizing the seismically induced initiating events. In the first approach, the earthquake itself is considered as the initiating event. In this approach, the range of seismic accelerations is generally partitioned into discrete bins of increasing PGA. The bin acceleration is determined as the geometric average of the two bin range limits, and the bin frequency is calculated as the difference of the exceedance frequencies at the two bin
Figure 5.17 Typical hazard curve (annual probability of exceedance versus PGA, g).
range limits. These frequencies are obtained from the seismic hazard curve. In the second approach, seismically induced failures of systems or components that initiate plant transients, resulting in propagation of an accident scenario leading to core damage (CD) and/or breach of the containment and confinement function, are considered as initiating events. The first approach of postulating the seismic initiating event is advantageous when the SPSA is conducted after a full-fledged internal event PSA of the plant has been carried out. Following the identification of initiating events, event tree and fault tree modeling is carried out.
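A minimal sketch of the binning step described above is given below; the PGA limits and exceedance frequencies are illustrative placeholders, not site data.

```python
import numpy as np

# Sketch of the hazard-bin approach: the bin PGA is the geometric mean of the
# bin limits, and the bin frequency is the difference of the exceedance
# frequencies at the two limits.  The hazard points are illustrative only.

bin_edges = np.array([0.1, 0.2, 0.3, 0.5, 0.7, 1.0])          # PGA limits (g)
exceed_freq = np.array([1e-3, 3e-4, 1e-4, 2e-5, 5e-6, 1e-6])  # /yr at each limit

bin_pga = np.sqrt(bin_edges[:-1] * bin_edges[1:])     # geometric mean of limits
bin_freq = exceed_freq[:-1] - exceed_freq[1:]         # frequency of PGA in each bin

for a, f in zip(bin_pga, bin_freq):
    print(f"bin PGA = {a:.3f} g, annual frequency = {f:.2e}")
```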
5.3.3 Component fragility Fragility analysis of SSSC is an integral step in the SPSA of an NPP for external events. The fragility of a component is the probability of the component reaching a limit state condition at a particular value of a random demand parameter [21]. The limit state of the SSC could be a limit state of strength, leading to strength failure, or a limit state of serviceability, leading to functional failure. Seismic fragility is the probability of failure p_f at any non-exceedance probability level Q and is commonly expressed [22,23] as

$$p_f = \Phi\!\left[\frac{\ln(a/A_m) + \beta_U\,\Phi^{-1}(Q)}{\beta_R}\right] \tag{5.1}$$
where p_f is the failure probability for a given peak ground acceleration "a", and Φ(.) is the standard Gaussian cumulative distribution function. A_m is the median ground acceleration capacity, and β_R and β_U represent the associated randomness and uncertainty. The capacity of the SSC, in terms of ground acceleration capacity A, is expressed as

$$A = A_m\,\varepsilon_R\,\varepsilon_U \tag{5.2}$$

$$A_m = A_{RBGM}\cdot\bar{F} \tag{5.3}$$

where
A_RBGM : PGA of the review basis ground motion (RBGM) or review level earthquake
ε_R : random variable representing the aleatory uncertainty, i.e., the inherent randomness associated with the ground acceleration capacity
ε_U : random variable representing the epistemic uncertainty in the determination of the median value A_m, i.e., the uncertainty associated with data, modeling, methodology, etc.
F̄ : median value of the factor of safety, F
ε_R and ε_U are taken as log-normally distributed random variables with unit median and logarithmic standard deviations β_R and β_U, respectively. The composite fragility curve, or mean fragility curve, is given by

$$p_f = P(A \le a) = \Phi\!\left[\frac{1}{\beta_C}\ln\!\left(\frac{a}{A_m}\right)\right] \tag{5.4}$$

where the composite variability is

$$\beta_C = \sqrt{\beta_R^2 + \beta_U^2} \tag{5.5}$$
The high confidence of low probability of failure (HCLPF) value of capacity, A_HCLPF, is taken as the value of A corresponding to a 95% confidence (Q = 0.95) of not exceeding about a 5% probability of failure (p_f = 0.05) in Eq. (5.1). For the composite fragility curve, Eq. (5.4), the HCLPF value corresponds to a 1% probability of failure (p_f = 0.01). The relation between A_m and A_HCLPF can be derived from the above equations:

$$A_{HCLPF} = A_m\, e^{-1.645(\beta_R + \beta_U)} \tag{5.6}$$

$$A_{HCLPF} = A_m\, e^{-2.33\,\beta_C} \tag{5.7}$$
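The fragility relations in Eqs. (5.1)-(5.7) can be exercised numerically. The sketch below assumes example values for A_m, β_R and β_U; they are not taken from any real SSC evaluation.

```python
import numpy as np
from scipy.stats import norm

# Sketch of Eqs. (5.1)-(5.7) for a single component; the capacity parameters
# below (Am, beta_R, beta_U) are assumed example values, not from a real SSC.

Am, beta_R, beta_U = 0.87, 0.25, 0.35        # median capacity (g) and variabilities
beta_C = np.sqrt(beta_R**2 + beta_U**2)      # composite variability, Eq. (5.5)

def fragility(a, Q=0.5):
    """Failure probability at PGA 'a' and non-exceedance probability level Q, Eq. (5.1)."""
    return norm.cdf((np.log(a / Am) + beta_U * norm.ppf(Q)) / beta_R)

def mean_fragility(a):
    """Composite (mean) fragility curve, Eq. (5.4)."""
    return norm.cdf(np.log(a / Am) / beta_C)

hclpf_1 = Am * np.exp(-1.645 * (beta_R + beta_U))   # Eq. (5.6)
hclpf_2 = Am * np.exp(-2.33 * beta_C)               # Eq. (5.7)
print(f"HCLPF (95%/5% definition) = {hclpf_1:.3f} g")
print(f"HCLPF (1% on mean curve)  = {hclpf_2:.3f} g")
print(f"mean fragility at HCLPF   = {mean_fragility(hclpf_2):.3f}")
```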
Derivation of component fragility principally involves determination of two fragility parameters: (i) the median ground acceleration capacity A_m, and (ii) the logarithmic standard deviations β_R and β_U, or β_C. Once A_RBGM is known, determination of the median safety factor F̄ is key to deriving A_m. The generic expression for F is

$$F = F_1 F_2 F_3 \tag{5.8}$$

F_1 is the strength factor, that is, the ratio of capacity to demand. F_2 corresponds to the level of conservatism in assessing capacity; it depends on the energy absorption capacity of the SSC beyond the elastic limit. F_3 represents the conservatism associated with calculating the demand. Different approaches, such as analysis, testing, or experience-based methods, are adopted to determine the median values of F_1, F_2, and F_3.
5.3.4 Plant fragility Plant fragility is determined from the component fragility using the cut sets determined in the previous task of system analysis.
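As a simple illustration of this step, the sketch below builds a plant-level fragility curve from component mean fragility curves through two assumed minimal cut sets, treating the conditional component failures as independent; actual studies refine this with correlation and response models.

```python
import numpy as np
from scipy.stats import norm

# Sketch: plant fragility from component mean fragility curves via minimal cut
# sets.  Component capacities and the two cut sets are assumed examples; the
# conditional failures are treated as independent for simplicity.

components = {                      # name: (Am [g], beta_C)
    "EDG":      (0.70, 0.40),
    "SW_pump":  (0.90, 0.45),
    "Battery":  (1.10, 0.35),
}
cut_sets = [("EDG", "Battery"), ("SW_pump",)]   # plant fails if any cut set fails

def comp_pf(name, a):
    Am, bC = components[name]
    return norm.cdf(np.log(a / Am) / bC)

def plant_pf(a):
    # P(union of cut sets) = 1 - prod(1 - P(cut set)); a cut set is an intersection
    survive = 1.0
    for cs in cut_sets:
        q_cs = np.prod([comp_pf(c, a) for c in cs])
        survive *= (1.0 - q_cs)
    return 1.0 - survive

for a in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"PGA {a:.1f} g : plant conditional failure probability {plant_pf(a):.3e}")
```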
5.3.5 Seismic core damage frequency The seismic CD frequency is evaluated by convolution of the seismic hazard curve of the site with the plant fragility curve,

$$F = \int_0^{\infty} P_f(a)\,\frac{dH(a)}{da}\,da \tag{5.10}$$
where H(a) is the hazard curve and P_f(a) is the plant fragility curve. Here the median fragility and median hazard curves are used. The convolution operation consists of multiplying the occurrence frequency of an earthquake PGA between "a" and "a + da" with the conditional probability of the plant damage state and integrating such products over the entire range of PGA from 0 to ∞. This operation results in the median value of the seismic CD frequency.
5.3.5.1 Seismic analysis for multiunit Since the seismic hazard is common to all the units at a site, appropriate screening is required to identify and shortlist the concurrent IEs at each of the units due to a seismic event. The difference in the approach for seismic PSA between a single-unit and a multiunit site is the additional analysis for correlation of seismically induced
failures of structures, systems and components among units, the identification and screening of concurrent failures in multiple units, the derivation of concurrent failure probabilities from correlation factors, and the identification of accident sequences. Each initiating event is quantified with its occurrence probability conditional on the seismic event. The event trees and fault trees are developed, and the minimal cut sets of the derived accident sequences are quantified as conditional CD probabilities. The site CD frequency is then obtained through the integration of the seismic hazard and the conditional CD probability. These steps are common to all three approaches described earlier in Section 5.2: the hazard categorization method, the event sequence MUPSA method, and the master event tree method. In a multiunit site with N units, the ith accident sequence is given by

$$AS_i(\text{all units}) = AS_i(\text{acc. seq. of unit 1})\prod_{j=2}^{N} AS_i(\text{acc. seq. of unit } j \mid \text{acc. at unit } j-1)$$

$$AS_i(\text{all units}) = IE_i\,(C_1C_2C_3 + C_4C_5 + \ldots)$$

The accident sequence is thus the product of the seismically induced concurrent initiating event and the conditional failure probabilities of the structures, systems, and components appearing in the cut sets of the Boolean expression. A seismic event at a multi-unit site induces spatial correlation among SSCs of individual or different units at the site. Identical SSCs in the same or different units tend to fail together owing to the dependencies of the ground motion, seismic demand, and capacity. Therefore, in seismic analysis for a multi-unit site, modeling of the inter-unit correlation is very important. As the inter-unit correlation coefficient increases, the number of damaged units increases. Generally, it is observed that the number of damaged cores at a multi-unit site increases with the magnitude of the PGA. The challenge lies in the determination of the appropriate degree of dependency; importance and sensitivity analyses are therefore required on the degree of correlation and its effect on the site CDF. The procedure for importance analysis is identical to that of single-unit PSA, but with the additional complexity of combining importance measures for different hazards. It is important to take note of the asymmetry among the different units, and an appropriate interpretation of the results is required for any risk-based decisions.
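A Monte Carlo sketch of this effect is given below: Eq. (5.10) is evaluated for a two-unit site with correlated lognormal plant capacities, so that the frequency of concurrent core damage can be compared for different assumed inter-unit correlation coefficients. The hazard curve, fragility parameters and correlation values are illustrative assumptions only.

```python
import numpy as np

# Monte Carlo sketch of Eq. (5.10) for a two-unit site with correlated
# plant-level capacities.  The hazard curve, fragility parameters and
# correlation coefficients are assumed illustrative values.

rng = np.random.default_rng(0)

pga = np.array([0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1.0])        # PGA grid (g)
H   = np.array([2e-2, 5e-3, 8e-4, 2e-4, 3e-5, 8e-6, 1e-6])  # annual exceedance frequency

bin_pga  = np.sqrt(pga[:-1] * pga[1:])     # geometric-mean bin PGA
bin_freq = H[:-1] - H[1:]                  # annual frequency of PGA falling in each bin

Am, beta = 0.6, 0.4                        # same plant fragility assumed for both units

def concurrent_cd_prob(a, rho, n=100_000):
    """P(both units suffer core damage | PGA = a) with correlated lognormal capacities."""
    cov = beta**2 * np.array([[1.0, rho], [rho, 1.0]])
    ln_cap = rng.multivariate_normal([np.log(Am)] * 2, cov, size=n)
    return np.mean(np.all(ln_cap < np.log(a), axis=1))

for rho in (0.0, 0.5, 1.0):
    scdf_both = sum(f * concurrent_cd_prob(a, rho) for a, f in zip(bin_pga, bin_freq))
    print(f"rho = {rho:.1f}: frequency of concurrent core damage = {scdf_both:.2e} /yr")
```

With ρ = 0 the two units fail together only by coincidence, whereas with ρ = 1 they behave as a single unit and the concurrent frequency approaches the single-unit SCDF.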
Figure 5.18 Framework for Level-2 MUPSA.
5.4 MUPSA for Level 2 The discussions in the earlier sections pertain only to Level 1 MUPSA. A framework for Level 2 MUPSA is outlined briefly in Fig. 5.18. Level 2 MUPSA is similar to single-unit Level 2 PSA, with some refinements. In a single-unit Level 2 PSA, the accident sequences of the Level 1 PSA are
grouped to represent the possible paths through which radioactive products can escape into the environment. The group of such accident sequences leads to the damage states termed plant damage states (PDS). The PDS leading to the potential release of fission products denote various aspects such as the state of the fuel in the reactor core, the presence/absence of a containment bypass through which the radioactive products can escape, the state of the containment isolation valves and spray system, the accident progression in terms of time, the core inventory, etc. Along with the frequency of the accident sequences in each PDS, and based on the accident progression and its impact on containment behavior, containment event trees are modeled. The various end states of the containment event trees are grouped into a set of release categories, and source terms for each release category are evaluated. The same procedure is adopted for Level 2 of multi-units, with additional steps. The objective here is to evaluate the site release category including single-unit and multi-unit releases. The first step is to define the risk metric for Level 2 MUPSA. The plant damage states of single-unit and multi-unit events are listed and grouped in accordance with the accident characteristics and containment response characteristics of the categorized accident sequences. A combined containment event tree analysis is developed taking into account the accident progression of the corresponding single-unit and multi-unit PDS. Subsequently, the source terms of the radionuclide releases for the events from single and multiple units are cumulatively evaluated. The release categories for the site are listed, similar to the procedure followed for single-unit PSA, for the estimation of the site release frequency. Uncertainty analysis is performed for the various parameter values assigned and the range of uncertainty surrounding the mean site release frequency. In addition, sensitivity analysis is carried out to show the potential impact of important assumptions and uncertainties on the results.
5.5 MUPSA for Level 3 Internationally, only a few Level 3 MUPSA studies have been carried out to date, mainly due to the complexity arising from the large number of multi-unit accident combinations. The problem is complicated because the source terms that can be released to the environment from different units at a site may differ, and the locations of release may also differ. Developing consequence models for a large number of possible accident scenarios is extremely difficult. As in the other levels of PSA, in Level 3 MUPSA the accident scenarios and their combinations at a multi-unit site can be grouped to build the consequence model. The number of scenarios may
also be managed based on an appropriate screening method. Researchers have proposed techniques using a lookup table of consequences and their frequencies for evaluating the risk. A simplified approach is needed to perform a Level 3 consequence analysis at a multi-unit site with reasonable computational effort and with proper management of the accident scenarios and the associated source term evaluations.
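A schematic of the lookup-table screening idea is sketched below; the scenario groups, frequencies and consequence surrogates are assumed placeholders rather than results of any Level 3 study.

```python
# Sketch of a frequency-consequence lookup table for screening multi-unit
# release scenarios before a full Level 3 consequence calculation.  The
# scenario groups, frequencies (/yr) and consequence surrogates (relative
# population dose) are assumed placeholders.

lookup = [
    # (scenario group,             frequency, consequence surrogate)
    ("single-unit early release",   2.0e-6,   1.0),
    ("single-unit late release",    8.0e-6,   0.2),
    ("two-unit concurrent release", 3.0e-7,   2.5),
    ("release + spent-fuel pool",   5.0e-8,   4.0),
]

screen_limit = 1.0e-8          # drop groups whose risk contribution is negligible
risk = [(name, f * c) for name, f, c in lookup if f * c >= screen_limit]
total = sum(r for _, r in risk)

for name, r in sorted(risk, key=lambda x: -x[1]):
    print(f"{name:<30s} risk = {r:.2e}  ({100 * r / total:.0f}% of screened total)")
```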
Further readings [1] S. Schroer, M. Modarres, An event classification schema for evaluating site risk in a multi-unit nuclear power plant probabilistic risk assessment, Reliab. Eng. Syst. Saf. 117 (2013) 40–51. ˇ [2] M. Cepin, DEPEND-HRA—a method for consideration of dependency in human reliability analysis, Reliab. Eng. Syst. Saf. 93 (10) (2008) 1452–1460. [3] IAEA Safety Standards. Deterministic safety analysis for nuclear power plants, IAEA Specific safety Guide no. SSG-2, 2009. [4] IAEA Report. Nuclear Safety Review for the Year 2012, GC(56)/INF/2, 2012. [5] IAEA. Development and application of level 1 probabilistic safety assessment for nuclear power plants. IAEA Specific Safety Guide no. SSG-3, 2010. [6] IAEA. A methodology to assess the safety vulnerabilities of nuclear power plants against site specific extreme natural hazards. 2011. [7] OECD. Probabilistic risk criteria and safety goals. Technical Report NEA/CSNI/ R(2009)16, Organisation for Economic Co-operation and Development, 2009. [8] J.-E. Holmberg and M. Knochenhauer. Guidance for the definition and application of probabilistic safety criteria. (2010:36), VTT, Finland, 2010. [9] USNRC. An approach for using probabilistic risk assessment in risk-informed decisions on plant specific changes to the licensing basis. Technical report, 1998. [10] K. Ebisawa, M. Fujita, Y. Iwabuchi, H. Sugino, Current issues on PRA regarding seismic and tsunami events at multi units and sites based on lessons learned from Tohoku earthquake/tsunami, Nucl. Eng. Technol. 44 (5) (2012). [11] M.D. Muhlheim, R.T. Wood, Design Strategies and Evaluation for Sharing Systems at Multi-unit Plants Phase-I (ORNL/LTR/INERI-BRAZIL/06-01), Oak Ridge National Laboratory, USA, 2007. [12] K.N. Fleming, On the issue of integrated risk-A PRA practitioner’s perspective, in: Proceedings of the ANS International Topical Meeting on Probabilistic Safety Analysis, San Francisco, CA, 2005. [13] J.-E. Yang, et al., Development of a new framework for the integrated risk assessment of all modes/all hazards, in: Korean Nuclear Society 2009 Autumn Meeting, Gyeongju, Korea, 2009. [14] S. Samaddar, K. Hibino, O. Coman, 2014. Technical Approach for Safety Assessment of Multi-Unit NPP Sites Subject to External Events. PSAM12, Hawaii. [15] U.S. Nuclear Regulatory Commission. Policy issues related to new plant licensing and status of the technology-neutral framework for new plant licensing (SECY-05-0130), 2005. [16] Higashidori. Report of the SNETP Fukushima Task Group, 2013. [17] B. Zerger, M.M. Ramos, M.P. Veira, European Clearing house: Report on External Hazard related events at NPPs, Joint Research Centre of the European Commission, 2013.
[18] P. Lowe and I. Garrick, “Seabrook Station Probabilistic Safety Assessment Section 13.3 Risk of Two Unit Station,” Prepared for Public Service Company of New Hampshire, PLG-0300, 1983. [19] H. Varun, C.S. Kumar, K. Velusamy, Probabilistic safety assessment of multi-unit nuclear power plant sites—an integrated approach, J. Loss Prev. Process Ind. 32 (2014) 52–62. [20] G. Toro, R. McGuire, Calculational procedures for seismic hazard analysis and its uncertainty in the eastern United States, in: Proceedings,. Third International Conference on Soil Dynamics and Earthquake Engineering, Princeton, NJ, 1987, pp. 195–206. [21] United States Nuclear Regulatory Commission (USNRC), Washington, PRA Procedures Guide, A Guide to the Performance of Probabilistic Risk Assessments for NPP, NUREG CR-2300, 2, USNRC, 1983. [22] R.P. Kennedy, M.K. Ravindra, Seismic fragilities for nuclear power plant risk studies, Nucl. Eng. Des. 79 (1984) 47–68. [23] R.P. Kennedy, C.A. Cornell, R.D. Campbell, S. Kaplan, H.F. Perla, Probabilistic seismic safety study of an existing nuclear power plant, Nucl. Eng. Des. 59 (1980) 315–338.
CHAPTER 6
Risk aggregation
Contents
6.1 Unit level 153
6.2 Site level 154
6.3 Aspects to be considered in risk aggregation 154
6.3.1 Heterogeneity 154
6.3.2 Aging 156
6.3.3 Multi-facility site 166
6.3.4 Combination of hazards and uncertainties in external hazards 167
6.3.5 Human, organizational, and technological factors 167
6.3.6 Risk importance and sensitivity measures 167
6.4 Risk aggregation and its effect on risk metric 168
6.5 Mathematical aspects of risk aggregation 168
6.6 Interpretation of results 169
6.7 Risk aggregation for risk-informed decisions 169
Further readings 170
The term aggregation means combination, and risk aggregation means the combination of risks. In the context of probabilistic safety assessment (PSA), risk aggregation refers to combining risks from various contributors to provide an overall risk index. Traditional single-unit PSA at a multi-unit site may not be adequate to assess the total risk to the public and the environment. The need for an integrated multi-unit or site-level PSA approach that includes the potential for concurrent occurrences involving multiple co-located facilities is well recognized. Simple addition of the single-unit risks at a site could be either conservative or nonconservative, depending on the inter-unit resource sharing, site configuration, and risk metric. To make risk-informed decisions, multi-hazard risk aggregation is required to aggregate the various risk contributors. At a multi-unit site, risk aggregation encompasses aggregation of all the risk contributions from the various groups of hazards, across different reactor types, different plant states, and the different radioactive sources present. Broadly, the term risk aggregation can be used in two different contexts, viz.,
• aggregation of risk for multiple radioactive sources at a site (e.g., nuclear power plants [NPPs], spent fuel facilities, reprocessing facilities, etc.)
• aggregation of risks due to different types of hazards (e.g., internal events, external hazards such as seismic, flood, etc.)
The focus of this book is on the risk aggregation due to different types of hazards at a site. In the context of site-level probabilistic safety assessment (PSA) or multiunit PSA (MUPSA), whether and how the risks attributed to various radiological sources, hazards, and operating states are aggregated will depend upon the specific decisions to be informed by the PSA. As a first step, the risk from different contributors must be evaluated, and a framework needs to be evolved to combine them in a logical manner to arrive at the overall risk index. The framework should include uncertainty and sensitivity studies. This risk index is then compared with the safety goals and risk metrics for risk-informed decisions. The heterogeneities in the risk contributors need to be addressed with utmost importance so that a realistic inference is made. In the case of multi-unit PSA at nuclear reactor sites, concurrent or consequential initiators due to multiple sources may challenge shared systems and resources. Aggregation methods vary from simple addition of the risk contributions from individual hazard groups to development of an integrated PSA model that quantifies the total aggregated risk. The approach that is used can influence the results and insights that are obtained, including the assessed relative importance of risk contributors. For example, aggregation across an entire site can identify potentially important scenarios overlooked by aggregation across a subset of units on the site. Alternative, more detailed aggregation schemes (e.g., those that consider timing, as well as other source term characteristics) may be needed. There are technical concerns regarding different degrees of realism across the analyses of different risk contributors, particularly with respect to different hazard groups. Solutions to these concerns include: (1) presenting disaggregated results; and (2) estimating and adjusting for the degree of bias or systematic error associated with different analyses. In addition to combining quantitative risk contributions, risk aggregation should also consider integrating qualitative risk insights to develop an overall picture of the risk profile for a nuclear installation. The main challenge in risk aggregation is the lack of experience with both deterministic and probabilistic risk assessment of multi-unit sites. Other challenges include the relationship between dose response models and early health effects: when there is core damage and radioactive release from multiple units, the dose response model is nonlinear, and the early health effects in such a scenario are difficult to postulate. Another major challenge is the safety goal. The risk metric for a multi-unit site needs to be properly
Figure 6.1 Considerations for problem formulation and risk characterization.
defined. Moreover, the use of risk aggregation in the risk-informed decision-making process depends on the safety goals established in the country (Fig. 6.1).
6.1 Unit level The main aim of risk aggregation at the unit level is to aggregate over the full spectrum of internal and external hazards, and this can be achieved using simple summation for the primary risk metrics of PSA, namely the core damage frequency (CDF) and the large early release frequency (LERF). However, such a risk aggregation can lead to biased and misinterpreted results, especially when large uncertainties and conservative assumptions are used in the various hazard groups. To explain further, it should be understood that the biases and uncertainties across the individual contributors may not be uniform, and by simple addition of the mean or median values of risk from the individual hazard PSAs, the insights derived may be incorrect. The underlying reason for the variability in the biases could be the differing maturity levels of the methods for modeling and quantifying the hazards. PSA methods and techniques established for internal events are much more advanced compared to those for external hazards such as fire, flood, and seismic.
Suitable aggregation alternatives can be used depending on how the safety goals are formulated in the national framework.
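The following sketch illustrates the point made above about summing hazard-group contributions under uncertainty. Each contributor is represented by an assumed lognormal distribution (the medians and error factors are placeholders); the aggregated mean and percentiles are obtained by Monte Carlo sampling.

```python
import numpy as np

# Sketch of unit-level aggregation of CDF across hazard groups with uncertainty.
# Each contributor is an assumed lognormal (median, error factor); the exercise
# shows that the aggregated mean and the sum of medians can differ appreciably
# when uncertainty bands are wide.

rng = np.random.default_rng(1)

hazard_groups = {             # median CDF (/yr), error factor (95th/50th)
    "internal events": (2.0e-5, 3.0),
    "internal fire":   (8.0e-6, 5.0),
    "internal flood":  (3.0e-6, 5.0),
    "seismic":         (6.0e-6, 10.0),
}

n = 100_000
total = np.zeros(n)
for name, (median, ef) in hazard_groups.items():
    sigma = np.log(ef) / 1.645              # lognormal sigma from the error factor
    total += rng.lognormal(np.log(median), sigma, n)

print(f"sum of medians      : {sum(m for m, _ in hazard_groups.values()):.2e} /yr")
print(f"aggregated mean CDF : {total.mean():.2e} /yr")
print(f"aggregated 5th-95th : {np.percentile(total, 5):.2e} - {np.percentile(total, 95):.2e} /yr")
```

When the error factors are large, the aggregated mean is driven by the upper tails and exceeds the sum of the medians.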
6.2 Site level At the site level, beyond core damage itself, environmental issues need more attention, and the health effect consequences of multiple core damages involving two or more reactor facilities have to be examined. In this context, the risk metric of core damage frequency is not sufficient; a more meaningful metric is the large early release frequency. Risk aggregation at the site level (from all units at the site) can be obtained through a simple multiplication of the single-unit PSA results by the number of units. However, risk aggregation through simple multiplication can lead to an overly conservative result, since the contribution from common mode events will be double counted. Both simple and more elaborate methods have been proposed to account for multi-unit PSA. The objective of risk aggregation at a multi-unit site is to encompass the contributions to the integrated risk arising from the various radiological sources at the site and the various hazard groups covering internal as well as external hazards. The hazard groups should cover human-made and natural hazards, both individually and in combination. For both the unit and the site level, analogous risk measures can be used for the multiple sources of radioactivity available at the site.
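A minimal numerical sketch of this contrast is given below, taking the frequency of at least one core damage as the site metric. The split between independent and site-wide (common) initiators, and all frequencies, are assumed illustrative values.

```python
# Sketch of site-level CDF aggregation for N identical units, contrasting the
# simple N x single-unit estimate with one that treats the shared initiator
# once for the whole site.  All frequencies are assumed illustrative values.

N = 4
cdf_independent = 1.5e-5        # /yr per unit, from initiators affecting one unit only
cdf_common      = 0.5e-5        # /yr per unit, from site-wide initiators (seismic, grid loss, ...)
p_cd_given_common_event = 0.8   # conditional CD probability of one unit, given the shared event
freq_common_event = cdf_common / p_cd_given_common_event

single_unit_cdf = cdf_independent + cdf_common

# "Simple multiplication": every contributor counted once per unit.
site_cdf_simple = N * single_unit_cdf

# Shared initiator counted once for the site (frequency of at least one core damage).
p_at_least_one = 1 - (1 - p_cd_given_common_event) ** N
site_cdf_split = N * cdf_independent + freq_common_event * p_at_least_one

print(f"simple N x single-unit estimate : {site_cdf_simple:.2e} /yr")
print(f"shared initiator treated once   : {site_cdf_split:.2e} /yr")
```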
6.3 Aspects to be considered in risk aggregation If the scope of the multi-unit assessment includes spent fuel pools, dry fuel storage facilities, or waste treatment facilities, the risk aggregation should model the combined risks of these facilities. Some of the important aspects to be considered in risk aggregation are listed in Fig. 6.2.
6.3.1 Heterogeneity In many countries, a multi-unit site may comprise reactors of different types, such as Tarapur Atomic Power Station in India, Qinshan Nuclear Power Plant in China, Wolsong in Korea, Ringhals in Sweden, etc. As there are differences in reactor technology, the heterogeneity in PSA modeling needs to be appropriately addressed, and separate models are needed. Heterogeneity also exists in the level of realism and maturity among different analysis areas, which may have implications for interpreting the integrated risk. The difference in maturity arises for various reasons, as the PSA methodologies and their implementation have evolved over the last few decades and the
Figure 6.2 Aspects in risk aggregation.
progress on additional modeling and quantification of various phenomena has expanded to different levels. In addition, the modeling assumptions made during the development of accident sequence models for different groups of hazards also vary. The heterogeneity in hazard assessment, modeling, etc. is an inherent aspect of integrating different phenomena and plant responses and is not to be viewed as a disadvantage. Rather, the extent and level of approximation provide the necessary confidence in the results. For example, at a site the risk contribution from extreme events such as hurricanes or high winds may be sufficiently low that any additional refinement would demand considerable resources and might even alter the insights that could be obtained from the model. Therefore, risk aggregation and the quantification of heterogeneity at a multi-unit site must be well understood and interpreted in the right manner.
6.3.2 Aging Aging management is required to maintain the required safety level throughout the life of an NPP. It requires consideration of material degradation, technological obsolescence, and human and organizational aspects. Detailed information on existing and potential aging and degradation of systems, structures and components is required to assist operating organizations and regulatory bodies by specifying a technical basis and practical guidance on managing the aging of mechanical, electrical, and instrumentation and control components, and of civil structures of an NPP important to safety. Generally, aging is attributed to the management of structures, systems and components, but in a broader sense it also includes keeping up with technological developments and management practices. The effects of component ageing and their consideration in PSA are included in most regulatory requirements worldwide; this also supports the transfer of knowledge to a new generation of personnel, in order to avoid loss of information on technical aspects and ageing management requirements. In the context of integrated safety management, it must be shown how the safety-related issues of aging are integrated into PSA. A method is needed for prioritizing the components in the NPP considering the implications of their aging on safety, and for highlighting the necessity of introducing ageing into PSA. Design reports of a nuclear plant contain all the necessary data and information from the initial stage through operation. Modifications, deviations, details of manufacturing, transportation and commissioning, along with the safety reviews, the in-operation history of components, and the frequency of failure and maintenance of the components with their replacement history, are all data required for aging analysis. All these data may be collected and prepared in a
format suitable for inclusion in PSA, so that aging effects can be evaluated and predicted in an Aging-PSA. When additional reactors are constructed at an existing NPP site, risk aggregation across old and new plants is an important aspect to be considered. Owing to design advancements and operational experience, the core damage frequency of the new plants is much lower than that of the older nuclear plants co-located at the site. The uncertainties at these heterogeneous units may differ significantly, and the aggregated risk may be dominated by a high-risk contributor in a way that is not realistic for risk-informed decision making.
6.3.2.1 Statistical approach to model aging in NPP components Though several methods are available for addressing aging in risk assessment, the Bayes method provides a realistic assessment as it uses plant-specific information along with data obtained from experience. Bayesian statistics is a branch of statistics that applies probabilities to update beliefs in the light of new data. In the context of reliability, when there is a lack of plant-specific component failure data, most reliability analyses depend on generic sources. While the use of limited plant-specific information alone is not statistically valid, data from generic sources may not be suitable for adoption as is. To overcome this two-fold issue, statistical methods such as Bayesian techniques are adopted. The Bayesian technique considers the mean failure rates and the corresponding standard deviations available from the operating experience of similar systems worldwide and combines them with plant-specific information to obtain the posterior failure probability.
6.3.2.2 Estimate ageing failure rate Thresholds for various structural parameters can be calculated using design specifications. For instance, for a rod under some load, we can estimate the threshold value of the cross-sectional area of the rod at the yield point beyond which the rod is considered to have failed. After deciding the thresholds, we define a random variable for the total deterioration at a given time t. The failure time is modeled in terms of the deterioration random variable and the decided thresholds, and with some basic probability and the Bayes formula the ageing failure rate is estimated.
6.3.2.3 Combining base failure rate and ageing failure rate Once both failure rates have been estimated, we simply superimpose the second on the first, giving the net failure rate.
6.3.2.4 Methodology for base failure rate The Bayesian formula for continuous distributions is

$$f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}{\int f(x \mid \theta)\, f(\theta)\, d\theta} \tag{6.1}$$
The component failure data from various international NPPs are given below:

Source | Failure rate readings at various instants | Average failure rate | Standard deviation
1 | λ11 λ12 λ13 … λ1n | λ1 | S1
2 | λ21 λ22 λ23 … λ2n | λ2 | S2
… | … | … | …
M | λM1 λM2 λM3 … λMn | λM | SM
Note here only λi ’s are known and λij is unknown.
Assumptions: 1) The sample size of each plant is the same. 2) The failure distribution is exponential, f(t) = λe^{−λt}, and therefore the mean is 1/λ. 3) The component failure rate follows a normal distribution. Since the mode and mean are equal for a normal distribution, the average failure rate given is the most likely failure rate for the component of the plant. The maximum likelihood estimator of the mean of the required prior distribution is

$$L = \frac{1}{m}\sum_{i=1}^{m} \lambda_i \tag{6.2}$$

Similarly, the variance of the average failure rates is

$$S^2 = \frac{1}{m-1}\sum_{i=1}^{m} (\lambda_i - L)^2 \tag{6.3}$$
We know that

$$f(\lambda) = \frac{1}{S\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\lambda - L}{S}\right)^2} \tag{6.4}$$

is the prior distribution. If the sample size is "k", the mean time to failure T for this sample would be MTTF = (X_1 + X_2 + … + X_k)/k, where X_i represents the time to failure.
Let D = X_1 + X_2 + … + X_k. Since the X_i follow an exponential distribution, the sum of k independent and identically distributed (IID) exponential variables follows a gamma distribution with parameters β = 1/λ and α = k. By the scaling property of the gamma distribution, D/k, that is, the MTTF, follows a gamma distribution with parameters 1/(λk) and k. The likelihood function is

$$f(T \mid \lambda) = \frac{(\lambda k)^k}{\Gamma(k)}\, T^{k-1} e^{-T\lambda k} \tag{6.5}$$

Thus, the posterior distribution is

$$f(\lambda \mid T) \propto \frac{(\lambda k)^k}{\Gamma(k)}\, T^{k-1} e^{-T\lambda k}\cdot\frac{1}{S\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\lambda - L}{S}\right)^2} \tag{6.6}$$
This essentially gives us the probability function for the component failure rate given the available data. Building on this, we intend to inspect the effects of aging on the failure rate and estimate the failure rate when the components are aging. Let us look at a case study where the prior data are taken from International Atomic Energy Agency (IAEA) TECDOC-478, Vienna, 1988. Given below is the failure information of an NPP component, viz., a motor-driven pump.

Mean | Additional information (error factor)
3.0E-3/hr | —
3.0E-3/hr | 10
1.0E-4/hr | —
3.0E-5/hr | 10
7.9E-6/hr | —
1.0E-5/hr | —
1.3E-4/hr | 10
9.9E-5/hr | —
This is our prior information. As derived above in Eqs. (6.2) to (6.6), we call L the average failure rate and S² the variance. Therefore, L = 7.97E-4 and S = 1.36E-3. Thus,

$$f(\lambda) = \frac{1}{1.36\times 10^{-3}\,\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\lambda - 7.97\times 10^{-4}}{1.36\times 10^{-3}}\right)^2} \tag{6.7}$$
If the plant-specific information is as follows:

Sl. no. | Pump running hours while testing | No. of failures
1 | 95.56 | 0
2 | 200.84 | 0
3 | 141.6 | 0
4 | 1769.09 | 23
5 | 1527.63 | 23
6 | 505.66 | 17
7 | 1509.13 | 7
8 | 2302.07 | 17
9 | 998.59 | 8
10 | 721.9 | 1
11 | 296.15 | 5
12 | 1001.2 | 2
Total | 11069.42 | 103
The failure rate = 9.3E-3/hr and MTTF = 107.5268 hr. The likelihood function is

$$f(T \mid \lambda) = \frac{(103\,\lambda)^{103}}{\Gamma(103)}\,(107.53)^{102}\, e^{-107.53\times 103\,\lambda} \tag{6.8}$$

Thus, the posterior is

$$f(\lambda \mid T) \propto \frac{(\lambda k)^{k}}{\Gamma(k)}\, T^{k-1} e^{-T\lambda k}\cdot\frac{1}{S\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\lambda - L}{S}\right)^2} \tag{6.9}$$
For example, corrosion is a mode of deterioration considered because it can be a single contributor for large amount of failure in components. Further, we intend to compare the difference between the failure rates estimated with and without considering ageing as a factor. This analysis is expected to give a fair idea on whether ageing analysis is required or not and if required, at what level. However, not much research work is reported in open literature. The most common way of addressing this problem is taking the linear model where the ageing failure rate is proportional to the deterioration incurred by the component. This model was proposed by William E. Vesely in April 1987 and was adopted for light water reactor safety system. The proposed linear model estimates component failure rate taking into account the ageing mechanisms from basic phenomenological considerations. In this study, occurrences of deterioration are modeled as a Poisson process but the severity of damage is allowed to have any distribution; however, the damage is assumed to accumulate independently. Finally, the failure rate is modeled as being proportional to the accumulated damage. Using this treatment, the linear aging failure rate model is obtained. The model assumes that shocks occur discretely and deterioration is linear to failure rate. For a real application, linearity cannot be assumed and a continuous deterioration processes needs to be modelled as it is the case in most modes of ageing. Inspired by this, a simple model that assumes the failure rate at any given time to be proportional to the deterioration can be developed. It gives a basic idea of how to handle continuous deterioration mechanisms. The model is purely with the intention of making the thought process of including ageing in the component failure rate estimation more clearer. It is assumed that the average deterioration at time t is known, say c(t). The random variable for deterioration at time t is assumed to follow normal distribution with mean c(t), call it C|t. Now we know the failure rate h(t) is proportional to the deterioration at time t. Thus, h(t) α C|t and mean failure rate at time t is actually same as the mean deterioration rate and hence h(t) is actually a scalar multiple of c(t). Although this model may have several flaws, the takeaway from this model is actually the idea of converting a known approximate graph to a more useful probability distribution of a relevant deterioration random variable much like linear regression. 6.3.2.6 Aging failure rate defining relevant random variables Let us assume we have data points of corrosion depth at successive time intervals of size t.
Figure 6.3 Typical corrosion depth with time.
Fig. 6.3 represents a possible corrosion depth with time graph. Let us assume the corrosion depth is logarithmic in nature with respect to time. Thus, we expect that ec is linear with time t. Now we have data points (c1 ,t0 ),(c2 ,2t0 )…..,(cn ,nt0 ) where t0 can be decided on the basis of required accuracy. If a linear regression model is adopted, Yi = eci = β0 + β1ti + εi , here ti is i∗t0 for all i in {1, n} Let εi |t0 i ∼ N 0, σ 2 , Thus, where, μi = β0 + β1ti Yi ti ∼ N μi , σ 2 Using the linear regression model we know, N
it − t¯n (yi − y¯n ) ∗ β1 = N 2 i=1 it − t¯n
(6.11)
(6.12)
i=1
where y¯n denotes the average value over all n, and β1∗ is the estimator of β 1 . (6.13) β0∗ = yn − β1∗ t¯n , and μ∗i = β0∗ + β1∗ti = Yi ∗ , y∗ t = β0∗ + β1∗ti = μ∗ t say ,
Risk aggregation
163
That is, the expected value at time t is the same as expected average value of y at time t. Y t ∼ N μ∗ t, σ 2∗ Hence, ec |t ∼ N(μ∗ |t, σ 2∗ ), If FY|t (x) = p(y|t < x) = P(ec |t < x) = p(c|t < ln (x)), then fY|t (x) = fc|t (ln (x))/x and ex fy|t (ex ) = fc|t (x),
(6.14)
Thus, the probability distribution for the corrosion depth at time t is fc|t (x) =
− 12 (ex −μ∗ )2 ex √ e σ ∗2 σ 2
(6.15)
At any time t, we can estimate the extra corrosion in t time interval. Thus, we use the estimated value of corrosion at time t. C(t) be the estimated value of corrosion at time t. We know C(t ) = Let ∗ ∗ ln β0 + β1 t , We have c (t ) = β1∗ / β0∗ + β1∗t . C (t)t is the average extra corrosion that occurs at time t in t time interval. Let us assume C |t is a random variable that follows normal distribution about C (t)t as its mean and σ12 as its standard deviation. Here σ 1 can be decided using the concept of maximum likelihood estimator. Essentially, we have (c1 ,t0 ), (c2 ,2t0 ),…….,(cn ,nt0 ) and we take these values into consideration (c2 –c1 ,t0 ), (c3 –c2 ,2t0 ),……(cn –cn–1 ,(n–1)t0 ) as the values of extra corrosion taking place at time t in t0 time interval. Now, let us assume that the component, that is, the pipe in our case fails if its diameter goes below some threshold value. This threshold value can be calculated using various parameters of the pipe that can be affected due to its diameter. For instance, if we had a rod which has a certain load on it, then its threshold diameter would be the one that is obtained by calculating the yield point stress for that load assuming the reduction in diameter cannot cause failure in any other way. Thus, (x−c (t )t ) 1 −1 fC |t = √ e 2 (σ1 )2 2σ1
2
(6.16)
Here σ 1 can be decided using the concept of maximum likelihood estimator. Now, we try to maximize f(Z|σ 1 ) where Z is the probability that the data
164
Reliability and probabilistic safety assessment in multi-unit nuclear power plants
takes the known values. Thus, we get the MLE for σ 1 by differentiating f(Z|σ 1 ) w.r.t. σ 1 and equating to 0. n−1 That is, we differentiate f c jt0 w.r.t. σ 1 and equate to 0. j=1
6.3.2.7 Failure rate/hazard rate estimation with ageing effect Hazard rate is defined as the probability that the component fails in a short interval of time given it has survived up till that moment per unit interval of time. h(t ) = limt−>0 p(T < t + t|T > t )/t
(6.17)
here T denotes the lifetime of the component. T < t + t ≡ C|t + t > Th ,
(6.18)
here, ≡ means corresponds to. T > t ≡ C|t < Th,
(6.19)
since t = t0 h(t ) ≈ p(T < t + t|T > t )/t =
p(C|t + t > Th ∩ C|t < Th) p(C|t < Th)t0 (6.20)
Let, (C|t + t ) − C|t = C |t,
(6.21)
p C |t + C|t > Th ∩ C|t < Th p(C|t + t > Th ∩ C|t < Th) = p(C|t < Th)t0 p(C|t < T h)t0 (6.22) If X = Th – C|t FX (x) = P(Th − C|t < x) = P(C|t > Th − x) = 1 − P(C|t < Th − x) = 1 − FC|t (Th − x) (6.23) Differentiate w.r.t. x, fx (x) = fc|t (Th − x), Eq. (6.22) becomes
P C |t − X > ∩ X > 0 h(t ) = P(X > 0)t0
(6.24)
By total probability theorem, P(F) =
N
p( F |Ei)P(Ei)
(6.25)
i=1
Ei are actually the events that may precede F. We use this theorem in its integral form. The numerator of (6.24), that is, P(C |t − X > ∩ X > 0) can be written as P(C |t > X ∩ X > 0. By summing up all the events where X is positive and C |t > X given a certain value of X, ⎛ ⎞ ∞ ∞ P C |t > X ∩ X > 0 = ⎝ f C |t (x)dx⎠ fx (y)dy (6.26) h(t ) = h(t ) =
0
∞
∞
y
0
f C |t (x)dx fx (y)dy y ∞ (y)dy fx t0
0
∞
0
∞ y
√
1
e 2kσ ∗ ∞ 0
− 12
(x−c (t )t )2 (kσ ∗ )2
dx
(6.27)
− 12 (ey −μ∗ )2 ey ∗2 dy √ e σ σ ∗ 2
− 12 (ey −μ∗ )2 ey dy √ e σ ∗2 σ ∗ 2
(6.28) Here, μ∗ |t = β0∗ + β1∗ t and σ ∗ =
n 1 ∈∗ n − 2 i=1 i
Even though the final result is in terms of an integral, these integrals can be evaluated using a statistical software. Depending on the modes of failure and varying behavior of deterioration versus time, the proposed method may be suitable for modelling ageing failure rates. 6.3.2.8 Multiple modes of component failures Case 1: Multiple modes of failure acting on the component and result in deterioration of a single parameter,say the cross-sectional area of the component. In such a case we first independently estimate their deterioration rates
(assuming independent data for each mode of failure is available) as shown above and add them to find the total deterioration rate. The total deterioration as a function of time is estimated by integrating with proper limits. Case 2: When there is a possibility of failure of a component due to several reasons such as increased brittleness, etc. We independently calculate the failure rates corresponding to each of the possible properties like brittleness, length, area, etc using the procedure described in case 1. Since the component can fail due to deterioration in any one property, we add all the failure rates corresponding to each of them that we calculate independently of each other. In this way, the proposed model can address all possible ways in which a component can fail due to ageing. 6.3.2.9 Net failure rate To obtain the total failure rate, add the base failure rate estimated as derived in Eq. (6.16) with the failure rate that accounts only for ageing, that is, h(t) in Eq. (6.28). 1 λ−L 2 (T (P−1) e(−T Pλ) ) P 1 We know f (λ|X ) ∝ λP ∗ (2πS) e− 2 ( S ) (P) If v(t) denotes the total failure rate, then v(t) = λ + h(t) F(V |t ) (x) = P(V |t < x) = P(λ + h(t ) < x) = P(λ < x − h(t )) = Fλ (x − h(t )) Taking derivative with respect to x, f (V |t ) (x) = fλ (x − h(t ))
(6.29)
This way we completely model the failure rate at time t.
6.3.3 Multi-facility site When a multi-unit site has different types of facilities such as spent fuel facility, reprocessing facility, etc. along with NPPs, the failure criteria for each of the facility will vary and risk aggregation for the site becomes difficult. Further the event progression of different hazards is different in these facilities and the integrated risk profile for combination of hazards is tedious. It is even more difficult to perform an uncertainty or sensitivity analysis for site level risk as it will only be feasible to estimate the importance of a component and its risk contribution for individual hazards for a particular type of facility. Moreover, the level of detail and scope of the individual models may be different in nature.
The multi-source interactions or dependencies between the different sources may vary by the group of hazards, initiating event and by the operational states. The scheme proposed by IAEA for classifying intraunit dependencies is a guideline for classifying inter-unit dependencies. To consider other sources, it is recommended to have a list of potential intersource dependencies for the site. The timing of the concurrent accident sequences is very important as they may challenge shared structures, systems and components and resources available for severe accident management.
6.3.4 Combination of hazards and uncertainties in external hazards Risk aggregation due to combination of individual hazards may lead to double counting of risks which needs to be addressed properly. When different hazard groups are integrated in a PSA model, and quantified for risk aggregation, the associated uncertainty in each hazard are required to be used to obtain appropriate aggregated results. Quantifying risk due to rare events such as seismic pose a challenge in the aggregation process due to the approximation and truncations involved.
6.3.5 Human, organizational, and technological factors It is well established that organizational and technological factors are important source of dependency between units and needs to be appropriately treated. Generally, similar procedures and associated human actions are adopted among the units at a site. Human reliability requires a special treatment from the multi-unit context. The stress levels, limitation of resources, extreme demanding situations are common challenges for a human reliability analysis (HRA) at site level. Standard procedures for HRA followed at unit level require major modification for site level analysis. Having considered all the important aspects for multi-unit or site level PSA,it is imperative to include uncertainty analysis for site level PSA because in addition to the common modeling and data uncertainties, additional uncertainties due to human dependencies, treatment of inter-unit dependencies, organizational dependencies may also significantly contribute.
6.3.6 Risk importance and sensitivity measures Like other issues, aggregation of risk importance measures and sensitivity measures of separate PSA models is also vital. It is also critical to understand the aggregation of these measures as it provides useful insights on the overall
importance of a particular component and sensitivity of the component for decision making. The risk contribution of a component to a particular hazard must be proportionately estimated based on the hazard contribution to the total CDF.
6.4 Risk aggregation and its effect on risk metric The challenges for risk aggregation are multi-fold. A single-integrated site level PSA can help to comprehensively address important accident scenarios affecting multiple sources but difficult to address the results from the available single-source PSA and put them in one model.However,if a site has multiple sources, due to large number of potential combinations of multiple sources, hazard groups and plant operating states, it may be practically difficult to integrate all the combinations in a single-integrated site level analysis. Some expert judgement is required to logically combine or group the important accident scenarios from single-source PSA models and prioritize potential effects from significant contributors from individual sources. The main focus on risk aggregation is to model the interactions between dependencies between units and sources. In risk aggregation, it is also challenging to address the uncertainties in modeling both the frequencies of external hazards and their impacts on each individual unit. The problems in risk aggregation and the factors to be addressed as described in previous sections are the sources of uncertainty in evaluating risk metric at a multi-unit site. IAEA reports TECDOC-1229 and TECDOC1200 discuss the importance of the uncertainties and the effect of risk aggregation. In addition to the existing uncertainties in component failure probabilities, parametric uncertainties, the uncertainty due to heterogeneity, combination of hazards, aging, multi-facility, etc. and make risk-informed decision making more difficult. However, by modeling all the factors with appropriate engineering judgement will result in a more realistic modeling of the effect of risk aggregation on risk metric such as CDF and LRF at a multi-unit site.
6.5 Mathematical aspects of risk aggregation When the number of units at a site is more than two, depending on the type of the hazard, risk metric estimated for the conditional internal hazards and conditional external hazards that affect a particular group of units to avoid double counting of risk. This can be a complex task for site CDF estimate.
With all the complexities described about risk aggregation in this chapter, the mathematical aspects in quantification of risk aggregation poses an additional challenge. When risk from different radiological hazards, sources and operating states are to be aggregated, the conceptual challenge to risk aggregation lies in the diversity of distributional properties across risk types. While the simple approach of combining the mean values is generally acceptable, from statistical point of view, when the behaviour of different risk contributors departs from normality, simple measures of dispersion such as the standard deviation fail to provide a complete description of risk. To overcome this, more sophisticated approaches such as the method of copulas and simulations allow to incorporate realistic features of the marginal distributions while preserving the dependence structure for a realistic risk aggregation. In view of the uncertainties involved in various contributions such as hazard frequencies, simulation studies and sensitivity analysis are recommended.
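As a sketch of the copula idea mentioned above, the following example combines two risk contributors with different marginal distributions under a Gaussian copula; the marginals and the correlation coefficient are assumed for illustration.

```python
import numpy as np
from scipy.stats import norm, lognorm, gamma

# Sketch of copula-based aggregation: two risk contributors with different
# marginal distributions are combined under a Gaussian copula so that the
# dependence structure is preserved.  Marginals and correlation are assumed.

rng = np.random.default_rng(2)
rho, n = 0.6, 200_000

# Correlated standard normals -> uniforms -> back through each marginal.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
u = norm.cdf(z)
x1 = lognorm.ppf(u[:, 0], s=1.2, scale=1.0e-5)        # e.g. seismic contribution
x2 = gamma.ppf(u[:, 1], a=2.0, scale=5.0e-6)          # e.g. internal-events contribution

total = x1 + x2
independent = x1 + rng.permutation(x2)                # same marginals, dependence removed

print(f"mean (dependent)      : {total.mean():.2e}")
print(f"95th pct, dependent   : {np.percentile(total, 95):.2e}")
print(f"95th pct, independent : {np.percentile(independent, 95):.2e}")
```

The means coincide by construction, but the upper percentiles of the aggregated risk differ once the dependence between contributors is accounted for.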
6.6 Interpretation of results As in single-unit PSA, interpretation of results for multi-unit PSA is an important step and several factors have to be elaborated while documenting the results. The premise in shortlisting MU initiating events, the approximations and models adopted for human reliability estimation, assumptions and simplifications, logic behind combination of multi-unit IEs and accident sequences, the procedure for intra and inter unit CCF and site-specific hazards considered have to be clearly described. The documentation also must include the sensitivity analysis and the contribution of internal and external hazards to the risk metric so that enhancements, if any, to improve safety can be recommended. Some of the expected insights from multi-unit PSA include the following: - areas of improvement in interactions and shared systems among various units - improvement in site emergency operating procedures, crew availability during accidents involving more than one unit at a site - measures to reduce intra-unit CCF, safety improvement in proximity issues
6.7 Risk aggregation for risk-informed decisions As stated in INSAG-25, quantitative and qualitative aspects are equally important and need to be considered holistically; which is particularly
important when considering risk aggregation issues in the integrated risk informed decision making. A similar process is provided in Electric Power Research Institute (EPRI) that addresses the various issues associated with risk aggregation in the context of risk informed decision making. The approach to aggregation proposed by EPRI involves five different tasks and summarised here. The first task is to define the application and the role of PSA in the aggregation process. The risk metrics are identified in this task. Task 2 evaluates the baseline best-estimate model to characterise the important risk contributors. Task 3 evaluates the required risk metrics characterized in the previous task and the cause-effect relationship of any proposed change in the plant. Task 4 identifies the sources of uncertainty followed by sensitivity analysis. Task 5 documents the conclusions for integrated decision making. Internationally, as there are more reactors being built at a site and nonreactor facilities being installed close to the reactors, there is a need to address multi-source interactions and their contributions for risk aggregation to assess the site risk. From regulation point of view, site risk presents challenges in licensing and operation management and therefore there is a need for continuous development and improvements in the integrated risk assessment methodologies. The systematic approach recommended in EPRI for risk aggregation for risk informed decision making is applicable for multi-unit sites also. Another important aspect to be focussed is that while using a multiunit PSA model to support risk informed decision, it is essential that the risk from all contributors is addressed. Due to the nature of inputs used for the evaluation of risk metrics, the resulting numerical estimate must be understood as an indicator instead of using it as a precise measurement. Furthermore, relying exclusively on the numerical results by adding the contributions to risk metrics might lead to inappropriate decisions.
Further readings [1] International Atomic Energy Agency, Regulatory review of PSA Level-2, IAEATECDOC-1229, IAEA, Vienna, 2001. [2] International Atomic Energy Agency, Applications of probabilistic safety assessment (PSA) for nuclear power plants, IAEA-TECDOC-1200, IAEA, Vienna, 2001. [3] An approach to risk aggregation for risk-informed decision making, EPRI Technical Report 3002003116, 2015. https://www.epri.com/research/products/ 000000003002003116. [4] A framework for using risk insights in integrated risk-informed decision-making, EPRI Technical Report No. 3002014783, 2019. https://www.epri.com/research/products/ 000000003002014783.
CHAPTER 7
Human reliability

Contents
7.1 Introduction 173
7.2 Types of human errors 175
7.3 Human error in nuclear power plants 177
7.4 Human reliability models 179
    7.4.1 Technique for human error rate prediction 181
    7.4.2 Accident sequence evaluation program 183
    7.4.3 Success likelihood index methodology 184
    7.4.4 Human cognitive reliability model 187
    7.4.5 Standardized plant analysis risk-HRA model 189
7.5 HRA generations 190
    7.5.1 First-generation HRA models 190
    7.5.2 Second-generation HRA models 191
    7.5.3 Third-generation HRA models 192
    7.5.4 Cognitive architecture models 193
7.6 Human cognitive architecture 195
7.7 HRA in the context of multiunit PSA 196
Further readings 197
7.1 Introduction
Safety-critical applications such as nuclear, aviation, and chemical plants are complex socio-technological systems that require a high level of reliability to ensure a safe and hazard-free environment. The reliability of such systems depends on the failure-free performance of digital displays, control systems, and hardware. In the nuclear industry, rapid advances in technology have brought a significant improvement in the reliability of these devices used in nuclear power plants (NPPs). However, the human element plays an important role in the effective control and monitoring of critical operations in NPPs. Several authors [12,13] have reported that human error (HE) is one of the prime contributors to major nuclear accidents. A HE is characterized as a divergence between the realized action and the action that should have been taken. NPP control room operators are highly trained professionals who monitor system functions and maintain safe operation. Specific tasks in the control room often require multitask performance and complex cognitive activities, as well as physical activities. These operators track numerous indications of safety-critical systems and may encounter critical decision-making situations under time pressure and extremely stressful conditions. Any error by the operators in the control room can have catastrophic consequences, and it is therefore necessary to carry out comprehensive research on the impact of human factors on nuclear safety. There is a strong need to establish a coherent strategy to minimize the frequency of HEs in the NPP control room. Accident evaluation methods and human reliability assessment (HRA) methodologies have been applied in such complex systems, but they can be both complicated and time-consuming [14]. HRA needs to draw on various disciplines including human factors, physiology, and behavioral science. Recent HRA approaches highlight basic cognitive processes and causes of HE in terms of psychology, cognition, and neuroscience; they allow a more logical and convincing qualitative analysis and provide a richer framework for quantitative analysis. According to the Korea Institute of Nuclear Safety (KINS), the trend of causes for nuclear accidents is shown in Fig. 7.1.

Figure 7.1 Statistical trend of causes for nuclear accidents [15].

Increasing system complexity, automation, the emergence of advanced control techniques, and reducing the number of operators without attempting to increase their mental capacity to handle unusual circumstances are among the primary causes of HE in the nuclear industry. While the contribution of HE in nuclear power is decreasing through a number of improvements, such as sophisticated control rooms, automation, and state-of-the-art equipment, an appreciable proportion of HE remains to be understood. A recent example is Japan's 2011 Fukushima disaster, which motivates researchers to investigate the human component in these sectors. Operator errors, viz.
slips, lapses, mistakes, and violations, are also often cited as reasons for reactor trips, plant shutdowns, and various incidents and accidents. The human factor in nuclear operations remains a vital domain because the human element, which is prone to error, is an important part of the system. With the continuing emphasis on nuclear safety, development and operational methods are improving, and HE in nuclear operations is thereby being reduced. The first step in reducing HEs is the identification of the critical influencing factors in operations that trigger the HE. A fundamental problem when modeling HE is that it is the same human perceptual-cognitive-motor system that produces each action, whether or not that action is correct. Therefore, to effectively model HE, a relatively complete model of the whole perceptual, cognitive, and motor system is required. Several efforts have been made over the last decades to develop integrated theories of cognition and to build significant cognitive architectures such as adaptive control of thought-rational and queuing network-adaptive control of thought-rational. The evolution of these architectures bears witness to the development of human cognitive architecture (HCA). These models have been used extensively to model many areas of human performance, are widely accepted as computational cognitive architectures, and have in some cases been applied to modeling HEs. Nonetheless, each offers only some of the methods needed for error modeling, and it is unlikely that any single one provides all the components necessary for complete error modeling.
7.2 Types of human errors
HE is defined as the difference between the action performed and the action that should have been taken [16]. According to Reason [17], HE is a "discrepancy between the human action taken or omitted and that intended." In related reliability terminology, an error is a "discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition"; a failure is a "loss of ability to perform as required"; the failure cause is the "set of circumstances that leads to failure"; and the failure mechanism is the process that leads to failure. HE is classified in different ways. Swain and Guttmann [19] described five kinds of error based on behaviors of omission and commission, as shown in Table 7.1. Rasmussen [22] classified HE along three distinct psychological dimensions, viz. knowledge-based (KB), rule-based (RB), and skill-based (SB). Slips are commonly associated with SB behavior; skills are learned habits that people exercise regularly with little conscious attention. Mistakes generally come under RB and KB behavior. Occasionally, routine and exceptional violations occur in NPP operations.
Table 7.1 Types of human error.
Omission | Portion of a task is skipped
Commission | Task is performed incorrectly
Extraneous act | Task that should not have been performed, as it diverts attention from the system
Sequential act | Task is performed out of sequence
Time error | Task performed too early, too late, or outside of the time allowed
Figure 7.2 Rasmussen's human error taxonomy.

Table 7.2 Human error classification.
Slips | Automatic behavior
    Description errors | Performing the correct action on the wrong object triggers an incorrect action
    Associative activation errors | An internal thought/association triggers an incorrect action
    Loss of activation errors | Forgetting to do something
    Mode errors | A changed meaning in the same context leads to an incorrect action
Mistakes | Conscious deliberation
The HE taxonomy presented by Rasmussen in relation to human behavior is shown in Fig. 7.2. According to Norman [23], slips are errors caused by lack of attention, whereas mistakes are errors arising from conscious deliberation. Descriptions of these errors are summarized in Table 7.2. Reason [17] drew a major distinction between errors and violations. Errors are unintended deviations from expected acts and objectives, and wrong actions stemming from lack of knowledge; violations represent a more deliberate noncompliance motive, such as deliberately failing to follow procedures. Rasmussen's skill-rule-knowledge model is shown in Table 7.3.
Table 7.3 Human errors based on Rasmussen's skill-rule-knowledge model.
Behavior | Error type | Cognitive stage | Failure modes
Knowledge | Mistakes | Planning | Failure of expertise, lack of expertise
Rule | Lapses | Storage | Failure of good rule application or knowledge
Skill | Slips | Execution | Failure of skill by inattention or over-attention
According to Endsley (2000) [61], violations occur where there is inadequate awareness of the situation. Violations fall into three main groups: "routine violations," which involve cutting corners whenever such opportunities arise; "optimizing violations," which serve more personal purposes rather than work-related goals; and "situational violations," which appear to offer the only way to get the job done when the rules and procedures are deemed inappropriate for the current situation. Shorrock [24] describes an error as a psychological or physical activity of individuals that fails to fulfill its purpose, whereas violations are a deliberate disregard of rules and regulations.
7.3 Human error in nuclear power plants
In NPPs, HEs are categorized into three groups: preinitiators, initiators, and postinitiators. Preinitiators, also called latent errors, result in latent conditions in the system that may cause an accident; they are present within the system in an unnoticed condition before the onset of an event. Examples arise during assembly, maintenance, or configuration management, such as a component left in a wrong state or incorrectly assembled. Worldwide, about 70–90% of significant NPP incidents and accidents are reported to be caused by HE. Studies by the US Nuclear Regulatory Commission (USNRC) and the International Atomic Energy Agency (IAEA) state that 50% of incidents are caused by HE [25]. HE in an NPP is treated as observed behavior that has been assessed against certain performance standards, triggered by an event in which it was necessary to act in a way considered appropriate. Through event assessment, HE can be considered the leading contributor to the risk of NPP accidents [26]. Human beings make mistakes for various reasons. Some causes are internal to the person, for example, an operator who is insufficiently alert because of inadequate sleep the previous night; others are external, for example, device controls that cannot be reached quickly or operated efficiently. The best way to observe, recognize, measure, and quantify the human contribution to safety in NPPs is to use the methods of human reliability analysis (HRA). In order to reduce the risk of failure, it is necessary to evaluate any HE that occurs in an operating facility, evaluate its root cause, and introduce appropriate developmental and functional improvements. The different types of HEs in an NPP, as given in Sharma [27], are shown in Fig. 7.3.

Figure 7.3 Human errors in NPPs.

HEs in NPPs fall under two broad categories: external error modes and internal error modes. Internal factors such as stress, memory, and attention influence operator output indirectly, whereas external factors influence it directly; an internal error does not affect the work environment explicitly. Research has been carried out based on reports and documents published by regulatory and advisory international institutions of the nuclear sector, such as the IAEA, the USNRC, and the Institute of Nuclear Power Operations (INPO). A study by INPO showed that the underlying causes of accidents in the nuclear industry covered human factors such as cognitive (43%), psychological (18%), physiological (16%), mental (10%), and other (2%) factors. At least 77% of the underlying causes of accidents were attributable to people; only a small proportion of the underlying causes were actually initiated by
front-line personnel (i.e., failure to follow procedures); most originated either in operation-related activities or in poor decisions taken within the organizational and managerial domains. In a study on human and organizational factors associated with Korean nuclear safety, human factors such as cognitive failures, deficiencies in scheduling and planning, miscommunication of directions, and training-related and psychological factors were identified as causes in the analysis of unintended reactor trip events.
7.4 Human reliability models
HRA refers to the study of HE and focuses on estimating the human error probability (HEP). HRA plays an important role in the reliability analysis of man–machine systems. HRA methodology began around 1960, but most human factor analysis strategies have been established since the mid-1980s for analyzing HEs that contribute to system failure and the associated risk. HRA is an analytical tool with both qualitative and quantitative components, and the nuclear industry was the first to establish and implement HRA for itself. Qualitative HRA focuses on the identification and modeling of human failure events (HFEs) and employs some form of task analysis to identify potential HEs. Quantitative HRA focuses on producing HEPs. Qualitative and quantitative HRA are complementary to each other, and the qualitative analysis supports detailed quantification. The quantitative aspect of HRA involves calculating the HEP of possible errors, often as a function of the time available, and thereby further explores the human factor in the workplace. The findings of HRA contribute to probabilistic safety assessment (PSA), which evaluates the performance of whole systems by decomposing them into their components, including hardware, software, and human operators. HEP is a measure of the likelihood that an individual (plant personnel) will fail to initiate the correct, required action or response in a given situation, or by commission will perform the wrong action. The HEP is the probability of the HFE, where an HFE is a basic event that represents a failure or unavailability of a component, system, or function caused by inappropriate human action. In PSA, HEs in the operation of such plants typically represent a significant share of the potential public risk. It is therefore important that PSA studies provide HRAs that adequately represent the potential impacts of human activities in both normal and emergency situations. Many HRA models are available to estimate HEP.
Table 7.4 Human reliability analysis models used in NPPs.
Model name | Acronym | Authors
Technique for human error rate prediction | THERP | [19]
Maintenance personnel performance simulation | MAPPS | Siegel et al., 1984 [49]
Paired comparisons | PC | Comer et al., 1984 [50]
Systematic human action reliability procedure | SHARP | [47]
Human cognitive reliability | HCR | [29]
Human error assessment and reduction technique | HEART | Williams, 1985 [51]
Success likelihood index methodology-multiattribute utility decomposition | SLIM-MAUD | Embrey, 1986 [52]
Accident sequence evaluation program | ASEP | Swain, 1987 [53]
Cognitive environment simulation | CES | Woods et al., 1987 [54]
Absolute probability judgment | APJ | Kirwan, 1994 [55]
Cognitive reliability and error analysis method | CREAM | Hollnagel, 1998 [56]
A technique for human error analysis | ATHEANA | NUREG-1624, 2000 [57]
Nuclear action reliability assessment | NARA | Kirwan et al., 2005 [58]
Standardized plant analysis risk-human reliability assessment | SPAR-H | Gertman et al., 2005 [59]
Systematic human error reduction and prediction approach | SHERPA | Embrey, 1986 [60]
Numerous research works have been carried out to identify context-specific core HRA models for the most safety-critical applications. For example, in the nuclear industry, Oliveira et al. [28] analyzed more than 200 research articles describing 99 HRA techniques. The authors omitted those less likely to have functional use in NPPs, in compliance with clear guidelines, and also excluded all methods used primarily for retrospective analysis, which were only theoretically described and not validated in practice. Of these, 15 key HRA methods with practical use in NPPs or well established elsewhere were shortlisted and are shown in Table 7.4. HRA started as a quantitative technique following PSA principles and treated HE in the same way as technical component failure; the realistic validity of HRA estimates began to be questioned only in the 1990s. With the accumulated experience gained in risk assessment and considering the range of HRA applications, more research studies were conducted. Some of the common HRA models used in PSA are briefly explained below.
7.4.1 Technique for human error rate prediction
The technique for human error rate prediction (THERP) evolved when there was a need to quantify HE in nuclear weapon systems in order to assess overall system reliability. Quantification of human reliability was considered a means to support a number of design-related decisions that could impact human performance. The challenge, therefore, was to develop a scheme for quantifying human reliability that was sufficient to characterize the complexity surrounding human task performance. Though THERP was developed primarily with NPP operation in mind, the technique is applicable to a wide variety of industrial applications including process control, maintenance, and manufacturing operations. The main objective of THERP is to represent human performance in the form of an HRA event tree and use the output of the event tree in the system reliability estimation. This was expected to serve as a design trade-off tool by evaluating the HE contribution to component reliability. The various steps involved in THERP are:
• Identify HE-related events and the associated human tasks by reviewing the plant information and the functional requirements of the system.
• Perform a qualitative assessment by task analysis. This is done by decomposing the task into subtasks and by determining the boundary conditions, such as time, skill level required, alarms/signals, and recovery factors, under which the task is to be performed.
• Model the relevant human tasks in an event tree with one branch indicating correct action and the other branch indicating incorrect action.
• For each incorrect human action, assign a nominal HEP. These probabilities represent the medians of log-normal distributions, implying that if one could observe a human performing a particular activity a large number of times, the distribution of HEPs across many such actions would be log-normal.
• Attach upper and lower uncertainty bounds to each nominal HEP. The square root of the ratio of the upper to the lower bound defines the error factor. In NPP operations, large error factors are generally assumed, reflecting the variance involved in assigning nominal HEPs in addition to the variance associated with differences in performance. The sources for deciding the uncertainty bounds are data from NPP simulator experiments, operating experience in NPPs and other industrial applications, process industries, and expert judgment.
• Modify the nominal HEPs through a series of refinements that capture the complexity of the effects associated with human performance for each task. Factors such as training and experience, procedures and administrative controls, ergonomics and human-machine interaction, time available, task complexity, workload, stress, and environment are termed performance shaping factors (PSFs) because they influence human performance. PSFs can be divided into two categories: internal and external PSFs. Human attributes such as skills, abilities, and attitudes that operate within an individual are internal PSFs; they are not always easy to measure as they may vary from individual to individual. External PSFs are aspects of situations and tasks, and depend on characteristics of the components that influence performance. Based on the PSFs, the nominal HEPs are modified into basic HEPs (BHEPs).
• Identify the dependencies, or relationships, between task elements. Five levels of dependency are considered: zero dependency (ZD), low dependency (LD), medium dependency (MD), high dependency (HD), and complete dependency (CD). The dependency model is nonlinear and considers only positive dependence, whereby failure on a task element increases the failure probability, and success on a task element increases the success probability, of the subsequent task elements. Assume the BHEP for some task element A is 10^-3,
and assume HD between task elements A and B. The conditional HEP (CHEP) of A given incorrect performance on B would then be (1 + BHEP)/2 ≈ 0.50. The success and failure probabilities for the entire task can thus be computed by propagating the CHEPs associated with each branch of the event tree.
• Propagate the point estimates from each branch through the event tree to arrive at the probability of failure of the entire task; various approaches can be taken depending on the purpose of the HRA. From a reliability point of view, best-case and worst-case analyses can also be performed by propagating the uncertainty bounds.
• Model the effects of recovery factors, such as the presence of annunciations, human redundancy, etc. For example, consider two operators in a nuclear control room and a nominal HEP of 0.001 for a human action. This value is modified by the stress PSF to 0.02 for the task performed by experienced staff. The HEP is then modified by the recovery factor due to human redundancy, with HD between the two personnel. The new HEP is estimated as 0.02 × ((1 + 0.02)/2) ≈ 0.0102.
• Finally, perform a sensitivity analysis to identify the most probable errors in the event tree and determine the degree to which design modifications associated with the task will help in reducing HEs.
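To make the dependency and recovery arithmetic above concrete, the following sketch (in Python; the code and its names are illustrative, not part of the THERP handbook) encodes the standard THERP conditional-HEP expressions for the five dependence levels and reproduces the two-operator example, taking the stress-adjusted HEP of 0.02 as given.

```python
# Illustrative sketch of THERP-style dependence and recovery (not from the source).
def conditional_hep(basic_hep: float, level: str) -> float:
    """Conditional HEP of a task element given failure of the preceding element,
    using the standard THERP expressions for the five dependence levels."""
    formulas = {
        "ZD": lambda p: p,                  # zero dependence
        "LD": lambda p: (1 + 19 * p) / 20,  # low dependence
        "MD": lambda p: (1 + 6 * p) / 7,    # medium dependence
        "HD": lambda p: (1 + p) / 2,        # high dependence
        "CD": lambda p: 1.0,                # complete dependence
    }
    return formulas[level](basic_hep)

# Worked example from the text: nominal HEP 0.001 raised to 0.02 by the stress PSF,
# then credited with recovery by a second operator assumed highly dependent (HD).
hep_with_stress = 0.02
joint_failure = hep_with_stress * conditional_hep(hep_with_stress, "HD")
print(f"HEP with HD recovery: {joint_failure:.4f}")  # ~0.0102
```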
7.4.2 Accident sequence evaluation program
The accident sequence evaluation program (ASEP) is a simplified version of THERP developed for the USNRC. Like THERP, this method calculates HEPs for both pre- and postinitiators. The use of THERP poses some initial challenges, as it does not provide specific guidelines on how to handle a wider set of PSFs. ASEP provides a fixed set of PSFs and is designed to assist HRA practitioners at a reasonable cost, with minimum support and guidance from HRA experts. The HEPs assigned through the use of ASEP are intended to be more conservative than those obtained with THERP.
7.4.2.1 ASEP algorithm
Step 1: List all the critical actions required during an abnormal event and check whether each critical action is covered in the emergency operating procedure. If a critical action is not covered, then let HEP = 1; else go to Step 2.
Step 2: Estimate the maximum permissible time (Tm) to accurately diagnose and complete the human action for the abnormal event (thermal hydraulics analysis, expert judgment, or a talk-through with operators would facilitate estimating Tm).
Step 3: Identify the further actions needed to successfully cope with the abnormal event once the correct diagnosis is made.
Step 4: For the post-diagnosis actions to be performed in the control room area, calculate the time required to complete each action.
Step 5: For travel and manipulation times outside the control room area, use a walk-through to estimate the time required to get to the appropriate locations and perform the actions. Double the time if the estimate is obtained from operating personnel.
Step 6: Calculate the time needed for the post-diagnosis actions, Ta, by adding the times obtained in Steps 4 and 5.
Step 7: Estimate the time available for diagnosis, Td = Tm − Ta.
Step 8: Compute the "diagnosis HEP" using the THERP handbook [19] for the estimated Td value.
Step 9: Calculate the "action HEP" for control room actions using the THERP handbook [19]. Calculate the HEP for recovery factors based on dependency, task level, and stress level, with upper- or lower-bound adjustment for all post-diagnosis actions.
Step 10: Compute the HEP by adding the diagnosis and action HEPs.
Step 11: Calculate the task failure probability without formal dependence, Pw/od, by adding (HEP)d and (HEP)a.
Step 12: Calculate the task failure probability with formal dependence, Pw/d, by adding (HEP)d and (HEP)a based on the five-level dependence model given in the THERP handbook.
Step 13: If the study does not consider formal dependence, the actual HEP is Pw/od; otherwise it is Pw/d.
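A minimal bookkeeping sketch of Steps 6–11 follows (Python; the function names and the numeric values are assumed for illustration). The diagnosis and action HEPs themselves are handbook lookups and are therefore taken as inputs here.

```python
# Illustrative ASEP-style time bookkeeping and HEP combination (assumed helpers).
def diagnosis_time(tm_minutes: float, action_times_minutes: list) -> float:
    """Steps 6-7: time left for diagnosis after subtracting post-diagnosis action times."""
    return tm_minutes - sum(action_times_minutes)

def total_hep(hep_diagnosis: float, hep_action: float) -> float:
    """Steps 10-11: combine diagnosis and action HEPs (no formal dependence)."""
    return min(1.0, hep_diagnosis + hep_action)

td = diagnosis_time(60, [15, 20])   # 25 min left for diagnosis
# hep_d would be read from the ASEP/THERP diagnosis curve for Td = 25 min;
# the values below are placeholders for illustration only.
hep_d, hep_a = 1e-2, 5e-3
print(td, total_hep(hep_d, hep_a))
```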
7.4.3 Success likelihood index methodology
One of the major problems in HRA is the difficulty of obtaining HE data. In such situations, the success likelihood index method (SLIM) is an HRA technique widely used in PSA. The underlying principle of SLIM is the common-sense notion that the likelihood of successful human action is a function of various characteristics of the individual, the gravity of the situation, and the PSFs present. The probability that a human will
carry out a particular task successfully depends on the combined effect of a number of PSFs, and these PSFs can be identified and evaluated through expert judgment. As this approach depends heavily on expert judgment for determining the model parameters and for assigning weights to PSFs, epistemic uncertainty is inherent in SLIM. Human reliability estimation using SLIM was implemented through an interactive computer program called MAUD (multiattribute utility decomposition). MAUD is used in social decision-making applications to assist experts in making choices between available alternatives. For example, when alternative sites are available for the selection of an NPP site, relevant attributes such as seismic considerations, accessibility, and availability of a heat sink source are identified; these factors are weighted based on their relative importance, and each site is rated numerically. The products of the importance weights and the ratings for each factor are summed for each site to decide the most suitable site for construction of the NPP. A similar approach is adopted in SLIM for HRA. It attempts to evaluate the alternatives and select a human action depending on success probabilities assigned by expert judgment for each alternative action. The likelihood of success for each human action under consideration is determined by summing the products of the weights and ratings for each PSF, resulting in a success likelihood index (SLI) that represents a scale of likelihood of success analogous to the scaling of alternatives achieved in MAUD through the derivation of expected utilities. When a human action is vital during an emergency scenario, the SLIs for the candidate actions are used to determine which actions are least or most likely to succeed. For PSA, however, SLIM converts SLIs to HEPs. The various steps involved in SLIM are as follows:
- Identify the group of experts, all possible modes of HE for each action, and the most relevant set of PSFs. The selection of appropriate experts is essential because it is those experts who play the vital role in determining the SLIs. It is preferable to select experts with sufficient operational experience, with a range of expertise, and who are specialists in modeling human factors. The identification of all possible error modes must be through detailed analysis and discussions that could include task analyses and review of documentation concerning emergency operating procedures. The aim is to arrive at a minimal set of PSFs that are considered most relevant in governing human behavior for the specific tasks and scenarios within the plant in which these actions take place. This is done by consensus, beginning with a predefined set of
PSFs that the judges are encouraged to modify in light of their own experience.
- Relative importance weights for the PSFs are derived by asking each judge to assign a weight of 100 to the most important PSF, and then to assign weights to the remaining PSFs as ratios of the one assigned the value of 100. Discussion concerning these weightings is encouraged in order to arrive at consensus weights, which can minimize the variability resulting from taking the mean or other aggregation procedures. Normalized weights are then obtained by dividing each individual weight by the sum of the weights for all the PSFs.
- Rating of the PSFs is done by the experts by assigning a value to each PSF on an equal-interval scale, with the lowest scale value indicating that the PSF is as good as it is likely to be in terms of promoting successful task performance. It is desirable to provide examples on the scales to ensure that all experts have an equal understanding of what constitutes the start and end points of the scale. The products of the PSF weights and ratings form the SLI values, and the range of possible SLI values is dictated by the range of values associated with the rating scale. The SLI for task j is the sum over PSFs of the product of the normalized weight for the ith PSF (Wi) and the rating of the jth task on the ith PSF (Rij):
SLI_j = Σ_i Wi × Rij
- As with the procedure for deriving weights, the individual ratings should be discussed in order to arrive at consensus ratings.
- For each task, the SLI is computed by summing the products of the normalized weights and the ratings for each PSF. An estimate of the HEP is derived from the log-linear relationship
log P(success) = a × SLI + b,
provided that at least two tasks with known probabilities of success are available so that the constants a and b can be evaluated. For rare-event scenarios, sufficient data may not be available to calibrate the SLI values. To meet this situation, experts are required to make absolute probability judgments on the best and worst cases for the scenario, that is, with all PSFs as good as they can be in a real plant or as bad as they can be. SLI values of 100 and 0 are assigned to the best and worst cases, respectively. The SLI computed for a given task is then used to interpolate between the lower- and upper-bound probabilities in the
HEP estimation as
HEP = LB^(SLI/100) × UB^(1 − SLI/100),
where LB represents the experts' probability of failure under the best conditions and UB represents the experts' probability of failure under the worst conditions.
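A minimal numerical sketch of the SLIM procedure above (Python; the weights, ratings, and anchor tasks are hypothetical values an expert panel might assign, not values from the text).

```python
import math

# Hypothetical expert inputs for one task and three PSFs.
weights = [100, 70, 40]            # raw importance weights
ratings = [8, 5, 3]                # ratings of the task on each PSF (e.g., 1-9 scale)

norm_w = [w / sum(weights) for w in weights]        # normalized weights
sli = sum(w * r for w, r in zip(norm_w, ratings))   # success likelihood index

# Calibration of log10 P(success) = a*SLI + b using two anchor tasks with known
# success probabilities (assumed values).
sli1, p1 = 2.0, 0.90
sli2, p2 = 8.0, 0.999
a = (math.log10(p2) - math.log10(p1)) / (sli2 - sli1)
b = math.log10(p1) - a * sli1

hep = 1 - 10 ** (a * sli + b)
print(f"SLI = {sli:.2f}, HEP = {hep:.2e}")
```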
7.4.4 Human cognitive reliability model
The human cognitive reliability (HCR) model was developed by the Electric Power Research Institute and estimates nonresponse probabilities as diagnosis HEPs for postinitiators. It uses time response curves [47], which are based on simulator data from the main control room of a full-scale NPP. The objective of HCR is to quantify the likelihood that the crew cannot respond correctly to an anomalous signal in the time given. The likelihood of failure for each type of behavior depends on the ratio t/T0.5, where t is the time available for the operator to respond and T0.5 is the median time the operator takes to execute the action. The HCR model thus provides time-dependent estimates of the nonresponse probability for a given time window t. Time reliability correlations are used, as the time available for action is the most important variable; the cognitive level of the human action, viz. skill, rule, or knowledge, also needs to be taken into account. A major assumption of this model is that the normalized response times for all tasks of a given type (skill, rule, or knowledge based) follow a single (Weibull) distribution. It is also reported that beyond a certain critical time, nonresponse probabilities depend only very weakly on the amount of time available to act. The nonresponse probability is calculated using the Weibull function as
P(t) = exp{ −[ ((t/T0.5) − B) / A ]^C },
where t is the available time to diagnose and execute the task, T0.5 is the median time necessary to choose and execute the appropriate action, and A, B, and C are coefficients representing the type of operator action and behavior (skill, rule, or knowledge based). The median time is adjusted for performance shaping factors as T0.5 = T'0.5 × (1 + K1) × (1 + K2) × (1 + K3), where T'0.5 is the median time under nominal conditions and K1, K2, and K3 are the coefficients for operator experience, stress level, and quality of the human–machine interface, respectively.
The values of the parameters A, B, and C and the coefficients of operator behavior K1, K2, and K3, as derived using simulator data [46], are given in Tables 7.5 and 7.6.

Table 7.5 Coefficients for performance shaping factors.
Performance shaping factor | Coefficient
Operator experience (K1):
    Expert, well trained | −0.22
    Average, knowledge training | 0.00
    Novice, minimum training | 0.44
Stress level (K2):
    Situation of grave emergency | 0.44
    Situation of potential emergency | 0.28
    Active, no emergency | 0.00
    Low activity, low vigilance | 0.28
Quality of operator/plant interface (K3):
    Excellent | −0.22
    Good | 0.00
    Fair | 0.44
    Poor | 0.78
    Extremely poor | 0.92
Table 7.6 Cognitive correlation coefficients.
Cognitive processing level | Ai | Bi | Ci
Skill | 0.407 | 0.7 | 1.2
Rule | 0.601 | 0.6 | 0.9
Knowledge | 0.791 | 0.5 | 0.8
7.4.4.1 HCR algorithm
The stepwise procedure of the HCR model is presented below:
Step 1: Consider the situation that needs an HRA. Identify whether the human action is skill based (SB), rule based (RB), or knowledge based (KB).
Step 2: Estimate the time available (t) to complete the action.
Step 3: Obtain the coefficients of the PSFs, viz. K1, K2, and K3, from Table 7.5 for the PSF levels applicable to the human action (K1, operator experience; K2, stress level; K3, quality of the operator/plant interface).
Step 4: Obtain the correlation coefficients Ai, Bi, and Ci for the human action from Table 7.6 for the selected SB, RB, or KB behavior.
Step 5: Estimate the median response time T0.5 = T'0.5 × (1 + K1) × (1 + K2) × (1 + K3).
Step 6: Compute the HEP using the nonresponse probability equation
P(t) = exp{ −[ ((t/T0.5) − Bi) / Ai ]^Ci },
where P(t) is the HEP for a given system time window, t is the time available for the crew to complete the given action, T0.5 is the estimated median time taken by the crew to complete the action, and Ai, Bi, and Ci are the correlation coefficients for the cognitive processing level (SB, RB, or KB).
HCR is a simple, easy, and quick technique suitable for diagnostic tasks that involve a time constraint. In the absence of a time constraint, HCR is not recommended because of the high variability in its results. While HCR calculates the nonresponse probability, it does not consider misdiagnosis or violations. HEP estimation using HCR depends on the estimates of the cognitive parameters and PSF coefficients, and the method does not provide a systematic way to identify aspects of human performance for diagnosis or action.
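A small numerical sketch of Steps 1–6 (Python) using the Table 7.5 and 7.6 coefficients; the scenario values (available time, nominal median time, selected PSF levels) are assumptions chosen only to illustrate the calculation.

```python
import math

# Table 7.6 correlation coefficients (A, B, C) per cognitive processing level.
HCR_COEFFS = {"skill": (0.407, 0.7, 1.2),
              "rule": (0.601, 0.6, 0.9),
              "knowledge": (0.791, 0.5, 0.8)}

def hcr_nonresponse(t, t50_nominal, k1, k2, k3, level):
    """Nonresponse probability for available time t (same units as t50_nominal)."""
    a, b, c = HCR_COEFFS[level]
    t50 = t50_nominal * (1 + k1) * (1 + k2) * (1 + k3)  # Step 5: adjusted median time
    x = (t / t50 - b) / a
    if x <= 0:
        return 1.0          # insufficient time: treat as guaranteed nonresponse
    return math.exp(-x ** c)

# Assumed example: rule-based action, 20 min available, nominal median time 5 min,
# expert crew (K1 = -0.22), potential emergency (K2 = 0.28), good interface (K3 = 0.00).
print(hcr_nonresponse(t=20, t50_nominal=5, k1=-0.22, k2=0.28, k3=0.00, level="rule"))
```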
7.4.5 Standardized plant analysis risk-HRA model
Standardized plant analysis risk-HRA (SPAR-H) is a quantification technique addressing both pre- and postinitiators and was developed for the USNRC. As an easy-to-use method, SPAR-H has been widely used by both industry and regulators in its intended area of use, as well as in other industries. In addition, SPAR-H assumes a beta distribution, which can mimic a log-normal distribution, whereas most other HRA methods use error factors with log-normal distributions to address uncertainty. SPAR-H uses the same approach for calculating both diagnosis and execution HEPs: it assumes fixed nominal HEPs for diagnosis (1.0E−02) and execution (1.0E−03) and then multiplies them by the PSF multipliers associated with the assigned PSF levels.
7.4.5.1 SPAR-H algorithm
Step 1: Identify the initiating event to be analyzed and segregate the human failure events (HFEs) into diagnosis HFEs and action HFEs.
Step 2: Assign a nominal human error probability (NHEP) of 0.01 for diagnosis HFEs and 0.001 for action HFEs.
Step 3: Obtain the PSF value for each of the eight PSFs from the SPAR-H framework reported in the SPAR-H handbook.
Step 4: Compute the composite PSF as the product of the eight PSF multipliers, PSF_composite = PSF_1 × PSF_2 × ... × PSF_8.
Step 5: Estimate the diagnosis HEP, (HEP)d = NHEP × PSF_composite.
Step 6: Check whether the number of negative multipliers (an assigned PSF value greater than one is referred to as a negative multiplier) is three or more. If yes, go to Step 7; else go to Step 8.
Step 7: Recompute (HEP)d = (NHEP × PSF_composite) / [NHEP × (PSF_composite − 1) + 1].
Step 8: Compute the action HEP, (HEP)a, by following the same procedure as in Steps 3-7 (Tables 7.2 and 7.3 in Gertman et al., 2005 [59]).
Step 9: Calculate the task failure probability without formal dependence, Pw/od, by adding (HEP)d and (HEP)a.
Step 10: Calculate the task failure probability with formal dependence, Pw/d, by adding (HEP)d and (HEP)a based on the five-level dependence model given in the SPAR-H handbook.
Step 11: If the study does not consider formal dependence, the actual HEP is Pw/od; otherwise it is Pw/d.
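A minimal sketch of Steps 4–9 (Python); the PSF multipliers and the example HFE below are illustrative values, not entries from the SPAR-H handbook tables.

```python
# Illustrative SPAR-H-style HEP calculation (assumed example values).
def spar_h_hep(nominal_hep, psf_multipliers):
    """Apply the composite PSF; if three or more PSFs are negative (multiplier > 1),
    apply the adjustment factor of Step 7."""
    composite = 1.0
    for m in psf_multipliers:
        composite *= m
    negative = sum(1 for m in psf_multipliers if m > 1)
    if negative >= 3:
        hep = (nominal_hep * composite) / (nominal_hep * (composite - 1) + 1)
    else:
        hep = nominal_hep * composite
    return min(hep, 1.0)

diagnosis = spar_h_hep(1e-2, [10, 2, 1, 5, 1, 1, 1, 1])  # diagnosis part of an HFE
action = spar_h_hep(1e-3, [1] * 8)                       # action part, all PSFs nominal
print(diagnosis + action)   # Step 9: total HEP without formal dependence
```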
7.5 HRA generations
HRA techniques can be classified into first, second, and third generations. The first generation of HRA aims to evaluate the HEP and mainly comprises both preprocessed tools and expert judgment approaches; SLIM, for example, attempts to decompose the PSFs based on expert judgment.
7.5.1 First-generation HRA models All the tasks are broken down and analyzed for possible errors considering factors such as time, pressure, and equipment design. Many of these approaches such as, THERP, ASEP, and HCR—the basic assumption is that because humans have natural deficiencies, humans logically fail to perform tasks, just as do mechanical or electrical component. Other models are human error assessment and reduction technique (HEART) and SLIM are available to estimate HEP. In first-generation models, all HEPs are combined to get the nominal value of probability associated with tasks. The focus is on the mental dimensions of operator efficiency, known as the skill-rule-knowledge (SKR) model [30] and does not consider other elements such as organization factors and errors of commission. Among
the first-generation models, the most popular and effective method is THERP, characterized, like other first-generation methods, by a detailed mathematical treatment of probabilities and error rates as well as well-structured procedures that use event trees and fault trees to quantify the HE contribution to a fault event. Notwithstanding criticisms of and inefficiencies in a few first-generation methods such as THERP, ASEP, and HCR, their functionality and strongly quantitative character keep them widely used in many industries.
7.5.2 Second-generation HRA models While the HRA methods of the first generation are primarily relational approaches, the HRA methods of the second generation are theoretical in nature. The current HRA models are not adequate to resolve the mental foundations of potential accident failure teams. The second generation of HRA approaches investigates more aspects of human behavior. In general, these approaches analyze the causes and likelihood of HE, including evaluation, assessment, selection, and intervention, at various stages of mental activity. The HRA models of the second generation emphasize the integrity of the human interaction and machines, taking into consideration the environmental and psychological impact. Furthermore, the implications of teamwork are often taken into account. Such models include many aspects of the human–machine–environment derived from systems engineering. The second-generation approaches may also include: ATHEANA, HEART, and CREAM. Cognitive models were developed that depict the operator’s logical-rational process and summarize the reliance on personal factors (such as pressure, incompetence, etc.) and on the present situation (normal conducting, unusual circumstances, or even emergencies) and the models of the interface between men and machines, representing the monitoring system of the productive process. Any attempted understanding of human performance should explicitly include the function of human knowledge, described by an operator as “the act or process of information including both awareness and judgment.” From the HRA practitioners viewpoint, a new category of error was introduced as the immediate option to take human cognition into account in HRA methods: “cognitive error,” described as both an activity failure that is primarily cognitive in nature and as the underlying cause of an operation that fails. HRA approaches of the second generation have attempted to qualitative analyze the operator’s actions and look for models explaining the relationship
with the operating environment. The operator model in CREAM is more critical and less simplistic than in first-generation methods. The cognitive model of CES is based on the premise that human behavior is driven by two basic principles: the cyclical nature of human cognition, and the dependence of cognitive processes on context and the working environment. This model takes into account the cognitive functions (perception, interpretation, planning, and action), their connecting mechanisms, and the cognitive processes that regulate their evolution. SPAR-H is based on a specific human performance analysis framework developed from the behavioral sciences literature. In order to reduce the reliance on subjective judgment and attain a more precise estimation, Park and Lee (2008) [62] suggested a simple and efficient method, AHP-SLIM. This approach combines the analytic hierarchy process (AHP), a multicriteria decision-making methodology for complicated problems in which both qualitative and quantitative dimensions are considered so as to produce unbiased and practical outcomes, with SLIM, a simple and robust expert assessment approach for estimating HEPs. The AHP component also makes it possible to measure the subjective judgments and to validate the accuracy of the data obtained in the HEP calculation.
7.5.3 Third-generation HRA models
The shortcomings and drawbacks of the first- and second-generation HRA approaches have led to further advances that build on the pre-existing methodologies. The only approach actually referred to as third generation is the nuclear action reliability assessment (NARA), which is in reality an improved version of HEART for the nuclear sector. The shortcomings of the first and second generations, highlighted above, have been the starting point for HRA experts in new research and in the improvement of existing methods. Currently, there are some databases for HRA analysts that contain HE data with cited sources to improve the validity and reproducibility of HRA results. Examples of such databases are the human event repository and analysis (HERA) database (NUREG/CR-6903, 2006) [63] and the human factors information system (HFIS). However, they gather information about the factors that influence overall human performance throughout an entire risk-significant scenario rather than single HEs within the scenario. The information provided by such data sources is too high level to be used to develop a model of performance influencing factors (PIFs) affecting a single HE. Recent developments in the modeling of decision making emphasize the dual influences of cognition and emotion on decision outcomes. The
integration of emotion and cognition models of decision making has improved the ability of such models to understand and predict behavior. Furthermore, such an integrated approach is highly relevant to the risk-related decision making typically found within safety-critical industries.
7.5.4 Cognitive architecture models
Cognitive architectures represent mental processes with laboratory-based, scientifically derived variables that set visual, auditory, and motor-processing cognitive limitations, as well as visual, auditory, short-term (or working), and long-term memory storage capacities. The following are some of the cognitive architecture models.
• The SOAR model proposed by Newell [31] is used to develop generalized intelligent agents for a range of tasks; these agents also function as building blocks to emulate human cognitive capacity. It has been extensively applied by artificial intelligence (AI) researchers to construct intelligent agents and cognitive models of various aspects of human behavior.
• DUAL is a hybrid cognitive architecture in which symbolism and connectionism are treated as two facets of human cognition (Kokinov, 1994) [64]. The former refers to historical awareness and the latter to current relevance, and the symbolic and connectionist components are independent units. This architecture serves as a paradigm for human cognition and natural language comprehension, where the role of a constantly changing environment is well understood.
• The connectionist learning with adaptive rule induction on-line (CLARION) cognitive architecture focuses on psychologically grounded, data-based abstract simulation processes and metacognitive processes [32]. It consists of two levels: the top level uses abstract, symbolic representations (encoding explicit knowledge), while the bottom level uses unconscious knowledge (encoding implicit knowledge).
• Adaptive control of thought-rational (ACT-R) attempts to study how the brain organizes itself into distinct processing modules and to reduce cognitive functions to the most basic operations that enable cognition [5]. The theory presents a set of fixed mechanisms that use task expertise to carry out a task, thus predicting and demonstrating the cognitive processes that establish human behavior.
• The learning intelligent distribution agent (LIDA) architecture proposed by Ramamurthy et al. (2006) [65] is a built-in artificial cognitive system that aims to span a broad spectrum of cognition in biological systems, from low-level perception/action to high-level reasoning.
Cognitive architectures can serve as a good basis for developing brain-inspired, psychologically plausible cognitive agents for a variety of applications that need or prefer human-like actions and performance. Simulations of the behavior of operators in complex power plants were first attempted with optimal control models and cognitive environment simulation studies, and the psychological basis of HRA for nuclear operators and the theoretical grounds of human reliability were studied. Many HRA models consider HE from a behavioral point of view; these models use only an abstract definition of human activity (human actions or human cognition) to understand human cognitive error. The well-known Rasmussen skill-rule-knowledge paradigm was developed based on the psychological nature of human behavior, and this concept was used to give a cognitive flavor to a few HRA models, viz. the HCR model. Techniques such as expert systems were then recommended to improve the existing models. Algorithms within a computational-analysis framework, dealing with the development of models based on the psychology of cognitive behavior, were then developed; the cognitive environment simulation and the cognitive simulation model are a few examples. Though these models were able to reproduce the behavior of an operator, they failed to capture the mental representation, or cognitive architecture, that the operator uses to execute tasks. Therefore, a cognitive architecture has to be designed that accounts for the biases of cognition and represents the true mental architecture of operators. In complex man-machine systems, reliability is not addressed at a deep cognitive level and human performance has not been given proper treatment. The theoretical basis must be derived from psychology; an improved HRA methodology should formally incorporate cognitive and psychological theories as the basis of its human performance model. This also calls for providing more training to operators to minimize errors and improve operator performance. Research has, in the past, been dominated by studies of the psychological behavior of operators interacting with systems. It is necessary to address the issues associated with developing models for nuclear safety systems based on human cognitive architecture design. Although recent studies indicate a better understanding of the nature of human perception, cognitive architecture analysis remains largely perception-centered. Cognitive factors such as memory and perception are primarily used in architectures that are physically embodied, while faculties such as vision and attention remain largely unexplored. Hybrid and evolving architectures with artificial and actual sensors incorporate a broader variety
of sensory modalities. However, it appears that studies on human cognitive architecture design in nuclear safety systems still need to be taken up.
7.6 Human cognitive architecture
Human cognitive models in nuclear applications mainly focus on the operator's psychological and thinking processes before a certain action is executed, that is, the mechanisms and laws governing the execution of certain operator behaviors. This section discusses the challenges of designing such a cognitive framework, considering the key characteristics an architecture needs in order to function as a tool in HRA. A few challenges in developing a cognitive architecture model are presented below:
• The cognitive architectures described earlier address macrocognition: detection and perception, understanding and sense-making, knowledge representation, long-term memory, short-term (working) memory, reasoning, decision-making, action, and teamwork aspects. Traditional models of human cognition are based on qualitative analysis and/or empirical rules, but providing accurate digital representations of the process of cognition is difficult.
• Cognitive architectures such as ACT-R, DUAL, and CLARION were designed to evaluate human performance without any intention of serving as HRA tools. These architectures embody an understanding of microcognition and of the cognitive mechanisms of HE, which is key in HRA, and they are comprehensive in their treatment of various cognitive function failures. However, these architecture models lack the error mechanisms (not available in HRA) that are necessary for man-machine interface assessment. Integrating these models with HRA in order to develop a new human cognitive architecture, one that includes the study of human behavior for a better human-machine interface and the mechanism of HE generation, is a challenge; this integration is not easy, but such a theoretical breakthrough in HRA is necessary.
• Connectionist and symbolist architectures consist of neural nets (networks of linked neurons) that represent the biological processes of the human brain, and artificial intelligence is used to assess human behavior. However, implementing artificial intelligence for NPP operators working under dynamic conditions is very difficult. A robust theoretical model is required to characterize human cognitive behavior in such applications and to assess the cognitive processes. It is necessary to incorporate this theoretical model into human cognition
in order to design a suitable cognitive architecture that represents human cognitive behavior.
• Designing a cognitive architecture model for control room operators in an NPP is a challenge because of the varied responses of operators in different situations. Simulations for predicting typical measures used in psychological experiments, such as latency (time to perform a task) and accuracy (correct versus false responses), are necessary. For example, normal operating conditions would require low levels of cognition, and emergencies high levels of cognition. The design of a human cognitive architecture for operator task execution is a potential area of research in human cognition.
In spite of the developments in HRA models, the following deficiencies still exist in HRA methods:
• Lack of empirical data for model development and validation.
• Lack of inclusion of human cognition (i.e., need for better human behavior modeling).
• Heavy reliance on expert judgment in selecting PSFs and in using these PSFs to obtain the HEP.
• Many second-generation models still lack sufficient theoretical or experimental bases for their key ingredients.
• First- and second-generation models lack a fully implemented representation of the underlying causal cognitive mechanisms, of measurable links between PSFs and their interdependence, and of other characteristics of the operator and their context; in particular, the measurement of cognitive and physical abilities remains very subjective.
• The majority of the proposed approaches still rely on implicit functions relating PSFs to probabilities, without proper uncertainty treatment.
7.7 HRA in the context of multiunit PSA
Operators and other personnel face additional challenges during a multiunit initiating event. The challenges may arise from the increased complexity of managing the available resources, prioritizing the deployment of shared equipment, and coping with multiple events caused by a common hazard. It is important to assess and quantify the HEPs for multiunit events, including the dependence of human actions across multiple units. Multiunit considerations for HRA must include shared human resources between units, the impact on accessibility of a unit in a degraded condition, increased stress levels due to multiunit accident conditions, a shared control room, etc. The modeling of
HRA in multiunit PSA is identical to that of single-unit PSA in the identification of HFEs, the qualitative and quantitative assessment, and the screening of human actions based on timing, signals, plant procedures, etc. As observed at Fukushima, one of the important factors in MUPSA is that a radioactive release after a severe accident in one unit can impact the operator actions at the other units. Therefore, the analysis must cover the potential impact of a radioactive release on the habitability of the control room. Another important factor in HRA of a multiunit site is the consideration of interunit dependencies, especially for shared resources. Since the methodology for MUPSA is still evolving, a list of some additional aspects to be included in the HRA is given below; a simple illustration of treating the interunit dependence of human actions follows the list.
• Organizational factors in performing mitigative actions
• Communication among the field personnel to operate the SSCs
• Identification of the correct equipment/system of the respective unit in shared connections
• Accessibility during extreme environmental conditions
• Available time for action
• Availability of power sources during extreme accident conditions
• Sequence of operations to perform, for example, cross-ties of shared systems
• Diagnosis/misdiagnosis of the operating state of the reactors
• Planning of emergency control actions
• Use of mobile equipment, fire tenders, communication, etc.
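As a hedged illustration only (not a method prescribed in this chapter), the sketch below applies a THERP-style dependence adjustment, as described in Section 7.4.1, to the same operator action credited in two units during a common-hazard event; the chosen dependence level and the single-unit HEP are assumptions.

```python
# Illustrative interunit dependence for the same crew action credited in two units.
# Reuses the THERP conditional-HEP expressions; high dependence (HD) is assumed here
# to reflect a shared control room and shared procedures during a common-hazard event.
def conditional_hep(p, level):
    return {"ZD": p, "LD": (1 + 19 * p) / 20, "MD": (1 + 6 * p) / 7,
            "HD": (1 + p) / 2, "CD": 1.0}[level]

hep_unit1 = 5e-2                                   # assumed single-unit HEP
hep_unit2_given_1 = conditional_hep(hep_unit1, "HD")
joint_failure = hep_unit1 * hep_unit2_given_1      # the action fails in both units
print(joint_failure)   # ~2.6e-2, versus 2.5e-3 if the units were treated as independent
```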
Further readings
[1] A. Al-Dujaili, K. Subramanian, S. Suresh, HumanCog: a cognitive architecture for solving optimization problems, in: IEEE Congress on Evolutionary Computation, CEC 2015, 2015, pp. 3220–3227.
[2] M.A.B. Alvarenga, P.F. Frutuoso, E. Melo, R.A. Fonseca, A critical review of methods and models for evaluating organizational factors in human reliability analysis, Prog. Nucl. Energy 75 (2014) 25–41.
[3] M.A.B. Alvarenga, E. Frutuoso, P.F. Melo, Including severe accidents in the design basis of nuclear power plants: an organizational factors perspective after the Fukushima accident, Ann. Nucl. Energy 79 (2015) 68–77.
[4] M.D. Ambroggi, P. Trucco, Modelling and assessment of dependent performance shaping factors through analytic network process, Reliab. Eng. Syst. Saf. 96 (7) (2011) 849–860.
[5] J.R. Anderson, Human symbol manipulation within an integrated cognitive architecture, Cogn. Sci. 29 (3) (2005) 313–341.
[6] J.R. Anderson, D. Bothell, M.D. Byrne, S. Douglass, C. Lebiere, Y. Qin, An integrated theory of the mind, Psychol. Rev. 111 (4) (2004) 1036–1060.
[7] M. Aoki, G. Rothwell, A comparative institutional analysis of the Fukushima nuclear disaster: lessons and policy implications, Energy Policy 53 (2013) 240–247.
[8] G.E. Apostolakis, V.M. Bier, A. Mosleh, A critique of recent models for human error rate assessment, Reliab. Eng. Syst. Saf. 22 (1–4) (1988) 201–217. [9] M. Baba, Fukushima accident: what happened? Radiat. Meas. 55 (2013) 17–21. [10] Babbie, The Practice of Social Research, 13th ed., Wadsworth, Belmont, CA, 2012. [11] K. Komura, M. Yamamoto, T. Muroyama, Y. Murata, T. Nakanishi, M. Hoshi, T. Matsuzawa, The JCO criticality accident at Tokai-mura, Japan: an overview of the sampling campaign and preliminary results, J. Environ. Radioact. 50 (1–2) (2000) 3–14. [12] P. Le Bot, Human reliability data, human error and accident models-illustration through the Three Mile Island accident analysis, Reliab. Eng. Syst. Saf. 83 (2) (2004) 153–167. [13] V. Saenko, V. Ivanov, A. Tsyb, T. Bogdanova, M. Tronko, Y.U. Demidchik, S. Yamashita, The Chernobyl accident and its consequences, Clin. Oncol. 23 (4) (2011) 234–243. [14] W. Jung, W. Yoon, J. Kim, Structured information analysis for human reliability analysis of emergency tasks in nuclear power plants, Reliab. Eng. Syst. Saf. 71 (1) (2001) 21–32. [15] KINS 2015, Nuclear Safety Information Center. http://nsic.kins.re.kr/nsic/index.jsp. [16] S. Shen, C. Smidts, A. Mosleh, A methodology for collection and analysis of human error data based on a cognitive model: IDA, Nucl. Eng. Des. 172 (172) (1997) 157–186. [17] J. Reason, Human Error, Cambridge University Press, Cambridge, 1990. [18] J. Reason, Modelling the basic error tendencies of human operators, Reliab. Eng. Syst. Saf. 22 (1–4) (1998) 137–153. [19] A.D. Swain, H.E. Guttmann, Handbook of human reliability analysis with emphasis on nuclear power plant applications (NUREG/CR-1278), U.S. Nuclear Regulatory Commission, Washington DC, 1983. [20] A.D. Swain, Human reliability analysis: Need, status, trends & limitations, Reliab. Eng. Syst. Saf. 29 (3) (1990) 301–313. [21] J. Rasmussen, Human errors. A taxonomy for describing human malfunction in industrial installations, J. Occup. Accid. 4 (2) (1982) 311–333. [22] J. Rasmussen, Skills rules and knowledge, other distinctions in human performance models, IEEE Trans. Syst. Man Cybern. 13 (3) (1983) 257–266. [23] D. Norman, The psychology of everyday things, Basic Books, New York, 1988. [24] S.T. Shorrock, Errors of memory in air traffic control, Saf. Sci. 43 (8) (2005) 571–588. [25] USNRC, 2005, ‘Fire PRA Methodology for Nuclear Power Facilities’, EPRI-1011989 and NUREG/CR-6850, vol. 1. [26] J. Park, D. Lee, W. Jung, J. Kim, An experimental investigation on relationship between PSFs and operator performances in the digital main control room, Ann. Nucl. Energy 101 (2017) 58–68. [27] S.K. Sharma, Human reliability analysis : A compendium of methods, data and event studies for nuclear power plants (Tec. Doc. No. AERB/NPP/TD/O-2), Atomic Energy Regulatory Board, 2008. [28] L.N.D. Oliveira, I.J.A. Santos, P.V. Carvalho, A review of the evolution of human reliability analysis methods, Proceedings of International Nuclear Atlantic ConferenceINAC 2017, INAC, Brazil, 2017. [29] G.W.Hannaman,A.J.Spurgin,Y.Lukic,A model for assessing human cognitive reliability in PRA studies, Conference Record for 1985 IEEE Third Conference on Human Factors and Nuclear safety 1985, IEEE Service Center, US, 1985. [30] L. Zhang, X. He, L.C. Dai, X.R. Huang, The simulator experimental study on the operator reliability of Qinshan nuclear power plant, Reliab. Eng. Syst. Saf. 92 (2) (2007) 252–259 2007. [31] A. 
Newell, Unified theories of cognition and the role of SOAR, SOAR: A cognitive architecture in perspective. Studies in cognitive systems, 10, Springer, Dordrecht, 1992, pp. 25–79. [32] R.Sun,Cognition and multi-agent interaction,Cambridge University Press,2003,pp.9– 10.
[33] S. Franklin, T. Madl, S. D’mello, J. Snaider, LIDA: a systems-level architecture for cognition, emotion, and learning, IEEE Trans. Autom. Ment. Dev. 6 (1) (2013) 19–41. [34] M.J. Akhtar, I.B. Utne, Human fatigue’s effect on the risk of maritime groundings - a Bayesian network modeling approach, Saf. Sci. 62 (2014) 427–440. [35] R. Barati, S. Setayeshi, On the operator action analysis to reduce operational risk in research reactors, Process Saf. Environ. Prot. 92 (6) (2014) 789–795. [36] T. Bedford, C. Bayley, M. Revie, Screening, sensitivity, and uncertainty for the CREAM method of human reliability analysis, Reliab. Eng. Syst. Saf. 115 (2013) 100–110. [37] P. Bhavsar, B. Srinivasan, R. Srinivasan, Pupillometry based real-time monitoring of operator’s cognitive workload to prevent human error during abnormal situations, Ind. Eng. Chem. Res. 55 (12) (2016) 3372–3382. [38] H.S. Blackman, D.I. Gertman, R.L. Boring, Human error quantification using performance shaping factors in the SPAR-H method, in: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 52, 2012, pp. 1733–1737. [39] R.L. Boring, D.I. Gertman, Human reliability analysis for computerized procedures, part two: applicability of current methods, in: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 56, 2012, pp. 2026–2030. [40] R.L.Boring,H.S.Blackman,The origins of the SPAR-H method’s performance shaping factor multipliers, in: IEEE Conference on Human Factors and Power Plants, 2007, pp. 177–184. [41] R.L. Boring, J. Joe, Task decomposition in human reliability analysis, in: Proceedings of Probabilistic Safety Assessment, Idaho National Lab. (INL), Idaho Falls, ID (United States), 2014. [42] P.C. Cacciabue, Cognitive modelling: a fundamental issue for human reliability assessment methodology? Reliab. Eng. Syst. Saf. 38 (1–2) (1992) 91–97. [43] P.C. Cacciabue, Modelling and simulation of human behaviour for safety analysis and control of complex systems, Pergamon Saf. Sci. 28 (2) (1998) 97–110. [44] P.Cacciabue,Human factors impact on risk analysis of complex systems,J.Hazard.Mater. 71 (1–3) (2000) 101–116. [45] F. Castiglia, M. Giardina, E. Tomarchio, THERP and HEART integrated methodology for human error assessment, Radiat. Phys. Chem. 116 (1) (2015) 262–266. [46] IAEA TECDOC 592, Case study on the use od PSA methods: HRA, 1991. [47] G.W. Hannaman, A.J. Spurgin & Y.D. Lukic, Human cognitive reliability model for PRA analysis. Draft Report NUS-4531, EPRI Project RP2170-3. 1984, Electric Power and Research Institute: Palo Alto, CA. [48] International Atomic Energy Agency, Human reliability analysis for nuclear installations, IAEA safety report series, Vienna, Austria (in preparation). [49] A.I. Siegel, et al., The Maintenance Personnel Performance Simulation (MAPPS) Model, Proc. Hum. Factors Ergon. Soc. Annu. Meeting 28 (3) (1984) 247–251. [50] M.K. Comer, et al., Generating human reliability estimates using expert judgement, NUREG/CR-3688, USNRC, Washington DC, 1984. [51] J. Williams, Heart—A proposed method for achieving high reliability in process operation by means of human factors engineering technology, Saf. Reliab. 35 (2015) 5–25, doi:10.1080/09617353.2015.11691046. [52] D.E. Embrey, SLIM-MAUD: A computer-based technique for human reliability assessment, Int. J. Qual. Reliab. Manage. 3 (1) (1986) 5–12. https://doi.org/10.1108/ eb002855. [53] A.D.Swain,Accident sequence evaluation program:Human reliability analysis procedure (NUREG/CR–4772), United States (1987). [54] D. Woods, E. 
Hollnagel, Mapping cognitive demands in complex problemsolving worlds, Int. J. Man-Mach. Stud. 26 (1987) 257–275, doi:10.1016/S00207373(87)80095-0.
[55] B. Kirwan, A guide to practical human reliability assessment, Taylor & Francis, London, 1994. [56] E. Hollnagel, Cognitive Reliability and Error Analysis Method (CREAM), Elsevier, 1998. [57] Technical Basis and Implementation Guidelines for a Technique for Human Event Analysis (ATHEANA), Rev.01, NUREG-1624, US NRC. (2000). [58] B. Kirwan, et al.,Nuclear Action Reliability Assessment (NARA): A data-based HRA tool,Saf.Reliab.25(2) (2005) 38–45.https://doi.org/10.1080/09617353.2005.11690803. [59] D. Gertman, H. Blackman, J. Marble, J. Byers, C. Smith, others, 2005. The SPARH human reliability analysis method, NUREG/CR-6883. US Nuclear Regulatory Commission,Washington, DC. [60] D.E. Embrey, SHERPA: A systematic human error reduction and prediction approach, American Nuclear Society, United States, 1986. [61] Mica. Endsley, Situation awareness analysis and measurement, chapter theoretical underpinnings of situation awareness, Critic. Rev. (2000) 3–33. [62] K.S. Park, J.I. Lee, A new method for estimating human error probabilities: AHP–SLIM, Reliab. Eng. Syst. Saf. 93 (4) (2008) 578–587. [63] B. Hallbert, et al., Human Event Repository and Analysis (HERA) System, overview, NUREG/CR-6903 1-2 (2006). [64] The DUAL Cognitive Architecture: A Hybrid Multi-Agent Approach, Proceedings of ECAI’94.Ed. A. Cohn, John Wiley & Sons Ltd., London, 1994. [65] Ramamurthy, Uma & Baars, Bernard & K, D’Mello & Franklin, Stan. (2006). LIDA: A working model of cognition. Proceedings of the 7th International Conference on Cognitive Modeling.
CHAPTER 8
Common cause failures and dependency modeling in single and multiunit NPP sites

Contents
8.1 Dependent failures 202
8.2 Common cause failures 202
8.3 CCF models 204
8.3.1 Beta factor model 204
8.3.2 Multiple Greek letter model 204
8.3.3 Alpha factor model 205
8.4 Impact vector method to estimate the alpha factors 206
8.4.1 Mapping techniques 206
8.4.2 Estimation of impact vectors 209
8.4.3 Estimation of alpha factors from impact vectors 209
8.5 Approach for interunit CCF in multiunit sites 217
Further readings 218
Most modern technological systems are deployed with high redundancy; in particular, redundancy is the fundamental technique adopted for fault tolerance in nuclear safety systems. However, in redundant systems, common cause failures (CCFs) and dependent failures are the major contributors to risk.
Protection and segregation: redundant components should be physically separated. For example, high room temperature is an environmental condition that could affect several pieces of equipment; physical separation with different cooling arrangements will reduce dependent failures.
There are different types of diversity that can be employed against CCFs. Functional diversity can be achieved by providing diverse sources of power, such as a DG for one section and a hydroelectric generator for another. Diversity in components, such as the use of redundant components obtained from different manufacturers or based on different principles of operation, and the use of different teams to maintain and test the redundant trains, are other defense mechanisms.
Several models have been developed for CCF analysis, and these models are classified as shock and nonshock models. Shock models are those where each
component experiences independent failures and the system is also subjected to shocks that can cause any combination of component failures. Nonshock models account for the failure frequencies of different combinations of components, whether they arise from independent single failures or from dependent failures.
8.1 Dependent failures
Understanding how systems and components fail is important for model development and risk assessment. The most appropriate definition of dependent failure, as given by Humphreys and Jenkins, is: the failure of a set of events, the probability of which cannot be expressed as the simple product of the unconditional failure probabilities of the individual events. Functional dependencies, cascading failures, maintenance-related errors, and common environmental stresses are all examples of dependent failures. In fact, CCF is an example of dependent failure: a CCF is a dependent failure in which multiple component failures occur due to a shared cause. Dependent failures also apply to human failure events, and a MUPSA is incomplete if multiunit human factor dependency is not evaluated.
8.2 Common cause failures
Reliability studies of redundant systems have to essentially cover CCFs. As mentioned earlier, most of the safety and safety-related systems in nuclear power plants (NPPs) are built with redundant units. Therefore, the risk of operating an NPP is dominated by accidents that occur mainly due to the failure of multiple components to perform their functions. These multiple component failures occur as a result of common cause events. They are important to consider because they can defeat the benefits of redundancy. If we have two components and E_i denotes the failure of item i, then the probability that both components fail is P(E1 ∩ E2) = P(E1 | E2)·P(E2) = P(E2 | E1)·P(E1). If the components are dependent, P(E1 | E2) ≠ P(E1) and P(E2 | E1) ≠ P(E2). In the context of multiunit NPPs, it is important to assess the risk increase due to dependencies among units as compared to the case when the units are completely independent. IAEA, in the Safety Reports Series No. 96, suggested a
metric called the conditional probability of a multiunit accident, which indicates the ratio of the multiunit CDF to the total CDF of a specific unit.
The three main categories of CCF are:
- Common cause multiunit initiating events: external and internal events that have the potential for initiating a plant transient at a site and that increase the probability of failure in multiple systems. These events usually, but not always, result in severe environmental stresses on components and structures. Examples are fire, flood, earthquakes, loss of offsite power, aircraft crash, etc.
- Shared equipment dependencies: these are dependencies of systems in multiple plants on the same components or subsystems, for example, when a diesel generator is shared between multiple plants, or when a collection of systems/subsystems in multiple plants is fed from the same electrical bus.
- Human interaction dependencies: these dependencies are introduced by human actions that include errors of omission and commission. They are applicable to both single and multiple unit PSAs.
Consider a group of three components, A, B, and C. The Boolean expression for the total failure of component A is
AT = AI + CAB + CAC + CABC
where AT is the total failure of component A, AI is the failure of component A due to independent causes (single failure), CAB is the failure of components A and B due to common causes (but not component C), CAC is the failure of components A and C due to common causes (but not component B), and CABC is the failure of components A, B, and C due to common causes. It can also be written as AT = AI + AC, where AC represents the failures due to dependent causes.
If a system has n identical components, the total failure frequency of one specific component is QT. We can also define Qk as the failure frequency of one specific common cause basic event involving k specific components (1 ≤ k ≤ n), and QS as the total frequency of an event affecting any of the n components. For a system with three components, QT = Q1 + 2Q2 + Q3 and QS = 3Q1 + 3Q2 + Q3.
As a general formula,
$$Q_T = \sum_{k=1}^{n} \binom{n-1}{k-1} Q_k, \qquad Q_S = \sum_{k=1}^{n} \binom{n}{k} Q_k$$

where the binomial coefficient $\binom{n}{k}$ denotes the number of combinations of k components out of the total of n components.
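As a quick numerical check of these bookkeeping relations, the following editor's sketch (not from the book; the Q_k values are hypothetical) evaluates QT and QS for the three-component example given above.

```python
# Editor's sketch: evaluate Q_T and Q_S from the basic event frequencies Q_k.
# For n = 3 this reproduces Q_T = Q1 + 2*Q2 + Q3 and Q_S = 3*Q1 + 3*Q2 + Q3.
from math import comb

def q_total_per_component(q):                 # q[k-1] holds Q_k for k = 1..n
    n = len(q)
    return sum(comb(n - 1, k - 1) * q[k - 1] for k in range(1, n + 1))

def q_system(q):
    n = len(q)
    return sum(comb(n, k) * q[k - 1] for k in range(1, n + 1))

q = [1e-3, 1e-4, 1e-5]                        # hypothetical Q1, Q2, Q3
print(q_total_per_component(q))               # ~1.21e-3 (= Q1 + 2*Q2 + Q3)
print(q_system(q))                            # ~3.31e-3 (= 3*Q1 + 3*Q2 + Q3)
```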
8.3 CCF models
8.3.1 Beta factor model
The beta factor model is a special model that can be classified as both a shock and a nonshock model. It is a single-parameter model and is the most widely used for treating CCF because of its simplicity in accounting for dependent failures. The main assumption of this model is that whenever a CCF event occurs, all components within the CCF group fail. The model assumes that a constant fraction β of the component failure frequency can be associated with common cause events shared by the other components in that group; that is, if a specific component fails, it will cause all components in the group to fail with probability β, and with probability 1 − β the failure involves just that specific component:
AI = (1 − β)AT
Ak = 0 (k = 2, 3, . . . , n − 1)
AC = βAT
Here the system failure frequency is AS = [n(1 − β) + β]AT.
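The following sketch (editor's illustration, using the notation above with hypothetical numbers) shows how the beta factor model splits the total per-component failure frequency into an independent part and a common cause part, and evaluates the system event frequency AS.

```python
# Editor's sketch of the beta factor split of the total per-component frequency A_T.
def beta_factor_split(a_total, beta, n):
    a_independent = (1 - beta) * a_total                  # independent failure of one component
    a_ccf = beta * a_total                                # common cause failure of all n components
    a_system_events = (n * (1 - beta) + beta) * a_total   # frequency of any failure event, A_S
    return a_independent, a_ccf, a_system_events

print(beta_factor_split(a_total=1e-3, beta=0.05, n=4))
# approximately (9.5e-4, 5.0e-5, 3.85e-3)
```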
8.3.2 Multiple Greek letter model
The multiple Greek letter (MGL) method is a generalization of the beta factor model that allows for CCFs of various combinations. In the MGL model, other parameters in addition to the beta factor are introduced to account more explicitly for higher order redundancies and to allow for different probabilities of failure of subgroups of the common cause component group. For a system consisting of n components, the additional parameters (γ, δ, ...) are defined as the conditional probabilities that a common cause failure is shared by progressively larger subsets of the group.
8.3.3 Alpha factor model
The alpha factor model defines CCF probabilities from a set of failure frequency ratios and the total component failure probability QT. Among the CCF models, the alpha factor model is considered the more realistic, as it can represent real scenarios to a greater extent. The alpha factor method does not assume that in each CCF event all components share the common cause; instead, it assigns probabilities to the different degrees of the cause and is based on clearly formulated probabilistic assumptions. Thus, the approach poses a more complex structure for determining the alpha factors as the level of redundancy increases. One main advantage of this method is the ability to analyze CCF events of different intensity as applicable to plant- or system-specific requirements. CCF quantification based on the impact rate of the CCFs and the number of components of the common cause component group affected has shown realistic behavior of the model and is found suitable for highly redundant systems. The mapping up technique enables the estimation of the CCF basic event probability in a highly redundant system based on the plant-specific data available for a lower redundant system.
The alpha factor model estimates the CCF frequencies from a set of failure ratios and the total component failure rate. The parameters of the model are:
QT ≡ total failure probability of each component (includes independent and common cause events)
αk(m) ≡ fraction of the total probability of failure events in a system of m components that involve the failure of k components due to a common cause.
The CCF basic event equation for any k out of m components failing, in the case of staggered testing, is given by Wierman et al. [18]:

$$Q_{CCF} = Q_T \sum_{i=k}^{m} \frac{\binom{m}{i}}{\binom{m-1}{i-1}}\,\alpha_i^{(m)} = Q_T \sum_{i=k}^{m} \frac{m}{i}\,\alpha_i^{(m)}$$

where
αi(m) is the ratio of i and only i CCF failures to total failures in a system of m components;
m is the total number of components in the component group;
k is the failure criterion, that is, the number of component failures in the component group;
QT is the total (random) failure probability of the component; QCCF is the probability that k or more components fail due to CCF.
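A small sketch of this staggered-testing expression is given below (editor's illustration). Using the alpha factors that appear later in Table 8.7 for MBeta = 10%, and taking the failure criterion k = 3 (loss of three or more of four components, i.e., failure of a two-out-of-four system), it reproduces the Q(CCF)/Q(TOTAL) ratio of about 0.297 reported in that table.

```python
# Editor's sketch of Q_CCF = Q_T * sum_{i=k..m} [C(m,i)/C(m-1,i-1)] * alpha_i^(m).
from math import comb

def q_ccf(q_total, alphas, m, k):
    """alphas[i] = alpha_i^(m), the fraction of failure events involving exactly i components."""
    return q_total * sum(
        comb(m, i) / comb(m - 1, i - 1) * alphas.get(i, 0.0)   # equals (m / i) * alpha_i
        for i in range(k, m + 1)
    )

# Alpha factors from Table 8.7 (MBeta = 10%, rho = 0.4, 0.5, 0.6):
alphas = {1: 0.344648, 2: 0.427577, 3: 0.208629, 4: 0.019147}
print(round(q_ccf(q_total=1.0, alphas=alphas, m=4, k=3), 6))   # ~0.297319
```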
8.4 Impact vector method to estimate the alpha factors
NUREG/CR-5485 proposed a technique for CCF analysis using the "event impact vector." An impact vector is a numerical representation of a CCF event and is classified according to the level of impact of the common cause events. In this technique, the impact vectors are modified to reflect the likelihood of occurrence of the event in the specific system of interest. This method is also known as mapping. The mapped impact vectors are finally used to arrive at the alpha factors. For a common cause component group (CCCG) of size m, an impact vector has m elements, and the kth element, denoted Pk, is the probability of k components failing due to a common cause. For example, the impact vector of a CCCG of size 5 is written as [P1(5), P2(5), P3(5), P4(5), P5(5)], and [0 0 0 0 1] represents an event that fails all five components. An appropriate mapping technique is adopted to determine the values of Pk.
8.4.1 Mapping techniques
The mapping process is performed through three different routes, depending on the relationship between the size of the original system and the size of the target system of interest:
- Mapping down is used when the exposed population size is larger than the target group size, for example, mapping from a four-component system to a two-component system.
- Mapping up is used when the exposed population size is smaller than the target group size, for example, mapping from a two-component system to a four-component system.
- In the special case where the impact vector has been identified as a "lethal shock," the impact vector for the new system of m components has 1.0 in the Pm position and zeros elsewhere, for example, [P1(5), P2(5), P3(5), P4(5), P5(5)] = [0 0 0 0 1]. A lethal shock is one which wipes out all redundant components within a common cause group [14].
To support the estimation of alpha factors for large redundant configurations, the mapping up technique is described below in a comprehensive manner.
To reasonably map up the effect of nonlethal shocks, it is necessary to relate the probability of failure of k or more components to parameters that can be determined from measurements of the number of failure events involving i = 0, 1, 2, ..., k − 1 components. For each shock, there is a constant probability ρ, the conditional probability of failure of each component given a shock. It is also known as the mapping up parameter and is the probability that the nonlethal shock or cause would have failed a single component added to the system. Mapping up is performed for all the CCF events affecting the system and is based on a subjective assessment of ρ. The assessment of ρ is performed for each CCF event and may differ from event to event depending on the application. The frequency of events within an n-train system that result in r failures due to nonlethal shocks is expressed using the binomial failure rate (BFR) model as

$$P_r^{(n)} = \mu \binom{n}{r} \rho^r (1-\rho)^{n-r}$$

where μ is the occurrence rate of the shock. For a system of size 5, the observed values of Pi(5), i = 1, ..., 5, are generated by a BFR process with parameters μ and ρ:

$$P_1^{(5)} = 5\mu\rho(1-\rho)^4,\quad P_2^{(5)} = 10\mu\rho^2(1-\rho)^3,\quad P_3^{(5)} = 10\mu\rho^3(1-\rho)^2,\quad P_4^{(5)} = 5\mu\rho^4(1-\rho),\quad P_5^{(5)} = \mu\rho^5$$
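The sketch below (editor's illustration; μ and ρ are hypothetical) evaluates the BFR expression for a five-train system; for μ = 1 and ρ = 0.1 the output simply equals the five closed-form terms listed above.

```python
# Editor's sketch of the binomial failure rate expression P_r^(n) = mu*C(n,r)*rho^r*(1-rho)^(n-r).
from math import comb

def bfr_impact(n, mu, rho):
    """Returns [P_1^(n), ..., P_n^(n)] for a nonlethal shock of rate mu and parameter rho."""
    return [mu * comb(n, r) * rho**r * (1 - rho)**(n - r) for r in range(1, n + 1)]

print([round(v, 5) for v in bfr_impact(n=5, mu=1.0, rho=0.1)])
# [0.32805, 0.0729, 0.0081, 0.00045, 1e-05]
```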
Table 8.1 shows the impact of CCF events on the five-train redundant configuration when mapped to lower redundant configurations down to a one-train system. Ideally, it is sufficient to model the impact of CCF events down to the level from which the system is mapped up. To map up from a system of size 2 to a system of size 5, the observed value of P2(5) is written as

$$P_2^{(5)} = 10\mu\rho^2(1-\rho)^3$$

which is further simplified as follows:

$$P_2^{(5)} = \mu\rho^2(1-\rho)^3 + 9\mu\rho^2(1-\rho)^3 = (1-\rho)^3\left[\mu\rho^2\right] + \tfrac{9}{2}\rho(1-\rho)^2\left[2\mu\rho(1-\rho)\right]$$

$$P_2^{(5)} = (1-\rho)^3 P_2^{(2)} + \tfrac{9}{2}\rho(1-\rho)^2 P_1^{(2)}$$
In order to estimate the contribution of P1(2) and P2(2) to P2(5), the number of doubles, singles, and zeros needs to be determined from Table 8.1. This contribution is derived in Table 8.2, from which it can be inferred that one tenth of P2(5) is observed as P2(2) in a two-train system, while the remaining part is observed as P1(2).
Table 8.1 Impact of CCF events.

Event type: Independent
  Basic events in five-train system (A, B, C, D, E): A, B, C, D, E
  Impact on four-train system (A, B, C, D)*: A, B, C, D, Nn
  Impact on three-train system (A, B, C)*: A, B, C, Nn, Nn
  Impact on two-train system (A, B)*: A, B, Nn, Nn, Nn
  Impact on one-train system (A)*: A, Nn, Nn, Nn, Nn

Event type: Common cause impacting two components
  Basic events: AB, AC, AD, AE, BC, BD, BE, CD, CE, DE
  Four-train: AB, AC, AD, A, BC, BD, B, CD, C, D
  Three-train: AB, AC, A, A, BC, B, B, C, C, Nn
  Two-train: AB, A, A, A, B, B, B, Nn, Nn, Nn
  One-train: A, A, A, A, Nn, Nn, Nn, Nn, Nn, Nn

Event type: Common cause impacting three components
  Basic events: ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE
  Four-train: ABC, ABD, AB, ACD, AC, AD, BCD, BC, BD, CD
  Three-train: ABC, AB, AB, AC, AC, A, BC, BC, B, C
  Two-train: AB, AB, AB, A, A, A, B, B, B, Nn
  One-train: A, A, A, A, A, A, Nn, Nn, Nn, Nn

Event type: Common cause impacting four components
  Basic events: ABCD, ABCE, ABDE, ACDE, BCDE
  Four-train: ABCD, ABC, ABD, ACD, BCD
  Three-train: ABC, ABC, AB, AC, BC
  Two-train: AB, AB, AB, A, B
  One-train: A, A, A, A, Nn

Event type: Common cause impacting five components
  Basic events: ABCDE
  Four-train: ABCD
  Three-train: ABC
  Two-train: AB
  One-train: A

* indicates one component is removed; Nn refers to none.
Table 8.2 CCF contribution of components after mapping up.

Number of components    Number of zeros when        Number of singles when      Number of doubles when
affected by CCF         mapped to two components    mapped to two components    mapped to two components
1                       3                           2                           0
2                       3                           6                           1
3                       1                           6                           3
4                       0                           2                           3
5                       0                           0                           1
Repeating the mapping up procedure, expressions for the events classified as nonlethal shocks are obtained, as shown in Table 8.3.
8.4.2 Estimation of impact vectors
On scrutinizing the columns of Table 8.3, generated by applying the BFR model, it is evident that the uncertainty inherent in mapping up impact vectors reduces to the uncertainty in estimating ρ, the conditional probability that a nonlethal shock fails a single component. A higher value of ρ indicates a higher probability of more components failing due to the shock. Four CCF events with ρ values of 0.1, 0.2, 0.3, and 0.8 and a beta value of 5% are taken for P1(2) and P2(2). Based upon the subjective assessment of the value of ρ and with the help of the mapping techniques established earlier, the impact vectors for mapping up a system of size 2 to a system of size 5 have been calculated as shown in Table 8.4.
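As an illustration (editor's sketch), the size-2 to size-5 expressions of Table 8.3 can be applied directly to the observed two-train vector; with ρ = 0.1 and [P1(2), P2(2)] = [0.95, 0.05] this reproduces, after rounding, the corresponding five-train row of Table 8.4.

```python
# Editor's sketch of the mapping-up expressions (Table 8.3, from size 2 to size 5).
def map_up_2_to_5(p1_2, p2_2, rho):
    q = 1.0 - rho
    return [
        2.5 * q**3 * p1_2,                                  # P1(5)
        4.5 * rho * q**2 * p1_2 + q**3 * p2_2,              # P2(5)
        3.5 * rho**2 * q * p1_2 + 3 * rho * q**2 * p2_2,    # P3(5)
        rho**3 * p1_2 + 3 * rho**2 * q * p2_2,              # P4(5)
        rho**3 * p2_2,                                      # P5(5)
    ]

# Reproduces the rho = 0.1 row of Table 8.4 (P1(2) = 0.95, P2(2) = 0.05):
print([round(x, 2) for x in map_up_2_to_5(0.95, 0.05, 0.1)])
# [1.73, 0.38, 0.04, 0.0, 0.0]
```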
8.4.3 Estimation of alpha factors from impact vectors
The number of events in each impact category (nk) is calculated by adding the corresponding elements of the impact vectors:

$$n_k = \sum_{j=1}^{n} P_k(j)$$
where Pk(j) is the kth element of the impact vector for event j, and n is the number of CCF events. Finally, the alpha factors are estimated using the following expression [18]:

$$\alpha_k^{(m)} = \frac{n_k}{\sum_{k=1}^{m} n_k}$$
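The sketch below (editor's illustration) implements these two expressions directly; purely as an example, the input uses the four-train impact vectors of the first three events of Table 8.4.

```python
# Editor's sketch: n_k sums the k-th elements of the event impact vectors,
# and alpha_k normalizes n_k over all impact categories.
def alpha_factors(impact_vectors):
    n_k = [sum(column) for column in zip(*impact_vectors)]   # n_k = sum_j P_k(j)
    total = sum(n_k)
    return [nk / total for nk in n_k]                        # alpha_k = n_k / sum_k n_k

events = [
    [1.54, 0.25, 0.02, 0.00],   # rho = 0.1, four-train row of Table 8.4
    [1.22, 0.41, 0.05, 0.00],   # rho = 0.2
    [0.93, 0.52, 0.11, 0.01],   # rho = 0.3
]
print([round(a, 3) for a in alpha_factors(events)])          # [0.729, 0.233, 0.036, 0.002]
```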
Table 8.3 Mapping up procedure.

Mapping up from a system of size 1:
  To size 2: P1(2) = 2(1 − ρ)P1(1); P2(2) = ρP1(1)
  To size 3: P1(3) = 3(1 − ρ)²P1(1); P2(3) = 3ρ(1 − ρ)P1(1); P3(3) = ρ²P1(1)
  To size 4: P1(4) = 4(1 − ρ)³P1(1); P2(4) = 6ρ(1 − ρ)²P1(1); P3(4) = 4ρ²(1 − ρ)P1(1); P4(4) = ρ³P1(1)
  To size 5: P1(5) = 5(1 − ρ)⁴P1(1); P2(5) = 10ρ(1 − ρ)³P1(1); P3(5) = 10ρ²(1 − ρ)²P1(1); P4(5) = 5ρ³(1 − ρ)P1(1); P5(5) = ρ⁴P1(1)

Mapping up from a system of size 2:
  To size 3: P1(3) = (3/2)(1 − ρ)P1(2); P2(3) = ρP1(2) + (1 − ρ)P2(2); P3(3) = ρP2(2)
  To size 4: P1(4) = 2(1 − ρ)²P1(2); P2(4) = (5/2)ρ(1 − ρ)P1(2) + (1 − ρ)²P2(2); P3(4) = ρ²P1(2) + 2ρ(1 − ρ)P2(2); P4(4) = ρ²P2(2)
  To size 5: P1(5) = (5/2)(1 − ρ)³P1(2); P2(5) = (9/2)ρ(1 − ρ)²P1(2) + (1 − ρ)³P2(2); P3(5) = (7/2)ρ²(1 − ρ)P1(2) + 3ρ(1 − ρ)²P2(2); P4(5) = ρ³P1(2) + 3ρ²(1 − ρ)P2(2); P5(5) = ρ³P2(2)

Mapping up from a system of size 3:
  To size 4: P1(4) = (4/3)(1 − ρ)P1(3); P2(4) = ρP1(3) + (1 − ρ)P2(3); P3(4) = ρP2(3) + (1 − ρ)P3(3); P4(4) = ρP3(3)
  To size 5: P1(5) = (5/3)(1 − ρ)²P1(3); P2(5) = (7/3)ρ(1 − ρ)P1(3) + (1 − ρ)²P2(3); P3(5) = ρ²P1(3) + 2ρ(1 − ρ)P2(3) + (1 − ρ)²P3(3); P4(5) = ρ²P2(3) + 2ρ(1 − ρ)P3(3); P5(5) = ρ²P3(3)

Mapping up from a system of size 4:
  To size 5: P1(5) = (5/4)(1 − ρ)P1(4); P2(5) = ρP1(4) + (1 − ρ)P2(4); P3(5) = ρP2(4) + (1 − ρ)P3(4); P4(5) = ρP3(4) + (1 − ρ)P4(4); P5(5) = ρP4(4)
Table 8.4 Mapping up of impact vectors.

Event no.  System size                    P1     P2     P3     P4     P5
1. Nonlethal shock (ρ = 0.1)
           Original two-train system      0.95   0.05
           Identical three-train system   1.28   0.14   0.01
           Identical four-train system    1.54   0.25   0.02   0.00
           Identical five-train system    1.73   0.38   0.04   0.00   0.00
2. Nonlethal shock (ρ = 0.2)
           Original two-train system      0.95   0.05
           Identical three-train system   1.14   0.23   0.01
           Identical four-train system    1.22   0.41   0.05   0.00
           Identical five-train system    1.22   0.57   0.13   0.01   0.00
3. Nonlethal shock (ρ = 0.3)
           Original two-train system      0.95   0.05
           Identical three-train system   1.00   0.32   0.02
           Identical four-train system    0.93   0.52   0.11   0.01
           Identical five-train system    0.81   0.65   0.23   0.04   0.00
4. Nonlethal shock (ρ = 0.8)
           Original two-train system      0.95   0.05
           Identical three-train system   0.29   0.77   0.04
           Identical four-train system    0.08   0.38   0.62   0.03
           Identical five-train system    0.02   0.14   0.43   0.51   0.03
A plot of the alpha factors for the example is shown in Fig. 8.1. The estimation of alpha factors in CCF analysis is further demonstrated with three varied real applications for Indian NPPs in the following section. A simple MATLAB code can be developed to estimate the alpha factors and compute the CCF contribution to total failure probability.
Case studies
a) Decay heat removal system
New generation reactor designs use passive systems for safety critical functions to achieve high reliability in accomplishing safety functions. Moreover, passive safety systems are considered safer as they do not require any external power sources. Let us take the example of a passive decay heat removal system in an NPP for the demonstration of CCF analysis using the alpha factor method. Assume there are four DHR loops; the event simulated in this case study demands operation of two out of the four DHR loops for the initial 24 hours and subsequently the availability of one loop until 720 hours after the shutdown of the reactor. As an example, the effect of three nonlethal CCF events affecting the DHR system is studied for various values of ρ.
Figure 8.1 Alpha factors for five component system.
The objective of the case study is to first estimate the alpha factors and then arrive at the contribution of the CCF events to the total failure probability of the system. The case where an additional CCF event is a lethal shock has also been analyzed to study the effect of the lethal shock. Finally, a broad comparison is made between the alpha factor method and the beta factor method in their assessment of the CCF contribution to the total failure probability of the system due to the various CCF events. In this case study, since the mapping up is performed from two-component data, we can define a term "mapping up beta," denoted MBeta, which is the fraction of the total failure probability of the two-component system attributable to dependent failures [13]:
Mapping up beta = Qm / Qt
where Qm is the dependent failure probability and Qt is the total failure probability for each component. Another term, "common beta," is defined to denote the CCF for the complete system. Beta expressed as a percentage is the CCF contribution to total failure probability in these cases. The case is studied in two parts: part one for the first 24 hours of mission time, when the success criterion is two out of four, and part two for the rest of the mission time, when the success criterion is one out of four.
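A sketch of the workflow the author describes implementing in MATLAB is given below in Python (editor's illustration). It takes MBeta = 10% to mean a two-train impact vector of [0.90, 0.10], in line with the 5% example of Table 8.4, maps the three nonlethal events (ρ = 0.4, 0.5, 0.6) up to the four-loop DHR system, estimates the alpha factors, and evaluates the CCF fraction for the part-one case, assuming that the two-out-of-four success criterion corresponds to system failure when three or more loops are lost. The result matches Tables 8.5 and 8.7; the full fault tree quantification used in the book may of course contain more detail.

```python
# Editor's sketch of the alpha factor pipeline for the four-loop DHR case study.
def map_up_2_to_4(p1_2, p2_2, rho):
    """Size-2 to size-4 mapping-up expressions of Table 8.3."""
    q = 1.0 - rho
    return [
        2 * q**2 * p1_2,                            # P1(4)
        2.5 * rho * q * p1_2 + q**2 * p2_2,         # P2(4)
        rho**2 * p1_2 + 2 * rho * q * p2_2,         # P3(4)
        rho**2 * p2_2,                              # P4(4)
    ]

def alpha_factors(vectors):
    n_k = [sum(col) for col in zip(*vectors)]
    total = sum(n_k)
    return [nk / total for nk in n_k]

mbeta = 0.10
p1_2, p2_2 = 1 - mbeta, mbeta                       # two-train impact vector for MBeta = 10%
vectors = [map_up_2_to_4(p1_2, p2_2, rho) for rho in (0.4, 0.5, 0.6)]
alphas = alpha_factors(vectors)

m, k = 4, 3                                         # failure = loss of 3 or more of 4 loops
ccf_fraction = sum(m / i * alphas[i - 1] for i in range(k, m + 1))

print([round(a, 6) for a in alphas])   # ~[0.344648, 0.427577, 0.208629, 0.019147] (Table 8.7)
print(round(ccf_fraction, 6))          # ~0.297319, i.e. ~29.7% (Tables 8.5 and 8.7, MBeta = 10%)
```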
Table 8.5 CCF contribution to total system failure probability for different CCF events.

                        CCF contribution to total system failure probability (%)
Values of ρ             MBeta = 10%   MBeta = 5%   MBeta = 1%   MBeta = 0.1%
ρ = 0.1, 0.2, 0.3       6.151154      4.85         3.854994     3.634923
ρ = 0.2, 0.3, 0.4       11.69249      9.84         8.414538     8.098896
ρ = 0.3, 0.4, 0.5       19.43477      17.17        15.41352     15.02437
ρ = 0.4, 0.5, 0.6       29.73186      27.25        25.30687     24.8764
Table 8.6 CCF contribution to total system failure probability with lethal shock.

                        CCF contribution to total system failure probability (%)
Values of ρ             MBeta = 10%   MBeta = 5%   MBeta = 1%   MBeta = 0.1%
ρ = 0.1, 0.2, 0.3, 1    21.92         20.55        19.49        19.26
ρ = 0.2, 0.3, 0.4, 1    27.42         25.65        24.27        23.96
ρ = 0.3, 0.4, 0.5, 1    34.62         32.57        30.98        30.62
ρ = 0.4, 0.5, 0.6, 1    43.73         41.57        39.88        39.51
Figure 8.2 CCF contribution in two out of four system without lethal shock.
Part 1: When two out of four loops are required
For the first 24 hours after the shutdown of the reactor, two loops of SGDHR are required. The contribution of CCF events to total failure probability for various sets of values of ρ is presented in Table 8.5. The results of the case with an extra CCF event as a lethal shock are presented in Table 8.6, and the corresponding graphs are shown in Figs. 8.2 and 8.3. The alpha factors and the estimated contribution of CCF events to total failure probability for ρ values of 0.4, 0.5, 0.6 and for various values of the mapping up
Figure 8.3 CCF contribution in two out of four system with lethal shock.
Table 8.7 Estimation of alpha factors for ρ = 0.4, 0.5, 0.6.

Alpha factors         MBeta = 10%   MBeta = 5%   MBeta = 1%   MBeta = 0.1%
α1                    0.344648      0.358732     0.369721     0.37216
α2                    0.427577      0.434561     0.440011     0.44122
α3                    0.208629      0.197266     0.188401     0.186433
α4                    0.019147      0.00944      0.001867     0.000186
Q(CCF)/Q(TOTAL)       0.297319      0.272462     0.253069     0.248764
Q(CCF)/Q(TOTAL) %     29.73186      27.24616     25.30687     24.8764
Table 8.8 Estimation of alpha factors for ρ = 0.4, 0.5, 0.6, and 1.

Alpha factors         MBeta = 10%   MBeta = 5%    MBeta = 1%   MBeta = 0.1%
α1                    0.2760131     0.28809137    0.29756      0.299669
α2                    0.3424276     0.34898833    0.35413      0.355277
α3                    0.1670815     0.15842072    0.15163      0.150119
α4                    0.2144777     0.20449958    0.19668      0.194935
Q(CCF)/Q(TOTAL)       0.4372531     0.4157272     0.39885      0.395093
Q(CCF)/Q(TOTAL) %     43.725314     41.5727203    39.885       39.50935
beta are presented in Table 8.7. The results for the case when the lethal shock is also considered are presented in Table 8.8. Fig. 8.4 shows the alpha factors for different MBeta values for the lethal and nonlethal cases.
Part 2: When one loop out of four is required
This case is applicable after the first 24 hours of shutdown of the reactor till the end of the mission time (720 hours). The contribution of CCF events
Figure 8.4 Comparison of alpha factors for lethal and nonlethal shock.
Figure 8.5 CCF contribution in one out of four system without lethal shock.
to total failure probability for various sets of values of ρ is shown in Fig. 8.5. The case with an extra CCF event as a lethal shock is given in Fig. 8.6.
The following inferences are made from the results obtained:
1. The contribution of CCF events to total failure probability is found to be only weakly sensitive to the value of the mapping up beta (Figs. 8.2, 8.3, 8.5, and 8.6).
2. As the success criterion becomes more stringent, the CCF contribution increases appreciably for the same values of ρ (Fig. 8.5 vs. Fig. 8.2 and Fig. 8.6 vs. Fig. 8.3).
Figure 8.6 CCF contribution in one out of four system with lethal shock.
3. The sensitivity of the CCF contribution to a change in the value of ρ increases significantly as the success criterion becomes more stringent (Fig. 8.5 vs. Fig. 8.2 and Fig. 8.6 vs. Fig. 8.3).
4. As the hazard from the shocks increases (Figs. 8.2, 8.3, 8.5, and 8.6), the beta factor method provides an unrealistic assessment of the CCF contribution to system failure.
5. In the presence of a lethal shock, the CCF contribution to total failure probability increases appreciably, and the beta factor model fails to address this case, yielding considerably suppressed estimates (Figs. 8.3 and 8.6).
6. The values of the alpha factors are found to be less sensitive to a change in the value of the mapping up beta in both the lethal and nonlethal cases (Fig. 8.4).
Thus, the alpha factor model can be used to realistically estimate the contribution of CCF events to the total system failure probability. The model assesses the contribution of each CCF event based upon a subjective assessment of a constant ρ, the conditional probability of failure of each component given a shock. The values of the alpha factors are found to be less sensitive to a change in the value of the mapping up beta, and this sensitivity reduces further as more components are added to the system. The contribution of CCF events to total failure probability is also found to be less sensitive to the value of the mapping up beta, but it is highly sensitive to a change in the success criterion for the system.
The use of alpha factors is found to be highly suitable, especially for cases with highly redundant configurations and a requirement to meet a stringent success criterion.
8.5 Approach for interunit CCF in multiunit sites
The methodology adopted for CCF analysis of a single unit, based on the CCF database, cannot be extended directly to interunit CCF, as there is not enough data available. However, interunit CCF is a major risk at a site with multiple units. The Seabrook multiunit PSA investigated the issue of multiunit CCF for emergency diesel generators and motor-operated valves. One approach is to be conservative in general and carry out a detailed CCF analysis for those CCF events that are risk significant. The main difficulty lies in the asymmetry between the units. A thorough assessment is required to verify the applicability of interunit CCF. For example, if the units at a site are of different types, there is no technical basis for modeling interunit CCF. Similarly, if identical components at different units are of different vintage, there may not be sufficient similarity, as manufacturing processes change over time, and such components are less vulnerable to CCF. For passive systems, the phenomena that drive the system to operate may be affected by extreme environmental stresses that may be common between units; interunit CCF must take into account the stresses on the systems that could be impacted. In this context, studies suggest using factors to represent the degree of similarity among components in different units based on coupling factors. A coupling factor is a property of components that identifies them as being susceptible to the same mechanisms of failure. Such factors include similarity in design, hardware, software, location, environment, mission, operations, maintenance, and test procedures. Some of the failures due to coupling factors can be minimized with simple strategies such as design control, use of environmentally qualified equipment, testing and preventive maintenance programs, review of procedures, training of personnel, quality control, etc. A staggered test scheme using a time-dependent failure model is a strategy suggested by experts, and the effect of staggered test schemes can be studied in interunit CCF analysis. Event-wise impact vectors must be mapped up or down to the required common cause component group size. In summary, multiunit CCF analysis is still evolving, and there is no consensus on a methodology, as the data on CCF for multiple units is very scarce.
Further readings [1] P. Humphreys, A.M. Jenkins, Dependent failures development, Reliab. Eng. Syst. Saf. 34 (1991) 417–427. [2] IAEA, Technical Approach to Probabilistic Safety Assessment for Multiple Reactor Units, Safety Reports Series No.96, 2019. [3] NUREG/CR-4780, Procedures for Treating Common Cause Failures in Safety and Reliability Studies, 1988. [4] S.S. Bajaj, A.R. Gore, The Indian PHWR, Nucl. Eng. Des. 236 (2005) 701–722. [5] B. Heinz-Peter, et al., Methods for the treatment of Common Cause Failures in redundant systems, J. Reliab. Risk Anal: Theory Applic. 1 (2008) 8–11. [6] IAEA-TECDOC-648, 1992. Procedures for conducting common cause failure analysis in probabilistic safety assessment. [7] IAEA-TECDOC-1200, 2001. Application for probabilistic safety assessment (PSA) for nuclear power plants. [8] IAEA-TECDOC-1511, 2006. Determining the quality of probabilistic safety assessment (PSA) for applications in nuclear power plants. [9] A. John Arul, et al., Reliability analysis of safety grade decay heat removal system of Indian prototype fast breeder reactor, Ann. Nucl. Energy 33 (2006) 180–188. [10] S.C. Katiyar, S.S. Bajaj, Tarapur atomic power station units-1 and 2, design features, operating experience and license renewal, Nucl. Eng. Des. 236 (2005) 881–893. [11] A. Lasitha, et al., Test and monitoring system 1 (TMS1) for shutdown system 1 for TAPS 3 & 4, BARC Newslett. (272) (2006) 2–8. [12] A. Mosleh et al., 1989. Procedures for analysis of common cause failures in probabilistic safety analysis, NUREG/CR-4780, Vol. 1. [13] A. Mosleh, Common cause failures: an analysis methodology and examples, Reliab. Eng. Syst. Saf. 34 (3) (1991) 249–292. [14] A. Mosleh et al., 1998. NUREG/CR-5485, Guideline on Modeling Common-Cause Failures in Probabilistic Risk Assessment. [15] T. Sajith Mathews, et al., Integration of functional reliability analysis with hardware reliability: an application to safety grade decay heat removal system of Indian 500 MWe PFBR, Ann. Nucl. Energy 36 (4) (2009) 481–492. [16] V.V.S. Sanyasi Rao, Probabilistic safety assessment of nuclear power plants – level 1, in: International Conference on Reliability Safety and Hazard, 2010. [17] V.K.Seth,Design features of reactor assembly and structures of Indian 500 MWe PWHR stations, Nucl. Eng. Des. 109 (1988) 163-169.10. [18] T.E. Wierman et al., 2001. Reliability Study: Combustion Engineering Reactor Protection System, NUREG/CR-5500, 1984–1998, Vol. 10. [19] T.E. Wierman et al., 2007. Common-Cause Failure Database and Analysis - System: Event Data Collection, Classification, and Coding, NUREG/CR-6268, Rev. 1. [20] C. Seong, G. Heo, S. Baek, J.W. Yoon, M.C. Kim, Analysis of the technical status of multiunit risk assessment in nuclear power plants, Nucl. Eng. Technol. 50 (2018) 319– 326. [21] S. Jang, M. Jae, A development of methodology for assessing the inter-unit common cause failure in multi-unit PSA model, Reliab. Eng. Syst. Saf. 203 (2020). https://doi.org/10.1016/j.ress.2020.107012. [22] Z.B. Rejc, M. Cepin, An extension of multiple greek letter method for common cause failures modelling, J. Loss Prev. Process Ind. 29 (2014) 144–154. [23] A. O’Connor, A. Mosleh, A general cause based methodology for analysis of common cause and dependent failures in system risk and reliability assessments, Reliab. Eng. Syst. Saf. 14 (2016) 341–350.
[24] T.E. Wierman, D.M. Rasmuson, N.B. Stockton, Common-cause failure event insights: circuit breakers, NUREG/CR-6819 4 (2003) 1–140. [25] T.E. Wierman, D.M. Rasmuson, N.B. Stockton, Common-cause failure event insights: pumps, NUREG/CR-6819 3 (2003) INEEL/EXT-99-00613. [26] T.E. Wierman, D.M. Rasmuson, N.B. Stockton, Common-cause failure event insights: emergency diesel generators, NUREG/CR-6819 1 (2003) INEEL/EXT-99-00613. [27] R.J. Budnitz, G. Apostolakis, D.M. Boore, L.S. Cluff, K.J. Coppersmith, C.A. Cornell, P.A. Morris, Recommendations for probabilistic seismic hazard analysis: guidance on uncertainty and use of experts: main report, NUREG/CR-6372, 1997. [28] J.K. Vaurio, Common cause failure probabilities in standby safety system fault tree analysis with testing - scheme and timing dependencies, Reliab. Eng. Syst. Saf. 79 (1) (2003) 43–57. [29] S. Soga, Mathematical justification of the staggered test scheme by a time-dependent model, Probab. Risk Assess. Manage. (2020).
CHAPTER 9
International studies related to multiunit PSA: A review

Contents
9.1 Seabrook PSA 221
9.2 Byron and Braidwood PSA 223
9.3 Research work at Maryland University, United States 224
9.4 Korea Atomic Energy Research Institute 225
9.4.1 MUPSA software 226
9.5 CANDU Owners Group 226
9.5.1 Proposed site safety goals 229
9.5.2 Site CDF 229
9.5.3 Large off-site release safety goal 229
9.6 Multiunit PSA studies at EDF France 230
9.7 Fukushima Daiichi experience 231
9.8 MUPSA research in India 232
9.9 MUPSA approach in the United Kingdom 235
9.10 Site risk model development in Hungary 235
9.11 Other countries 237
9.12 Summary of international experience on MUPSA 237
Further readings 238
The progress in multiunit probabilistic safety assessment (PSA) has highlighted the importance of treating interunit dependencies, and some approaches have been proposed. The reports reviewed include IAEA technical documents, an NRC document, the CNSC document "Summary Report of CNSC International Workshop on Multiunit PSA," the COG report "Development of a whole-site PSA methodology," the Seabrook PSA report, and presentations and papers from Muhlheim, Mohammad Modarres, Karl Fleming, and Ebisawa, as well as a few papers from India. Each of these documents is briefly discussed in this chapter.
9.1 Seabrook PSA
One of the several studies on multiunit PSA (MUPSA) carried out before the Fukushima accident is the Seabrook risk assessment. In 1983, an integrated risk assessment of the two-unit station at Seabrook was carried out. The report documents
an independent probabilistic safety assessment (PSA) of the twin nuclear power plants (NPPs) under construction near Seabrook. The aim was to quantify the risk to public health and safety associated with a potential accident during the period of operation of the station. The study also addressed the emergency planning issues arising from internal and external hazards. Four different operating states, viz., 100%, 40%, 25% power and shutdown, were considered. Two risk models were considered. The first model characterizes the two-unit station as two independent units, that is, with no multiunit interactions. The second, simpler model was constructed to investigate the effect of one type of multiunit interaction identified for the Seabrook station: initiating events experienced at both units simultaneously are modeled. It is assumed that each unit has the possibility of a single type of accident whose time of occurrence is exponentially distributed, with an occurrence rate λ1 for unit 1 and λ2 for unit 2. The main difference between the two models is that in model 2 the accident scenarios always occur concurrently. The Level 1 risk curves are drawn for each unit (as shown in Fig. 9.1), and the composite multiunit station risk curve is seen to be made of three components: two single-unit risk curves that represent the occurrence of a single accident in a given year at either unit, and a third which quantifies the risk of two accidents occurring independently but in the same year. Initiating events were analyzed to resolve single and dual reactor impacts resulting from both internal and external hazards. Accident sequences involving releases from each reactor individually and from both reactors concurrently were analyzed and their frequencies quantified. The common cause failures (CCFs) in the accident sequences were major contributors to multiunit events. The integrated plant risk metrics estimated from the study are shown in Table 9.1. It is reported that, among the initiating events, seismic events contributed 86.2%, loss of offsite power 8.6%, external flooding 4.9%, and truck crash into transmission lines 0.3% of the dual reactor CDF. The major contributors to multiunit core damage were seismic station blackout and seismic loss-of-coolant accident (LOCA). The insights gained from the Seabrook multiunit PSA study are:
- The conditional probability of two core damages, given one core damage at either unit, is 0.14.
- Single reactor risk metrics cannot be used to represent the integrated site risk; CDF and LERF are not adequate for addressing multiunit risk.
- A significant contribution is noticed for multi-reactor events despite minimal shared support systems and structures between the two units.
- Seismic correlation is important for low-intensity events but can be screened out for high-intensity events.
Figure 9.1 Combined Level 1 risk to a common initiating event (from: Seabrook PSA report).
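The site-level arithmetic behind Table 9.1 and the first insight listed above can be reproduced with the short sketch below; interpreting the 0.14 as the ratio of the dual-reactor CDF per site-year to the CDF per reactor-year is the editor's reading of the published numbers, not a statement of the original Seabrook model.

```python
# Editor's sketch: site-level risk metrics implied by Table 9.1.
cdf_per_reactor_year = 2.3e-4               # single-reactor CDF
single_accident_per_site_year = 4.0e-4      # single-reactor core damage, either unit
dual_accident_per_site_year = 3.2e-5        # both reactors damaged

total_site_cdf = single_accident_per_site_year + dual_accident_per_site_year
p_second_core_damage = dual_accident_per_site_year / cdf_per_reactor_year

print(f"total site CDF ~ {total_site_cdf:.1e} per site-year")                          # ~4.3e-4
print(f"P(2nd core damage | core damage at either unit) ~ {p_second_core_damage:.2f}") # ~0.14
```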
9.2 Byron and Braidwood PSA
Subsequent to the Seabrook MUPSA, in 1997, an integrated PSA of two dual-unit Westinghouse four-loop pressurized water reactor (PWR) stations was carried out for internal events and internal floods at full power. Both units had colocated equipment in a common structure, and many of the support systems, such as service water and AC power, were shared between the units. The analysis explicitly modeled the dependencies related to the shared systems and identified the accident sequences that involved both units. The safety evaluation determined that the success criteria for the multiplant configurations with two essential service water (ESW) pumps per plant with crosstie capabilities assumed that one ESW pump can provide adequate cooling to shut down the operating plant through the crosstie connections if the need should arise. However, with only one ESW pump operating, the
Table 9.1 Integrated plant risk metrics.

Model type                           Risk metric                          Mean core damage frequency
Single reactor PRA                   CDF per reactor year                 2.3 × 10−4
Integrated site PSA of both units    Single reactor CDF per site year     4.0 × 10−4
                                     Dual reactor CDF per site year       3.2 × 10−5
                                     Total site CDF per site year         4.3 × 10−4
other three pumps may not be available in sufficient time if the operating pump fails. The results are applicable to one plant in normal operation and the second plant already in the shutdown or refueling mode of operation. Hence, this analysis shows that multiunit vulnerabilities may exist even when one of the plants is not operational or when it is in a shutdown state. It also identified the potential for additional shared systems to have an inadequate number of available components or inadequate flow rates when one unit is shut down and the other unit is operating.
9.3 Research work at Maryland University, United States
At Maryland University, research on MUPSA identified many technical issues and challenges to be resolved. Four different methodologies are presented to quantify dependencies in multiunit PSA: combination, parametric, causal-based, and extension methodologies. Some of these methodologies are still being investigated. The combination method simply requires combining existing single-unit PSAs into a multiunit PSA; double counting of the common SSCs is avoided by representing them as one item in the multiunit PSA model. The parametric methods rely on one or several parameters that are related to a conditional probability for all units. Common cause failure events in single-unit PSAs traditionally adopt parametric methods. However, such parametric modeling may not be directly applicable to multiunit PSA. For example, during a single-unit reactor trip, the supporting systems for that unit will be called upon while the other units' systems usually continue in normal operation. The causal-based method would require that all events be mapped back to a root problem, whether it is a physical failure or an organizational deficiency. This method basically integrates the probabilistic physics of failure
approach to establish the causal relationship. The possibility of building a Bayesian belief network (BBN) model to create a causal chain of dependencies of mechanical or thermal loads induced by an external event in multiple units is being studied. The extension method would only require some existing portions of the PSA to be developed further. There is also a special case of the extension method, which uses existing methodologies for external events and applies them to a broader subset of events.
9.4 Korea Atomic Energy Research Institute
Korea has four multiunit sites, and each site has more than six reactors of various types. MUPSA research activities are being carried out by the Korea Atomic Energy Research Institute (KAERI). The main objective of the MUPSA research is to develop and validate software tools for MUPSA. Though the software tool was developed for a site with six reactor units, it can be used for a site with as many as 10 reactor units. Site risk was assessed by considering five initiating events: multiunit loss of offsite power, multiunit loss of ultimate heat sink, multiunit seismic events, multiunit tsunami events, and the simultaneous occurrence of independent single-unit initiators at a multiple unit site. The MUPSA project made several assumptions:
– All six units at a site are identical, with the exception of the diesel generators.
– The interunit seismic correlation is the same for all pairs of units, and the distance between each pair of units is ignored.
– Hook-up or additional safety measures implemented after Fukushima are not credited.
– Adverse effects of core damage or release in one unit on the other units are not considered.
– All six units are in full power operation. Other modes, such as low-power and shutdown, were considered only for the multiunit loss of offsite power. Spent fuel pools are not included.
– The human error probability for human failure events is the same across all units.
– A multiunit initiating event challenges all six units simultaneously with the same impact, such as the same peak ground acceleration in seismic PSA, the same impact of flooding due to tsunami, etc.
The steps followed for MUPSA are summarized below:
Step 1: define site risk and multiunit initiators
Step 2: estimate the multiunit initiating event (MUIE) frequency
Step 3: develop the logic for the MUPSA model
Step 4: develop the individual unit models and integrate them into the top logic
Step 5: model interunit dependencies
Step 6: quantify the accident sequence frequencies
Step 7: extend the Level 1 multiunit scenarios to Level 2 scenarios
Step 8: assess the consequences of the accident scenarios
The nuclear regulatory authority in Korea and the Korea Institute of Nuclear Safety are in the process of developing site-level risk metrics.
9.4.1 MUPSA software
KAERI developed two quantitative approaches for multiunit PSA: a minimal cut set (MCS) approach and an MCS with Monte Carlo approach [18] (Fig. 9.2). To overcome the difficulty of managing the huge number of cut sets generated for multiple units, an approach was developed to combine the sequences of each unit in the form of a fault tree and quantify the multiunit scenarios. As a supplement to the MCS approach, the Monte Carlo technique was adopted for cut set evaluation using a dagger sampling technique. KAERI has also developed a suite of software for MCS generation (FTREX), fault tree evaluation (FTeMC), and quantification of multiunit PSA scenarios (SiTER). Further to this, KAERI extended the software development activity to multiunit Level 2 PSA. The software CONPAS takes the frequencies of the Level 1 PSA sequences as input, and the plant damage states are converted to evaluate the frequencies of the source term categories for Level 2 PSA. KAERI has also been developing an online PSA tool called the on-line consolidator and evaluator of all mode risk for nuclear system (OCEANS). OCEANS is a platform for risk assessment of all power modes and all hazards and integrates them with event tree (ET) and fault tree (FT) analysis. The integrated approach of OCEANS presently addresses single-unit PSA and is being enhanced to include Level 2 and Level 3 PSAs, fire PSA, seismic PSA, and shutdown PSA. Since the Fukushima accident, efforts are on to convert OCEANS for multiunit PSA and build a more holistic risk-informed, performance-based framework.
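To convey the flavor of the Monte Carlo route, the toy sketch below samples a multiunit core damage scenario for a six-unit site. It is not the KAERI implementation, all probabilities are hypothetical, and it deliberately uses brute-force sampling, which is exactly what variance-reduction schemes such as dagger sampling are meant to improve upon for realistically rare probabilities.

```python
# Editor's toy sketch: given a multiunit initiating event, each unit is damaged either
# through a shared cause or independently; the fraction of trials with two or more
# damaged units approximates the conditional multiunit core damage probability.
import random

def conditional_multiunit_cdp(n_units=6, p_ccdp=1e-2, p_shared=1e-3, trials=200_000):
    hits = 0
    for _ in range(trials):
        shared = random.random() < p_shared                  # e.g., shared support system lost
        damaged = sum(shared or random.random() < p_ccdp for _ in range(n_units))
        if damaged >= 2:
            hits += 1
    return hits / trials

random.seed(1)
ie_freq = 1e-2                                               # multiunit initiating event frequency (/site-yr)
print("multiunit CDF ~", ie_freq * conditional_multiunit_cdp(), "per site-year")
```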
9.5 CANDU Owners Group The approach followed to date at the Canadian utilities has been to prepare separate PSAs for different hazards. Given the current CANDU PSA practice, the core damage frequency (CDF) and large release frequency
Figure 9.2 MCS approach for multiunit PSA (from: NED, vol. 50(8), 2018).
(LRF) at the site level cannot be derived through a simple multiplication by the number of units in the station. Current CANDU PSAs are conducted on an individual representative unit where the multiunit effects are duly considered. These include the common mode initiating events that can affect all the units (e.g., the impact of loss of offsite power on the reliability of common mitigating systems), as well as events in adjacent units leading to a harsh environment (e.g., secondary side steam line breaks and feedwater line breaks in an adjacent unit). Therefore, simply multiplying the unit CDF/LRF by the number of units would give an overestimated result in which some of the accident sequences are double counted. CNSC staff is following up with the Canadian industry, through the CANDU Owners Group (COG) risk and reliability group (RRG), on the development of a methodology for whole-site PSA as well as on other topics of interest in the PSA area.
- The Level 1 at-power internal events PSA is generally prepared first and is used as the "starting point" to develop the PSAs for other hazards. It is also the input to the Level 2 at-power internal events PSA, which contains the detailed modeling of severe accident progression and containment system response, and is used to calculate the frequencies of small or large releases.
- The Level 1 outage internal events PSA is generally similar to the Level 1 at-power internal events PSA, but reflects initiating events and mitigating functions that are appropriate given that the selected reference model is in the guaranteed shutdown state with the heat transport system initially cold.
- For external hazards, screening assessments are performed to screen out hazards on the basis of distance or probability (e.g., external flooding) and to identify which hazards should be considered in detailed PSAs.
- An internal fire PSA models the response of the selected at-power model unit following fires initiated by in-plant sources, for example, electrical equipment.
- An internal flooding PSA models the response of the selected at-power model unit following floods originating from water sources internal to the plant, for example, domestic water or service water pipe breaks.
- The seismic PSA models and estimates the risk of severe core damage for the selected at-power model unit following a common mode seismic event.
• The high wind PSA models and estimates the risk of severe core damage for the selected at-power model unit following a high wind event.
The PSAs focus on the radiological risk associated with damage to fuel located in the reactor core. Generally, risks associated with other radiological hazards (e.g., new fuel, irradiated fuel storage, airborne tritium) and risks associated with other nonradiological hazards (e.g., chemical exposure) are not assessed in the Canadian PSAs, but are covered to some extent in separate deterministic analyses. It should also be noted that environmental assessments are other forms of radiological assessment that complement PSA.
In recent years, the term "whole-site risk" has come into use, meaning the overall risk of the site due to multiple reactor units, other on-site sources of radioactivity such as spent fuel pools and fuel dry storage facilities, internal and external hazards, and other reactor operating modes. The key issues associated with whole-site PSA are the lack of international experience and consensus on the methodology, the procedure for risk aggregation across different hazards, and the regulatory criteria for a site-based PSA.
9.5.1 Proposed site safety goals
Two site-based safety goals for a nuclear power plant (NPP), with a focus on nuclear accidents and their potential impacts on the local population, are defined: a severe core damage frequency (SCDF) goal and a large off-site release safety goal.
9.5.2 Site CDF
The aggregate of the frequencies of all event sequences that can lead to significant core degradation in any one or more reactors on the site should be less than Z_SCD per site-year.
9.5.3 Large off-site release safety goal
The aggregate of the frequencies, LRF, of all event sequences that can lead to a total release from the site to the environment of more than X (becquerel) of Y (radionuclide) should be less than Z_LRF per site-year, where it is confirmed that smaller releases in terms of X and Y in accident source terms representative of other event sequences do not require extensive long-term relocation of the local population.
The SCDF is aggregated for all reactor units sharing a common containment and is expressed in units of "occurrences of severe core damage per
site-yr." The large release goal should be calculated on a site-wide basis, including contributions from all units and radiological sources on the site with the potential to exceed the safety goal, and is expressed as "occurrences of large release per site-yr."
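As a rough illustration of how these two site goals might be exercised, the sketch below aggregates per-sequence frequencies into site-level SCDF and LRF values and compares them with goal values Z_SCD and Z_LRF. Because the source leaves X, Y, Z_SCD, and Z_LRF unspecified, every number in the example is a hypothetical placeholder.

```python
from typing import Iterable

def aggregate(sequence_frequencies: Iterable[float]) -> float:
    """Aggregate per-sequence frequencies (per site-year) into one site-level metric."""
    return sum(sequence_frequencies)

def check_site_goals(scdf_sequences, lrf_sequences, z_scd, z_lrf):
    """Compare aggregated site SCDF and LRF against the goals Z_SCD and Z_LRF."""
    scdf = aggregate(scdf_sequences)   # occurrences of severe core damage per site-yr
    lrf = aggregate(lrf_sequences)     # occurrences of large release per site-yr
    return {
        "SCDF": scdf,
        "LRF": lrf,
        "SCDF goal met": scdf < z_scd,
        "LRF goal met": lrf < z_lrf,
    }

# Hypothetical sequence frequencies and placeholder goal values (not from the source)
print(check_site_goals(
    scdf_sequences=[2e-6, 8e-7, 3e-7],
    lrf_sequences=[1e-7, 4e-8],
    z_scd=1e-5,
    z_lrf=1e-6,
))
```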
9.6 Multiunit PSA studies at EDF France
Since 1990, loss of the heat sink and total loss of power supply in single units have been the two major events studied in French PSAs. As in many other countries, these single-unit PSAs included some site aspects such as shared site resources and mitigations. Further, in the IRSN PSA it is assumed that natural hazards always affect all the units at a site. Other assumptions include limited availability of water resources for secondary cooling, that no more than one unit can simultaneously use a common diesel generator (DG) or other backup devices, and that human resources are not available to two units at the same time. In addition, IRSN introduced modeling improvements for MUPSA. The study proposes possible solutions and methodological options for switching from a Level 1 unit PSA model to a site-level model that takes into account the multiunit dependencies, from initiating events and inter-unit CCF to shared design or operational weaknesses of the units and human factors. IRSN recommended that EDF perform a MUPSA and consider the latest developments, including the post-Fukushima mitigations. A comprehensive approach for computing the site-level risk from an upgraded PSA model (with inter-unit dependencies accounted for) is presented, with a case study of an EDF 900 MWe twin-unit site. The study also compares the risk values estimated from a reference PSA model (without the inter-unit dependencies) with those from the upgraded PSA model and argues that the latter is a realistic approach. A systematic guideline on the classification of initiating events, with typical examples, on the basis of their origin (internal events, internal and external hazards) is also presented. The study also addresses inter-unit common cause failures for systems that are identical in different units of the same site. The sharing of human resources across a site, with respect to repair times, mission durations, and success criteria, is also discussed. The main thrust of this study is on providing guidelines for converting the "usual" unit-level PSA into a site-level model by taking into account the possible dependencies between the units. The risk at the unit level and the corresponding risk at the site level from a reference PSA model and an upgraded PSA model are shown in Table 9.2.
Table 9.2 Site level risk.

PSA model                   | Single unit level risk | Site level risk (single unit risk × 2) | Site level risk (realistic risk)
Case 1: Reference PSA model | 1.67E-10               | 3.34E-10                               | –
Case 2: Upgraded PSA model  | 3.33E-10               | 6.66E-10                               | 6.63E-10
The important conclusions derived from this study are given below:
1. Doubling the individual unit risk of the reference Level 1 PSA model directly (1.67E-10 × 2 = 3.34E-10). This method is not recommended because the reference PSA model underestimates the unit risk, since it does not consider the existing dependencies between the two units.
2. Doubling the individual unit risk of the upgraded PSA model (3.33E-10 × 2 = 6.66E-10). Since the upgraded unit PSA model takes into account the existing dependencies and the "simultaneous" meltdown of both units, P(meltdown_A ∩ meltdown_B), doubling its result overestimates the site risk because P(meltdown_A ∩ meltdown_B) is counted twice.
3. Calculating a more realistic site risk (6.63E-10) by developing the upgraded PSA models for both units and using Eq. (9.1). This method gives a result slightly lower than that of the second method.

P(meltdown_site) = P(meltdown_A) + P(meltdown_B) − P(meltdown_A ∩ meltdown_B)    (9.1)
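The sketch below reproduces the three estimates from the Table 9.2 values. The joint-meltdown probability is not quoted directly in the source; it is inferred here from the difference between the doubled upgraded-unit risk and the realistic site risk (6.66E-10 − 6.63E-10 ≈ 3E-12), so it should be read as an assumption.

```python
# Values from Table 9.2; the joint-meltdown probability is inferred, not quoted.
reference_unit_risk = 1.67e-10   # Case 1: reference PSA model (no inter-unit dependencies)
upgraded_unit_risk = 3.33e-10    # Case 2: upgraded PSA model (dependencies included)
p_joint_meltdown = 3.0e-12       # assumed P(meltdown_A ∩ meltdown_B) ≈ 6.66E-10 − 6.63E-10

method_1 = 2 * reference_unit_risk               # underestimates the site risk
method_2 = 2 * upgraded_unit_risk                # double counts the joint term
method_3 = 2 * upgraded_unit_risk - p_joint_meltdown   # Eq. (9.1) for twin identical units

print(f"Method 1 (reference x 2): {method_1:.2e}")   # ~3.34e-10
print(f"Method 2 (upgraded  x 2): {method_2:.2e}")   # ~6.66e-10
print(f"Method 3 (Eq. 9.1)      : {method_3:.2e}")   # ~6.63e-10
```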
9.7 Fukushima Daiichi experience
The Fukushima Daiichi site, comprising six reactor units, was inundated by the tsunami caused by the earthquake. Units 1, 2, and 3 experienced core damage and containment breach, resulting in large releases of radioactive material. Core damage at Unit 4 was prevented because the core had been offloaded into the spent fuel pool. Core damage at Units 5 and 6 was prevented because one Emergency Diesel Generator (EDG) was safeguarded from flooding by prompt operator action. Four hydrogen explosions also occurred, causing damage in the reactor buildings of Units 1, 3, and 4 and the wet well of Unit 2. The accident resulted in a significant release of radioactivity into the public domain, including to the sea, and emergency measures such as the evacuation of people in the vicinity of the NPP and restrictions on the consumption of certain food items had to be implemented. It took a large effort over several days before Units 1, 2, and 3 could be stabilized and the cooling of fuel in the storage pools could be restored. The accident was rated at Level 7, the highest level of the International Nuclear Event Scale. The main cause of the accident was the flood, which damaged the emergency switchgear
and EDGs located in the basements of the turbine buildings and resulted in station blackout in Units 1–4. The lessons learned from the accident include:
• Inadequate protection against tsunami.
• Location of EDGs and safety-related switchgear in basements.
• Inadequate protection of the plant against internal flooding. A proper internal flooding PSA could have identified the vulnerability of structures, systems and components (SSCs) during a flooding event. For example, a major turbine building flood due to internal piping failure could have produced similar consequences.
• Organizational deficiencies in clear-cut procedures to be followed during an accident.
• The human response under challenging conditions, which affects timely decision making, needs to be addressed in detail in the MUPSA study.
• Too many multiunit interactions, loss of infrastructure, and contamination posed huge challenges to the operators.
• Even a limited risk assessment study of station blackout events would have identified major deficiencies in procedures and accident management guidelines.
• An earlier tsunami PSA study, which indicated a CDF of the order of 1E-2 to 1E-3 per site-year, was ignored and not given due importance.
The Fukushima accident has shown that occasionally the magnitude of natural events can be higher than what is considered in the design. It is therefore prudent to make additional design provisions such that at least the basic safety functions of the NPPs are not impaired even under beyond design basis natural events or extreme events. Towards this, it is recommended that the parameters for each postulated extreme natural event be defined conservatively using the best available analytical methods. While design basis external events should govern the design of SSCs, the functionality of the most safety-relevant SSCs should still be maintained under extreme events. In spite of the conservative estimates of the design basis external events of natural origin, there is a residual risk of exceeding these estimates. While absolute quantification of beyond design basis events is not feasible, their probable magnitudes should still be defined for safety margin assessment.
9.8 MUPSA research in India
Two approaches are reported for integrated site risk assessment. One approach takes into account the age of the different NPPs at a site and proposes
a mathematical model to estimate the site core damage frequency. The model assumes that all the units at a site are independent, with no shared system interactions between them, even though a common factor for external events is considered. The CDF for a unit is assumed to be an equivalent or average CDF over the operating years and is then treated as a constant failure process following an exponential distribution. The model adopts the geometric mean of the individual unit core damage frequencies (due to external events only) to represent a common cause event causing damage to two or more units.
Another approach, an extension of the schema proposed by Schroer and Modarres, models the shared system interactions between the units at the site and proposes a comprehensive risk assessment for a multiunit NPP site. Hazards that always affect multiple units are called definite hazards, and those that affect multiple units only under certain circumstances are called conditional hazards. Examples of definite external hazards (DEH), conditional external hazards (CEH), definite internal initiating events (DIIE), conditional internal initiating events (CIIE), and internal initiating events (IIE) are discussed. The proposed methodology accounts for most of the dependency classes and key issues applicable to a multiple-unit NPP site, such as initiating events, shared connections, cliff edge effects, identical components, proximity dependencies, mission times, and human dependencies. External events such as seismic and tsunami are found to have a high potential for multiunit risk; however, a good interface of the shared resources with the plant can reduce the multiunit site risk considerably. This method helps to identify structures, systems, and components important for safety in multiunit sites that would otherwise be overlooked when carrying out individual unit risk assessments. Further, quantification of risk with this methodology enables the regulatory authority to make risk-informed decisions in a realistic manner. The method is demonstrated with a case study. Though the main emphasis of multiunit safety is on external hazards, the proposed approach also includes risk from random internal events. The approach not only quantifies the frequency of multiple core damage for a multiunit site but also evaluates the site core damage frequency, which is the frequency of at least one core damage per site per year. A pictorial representation of the methodology is given in Fig. 9.3.

Figure 9.3 Schematic of method for multiunit risk assessment.

The integrated approach leads to the formulation of the site core damage frequency as follows:

SCDF = Σ_{i=1}^{n} Σ_{j=1}^{5} Σ_{k=1}^{m} CDF(i, j, k)    (9.2)

where
i denotes the number of simultaneous core damages,
n denotes the number of units at the site,
j denotes the category of hazard or event,
k denotes the type of hazard in the jth category, and
m denotes the total number of types of hazard in the jth category.
Therefore, CDF(i, j, k) denotes the frequency of i simultaneous core damages due to the jth category of hazard, type k, where
j = 1 refers to definite external hazards for the site,
j = 2 refers to conditional external hazards for the site,
j = 3 refers to definite internal events for the site,
j = 4 refers to conditional internal events for the site, and
j = 5 refers to internal independent events for all units.
The SCDF thus accounts for both single and multiple core damages occurring at the site.
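To make the bookkeeping behind Eq. (9.2) concrete, the sketch below sums contributions CDF(i, j, k) keyed by the number of simultaneous core damages i, the hazard category j, and the hazard type k. The entries are illustrative placeholders, not results from the study.

```python
# (i, j, k) -> frequency of i simultaneous core damages due to hazard type k of category j
# All values below are hypothetical placeholders (per site-year).
cdf_contributions = {
    (1, 5, 1): 4.0e-6,   # one core damage, independent internal event, hazard type 1
    (1, 2, 1): 6.0e-7,   # one core damage, conditional external hazard, hazard type 1
    (2, 1, 1): 2.0e-7,   # two simultaneous core damages, definite external hazard (e.g., seismic)
    (4, 1, 2): 5.0e-8,   # four simultaneous core damages, definite external hazard (e.g., tsunami)
}

def site_cdf(contributions: dict) -> float:
    """Eq. (9.2): sum of CDF(i, j, k) over all i, j, k present in the model."""
    return sum(contributions.values())

def multiunit_cdf(contributions: dict) -> float:
    """Frequency of two or more simultaneous core damages (i >= 2)."""
    return sum(f for (i, _j, _k), f in contributions.items() if i >= 2)

print("Site CDF            :", site_cdf(cdf_contributions))
print("Multiunit CDF (i>=2):", multiunit_cdf(cdf_contributions))
```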
9.9 MUPSA approach in the United Kingdom
In the United Kingdom, almost all licensed nuclear sites have more than one reactor unit, and therefore all sites are multiunit sites. The United Kingdom's nuclear program started with the twin-unit concept at a site; subsequently, sites with many reactor units and varied facilities came into existence. The regulatory system in the United Kingdom is goal-setting only and does not prescribe requirements in terms of limits, as stated in Section 5.7.1 of the UK nuclear regulation guide. PSA activities started in 1970 with the objective of including probabilistic evaluation as part of risk evaluation. During the early 1990s, when site safety was considered important, nuclear regulators in the United Kingdom dismissed the problem by stating that a risk from a multiunit site a few times higher than the risk from a single plant is still acceptable. During the revisions of the safety assessment principles in 2006 and in 2014 after Fukushima, the regulator expected that a safety case should consider the site with its multiple facilities as a whole and that PSA would be performed at least up to Level 2, including all external events that could lead to a significant offsite release. Methodology development for MUPSA is ongoing.
9.10 Site risk model development in Hungary
Nuclear safety regulations in Hungary require Level 1 and Level 2 PSA, and therefore a site Level 2 PSA is considered necessary for better characterization
of release magnitudes and release frequencies. The site risk assessment includes all release sources (including the spent fuel pool) and plant operating states (full power, low power, and shutdown states). As part of the MUPSA model development, a site risk assessment for the four units at the Paks site in Hungary was conducted. A preliminary MUPSA model of the loss of offsite power initiating event was constructed for Units 1 and 2 of the plant. The model study addressed interrelated aspects such as site-level risk metrics, site-level plant operational states with multiple sources of release, identification of multiunit initiating events, modeling of simultaneous multiunit or multisource accident sequences, and modeling of human reliability in multiunit or multisource accident scenarios. The site-level core damage frequency for a four-unit site like Paks is expressed as

SCDF = Σ_i CDF_i + Σ_{i<j} CDF_ij + Σ_{i<j<k} CDF_ijk + CDF_1234
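A minimal sketch of this four-unit summation is given below: it enumerates all combinations of units and sums their core damage frequency contributions, reading CDF_i as single-unit, CDF_ij as dual-unit, CDF_ijk as triple-unit, and CDF_1234 as quadruple-unit core damage frequencies. The per-combination values are hypothetical placeholders, not Paks results.

```python
from itertools import combinations

units = ("1", "2", "3", "4")

# Hypothetical per-combination core damage frequencies (per site-year);
# combinations not listed are taken as negligible.
cdf_terms = {
    ("1",): 3.0e-6, ("2",): 3.0e-6, ("3",): 2.5e-6, ("4",): 2.5e-6,  # CDF_i
    ("1", "2"): 4.0e-7, ("3", "4"): 4.0e-7,                          # CDF_ij
    ("1", "2", "3", "4"): 5.0e-8,                                    # CDF_1234 (e.g., seismic)
}

def site_cdf(terms: dict) -> float:
    """SCDF = sum of CDF_i + CDF_ij + CDF_ijk + CDF_1234 over all unit combinations."""
    total = 0.0
    for r in range(1, len(units) + 1):
        for combo in combinations(units, r):
            total += terms.get(combo, 0.0)
    return total

print(f"Four-unit site CDF: {site_cdf(cdf_terms):.2e} per site-year")
```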