Risk, Reliability and Safety Engineering
Sunil S. Chirayath · M. Sai Baba, Editors
Human Reliability Programs in Industries of National Importance for Safety and Security
Risk, Reliability and Safety Engineering

Series Editors:
Prabhakar V. Varde, Reactor Group, Bhabha Atomic Research Centre, Mumbai, Maharashtra, India
Ajit Kumar Verma, Western Norway University of Applied Sciences, Faculty of Engineering and Natural Sciences, Haugesund, Norway
Uday Kumar, Luleå University of Technology, Luleå, Sweden
In this era of globalization and competition, there is a conscious effort to ensure that, while reliability targets are met, the potential risk to society remains minimal and satisfies the acceptability criteria for long-term goals, including the sustainability of a given technology. The objective of reliability is not limited to customer satisfaction; it is also important for the design and operation of systems, products, and services while complying with risk metrics. Particularly for complex systems, such as power generation systems, process systems, transport systems, space systems, large banking and financial systems, and pharmaceutical systems, risk metrics become an overriding factor in designing and operating engineering systems, ensuring reliability not only for the mission phase but for the complete life cycle of the entity, so as to satisfy the criteria of sustainable systems. This book series on Risk, Reliability and Safety Engineering covers topics that deal with reliability and risk in the traditional, probabilistic sense, as well as science-based approaches such as physics-of-failure (PoF), fracture mechanics, prognostics and health management (PHM), dynamic probabilistic risk assessment, risk-informed and risk-based methods, special considerations for human factors and uncertainty, common cause failure, AI-based methods for design and operations, and data-driven or data-mining approaches to complex systems. Within the scope of the series are monographs, professional books or graduate textbooks, and edited volumes on the following topics:

• Physics of Failure approach to Reliability for Electronics
• Mechanics of Failure approach to Mechanical Systems
• Fracture Risk Assessment
• Condition Monitoring
• Risk-Based In-service Inspection
• Common Cause Failure
• Risk-based Audit
• Risk-informed Operations Management
• Reliability Centred Maintenance
• Human and Institutional Factors in Operations
• Human Reliability
• Reliability Data Analysis
• Prognostics and Health Management
• Risk-informed Approach
• Risk-based Approach
• Digital System Reliability
• Power Electronics Reliability
• Artificial Intelligence in Operations and Maintenance
• Dynamic Probabilistic Risk Assessment
• Uncertainty
• Aging Assessment & Management
• Risk and Reliability Standards and Codes
• Industrial Safety
Potential authors who wish to submit a book proposal should contact: Priya Vyas, Editor, e-mail: [email protected]
Editors

Sunil S. Chirayath, Center for Nuclear Security Science and Policy Initiatives (NSSPI), Texas A&M University, College Station, TX, USA

M. Sai Baba, National Institute of Advanced Studies, Bengaluru, India; M. S. Ramaiah University of Applied Sciences, Bengaluru, India
ISSN 2731-7811; ISSN 2731-782X (electronic)
Risk, Reliability and Safety Engineering
ISBN 978-981-99-5004-1; ISBN 978-981-99-5005-8 (eBook)
https://doi.org/10.1007/978-981-99-5005-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
The safe and reliable operation of advanced technologies depends both on technological advancements and on the well-trained individuals who operate them. While safety is a technical matter, it also depends on the operator’s response to the situations that emerge while operating a plant: when humans confront such situations while handling the technology, their response shapes the outcome. Human Reliability Analysis (HRA) studies focus on accounting for the human response to these situations. HRA is often used in conjunction with Probabilistic Risk Analysis (PRA) to identify weaknesses that could lead to severe accidents, so that solutions can be formulated at the design stage of large systems to minimize risk. Whereas HRA focuses on incorporating changes in the machine or processes to reduce human error, the Human Reliability Program (HRP), or its equivalent, focuses on humans to minimize the error. During the operations stage of advanced technological systems, an HRP or its equivalent can help ensure that the personnel who occupy positions with access to critical assets, operations, or sites meet the highest standards. Human reliability in the context of HRP refers to the consistency of personnel in performing their assigned tasks according to a protocol. HRP, also known as a trustworthiness program or fitness-for-duty program, is being implemented in several industries in some fashion or other, and most implementations share some basic similarities. HRP pertains to both security and safety. It is suited to minimizing not only unintentional human errors caused by operations personnel’s lack of physical and mental stability but also intentional insider actions resulting from ideological, economic, political, or personal motivations. HRP differs in many respects from HRA, and we realized there was a gap in the availability of scholarly literature in the domain of HRP compared to HRA. Hence, a group of Indian and US subject matter experts (SMEs) consisting of regulators and practitioners from critical industries, researchers from academia, and personnel from institutions that support policymaking met multiple times in India to discuss various components of HRA and HRP. The discussion material provided by these SMEs formed the basis for the chapters presented in this book. These discussions were facilitated through a collaborative research program between the National Institute
of Advanced Studies (NIAS) and the Texas A&M University Center for Nuclear Security Science and Policy Initiatives (TAMU-NSSPI). In the first section, the authors define the elements of HRP, giving examples of how it is implemented in the US and India and presenting perspectives from industry, government, and academia. The second section presents academic studies that address the latest research into human factors, along with papers that show how universities and other educational institutions can play a role in fostering human reliability through training and education. The final section includes articles from practitioners that offer perspectives on how HRP factors into the regular operation of various industries. Given that each critical industry has its own vertical structure in its operation, the experts concluded that there is scope for horizontal (inter-institutional) sharing of lessons learned in human performance. We believe this edited volume will be a valuable resource for students and researchers in the field.

College Station, USA
Prof. Sunil S. Chirayath

Bengaluru, India
Prof. M. Sai Baba

August 2023
Contents

1 Introduction (Sunil S. Chirayath and Magapu Sai Baba)

Part I Fundamentals of a Human Reliability Program

2 An Overview of Human Reliability Program (Gerhard R. Eisele)
3 Road to a Sustainable Human Reliability Program and Implementation Strategy (Sunil S. Chirayath and Oscar E. Acuna)
4 The Significance of Human and Organizational Aspects in Ensuring Safety in High Hazard Installations (Dinesh Kumar Shukla)
5 Vulnerability of Human Reliability Aspects—Impact on Crucial Industries and Services (Natesan Ramamoorthy)
6 Psychological Assessments, Employee Ethics and Behavioural Observation Programmes—Important Components of a Robust Human Reliability Programme (Joseph R. Stainback IV)
7 Insider Threats and Strategies to Manage Insider Risk (Sunil S. Chirayath)
8 Insider Threat Case Studies from Industry (Kelley H. Ragusa and Sunil S. Chirayath)
9 Human Reliability Programme in Industries of National Importance (Magapu Sai Baba)

Part II Perspectives on Human Reliability Programs from Academia

10 Relevance of Human Reliability Program: The Role of Academic Institutions (Reshmi Kazi)
11 Steps Toward a Human Reliability Engagement with a Grounding in Human Factors (Vivek Kant and Sanjram Premjit Khanganba)
12 Human Reliability Analysis in PSA and Resilience Engineering: Issues and Concerns (Vipul Garg, Mahendra Prasad, and Gopika Vinod)
13 Human Reliability: Cognitive Bias in People—System Interface (Manas K. Mandal and Anirban Mandal)
14 A System Theoretic Framework for Modeling Human Attributes, Human–Machine-Interface, and Cybernetics: A Safety Paradigm in Large Industries and Projects (Kallol Roy)
15 Human Reliability and Technology Disruption (Ajey Lele)
16 Human Reliability Design—An Approach for Nuclear Power Plants in India (Amal Xavier Raj)

Part III Human Reliability Program in Practice

17 Management and Human Reliability: Human Factors as Linchpin of Institutions and Organizations (Karanam L. Ramakumar)
18 Safety and Security in Nuclear Industry—Importance of Good Practices and Human Reliability (Gorur R. Srinivasan)
19 Human Performance Excellence in the Nuclear Power Corporation of India Limited (Ravi Satyanarayana)
20 HRA Techniques Used in Nuclear Power Plants (C. Senthil Kumar)
21 HRP in the Aviation Sector (Kishan K. Nohwar)
22 Myriad Ways in Which Things Could Have Gone Wrong but Did Not (Dinesh Srivastava)
23 Human Reliability in Launch Vehicle Systems (Rajaram Nagappa)
24 Improvement of Human Reliability & Human Organisational Factors at NPPs (Natarajan Kanagalakshmi)
About the Editors
Prof. Sunil S. Chirayath has been Director of the Center for Nuclear Security Science and Policy Initiatives (NSSPI) since June 2015 and serves as Professor in the Department of Nuclear Engineering at Texas A&M University. He has more than 30 years of experience in the nuclear science and engineering fields and has worked in various capacities in academia and industry, including serving as a scientific staff member of the Indian Atomic Energy Regulatory Board for 18 years. He has more than 200 technical publications in refereed journals and national and international conference proceedings. Prof. Chirayath conducts research in nuclear security and non-proliferation. He teaches courses on the nuclear fuel cycle, nuclear material safeguards, nuclear security, and Monte Carlo radiation transport. His research interests include the development of safeguards approaches for advanced reactors, proliferation resistance analysis of nuclear fuel cycle facilities, nuclear forensics, and nuclear security insider threat analysis.

Prof. M. Sai Baba is an Outstanding Scientist and formerly Director of the Resources Management Group at the Indira Gandhi Centre for Atomic Research (IGCAR), Kalpakkam, and Senior Professor at the Homi Bhabha National Institute. He held the “TV Raman Pai Chair” Professorship at the National Institute of Advanced Studies, Bengaluru, where he is a Visiting Professor. Currently, he is with M. S. Ramaiah University of Applied Sciences as Professor and Director of the Centre for Professional Development and Training. He works in the domains of science and risk communication, human reliability programs, and understanding ancient Indian knowledge systems in order to apply them to the holistic development of youth. Prof. M. Sai Baba is an accomplished researcher, scientific administrator, and institution builder. He has made significant contributions to implementing several high-impact activities relevant to the Department of Atomic Energy.
Chapter 1
Introduction
Sunil S. Chirayath and Magapu Sai Baba
Human Reliability (a term used to describe human performance) is a critical consideration in fields requiring high standards of safety, such as the aviation, petroleum, chemical, and nuclear industries. Human behavior poses an inherent risk and can introduce errors into the operation of a system or process. James Reason, drawing on studies of US Navy nuclear aircraft carriers, nuclear power plants, and air traffic control centers, argues that, due to the potential consequences of failure in their industries, “high reliability” organizations must be preoccupied with errors and expect that human errors will occur so that they can prepare to mitigate them or train to overcome them (Reason 2000). Furthermore, human factors can either positively or negatively affect performance in the workplace, and an organization that is aware of the importance of human factors and has systems in place to ensure the reliability of its staff will be better able to deal with critical situations that may arise. Although human errors can be minimized through technology and education or training/retraining programs, there are also some human actions (such as insider actions) that could be intentional and could compromise safety and security at the workplace due to ideological, economic, political, or personal motivations. An organization must, therefore, also be equipped to detect and respond to potential security threats from within. A Human Reliability Program (HRP) introduces a set of measures intended to ensure that individuals who occupy positions with access to critical assets/operations/sites meet the highest standards. These standards include adherence to safety and
security rules and regulations (reliability), high levels of moral character (trustworthiness), and physical and mental stability. There are myriad ways to implement an HRP in industries; however, most programs will share some basic similarities. Human reliability in the context of HRP refers to the consistency of personnel in performing their assigned tasks according to a protocol (Shaughnessy et al 2000). HRP is different in many respects from Human Reliability Assessment (HRA). HRA is a method to estimate human reliability (Swain and Guttmann 1983) and predict potential human error by identifying the types of errors that can occur, determining when these errors can occur, and reducing the likelihood of their occurrence (Hollnagel 2005). In contrast, HRP is a set of procedures, protocols, and the corresponding performance of activities, including those by personnel, to support and sustain the secure and safe operation of a facility. The main objective of an HRP designed for a critical facility is to ensure the availability of trustworthy employees in order to avoid undesired incidents and consequences. One of the important components of HRP is a robust system of Personnel Access Control (PAC) to critical assets and information (Hollnagel 2005). Flin et al. have recommended that personnel in high-risk industries frequently consult the operating manual before taking any action (Flin et al 2008). Hence, rule-based (protocol-based) decision-making is envisaged in high-risk industries to avoid mishaps. Enya and Dempsey reported that employees in high-risk industries work in an environment where they must operate nearly error- and accident-free (Enya and Dempsey 2018). To keep high-risk industries accident-free, human error must be minimized through robust training programs and systems design. Protocols, an essential part of an HRP, developed at the organizational level and implemented with the support of occupational psychologists, result in the minimization of accidents (Flin et al 2008). HRP protocols are effective for curbing unintentional human errors, but there are also some human actions that are intentional (malicious) and intended to compromise safety and security at the workplace. When such malicious acts are committed by personnel within the facility, they are referred to as an insider threat. The Centre for the Protection of National Infrastructure has defined insiders as personnel exploiting their legitimate access, knowledge, and authority for unauthorized purposes (Centre for the Protection of National Infrastructure 2021), and malicious actions by insiders could result in large-scale catastrophes. The importance of HRP lies in this context: to minimize, if not prevent, such threats in the workplace. HRP is meant for individuals who occupy positions with access to critical assets, operations, sites, and information, and who must meet the highest standards of adherence to safety and security rules and regulations in their workplace. HRP components are planned to ensure confidence in individuals based on their character, including their physical and mental stability. In the context of HRP, note that safety features are implemented to protect against random, unwanted incidents, whereas security features are built to prevent and protect against deliberately planned malicious acts. Safety has also been described as the condition of being protected from random incidents that may cause danger, risk, or injury, while security is being free from threat (Pearsall and Hanks 2001).
The underlying feature of safety and security is to protect assets (Albrechtsen 2003). Components of HRP
fit into the safety and security framework, specifically in ensuring the availability of trustworthy individuals at the workplace who can be assigned to positions with access to critical assets, operations, sites, and information. Components of an HRP aid in identifying employees whose judgement is impaired and in protecting the facility from the consequences, thereby ensuring the safety and security of the facility. Employees go through HRP orientation and training before starting an HRP assignment, not only to make them aware of their responsibilities but also to inform them of the various services (for example, Employee Assistance and Wellness Service Programs) available to them for any financial, emotional, and health assistance. Furthermore, any HRP information is treated as confidential. This aspect of confidentiality makes employees more forthcoming about any problems they might be facing and promotes a positive work environment. Oak Ridge National Laboratory (Coates and Eisele 2014) has reported that major safety-related accidents, such as those at the Three Mile Island and Chernobyl nuclear reactors, could have been averted if an HRP had been in place.

A collaborative research program was undertaken by the Texas A&M University Center for Nuclear Security Science and Policy Initiatives (NSSPI) and the National Institute of Advanced Studies (NIAS). As part of this project, they jointly conducted a series of five meetings to discuss “Human Reliability Program (HRP) in Industries of National Importance” on the campus of NIAS in Bengaluru, India and virtually. The main objective of the meetings was to discuss various elements of HRP by bringing Indian and U.S. subject matter experts together to identify good practices in safety and security with respect to HRP and its implementation challenges, with the goal of strengthening HRP activities within these industries of national importance in India (Sai Baba and Chirayath 2021). The discussion meetings were multi-disciplinary in nature due to the presence of Indian and U.S. subject matter experts from academia, the nuclear industry, and nuclear research centers, as well as from think-tank-type organizations that advise industries of national importance such as aviation and defense. A majority of the participants had experience in at least one of the elements of HRP; however, it was obvious that even though some of the HRP elements are being implemented within both the aviation and nuclear industries of India, the programs are not using the same structure as in the U.S. (Sai Baba and Chirayath 2021). The discussions showed that HRP is being implemented in some fashion in the aerospace, airline, defense, biotechnology, Nuclear Power Corporation, National Thermal Power Corporation, and chemical industries in India. Given the fact that each critical industry has its own vertical structure in the way it operates, the experts concluded that there is scope for horizontal (inter-institutional) sharing of lessons learned in human performance. The data on human performance from the various industries could be systematically collected and discussed domestically among experts, which would also shift the focus from the nuclear industry alone to the critical industries of national importance as a whole. They also emphasized that academic and basic research institutions should work together with the
government and industry to provide both research into human factors in human reliability analysis and training for HRP. The papers that follow are based on selected presentations given during the series of meetings on HRP. In the first section, the authors define the elements of HRP, giving examples of how it is implemented in the U.S. and in India and presenting perspectives from industry, government, and academia. This section begins with a paper by Dr. Gerhard R. Eisele, who was intimately involved in the shaping of the U.S. HRP during his time as Director of the Center for Human Reliability Studies at Oak Ridge Associated Universities. He presents the structure of the HRP developed by the U.S. Department of Energy that is largely applied to nuclear installations in the U.S. as an example of a formalized HRP currently in place. The paper by Dr. Sunil Chirayath and Dr. Oscar Acuna of Texas A&M University that follows provides a roadmap for developing a successful HRP that includes recommendations for managerial structures and administration for HRP. Mr. Dinesh Kumar Shukla, who previously served as executive director of the Indian Atomic Energy Regulatory Board (AERB) and the chairman of Safety Review Committee for Operating Plants (SARCOP), gives his perspective on the importance of human and organizational aspects of safety and security and emphasizes the limitations of purely technical means of eliminating error. Dr. Natesan Ramamoorthy of NIAS presents a series of case studies and outlines the human factor features that can pose a risk to critical industries. The paper by Dr. Joseph Stainback of the University of Tennessee outlines the basics of psychological testing and behavior observation programs used in the U.S. as part of their HRP. The paper by Dr. Sunil Chirayath of Texas A&M emphasizes the grave threat posed by insiders in critical industries and describes the insider motivations—such as money, ego, coercion, and ideology—that should be addressed in any HRP through a series of preventative measures. This paper is supported by a review of insider threat case studies from industry—nuclear, aerospace, information technology, etc.—by Ms. Kelley Ragusa and Dr. Sunil Chirayath. Finally, Dr. M. Sai Baba’s paper illustrates the importance of HRP in a number of different industries of national importance to reduce the number and the impact of incidents and accidents in high hazard installations. The second section presents academic studies that address the latest research into human factors, as well as papers that show how universities and other educational institutions can play a role in fostering human reliability through training and education. This latter point is the theme of the first paper in this section by Dr. Reshmi Kazi of the Nelson Mandela Centre for Peace and Conflict Resolution at the Jamia Millia Islamia University. She explains how academic institutions can serve not only as training centers, but also as organizers of multi-disciplinary discussions of HRP between industry, academics, and other interested parties. Dr. Vivek Kant and Dr. Sanjram Khanganba then discuss human factors as an academic discipline and explain the phases of development that have occurred in our understanding of human factors and how knowledge from this field could be applied to the development of HRP. The paper by Mr. Vipul Garg, Dr. Mahendra Prasad, and Dr. 
Gopika Vinod of the Reactor Safety Division in the Bhabha Atomic Research Center (BARC) analyzes the Probabilistic Safety Assessment process in the nuclear industry and the Human Reliability Analysis methods that are a part of the industry’s standard safety risk analysis. Dr.
Manas Mandal of the Indian Institute of Technology, Kharagpur and Dr. Anirban Mandal of the University of Dayton, Ohio give an overview of human cognition and the cognitive biases involved in behavior that factor into human-to-machine interactions. Dr. Kallol Roy, formerly of Bharatiya Nabhikiya Vidyut Nigam Limited (BHAVINI) in the Department of Atomic Energy, picks up the human–machine-interface discussion and proposes models to quantitatively assess human performance parameters for the different stages of a large infrastructure project like the commissioning of a nuclear power plant. Dr. Ajey Lele of the Institute for Defence Studies and Analyses argues that in our current era of disruptive technology, there is a need to identify possible positive and negative behavioral aspects of humans and recognize that automation is ultimately made by and for humans. Dr. Amal Xavier Raj of NIAS concludes the group of papers from academia with a study of how human reliability design concepts could be applied to nuclear power plants in India. The final section includes papers from practitioners that show how HRP factors into the normal operation of various industries. Dr. Karanam L. Ramakumar, former head of the Nuclear Controls and Planning Wing of the Indian Department of Atomic Energy, highlights the role of management in assuring human reliability in an organization through numerous examples, including that of an analytical laboratory where quality control is paramount and human reliability has a real impact. Mr. G. R. Srinivasan, former vice chairman of the Atomic Energy Regulatory Board and former director of Projects for the Nuclear Power Corporation, discusses the importance of “cradle-to-grave” attention to human reliability, as well as engaging in good practices in the nuclear industry that could reduce the influence of human factors, including good safety and security cultures, reducing the need for human intervention, having procedures in place, practicing good design, etc. Mr. Ravi Satyanarayana, former site director of the Kaiga Site of the Nuclear Power Corporation of India Limited (NPCIL), elaborates on the policy and procedural aspects of human performance improvement initiatives at NPCIL and gives both a rationale for these initiatives and a list of good practices adopted to address human reliability. Dr. C. Senthil Kumar of the AERB-Southern Regional Regulatory Center then discusses some of the human reliability analysis (HRA) models used in the risk assessment process in nuclear power plants, with special emphasis on human cognitive reliability models. Retired Air Marshal Kishan K. Nohwar of the Indian Air Force describes the HRP that has been adopted by individual airlines like Air India and argues that many aviation disasters could have been avoided with better HRP in place. Dr. Dinesh Srivastava, former Director of the Variable Energy Cyclotron Center, details how human reliability works in practice at the Center and how it has contributed to over five decades of operation without serious accidents or incidents. Dr. Rajaram Nagappa of NIAS then provides a perspective on human reliability analysis in the satellite industry, where small human errors or departures from the required procedures can result in mission failures. Ms.
Natarajan Kanagalakshmi, who serves as engineer-in-charge of fire and industrial safety at Bharatiya Nabhikiya Vidyut Nigam Limited (BHAVINI), describes a cognitive-based model for fast diagnosis, decision-making, and action for crew performance at nuclear power plants and suggests practical measures based on human factors research
that could be implemented by decision makers and line managers to improve human reliability. The majority of the papers in this volume were presented in the initial meetings of the group to discuss HRP and served as a basis from which further discussions grew.
References

Albrechtsen E (2003) Security vs safety. Department of Industrial Economics and Technology Management, Norwegian University of Science and Technology. [Online]. https://www.iot.ntnu.no/users/albrecht/rapporter/notat%20safety%20v%20security.pdf

Centre for the Protection of National Infrastructure (2021) CPNI collection of guidance documents, 25 May 2021. [Online]. https://www.cpni.gov.uk/reducing-insider-risk. Accessed 9 Sep 2021

Coates CW, Eisele GR (2014) Human reliability implementation guide. Oak Ridge National Laboratory

Enya MP, Dempsey S (2018) A systematic review on high reliability organizational theory as a safety management strategy in construction. Safety 4(1):6

Flin R, O’Connor P, Crichton M (2008) Safety at the sharp end: a guide to non-technical skills. Ashgate Publishing, Farnham, UK

Hollnagel E (2005) Human reliability assessment in context. Nucl Eng Technol 37(2):159–166

Pearsall J, Hanks P (2001) The new Oxford dictionary of English. Clarendon Press, Oxford, UK

Reason J (2000) Human error: models and management. BMJ 320(7237):768–770

Sai Baba M, Chirayath S (2021) Summary of the meetings on human reliability program in industries of national importance jointly organized by NIAS, India and Texas A&M University, USA. [Online]. https://nsspi.tamu.edu/summary-of-the-meetings-on-human-reliability-program-in-industries-of-national-importance-jointly-organized-by-nias-india-and-texas-am-university-usa-77305/

Shaughnessy J, Zechmeister E, Zechmeister J (2000) Research methods in psychology. McGraw-Hill, New York

Swain AD, Guttmann HE (1983) Handbook of human reliability analysis with emphasis on nuclear power plant applications. NUREG/CR-1278, August 1983. [Online]. https://www.nrc.gov/docs/ML0712/ML071210299.pdf
Part I
Fundamentals of a Human Reliability Program
Chapter 2
An Overview of Human Reliability Program
Gerhard R. Eisele
2.1 What is the HRP?

The Human Reliability Program (HRP) is a security and safety reliability program designed to ensure that individuals who occupy positions affording access to certain materials, facilities, and programs meet the highest standards of reliability and physical and mental suitability. This objective is accomplished through a system of continuous evaluation that identifies individuals whose judgment and reliability may be impaired by physical or mental/personality disorders, alcohol abuse, use of illegal drugs, the abuse of legal drugs or other substances, or any other condition or circumstance that may be of a security or safety concern.
2.2 Who Must Be HRP-Certified?

HRP certification is required for individuals assigned to or applying for a position that allows an individual to have access to sensitive materials (nuclear and non-nuclear), facilities, and programs.¹ An HRP position affords the potential to significantly affect national security or cause unacceptable damage to a facility, institution, or program.

¹ Information provided herein is from the United States Department of Energy (DOE) Human Reliability Program (HRP). For a complete glossary of the terms used in this chapter, see the glossary in the DOE Human Reliability Program Handbook (US Department of Energy).

Before such
nomination, the Manager or the HRP management official must analyze the risks the position poses for the particular operational program. If the analysis shows that more restrictive physical, administrative, or other controls could be implemented that would prevent the position from being designated an HRP position, those controls will be implemented if practical.
2.3 What Are the HRP Requirements?

A security clearance. A security clearance is granted by a governmental entity to have access to classified/sensitive information, materials, and facilities on a need-to-know basis.

Signed releases, acknowledgments, and waivers. You must review and sign documents to facilitate the collection and dissemination of information and the performance of medical assessments and drug and alcohol testing.

Completion of HRP instruction. HRP instruction must be completed for initial certification and annual recertification. The instruction includes the following elements:

• Objectives of the HRP
• The role and responsibilities of each HRP-certified individual, including:
  – Recognizing and responding to behavioral change and aberrant or unusual behavior that may result in a risk to national security or nuclear security
  – Recognizing and reporting nuclear security concerns
  – Reporting prescription drug use
• Requirements for returning to work after sick leave
• The HRP continuous behavioral observation evaluation process

Counterintelligence (CI) evaluation. Individuals who occupy certain HRP positions may be required to successfully complete a CI evaluation.
2.3.1 Completion of Reviews, Evaluations, and Assessments

• Supervisory review. Each supervisor of an HRP candidate or HRP-certified individual must conduct an initial and annual review to evaluate information (including security concerns) relevant to that individual’s suitability to perform HRP tasks in a reliable and safe manner.
• Medical assessment. The medical assessment is performed for initial certification and then annually for recertification. A medical assessment may be performed more often if required by the site occupational medical director (SOMD). The designated physician, under the supervision of the SOMD, is responsible for the medical assessment of HRP candidates and HRP-certified individuals. In
performing this responsibility, the designated physician or the SOMD must integrate the medical evaluation, available test results, the psychological evaluation, a review of current legal drug use, and any other relevant information. This information is used to determine if a safety or security reliability concern exists and if the individual is medically qualified for HRP duties.
  – Psychological evaluation. As part of the medical assessment, a psychological evaluation must be conducted for initial HRP certification. This evaluation consists of a psychological assessment (test) and a semi-structured interview. For recertification, the evaluation consists of a semi-structured interview, but a psychological test may also be conducted if warranted. Every third year, the psychological evaluation includes a psychological test.
• Management evaluation. The HRP management official considers the results of the supervisory review, medical assessment, drug and alcohol test results, and any other information relating to an individual’s reliability and trustworthiness and makes a recommendation regarding certification.
• Personnel security review. A personnel security specialist will perform a personnel security file review upon receiving the individual’s supervisory review, medical assessment, and management evaluation and recommendation. Security concerns identified at any stage of the certification process will be evaluated and resolved in accordance with governmental regulations and procedures.
2.3.2 Drug and Alcohol Testing

• Initial. All HRP candidates will be tested for the use of alcohol and illegal drugs before HRP certification is granted.
• Random drug test. HRP-certified individuals are selected randomly, at least once in every 12-month period, for unscheduled and unannounced testing for the presence of illegal drugs. A confirmed positive drug test is considered a security concern that will result in immediate removal from HRP duties and adjudication under appropriate regulatory and procedural criteria.
• Random alcohol test. HRP-certified individuals are selected randomly, at least once in every 12-month period, for unscheduled and unannounced testing for the presence of alcohol. A positive test is a blood alcohol concentration of 0.02 or greater on a confirmatory blood alcohol concentration test. A person who tests positive will be sent home and not allowed to perform HRP duties for 24 h. The management official will be notified and, depending on the individual’s alcohol testing history, may require disciplinary action up to and including termination of employment.
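The random-selection requirement, that every certified individual be tested at least once in each 12-month period without advance notice, can be made concrete with a small scheduling sketch. The Python fragment below is illustrative only; the worker names, the 10% monthly draw rate, and the 30-day grace margin are assumptions, not part of the DOE program.

```python
import random
from datetime import date, timedelta

# Illustrative sketch only: pool names, the draw rate, and the grace
# margin are assumptions, not requirements of any official HRP.
WINDOW = timedelta(days=365)   # "at least once in every 12-month period"
GRACE = timedelta(days=30)     # force-select anyone whose window is about to lapse

def monthly_draw(last_tested: dict[str, date], today: date, rate: float = 0.10) -> list[str]:
    """Return an unannounced random sample that preserves the 12-month guarantee."""
    due = [p for p, d in last_tested.items() if today - d >= WINDOW - GRACE]
    pool = [p for p in last_tested if p not in due]
    sample = random.sample(pool, k=max(1, round(rate * len(pool)))) if pool else []
    return due + sample

last = {"worker_a": date(2023, 1, 10), "worker_b": date(2022, 9, 2)}
print(monthly_draw(last, date(2023, 8, 15)))  # worker_b is nearly due, plus a random pick
```

Forcing soon-due individuals into each draw keeps the selection unpredictable for everyone else while still honoring the once-per-12-months floor.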
2.3.3 Other Requirements

• Individuals performing certain critical nuclear duties are prohibited from consuming alcohol within an eight-hour period preceding scheduled work.
2.4 How Often Are HRP Requirements Performed?

You must receive HRP certification before performing HRP duties and then be recertified every 12 months. The list below details the interval at which each requirement is performed.

• Supervisory review: completed for initial HRP certification; annually for recertification
• Medical assessment: completed for initial HRP certification; annually for recertification
• Psychological evaluation: completed for initial HRP certification; annually for recertification (semi-structured interview initially, then annually; psychological test initially, then every 3 years)
• Management evaluation: completed for initial HRP certification; annually for recertification
• Drug test; alcohol test: completed for initial HRP certification and then, for recertification, on a random basis at least once every 12 months from the last test
• Personnel security review: completed for initial HRP certification; annually for recertification
• HRP instruction: completed for initial HRP certification; annually for recertification
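Where a program office tracks these intervals electronically, the schedule above maps naturally onto a small lookup table. The following Python fragment is a minimal sketch of such an encoding; the requirement keys, the nominal 365-day year, and the date arithmetic are assumptions for illustration, not part of the DOE handbook.

```python
from datetime import date, timedelta

# Hypothetical encoding of the recertification schedule shown above.
INTERVALS = {
    "supervisory_review": timedelta(days=365),
    "medical_assessment": timedelta(days=365),
    "semi_structured_interview": timedelta(days=365),
    "psychological_test": timedelta(days=3 * 365),   # every third year
    "drug_alcohol_test": timedelta(days=365),        # random, within 12 months of last test
    "personnel_security_review": timedelta(days=365),
    "hrp_instruction": timedelta(days=365),
}

def next_due(last_completed: dict[str, date]) -> dict[str, date]:
    """Latest acceptable completion date for each recertification element."""
    return {req: done + INTERVALS[req] for req, done in last_completed.items()}

due = next_due({"medical_assessment": date(2023, 3, 1), "psychological_test": date(2021, 3, 1)})
print(due["psychological_test"])  # 2024-02-29, three nominal years after the last test
```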
2.5 What Are My HRP Responsibilities?

As an HRP-certified individual, you have several responsibilities. You must:

• Read, sign, and submit HRP releases, acknowledgments, and waivers to facilitate the collection and dissemination of information and the performance of medical examinations and drug and alcohol tests.
• Provide full, frank, and truthful answers to relevant and material questions and, when requested, furnish, or authorize others to furnish, information that is deemed pertinent to reaching a decision on HRP certification or recertification.
• Notify the designated physician, the designated psychologist, or the SOMD immediately of a physical or mental condition that requires medication or treatment.
• Report reliability concerns, including any observed or reported behavior or condition of another HRP-certified individual, to a supervisor, the designated physician, the designated psychologist, the SOMD, or the HRP management official.
• If you have any behavior or condition that may affect your ability to perform your HRP duties, make a report to your supervisor, the designated physician, the designated psychologist, the SOMD, or the HRP management official.

You are also required to:

• Report in person to the designated physician, the designated psychologist, or the SOMD to obtain a written recommendation to return to work if you have been on sick leave for 5 or more consecutive days (or an equivalent amount of time).
• Report for drug or alcohol testing if required following involvement in an incident, unsafe practice, or an occurrence, or if there is a reasonable suspicion that you may be impaired.
2.6 When Should I Report a Concern?

As you learned in the responsibilities section, you must report the observed or reported behavior or condition of an HRP-certified individual that could indicate a safety or security reliability concern. Simply put, a reliability concern is any behavior or condition that is unusual or out of the ordinary and that could potentially affect a person’s ability to adhere to security and safety requirements. Terms you should be familiar with are:

• Reliability. An individual’s ability to adhere to security and safety requirements.
• Safety concern. Any condition, practice, or violation that causes a substantial probability of physical harm, property loss, and/or environmental impact.
• Security concern. Information regarding an HRP candidate or HRP-certified individual that may be considered derogatory (e.g., breach of promise or trust, disloyalty, divided allegiance, dishonesty).
2.7 What Might Be a Concern?

• Mental/personality or physical disorder that impairs performance
• Conduct that warrants referral for a criminal investigation or results in arrest or conviction
• Indication of deceitful or delinquent behavior
• Attempted or threatened destruction of property or life
• Suicidal tendencies or attempted suicide
• Use of illegal drugs; abuse of legal drugs or other substances
• Alcohol use disorder
• Recurring financial irresponsibility
• Irresponsibility in performing assigned duties
• Inability to deal with stress, or the appearance of being under unusual stress
• Failure to comply with work directives; violation of safety or security procedures
• Hostility or aggression toward fellow workers or authority; uncontrolled anger
• Repeated absenteeism
• Significant behavioral changes, moodiness, depression
This list is not intended to be all-inclusive. If a behavior is not in character for a person, or if it appears the behavior could be a result of drug or alcohol abuse, stress, or serious personal problems/issues, it should be reported as a concern.
2.8 How Do I Report a Concern?

If you have a concern about an HRP worker’s physical or mental condition, behavior, or actions, immediately report it to any HRP supervisor, the designated physician, the designated psychologist, the SOMD, or the HRP management official.
2.9 What Happens When a Reliability Concern is Identified?

If a reasonable belief or credible evidence exists that an HRP-certified individual is not reliable, his or her supervisor must immediately do the following:

• Require the individual to stop performing HRP duties.
• Ensure the individual is denied both escorted and unescorted access to the HRP work areas.

Within 24 h after taking the above actions, the supervisor must provide written notification to both the individual and the HRP management official of the reason for the actions. Immediate removal from HRP duties is an interim, precautionary action and does not constitute a determination that the individual is not fit to perform his or her required duties. Removal is not, in itself, a cause for loss of pay, benefits, or other changes in employment status.
2.10 What Happens When Someone is Removed from HRP Duties?

If removal is due to a security concern:
• The HRP management official must notify the HRP certifying official and the applicable personnel security office. The concern is investigated and adjudicated by personnel security.

If removal is due to a safety concern, the HRP management official:

• Evaluates the circumstances or information that led to the removal of the individual from HRP duties.
• Prepares a written report of the evaluation that includes a determination of the individual’s reliability for continuing HRP certification.

If the HRP management official determines that an individual who has been temporarily removed continues to meet the requirements for certification, he or she must notify:

• The individual’s supervisor, directing that the individual be allowed to return to HRP duties.
• The individual.
• The HRP certifying official.

When an individual has been temporarily removed, the certifying official² takes one of the following three actions:

1. Reinstates, and provides a written explanation of the factual basis for the action.
2. Continues temporary removal and directs action to resolve concerns (for example, rehabilitation). After completion of the intervention, the matter will be re-evaluated.
3. Revokes certification, sending a written decision, which includes the rationale for the action and the procedures for reconsideration/appeal, to the affected individual by certified mail.

² The Manager may serve as the certifying official.
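Viewed abstractly, the removal-and-disposition flow described above is a small state machine. The sketch below encodes it for illustration; the status names, action labels, and transition table are assumptions inferred from the text rather than an official implementation.

```python
from enum import Enum, auto

class HRPStatus(Enum):
    CERTIFIED = auto()
    TEMPORARILY_REMOVED = auto()   # interim, precautionary; not a fitness finding
    REVOKED = auto()

# Allowed transitions, keyed by (current status, action).
TRANSITIONS = {
    (HRPStatus.CERTIFIED, "remove"): HRPStatus.TEMPORARILY_REMOVED,
    (HRPStatus.TEMPORARILY_REMOVED, "reinstate"): HRPStatus.CERTIFIED,
    (HRPStatus.TEMPORARILY_REMOVED, "continue_removal"): HRPStatus.TEMPORARILY_REMOVED,
    (HRPStatus.TEMPORARILY_REMOVED, "revoke"): HRPStatus.REVOKED,
}

def apply(status: HRPStatus, action: str) -> HRPStatus:
    """Apply a certifying-official action, rejecting anything the flow forbids."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"action {action!r} not permitted from {status.name}")

status = apply(HRPStatus.CERTIFIED, "remove")   # supervisor's immediate precautionary step
status = apply(status, "reinstate")             # one of the official's three dispositions
print(status.name)                              # CERTIFIED
```

One property the encoding makes visible: revocation is reachable only from temporary removal, through the certifying official’s decision, never directly from certified status.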
2.11 What Are My Options?

If your HRP certification is revoked, you may choose one of the three following options:

1. Take no action.
2. Submit a written request for reconsideration, which addresses the information or situation that initiated the concern, to the Manager. The Manager’s decision on the request for reconsideration is final.
2.12 Sample HRP Self-Assessment Questionnaire

Note: these questions are general and should be adapted to the culture/institution; additional specific questions should be developed that relate to organizational structure, regulations, policies, and procedures. Areas that should be assessed are program organization, position designation and job task, program process, and program administration.

Below are sample questions:

• Responsible individual and organization identified
• Work locations/areas monitored for authorized access only
• Annual security/threat briefings
• Clear documentation of standard operating procedures (SOPs)
• Periodic review of policies and procedures
• Behavioral observation training and monitoring awareness
• Access records maintained and periodically evaluated
• Required annual training on all aspects of the program
• Reporting procedures
• Understanding of and adherence to need-to-know and confidentiality of information
The final version of a self-assessment will be based on the organization, institution, facility, and stakeholders.
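If an organization automates its self-assessment, the four assessment areas named above can be captured in a simple record. The following sketch is hypothetical; the area names are taken from this section, while the finding format and function names are assumed.

```python
# Hypothetical self-assessment summary; area names from the section above.
AREAS = [
    "Program organization",
    "Position designation and job task",
    "Program process",
    "Program administration",
]

def summarize(findings: dict[str, list[str]]) -> None:
    """Print per-area status; an area with no recorded findings is satisfactory."""
    for area in AREAS:
        issues = findings.get(area, [])
        status = "satisfactory" if not issues else f"{len(issues)} finding(s)"
        print(f"{area}: {status}")
        for issue in issues:
            print("  -", issue)

summarize({"Program administration": ["access records not reviewed this quarter"]})
```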
Reference

US Department of Energy. Human reliability program handbook. [Online]. Available: https://www.energy.gov/ehss/human-reliability-program-handbook
Chapter 3
Road to a Sustainable Human Reliability Program and Implementation Strategy
Sunil S. Chirayath and Oscar E. Acuna
3.1 Introduction

Human performance in the implementation of security at a nuclear facility (or, for that matter, at any critical industry of national importance) is a complex issue, especially with regard to the potential threats posed by insiders (e.g., facility employees, contractors, or visitors). The Human Reliability Program (HRP) represents an important measure to address insider threats and is designed to screen out individuals who are neither trustworthy nor fit to perform critical and/or sensitive duties. The goal of an HRP in a nuclear facility is to prevent the loss of nuclear materials, facility sabotage, or any other malicious acts by insiders. An HRP is a security and safety reliability program designed to ensure that individuals who occupy positions with access to certain nuclear materials, facilities, and programs meet the highest standards of reliability, or trustworthiness, including physical and mental stability. Sometimes called a Personnel Reliability Program or a Structured Trusted Employee Program, an HRP recognizes the importance of selecting and retaining sustainably reliable, trustworthy, and suitable individuals to maintain secure and safe facilities (Chirayath 2014).
3.2 Why Develop an HRP?

The concept of human reliability plays an important role in both nuclear safety and nuclear security. The major focus of human reliability in safety has to do with error prevention. The goal is to understand what leads human beings to inadvertently make mistakes in the workplace so that errors can be minimized or prevented altogether. Human errors may be caused by a system or design flaw and encompass both human-machine and human-task mismatches. Contributing factors can range from lack of training and poor ergonomics to overwork and fatigue. Factors like error prevention, system design, ergonomics, and training also play important roles in security (WINS 2019).

An HRP can help mitigate insider risk by ensuring that personnel who hold sensitive jobs are—and remain—reliable and do not become either an unintentional or deliberate threat to the organization. These programs seek to understand human motivations and behaviors and to identify the internal and external influences that can negatively affect an individual’s trustworthiness and reliability. An HRP also represents a sophisticated level of staff evaluation and monitoring: staff are continuously monitored and evaluated for unusual behavior. To verify the reliability of employees in positions of trust, security-focused HRPs put measures in place such as a Fitness for Duty (FFD) program, which includes a requirement for reporting unusual behavior.
3.3 HRP Benefits

Considering the numerous potential benefits and the serious potential consequences of insider threats, it is important that nuclear organizations address this important responsibility in an appropriate manner. The first place to start is with an understanding of human motivations and behaviors in the workplace. Nuclear operators seek personnel who can be trusted with sensitive information, critical technology, and hazardous nuclear and radioactive materials. This requires employees who are honest, dependable, and mentally and physically stable. Social backgrounds and external influences, as well as a host of other influential factors, can create undue levels of vulnerability, altering a person’s dependability, moral character, motivations, and allegiances. History has repeatedly shown how such changes have catalyzed insider threats, employee crime, and weaknesses in nuclear safety and security, sometimes leading to serious consequences. An HRP is focused on preventing both mistakes and malicious acts; when implemented well, it can also strengthen a facility’s workforce and provide operational improvements. Some of the benefits of developing an effective HRP include:

• Enhances security and safety culture;
• Mitigates insider threat;
• Leads to the retention of valued employees;
• Increases ethical decision-making and vigilance;
• Enhances national security posture;
• Increases employee trustworthiness and dependability;
• Increases employee morale and motivation; and
• Increases employee loyalty, ownership, and commitment.
3.4 Steps to Developing an HRP

The main steps to developing an HRP are typically the following:

1. Recognize the Insider Threat

The first step in developing an HRP is the acceptance by management and the regulator of the concept that an insider may become a threat to the facility. What makes insiders so potentially dangerous is that they are able to take advantage of their access rights, knowledge of a facility, and authority over other staff to bypass dedicated security measures. Furthermore, insiders are likely to have the time to plan their actions and recruit others. The threat significantly increases when an insider colludes with another insider—or an outsider—to carry out a malicious act. Therefore, it is critical that leadership recognizes that the threat is real and puts policies and programs in place that support human reliability and an effective security culture.

2. Establish an Executive Committee (EC)

A strong philosophy of human reliability begins with an organization’s leadership. Senior management is responsible for establishing an Executive Committee (EC), which consists of a leadership team that works to define the parameters and program elements relevant to cultures and organizational makeup. The EC undertakes the following tasks:

• Drafts specific Terms of Reference (TOR), which are the guiding principles for the implementation team;
• Identifies members of the implementation team tasked with developing and empowering the HRP management; and
• Identifies key stakeholders.

An effective HRP must identify organizations, departments, and persons that can either affect or be affected by the program. Because human reliability is a critical component of safety and security, management should engage all stakeholders in the HRP implementation process. Stakeholders with dissenting viewpoints should be encouraged to make their concerns known in an appropriate forum; in turn, senior management should carefully consider such concerns and take whatever actions are appropriate to respond to them. Questions to consider when determining the stakeholders include the following:

• What type of facility or information needs protection?
• How is access to sensitive information controlled?
• How are personnel with access to sensitive information controlled?
20
S. S. Chirayath and O. E. Acuna
• What are the significant local threats to the organization? • What are all of the organizations responsible for safety and security? Among the key stakeholders are the following: • • • • •
Senior management; Security Department; Safety Department, Information Technology Department; Reactor/Nuclear Systems Department; and Engineering Department.
3. Develop the Regulatory Basis for HRP Implementation
For an HRP to be effective, the regulatory framework governing the safety and security of nuclear facilities and materials must incorporate the risk of an insider threat. The HRP for a State/region should be developed with these considerations in place, with a commitment to working with the appropriate agencies to comply with, develop, and adapt regulations as appropriate. The regulatory requirements are developed by the regulatory authorities.

4. Establish an HRP Implementation Team
The Implementation Team is tasked with identifying the elements to be considered for the successful execution of the HRP. It develops the organization's policies, scope, procedures, and purpose, as well as the processes of the HRP implementation plan. Typically, the process starts with an overview of the HRP provided by Subject Matter Experts (SMEs), followed by a more in-depth 3-day workshop in which the SMEs facilitate the discussions and guide the development of the overall structure and scope of the program.

5. Implementation Strategy
The Implementation Team formulates an implementation strategy document to be submitted to the Executive Committee for approval. This document should include, at a minimum, the following information:
1. Roles and Responsibilities: Individuals/units empowered to ensure that the approved HRP elements are in place, approve appropriate policies and procedures, and clearly identify and define organizational relationships.
2. HRP Candidate Positions: Critical/sensitive HRP-designated positions with justification and a job task analysis, along with a detailed process for nominating new positions and removing positions that no longer require inclusion in the HRP.
3. Training: Required initial and annual refresher training to provide employees in HRP-designated positions with the knowledge and requirements of the HRP.
4. Schedule: Timeline for program initiation, including subject material to be covered.
5. HRP Policies and Procedures: Policies and standard operating procedures (SOPs), evaluation criteria or performance metrics, and a schedule for assessments.
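Where an organization tracks the strategy document electronically, the five items above map naturally onto a small structured record. The following is a minimal sketch in Python; all field names and sample values are illustrative assumptions, not prescriptions from this chapter:

from dataclasses import dataclass
from typing import List

@dataclass
class HRPPosition:
    """A critical/sensitive position nominated for HRP coverage."""
    title: str
    justification: str       # why the position requires HRP certification
    job_task_analysis: str   # reference to the supporting analysis

@dataclass
class ImplementationStrategy:
    """Minimum contents of the strategy document submitted to the EC."""
    roles_and_responsibilities: List[str]   # empowered individuals/units
    candidate_positions: List[HRPPosition]  # with justification and task analysis
    training: str                           # initial and annual refresher training
    schedule: str                           # timeline for program initiation
    policies_and_sops: List[str]            # SOPs, metrics, assessment schedule

# Hypothetical example entry
strategy = ImplementationStrategy(
    roles_and_responsibilities=["HRP Administrator: approves SOPs"],
    candidate_positions=[HRPPosition(
        title="Senior reactor operator",
        justification="Unescorted access to the control room",
        job_task_analysis="JTA-001")],
    training="Initial orientation plus annual refresher",
    schedule="Program initiation in Q1",
    policies_and_sops=["SOP-01: random alcohol testing procedure"],
)
print(strategy.candidate_positions[0].title)  # -> Senior reactor operator

A structured record of this kind also makes the nomination/removal process for HRP positions auditable, since each change to the position list can be versioned.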
6. Determine Resource Needs
The EC allocates the necessary resources to support the HRP Implementation Team and to determine the HRP's effectiveness and associated costs. It should also define the organizational structure for the HRP implementation and select the HRP Administrator.

7. Submit Draft HRP for Approval
After the Implementation Team submits the implementation strategy, the EC reviews and approves the team's proposed plans and documents, ensuring that they are in accordance with national and state legislation and regulations. Senior management acceptance and approval are critical to the success of an HRP.

8. Train Personnel to Fill HRP Positions
It is critical to provide the required training to the identified HRP positions during the development of the implementation plan. The training process must equip employees in HRP-designated positions (HRP certified), as well as supervisors and managers (particularly those responsible for HRP-designated positions), with the knowledge and requirements of the HRP. The program elements may be tailored to accommodate group differences and training needs but must include the objectives of the HRP, individual responsibilities, the Continuous Evaluation process, and the benefits of the program. Training can be provided by SMEs or by individuals at the organization trained by the SMEs. Trainers must have a strong understanding of the subject material they are teaching.

9. Organizational Relationship
Once the HRP structure is approved, the Implementation Team develops and documents the organization's policies, procedures, and other program requirements. This process also establishes the position of the team within the organization, the responsible unit, and the responsible administrator. Members of the Implementation Team can themselves be considered HRP Candidates and become HRP Certified once the HRP is initiated, as can individuals administering HRP elements (e.g., random alcohol testing, drug testing, and associated procedures). The following are key positions in the HRP:
• HRP candidate: an individual who is being considered for assignment to an HRP position.
• HRP certified individual: an individual who has successfully completed the HRP requirements.
• Supervisor: an individual who has oversight and organizational responsibility for a person holding an HRP position, and whose duties include evaluating the behavior and performance of the HRP certified individual.
• HRP management official: the Administrative Secretary who has programmatic responsibility for HRP positions. The HRP management official reviews the supervisory review and the medical assessment before forwarding requests for certification or recertification to personnel security.
• Personnel security: performs a personnel security file review of an HRP candidate or HRP certified individual upon receiving the supervisory review, medical assessment, and management evaluation and recommendation.
• HRP certifying official: the Director, who certifies, recertifies, and temporarily removes individuals, reviews the circumstances of an individual's removal from an HRP position, and directs reinstatement.

Fig. 3.1 Organizational relationship in HRP

Figure 3.1 shows the organizational relationship in HRP.

10. Identify Critical/Sensitive HRP Positions
The Implementation Team is charged with identifying the critical/sensitive positions and program elements that require HRP certification, along with the training required for individuals in those positions to become HRP Certified. It is important to perform a job task analysis on every position to determine the qualifications necessary to fill it effectively. These qualifications include not only educational background, training courses, and certifications, but also physical and mental requirements. If the job is a high-stress position, such as a senior reactor operator, the candidate needs to be able to perform the required tasks without panicking or becoming mentally disorganized. The Human Resources (HR) Department typically works with specific disciplines within an organization to define the requirements for particular positions.
Examples of Critical/Sensitive Positions:
• Security personnel (control over aspects of sensitive material access or information)
• IT personnel (access to personnel access authorization records, schedules of events, etc.)
• Nuclear material control and accountability personnel (access to inventory records)
• Transportation staff for radioactive materials (access to shipping records)
• Personnel with unescorted access to a reactor operations control room (ability to gain private access for sabotage)
• Personnel with unescorted access to nuclear material (ability to divert material)
• Health physics (radiation safety) staff (ability to falsify assay values)
• First responders, e.g., firefighters (ability to divert material during emergency events)
• Safety inspectors (ability to report situations that create unsafe conditions or to sabotage safety systems)
• Preventive maintenance personnel (ability to create unsafe conditions or sabotage safety systems)
• Personnel who issue site access badges (ability to create false credentials)
• Personnel responsible for nuclear safety (ability to create procedures that allow for diversion or for unsafe conditions when impaired).

11. Program Administration
During implementation, policies and procedures are defined, and standard operating procedures (SOPs) are written. Parallel documents should identify the criteria used to assess the program's progress and the corrective actions that might be taken. Based on the organization's HRP elements, evaluation criteria should be formalized, and a schedule for this assessment should be documented.

12. Instruction
The HRP management official establishes an initial and annual HRP instruction program. The program must provide HRP certified individuals, and the supervisors and managers responsible for HRP positions, with the knowledge required to recognize and respond to behavioral change and aberrant or unusual behavior that may indicate a risk to security or safety. Train-the-trainer modalities are advisable for a continuous education and training program. HRP certified individuals must be instructed in their personal responsibilities, including:
• Identifying and reporting safety and security reliability concerns;
• Self-reporting; and
• Return-to-work requirements.
13. Records
All documentation related to employees' participation in the HRP should be maintained in a separate file, with access to the information given on a need-to-know basis. HRP documentation should include, at a minimum, the following records:
• Individuals identified for inclusion in the program and documentation of when they completed their initial and refresher HRP orientation;
• Signatures on appropriate organization-specific releases, acknowledgments, and waivers indicating an individual's agreement to participate; and
• Additional HRP information (e.g., medical and physiological records, security infractions, positive substance tests, and recommendations maintained in accordance with national regulations, organization policies, and implementing regulations).

14. Forms
The following forms should be kept in a separate file:
• HRP Certification;
• Acknowledgment and Agreement to Participate in the HRP;
• Authorization and Consent to Release HRP Records;
• Refusal of Consent; and
• HRP Alcohol Testing Form.
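Because these records are sensitive, the need-to-know restriction is often enforced programmatically. A minimal sketch of such a check follows, assuming hypothetical role and record-type names (none are specified in this chapter):

# Need-to-know access control for HRP records (illustrative sketch).
# The role names and record types below are assumptions for the example.
NEED_TO_KNOW = {
    "medical_assessment": {"hrp_management_official", "personnel_security"},
    "alcohol_test_result": {"hrp_management_official"},
    "training_record": {"supervisor", "hrp_management_official"},
}

def may_access(role: str, record_type: str) -> bool:
    """Grant access only if the role is on the record type's need-to-know list."""
    return role in NEED_TO_KNOW.get(record_type, set())

assert may_access("personnel_security", "medical_assessment")
assert not may_access("supervisor", "alcohol_test_result")

The default-deny behavior (an unlisted record type grants access to no one) mirrors the conservative posture the chapter recommends for HRP documentation.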
15. Review and Evaluation
A review of the HRP is key to understanding its performance. According to Coates and Eisele's "Human Reliability Implementation Guide," "Through a multi-tiered evaluation system, situations and conditions can be identified early that may not be seen routinely" (Coates and Eisele 2014). The HRP depends on reliable and trustworthy employees at all levels of the organization to maintain a robust nuclear security program and is an important tool within any nuclear or security organization, incorporating many elements to help protect nuclear security (Coates and Eisele 2014). A Continuous Evaluation process thus allows senior management to evaluate the state of security and safety culture within the organization and within the context of the broader level of national security.
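To tie the steps together, the certification review chain of Fig. 3.1 (supervisory review, medical assessment, management evaluation, personnel security file review, and certification) can be pictured as a simple sequential gate. The sketch below is an assumed formalization, not an implementation from the chapter; the step names follow the key positions described in step 9:

# Sequential gate model of the HRP certification chain (illustrative).
# Step names follow step 9 above; the pass/fail dictionary is assumed.
REVIEW_CHAIN = [
    "supervisory_review",         # supervisor evaluates behavior/performance
    "medical_assessment",
    "management_evaluation",      # HRP management official
    "personnel_security_review",  # personnel security file review
    "certification",              # HRP certifying official
]

def process_candidate(results: dict) -> str:
    """Walk the chain in order; any failed or missing step halts certification."""
    for step in REVIEW_CHAIN:
        if not results.get(step, False):
            return f"stopped at {step}"
    return "HRP certified"

print(process_candidate({s: True for s in REVIEW_CHAIN}))  # HRP certified
print(process_candidate({"supervisory_review": True}))     # stopped at medical_assessment

The strict ordering reflects the chapter's description: the management official acts only after the supervisory review and medical assessment, and personnel security acts only after receiving all three inputs.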
3.5 Conclusion

Clearly, one of the most important keys to developing a successful nuclear security culture is trustworthy and reliable staff. This is why human reliability represents an important component of an overall protection system. Senior management has the responsibility to implement and sustain an effective human reliability program. This begins by framing and effectively communicating why human reliability plays such an important role in protecting the safety and security of the entire organization, what the legal and regulatory basis is for a human reliability program within the State, and the personal responsibility of every individual within the organization—at every level—to demonstrate trustworthiness, integrity, and reliability. In summary, the main factors for an HRP's success are:
• Commitment from upper management;
• Commitment and support from employees;
• Employee awareness of the need for the HRP;
• Clear program organization;
• Clear program administration; and
• A clear time frame for implementation.
References

Chirayath S (2014) A roadmap to a sustainable human reliability program, Partnership for Nuclear Security. In: Meeting on human reliability programs in industries of national importance, Bengaluru, India
Coates CW, Eisele GR (2014) Human reliability implementation guide. Oak Ridge National Laboratory
WINS (2019) 3.2 Human reliability as a factor in nuclear security. World Institute for Nuclear Security
Chapter 4
The Significance of Human and Organizational Aspects in Ensuring Safety in High Hazard Installations

Dinesh Kumar Shukla
Abstract Typically, a Human Reliability Program (HRP) is implemented to ensure that individuals who occupy positions with access to critical assets/operations/sites meet the highest standards, so that they adhere to safety and security rules and regulations and inspire confidence based on their character and their physical and mental stability. Though HRP includes safety, a conventionally designed HRP concentrates on security aspects only, and unintended acts, mostly concerning safety, are invariably left out. These aspects are of great importance for safety and security considerations during the operational phase and are related to human behaviour, which in turn is influenced by Human and Organizational Factors (HOF). In this chapter, an attempt is made to bring out the significance of human and organizational aspects in ensuring safety in high hazard installations, so that HRP, with this wider understanding, can be used effectively in industries of national importance requiring high standards of safety enveloping security aspects, such as aviation, petroleum and other hazardous chemicals, and nuclear.
Dear friends, the title of my presentation might have triggered a natural question in your mind regarding its relationship with the core topic of the meeting, "Human Reliability Program in Industries of National Importance." Typically, a Human Reliability Program (HRP) is implemented to ensure that individuals who occupy positions with access to critical assets/operations/sites meet the highest standards, so that they adhere to safety and security rules and regulations (reliability) and inspire confidence based on their character (trustworthiness) and their physical and mental stability. Though HRP includes safety, a conventionally designed HRP concentrates on security aspects (intended acts, i.e., sabotage-type violations) only, and unintended acts (i.e., errors, and routine and situational violations, mostly concerning safety) are invariably left out. These aspects are of great importance for safety and security considerations during the operational phase and are related to human
behaviour, which in turn is influenced by Human and Organizational Factors (HOF). I am attempting to bring out the significance of human and organizational aspects in ensuring safety in high hazard installations, so that HRP, with this wider understanding, can be used effectively in industries of national importance requiring high standards of safety enveloping security aspects, such as aviation, petroleum and other hazardous chemicals, and nuclear. With this background, let me start my presentation.

Typical data on accidents and Human and Organisational Factors (HOF) indicate a significant contribution of HOF to accident causation (accidents at sea 90%, chemical industry 80–90%, nuclear industry 65%, airline industry 60–87%) (Gertman and Blackman 1993). These numbers still hold today. Trends in accident causation show that, with the advancement of technology, the contribution of technical factors has reduced over the years and the contribution of human errors has gained prominence. This does not mean that humans have become careless and inattentive; rather, machines have become more sophisticated and reliable. The research work funded by FonCSI (Daniellou et al. 2011) clearly brings out the significance of human and organizational factors for continued improvement in safety performance. As can be seen from the report findings, each improvement reduced the accident rate down to the next plateau level, and further strengthening of formal procedures no longer results in a reduction in accident/incident rates (Health and Safety Executive). The simple reason I can see for this is that the deployment of technical measures changes the stage at which an incident/accident is initiated but does not eliminate it. Deploying more technical measures, with less attention to human and organizational factors, may not have the desired impact on the safety performance of the organization. Further, a greater emphasis on holding operators (front-line people) responsible, and a lesser emphasis on questioning organizational and management issues, is a major reason why safety performance improves no further. A key contributor is improper Root Cause Analysis (RCA): failing to take cognizance of the primarily positive human contribution to safety, stopping the RCA once a human error is identified, and not extending it to identify the type of error (slip, lapse, mistake or violation) and the underlying HOF responsible for the commission of the error. If the root cause is not identified, corrective actions treat only symptoms, leaving the underlying cause(s) unaddressed; recurrence is therefore bound to happen.

So, what is the way out? The natural question pops up: "Is addressing technical factors (tools, equipment, design (technology), etc.) enough?" The obvious answer is "No," because safety results from the interaction of individuals with technology and with the organization. This interaction can contribute positively or negatively to safety and is highly influenced by the approach followed to address Human, Organizational, and Technical (HOT) factors in the organization. Balanced attention should be given to all three factors. Further, the analyses of three major accidents in the nuclear industry (the TMI, Chernobyl, and Fukushima Daiichi NPP accidents) highlight the importance of considering the entire system that contributes to safety rather than separating HOF from the technical aspects.
It is better to consider an integrated perspective (i.e., HOT factors rather than HOF alone) and an effective systemic approach. Towards this, the
first step is to understand Human and Organizational Factors. Technical Factors are quite familiar and do not need elaboration here.

Organizational Factors: We are all quite familiar with Organizational Factors. These are, namely:
• Leadership
• Management Commitments: Vision, Mission, Core values
• Policies, Strategies, Plans, Procedures
• Decision making
• Setting Priorities
• Resource Management
• Work Environment
• Communication
• Culture
• Continuous improvements
An important point to keep in mind is that errors committed in the execution of these organizational functions can remain dormant (latent errors) for a very long time. Latent human errors committed by designers and constructors may be identified only after they trigger an accident/event. Further, 'technical measures' (tools, equipment, design (technology), etc.) ultimately need to be 'maintained' to perform on demand and thus depend on 'human' factors (knowledge, thoughts, decisions, actions) and organizational factors (management system, organizational structure, resources). Researcher James Reason has concluded that human error is a consequence, not a cause (Reason 1997, 1990). Understanding the management and organizational factors that can either reduce, or identify and correct, latent errors is an important element in reducing the consequences. Latent failures can occur at various levels: the "strategic level (high-level decision making), tactical level (line management), operational level." Thus, one of the challenges to reducing accident rates is to spot and correct latent errors/failures. People build up and develop organizations, and organizations influence the development of people and the way in which they act and interact. Therefore, it is important to understand what influences human activity, i.e., to have knowledge of human factors.

Human Factors: Human factors, as defined in various literature, is a discipline concerned with designing machines, operations, and the work environment so that they match human capabilities, limitations, and needs. The HOF approach is thus to identify and put into place conditions that encourage positive contributions from operators, both individually and in teams. Human factors can draw inputs from many disciplines (e.g., design, engineering, psychology, management, etc.) and are considered a mixture of technological and psychological factors.
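The RCA extension argued for above (do not stop at "human error"; record the error type and the underlying HOF, including the level at which a latent failure arose) can be made concrete as a small data check. The sketch below is illustrative only: the taxonomy terms come from this chapter, while the function and field names are assumptions.

# Illustrative RCA finding that goes beyond "human error": it must name
# the error type and the underlying HOF. Taxonomy per the chapter;
# function and field names are assumed.
ERROR_TYPES = {"slip", "lapse", "mistake", "violation"}
LATENT_LEVELS = {"strategic", "tactical", "operational"}

def rca_finding(error_type: str, latent_level: str, underlying_hof: str) -> dict:
    """Reject findings that stop at 'human error' without classification."""
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error type: {error_type}")
    if latent_level not in LATENT_LEVELS:
        raise ValueError(f"unknown latent level: {latent_level}")
    if not underlying_hof:
        raise ValueError("an underlying HOF must be identified")
    return {"error_type": error_type,
            "latent_level": latent_level,
            "underlying_hof": underlying_hof}

print(rca_finding("violation", "tactical",
                  "procedure unworkable under production pressure"))

Forcing every finding to carry a non-empty underlying HOF is the point: a corrective action plan built from such records cannot silently stop at the front-line operator.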
Fig. 4.1 Interaction of HOF with the activities and their effects in the context of NPP
4.1 Application of Integrated Perspective and Systemic Approach to NPP

Let us apply our understanding of the integrated perspective and an effective systemic approach regarding HOF and the technical aspects to a Nuclear Power Plant (NPP) system (Fig. 4.1).1 The block diagram depicts my understanding of the interaction of HOF with the activities and their effects in the context of an NPP. The organizational system, i.e., Human and Organization, plans and executes various activities culminating in the establishment of Systems, Structures, and Components (SSCs) in the NPP, which yield production performance and safety performance as outcomes. Activities are conditioned by the intrinsic characteristics of humans, the working environment, and the characteristics of work groups. These human activity conditioning parameters are highly influenced by the Human and Organization, which in turn are influenced by human behaviour.
1 Perspectives on HOF are informed by: the International Atomic Energy Agency ("IAEA Report on Human and Organizational Factors in Nuclear Safety," 2013; "Managing Human Performance to Improve Facility Operation—IAEA Nuclear Energy Series," 2013; "Regulatory Oversight of Human and Organizational Factors for Safety of Nuclear Installations—IAEA-TECDOC-1846," 2018), Manna (2007), Cooper (2001), Shukla (2007, 2015, 2017, 2019); the International Civil Aviation Organization (2005); Health and Safety Executive (1999); Fleming and Lardner (2002); and Sorensen (2002).
Fig. 4.2 Factors that affect human behaviour
4.2 Behaviour to Performance

The earlier section has brought out that performance depends on behaviour. Visible behaviour (behaviour that we observe, i.e., how we act, talk, walk, our gestures, etc.) translates into activity and is influenced by invisible parts (cognitive, psychological and social aspects), as well as by the working environment and the work group (Fig. 4.2). It is widely accepted that human behaviour is a contributory factor in approximately 80% of accidents. Focussing solely on changing individual behaviour, without considering necessary changes in how people are organized, managed, motivated and rewarded, as well as their physical work environment, tools, and equipment, can result in treating the symptoms only, without addressing the root causes of unsafe behaviour.
4.3 Organizational Accidents and Relevance of HOFs in NPPs

Individual accidents occur in situations where the hazards are close to people. These accidents have limited consequences, a short history, and an insignificant impact on technology. Organizational accidents, by contrast, have widespread consequences, a long history, and a global impact on technology. Such accidents happen to complex systems having multiple levels and layers of defence-in-depth (DiD), as in NPPs. Levels and layers of DiD have a mixture of 'hard' and 'soft' defences (a combination of paper and people: rules and procedures, training, drills, administrative controls and, most particularly, front-line operators such as pilots and control room operators). For an organizational accident to happen, an unlikely combination of several different factors must occur, penetrating many protective layers, to allow hazards to come into damaging
contact with the plant, personnel, and the environment. The occurrence of such accidents is therefore rare. DiD renders complex systems significantly opaque to the people who manage and operate them. Both this distancing effect and the rarity of reportable events have a high potential for developing overconfidence, a leading stage in a declining safety culture. Safety culture (which has HOF as its building blocks) has pervasive effects that can not only open gaps and weaknesses but also, most importantly, leave them uncorrected (latent failures). In a well-defended system, such as an NPP, only safety culture influences (and thereby HOFs) are pervasive enough to substantially increase the possibility of lining up a penetrable series of defensive weaknesses. Because of this, modern, high-tech, well-defended technologies, such as NPPs, chemical process plants, and commercial aviation, are more vulnerable to the effects of a poor safety culture and poor management of HOFs than traditional industries.
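A back-of-the-envelope calculation shows why such alignments are rare, and why a pervasive influence that degrades every layer at once matters so much. The per-layer numbers below are invented purely for illustration:

# Illustrative only: made-up per-layer weakness probabilities for a
# system with four independent defensive layers. A hazard causes damage
# only if it penetrates every layer.
from math import prod

layer_failure_probs = [0.05, 0.02, 0.01, 0.01]

independent = prod(layer_failure_probs)
print(f"healthy layers:  {independent:.1e}")  # ~1.0e-07

# A declining safety culture leaves weaknesses uncorrected in *every*
# layer simultaneously; here each layer is assumed ten times weaker.
degraded = prod(p * 10 for p in layer_failure_probs)
print(f"degraded layers: {degraded:.1e}")     # ~1.0e-03, 10,000x higher

Because the layer failures multiply, a culture-driven weakening of all layers together raises the penetration probability far more than a comparable weakening of any single layer.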
4.4 Safety Behaviour Modification

Having understood the factors that affect human behaviour, let us see how to modify behaviour. For this, we need to understand the root causes of the specific behaviour and then make practical and realistic changes to influence it. The well-known ABC (Antecedents/Activators, Behaviour, Consequences) analysis is a particularly good tool for this (Fig. 4.3). Behaviour is triggered by antecedents/activators and motivated by the nature of its consequences. Consequences can increase or decrease the likelihood of the behaviour being repeated. Let us examine the unsafe behaviour of people crossing railway lines at stations to reach the other platform. The antecedents–others are doing it (environment);
Fig. 4.3 The ABC model
a foot overbridge may not be there (resource)–trigger the behaviour. The nature of the consequences–reaching the platform quickly (+ve, immediate and certain); possibly meeting with an accident (−ve but uncertain); a penalty for rule violation (−ve but uncertain)–motivates this behaviour. Similarly, we can examine the unsafe behaviour of not wearing eye protection during welding work. The antecedents/triggers for this unsafe behaviour could be: unavailability of the equipment; being in a hurry; no one else doing it; lack of training; and the equipment not being in good condition. The nature of the consequences could be: injury (−ve but later and uncertain); a penalty for rule violation (−ve but later and uncertain); saving time (+ve, immediate and certain); comfort and convenience (+ve, immediate and certain). So, the consequences help in reinforcing the unsafe behaviour. Therefore, the Safety Management System should ensure that proper antecedents are in place to prompt safe behaviour, and that consequences support the desired behaviour, to safeguard against wrong practices becoming norms.
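The welding example can be restated as a small scoring exercise. The sketch below is an assumed formalization: the principle that immediate, certain consequences dominate behaviour is the chapter's point, but the numeric weights are invented for illustration.

# ABC consequence scoring for the welding example (illustrative).
# Immediate, certain consequences are weighted highest; weights assumed.
from dataclasses import dataclass

@dataclass
class Consequence:
    description: str
    positive: bool   # +ve or -ve for the person
    immediate: bool  # immediate vs. later
    certain: bool    # certain vs. uncertain

def reinforcement(consequences) -> int:
    """Positive score means the behaviour is being reinforced."""
    score = 0
    for c in consequences:
        weight = (2 if c.immediate else 1) * (2 if c.certain else 1)
        score += weight if c.positive else -weight
    return score

no_eye_protection = [
    Consequence("saves time", True, True, True),
    Consequence("comfort and convenience", True, True, True),
    Consequence("injury", False, False, False),
    Consequence("penalty for rule violation", False, False, False),
]
print(reinforcement(no_eye_protection))  # +6: the unsafe behaviour is reinforced

The positive total mirrors the text's conclusion: unless the Safety Management System changes the antecedents and consequences, the unsafe practice wins.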
4.5 Concluding Remarks

Humans can respond dynamically to unlimited, unknown situations, while automated systems can respond only to a limited set of postulated events. Therefore, automated devices cannot completely replace humans, who are the ultimate level of DiD. The use of automation as part of technical corrective measures should be done judiciously, considering the integrated perspective of human, organizational, and technical factors. Nuclear installations and other high-hazard installations with multiple levels of DiD are more vulnerable to the effects of poor HOF management. Unlike active failures, which are identified easily, latent failures/errors remain dormant and may be identified only after they trigger an accident/event. Latent failures can occur at various levels: the strategic level (high-level decision making; selection of options and technology); the tactical level (line management; design and construction); and the operational level. Therefore, instead of holding operators (workers, technicians, supervisors, etc.) responsible for an event, more emphasis should be placed on questioning organizational and management issues (latent factors, which create conditions that promote the commission of errors). RCA must address this, and the formulation of corrective action plans should take appropriate inputs from ABC analyses to address human behaviour aspects. Development of an HRP for the operational phase should account for these aspects appropriately, as well as for the seamless integration of the HRP with the overall Safety Management System (SMS). A well-developed SMS, with balanced attention to HOT factors, that addresses the essential role of management in coordinating "Rule-based Safety" and "Managed Safety" is key to continual improvement in safety performance. Regulators have an important role in ensuring the establishment of a proper SMS and its effective implementation. Being independent observers, they are better placed to identify gaps in the SMS and the early signs of degradation in its implementation.
References

Cooper D (2001) Improving safety culture: a practical guide. Wiley, Hull, UK
Daniellou F, Simard M, Boissières I (2011) Human and organizational factors of safety: a state of the art. Foundation for an Industrial Safety Culture, Toulouse, France
Fleming M, Lardner R (2002) Strategies to promote safe behaviour as part of a health and safety management system. Prepared by The Keil Centre for the Health and Safety Executive, Edinburgh, UK
Gertman D, Blackman H (1993) Human reliability & safety analysis data handbook. Wiley-Interscience, Hoboken, New Jersey
Health and Safety Executive, HSE human factors briefing note no. 7. http://www.nost.edu.au/icms_docs/143973_HSE_Human_Factors_07_Safety_Culture.pdf
Health and Safety Executive (1999) Reducing error and influencing behaviour [Online]. https://www.hse.gov.uk/pubns/priced/hsg48.pdf
International Civil Aviation Organization (2005) ICAO accident prevention programme. ICAO Document Number 9422-AN/923, Montreal, Canada
Manna G (2007) Human and organizational factors in nuclear installations: analysis of available models and identification of R&D issues [Online]. https://publications.jrc.ec.europa.eu/repository/handle/JRC42236
Reason J (1990) Human error. Cambridge University Press, Cambridge, UK
Reason J (1997) Managing the risks of organizational accidents. Ashgate Publishing, Burlington, VT
Shukla D (2007) Safety management and effective utilization of Indian research reactors APSARA, CIRUS and DHRUVA. In: International conference on research reactors, Sydney, Australia
Shukla D (2015) Time tested good practices for strengthening of safety culture. In: Proceedings of 32nd DAE safety & occupational health professionals meet, RRCAT, Indore, India
Shukla D (2017) Significance of human, organizational & technical factors in safety and practices to address them. In: Proceedings of 34th DAE safety & occupational health professionals meeting, Kudankulam, India
Shukla D (2019) Significance of human, organizational & technical factors in safety. In: Presentation made in international conference on effective nuclear and radiation regulatory systems: working together to enhance cooperation, The Hague, The Netherlands
Sorensen J (2002) Safety culture: a survey of the state-of-the-art. Reliab Eng Syst Saf 76:189–204
The International Atomic Energy Agency (IAEA) (2013) IAEA report on human and organizational factors in nuclear safety. In: IAEA international experts meeting, Vienna, Austria
The International Atomic Energy Agency (IAEA) (2013) Managing human performance to improve facility operation—IAEA Nuclear Energy Series No. NG-T-2.7. https://www-pub.iaea.org/MTCD/Publications/PDF/Pub1623_web.pdf
The International Atomic Energy Agency (IAEA) (2018) Regulatory oversight of human and organizational factors for safety of nuclear installations—IAEA-TECDOC-1846. https://www-pub.iaea.org/MTCD/Publications/PDF/TE-1846web.pdf
Chapter 5
Vulnerability of Human Reliability Aspects—Impact on Crucial Industries and Services

Natesan Ramamoorthy
Abstract Human resources (HR) often underpin national pursuits of technological advances and developmental aspirations for the benefit of society. Professional and technical competencies of HR are very important, but there are additional key requirements beyond them, especially for critical functions in key industries. Human reliability aspects, the impact of attitude, etc. have accordingly gained attention over time. In this context, in nuclear science and technology-related practices, nuclear security measures include addressing the HR-linked 'insider-threat' angle. Due to the close linkage of safety and security in nuclear and radiation technology practices, there is a need to look holistically at the human element (beyond the insider-threat angle). The potential vulnerability of human resources due to stress, family status, personal habits, indoctrination, blackmail, etc. has been the reason for several known instances of serious damage to major industries and society, including loss of lives. Human beings are prone to influence, and are psychologically susceptible to personal and external factors, and these can negatively impact and endanger crucial industrial functions and industry-like services. Flagging human vulnerability is aimed at highlighting the risks involved and the need to recognise and address them. This is often cited in connection with nuclear and radiation facility functions and the defence sector, while its extension to other, similar crucial industries (and industry-like services) is deemed imperative. This chapter describes the details.
5.1 Preamble

Absorbing advances in technologies and harnessing associated developments are aspirations of every society and nation. Among the requirements for all such pursuits, several Ms are cited by management experts: money; materials; man; methods; management. Here, MAN is no less a vital element than any of the other Ms! The
importance of human resources (HR) and HR development is thus well recognised by the stakeholders. The same HR feature also leads to a certain degree of vulnerability, as humans have intrinsic individual features; presumptive extrapolations or gross generalisations will not hold in every case. There are additional HR requirements beyond professional-technical competencies, soft skills, management expertise, etc., especially for critical or sensitive functions, in order to ensure the safety of the system, facility, and society, or the safety of personnel and property. Personnel reliability programmes, emphasis on attitude (i.e. the attitude quotient), etc., have accordingly gained attention over time. This is because, in spite of high standards of education, training, skills, and so on, humans are prone to vulnerability due to numerous factors arising from their professional and personal lives. There are several well-known instances of events, negatively impacting society and industry, that were the fault of an individual acting intentionally, accidentally, or under external influence. Certain crucial areas of national governance are hence equipped (or have equipped themselves) to guard against this vulnerability, for example, the nuclear field, especially for defence-related facilities and practices. The approach called the human reliability programme (HRP) has evolved in some countries, like the USA, especially in the context of national security systems involving nuclear materials (US Department of Energy). This is an important approach and model for many stakeholders in other nations. In the past two decades (mainly post-9/11), the security of nuclear and other radioactive materials has received increasing attention, including under the auspices of the International Atomic Energy Agency (IAEA) (The International Atomic Energy Agency). While these efforts and procedures are very important, it is necessary to look at the whole issue of 'human-element-related aspects', cited as 'human factors' (HF) in this article, for all crucial industries and industry-like service sectors of national importance and societal well-being. Further, some of the procedures evolved for HRP may not be applicable, or even legally admissible, in some countries. This article strives to highlight the above aspects because, as the saying goes, 'one size does not fit all' in dealing with HF.
5.2 Nuclear Security and the IAEA—From 'Insider Threat' to the Impact of 'Human-Element Aspects'

The security of nuclear and other radioactive materials is an important topic in the nuclear field, and the IAEA has been an important forum for global cooperation in planning and evolving strategies and measures for nuclear security. Plenty of recommendations and advisory notes, including on human and organisational factors, are available (The International Atomic Energy Agency). A person having access to nuclear materials and/or nuclear/radiation facilities (an insider) can be turned into an adversary (a threat), through incentives, indoctrination to pursue certain ideologies (political, religious, extremist, etc.), or blackmail, by entities working against society, the nation, or the world order. There may also be persons
(insiders) who may pose a potential threat due to personal and psychological reasons. The scope for such an 'insider threat' impacting nuclear security is one of the several aspects dealt with in the approaches and procedures for ensuring effective security measures. Discussions in different forums, including at IAEA conferences and other events, are notable in this context (The International Atomic Energy Agency). A classic example of an insider-threat event in a non-nuclear setting is the assassination of the Indian national leader, former Prime Minister Mrs. Indira Gandhi, on October 31, 1984, by her own security guard, which was attributed to political and religious indoctrination. Measures to address this type of threat are invariably followed in select areas. These are all necessary but not sufficient to cover all potential risks that may arise from human-element-related aspects (HF), even within the domain of nuclear and radiation applications (let alone defence-related or other areas). For example, in the worldwide application of high-intensity radioactive sources, challenges have been faced due to HR morale issues: in industrial radiography, harsh open-field work sites and prolonged hours of work, as well as frequent transport of radioactive sources and devices over long distances and at odd hours; the repetitive nature of the work combined with a lack of adequate career progression; a tendency to look for softer job options; etc. Consequently, disregard and a lack of discipline tend to creep into day-to-day practices, endangering other personnel as well as facilities and neighbourhood areas. In the case of sophisticated nuclear/radiation facilities, be it a radiation processing plant (RPP) for food products and medical devices, or a nuclear power plant (NPP), highly qualified, skilled and certified operators are required to be on round-the-clock shift duty. The envisaged smooth, continuous operation of such facilities, over several days for an RPP and over several months for an NPP, can lead to complacency and a presumptuous disposition among these high-tech operators and associated technical staff, which, in turn, poses a danger to and from the facility(ies). Such potential risks arising from practice-specific challenges, and the need to address them, have been presented at IAEA events1 (Ramamoorthy 2018, 2019). An important lesson from such consideration is the need to also look at similar potential risks in other industries of crucial (national) importance, e.g. aviation, defence, construction, mining, oil and gas, and hazardous chemicals. It is also imperative to extend such considerations to (industry-like) vital service sectors, for example, power and water supply, mass transport systems, medical/health-care facilities, large-scale catering units and so on (covered in the next sections). HF, whether through erratic or violent behaviour, error in action/judgement, or external influence, can be the cause of severe accidents and damage to property, society and the nation. Advocacy for holistic consideration of HF is thus necessary in striving for and fostering human reliability.
1 There is also an interface cum overlap of safety and security considerations in nuclear and radiation technologies and their applications. Any disconnect between the two increases the scope for additional risks.
5.3 Events Traceable to Causation by HF

There have been several instances of events of severe impact, damage and even fatality caused by individuals, resulting in the willful destruction of people and/or facilities. Almost all of these are traceable to erratic and/or violent human behaviour.2 In addition, there are several cases where an operator-caused error in action and/or judgement has led to tragedies and accidents. A few examples cited below will suffice to show the range and magnitude of the impact of such events on society and the nuclear/radiation field.
5.3.1 Case-A (Violent, Erratic Human Behaviour; Psychopathic Actions)

• A Germanwings (Lufthansa) co-pilot's action on 24/3/2015 took down the plane, killing all occupants3 (European Aviation Safety Agency (EASA) 2017);
• A male hospital nurse caused a series of patient deaths over many years in Germany (BBC News 2015);
• Staff of some police/paramilitary security units in north India turned their guns against colleagues, to the extent of killing a few (Johari 2014);
• A disgruntled technician opted to take revenge on his company by throwing an industrial radiography source into a water canal in Chennai; and
• Several road-rage encounters in and near Delhi resulted in murders by owners or drivers of vehicles, including elite persons, whose sudden manifestation of uncontrolled anger led to the extremes of revenge, violence and murder (Chakrabarty and Riku 2013; Express News Service 2015; India TV News Desk 2016; Munshi and Lama 2011).
5.3.2 Case-B (Error in Human Action and/or Judgement; Paralysis of Reasoning)

• The Chernobyl nuclear plant accident on April 26, 1986, in Ukraine (Chernobyl disaster 2022; World Nuclear Association 1986);
• Unguarded disposal of radioactive sources from old equipment in Brazil (1987; a medical source) (The International Atomic Energy Agency (IAEA) 1988) and India (2010, Mayapuri; a source used in research at Delhi University) (Kumar et al. 2015);

2 Blatant terrorist actions and genuine accidents are not included, as they are beyond the scope of coverage of this article.
3 One can also attribute an organisational factor in this case; the case is picked only to illustrate the human factor role.
• Industrial accidents in the mining sector, mass transport systems (air, railways, shipping), oil and gas, construction sites, etc.

There may be relatively fewer difficulties in addressing, or even preventing, Case-B situations in many instances, while Case-A situations pose greater challenges to manage and mitigate or resolve.
5.4 Features Driving HF and Potential Risks to HF

Factors affecting human behaviour are numerous. In the context of the current article, the following are deemed of direct or high relevance.
• Human beings have inherent, intrinsic characteristics and both perceptible and imperceptible features.
• Human beings are part of a family, of society, of a certain profession or job environment ('critical industry of national importance' in the current context), and ultimately of the nation.
• There is extreme diversity in the backgrounds of people in terms of education, economy, upbringing, culture, health, religion, region, lifestyles, traditions, among others.

A number of typical risk factors impact HF4; this list is neither exhaustive nor presented in any order of grading:
• family-related tensions/stress, either acute or chronic, with their implications for performance, behaviour, capability, etc.;
• social-life (societal) tensions/stress; poaching by entities spreading certain ideologies, and indoctrination by dedicated entities or mercenaries (political, religious and other extremists);
• 'competency' limitation, leading to a vicious cycle of sulking, inferiority complex, performance degradation, resentment due to low or nil reward, etc.;
• rivalry and competition in the workplace, which may result from anything from stress to attention deficit hyperactivity disorder (ADHD), depression, resentment or disgruntlement, and can escalate all the way to violence, malicious acts, sabotage and vulnerability to indoctrination;
• conflict of human (social, psychological) aspirations vis-à-vis the reality of career scope in certain cases;
• loss of alertness and a tendency towards complacency due to issues like the repetitiveness of tasks or a lack of tasks (compared to the person's skills), e.g. shift operation at
4 In the aftermath of the COVID-19 pandemic affecting the whole world since early 2020, additional risk factors, due to prolonged pandemic-related restrictions imposed in many countries, including India, and their consequences on day-to-day life, the economy, and physical and mental health, can be cited in the context of the current article.
a major industrial plant running smoothly, or security posts functioning 24×7 with hardly anything happening;
• negative tendencies due to personal matters and lifestyle changes, e.g. excessive dependency on alcohol, drugs, sex, gambling, etc., which lead to psycho-social effects, depression, lower performance levels, financial issues, and proneness to external influence;
• potential for being trapped or coerced into working for the goals/ideology of organised entities inimical to national/societal interest; and
• vulnerability to provocation, uncontrolled anger or frustration, and similar causes, leading to momentary, transient or sustained violent behaviour and/or weird, mad actions by otherwise normal humans.

These risk factors impact all walks of life, and more so crucial national industries, including those beyond the nuclear and defence-related cases. There is hence an imperative need to accept and address the effects of HF in all vital industries and services (and in societal life too). Laws with (varying) provisions for the right to privacy, human rights, etc. exist in some nations, and they too must be taken into account in this context.
5.5 Importance of Human Relations and Nurturing Reliable Behaviour in Crucial Industries (and the Services Sector)

In the continual race for rapid development and growth, very little time is often given to human relations-related considerations in most industries and service sectors. Exceptions to this are known but limited, e.g. in defence-related units in some countries. Humans are not machines or robots, and by and large most remain 'normal' (or near normal!) in life. This is not the case, however, with all people under all circumstances, and that is the matter of concern to be taken into consideration. As medical experts and doctors say, 'people are seldom born emotionally resilient'. A person's environment, be it social or in the workplace, needs to be positive to nurture normal, desirable behaviour. Emotional challenges and a poor working environment can tilt vulnerable people towards abnormal or erratic natures, irrespective of whether the persons hold managerial or non-managerial roles and duties. Further, organised entities may try to trap personnel engaged in crucial roles by one or more means, e.g. indoctrination, blackmail, incentives, etc., to cause accidents or other types of harm. Co-workers, immediate supervisors, et al. may often miss noticing anomalous behaviour, changes in behaviour or the like, unless they are naturally of a caring type or trained to do so. Often, the magnitude of the fall-out is realised only after its negative impact, endangering society or the industrial entity or unit(s) delivering the services. The psychopathic co-pilot of the Germanwings plane and the killer male nurse were not suspected, let alone spotted, as abnormal, nor alerted/reported by anyone in their respective circles, in the family or the office.
Addressing HF (and HRP) must go beyond the nuclear and defence domains. It should also cover industry-like services (listed later in this section), which could be maliciously damaged or destroyed to cause grave loss to society and the nation. Towards this, some major industries of national importance with potential vulnerability to HF are5:
• Nuclear power plants and related fuel cycle facilities; also, radiation facilities containing Category 1 and Category 2 type radioactive sources (The International Atomic Energy Agency 2005)
• Space programme-related organisations and facilities
• Defence-related organisations
• The aviation sector (both civilian and defence)
• Mass transport systems, like railways, roadways, shipping, etc.
• Construction engineering for establishing large infrastructure, high-security facilities, etc.
• Mining, chemical, oil, and petroleum, among others.

Other industries and industry-like services sector entities of potential vulnerability are:
• Urban services for daily life needs and their management, e.g. power and water supply
• Medical services
• Security forces, including military, para-military, police, etc.
• Catering and hotel services
• Communication media (e.g., post-Nov. 26, 2008, Mumbai attack by terrorists6 and post-Fukushima coverage7).8
5.6 Scope of Measures Required

Focus on human-element-related aspects, and addressing vulnerabilities, must be a multi-pronged initiative and will be applicable only to a select group of personnel (to be) deployed in identified industries and services of a critical nature. The process
Order of listing does not imply gradation of the extent of vulnerability. The extent of live visual media coverage of 26/11 related events directly aided the organisers of the crime with free-flow of information to further their plans and execution. Prudent action by the media personnel and their management would have been a more responsible support to the society/ nation. 7 The repetitive coverage of many past accidents, unrelated damages and other scenes of destruction, suffering, etc., projected by many visual media channels aggravated the pain and suffering of the already stressed population in the region (and elsewhere too), resulting in a huge burden of psychological issues among the affected community. Greater sensitivity in handling the coverage by these media personnel and their management should have been the rational strategy. 8 One may argue that this aspect is different and should not be clubbed here. On the other hand, scope is there for accidental or intentional actions (including mercenary role) by the media (especially the visual media) seriously impacting the larger interests of the society and hence this is flagged as potential area to watch out. 6
42
N. Ramamoorthy
will start right from the recruitment stage (professional qualification/competency and beyond, e.g. verifying antecedents and ascertaining personal features such as attitudes and emotional stability), through the training and induction of personnel, all the way to periodically monitoring the behaviour of persons in key roles throughout their careers. Emphasis on sharing sensitive or critical information among such personnel only on a need-to-know basis, and at the right time, is necessary as part of information security practices; to cite a few examples, this covers information related to: security systems of key facilities, e.g. access control and design details; radioactive source replacement in gamma radiation plants or telecobalt machines; transportation of nuclear materials and other hazardous goods, including toxic chemicals and high-intensity radioactive sources; and similar others. The potential vulnerability of the relatively large teams of technical staff engaged in routinely handling utility services (power and water supply) in urban locations, and in constructing large/key infrastructure facilities (e.g. dams, or bridges involving high-tech or special terrain), being targeted by external entities inimical to the society/nation to cause sabotage and/or serious harm to the public has to be borne in mind. Many useful mechanisms and practices are available to broadly address HF and HRP matters and should be made use of, along with a case-by-case, needs-based assessment of supplementary measures. The legal position, personal laws, human rights, etc., need to be reckoned with in planning, instituting and practising initiatives. The periodic (mostly annual) medical examination programmes for personnel working in key facilities/roles implemented by many employers may not (or cannot) include certain types of investigation required for an HRP, for example, psychological status assessment or intrusive medical tests for checking addiction or substance abuse, due to prevailing national laws in many countries (though these are allowed in some other countries). Any human reliability and HF evaluation exercise should be well designed, validated and properly implemented; it should not be confined to only safety and security or remain 'nuclear-centric'. One can deploy different resources depending upon national and regional laws and rules. These can include, inter alia, personality development workshops, social engagement of staff along with their families, periodic medical check-up (annual or biennial) schemes that include psychometric tests, professional counselling wherever required, etc. A multi-personnel presence/role, e.g. a 'two-person rule at all times', in key locations/posts has been proposed in some cases, though the simple 'two-person rule' has perhaps been the more common practice. The Germanwings co-pilot case showed the vulnerability of the simple two-person rule, as the miscreant was able to commit the heinous crime when the other person (the pilot) went out of the cockpit to use the washroom. A 'two-person rule at all times' has since been insisted upon by airlines (a cabin crew member is required to be inside the cockpit whenever the pilot or co-pilot needs to step out for any reason). The scope to use continuing medical advances in neurology (beyond psychometric tests) and neuro- and cognitive science, including neuro-receptor imaging, brain activation imaging studies, etc., will be helpful. A caveat is, however, necessary on their application, as there can be issues from medico-legal, ethical, and right-to-privacy angles.
Also, validated protocols may need to be evolved, and this will involve a large study using volunteers, including both normal and abnormal cases.
The bottom line is to be able to discreetly identify vulnerable persons by validated and legally admissible methods and remove them from tasks/jobs involving sensitive or hazardous materials, crucial facilities and information, or operations of a critical nature. In making such decisions, one needs to be conservative, even to the point of error, in favour of safety. After all, societal and national safety is at stake and hence 'discretion is the better part of valour', as Shakespeare said a long time ago!
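As a small aside on the 'two-person rule at all times' discussed above, the rule itself reduces to a simple invariant on occupancy, as in this minimal sketch (the occupancy-set representation is an assumption made purely for illustration):

# 'Two-person rule at all times' as an occupancy invariant (illustrative).
# A sensitive location must never be occupied by exactly one person.
def two_person_rule_ok(present: set) -> bool:
    return len(present) != 1

occupants = {"pilot", "co-pilot"}
assert two_person_rule_ok(occupants)

occupants.discard("pilot")                # the pilot steps out...
assert not two_person_rule_ok(occupants)  # ...rule violated until a
occupants.add("cabin crew member")        # crew member enters the cockpit
assert two_person_rule_ok(occupants)

The invariant makes clear why the 'at all times' qualifier matters: the simple two-person rule checked only staffing, not continuous occupancy, which is exactly the gap the Germanwings case exposed.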
5.7 Concluding Remarks

The impact of the work ambience and of external influences on humans—especially on those engaged in key roles/operations in critical industries or services—can easily vary from severe stress to utter complacency on the one hand, and from sulking resentment to sabotage/violence on the other, endangering not only the industry (its safety, security, productivity, etc.) but also the family, society and the nation at large. The purpose of flagging these human-element-related aspects is to highlight the risks arising from human deficiencies—vulnerability to provocation or indoctrination; erratic, violent or error-prone behaviour—along with an appeal to accept and address the same. Awareness and acknowledgement of an issue already amount to a certain degree of progress towards finding a means of mitigation, though not resolution. The HF issues for nuclear and radiation facilities and their management have been cited in many a forum, including at IAEA events, and their extension to similar vital industries and services of national importance is deemed worth considering. Human reliability assessment approaches and measures are accordingly needed to holistically address HF matters in all major industry and service sectors of national importance, though this is not an easy task.
References

BBC News (2015) German nurse jailed for life for murdering patients. https://www.bbc.com/news/world-europe-31646097
Chakrabarty N, Riku R (2013) Aggressive driving case studies and mitigations in India. Int J Sci Res Publ 3(2):1–10
Chernobyl Disaster (2022). https://en.wikipedia.org/wiki/Chernobyl_disaster. Accessed 23 Aug 2022
European Aviation Safety Agency (EASA) (2017) Task force on measures following the accident of Germanwings Flight 9525: final report. https://ec.europa.eu/transport/sites/default/files/modes/air/news/doc/2015-07-17-germanwings-report/germanwings-task-force-final-report.pdf
Express News Service (2015) Delhi road rage: man beaten to death in front of his children. https://indianexpress.com/article/cities/delhi/road-rage-in-delhi-man-beaten-to-death-by-car-occupants/
India TV News Desk (2016) Delhi road rage: dentist beaten to death with sticks and iron rods. https://www.indiatvnews.com/news/india-delhi-road-rage-dentist-beaten-to-death-with-sticks-and-iron-rods-320743
Johari A (2014) Why are more Indian military men murdering their colleagues? https://scroll.in/article/665795/why-are-more-indian-military-men-murdering-their-colleagues
Kumar R, Panda GK, Singh BK, Rane DM, Sunil Kumar JVK, Sonawane AU (2015) Lessons learned from the radiological accident in Mayapuri, New Delhi. In: IAEA/STI/PUB/1667, pp 517–528
Munshi S, Lama P (2011) Road rage: pilot crushes restaurant manager in Delhi. https://www.indiatoday.in/india/north/story/restaurant-manager-killed-in-road-rage-in-khan-market-area-of-delhi-126389-2011-01-12
Ramamoorthy N (2018) Fostering synergy of security of radiography sources and radiation safety in industrial applications. In: IAEA conference (CN269) on security of radioactive materials. Vienna, Austria
Ramamoorthy N (2019) Practice-specific challenges in the management of regulatory functions of radiation sources and medical facilities. In: IAEA conference (CN270) on effective nuclear and radiation regulatory systems. The Hague, The Netherlands
The International Atomic Energy Agency (IAEA) (1988) The radiological accident in Goiania. https://www-pub.iaea.org/mtcd/publications/pdf/pub815_web.pdf. Accessed 27 Aug 2021
The International Atomic Energy Agency (IAEA) (2005) Categorization of radioactive sources—safety guide no. RS-G-1.9. https://www-pub.iaea.org/MTCD/Publications/PDF/Pub1227_web.pdf
The International Atomic Energy Agency (IAEA), Human and organizational factors. https://www.iaea.org/topics/human-and-organizational-factors
The International Atomic Energy Agency (IAEA), Security of nuclear and other radioactive material. https://www.iaea.org/topics/security-of-nuclear-and-other-radioactive-material
US Department of Energy, Human reliability program handbook. https://www.energy.gov/ehss/human-reliability-program-handbook
World Nuclear Association (1986) Chernobyl accident 1986. https://world-nuclear.org/information-library/safety-and-security/safety-of-plants/chernobyl-accident.aspx
Chapter 6
Psychological Assessments, Employee Ethics and Behavioural Observation Programmes—Important Components of a Robust Human Reliability Programme

Joseph R. Stainback IV
Social norms within the workplace, as driven by anthropological culture and moral relativism, dictate certain behaviours generally accepted to ensure that products or services are produced at the highest quality, without injury to the worker, at the lowest cost, and without harm—including financial harm—to the organization. Various countries around the world have different thresholds for ethical decision-making, observing behaviours, self-reporting of mistakes, and psychological well-being in the workplace. Within this chapter, experience-based policies and lessons learned about Psychological Assessments, Employee Ethics, and Behavioural Observation Programmes (BOPs) are shared for those interested in implementing a robust Human Reliability Programme (HRP), especially where employees, vendors, and customers have access to high-risk entities (HREs). HREs, as defined herein, are private companies or government agencies where laws, information, services, processes, equipment, and/or materials can detrimentally affect the safety and security of people, national security, and/or the reputation of a nation or company.
6.1 Psychological Assessments

A safe and secure HRE starts with the individual. Within an HRE, individuals should be in the best physical and mental condition to perform the assigned tasks, especially under a Human Reliability Programme (HRP). Deviations from an individual’s
mental, intellectual, and physical capacities could detrimentally affect their work, and thus the safety and security of the HRE and of their co-workers. Under HRPs, there are ways to assess, on a continuous and periodic basis, a person’s mental, intellectual, and physical status using psychometric instruments.

Psychometric instruments are used to assess an individual’s overall mental, intellectual, and physical state. Used in various clinical settings, and typically within an HRE under an HRP, psychometric instruments require direct and indirect observations of the subject with respect to his/her appearance, speech, mood, affect, thinking, insight, and judgement. They often also include screening for memory, intellect, and attention proficiencies. Early signs of impaired cognitive functioning (including attention, short-term memory, visual–spatial information processing, reasoning, and reaction time) could erode the capacities and competencies required to perform the assigned complex and sensitive tasks within HREs, potentially causing safety and security issues. Psychometric instruments also assess an individual’s tendencies towards aggression by presenting different scenarios and assessing the subject’s reactions, conclusions, and/or explanations for each scenario against a fixed set of responses. Aggression is often observable and may be an early indicator of aberrant behaviour in the workplace. Proactive use of psychometric instruments can identify such conditions early, so that they can be addressed before any incident arises. Some HREs use psychometric instruments extensively and continuously, especially for personnel assigned to the highest-risk job tasks.

Psychometric instruments are administered as written tests, either on paper or by computer, and/or as one-on-one structured or semi-structured interviews. Tests used by psychologists within HREs include, but are not limited to, the psychometric instruments identified in Table 6.1. Psychometric instruments, and their use by professionally trained HRE clinicians, are important for proactive indication of undesirable behaviours in the workplace. These instruments are not a panacea and should not be considered the ultimate catch-all for undesirable behaviours; however, they have been shown to detect one or more of the following: deceitful or delinquent behaviours; criminal activities; destruction of property or life; illegal use of drugs or other substances; financial irresponsibility; irresponsibility in performing assigned duties; failure to comply with work directives; hostility or aggression towards fellow workers or authority; uncontrolled anger; violation of safety or security procedures; repeated absenteeism; and significant behavioural changes, moodiness, depression, or other evidence of loss of emotional control.

It is important to note that undesirable and unattended mental, intellectual, and physical behaviours prompted by family, work, or financial stressors could lead to concerning behaviours in the workplace. Unattended concerning behaviours have also been shown to lead to hostile acts, including violence, sabotage, and insider threats.
Table 6.1 Psychometric instruments used by psychologists within HREs

| Instrument | Brief description |
|---|---|
| Semi-structured interview | Psychosocial history one-on-one interview covering the person’s education, military, work, life-family, financial, legal, and medical history |
| Minnesota Multiphasic Personality Inventory | Self-report empirical measurements associated with psychiatric diagnosis and personality pathology |
| Mental status exam | Brief, direct observation and assessment of an individual’s overall mental state |
| Automated forms of cognitive functioning assessments | Computer-administered test designed to detect early signs of impaired cognitive functioning |
| Various psychological tests of aggression | Indirect approach to measure tendencies toward aggression |
| Liver Function Tests (LFT) | Medical evaluation of liver enzymes to detect alcoholism |
| Shedler-Westen Assessment Procedure (SWAP) | Assesses personality disorders. https://swapassessment.org/ |
| Conner’s Continuous Performance Test (CPT II) | A test to measure attention and concentration |
6.2 Ethics Programmes

Ethical decision-making, or ‘doing the right thing’, in an HRE means making choices consistent with moral, social, and/or corporate core values, whether set by policy or by collective judgement, such as what a society believes in or stands for. Ethics is not about the way things are; it is about how they ought to be: in essence, a moral compass. Notions of right and wrong come from religious beliefs, family tradition, historical lessons learned, education, and communities. Cultures across the international spectrum have their own moral relativism (https://ethicsunwrapped.utexas.edu/glossary/moral-relativism). As such, departures from right conduct can be detrimental to the company or agency that employs the individual committing the unethical act. These detriments include loss of business, loss of employee and community trust, loss of revenue, and loss of customers.

Historically, people working for HREs have been arrested where their behaviour was unethical. Examples of such unacceptable behaviours are inappropriate acceptance of money (bribes); inappropriate or excessive gifts or entertainment; falsification or over-exaggeration of records; lapses in self-reporting errors or omissions; price fixing; boycotting contractors or customers; stealing trade secrets; bribery; disparaging, misrepresenting, or harassing the HRE; stealing from the HRE; time theft (reporting hours not worked); sexual harassment; discrimination; mistreating other employees (bullying); breach of confidentiality; creating a toxic workplace; and unauthorized use of company equipment.
HREs attract unethical behaviours because of the risks associated with detrimental effects on the safety and security of people, national security, and/or the reputation of a nation or company. Greed is typically the motivator of such behaviours. Research has shown that people who commit minor unethical acts are likely to go on to commit major unethical acts and crimes (Hubbs 2012). HREs such as nuclear power plants, chemical processing facilities, and other facilities whose commodities can lead to serious consequences should pay close attention to ethical decision-making and policies.

Experienced HREs defend against ethics violations by maintaining robust ethics policies and training programmes. Such policies and programmes have been shown to deter unethical behaviours because they inform the employee of the necessary expectations and associated consequences directly from top management. Some HREs refer to these policies as their Code of Conduct, which sets out how an employee should act ethically. Codes of Conduct emphasize conducting oneself in a manner that is respectable to the company and its employees while making good ethical decisions. Codes of Conduct and ethical decision-making encompass making informed judgements, acting in accordance with HRE policies and overall values, performing work in accordance with internal and external standards, holding each other accountable, and learning from mistakes.

Given the safety and security risks within an HRE, Ethics Programmes and Code of Conduct policies, with associated training programmes, provide another barrier against aberrant behaviours in the workplace, protecting safety and security. Most HRE companies publish their policies on the internet, offering an initial source of draft material for those seeking to create their own. New Ethics Programme and Code of Conduct policies should reflect the HRE’s unique corporate expectations, anthropological culture, laws, moral relativism, and lessons learned. An HRE should be fully committed to safeguarding its reputation by ensuring a culture of integrity throughout its operations, supported by a robust Ethics Programme. A well-trained workforce culture of doing the right thing is a vital asset that differentiates an HRE from its competitors, earns the trust of its customers, and drives business success today and for years to come.
6.3 Behavioural Observation Programmes

Supervisors, co-workers, and peers are likely to be the most effective “sensors” of aberrant or troubling human behaviour within the workplace. People change: circumstances in their personal and professional lives affect their job abilities and, sometimes, their professional judgement. Sometimes greed plays a role or motivates workers to change their moral judgements as well. Other indicators could include a worker’s inability to perform a task, or excessive errors that potentially impact the safety or security of an HRE. Indicators of these behaviours could go unnoticed unless someone observes them (is vigilant), takes notice
(in a layered, cumulative way), and reports them (acts): in effect, “seeing something, say something”. The establishment of a well-documented system for reporting such observations, supported by training, enables others to feel comfortable reporting suspected behaviours.

An effective Behavioural Observation Programme (BOP) complements and helps improve the defence-in-depth provided by the insider mitigation, physical security, and fitness-for-duty elements of a robust HRP. The BOP focuses on personnel who have authorized access to certain materials, information, and facilities deemed important, and at high risk, to the welfare of the company or national security. The BOP provides assurance that trained individuals can look out for, and report without retribution, unusual individual traits that may pose a risk to the company, national security, employees, or public health. Examples of such potential risks are drug and alcohol use, mental health issues, unusual spending, foreign contacts, working odd hours, disregard for policies, fatigue, and depression. Key observable action points, developed by the HRE, inform the employee about the necessary BOP expectations. The BOP goes hand in hand with the other elements of a fully implemented HRP. HRE employees should recognize that a BOP provides an atmosphere of a “spirit of concern and not a cloud of suspicion” (WINS 2019).

Elements of a robust BOP should specify concerning behaviours and potential warning signs, with adequate assurances of objectivity and fidelity (Crockett et al. 2014). Observing behaviours should become second nature, intuitive, and everyone’s responsibility. Frequently encountered behavioural indications such as alcohol abuse, drug abuse, mental health problems, fatigue, and, to some extent, even criminal behaviour are all openly recognizable. BOPs should work in parallel with whistleblowing policies, whereby these programmes reward reporting, maintain autonomy, protect against retaliation, investigate, and follow up with lessons learned. Trained BOP (HRP) employees should be able to observe potential adverse effects on job performance; identify the effects of prescription and over-the-counter drugs, alcohol, dietary factors, illness, mental stress, and fatigue; recognize illegal drugs and indications of the illegal use, sale, or possession of drugs; and observe and detect performance degradation, indications of impairment, or behavioural changes. While trained BOP (HRP) employees should maintain a “constant unease” while working high-risk tasks, they should not feel that BOP policies are negative to the workplace, but rather a spirit of teamwork and concern for everyone’s safety and security. Trained BOP (HRP) employees should also be assured that no negative impacts will result from self-reporting (WINS 2019).

Examples of observable, unusual, or erratic behaviours fall into three categories: Performance, Personal, and Appearance. Table 6.2 provides example observable behaviours in each category that should be considered within a BOP. Behavioural Observation Programmes are an integral part of a robust HRP. BOPs cannot be automated; they must be performed continuously by everyone assigned to the programme. Willing participation, self-reporting, vigilance, workplace culture, and teamwork all play a vital role in the success of a BOP.
Table 6.2 Observable example behaviours to be considered within a BOP

Unusual or erratic performance behaviours:
- Abrupt degradation of work performance
- Mental lapses on the job
- Physical impairment
- Timeliness/punctuality problems
- Repeated absenteeism
- Excessive misuse of resources
- Ethics violations/dishonesty
- Excessive safety or security infractions (carelessness)
- Difficulty in paying attention; forgetfulness
- Disregard for company policies

Unusual or erratic personal behaviours:
- Unusual adventure/thrill seeking
- Ego/self-image issues
- Vulnerability to blackmail
- Compulsive and destructive behaviour
- Family problems
- Suspicious contacts/relationships
- Unusual foreign trips
- Working odd hours
- Habitual use of alcohol
- Exotic vacations beyond means
- Expensive car, home, or boat beyond means
- Financial irresponsibility
- Suicidal tendencies or threatened suicide
- Unprovoked outbursts on the job
- Excessive moodiness/anger/revenge
- Depression/paranoia
- Indications of deceitful/delinquent behaviour
- Secretive or suspicious behaviour
- Unusual affiliations or relationships
- Ideology/identification
- Divided loyalty
- Criminal behaviours, terrorism, or terrorist affiliations
- Unusual changes in wealth

Unusual or erratic appearance behaviours:
- Abrupt, uncharacteristic change in attire
- Loss of weight or loss of appetite
- Puffy face or eyes; unusual expressions
- Changes in walking or talking
- Excessive sleepiness
- Unusual body language or voice patterns
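As a concrete illustration of the “observe, take notice, report” chain applied to the categories in Table 6.2, here is a minimal sketch of how an observation record might be captured; the structure, field names, and follow-up threshold are hypothetical, not taken from any cited programme.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    PERFORMANCE = "performance"   # e.g. abrupt degradation of work performance
    PERSONAL = "personal"         # e.g. unusual changes in wealth
    APPEARANCE = "appearance"     # e.g. excessive sleepiness

@dataclass
class Observation:
    """One reported sighting of a behaviour from Table 6.2.
    Reporter identity is deliberately not stored on the record itself,
    supporting the non-retaliation assurances discussed above."""
    category: Category
    behaviour: str
    layered_count: int = 1   # cumulative sightings of the same indicator

def needs_follow_up(obs: Observation, threshold: int = 2) -> bool:
    # Layered/cumulative principle: a single sighting is logged, while
    # repeated sightings of the same indicator trigger follow-up by
    # trained HRP staff.
    return obs.layered_count >= threshold

report = Observation(Category.PERFORMANCE, "Repeated absenteeism", layered_count=3)
print(needs_follow_up(report))   # True: route through the documented reporting system
```

Separating the record of what was observed from who observed it is one simple design choice that supports the “spirit of concern, not a cloud of suspicion” framing.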
References

Crockett J, Wagner S, Greenhalgh D (2014) Best practices for behavioral observation programs at operating power reactors and power reactor construction sites (NUREG/CR-7183), June 2014. [Online]. https://www.nrc.gov/docs/ML1418/ML14189A355.pdf
Hubbs R (2012) Find deviant behaviors, find fraud? Human resources issues could uncover worse crimes, April 2012. [Online]. https://www.acfe.com/article.aspx?id=4294972144
McCombs School of Business, University of Texas. Ethics unwrapped—moral relativism. [Online]. https://ethicsunwrapped.utexas.edu/glossary/moral-relativism
WINS (2019) 3.2 Human reliability as a factor in nuclear security. World Institute for Nuclear Security
Chapter 7
Insider Threats and Strategies to Manage Insider Risk

Sunil S. Chirayath
7.1 Background

Industrial facilities and institutions that possess critical assets must implement robust physical protection systems (PPS). The measures and protocols incorporated in a PPS design can vary depending on the type of assets that need to be protected against adversaries. The PPS design also depends on the results of the threat assessment performed for the facility or institution. The three key PPS features of detection, delay, and response are designed with the objective of deterring and defeating malicious or unauthorized acts of adversaries directed at the facility. While designing a PPS for a facility, the three types of adversaries generally considered are criminals, protestors, and terrorists. However, facilities should also incorporate measures to prevent and mitigate insider threats, because the three attributes insiders possess (access rights, authority, and knowledge) pose an elevated threat to a facility even with a robust PPS in place. In various critical industries and institutions worldwide, several damaging insider incidents have been reported, such as the unauthorized removal of assets like materials and information, the sabotage of assets, the compromising of critical equipment, processes, and protocols, and the attacking or influencing of personnel. A set of such insider incidents can be found in the scholarly literature (Bunn and Sagan 2016), and a set of case studies on insider incidents is also elaborated in this book.

Prevention and mitigation of insider threats was extensively discussed at the 2016 International Nuclear Security Summit (NSS) held in Washington, D.C. on March 31 and April 1. In line with these discussions at the NSS, a joint statement was released by 27 countries and INTERPOL on establishing and implementing
national-level measures for insider threat mitigation (The White House 2016), which highlights the prominence of this security challenge. Every industrial sector with critical infrastructure should understand the complex insider threat and implement an insider threat mitigation program.
7.2 Understanding an Insider

An insider is a person with access rights to the assets and sensitive information of a facility who, in addition, possesses specialized knowledge about the facility. Depending on their position at the facility, insiders possess different types of authority over facility personnel, with the power to make and execute decisions (The International Atomic Energy Agency (IAEA) 2011). An outsider could fraudulently manage to gain access to the facility to become an insider, or an outsider could collude with an insider for an unauthorized act, such as sabotage of the facility. An insider can make a conscious decision to perform a malicious act with the motive to do harm, or a careless act which, even though unintentional, could also cause harm. An insider can also make an unintentional, non-conscious decision to perform an unauthorized act with no motive to cause harm. There could also be situations where an adversary exploits the ignorance of an insider to perform a malicious act. A common example is a phishing attack, in which scammers send emails containing a malware link that, if executed, could lead to unauthorized access or leakage of critical information.

Understanding the reasons for unintentional acts, and the motivations for intentional malicious acts, of insiders is vital in implementing measures for insider threat mitigation. Unintentional and careless acts could be due to improper training, fatigue, or the influence of alcohol or drugs. Furthermore, insiders can be either passive or active. A passive insider acts without taking part directly in the theft or sabotage but could provide the information needed for an outsider to gain access. An active insider, for example, could compromise PPS elements or falsify asset accounting records to facilitate theft. Active insiders can also use violence to carry out malicious acts. The motivations for an insider’s malicious actions could be financial gain or need, orientation to a certain ideology, disgruntlement and revenge, ego, coercion, or a combination of these (The International Atomic Energy Agency (IAEA) 2020). A malicious insider could take advantage of a particularly vulnerable situation at the facility, for example a fire incident, to act when the attention of facility personnel is elsewhere. A “rational” malicious insider usually waits for an appropriate opportunity to unleash their malicious act. The Germanwings co-pilot deliberately crashing an Airbus in the Alps on March 24, 2015, when the other pilot went to the toilet, is a clear example of this type of behavior (Final Report of the Accident 2016). The variety of risky behaviors that insiders could exhibit, and the myriad reasons for such behaviors, make the implementation of insider threat mitigation measures complex.
7.3 Best Practices to Prevent and Mitigate Insider Threat

Institutions and agencies have been aware of the insider threat issue for many years. The increase in insider incidents has led several institutions and agencies to implement programs to prevent and mitigate insider threats. Even though the objectives of these programs are the same, they are known by different names, such as the Fitness-for-Duty (FFD) Program, Human Reliability Program (HRP), Trustworthiness Program, etc., depending on the institution or agency implementing them.

The U.S. Nuclear Regulatory Commission (USNRC) published the requirements for its FFD Program in 1989 (amended several times thereafter) to be implemented at nuclear power plants and at nuclear fuel cycle facilities with strategic nuclear materials. The USNRC instituted this program through the code of federal regulations, 10 CFR Part 26 (US Nuclear Regulatory Commission (USNRC) 1989). The performance objectives of this program are to provide reasonable assurance of the following: individuals at the facility are trustworthy and reliable, as demonstrated by the avoidance of substance abuse; the workplace is free from the presence of illegal drugs and alcohol; and the effects of fatigue and degraded alertness will not adversely affect individuals’ competency in performing their duties (10 CFR Part 26.23; US Nuclear Regulatory Commission (USNRC) 1989). The facilities under the purview of USNRC regulations were mandated to implement the FFD Program for all personnel having unescorted access to the protected areas of their facilities and for other personnel with any other type of access. The three main components of this FFD program are: (1) testing of employees for drug and alcohol abuse, (2) employee fatigue management, and (3) a behavioral observation program (BOP) for employees. Typically, the drug and alcohol testing uses 0.04% as the baseline for Blood Alcohol Content (BAC) and tests for multiple drugs in the panel, such as marijuana, cocaine, opiates, phencyclidine (PCP), and amphetamines. Fatigue management sets maximum working hours and minimum time off. The BOP covers behavior at work, including that of remote workers.

The U.S. Department of Energy (USDOE) uses the HRP, which is a security and safety reliability program. The HRP is meant to ensure that personnel at USDOE facilities with access to critical assets and information meet the highest standards of reliability and physical and mental suitability; this is accomplished through a system of continuous evaluation of these personnel. The continuous evaluation is intended to identify individuals involved in the abuse of alcohol, legal drugs, or other substances, the use of illegal drugs, or any other condition or circumstance such as physical or mental/personality disorders. The positions that need to be designated as HRP positions depend on the type of critical duties to be performed by the persons occupying such positions or having access to critical information (US Department of Energy 2004).

The International Atomic Energy Agency (IAEA) recommends the establishment of a program to continuously determine the trustworthiness of personnel for the mitigation of insider threats, which is a prominent component described in its implementation guide on Nuclear Security Culture (The International Atomic Energy
Agency (IAEA) 2008). Some of the processes recommended by the IAEA for the determination of staff trustworthiness are: role-dependent regular staff screening processes, including checks on mental illness and drug/alcohol abuse; investigation and adjudication of failures of the screening processes; awareness-raising among employees about the importance of trustworthiness determination; training in the observation of unusual behavior; and assessments to identify any degradation in trustworthiness due to substance abuse, workplace violence, or criminal and aberrant behavior (The International Atomic Energy Agency (IAEA) 2008). For the IAEA member states joining its “Strengthening Nuclear Security Implementation” initiative, insider threat mitigation measures are incorporated in the documents INFCIRC/869 (The International Atomic Energy Agency (IAEA) 2014) and INFCIRC/225/Rev.5 (The International Atomic Energy Agency (IAEA) 2011), which serve as technical guides for the implementation of IAEA Nuclear Security Series No. 8, Preventive and Protective Measures Against Insider Threats (The International Atomic Energy Agency (IAEA) 2020).

The International Air Transport Association (IATA) has issued guidance advising its member airlines to identify and mitigate risk from insider threats through its Security Management System (SeMS) Manual (The International Air Transport Association (IATA) 2023). Some of the solutions proposed in the SeMS Manual for insider threat mitigation include pre-employment vetting; measures for continuous behavioral observation; measures to prevent or deter an insider from exploiting their authority for unauthorized acts; measures for vigilance and plans for communication while conducting daily work routines; measures of response when an insider threat is identified; and documented policies that identify the roles and responsibilities of various departments (The International Air Transport Association (IATA) 2023).

The Cybersecurity and Infrastructure Security Agency (CISA), which also covers chemical sector infrastructure security in the U.S., has highlighted the integral role of Human Resources (HR) professionals and their security counterparts in effectively detecting, deterring, and mitigating insider threats (Cybersecurity and Infrastructure Security Agency (CISA) 2020). CISA recommends that insider threat intervention strategies incorporate actions directly involving the person of concern and any potential victims or targets (Cybersecurity and Infrastructure Security Agency (CISA) 2020). CISA also recommends that insider threat management strategies be flexible, capable of changing as the need arises, and open to continual reassessment and adjustment. CISA’s intervention strategies to minimize insider threats include referral of the person of concern for professional evaluation, such as for mental health, substance abuse, or anger management. Based on the evidence gathered about the person of concern, CISA recommends a course of administrative action, such as aftercare, work restrictions or change, suspension, discipline, expulsion, or termination, as appropriate.
of achieving effective nuclear security through organizational culture. Among the several characteristics identified in this security culture model, its foundation is the belief that a credible threat exists and that nuclear security is therefore important. The need for continual determination of personnel trustworthiness is another prominent part of this nuclear security culture model aimed at the mitigation of insider threats. Part of implementing a nuclear security culture that works to minimize insider threats is to inculcate a feeling among employees that they are respected and well treated. There should be avenues for employees to raise questions and report complaints where they feel that help can be obtained rather than punishment. There should also be mechanisms in place to address the mental illness or emotional difficulties of employees. All these employee satisfaction measures can reduce the number of disgruntled employees, who are far more likely to pose an insider threat.

Insider threat mitigation programs can be successful only if all employees cooperate and do their part in implementing the common elements (such as non-abuse of drugs and alcohol), as well as subjecting themselves to the BOP, with the conviction that these elements are in place for the betterment of security. By agreeing to participate in the BOP, employees give consent to be observed by, or reported to, the supervisor regarding changes in work behavior, social interaction behavior, and personal health behavior. The aspects observed for changes in work behavior include non-adherence to company policies, non-cooperation with co-workers, work quality and quantity, absenteeism, mistakes or bad judgments, risk-taking, being overzealous or overcautious, disdain for the facility, and difficulty in concentrating. With respect to changes in social interaction behavior, the observations made are how others react to the worker, along with demonstrations of anger, sociability, increased complaints, changes in speech behavior or content, and manipulation of others. Changes in personal health behavior observed are thinking patterns, signs of possible drug or alcohol abuse, fatigue, appearance, and nervousness.

One of the preventive measures of an insider threat mitigation program is pre-employment vetting for access authorization to the facility, assets, and information. The access authorization provided to different areas of the facility and assets, as well as to information about the facility, should be on a need-to-know basis and should be periodically reviewed. This type of control on access to the facility could help in minimizing the opportunities for performing malicious acts. It is important to provide escorts to temporary workers and visitors and to surveil their activities to make sure that they are in the right place and performing their duties properly. Enforcing strict requirements for exchanging confidential information with these escorted workers and visitors is critical. The U.S. Nuclear Energy Institute developed a document titled “Nuclear Power Plant Access Authorization,” which provides good insight into procedures to be followed in implementing access authorization for trustworthy and reliable personnel at facilities of national importance (Nuclear Energy Institute (NEI) 2014).
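Since access authorizations are to be granted on a need-to-know basis and periodically reviewed, the following minimal sketch shows one way such a review could be mechanized; the record layout, identifiers, and 12-month review interval are illustrative assumptions only, not requirements from any cited document.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class AccessGrant:
    person_id: str
    area: str            # e.g. "protected area", "vital area"
    justification: str   # documented need-to-know
    last_reviewed: date

def grants_due_for_review(grants, today, interval=timedelta(days=365)):
    # Flag every grant whose periodic need-to-know review is overdue.
    return [g for g in grants if today - g.last_reviewed > interval]

grants = [
    AccessGrant("EMP-014", "vital area", "pump maintenance", date(2022, 3, 1)),
    AccessGrant("EMP-231", "protected area", "shift operator", date(2023, 5, 20)),
]
for g in grants_due_for_review(grants, today=date(2023, 6, 7)):
    print(f"Review overdue: {g.person_id} -> {g.area}")
```

Flagged grants would then be re-justified or revoked, keeping each insider’s access aligned with current job functions.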
One of the good practices that should be followed for insider threat mitigation is to segregate the facility into zones, such as limited area, protected area, vital area, tools area, information management area, etc., and to control access by grouping the employees based on required access to these segregated areas, authority
(management, operators, technicians, security personnel, safety personnel, inspectors, custodians, maintenance personnel, janitorial staff, administrative staff, etc.), and knowledge regarding procedures, processes, the physical protection system, tools, equipment, locations, etc. The grouping of personnel should be done in such a way that the personnel belonging to a group have identical access, authority, and knowledge. This kind of facility segregation and employee grouping ensures that insiders have only the access, authority, and knowledge required to perform their job functions.

Another consideration to minimize insider threats, or insider opportunities to conduct a malicious act, is to take inventory of or monitor critical assets and materials frequently, so that information on any missing items can be identified in a timely manner and appropriate action taken. In addition to taking inventory of materials, statistical analysis of inventory data collected over a period could indicate whether materials are being stolen under cover of the measurement uncertainties (see the sketch below). An insider may well know the limits of the facility’s measurement systems used for inventory, which could be exploited for protracted theft of material.

A PPS at a facility has detection, delay, and response elements mainly to deter and defeat external adversaries. However, a different set of detection, delay, and response measures is also required to protect the facility against insider threats. Identifying malicious insider acts requires monitoring and surveillance using cameras and other equipment, especially in vital areas. Delay may be provided by facility personnel, via procedures such as the two-person rule to avoid the presence of a lone person in a vital area, or through physical barriers that can be initiated on detection of malicious behavior. In the case of response to insider malicious actions, operations personnel should be able to stop the action and minimize the consequences. This operations personnel response is in addition to the usual response provided by the security personnel as part of the PPS design.
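On the statistical-analysis point above: one standard approach is a cumulative-sum (CUSUM) test on the sequence of inventory differences, which flags a protracted drift that no single measurement would reveal. The sketch below is illustrative, with made-up numbers and conventional default parameters; it is not the accounting method of any facility cited here.

```python
def cusum_alarm(inventory_differences, sigma, k=0.5, h=4.0):
    """One-sided CUSUM over a sequence of inventory differences (losses positive).

    sigma: measurement uncertainty (standard deviation) of one difference
    k:     allowance in sigma units (tolerated noise per period)
    h:     decision threshold in sigma units
    """
    s = 0.0
    for d in inventory_differences:
        # Accumulate the standardized loss in excess of the allowance;
        # the sum is clipped at zero so gains do not mask later losses.
        s = max(0.0, s + d / sigma - k)
        if s > h:
            return True   # sustained drift detected
    return False

# Ten periods, each showing a loss equal to 1 sigma: individually within
# measurement uncertainty, but collectively a clear drift.
diffs = [1.0] * 10
print(cusum_alarm(diffs, sigma=1.0))   # True
```

Analogous cumulative tests on material-unaccounted-for (MUF) sequences are common in nuclear material accountancy; the parameters k and h trade detection speed against false-alarm rate.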
7.4 Inference on Drugs and Alcohol Testing Data

One of the biggest concerns for most HRP or Fitness-for-Duty (FFD) programs is drug and alcohol abuse by personnel. Random testing programs are seen as necessary for identifying problematic personnel but may be considered controversial or intrusive by the personnel themselves. However, data from organizations like the U.S. Department of Transportation (USDOT) support the utility of institutionalizing and implementing such programs. A brief analysis of drug and alcohol testing data is provided below.

The USDOT has created regulations to implement drug and alcohol testing through its respective regulating agencies: the Federal Motor Carrier Safety Administration (FMCSA), the Federal Aviation Administration (FAA), the Federal Railroad Administration (FRA), the Federal Transit Administration (FTA), and the Pipeline and Hazardous Materials Safety Administration (PHMSA). The USDOT outlines the annual minimum drug and alcohol random testing rates, and
these rates for the year 2023 are shown in Table 7.1, which has been published by the USDOT (US Department of Transportation (USDOT) 2023a).

Table 7.1 2023 annual minimum drug and alcohol random testing rates established within DOT agencies

| DOT agency | 2023 random drug testing rate | 2023 random alcohol testing rate |
|---|---|---|
| FMCSA | 50% | 10% |
| FAA | 25% | 10% |
| FRA (covered service) | 25% | 10% |
| FRA (maintenance of way) | 25% | 10% |
| FRA (mechanical) | 50% | 25% |
| FTA | 50% | 10% |
| PHMSA | 25% | Not applicable |

The USDOT regularly publishes data on drug and alcohol testing; for example, the 2021 data are available online (US Department of Transportation (USDOT) 2023b). The reported data are elaborate and cover different types of tests, such as pre-employment, random, post-accident, reasonable cause or suspicion, return-to-duty, and follow-up. For brevity, only the data on pre-employment and random drug and alcohol testing for the USDOT agencies and U.S. nuclear power plants are provided here.

In 2021, the pre-employment alcohol-positive test rates for a BAC of 0.04% or higher were 0.084%, 0.038%, 0.095%, and 0.104% for the FMCSA, FAA, FRA, and FTA agencies, respectively. Similarly, the pre-employment drug-positive test rates for one or more drugs were 1.236%, 1.216%, 1.684%, 3.256%, and 1.889% for the FMCSA, FAA, FRA, FTA, and PHMSA agencies, respectively. An inference from these pre-employment drug and alcohol-positive test rates for 2021 is that the drug-positive rates are far higher than the alcohol-positive rates. Even though these pre-employment positive test rates are small, these individuals might have gone unnoticed had the testing programs not been institutionalized through appropriate regulations. These personnel, if not tested, could have obtained some type of access to the facility, leading to an increase in insider opportunities. Similar data for 2021 from U.S. nuclear power plants showed a pre-employment drug and alcohol combined positive test rate of 1.16% (US Nuclear Regulatory Commission (USNRC) 2020). According to the USNRC regulatory requirement, the pre-employment alcohol and drug testing rate at U.S. nuclear power plants is 100%.

In 2021, the random alcohol-positive test rates for a BAC of 0.04% or higher were 0.082%, 0.086%, 0.177%, and 0.091% for the FMCSA, FAA, FRA, and FTA agencies, respectively. Similarly, the random drug-positive test rates for one or more drugs were 0.633%, 0.642%, 0.494%, 0.801%, and 0.771% for the FMCSA, FAA, FRA, FTA, and PHMSA agencies, respectively. An inference from these random drug and alcohol-positive test rates for 2021 is that the drug-positive rates are higher than the alcohol-positive rates. Even though these random positive test rates are small, they represent a population that might have gone unnoticed if these
testing programs were not institutionalized through appropriate regulations. Similar data for 2021 from U.S. nuclear power plants showed a random drug and alcohol combined positive test rate of 0.6% (US Nuclear Regulatory Commission (USNRC) 2020). According to the USNRC regulatory requirement, the random alcohol and drug testing rate needed at U.S. nuclear power plants is 50%, which is more stringent than the USDOT requirements (see Table 7.1).

An inference to be drawn from these pre-employment and random drug and alcohol-positive test rates for the USDOT agencies and U.S. nuclear power plant personnel is that they are similar in percentage, probably because the personnel working at these institutions are drawn from the same U.S. population. This is an important observation: if a particular industry of national importance in a country has implemented drug and alcohol testing, the positive test rate data generated for the personnel in that industry may be similar for personnel working in other industries of national importance in the same country.

The USNRC-published data on U.S. nuclear power plants also showed that the positive test rates were higher by about a factor of two for contractor/vendor employees compared to licensee employees, for both pre-employment and random drug and alcohol testing, over the period from 1990 to 2021, with one exception: from 2010 to 2021, that factor of two steadily increased to a factor of four in the case of random drug and alcohol-positive test rates. This difference in positive testing rates is critical to note and underlines the need for testing non-permanent employees thoroughly to minimize insider opportunities.
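To see what these percentages imply operationally, the following back-of-the-envelope sketch combines a random testing rate with the 2021 positive rates quoted above; the workforce size is a hypothetical assumption, and the calculation is a deliberate simplification for illustration.

```python
def expected_positives(workforce, annual_testing_rate, positive_rate):
    # Expected positive results per year, treating random tests as
    # independent draws from the workforce (a simplification).
    return workforce * annual_testing_rate * positive_rate

# FMCSA-style figures from the text: a 50% random drug testing rate and a
# 0.633% random drug-positive rate; the 10,000-person workforce is made up.
print(expected_positives(10_000, 0.50, 0.00633))   # ~31.7 positives per year

# With identical testing rates, detections scale linearly with the underlying
# positive rate, so a contractor population testing positive at four times the
# licensee rate (the 2010-2021 trend noted above) yields four times the finds.
print(expected_positives(10_000, 0.50, 4 * 0.00633)
      / expected_positives(10_000, 0.50, 0.00633))  # 4.0
```

The linear scaling is the practical argument for the text’s conclusion: where a subpopulation (e.g. contractors) shows a higher underlying rate, uniform random testing will surface proportionally more cases there, provided that subpopulation is actually included in the testing pool.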
7.5 Summary

It is important to understand the different types of insiders and the need for insider threat mitigation. Protecting against insider threats is an inherently difficult challenge because of the insider’s access to, authority over, and knowledge of the facility. Several strategies and best practices for insider threat mitigation implemented at different facilities of national importance are elaborated in this chapter. These insider threat mitigation strategies are practiced very rigorously in some countries through regulations and enforcement. The data presented on such instances of implementation show the need for rigorous implementation of fitness-for-duty, fatigue assessment, and behavioral observation programs. In some countries, some of these insider threat mitigation strategies are implemented as best practice but are not institutionalized through regulations and enforcement, for various reasons, some of which are cultural or societal. Hence, there is a need to keep trying, assessing, testing, and exchanging ideas among countries around the world. There is no room for complacency, which is always the enemy of effective security.
References

Bunn M, Sagan S (2016) Insider threats. Cornell UP, Ithaca, NY
Cybersecurity and Infrastructure Security Agency (CISA) (2020) Insider threat mitigation guide. CISA
Final Report of the Accident on 24 March 2015 at Prads-Haute-Bléone (Alpes-de-Haute-Provence, France) to the Airbus A320-211 registered D-AIPX operated by Germanwings (2016). Bureau d’Enquêtes et d’Analyses pour la sécurité de l’aviation civile, France
The US Nuclear Energy Institute (NEI) (2014) Nuclear power plant access authorization program, report no. NEI 03-01 [Revision 4]. NEI
Schein E (1997) Organizational culture and leadership. Jossey-Bass, San Francisco, CA
The International Air Transport Association (IATA) (2023) Security management system (SeMS) manual. IATA
The International Atomic Energy Agency (IAEA) (2008) Nuclear security culture: implementing guide—IAEA nuclear security series no. 7. [Online]. Available: https://www-pub.iaea.org/MTCD/Publications/PDF/Pub1347_web.pdf
The International Atomic Energy Agency (IAEA) (2011) Nuclear security recommendations on physical protection of nuclear material and nuclear facilities (INFCIRC/225/Revision 5). [Online]. Available: https://www-pub.iaea.org/MTCD/Publications/PDF/Pub1481_web.pdf
The International Atomic Energy Agency (IAEA) (2014) Communication received from the Netherlands concerning the strengthening of nuclear security implementation: joint statement on strengthening nuclear security implementation (INFCIRC/869). [Online]. Available: https://www.iaea.org/sites/default/files/publications/documents/infcircs/infcirc869.pdf. Accessed 15 May 2023
The International Atomic Energy Agency (IAEA) (2020) Preventive and protective measures against insider threats: implementing guide—IAEA nuclear security series no. 8-G (Rev. 1). [Online]. Available: https://www-pub.iaea.org/MTCD/Publications/PDF/PUB1858_web.pdf
The White House (2016) Joint statement on insider threat mitigation, nuclear security summit 2016. [Online]. Available: http://www.nss2016.org/document-center-docs/2016/4/1/joint-statement-on-insider-threat-mitigation-gb. Accessed 19 Jan 2023
US Department of Energy (2004) Human reliability program, code of federal regulations, 10 CFR Part 712
US Department of Energy. Human reliability program handbook. [Online]. Available: https://www.energy.gov/ehss/human-reliability-program-handbook
US Department of Transportation (USDOT) (2023a) Random testing rates. [Online]. Available: https://www.transportation.gov/odapc/random-testing-rates. Accessed 07 June 2023
US Department of Transportation (USDOT) (2023b) 2021 MIS data. [Online]. Available: https://www.transportation.gov/odapc/2021-MIS-DATA. Accessed 07 June 2023
US Nuclear Regulatory Commission (USNRC) (1989) Fitness for duty programs, code of federal regulations, 10 CFR Part 26. USNRC, 1989
US Nuclear Regulatory Commission (USNRC) (2023) Performance reports. [Online]. Available: https://www.nrc.gov/reactors/operating/ops-experience/fitness-for-duty-programs/performance-reports.html. Accessed 07 June 2023
Chapter 8
Insider Threat Case Studies from Industry

Kelley H. Ragusa and Sunil S. Chirayath
8.1 Introduction

Human Reliability Programs (HRPs) address the need of sensitive or high-risk industries to employ the best possible workforce. Trustworthiness is essential to safety because it can help ensure compliance with procedures, and it works in concert with other safety measures to reduce human error. An HRP can also mitigate the greatest security threat to these industries: the threat of an intentional insider action that leads to a security breach. In designing an HRP, an understanding of the potential motivations and advantages an insider might have can aid in developing the most effective measures for monitoring employees in these sensitive or high-risk environments. To that end, we have compiled a group of real-life case studies from various industries. These case studies show how difficult it can be to detect an insider action and how complicated insider motivations might be. Whether the intent is sabotage of a nuclear power plant or industrial espionage, we can learn a lot about how an insider thinks and acts by studying these historical cases.
8.2 Case Study 1: Suspected Sabotage at San Onofre Nuclear Power Plant

On October 30, 2012, officials at the San Onofre Nuclear Power Plant in Pendleton, California in the United States reported to regulators that they had discovered coolant mixed with the oil used for the emergency backup diesel generator in Unit 3 of
the plant (Santana 2020). When the generator was taken offline for two weeks of maintenance, operators found two cups of coolant in the oil for the generator governor (Santana 2020).

At the time of the discovery, management at San Onofre already had a number of problems on their hands. Unit 3 had been shut down for months after a small radiation leak led to an investigation that uncovered a major problem with the hundreds of steam generator tubes in Units 2 and 3 (Santana 2020). The fuel had been removed from Unit 3, and management had not yet established a clear plan to reactivate the unit (Zeller 2012). Moreover, Southern California Edison, the company that operated the plant, had announced in October 2012 that more than 700 employees would be laid off by the end of that year (Zeller 2012). At around the same time as the incident, the local union had addressed a letter to the San Onofre management suggesting that employees were “concerned about plans to restart reactor Unit 2 with a diminished workforce” (Zeller 2012). There were worries not only about the layoffs but also about compromised safety at the plant and the well-being of the workers.

San Onofre also had the worst record of any nuclear power plant in the U.S. with regard to safety concerns being reported directly to the U.S. Nuclear Regulatory Commission (NRC) by employees. In 2010, the NRC found that some San Onofre employees did not feel free to report safety concerns to their supervisors for fear of retaliation from management (Sharma 2012). David Lochbaum of the Union of Concerned Scientists told journalists, “If the San Onofre workers felt comfortable reporting concerns to their management, they would use that option and not have to turn to the NRC instead. The fact that so many workers don’t trust management to resolve safety concerns is troubling” (Sharma 2012). In response to the incident involving the diesel generator, an unnamed employee with Southern California Edison remarked, “While morale is as low as it has ever been and the environment is as chilled as it has ever been, no one I know could imagine doing such a thing…. On the other hand, I’m also not completely surprised because the environment really is so harsh. People can do crazy things when they are under extreme stress” (Zeller 2012).

After the incident, Southern California Edison conducted a “comprehensive internal investigation” and made a statement that the company “is committed to the safety of the public and its employees and takes this matter very seriously” (Santana 2020). According to the company statement, the comprehensive investigation included “rigorous tests, a review of station logs and employee interviews” (Zeller 2012). They also increased security at the plant as a result of the incident (Santana 2020). In late November 2012, the company notified the NRC that their investigation had turned up evidence of a potential case of sabotage, possibly by an employee (Zeller 2012). Supervisors told employees at the plant that the Federal Bureau of Investigation would be taking over the investigation into the potential sabotage and that criminal charges could be filed (Santana 2020).

While the plant was never in any real danger because the unit was not operating, the backup generator would have been used to keep the reactor cool in the case of a loss of off-site power if the unit were to come back online (Zeller 2012). A representative
told reporters that had the coolant not been discovered—and the generator allowed to run with the tainted oil—the generator would have likely failed (Santana 2020).
8.3 Case Study 2: Bombing of the Koeberg Nuclear Power Plant Under Construction in South Africa

In December 1982, four bombs were detonated at the Koeberg Nuclear Power Plant in Cape Town, South Africa. The plant was under construction at the time, and the bombing occurred before the reactor units had been loaded with fuel, so there was no radiological release associated with the bombing, and no one was hurt (Birch 2015). Nevertheless, the bombing caused approximately half a billion rand in damage and delayed commissioning of the plant by 18 months (Bamford 2006).

The perpetrator of the bombing was an unassuming-looking former South African national sword-fighting champion named Rodney Wilkinson, aided by his speech-therapist girlfriend Heather Gray (Mail and Guardian 1995). Wilkinson had been a collegiate fencing champion, but he was barred from competing in the Olympics due to sanctions against the apartheid government. He dropped out of college and served briefly in the military in Angola before returning to South Africa and joining a commune near the Koeberg site (Birch 2015). When he and his girlfriend ran out of money, Wilkinson reluctantly took a job in construction at the plant. After 18 months at the plant, Wilkinson stole a set of plans of the Koeberg site at Gray’s encouragement, and the couple attempted to hand the plans over to the African National Congress (ANC), the anti-apartheid political party then in exile in the newly independent Zimbabwe. They hoped the ANC could use the plans to carry out an attack. The stolen plans were vetted by Soviet and Western nuclear scientists, and the ANC came back to Wilkinson asking him to carry out the attack himself (Mail and Guardian 1995).

As his job at the plant involved mapping pipes and valves as part of its emergency management, Wilkinson’s intimate knowledge of the plant and access to most areas made him an ideal choice to plant the bombs. According to Wilkinson, his job gave him access to the most sensitive areas of the plant, but the company never did a background check on him to determine his reliability. He also said that he never hid the fact that he was against the apartheid regime and its nuclear program, often making his views known over drinks at the plant’s recreation center (Birch 2015). According to senior ANC officer Sathyandranath Maharaj, to whom the couple gave the plans, “To attack Koeberg with a small rocket would have done very little damage,” but an insider with plans of the plant and the appropriate type of bomb could “do maximum damage” (Birch 2015).

Wilkinson began by trying to smuggle bottles of whiskey, similar in size to the bombs he hoped to use for the attack, past the power plant’s three layers of security (Birch 2015). When he was confident he could bypass the security measures, he brought four Soviet-made limpet bombs to the site, one a day over four days. He
placed two of them in an area where they would destroy the electrical system of the plant and then attached magnetic bombs to the bottom of the two reactor vessels (Birch 2015). The ANC timed the bombings to occur before the operation of the plant because, as Wilkinson remarked, "The purpose was to make a political statement and to cause as much damage as possible…. We didn't want to hurt anybody, and I completely didn't want to get killed" (Birch 2015). Wilkinson set timers on the bombs to go off a day later, attended his going-away party with co-workers, and escaped from South Africa that same evening, flying to Johannesburg, being driven by a relative toward the border, and finally crossing it on a bicycle (Mail and Guardian 1995).

After the fall of the apartheid government, Wilkinson was granted amnesty and even given a post in the government. According to Maharaj, the bombing of the Koeberg Power Plant was seen as an act of bravery and one of the "most significant armed propaganda actions" taken by the anti-apartheid movement (Birch 2015).
8.4 Case Study 3: Elliot Doxer's Case of Attempted Foreign Economic Espionage

In 2006, Elliot Doxer was a middle-aged employee working in the finance department of Akamai Technologies, a web delivery and cybersecurity company in Cambridge, Massachusetts in the United States. Doxer wrote to the Israeli Consulate in Boston and identified himself as "a Jewish American who lives in Boston" whose chief desire was "to help our homeland and our war against our enemies." In his emails, he also asked for "a few thousand dollars" and information about his son and his son's mother, who lived in "a foreign country" (Stein 2010). He described his estranged spouse as a "terrible human being" who had caused him "tremendous suffering." He went on to say, "Not enough bad things can happen to her if you know what I mean" (McMillan 2011).

The Israeli government forwarded the strange email from Doxer to the U.S. Federal Bureau of Investigation (FBI) (Stein 2010). In September 2007, an undercover FBI agent posing as an Israeli intelligence agent contacted Doxer and arranged to exchange information through a dead drop. Over a period of approximately two years, Doxer went to the drop site 62 times to leave and/or retrieve information (U.S. Attorney's Office, District of Massachusetts 2011). According to emails Doxer wrote, Akamai's clients included the Departments of Defense and Homeland Security, Airbus, and "some Arab companies from Dubai" (Stein 2010). Doxer provided the undercover agent with extensive lists of the company's customers and contracts, as well as information about the company's employees that included their positions and full contact information (McMillan 2011). He also gave information about the company's physical and computer security systems (U.S. Attorney's Office, District of Massachusetts 2011).
Because the Israeli consulate felt compelled to reveal the potential espionage to the U.S. government, allowing the FBI to take over the investigation, no sensitive information was exposed (U.S. Attorney's Office, District of Massachusetts 2011). Doxer pleaded guilty to charges of foreign economic espionage and was sentenced to six months in prison and six months in home confinement with electronic monitoring, and was fined $25,000 (U.S. Attorney's Office, District of Massachusetts 2011). Doxer became only the eighth person in the U.S. ever to be prosecuted for attempting to sell corporate secrets to a foreign government (McMillan 2011).
8.5 Case Study 4: Wen Chyu Liu—Corporate Espionage

Wen Chyu Liu originally came to the United States from China for graduate studies. In 1965, he began working as a research scientist at Dow Chemical Company at their facility in Plaquemine, Louisiana. He worked in the development and manufacture of Dow elastomers, including Tyrin chlorinated polyethylene (CPE). Dow is a leader in the research and manufacturing of CPEs, which are used in many industrial applications and products (Department of Justice 2012). According to the Justice Department's statement on the case, Liu had access to "trade secrets and confidential and proprietary information pertaining to Dow's Tyrin CPE process and product technology" in his capacity at the company (Department of Justice 2012).

Liu retired from Dow Chemical in 1992, after which he began traveling throughout China trying to market his knowledge of Dow's CPE technology. According to a statement by Dow Chemical, "Because of his education and position within the company, Mr. Liou knew of its immense value" (Harris 2012). His long tenure at Dow also meant that he had access to other current and former employees at the company. Evidence presented at his trial showed that Liu paid current and former Dow employees for materials related to CPE. He even paid a $50,000 bribe to one employee who provided him with Dow's process manual and other CPE materials (Department of Justice 2012).

Liu was indicted in 2005 on charges of perjury and conspiring with at least four other former or current Dow employees to steal trade secrets (Harris 2012). The perjury charge was related to a false statement Liu made during a deposition when he denied meeting with representatives of a Chinese company that was interested in developing a CPE plant. In 2012 he was sentenced to 60 months in prison and ordered to forfeit $600,000 and pay an additional $25,000 fine (Department of Justice 2012).
8.6 Case Study 5: Suspected Sabotage at Oskarshamn NPP

In 2008, two workers were arrested on suspicion of planning an act of sabotage at the Oskarshamn nuclear power plant in Sweden (Deutsche Welle Staff 2008). A spot security check found traces of
triacetone triperoxide, known as TATP, on the handle of a plastic bag that one of them, a welder, was carrying (Deutsche Welle Staff 2008). TATP is a highly explosive material that is "extremely unstable, especially when subjected to heat, friction, and shock" (Ahlander 2008). The substance can be prepared in a home laboratory and has been used by suicide bombers and other terrorists in the past (Ahlander 2008).

The workers were not employees of the company but were hired by a subcontractor to do routine maintenance on one of the three reactors at the plant (Deutsche Welle Staff 2008). At the time of the event, the reactor where they were working, reactor two, had been shut down to carry out maintenance (Deutsche Welle Staff 2008). After the TATP was discovered, reactor one at the plant was also shut down as a precaution to allow for a search of the premises, since the men had security clearance to access that reactor building as well (Deutsche Welle Staff 2008). Authorities sealed off an area of 300 m around the unit and called in explosives experts (Irish Times Staff 2008). They also worked with plant managers to monitor any potential security risks (Deutsche Welle Staff 2008). Representatives of the company that operates the Oskarshamn plant stated at the time that they did not believe the reactor's safety was ever threatened (Irish Times Staff 2008).

One employee remarked, "I think it is really great that they caught them. I've been annoyed at the strict security measures and all the controls you have to go through on your way to work. Now it turns out they work, so maybe I won't be so annoyed anymore" (The Local Staff 2008). One of the two men had a prior criminal record, having been convicted of a "minor crime" (Ahlander 2008). During questioning, both men denied any wrongdoing and waived the right to legal counsel, according to a statement released by the police. They were released, but they remained under suspicion (Ahlander 2008). In the end, the substance was found to be from nail polish, and the plant was never actually in danger (Oskarshamn Nuclear Power Plant 2023), but the security measures in place had worked to identify the potential threat.
8.7 Case Study 6: Suspected Sabotage at Doel NPP

In 2014, the reactor at the Doel 4 installation in Belgium shut down automatically. Inspectors found that there had been a "disturbance" in the steam turbine in a non-nuclear part of the plant (Hope 2021). A spokesman for the company that operates the Doel plant stated that there was "an intentional manipulation" (De Clerq 2014). Someone had opened a valve in the emergency evacuation system, which is intended to quickly evacuate 65,000 L of oil used to lubricate the turbine in case of fire (Hope 2021). The leak caused major damage to the high-pressure section of the turbine, which closed
the unit for four months and had an impact on Belgian power production and income for the plant (De Clerq 2014).

The investigation into the incident revealed that there were no surveillance cameras in strategic areas of the plant, including the area where the valve was located (Hope 2021). As such, no footage of the incident had been captured. The plant also had not yet implemented a two-person system to require a minimum of two employees for visits to strategic areas of the plant; this system was adopted after the event (Hope 2021). The company spokesperson stated at the time of the event that no outsiders had penetrated the plant (De Clerq 2014). The company filed a criminal complaint for sabotage in the local prosecutor's office, but the case was elevated to the federal level once the possibility of terrorism had been raised. While prosecutors concluded that the incident was an act of sabotage likely involving an employee or someone with access to the plant, they were unable to identify the perpetrator, and no charges were filed (Hope 2021).
8.8 Case Study 7: Trade Secret Misappropriation Among DuPont Employees

DuPont, a science-based products and services company based in the U.S., expended tremendous resources in the development of Kevlar, a strong synthetic fiber developed over 50 years ago and used around the world for body armor, fiber optic cables, and a variety of other automotive and industrial applications (Federal Bureau of Investigation 2015). Information surrounding the design and manufacture of Kevlar is thus a major asset for the company. From 2006 to 2009, Kolon Industries, a rival company in South Korea, conspired with former DuPont employees to steal trade secrets related to Kevlar technology (Department of Justice 2015).

Michael Mitchell was an engineer and salesman for DuPont who had become disgruntled and was terminated due to his poor performance (Steinmeyer 2010). When he left the company, DuPont informed Mitchell of the nondisclosure agreements that he had signed in his employment contract, demanded that he return any proprietary DuPont information, and required that he sign a termination statement affirming his compliance with these items (Steinmeyer 2010). Unfortunately, Mitchell had kept numerous files containing sensitive and proprietary information about DuPont technologies on his home computer while he was an employee, and these files remained on his home computer after his termination (Brdiczka 2014).

When Mitchell began searching for another job, he met with personnel from Kolon, who hired him as a consultant (Federal Bureau of Investigation 2015). He provided Kolon with some of the sensitive and proprietary information in his possession and also contacted current and former DuPont employees to ask them for additional information (Steinmeyer 2010). Some of the employees he contacted reported him to DuPont management, which led to a federal investigation and a forensic examination of
his computers. Mitchell agreed to be a cooperating undercover witness (Steinmeyer 2010). In this capacity, he continued to interact with Kolon personnel, informing them on multiple occasions that the information they sought was proprietary and could be considered trade secrets. He also arranged a meeting with another "disgruntled employee," a cooperating witness, which allowed investigators to gather evidence of Kolon's intent to commit trade secret violations (Department of Justice 2015).

In December 2009, Mitchell pleaded guilty to theft of trade secrets and obstruction of justice and was sentenced to 18 months in prison (Department of Justice 2015). The FBI agent in charge of the investigation said, "Protecting American companies from the theft of their trade secrets is a high priority for the FBI. Each year, billions of U.S. dollars are lost to foreign competitors who pursue illegal commercial shortcuts by stealing valuable advanced technologies. This case demonstrates the FBI's ability to penetrate these highly sophisticated criminal schemes and bring their perpetrators to justice. Its outcome should send a clear message to foreign commercial actors who seek to illegally exploit American companies and steal our nation's innovation and technology" (Department of Justice 2015). Kolon was ordered to pay $85 million in fines and $275 million in restitution for the theft.
8.9 Case Study 8: A Case of Mixed Loyalties

In 2006, FBI agents uncovered an entire library of sensitive documents, including design manuals related to U.S. military aircraft and the space shuttle program, in a crawlspace at the home of former Boeing engineer Greg Chung (Bhattacharjee 2014). More than 250,000 pages of documents from Boeing, Rockwell Corporation, and other defense contractors were found at his home (Department of Justice 2009).

Chung was a naturalized U.S. citizen originally from China who had worked for three decades as a structural engineer in the stress-analysis group at the Rockwell Corporation, and then at Boeing after it acquired the company (Bhattacharjee 2014). He worked on the shuttle project and was even hired out of retirement to help improve the shuttle design after the Space Shuttle Columbia accident in 2003 (Bhattacharjee 2014). He held a "secret" security clearance when he worked in the shuttle program (U.S. Attorney's Office, Central District of California 2010). After Boeing took over Rockwell, the office was relocated, and, at that time, Chung took home dozens of boxes of documents that could be of use to the Chinese aviation industry. As he approached retirement, he printed many documents and whited out names, information about who printed the documents, and warnings about their export-controlled status (Bhattacharjee 2014).

Chung was driven by a sense of loyalty to his country of origin, indicating in one letter that he "would like to make an effort to contribute to the Four Modernizations of China" (U.S. Attorney's Office, Central District of California 2010). Chung began to receive "tasking" letters from the Chinese aviation industry as early as 1979, indicating the types of specific technological information they were seeking (U.S. Attorney's Office, Central District of California 2010). He made multiple trips to China from 1985 to 2003 to give
lectures on technology and tour cities throughout the country (Department of Justice 2009). Some of the trips were arranged and funded by the Chinese aviation industry (Bhattacharjee 2014). In 1985, they even arranged for his sons to attend a language immersion program while he and his wife traveled the country (Bhattacharjee 2014).

While prosecutors had evidence that Chung had shared trade secrets with China in the 1980s, they could not prosecute him for that since the five-year statute of limitations for export control violations had passed. They also determined that none of the documents in his home were classified, so he could not be charged with sharing national secrets. Chung became the first U.S. citizen to be convicted of economic espionage under the 1996 Economic Espionage Act. Under this statute, simply possessing the documents with the intent of using them to help a foreign state was considered a crime (Bhattacharjee 2014). Chung was sentenced to 188 months in federal prison (U.S. Attorney's Office, Central District of California 2010).

In an interview after his conviction, Chung's wife elaborated on the deep conflict Chung felt between his loyalty to his adopted country, where he had spent the majority of his life, and his homeland. She said that Chung wanted to help China but did not intend to hurt the U.S. "It's not that complicated," she said. "You make a friend, and they ask you, if you are an engineer or an artist, 'Do you know this?' And you tell them what you know. Simple as that" (Bhattacharjee 2014).
8.10 Conclusions

These case studies reveal some trends regarding the types of circumstances that can lead to a malicious action by an insider. Some of these cases were clear-cut instances of poor security policies. At the Doel NPP, a lack of video surveillance and two-person policies for sensitive areas of the plant led to a circumstance in which an insider felt they could complete an act of sabotage without being detected. When Michael Mitchell was terminated at DuPont, he was asked to sign an acknowledgment of compliance with non-disclosure agreements and attest to not having sensitive files on his computer, but a stronger security policy may have detected his noncompliance with these items or barred the malicious action altogether.

In several of these cases, the acts occurred in times of transition or when the company was not operating under "normal" circumstances. In the San Onofre, Oskarshamn, and Koeberg cases, the nuclear power plants were either under repair or under construction. This led to circumstances in which more people may have had access to more areas in the plant. There may also have been more subcontractors not directly employed by the plant given access, as in the Oskarshamn case. In the Rockwell-Boeing case, Greg Chung was able to smuggle large numbers of documents out of the company when it changed locations. These cases show us that security is likely weakest when everything is not operating under "normal" conditions, leading to uncertainty surrounding procedures and the potential for increased access.

Apart from issues with the circumstances at the company or their security policies, all of these cases represent types of individuals or situations that an HRP must be
designed to guard against. Some of the cases—such as the sabotage at San Onofre, the case of Michael Mitchell, and, to some degree, the Koeberg case—involved disgruntled employees. The low morale and high uncertainty among the staff at San Onofre led to a situation in which employees felt desperate. Communication between management and staff there was so poor that the NRC was pulled in to serve as a mediator. In the end, those communication and management failures led to a security incident. Apart from simply detecting potential insider threats through monitoring of employees, under an effective HRP, managers should communicate effectively with staff and motivate them to feel connected to their workplace.

The cases presented here also show a large variety of insider motivations that may push an employee to act against their employer. We have already discussed employee dissatisfaction, but we also see people motivated by politics (Rodney Wilkinson, Elliot Doxer), financial gain (Wen Chyu Liu, Michael Mitchell), difficult situations in their personal lives (Elliot Doxer), or feelings of loyalty to another country (Greg Chung). Continuous evaluation of employees should help identify these motivations. Rodney Wilkinson did not hide his political affiliations, but he was given access to the entire site at Koeberg after only a cursory initial background check, and no red flags were raised by his behavior. A review of Greg Chung's travel to China and interactions with rival companies, as well as a review of his financial situation, might have uncovered his actions. Elliot Doxer was obviously strained by his estrangement from his wife and his separation from his child, which might have been revealed in a psychological review of his behavior and state of mind. Employees at all levels of the organization must also be trained to report unusual behavior and feel comfortable raising concerns with their supervisors. Moreover, when an employee is terminated due to poor performance (e.g., Michael Mitchell) or other concerns identified by the HRP, procedures must be in place to ensure that they will not be able to act against the company after termination.

All of these case studies emphasize the damage that can be done by a person with access to sensitive areas of a facility or proprietary data, the authority to act unimpeded within the organization, and knowledge of the organization's procedures, layout, and security systems. In many cases, the insider's advantages can completely nullify the security measures in place. An insider who can bypass a locked door or disable a surveillance camera can significantly reduce the effectiveness of a physical protection system, for example. Furthermore, as many of these cases show, the nature of the insider's intimate knowledge makes them a threat to the organization even beyond their actual employment tenure. It is for this reason that an HRP is essential to preventing less reliable individuals from gaining these advantages from the beginning, motivating employees to remain loyal to the organization and its procedures, and removing people from sensitive positions as soon as they are deemed to be a threat.
References

Ahlander J (2008) Swedish police release two men after nuclear scare, Reuters, 22 May 2008. Available: https://www.reuters.com/article/uk-sweden-nuclear/swedish-police-release-two-men-after-nuclear-scare-idUKSAT00522420080522. [Accessed 23 August 2022]

Bamford H (2006) Koeberg: SA's ill-starred nuclear power plant, IOL News, 11 March 2006. Available: https://www.iol.co.za/news/politics/koeberg-sas-ill-starred-nuclear-power-plant-269096. [Accessed 23 August 2022]

Bhattacharjee Y (2014) A new kind of spy: how China obtains American technological secrets, The New Yorker, 5 May 2014. Available: https://www.newyorker.com/magazine/2014/05/05/a-new-kind-of-spy. [Accessed 23 August 2022]

Birch D (2015) South African who attacked a nuclear plant is a hero to his government and fellow citizens, The Center for Public Integrity, 17 March 2015. Available: https://publicintegrity.org/national-security/south-african-who-attacked-a-nuclear-plant-is-a-hero-to-his-government-and-fellow-citizens/. [Accessed 23 August 2022]

Brdiczka O (2014) Insider threats—how they affect US companies, ComputerWorld, 22 October 2014. Available: https://www.computerworld.com/article/2691620/insider-threats-how-they-affect-us-companies.html. [Accessed 23 August 2022]

De Clerq G (2014) UPDATE 2-Belgian Doel 4 nuclear reactor closed till year-end, Reuters, 14 August 2014. Available: https://www.reuters.com/article/belgium-nuclear-doel/update-2-belgian-doel-4-nuclear-reactor-closed-till-year-end-idUKL6N0QK43R20140814. [Accessed 23 August 2022]

Department of Justice (2009) Former Boeing engineer convicted of economic espionage in theft of space shuttle secrets for China, 16 July 2009. Available: https://www.justice.gov/opa/pr/former-boeing-engineer-convicted-economic-espionage-theft-space-shuttle-secrets-china. [Accessed 23 August 2022]

Department of Justice (2012) Former Dow research scientist sentenced to 60 months in prison for stealing trade secrets and perjury, 13 January 2012. Available: https://www.justice.gov/opa/pr/former-dow-research-scientist-sentenced-60-months-prison-stealing-trade-secrets-and-perjury. [Accessed 23 August 2022]

Department of Justice (2015) Kolon Industries Inc. pleads guilty for conspiring to steal DuPont trade secrets involving Kevlar technology, 30 April 2015. Available: https://www.justice.gov/opa/pr/kolon-industries-inc-pleads-guilty-conspiring-steal-dupont-trade-secrets-involving-kevlar. [Accessed 23 August 2022]

Deutsche Welle Staff (2008) Swedish nuclear reactor shut down after sabotage suspicions, Deutsche Welle, 22 May 2008. Available: https://www.dw.com/en/swedish-nuclear-reactor-shut-down-after-sabotage-suspicions/a-3352502. [Accessed 23 August 2022]

Federal Bureau of Investigation (2015) Former DuPont employee sentenced, FBI News, 17 August 2015. Available: https://www.fbi.gov/news/stories/former-dupont-employee-sentenced. [Accessed 23 August 2022]

Harris A (2012) Ex-Dow scientist gets 5-year term for trade secret theft, Bloomberg, 13 January 2012. Available: https://www.bloomberg.com/news/articles/2012-01-13/former-dow-research-scientist-sentenced-to-60-months-in-prison#xj4y7vzkg. [Accessed 23 August 2022]

Hope A (2021) Enquiry into nuclear plant sabotage comes to no conclusion, The Brussels Times, 13 August 2021. Available: https://www.brusselstimes.com/181163/enquiry-into-nuclear-plant-sabotage-comes-to-no-conclusion. [Accessed 23 August 2022]

Irish Times Staff (2008) Two held in bomb scare at nuclear plant, Irish Times, 22 May 2008. Available: https://www.irishtimes.com/news/two-held-in-bomb-scare-at-nuclear-plant-1.1214377. [Accessed 23 August 2022]

Mail and Guardian (1995) How we blew up Koeberg (… and escaped on a bicycle), Mail and Guardian, 15 December 1995. Available: https://mg.co.za/article/1995-12-15-how-we-blew-up-koeberg-and-escaped-on-a-bicycle/. [Accessed 23 August 2022]
McMillan R (2011) Akamai employee tried to sell secrets to Israel, CSO Online, 30 August 2011. Available: https://www.csoonline.com/article/2129457/akamai-employee-tried-to-sell-secrets-to-israel.html. [Accessed 23 August 2022]

Oskarshamn Nuclear Power Plant (2023) Available: https://en.wikipedia.org/wiki/Oskarshamn_Nuclear_Power_Plant. [Accessed 23 August 2022]

Santana N (2020) San Onofre officials investigate potential case of sabotage, 8 December 2020. Available: https://voiceofoc.org/2012/11/san-onofre-officials-investigate-potential-case-of-sabotage/. [Accessed 23 August 2022]

Sharma A (2012) San Onofre workers lack state whistleblower protections, KPBS, 28 June 2012. Available: https://www.kpbs.org/news/2012/jun/28/san-onofre-safety-complaints-remain-highest-worker/. [Accessed 23 August 2022]

Stein J (2010) Doxer case: Boston spy yarn with an unhappy ending, The Washington Post, 14 October 2010. Available: http://voices.washingtonpost.com/spy-talk/2010/10/doxer_a_boston_spy_yarn_with_a.html. [Accessed 23 August 2022]

Steinmeyer P (2010) Former DuPont employee sentenced to 18 months for trade secret misappropriation, Trade Secrets and Employee Mobility, 22 March 2010. Available: https://www.tradesecretsandemployeemobility.com/2010/03/articles/trade-secrets-and-confidential-information/former-dupont-employee-sentenced-to-18-months-for-trade-secret-misappropriation/. [Accessed 23 August 2022]

The Local Staff (2008) Two arrests in nuke plant bomb plot, The Local, 21 May 2008. Available: https://www.thelocal.se/20080521/11908/. [Accessed 23 August 2022]

U.S. Attorney's Office, Central District of California (2010) Former Boeing engineer sentenced to nearly 16 years in prison for stealing aerospace secrets for China, Federal Bureau of Investigation, 8 February 2010. Available: https://archives.fbi.gov/archives/losangeles/press-releases/2010/la020810.htm. [Accessed 23 August 2022]

U.S. Attorney's Office, District of Massachusetts (2011) Brookline man sentenced for foreign economic espionage, Federal Bureau of Investigation, 19 December 2011. Available: https://archives.fbi.gov/archives/boston/press-releases/2011/brookline-man-sentenced-to-for-foreign-economic-espionage. [Accessed 23 August 2022]

Zeller T (2012) San Onofre nuclear plant investigating possible sabotage of safety system, Huffington Post, 29 November 2012. Available: https://www.huffpost.com/entry/san-onofre-nuclear-plant-sabotage_n_2215260. [Accessed 23 August 2022]
Chapter 9
Human Reliability Programme in Industries of National Importance

Magapu Sai Baba
Abstract Accidents do happen. They could be due to the negligence of individuals, failure or malfunctioning of equipment, or failure to adhere to prescribed procedures. When an accident happens, the impact could be localised to the geographical boundaries of the incident/accident. The chapter introduces the Human Reliability Program. Human/People Reliability Programs (HRP) are put in place to protect establishments of national importance from human errors and malicious acts by personnel. The chapter describes the importance of various factors that must be considered to ensure the safe operation of highly hazardous industries, and it discusses how cultural diversity, migration, automation, and advancements like AI will have an impact. Cognitive behavioural aspects are essential in understanding the human response to situations. The chapter suggests some approaches to be adopted.
Accidents do happen. They could be due to negligence of individuals, failure or malfunctioning of the equipment, or not adhering to the prescribed procedures. When an accident happens, the impact could be localised to the geographical boundaries where the incident/accident happened. It impacts people and resources within those boundaries. These can be referred to as industrial accidents and occupational hazards. Some of the industrial accidents that occurred in the recent past are listed below.
M. Sai Baba (B) School of Natural Sciences and Engineering, National Institute of Advanced Studies, Bengaluru, India e-mail: [email protected] School of Social Sciences, Ramaiah University of Applied Sciences, Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. S. Chirayath and M. Sai Baba (eds.), Human Reliability Programs in Industries of National Importance for Safety and Security, Risk, Reliability and Safety Engineering, https://doi.org/10.1007/978-981-99-5005-8_9
9.1 Case Studies from Different Industries

Large crane crash at Hindustan Shipyard in Vizag: A massive 70-tonne crane under trial run at Hindustan Shipyard Limited (HSL) came crashing down on August 1, 2020, crushing 11 workers to death. The crane had been erected about two years earlier but was not yet commissioned for regular operations due to a change in contractors. When the management started conducting a trial run of the crane, its cabin and base snapped and collapsed as a deadweight load was being tested. The committee that probed the crane crash concluded that the accident had occurred due to design and erection defects and levelled charges against the Mumbai-based firm to which the initial contract was given (Kumar 2020).

GAIL pipeline explosion in Andhra Pradesh: A pipeline of the Gas Authority of India Limited (GAIL) exploded following a leak on June 27, 2014, near Nagaram village, killing 15 people and injuring 12. Before the event, people in the village had complained to the GAIL authorities about the leak, but no action was taken to plug it. The incident occurred in the early morning, just when tea stalls and small eateries were opening. Police officials said they suspected that gas had been leaking and that the explosion was triggered when a tea stall vendor struck a match. The subsequent inquiry attributed the explosion to corrosion and to the failure to install internal corrosion monitoring systems (DNA Web Team 2014).

Gas leak in an industrial plant at Visakhapatnam: A gas leak occurred at the LG Polymers chemical plant located on the outskirts of Visakhapatnam, Andhra Pradesh, India, during the early morning of May 7, 2020. The resulting vapour cloud spread over around three kilometres, affecting nearby villages. As per the National Disaster Response Force (NDRF), the death toll was eleven, with more than one thousand people becoming sick after being exposed to the gas. Preliminary investigations concluded that the accident was likely due to insufficient maintenance of units storing the styrene monomer, improper storage, and operation errors. The plant stored two thousand metric tonnes of styrene in tanks, which were left unattended due to the lockdown imposed during the COVID-19 pandemic. Styrene monomer must be kept between 20 and 22 °C, as higher temperatures would result in rapid vaporisation of the chemical. It is believed that a computer glitch in the factory's cooling system allowed temperatures in the storage tanks to exceed safe levels, causing the styrene to vaporise. When the maintenance activity was in progress, the gas leaked from the plant and spread to nearby villages, impacting many people living in the surrounding areas (Visakhapatnam Gas Leak 2021).

Deepwater Horizon oil spill: On April 20, 2010, an explosion at the BP Deepwater Horizon oil rig released over five hundred and twenty million litres of crude oil into the Gulf of Mexico. Eleven rig workers lost their lives, and millions of marine organisms were killed. Even a decade later, restoring marine life remains a struggle. Most of the oil had gone within two years, the beaches are now essentially clean, and the fishing industry has rebounded; still, the damage to deepwater corals and fragile reefs may never be repaired. The cause of the explosion was attributed to a failure of the cement at the base of the 18,000-foot-deep well. That led to a cascade of
human and mechanical errors that allowed natural gas under tremendous pressure to shoot onto the drilling platform, causing an explosion that took 87 days to bring under control. The investigations into the accident revealed that it was the result of poor risk management, last-minute changes to plans, failure to observe and respond to critical indicators, inadequate control response, and insufficient emergency bridge response training by the companies and individuals responsible for drilling at the Macondo well and for the operation of the Deepwater Horizon. The report concluded that BP, the well's owner, was ultimately responsible for the accident (Deepwater Horizon Oil Spill 2021). The loss caused to the marine environment cannot be fully restored.
9.2 Accidents and Their Causes

The case studies described above fall under two categories: the first, in which the impact is limited to the geographical boundary of the industry and affects industrial workers, and the second, which causes damage to people living in the nearby surroundings and to the environment. The large crane crash at the Hindustan Shipyard in Vizag belongs to the first category. The GAIL pipeline explosion caused loss of life and harm to people who were not directly involved in the industry; this brings to the fore the need for safety practices to be adhered to even beyond geographical boundaries. The gas leak at the industrial plant and the Deepwater Horizon oil spill belong to the second category. The gas leak at the plant in Visakhapatnam affected people living beyond the geographical boundary of the plant, resulting in loss of lives and long-term health impacts. Incidents like the Deepwater Horizon oil spill have a significant, large-scale impact on marine life and the environment, and it is often challenging to restore the ecosystem to its earlier state. While putting safety practices in place, the effects of such incidents need to be looked into. Equally important is to consider the damage caused to the environment.
9.2.1 Industrialisation

Considerable progress has been made, and efforts have been put into ensuring industrial safety. This has provided for the protection of equipment, the safe operation of processes and, more importantly, the safety of industrial workers. Events like the Chernobyl and Fukushima Daiichi nuclear disasters and the Bhopal gas tragedy significantly impacted society and did long-lasting damage to people and the environment. These are well debated, discussed and documented.

When an accident or incident can impact people living beyond the geographical boundaries of the establishment, the efforts required and the procedures to be put in place take on a different connotation. Some of these installations can be classified as high-hazard installations because these installations are vulnerable to high-risk,
low-probability incidents/accidents. The safe and reliable operation of high-hazard installations depends not only on technical excellence but also on individuals and the organisation. It is recognised that safety is more than a technical matter and depends on the people working across the lifecycle of high-hazard installations. Human reliability (a term used to describe human performance) is widely applied in fields requiring a high standard of safety, such as the aviation, petroleum and chemical process, and nuclear industries. High-hazard installations such as nuclear power plants employ the Defence in Depth (DID) philosophy to reduce the likelihood of accidents.
9.2.2 Automation, Advancements in Science and Technology

The Industrial Revolution marked a significant turning point in history. Almost every aspect of daily life was influenced, and the standard of living was enhanced (Industrial Revolution 2021). Migration is another consequence of industrialisation, and improved transportation made migration easier (History of Human Migration 2021). Cultural diversity in a society is a further consequence of migration and adds another dimension when implementing safety and security processes. Accounting for cultural diversity and its impact on ensuring adherence to a safety culture has become an essential factor to consider while formulating safety and security procedures.

Advancements in the domain of science and technology facilitate more and more automation (Automation 2021). When it comes to processes and operations, the component of automation keeps growing. With the advent of robotics and artificial intelligence (AI), it is envisaged that the time is not far off when totally automated industrial processes become a reality. Automation is guided by human understanding of the process and by the ability to design processes for complete automation. Automation is transforming societies by reducing human intervention in processes. It is achieved by predetermining decision criteria, subprocess relationships, and related actions, and by embodying those predeterminations in machines. Automation covers applications ranging from a household thermostat controlling a boiler to an extensive industrial control system with tens of thousands of input measurements and output control signals. Complicated systems, such as modern factories, aeroplanes, and ships, typically use all these techniques in combination. There are several benefits of automation, including savings in labour and material input costs and enhanced quality (Automation 2021). At the same time, extensive automation makes industries vulnerable to sabotage and to minor errors of judgement by operating personnel, whether accidental or intentional, which can result in considerable damage.
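To make the idea of embodying predetermined decision criteria in a machine concrete, the following is a minimal sketch of the simplest case mentioned above, a thermostat switching a boiler on and off. The setpoint and deadband values are illustrative assumptions, not figures from any real control system.

```python
# A bang-bang (hysteresis) thermostat: the decision rule is fixed in
# advance and then executed by the machine with no human intervention.

def boiler_command(temp_c: float, heating: bool,
                   setpoint: float = 21.0, deadband: float = 1.0) -> bool:
    """Return True if the boiler should run for the next interval."""
    if temp_c < setpoint - deadband:   # too cold: switch on
        return True
    if temp_c > setpoint + deadband:   # warm enough: switch off
        return False
    return heating                     # inside the deadband: keep last state

# Example: a falling-then-rising temperature trace
heating = False
for temp in [22.5, 21.0, 19.8, 19.5, 20.4, 21.6, 22.3]:
    heating = boiler_command(temp, heating)
    print(f"{temp:4.1f} C -> boiler {'ON' if heating else 'OFF'}")
```

The deadband is what prevents rapid on/off cycling near the setpoint; an industrial control system differs in scale, with thousands of such predetermined rules, rather than in kind.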
9.3 The Role of Human Reliability Programmes

The safe and reliable operation of an industry of critical importance, like a nuclear power plant, depends on the technical excellence of both individuals and organisations. There is a realisation that safety is more than a technical matter and depends on people's responses to the situations they encounter. Therefore, management systems must be developed to be effective and efficient, considering the interactions between technical and human performance. Human/People Reliability Programmes (HRP) are put in place to protect establishments of national importance from human errors and malicious acts by personnel. These are most important for industries requiring a high standard of safety, such as the aviation, petroleum and chemical process, and nuclear industries, where the consequences of any mishap affect people and the environment beyond geographical boundaries. The events can happen inadvertently (these can be termed accidents) or deliberately (either for personal financial gain or due to ideological and political affiliations).

Human Reliability Programmes put in place evaluation processes to assess the people who occupy important positions or access critical material. Periodic reviews (medical and psychological evaluation), training, and retraining are essential components of such HRP programmes. Ultimately, the success of such programmes depends on the commitment of the individuals in the programme. While psychological testing and interviews should be a part of an HRP, ongoing education and testing should also be an inevitable part of it. Furthermore, measuring the transition of human beings from normal to abnormal behaviour (potentially threatening safety) is an enormous challenge for an HRP. It is also critical to know whether there is a gradual transition or an abrupt event. Currently, screening processes are in place which can neither reveal nor address the cause of any symptom. There is an underlying presumption that the human response to any technology is constant, whereas technology is dynamic. Understanding and accommodating the diversity of human responses and identifying the outlier is important. An HRP provides organisations with a process to help ensure that the highest quality employees are retained in these critical/sensitive positions (US Department of Energy; Coates and Eisele 2014).

The advancements in science and technology have led to the enhancement of processes, automation and increased efficiency of operations. The advent of AI and machine learning is taking these technologies to a different level. While technology is advancing, the humans who benefit from such developments need to enhance their skills, persisting with the commitment to excel and to meet growing individual aspirations. Cultural and religious beliefs and value systems continue to play an essential role in shaping personal growth. The challenge is keeping pace with the advancements in technology and human aspirations while ensuring and retaining efficient performance. When humans are confronted with situations while handling technology, their response shapes the impact. Human–Machine Interface (HMI) and Human Reliability Analysis (HRA) studies focus on accommodating the human response to situations. HRA is a comprehensive and structured methodology that applies qualitative and quantitative methods to assess the human contribution to risk. HRA helps in quantifying the likelihood of human error for a given task.
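As an illustration of such quantification, the following is a minimal sketch in the style of the SPAR-H method, one published HRA technique, in which a nominal human error probability (HEP) is scaled by performance shaping factor (PSF) multipliers. The nominal value and the multiplier levels below are illustrative assumptions, not figures from any actual assessment.

```python
from math import prod

def hep_spar_h_style(nominal_hep, psfs):
    """Scale a nominal HEP by PSF multipliers, SPAR-H style."""
    composite = prod(psfs.values())
    hep = nominal_hep * composite
    # SPAR-H applies an adjustment when three or more PSFs are
    # unfavourable (multiplier > 1), so the result cannot exceed 1.
    if sum(m > 1 for m in psfs.values()) >= 3:
        hep = (nominal_hep * composite) / (nominal_hep * (composite - 1) + 1)
    return hep

# Hypothetical task: an action performed under high stress with a poor
# human-machine interface but by a well-trained operator.
task_psfs = {
    "available_time": 1.0,       # nominal
    "stress": 2.0,               # high stress doubles the error likelihood
    "ergonomics_hmi": 10.0,      # poor HMI is strongly unfavourable
    "experience_training": 0.5,  # good training halves it
}
print(f"Estimated HEP: {hep_spar_h_style(0.001, task_psfs):.4f}")
```

The point of the sketch is the structure of the reasoning: a base error rate for the task type, modified by the context in which the human performs it, which is exactly where HRP measures such as training and workload management enter the calculation.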
Several HRA techniques have been developed for use in a variety of industries. Human reliability assessment is a process in which the overall human performance in operating actions is studied. Many human reliability assessment techniques are partially based on behavioural psychology. A Human–Machine Interface (HMI) is a user interface or dashboard that connects a person to a machine, system or device.

Errors, when they occur, are either unintentional or intentional. Accidental errors can be addressed by training or by enhancing the technology to reduce them. The real challenge is to address the consequences caused by deliberate actions. Wellness programmes and employee assistance programmes are some of the initiatives organisations have adopted to address these issues or to avoid/minimise such incidents being caused by individuals. Several programmes have emerged to address the issues arising out of individuals causing damage intentionally. Some of them are:
• Human Reliability Programme (most common; specific to the DOE, USA)
• Personnel Reliability Programme
• Personnel Assurance Programme (the initial name of the HRP)
• People Reliability Programme
• Biological Personnel Reliability Programme (biosecurity; for chemical, biological, radiological and nuclear weapons).

Similar but different are:
• Fitness for Duty
• Personnel Clearance Programme (appeared in 1987).

These programmes introduce a process to aid management evaluations and the identification of aberrant behaviour. Essentially, these programmes aid in decision-making, complement security and improve employee quality. They also address the issue of sabotage (causing damage intentionally), thereby helping to ensure safety and security. The challenge is to identify employees who are dishonest, disloyal and unreliable. While managing such cases, the measures put in place should not hamper quality assurance or productivity. Sai Baba et al. (2022) have reviewed the Human Reliability Programs in the nuclear industries of various countries.

The Human Reliability Programme, or its equivalent where one is in place, addresses the symptomatic issues. Cognitive behavioural aspects play an essential role in understanding the human response to situations. While safety protects individuals, organisations and assets from elements likely to cause harm or injury, security protects individuals, organisations and assets against external threats and criminal activities. Studying the HRPs implemented in various countries is essential to understand the approach and to assess how cultural diversity is reflected. Equally important is to examine how human reliability aspects have been addressed in industries such as nuclear, aviation, chemical, mining and transportation. The challenge is to find the common thread among them, both within the same sector across various countries and across diverse industries. This has been the focus of the collaborative research undertaken by the National Institute of Advanced Studies (NIAS-India) and Texas A&M University, USA (Sai Baba and Chirayath 2021).
To mitigate climate change, the world is pursuing alternatives to replace the significant component of baseload power currently generated by coal plants. Nuclear energy would play an essential role in meeting this requirement. Nuclear safety and security have common goals: to protect people, property and the environment from the consequences of any untoward event or incident. Robust HRP programmes are in place in several nuclear establishments in various countries to address safety and security issues.

The aberrant behaviour of individuals causing intentional damage reflects the transition of a person from being normal to not being normal. It is essential to understand when and why such a change happens. More important is to know whether it is a step process or a continuum and, if so, how to quantify such change. Can the new technologies that are emerging and have become part of our lives help identify the transition? AI-based biometrics and behavioural signature analysis (facial expressions, vocals, kinesics, oculesics, etc.) are already employed in various strategic organisations. Equally important is to study how cultural diversity impacts safety and security, since large-scale migration has brought cultural diversity to societies. There is scope for using the newer tools that are becoming available to identify deviations from the expected behaviour of individuals; a sketch of this idea follows below. Success in such an effort would ensure early detection of aberrant behaviour and minimise the damage to organisations and, more importantly, to societies.

There is a necessity to assess human reliability across different domains. The situation that arises, and the environment created, when every co-worker and employee is regarded as a threat, as well as the adequate procedures for implementing a human reliability programme, require attention. There is a need for continuous evaluation and for identifying the factors that make an individual unreliable. It is also essential to find the common threads of HRPs across critical industries so as to replicate best practices and avoid duplication of effort. Could over-emphasis on the need for safety and security in nuclear plants be counterproductive? The possibility of the public becoming more apprehensive about advanced technologies like nuclear energy needs to be considered.
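The following is a minimal sketch of how a deviation-from-baseline check might look using an off-the-shelf anomaly detector. The behavioural features (weekly badge swipes, after-hours logins, data transferred) and all numerical values are hypothetical assumptions for illustration only; a real programme would need validated indicators and strong privacy, legal and fairness safeguards.

```python
# Flag weeks of activity that deviate from an individual's own baseline.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 weeks of "normal" activity: [badge swipes, after-hours logins, GB moved]
baseline = rng.normal(loc=[40, 1, 2], scale=[5, 1, 0.5], size=(200, 3))

detector = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

recent_weeks = np.array([
    [42, 1, 2.1],   # typical week
    [38, 9, 25.0],  # unusual: many after-hours logins, large data movement
])
flags = detector.predict(recent_weeks)  # +1 = normal, -1 = anomalous
for week, flag in zip(recent_weeks, flags):
    print(week, "flag for human review" if flag == -1 else "ok")
```

Such a tool can only surface statistical outliers for human review; deciding whether a flagged deviation actually reflects a reliability concern remains a managerial and behavioural judgement, which is why it complements, rather than replaces, the evaluation processes of an HRP.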
References

Automation (2021). https://en.wikipedia.org/wiki/Automation. Accessed 24 Aug 2021

Coates CW, Eisele GR (2014) Human reliability implementation guide. Oak Ridge National Laboratory

Deepwater Horizon Oil Spill (2021). https://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill

DNA Web Team (2014) GAIL gas pipeline explosion: 2 senior GAIL officials suspended over Andhra Pradesh pipeline. https://www.dnaindia.com/india/report-gail-gas-pipeline-explosion-2-senior-gail-officials-suspended-over-andhra-pradesh-pipeline-mishap-1998115

History of Human Migration (2021). https://en.wikipedia.org/wiki/History_of_human_migration

Industrial Revolution (2021). https://en.wikipedia.org/wiki/Industrial_Revolution

Kumar VR (2020) 11 dead as crane collapses at Hindustan Shipyard in Vizag. https://www.thehindubusinessline.com/news/11-feared-dead-as-crane-collapses-at-hindustan-shipyard-vizag/article32247621.ece
Sai Baba M, Chirayath S (2021) Summary of the meetings on human reliability program in industries of national importance jointly organized by NIAS, India and Texas A&M University, USA. https://nsspi.tamu.edu/summary-of-the-meetings-on-human-reliability-program-in-industries-of-national-importance-jointly-organized-by-nias-india-and-texas-am-university-usa-77305/

Sai Baba M, Chowdhury I, Nidhi V, Shah AK, Chirayath SS (2022) J Nucl Mater Manag XLIX:64–76

US Department of Energy, Human reliability program handbook. https://www.energy.gov/ehss/human-reliability-program-handbook

Visakhapatnam Gas Leak (2021). https://en.wikipedia.org/wiki/Visakhapatnam_gas_leak
Part II
Perspectives on Human Reliability Programs from Academia
Chapter 10
Relevance of Human Reliability Program: The Role of Academic Institutions

Reshmi Kazi
The art of war teaches us to rely not on the likelihood of the enemy’s not coming, but on our own readiness to receive him; not on the chance of his not attacking, but rather on the fact that we have made our position unassailable. —Sun Tzu
The human factor is an integral and indispensable aspect of any system, including the nuclear industry. With rapid advancement in nuclear technology and its associated complexities, the human–system interface has become a crucial platform for ensuring safety and security within critical infrastructures. Any human–system interface is intrinsically linked with human performance, the personnel subsystem, organizational design, and the regulatory environment. These essential elements require critical focus when characterizing risk. The results derived from risk assessments based on stringent methods are significant for developing safety-level indicators to measure, improve, and upgrade the effectiveness of plant operational structures. The results are also critically important for supporting operational risk management. This is a fundamental requirement to facilitate the development of an effective human reliability program to "develop the methodology and data for characterization, measurement, and prediction of human performance" (Moray and Huey 1988). Thus, the human reliability program holds a crucial place in the reduction of risk factors and in the effective operation of the nuclear industry.

What role can academic institutions play in enhancing the importance of human reliability programs? Academic institutions have a critical role to play in this regard. It is an undeniable fact that domestic narratives on critical issues like the human reliability program and its relevance to sensitive infrastructure are scarce. Most of the sources that are generally referred to while accounting for the relevance of human reliability programs are primarily from the Western domain.

R. Kazi (B) Nelson Mandela Center for Peace and Conflict Resolution, Jamia Millia Islamia, New Delhi, India e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. S. Chirayath and M. Sai Baba (eds.), Human Reliability Programs in Industries of National Importance for Safety and Security, Risk, Reliability and Safety Engineering, https://doi.org/10.1007/978-981-99-5005-8_10
The academic community has a significant responsibility to develop a meaningful narrative on India's critical requirements, premised upon a framework that pertains to potential threats and vulnerabilities emanating from within critical infrastructures. For this purpose, academic institutions must be more forthcoming in organizing workshops, symposiums, seminars, and conferences to generate awareness and interest in this critical domain.

This paper attempts to explore the importance of human reliability programs, especially during the prevailing COVID-19 period. Have the pandemic and its ensuing consequences heightened concerns regarding the effective implementation of the human reliability program? This paper also examines whether the human reliability program is being severely challenged by advanced technologies with complex designs being widely used within the nuclear industry. An integral aspect of the paper will focus on the role academic institutions need to play in emphasizing the importance of human reliability programs. In doing so, academic institutions have a responsibility to be a conduit between the nuclear industry and the public by generating confidence among the latter about the safety and security aspects of the nuclear industry. Academic institutions also have an important role in highlighting the challenges the industry encounters in ensuring an effective human reliability program.
10.1 Human Reliability Program and Its Relevance

The 1979 Three Mile Island accident, the 1986 Chernobyl nuclear calamity, and the 2011 Fukushima Daiichi nuclear disaster demonstrate that human error constitutes one of the main causes of accidents in nuclear power plants. Unfortunately, until the Three Mile Island catastrophe occurred, scant attention was paid to the operator's role in the proper functioning of nuclear power plants. However, the investigation of the Three Mile Island accident showed that its "root causes" were primarily "human-related" (Erp 2002). "The Three Mile Island accident also occurred at a time of mounting public concern and debate over the future of nuclear power" that induced "confusion," "suspicion," and "fear" among the people at large (Cantelon and Williams 1980). In August 1986, Valeriy A. Legasov, the head of the Soviet delegation to the Post-Accident Review Meeting, organized by the International Atomic Energy Agency (IAEA), categorically stated: "I advocate the respect for human engineering and sound man–machine interaction. This is a lesson that the Chernobyl taught us" (Munipov 1992). Further, the IAEA's Summary Report on the Post-Accident Review Meeting on the Chernobyl Accident also concluded that the root cause of the Chernobyl accident is to be found in the so-called human element. The Chernobyl accident demonstrates that the absence of a vital safety system is indicative of an absence of safety culture in a nuclear power plant (IAEA 1986). In the case of the Fukushima Daiichi nuclear power plant, safety analyses "conducted during the licensing process and during its operation, did not fully address the possibility of a complex sequence of events that could lead to severe reactor core damage" (Amano 2015). In particular, the operators (TEPCO) "failed
to identify the vulnerability of the plant to flooding and weaknesses in operating procedures and accident management guidelines" (Amano 2015).

Recently, in India, the deadly boiler blast in Unit V of Thermal Power Station-II of the Neyveli Lignite Corporation (NLC) in Neyveli, Tamil Nadu in July 2020 is an example of how human negligence can result in "failure to ensure safety" in a critical infrastructure (Prasad 2020). Reportedly, the accident led to the death of six workers, and several others were injured. It is worth noting that only two months earlier, in May 2020, a boiler explosion had occurred in Unit VI, NLC, killing five persons. In June 2020, two persons died and four were taken ill after inhaling benzimidazole vapors, which leaked at a pharma plant in Parawada, Visakhapatnam. On May 7, 2020, eleven people died and over 350 were admitted to hospitals after styrene monomer gas leaked from a chemical plant belonging to LG Polymers at RR Venkatapuram in Visakhapatnam (Bhattacharjee 2020). The gas reportedly affected at least five villages, leading to the evacuation of about 2,000 people from a 3-km radius. Several charges, including that of negligence, were registered against the management of LG Polymers. These tragic instances bear testimony to the fact that human-related factors constitute one of the major causes of most accidents. This can also be said of incidents in any critical infrastructure, including the nuclear industry. Human reliability is thus a critical component of operations within nuclear power plants, and ignoring it will only lead to human catastrophe.

For the purposes of this study, the term "Human Reliability Program" (HRP) is defined as a series of selective controls which are implemented and integrated to identify the "insider threat" from current and prospective employees who are dishonest, disloyal, and unreliable (Baley-Downs 1986, pp. 661–665). According to Ranajit Kumar, Head, Nuclear Control and Planning Wing, Department of Atomic Energy, "a personnel reliability program is a program in which one looks at the social background or the societal aspects of a person: how he is living; how his behaviors are changing" (Guenther et al. 2013). A human reliability program encapsulates a "series of selective controls which are implemented and integrated to identify the insider threat from current and prospective employees who are dishonest, disloyal and unreliable" (Baley-Downs 1986). Thus, a human reliability program consists of an "evaluation" plan to ensure that only trustworthy people who are physically, psychologically, and professionally dependable are employed to work in a facility housing sensitive materials or other critical infrastructure dealing with information or systems (ibid.). The program must include the necessary tools to test whether people are competent enough to undertake complex and demanding responsibilities. However, this does not imply that a human reliability program is foolproof or error-free. The evaluation process should be applied periodically to all employees, irrespective of whether they are regular, contractual, or casual employees within a nuclear power plant. The process must be continuous, rigorous, and involve several layers. Any access to sensitive installations or otherwise prohibited zones must thus be premised upon a laborious process. Any lax approach would inevitably make the process less rigorous and compromise the evaluation plan and professional competency.
10.2 Why Is the Human Reliability Program Important?

Many of the safety aspects of nuclear power plants are contingent upon human and organizational factors. As Dhillon stated in his book Human Factors and Human Error in Nuclear Power Plant Maintenance, "In nuclear power plant maintenance, human factors play an important role because improving the maintainability, design of power plant facilities, equipment, and systems with regard to human factors helps to directly or indirectly increase plant availability, safety, and productivity" (Dhillon 2019). It is not an exaggeration to say that human reliability occupies a place of critical importance and constitutes an integral requirement in all sensitive installations within the nuclear industry. The fundamental importance of human reliability in the prevention of major accidents, and its application in the overall safety strategy, cannot be ignored. Human reliability is directly related to the skills and competencies of personnel. Hence, human reliability must not be neglected while assessing the reliability of critical systems.
In the aftermath of historic nuclear catastrophes, states with nuclear programs have repeatedly emphasized the need for the timely availability of a sufficient number of professionals with the necessary competencies to avoid major mishaps. A dedicated cadre of competent nuclear professionals with appropriate training is an important requirement so that potential human errors can be identified and then prevented, corrected, or mitigated. "HRA is a comprehensive and structured methodology that applies qualitative and often also quantitative methods to assess the human contribution to risk" (Philippart 2018). This facilitates the effective management of human error and strengthens human reliability. Consequently, the overall system reliability is enhanced. Conversely, when human error is not adequately managed, human reliability is lower and the overall system reliability suffers.
An example of the consequences of an ineffective HRP is an incident that occurred at a US nuclear missile base in Montana. In January 2014, 34 Air Force officers at the base were accused of "cheating" in "proficiency tests" related to safety issues concerning "launching nuclear missiles" (BBC News 2014). Such "unacceptable behavior" illustrates the potential consequences human error can have for the security of a nuclear program.
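To make the quantitative side of HRA concrete, the short sketch below illustrates the style of calculation such methods perform. It follows the adjustment logic of the SPAR-H method (NUREG/CR-6883), in which a nominal human error probability (HEP) is scaled by multipliers for performance shaping factors (PSFs) such as available time, stress, complexity, and training; the specific task and the multiplier values here are illustrative assumptions, not a calibrated plant model.

```python
# Illustrative SPAR-H-style human error probability (HEP) calculation.
# The task and the PSF multipliers below are placeholder assumptions.

NOMINAL_HEP_ACTION = 0.001  # SPAR-H nominal HEP for action-type tasks

def adjusted_hep(nominal_hep, psf_multipliers):
    """Scale a nominal HEP by the composite of PSF multipliers.

    SPAR-H applies a correction when several PSFs are degraded so the
    result remains a valid probability below 1.0:
        HEP = NHEP * PSF / (NHEP * (PSF - 1) + 1)
    For simplicity the correction is applied uniformly here.
    """
    composite = 1.0
    for multiplier in psf_multipliers.values():
        composite *= multiplier
    return nominal_hep * composite / (nominal_hep * (composite - 1.0) + 1.0)

# Hypothetical task: an operator must realign a valve under high stress,
# with nominal training and barely adequate time.
psfs = {
    "available_time": 10,  # barely adequate time
    "stress": 2,           # high stress
    "complexity": 2,       # moderately complex task
    "training": 1,         # nominal experience and training
}

print(f"Adjusted HEP: {adjusted_hep(NOMINAL_HEP_ACTION, psfs):.4f}")
# -> Adjusted HEP: 0.0385 (for these placeholder values)
```

For these placeholder values, a single degraded task moves from roughly 1 error in 1,000 opportunities to about 4 in 100, which is exactly the kind of human contribution to risk that an HRP, through selection, training, and monitoring, tries to keep in check.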
10.3 Assessing Reliability

The process of identifying and detecting human error to prevent mishaps is fraught with extreme challenges. Human error and human reliability are two sides of the same coin, since all human reliability problems are intrinsically linked with human error. Another limitation is that human reliability cannot be fully quantified, simply because personnel cannot be analyzed like machines, equipment, and tools. "Most analyses of human performance fall in the realms of psychology and sociology rather than engineering" (Sutton 2014). Even so, human error and human reliability
cannot be treated in isolation from "system reliability" (Sutton 2014). The human reliability assessment process involves a layered and extremely rigorous approach. Personnel are subjected to various changing environments from time to time, and these may bear, directly and indirectly, on their psychological behavior and actions. The HRP and its assessment tools may infringe on privacy, personal freedom, and professional ethics. The human mind is not like a machine; it is more dynamic, and the human psyche is continually evolving. Hence, the same inputs may give different outputs because of the changing environment. Most assessment tools are indicative, so several challenges arise in making judgments and decisions on the basis of indicative assessments. Regular background checking may provide some relief in this regard; an element of continuity is required.
10.4 Human Reliability and the Emerging Security Environment

Existing literature on terrorism establishes that, post 9/11, incidents of terrorism have reached new heights of lethality. The Country Reports on Terrorism 2020 describes how international terrorism is gradually gaining ground due to the growing threat from racially or ethnically motivated violent extremism, while the global COVID-19 pandemic has complicated the terrorist landscape, creating both challenges and opportunities for terrorist groups (Country Reports on Terrorism 2020). Terrorist groups have significantly adapted to cyberspace to continue radicalizing others and inspiring them to perpetrate violent attacks globally. It emerges that "a strategic perspective would also be realistic in acknowledging that terrorism is a permanent threat that must be consistently monitored and contained" (Smith 2008). According to Mapping the Global Future: Report of the National Intelligence Council's 2020 Project, several social, economic, and geopolitical factors will have a deep impact on the future trends of terrorism from 2020 onward (National Intelligence Council 2004). The world is already reeling under the impact of the global pandemic caused by COVID-19, which has the potential to affect the nuclear industry, particularly its human factors. Given these emerging trends, security strategies must reckon with a heightened vulnerability to sabotage or to unauthorized access to or use of sensitive materials, information, and systems of a nuclear facility.
During the Nuclear Security Summits of 2010–2016, the international community acknowledged with trepidation the rising possibility and risk of an act of terrorism. According to the IAEA Incident and Trafficking Database (ITDB), as of 31 December 2019, the ITDB contained a total of "3686 confirmed incidents involving nuclear and/or other radioactive materials reported by participating States since 1993" (IAEA 2020). Of these 3686 confirmed incidents, "290 incidents involved a confirmed or likely act of trafficking or malicious use" (IAEA 2020).
A more realistic WMD scenario is represented by a plot disrupted by Jordanian authorities on 20 April 2004 (Gorka and Sullivan 2004). They arrested at least three
men accused of constructing a crude binary bomb, composed of conventional explosives mixed with toxic chemicals, to bomb the Jordanian Prime Minister's office, the US Embassy, and the General Intelligence Department (GID) in Amman. Reportedly, the plan was to use toxic chemicals to produce "a cloud of toxins that would disperse around the GID compound and out in the city, inducing mass casualties" (Gorka and Sullivan 2004). Hussein Sharif Hussein, one of the co-conspirators in the plot, admitted on Jordanian television that the plot aimed at "carrying out the first suicide attack to be launched by al Qaeda using chemicals" ('Confessions' of Group Planning Jordan Chemical Attack 2004).
Additionally, the risks of subversion, insider threats, and cyberattacks may also increase. In June 2010, "the malware Stuxnet was designed to sabotage the Iranian nuclear program by targeting industrial control systems (ICSs)" at the Natanz uranium enrichment facility, and it "temporarily took out nearly 1,000 of the 5,000 centrifuges Iran had spinning at the time to purify uranium" (Collins and McCombie 2012; Sanger 2012). In November 2019, the Nuclear Power Corporation of India Limited (NPCIL) confirmed that "identification of malware in NPCIL system was correct" at the Kudankulam Nuclear Power Plant (KKNPP) (Nuclear Power Corporation of India Limited 2019). The attack was noticed on September 4 by the Indian Computer Emergency Response Team (CERT-In), the national agency for responding to cybersecurity incidents (Nuclear Power Corporation of India Limited 2019). Investigations revealed that a large amount of data from the KKNPP's administrative network had been stolen. Serious apprehension arose that organized "cyberattacks on nuclear power plants could have physical effects," like sabotage, theft of nuclear materials, or a reactor meltdown, "especially if the network that runs the machines and software controlling the nuclear reactor are compromised" (Das 2019). In March 2015, the co-pilot of Germanwings flight 4U9525 intentionally crashed the aircraft in the French Alps, killing all 150 persons on board (BBC News 2017). Such incidents bear serious consequences and have a deep impact on the security of critical industries such as nuclear power and aviation.
10.5 Nuclear Safety Culture and the Human Factor

For people working in hazardous infrastructures, the avoidance of error is an absolute necessity for the creation, maintenance, and sustenance of a safe working environment. This approach helps to foster dependable behaviors and facilitates accident prevention. The human factor has a very important role in improving, upgrading, and sustaining nuclear safety culture. A pervasive nuclear security culture is essential for successfully protecting nuclear and other radioactive materials from external and insider threats. Similarly, an effective safety culture is part of the defense in-depth strategy in all nuclear power plants. "A common characteristic of nuclear power plants is that a sizable amount of radioactive and potentially highly hazardous material is concentrated in a technological system under the centralized control of a few operators (working in a control room environment)" (Meshkati 2007). Hence, any nuclear
disaster spells catastrophic and long-lasting consequences for the population and the adjoining environment. It can also have long-standing consequences for the entire world, as is evident from nuclear disasters like Three Mile Island, Chernobyl, and Fukushima. As succinctly described by the former Director General of the IAEA, Dr. Mohamed ElBaradei, "nuclear and radiological risks transcend national borders, and an accident anywhere is an accident everywhere" (Dasgupta and Gupta 2011).
It must be borne in mind that the interrelation between human, technological, and organizational factors is a complex intermix. What adds further to the complexity is that while technological factors or hardware can be fixed, there is no quick remedy for human and organizational factors. In fact, any quick-fix approach can portend consequences in the form of breakdowns in radiation-emitting units. A lack of "relevant human factor considerations, the causes of human error, and commonalities of human factor problems in major disasters" was evident in the cases of Three Mile Island, the Chernobyl accident, the Fukushima nuclear disaster, and the Bhopal Gas Tragedy (Meshkati 1991). Unfortunately, the critical role of the human element in safety issues was not taken into consideration until after the Chernobyl accident. However, with increasing demands for civilian nuclear energy, a growing number of nuclear power plants, and continuing instances of nuclear disasters, it is necessary that a safety culture is encouraged and sustained within nuclear power plants. To achieve this objective, it is imperative that the operative or human element is factored in as an integral aspect of the defense in-depth strategy to mitigate operative or human error in nuclear power plants.
10.6 Challenges Involved

Despite the enormous advantages of developing and maintaining an effective human reliability program, its implementation poses serious challenges. A human reliability program is a complex scheme based on rigorous and multilayered processes. It involves stringent exercises conducted on personnel, who are subjected to various changing environments that affect, directly and indirectly, their psychological behavior and actions. Thus, it is logical to presume that a stringent human reliability program and its assessment tools might infringe on privacy, personal freedom, and professional ethics. This can bear upon an individual's sense of competency and risk someone being unfairly disregarded, leading to disgruntlement. A probable consequence of such an occurrence is a heightened risk of insider threat.
"Efforts at screening for behaviors will also inevitably lead to concerns about either (1) failing to identify someone who has the disqualifying background or behavior" or (2) wrongly assessing "someone as having a disqualifying background or behavior when she or he does not" (National Research Council 2009). "These two concerns are inversely related: the more one tries to avoid letting a security risk through the screening, the more one expands the number of otherwise competent individuals who will 'fail' the test" (National Research Council 2009). This is evident in the problems
faced by decision-makers and policymakers in polygraph and lie detection tests. The objective of polygraph tests is to investigate a particular event. The process is focused on a specific incident and is retrospective, so that facts can be elicited from the person undergoing the test. The goal is to detect untrustworthy individuals posing threats to assets of national importance and to extract reliable information with the purpose of assisting the investigation process. However, polygraphs are not always reliable and hence far from perfect. For example, exams conducted for national security screening might cover a range of questions that are obscure or notional. In such cases, the examiner's perspective might be at variance with that of the examinee on opinion-based questions, or even on true/false questions, and the individual's responses then become the basis for deriving conclusions about his or her future behavior.
It appears that an otherwise effective human reliability program and the testing process it uses may be socially engineered in ways that are prejudiced. It runs the risk of unreasonably rejecting personnel who do not represent a security risk. Trained personnel are susceptible to psychological manipulation by the employer, owing to the prevailing culture of seniority and because they are trained to respect authority. The whole process could thus end up targeting unsuspecting, qualified personnel and forcing them into a prejudicial situation. The elimination of personnel on the basis of imperfect screening compromises the building of a dedicated cadre of technical and research personnel in the workforce. "If there is a large pool of potentially qualified applicants, a manager could decide that she or he can 'afford' to incorrectly exclude someone who is in fact qualified because there are many others from whom to choose" (National Research Council 2009). Paradoxically, even though the employer remains unaffected, "'failing' the test could have harmful consequences for the innocent individual involved, especially if there is a risk of any lasting career impact" (National Research Council 2009). It also poses the challenge of incurring the financial costs of finding competent replacements when highly trained workers are wrongly eliminated from appointments to sensitive posts. This is also detrimental to the research community when highly skilled personnel are unfairly eliminated, especially if there is a limited number of skilled personnel from which to recruit. Hence, screening for security purposes may be biased and harmful both for qualified personnel and for the nuclear industry.
In the current state of international affairs, certain trends pose enormous risks to human reliability. With the increasing demand for global nuclear disarmament, thousands of nuclear scientists who have been serving in important positions could be left without work, making them vulnerable to terrorist groups desiring nuclear weapons. This was a matter of serious concern after the disintegration of the former Soviet Union, which left many nuclear scientists unemployed. An unemployed nuclear scientist is a potential insider risk who may trade his or her skills and know-how to terrorist groups for financial gain. Preventing this "brain drain" of sensitive information is a crucial aspect of nuclear safety and security.
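Returning to the screening trade-off quoted above from the National Research Council, a simple base-rate calculation shows why it is so stark. The numbers in the sketch below are entirely hypothetical, chosen only to illustrate the arithmetic: when genuine insiders are rare, even an accurate test flags far more loyal employees than real threats.

```python
# Hypothetical base-rate illustration of the screening trade-off.
# All figures are invented for illustration only.

workforce = 10_000          # total screened employees
base_rate = 0.001           # assume 1 in 1,000 is a genuine insider threat
sensitivity = 0.90          # test flags 90% of true threats
false_positive_rate = 0.05  # test wrongly flags 5% of loyal employees

true_threats = workforce * base_rate                              # 10 people
flagged_threats = true_threats * sensitivity                      # 9 caught
flagged_loyal = (workforce - true_threats) * false_positive_rate  # ~500

# Of everyone who "fails" the screening, what fraction is a real threat?
ppv = flagged_threats / (flagged_threats + flagged_loyal)
print(f"Loyal employees wrongly flagged: {flagged_loyal:.0f}")   # ~500
print(f"Chance a flagged employee is a real threat: {ppv:.1%}")  # ~1.8%
```

Tightening the test to catch the one missed threat necessarily raises the false-positive rate, expanding the pool of competent individuals who "fail" the screening, which is precisely the inverse relationship described above.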
The nuclear industry may also be expected to undergo dynamic changes due to workplace trends that will lead to fewer jobs and the replacement of the aging workforce (Kazi 2013). This can open up
potential risks from disgruntled employees whose reliability can no longer be assumed. It is necessary to take adequate steps to redirect the older workforce of nuclear weapons experts toward civilian work. As the world progresses toward the total elimination of nuclear weapons, it is important for authorities and experts to take measures for the rehabilitation of unemployed nuclear weapons workers and scientists to prevent them from becoming sources of insider betrayal.
In view of these evolving trends, human reliability programs need to be adequately prepared to meet the emerging challenges. A human reliability program must not function merely as a punitive program. On the contrary, it should provide assurance that personnel will not be unfairly penalized for reasons like illnesses or personal issues that occur because they are human (Kazi 2013). The human reliability program must not generate a culture of fear of retribution; it should enable personnel to become their own advocates and allow them to engage unreservedly in conversation with human reliability program officials without fear of being judged on their personal choices rather than their reliability. There must be no application of social engineering to manipulate, intimidate, or coerce individuals in order to gain their trust and then exploit it to their disadvantage. A burdensome human reliability program may not only drive competent personnel away but also disregard their competence. A positive program respects individuality, while also ensuring that behaviors perceived as harmless are not actually menacing to safe and secure operations. For example, personnel should not be judged on day-to-day personality eccentricities, sexual orientation, hobbies, or otherwise innocuous personality traits, but rather on whether they can be blackmailed, coerced, or otherwise manipulated. Invoking such ideals establishes a nonpunitive environment for reporting. "Maintaining these principles and practices ensures a holistic evaluation of the person through a compassionate programme in which any investigation or disciplinary action is conducted with respect and in a consistent, objective, and confidential manner that is transparent to the staff member" (Higgins et al. 2013). This helps develop an environment of trust and a culture of responsibility in which, instead of manipulating individuals, efforts are made to foster not only personal performance but also the capabilities of other personnel. All of this can be part of the defense in-depth strategy and increases the overall safety and security within critical installations like nuclear power plants.
10.7 Role of Academic Institutions in Fostering Human Reliability

Academic institutions have a cardinal role in providing effective training and proper guidance, which essentially help in developing the fundamental basis for fostering human reliability. Effective training supports the good mentoring of personnel and helps them reach their full potential. It also infuses an element of positivity within the work environment by promoting organizational goals, developing new skills, and creating
expectations of appropriate behavior. These factors help in facilitating other aspects of effective monitoring as part of an efficient human reliability assessment process. Thus, academic institutions play a crucial role, through mentoring and training, in laying the essential foundation upon which an efficient organization comprising confident and satisfied personnel operates. This foundation is crucial for a safety culture in which individuals realize the importance of responsible human performance to reduce the probabilities of operational or human error in nuclear power plants.
Academic institutions must impart versatile and multidimensional training and educational experiences to stimulate and foster the varied interdependent aspects of developing a culture of trust and responsibility within organizations vulnerable to human error. In supporting and advancing this objective, academic centers of learning must initiate and sponsor discussions and dialogue among personnel from the scientific, technical, and security communities. This is an enormous responsibility, as it helps in developing a common understanding. It is also part of the defense in-depth strategy as it pertains to the indigenous nuclear industry. Dr. Sitakanta Mishra, Associate Professor at Pandit Deendayal Petroleum University (PDPU), states:
The role of academic institutions in promoting human reliability in nuclear power plants is important. Human reliability is about the expected integrity and commitment of an employee towards the job he/she is assigned, and this is much more crucial. These qualities can be infused or developed through sensitization on the importance and consequences of laxity in the task an employee is assigned. Though employees can be infused with these qualities through in-service training, inculcation of such high standards of reliability can best be undertaken at the university level when students join this course and adopt this career. University learning processes can infuse a sense of integrity and trustworthiness in the pupils while training them to take up responsibilities in the nuclear industry. Previous instances of laxity and their consequences can be brought to their notice. During the formative years, trainees can be sensitised to how their personal integrity, trust, and commitment determine the future of the nuclear industry at large. The university course curriculum must include the human reliability issue (Mishra 2020).
A significant responsibility that lies upon academic institutions is that they must help develop a domestic narrative on India's critical requirements, premised upon a framework that addresses potential threats and vulnerabilities existing within the industry. Presently, domestic narratives on critical issues like the human reliability program and its relevance to sensitive infrastructure are scarce. Most of the sources that are generally referred to while accounting for the relevance of a human reliability program come primarily from the Western domain. There is an urgent need to develop, cultivate, and advance literature that focuses on indigenous conditions and on how it can contribute to strengthening the safety-security interface. For this purpose, academic institutions must be more forthcoming in organizing workshops, symposiums, seminars, and conferences to generate awareness and interest in this critical domain. These constitute relevant platforms where scientific and technical experts and practitioners from varied sectors can interface. Such programs help foster debates and discussions on issues like vulnerability assessment, reliability assessment, operative error, preventive measures for the mitigation of human errors, and security issues. The domestic narrative must factor in the risks and threats endangering the system, as well as the sociocultural aspects inherent in it. This
is particularly relevant since it encourages newer thinking in developing a roadmap for addressing and mitigating safety and security issues within organizations with hazardous work ethics.2 As a consequence, academic institutions help develop a dedicated cadre of nuclear personnel in various sectors like nuclear science, engineering, and policy studies, and in other specialized groups with ascertained tasks.
Academic institutions are crucial in promoting research and development activities. This is vital from the perspective of developing a domestic discourse on the importance, relevance, and requirements of India's nuclear safety and security imperatives, and for encouraging the pursuit of excellence in nuclear security sciences (including engineering sciences), mathematics, and strategic and policy studies in a manner that has major significance for the progress of indigenous nuclear strategic technological capability. Academic institutions have a vital role in providing an academic framework for integrating basic research with technology development, encouraging interdisciplinary research, and developing an environment for attracting high-quality manpower in various aspects of nuclear studies, thereby encouraging and preparing students to take up careers in nuclear science and technology and related areas. Academic institutions must always adhere to the highest ethical standards, and one measure of that is to put the good of the students first. This attitude will nurture a culture that promotes excellence in research, fosters innovation and creativity, and values responsibility not only for one's own actions but for the actions of the organization as a whole.
Academic institutions are also able to promote nuclear studies as important for the development of society. At present, there are "thirteen research institutions and organizations aided by the Department of Atomic Energy (DAE)" (Aided Institutions and Other Organizations of DAE 2020). Many of these institutions have been pursuing academic programs in various aspects of nuclear security sciences. For example, the Homi Bhabha National Institute (HBNI) is a research university that educates students at the doctoral and master's levels and has created a pool of considerable talent in its faculty members and students (Distinctive Characteristics of the Institute 2020). "The Constituent institutions and the Off-campus Center of HBNI have a wide variety of basic as well as applied research programs aimed toward the development of indigenous research programs on various facets of nuclear science and technology" (Distinctive Characteristics of the Institute 2020). This has further supported the "indigenization" of India's nuclear power program and its associated fuel cycle facilities. "India's participation in the international venture ITER has been possible only because of robust basic research in plasma physics and the development of related technologies at IPR and other institutions in the country" (Distinctive Characteristics of the Institute 2020). The Institute also pursues doctoral programs, and its Ph.D. research candidates produce a great deal
2
Hazardous work ethics concerns dangerous jobs that expose employees to significant physical risks, such as unhealthy working conditions that can cause permanent disability, ill health, and psychological damage, or even injury and death. The general attitude toward such physically dangerous jobs is that employees are informed about the risks and are provided reasonable safety measures and adequate compensation as incentives to take the jobs. On ethical grounds, industries that expose people to such degrees of physical risk must ensure that it is for a good social cause.
in terms of research output. Moreover, fresh graduates inducted into the DAE are provided the opportunity to study nuclear science and engineering for a "period of one year at the BARC Training Schools" (Distinctive Characteristics of the Institute 2020). A variety of skill-oriented diploma programs, such as the Diploma in Radiological Physics, are conducted at the Bhabha Atomic Research Center (BARC)/Tata Memorial Center and train a significant percentage of the Institute's students as either employees or potential employees. All BARC Training School students are trained to be competent potential employees, and approximately fifty percent of the doctoral students are employees of DAE institutions.
Various other universities also offer courses in varied core areas to encourage the development of competent and skilled employees. For example, the Noida Campus of Amity University in Uttar Pradesh offers B.Tech., M.Tech., and Ph.D. programs in important aspects of nuclear science and technology, including nuclear fission systems like nuclear reactors and nuclear power plants, nuclear safety, and even nuclear weapons. The field of nuclear science and technology offers potential employment to a large number of technically qualified professionals in the nuclear industry (About Us 2020). Ms. Archana Yadav, Assistant Professor at the Amity Institute of Nuclear Science and Technology, states that "to build an HRP program academia can help by providing trained manpower. Students who have been educated and trained at prestigious organizations (both in India and abroad) will help in improving the processes and procedures at par with international standards, by bringing in new energy and ideas from working in varied atmospheres" (Yadav 2020). Yadav further emphasized the need "to incorporate stress as one of the parameters while preparing such models to ensure humans are not always demeaned in life-threatening and dangerous scenarios. Trainings and procedures might also include some acclimatization to systems and procedures in stressful conditions" (Yadav 2020). This is a crucial requirement, since it would produce a workforce of trained, skilled, and permanent workers who are less prone to committing errors. Besides, a trained and permanent workforce will reduce the necessity of hiring contractual employees, who are effectively inexperienced. Contractual employees can be less reliable, particularly in times of nuclear accidents in hazardous installations like nuclear power plants. Their inexperience, and possible ignorance of the local language of the accident site, may seriously impede response mechanisms aimed at controlling damage to the population and environment. Hence, a trained, skilled, and permanent workforce is an important requirement for strengthening the human factor in nuclear power plants.
Pandit Deendayal Petroleum University (PDPU) in Gandhinagar, Gujarat, has a Department of Nuclear Energy with excellent faculty who are training students in relevant areas like nuclear power plant engineering, nuclear thermal hydraulics, and nuclear safety and security. The University has already organized several significant events of relevance.
From May 14 to 16, 2019, PDPU organized a workshop on “Incorporating Both Technical and Human Elements to Reduce Hazards and Vulnerabilities in Industries of National Importance.” The main agenda of the workshop was to analyze the human element as a part of the comprehensive security system and identify practical approaches to evaluating and providing defense in-depth. In
September 2019, PDPU organized a training course titled "Nuclear Security for Scientists, Technicians and Engineers." The purpose of this training course was to support the development of security competence among scientists, technicians, and engineers, who would likely be very familiar with common safety practices but may know less about security practices. Notably, the security professions frequently recruit employees from varied backgrounds, such as the military and the police. Consequently, scientists, technicians, and engineers, on the one hand, and security practitioners, on the other, may have different worldviews, and it is sometimes "challenging to bridge this gap" (Concept Note 2019). It was expected that on completion of the training, participants would improve their competence and begin to consider how their knowledge, skills, and professionalism comprise an important asset that can be harnessed to improve security. The author was a participant in both events and found that the workshops used group discussions, case studies, and exercises as part of the engagement to facilitate an effective exchange of ideas and thoughts. This was extremely beneficial not only for the students but also for the professionals from varied backgrounds like academia, think tanks, industry, and the practitioner community.
In April 2019, the National Institute of Advanced Studies (NIAS) in Bengaluru, in collaboration with Texas A&M University, USA, organized a "Discussion Meeting on Human Reliability Program in Industries of National Importance" at NIAS. The meeting aimed to serve as a platform for discussing ideas and information on the latest developments in the area of human reliability programs in industries of national importance. The deliberations included safety and security case studies and the lessons learned therein. The engagement was expected to identify good practices in safety and security with respect to human reliability programs.
Academic institutions need to emphasize and propagate the idea that nuclear safety and security are national functions that require a national perspective for operational purposes. Establishments like universities have the potential to conduct fruitful exercises in which various institutions engage in bridging the gap between safety and security aspects to develop an interface between these crucial elements within the nuclear industry. Moreover, academic institutions help conduct exchange programs to give students exposure that will enhance their interest in the field. In this regard, Jamia Millia Islamia, New Delhi, conducts a Master's course on Arms Control and Disarmament within the Nelson Mandela Center for Peace and Conflict Resolution. Among various topics, the course focuses on issues like nuclear terrorism to generate awareness about safety and security challenges, including insider threats. The Center has also conducted field trips for students to attend events related to the human element in vulnerable facilities. Additionally, in August 2017, the Center organized a panel discussion on issues related to "Nuclear Competition in the Twenty-First Century" in collaboration with the Stimson Center, Washington, D.C. (Panel Discussion 2017). Academic institutions can also play an important role in engaging the older workforce to deliver and share its expert knowledge in nuclear studies. This might also mitigate the possibility of the older workforce being manipulated by people with malicious intent.
10.8 Global Center for Nuclear Energy Partnership (GCNEP)

The GCNEP is a nuclear center of excellence proposed by India during the Nuclear Security Summit process. The GCNEP "aims to continue strengthening the security of its nuclear power plants and nuclear materials … together with the development of human resources in the field of nuclear energy" (World Nuclear News 2014). The GCNEP is "designed to be a state-of-the-art training centre based upon international collaboration with the IAEA and other interested foreign partners" (Global Centre for Nuclear Energy Partnership 2020). It symbolizes India's "commitment to national and international fraternity to forge global partnership for development of technologies and processes which will promote large-scale yet sustainable, safe and secure exploitation of nuclear energy" (Sinha 2014). The Center houses five schools that conduct research into advanced nuclear energy systems, nuclear security, and radiological safety:

• Advanced Nuclear Energy System Studies;
• Nuclear Security Studies;
• Nuclear Material Characterization Studies;
• Radiological Safety Studies; and
• Studies on Applications of Radioisotopes and Radiation Technologies.
The GCNEP provides a working platform that exhibits India's efforts toward nuclear safety, security, and sophisticated nuclear and radiation technologies. It fosters and encourages research by Indian and visiting international scientists; the training of Indian and international participants; and the hosting of international conferences, workshops, and group discussions by experts on topical issues. This is very important, as it helps in showcasing what India is doing to enhance its nuclear security. It also helps in building a domestic narrative, distinct from Western sources, about India's nuclear safety-security culture. The Center designs and conducts nuclear security courses in collaboration with like-minded countries and the IAEA (Nuclear Security Summit 2014). The GCNEP has a "dedicated Outreach Program Cell that vigorously exhibits India's technological advancement in several areas like the physical protection of nuclear material and nuclear facilities, prevention of and response to radiological threats, nuclear material control and accounting practices, and protective measures against insider threats" (Kazi 2017). At the Nuclear Industry Summit 2016 Expo, the GCNEP and its industry partner Electronics Corporation of India Limited (ECIL) demonstrated India's efforts toward global nuclear security through a "display of programs, technologies, and products in the areas of nuclear security, radiological safety, advanced nuclear energy systems, and safeguards" (GCNEP 2016). The GCNEP is reaching out to a range of target communities using domain-specific training programs, as well as orientation programs for students to promote science among young people.3 The GCNEP actively organizes and coordinates programs that
3
Several children of middle-school standard are regularly invited to specific classes spanning almost a week. The course curriculum pertains to issues of nuclear and radiological security. The objective is to spread awareness about sensitive nuclear materials from a very early stage. The author interacted with various school children attending the courses on nuclear and radiological security held by GCNEP in Anushaktinagar, Mumbai, in 2016.
assist in capacity building in India to create a dedicated cadre of competent professionals to combat nuclear and radiological threats. The GCNEP frequently coordinates training programs and workshops on crucial issues like the physical protection of nuclear material and nuclear facilities, the design and evaluation of physical protection systems for nuclear material and nuclear facilities, preventive and protective measures against insider threats, medical response to radiation incidents, and nuclear forensics. The objective is to enhance the coordination of efforts at the national, sub-regional, regional, and international levels.
10.9 Conclusion

A lot of information about the nuclear industry is wrapped in secrecy for reasons of national interest. Human reliability assessment, or model development, demands relevant information that can be solicited only from experts. Unfortunately, there will be intrinsic biases within the elicited data, even if the reporting is done impartially. To overcome these limitations, it is imperative that the industry open itself up to academia and provide data so that the models built can be validated, at least under some conditions, if not all. A lack of clarity within the public domain only strengthens the belief that there is probably nothing substantial being done to address safety and security concerns in hazardous units. Hence, it is important that some element of transparency is maintained, without compromising national interests, to assure people that effective measures are in place to ensure safety and security in nuclear power plants. To achieve this objective, the onus will lie upon the agencies working in this domain, which "must ensure the right and clear information reaches the stakeholders" (Yadav 2020). Stating his concerns, Prof. N. K. Joshi, former Professor and Head of the Department of Nuclear Science and Technology at Mody University of Science and Technology, Lakshmangarh (Sikar), and former Senior Scientist at BARC, opines:
Probabilistic safety assessment (PSA) is used to visualize the risk level of nuclear power plants, and human reliability analysis provides human error probabilities for safety-critical tasks to support PSA. Human integrity analysis may play a major role in nuclear security issues. The integration of all these assessment techniques has to be made for the nuclear industry. The control rooms in advanced nuclear power plants are changing from analog to digital control systems. Thus the operation monitoring model is changing from knowledge-driven to data-driven, and this requires new analysis models to properly assess the impact and risk of the digitization of nuclear power plants (Joshi 2020).
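As a minimal sketch of the PSA–HRA integration Joshi refers to, the fragment below shows how a human error probability produced by HRA can enter a fault-tree calculation alongside hardware failure probabilities. The event names and all numerical values are hypothetical, and the independence assumption glosses over the dependency modeling that real PSA tools perform.

```python
# Minimal fault-tree fragment showing how an HEP from HRA enters a PSA.
# All event names and probabilities are hypothetical illustration values.

p_pump_fails = 1e-3      # hardware basic event: standby pump fails to start
p_valve_stuck = 5e-4     # hardware basic event: relief valve stuck closed
p_operator_error = 4e-2  # HEP from HRA: operator fails the manual backup action

def and_gate(*probs):
    """All input events must occur (independence assumed)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(*probs):
    """At least one input event occurs (independence assumed)."""
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur

# Top event: cooling is lost if the pump fails AND the operator fails
# the backup action, OR the relief path is unavailable.
p_top = or_gate(and_gate(p_pump_fails, p_operator_error), p_valve_stuck)
print(f"P(top event) = {p_top:.2e}")  # -> 5.40e-04
```

Because the operator's HEP multiplies directly into the top-event probability, any improvement that human reliability programs achieve in that single number propagates through the whole risk model.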
Joshi strongly believes that "the success of the nuclear industry requires skilled and qualified manpower. Besides DAE and its allied institutions, at present only 4/5 universities are engaged in human resource development for nuclear science and technology in India. A course related to risk analysis of nuclear facilities should be
part of the curriculum to familiarize the students with concepts such as deterministic safety analysis, probabilistic safety assessment and human reliability analysis" (Joshi 2020).
Human reliability is a major concern in issues related to nuclear safety and security. Nuclear safety is ensured by human actions that are consistent, appropriate, and correct. Nuclear security relies heavily on the trustworthiness, honesty, and integrity of individuals not to perform malicious acts. Human errors and misconduct lead to accidents, incidents, and sabotage in nuclear facilities and thus must be minimized or prevented.
References

About Us, Amity Institute of Nuclear Science and Technology, Amity University. [Online]. http://ainst.amity.edu/. Accessed 12 July 2020
Aided Institutions and Other Organizations of DAE (2020) Department of Atomic Energy, Government of India. [Online]. https://dae.gov.in/node/81. Accessed 12 July 2020
Amano Y (2015) The Fukushima Daiichi accident: report by the director general. [Online]. https://www-pub.iaea.org/mtcd/publications/pdf/pub1710-reportbythedg-web.pdf. Accessed 13 July 2020
Baley-Downs S (1986) Human reliability program: components and effects. In: 27th annual meeting of the Institute of Nuclear Materials Management, New Orleans, Louisiana
BBC News (2014) US nuclear launch officers suspended for 'cheating'. [Online]. https://www.bbc.com/news/world-us-canada-25753040. Accessed 14 July 2020
BBC News (2017) Germanwings crash: what happened in the final 30 minutes. [Online]. https://www.bbc.com/news/world-europe-32072218. Accessed 12 July 2020
Bhattacharjee S (2020) Visakhapatnam gas leak claims 11 lives; over 350 hospitalised. [Online]. https://www.thehindu.com/news/cities/Visakhapatnam/visakhapatnam-chemical-plant-gas-leak-claims-several-lives-scores-hospitalised/article31523456.ece. Accessed 13 July 2020
Cantelon PL, Williams RC (1980) Crisis contained: the Department of Energy at Three Mile Island, a history. US Department of Energy, Washington, D.C.
Collins S, McCombie S (2012) Stuxnet: the emergence of a new cyber weapon and its implications. J Polic Intell Count Terror 7(1):80–91
Concept Note (2019) Training course on nuclear security for scientists, technicians and engineers, "Training Course on Nuclear Safety & Security" held at Pandit Deendayal Petroleum University (PDPU), Gandhinagar, India
'Confessions' of Group Planning Jordan Chemical Attack (2004) BBC monitoring international reports
Country Reports on Terrorism (2020) US State Department, December 2021. https://www.state.gov/wp-content/uploads/2021/07/Country_Reports_on_Terrorism_2020.pdf, p 3. Accessed 9 August 2023
Das D (2019) An Indian nuclear power plant suffered a cyberattack. Here's what you need to know. The Washington Post
Dasgupta S, Gupta A (2011) A nuclear accident anywhere is an accident everywhere: ElBaradei. [Online]. https://www.businesstoday.in/specials/india-today-conclave-2011/story/mohamed-elbaradei-japan-nuclear-egypt-iaea-18513-2011-03-29. Accessed 12 July 2020
Dhillon B (2019) Safety, reliability, human factors, and human error in nuclear power plants. CRC Press, Boca Raton, Florida
Distinctive Characteristics of the Institute, Homi Bhabha National Institute. [Online]. http://www.hbni.ac.in/about/hbnidc.html. Accessed 12 July 2020
van Erp JB (2002) Safety culture and the accident at Three Mile Island. [Online]. https://inis.iaea.org/collection/NCLCollectionStore/_Public/34/007/34007188.pdf?r=1&r=1. Accessed 13 July 2020
GCNEP (2016) Nuclear industry summit 2016 Expo. In: Nuclear industry summit 2016, Washington, DC
Global Centre for Nuclear Energy Partnership, About GCNEP. [Online]. http://www.gcnep.gov.in/about/about.html. Accessed 13 July 2020
Gorka S, Sullivan R (2004) Jordanian counterterrorist unit thwarts chemical bomb attack. Jane's Intell Rev 10(1)
Guenther R, Lowenthal M, Nagappa R, Mancheri N (2013) India-United States cooperation on global security: summary of a workshop on technical aspects of civilian nuclear materials security. [Online]. https://www.nap.edu/catalog/18412/india-united-states-cooperation-on-global-security-summary-of-a
Higgins J, Weaver P, Fitch J, Johnson B, Pearl R (2013) Implementation of a personnel reliability program as a facilitator of biosafety and biosecurity culture in BSL-3 and BSL-4 laboratories. Biosecur Bioterror Biodefense Strat Pract Sci 11(2):130–137
Joshi NK (2020) Interviewee, email interview. [Interview]
Kazi R (2013) Nuclear terrorism: the new terror of the 21st century. Institute for Defence Studies and Analyses, New Delhi, India
Kazi R (2017) Global centre for nuclear energy partnership: India's contribution to strengthening nuclear security. Strat Anal 41(2):190–196
Meshkati N (1991) Human factors in large-scale technological systems' accidents: Three Mile Island, Bhopal, Chernobyl. Ind Cris Q 5(2):133–154
Meshkati N (2007) Lessons of the Chernobyl nuclear accident for sustainable energy generation: creation of the safety culture in nuclear power plants around the world. Energy Sources Part A Recover Util Environ Eff 29(9):807–815
Mishra S (2020) Interviewee, email interview. [Interview]
Moray NP, Huey B (1988) Human factors research and nuclear safety. National Academy Press, Washington, D.C.
Munipov V (1992) Chernobyl operators: criminals or victims? Appl Ergon 23(5):337–342
National Intelligence Council (2004) Mapping the global future: report on the National Intelligence Council's 2020 project. [Online]. https://www.dni.gov/files/documents/Global%20Trends_Mapping%20the%20Global%20Future%202020%20Project.pdf. Accessed 14 July 2020
National Research Council (2009) Committee on laboratory security and personnel reliability assurance systems for laboratories conducting research on biological select agents and toxins, issues related to personnel reliability. In: Responsible research with biological select agents and toxins. The National Academies Press, Washington, D.C., pp 91–121
Nuclear Power Corporation of India Limited (2019) Press release. [Online]. https://akm-img-a-in.tosshub.com/indiatoday/images/mediamanager/kudankulam-plant-malware-press-release.JPEG?K._..VyDzZMcsWi2pa8tifAzW9h79RVD?imbypass=true. Accessed 12 July 2020
Nuclear Security Summit 2014: National Progress Report India. [Online]. https://www.nss2014.com/sites/default/files/documents/india.pdf. Accessed 13 July 2020
Panel Discussion (2017) Panel discussion on nuclear competition in the 21st century organised by the Nelson Mandela Centre for Peace and Conflict Resolution, Washington, DC
Philippart M (2018) Human reliability analysis methods and tools. In: Space safety and human performance. Butterworth-Heinemann, Oxford, UK, pp 501–568
Prasad S (2020) Six workers killed, 17 injured in boiler blast at thermal power plant in Tamil Nadu's Neyveli. [Online]. https://www.thehindu.com/news/national/tamil-nadu/boiler-blast-innlcil-in-neyveli/article31960462.ece. Accessed 13 July 2020
Sanger DE (2012) Obama order sped up wave of cyberattacks against Iran. The New York Times
Sinha R (2014) Message by chairman AEC and secretary DAE, Global Centre for Nuclear Energy Partnership Annual Report 2013–14. [Online]. http://www.gcnep.gov.in/downloads/GCNEP%20AR%202013-14.pdf. Accessed 13 July 2020
Smith PJ (2008) Terrorism in the year 2020: examining the ideational, functional and geopolitical trends that will shape terrorism in the twenty-first century. Dyn Asymmetric Confl 1(1):48–65
Sutton I (2014) Formal safety analysis. In: Offshore safety management: implementing a SEMS program. William Andrew, Oxford, UK, pp 267–317
The International Atomic Energy Agency (IAEA) (2020) IAEA Incident and Trafficking Database (ITDB): incidents of nuclear and other radioactive material out of regulatory control, 2020 Fact Sheet. [Online]. https://www.iaea.org/sites/default/files/20/02/itdb-factsheet2020.pdf. Accessed 14 July 2020
The International Atomic Energy Agency (IAEA) (1986) Summary report on the post-accident review meeting on the Chernobyl accident. International Atomic Energy Agency (IAEA), Vienna, Austria
World Nuclear News (2014) Indian research centre takes shape. [Online]. https://www.world-nuclear-news.org/NN-Indian-research-centre-takes-shape-0301144.html. Accessed 13 July 2020
Yadav A (2020) Interviewee, email interview. [Interview]
Chapter 11
Steps Toward a Human Reliability Engagement with a Grounding in Human Factors
Vivek Kant and Sanjram Premjit Khanganba
11.1 Introduction

The aim of this chapter is to present avenues for human reliability engagement with a strong grounding in the discipline of human factors. Whenever we introduce the research area of human factors in casual conversation, we often receive a wide range of comments. On the one hand, people tell us that they did not know that one could study these subjects at a university level; on the other hand, we are often told, with a bemused and condescending expression, that those aspects are obvious and we must be intellectually inferior in research to be studying this topic! In the wide range of replies, a few merit close attention. Engineers based in academia relate "human factors" to aspects such as memory and cognition. Reciprocally, engineers based in industry nonchalantly start by relating human factors to "characteristics" such as honesty and dishonesty. Mid-level managers in industry immediately connect human factors to reliability on the job, motivation, and other "human characteristics," such as age, demographics, and other socioeconomic factors. The only set of professionals in India who quickly recognize our field of study as a discipline in its own right are members of the research community in safety-critical sectors such as nuclear, chemical, and aviation, among others. These safety-critical researchers often recognize that "human factors" is a discipline unto itself and fits in with other branches of engineering wherever humans are involved in technical systems.
V. Kant (B) Department of Design, Indian Institute of Technology Kanpur, Kalyanpur, India e-mail: [email protected] S. P. Khanganba Human Factors and Applied Cognition Lab, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. S. Chirayath and M. Sai Baba (eds.), Human Reliability Programs in Industries of National Importance for Safety and Security, Risk, Reliability and Safety Engineering, https://doi.org/10.1007/978-981-99-5005-8_11
The above short narrative about our experiences was meant to highlight that a lacuna exists in the current understanding of "human factors" as a discipline and of its applicability to support industries of national importance. Setting out on a journey to understand this lacuna in the current Indian scenario presents an interesting quandary. Nevertheless, those who have identified their research domains in the field see it as the beginning of a growth within Indian academia driven by a research-practitioner model of addressing real-world challenges. The issue of human reliability engagement and its strong grounding in human factors itself highlights this growth and the crucial viewpoint of a holistic approach that we, as authors of this chapter, consider should become the norm. As large-scale technology-based transformation knocks on the doorsteps of industrial establishments, we believe that stakeholders should endeavor to trigger a movement of human-centric systems approaches in human reliability engagement and beyond.
Toward this end, the chapter proposes that human reliability must be comprehended in terms of a broader understanding of the role of the human; furthermore, this basis of the human reliability approach rests very strongly on human factors as a discipline. As a result, the rest of this chapter is aimed at developing the concept of "human factors" in detail. Section 11.2 highlights Human Factors (HF) as a discipline; Sect. 11.3 highlights its current manifestations in India. Section 11.4 presents the need for addressing human reliability as a conjoined phenomenon in systemic terms with a basis in HF. Section 11.5 presents the need to incorporate a proper understanding of human factors in the government, industry, and citizens. Section 11.6 concludes with a way forward to systematically incorporate HF-based thinking and understanding in a variety of areas.
11.2 Human Factors as a Discipline

Over the past century, the study of human factors has become a formal discipline in industrialized countries on both sides of the Atlantic. The notion of incorporating the human with various technologies, of designing for the human, has existed since time immemorial, ranging from architectural design to human-powered tool design. However, a major change that brought human factors to the fore over the last two centuries was industrialization. With the growth and development of new modes of industrial power in the nineteenth century (viz., steam and electrical), there was a growth in the factory system and a concomitant shift in the conceptualization of the human and the machine. A major challenge that came about was how to incorporate the worker as part of the industrial setup. This question of worker involvement went through multiple tumultuous engagements that had an enormous impact on how workers were treated in capitalistic milieux. While India suffered suppression by colonial forces for a long stretch of time, missing out on industrial growth, one major trend at the beginning of the twentieth century was the growth of scientific management (Haber 1964; Maier 1970).
Scientific management (or Taylorism) was put forward by Frederick W. Taylor to improve the efficiency of factory production. The quest for efficiency was also championed by an American couple, the Gilbreths, who played an important role in decreasing "waste" motions in work activities with their time-and-motion studies. With the growth of the factory system, there was a growth in the ways in which workers could be engaged more holistically. This led to broader methods deriving from industrial psychology and sociology. Meanwhile, psychology was also growing quite rapidly as a discipline in America. Psychologists were getting involved in a variety of settings, ranging from worker management to the study of workers and their interactions with technological equipment (Capshew 1999). Similarly, there was a growth of various studies of the human in the context of work in Europe, especially in industrializing nations such as Germany (Kuorinka 2000). Among the many approaches that came forth in the context of industrialization and development in the early twentieth century, a few theoretical and methodological approaches, such as Activity Theory, still provide human factors researchers with renewed insights into present-day technological challenges (Norros 2004).
World War II provided a major impetus to human factors as a discipline in America (Hughes 2004). During the war, a number of academics interacted with the military. This engagement continued even after the war due to the growth of the military-industrial-academic complex (Wolfe 2012). Within this engagement, the military would present problems that academics could work on and industry could implement for national importance. Thus, in America, a huge emphasis in human factors was placed on the military. From this military base, the emphasis later expanded toward industrial and consumer product-based human factors in the 1960s and 1970s. In the 1980s, there was a proliferation of computers as a technology and hence a growth in computer-based human factors (Helander 1997). In the 1990s, the fields of cognitive and organizational human factors came into prominence, with a specific focus on developing human factors for complex technological systems and on relating them to broader issues of safety (Corrigan et al. 2019; Vicente 2006). In the new millennium, the focus has broadened to technologies related to the NBIC convergence (Nanotechnology, Biotechnology, Information Technology, and Cognitive Science-related technologies) (Karwowski 2005, 2006, 2008; Kolbachev and Kolbacheva 2019).
As this short history shows, human factors as a discipline has grown beyond the focus on any one type of technology. It aims at human–technology interaction as a whole, with a focus on developing theories and frameworks for the safety, productivity, and well-being of people. Currently, the whole scope of human factors as a discipline can be divided into three broad areas relating to mind-based, body-based, and context-based aspects of human factors. Under this broad banner, there are a number of avenues where research is actively pursued. While there are multiple foci, human factors as a discipline recognizes the following core features as its hallmark: system-orientedness, analysis, design, safety, productivity, and well-being (Dul et al. 2012). In addition, human factors professionals have an impact in multiple ways.
These include the design of safe and usable products, process-control interfaces, workspace layout, and material handling systems. Along with the design aspects, the consideration of human factors also caters to training, jobs, and personnel selection and placement. Finally, occupational health and safety play a major role in human factors. All these aspects are undergirded by a strong emphasis on the study of human knowing and acting in a variety of contexts. In other words, the mind, body, behavior, and context are taken together for human-centered systems design. Such a broad basis of human factors is important for developing human reliability engagement in industries (Fig. 11.1).

Fig. 11.1 Key ideas in human factors as a discipline: system-orientedness, analysis, design, safety, productivity, and well-being

In order to understand the full scope of human factors as a discipline, we will briefly review examples from different technology areas. Specifically, we will look at the technical groups of a premier professional organization, the Human Factors and Ergonomics Society1 (HFES). Currently, HFES has 24 technical groups. A brief description of these various technical groups will reveal the breadth of human factors as a discipline.

The first technical group of HFES is Aerospace Systems. In this group, the emphasis is on the application of HFE in the aerospace sector. This involves design and maintenance to support an HFE focus in both civilian and military systems. For example, designers may be involved in the design of aircraft cockpits, or in the design of seating and floor space. Signage design is another challenge for HFE designers, such that it supports movement in normal conditions as well as during emergency evacuations.
1 More details about the technical groups can be found at www.hfes.org.
A second group is that of Aging. In this technical group, the emphasis is on meeting the needs of the older segment of the population in a variety of settings. This includes issues related to physical mobility and agility, as well as the prevention of falls and injuries. In addition, it addresses support for older people in terms of cognitive capabilities when suffering from conditions such as dementia. A major issue for geriatric HFE is dealing with the overall well-being of people in this upper segment of the demographics and supporting them such that their quality of life is improved.

A third group is that of Augmented Cognition. In this group, the main emphasis is on acquiring real-time data from physiological and neurological sources while the human is interacting with computing systems. These data are then used to develop applications that aid the human. The essential aspect of augmented cognition is to support the human by orchestrating a closed loop consisting of the human and the computer. The functioning of this closed loop rests on real-time monitoring, which enables the adaptive capabilities of the human-computer system in response to a rapidly changing world as well as the dynamically changing cognitive states of the human. In this manner, individual humans are catered to personally through an on-the-go, dynamically self-adapting technology. Augmented cognition has applications in areas where individual performance encounters bottlenecks, such as attention allocation in times of information explosion. Imagine a pair of glasses that would help Indian soldiers moving in enemy territory to interact more effectively with their environment by providing added information about places to hide and places to replenish resources, among many other capabilities, supported in real time based on the cognitive states of the individual.

A fourth technical group of HFES is the Cognitive Engineering and Decision-Making group. The focus of this group is to comprehend decision-making in complex technological systems. This includes both descriptive and prescriptive models, as well as formative work-analysis approaches for understanding how work is actually conducted in these complex setups. Studies in this sector involve operators in control rooms, or even workers involved in fieldwork in complex interactive systems. Researchers and practitioners in this field are often involved in designing human-machine interaction in terms of the control panels of safety-critical systems such as nuclear power plants and oil and gas plants, among others. The design of these control panels should be such that they support the operators' reasoning and mental strategies for problem-solving.

A fifth technical group focuses on all aspects of human-to-human communications. These aspects include computer-mediated communications, in which people use computers to communicate in a variety of settings and work contexts. For example, if you were a customer representative in a large technology firm and your job was to help clients solve their computer problems, how could this task be supported to make the communications more effective, productive, and satisfying for the customer? The focus of this group is on the whole range of communications, including information services as well as other telecommunication and broadband applications in sectors ranging from tele-education to telemedicine.
The sixth technical group of HFES deals with the human factors involved in the design of computer systems. These include user-centered design of hardware and software as well as the supporting documentation and work activities. The members of this group also take into account challenges in the design, evaluation, and deployment of information technologies, which may include products and services in this sector. Oftentimes, they emphasize a systems approach toward introducing new information technologies into organizations and new work sectors. These computers should be usable, efficacious, and supportive of the productivity of IT workers. In turn, if employed for leisure purposes, computers should be desirable, fun, and helpful in improving the overall quality of life of their users.

Interestingly, HFES also has a technical group devoted to the Internet. This group addresses the human factors associated with the web, including various kinds of styles and standards, as well as web applications. A major thrust area is accessibility and making the web usable by people who may be sensorially disadvantaged. For example, how would you design a webpage for people who suffer from color blindness? In addition, this group addresses challenges related to the psychological and sociological aspects of being on the internet. Have you ever been on an instant messaging application and had discussions with online friends? How do you communicate with friends? How do emojis become a powerful means of communication along with the text? These and many other aspects of communication within distributed networks are common topics of discussion in the technical group on the internet.

Moving away from computers, Product Design forms another technical group. In this group, the focus is solely on consumer products and the ways in which consumers can be addressed through user research, the human factors approach, and industrial design. This background knowledge of the users is to be leveraged to support product design and development, making products usable, desirable, and safe in usage.

A special technical group focuses on HFE Education. This group addresses the educational needs of upcoming HFE professionals, undergraduate and graduate curricula, accreditation, and resources. Further, continuing education for HFE practitioners oftentimes becomes a key focus for this technical group. Another emphasis of this group is on developing tools and techniques to support education for the next generation of HFE professionals.

The tenth technical group is Environmental Design. This group tries to understand how the environment can be designed to be conducive to desired human behavior. Can we design footpaths in India that will enable pedestrians to walk more rather than relying on vehicles? Can we design the interiors of airports to support quick and safe evacuation during emergencies? HFE practitioners in this area oftentimes deal with the built environment. Would it surprise you to learn that crime can be prevented by proper design and planning of the built environment? These and other questions fall under the purview of environmental design.

The eleventh technical group deals with forensics and is titled Forensics Professionals. Oftentimes, HFE professionals can be involved in the pursuit
of accountability as well as in establishing appropriate "standards of care" in regulatory and judicial cases involving products, systems, or services. In these cases, HFE professionals apply expertise based on their knowledge of theory and practice to issues being addressed in legal circumstances.

The twelfth technical group deals with healthcare. Under the banner of healthcare, a diverse set of HFE professionals are involved, including medical professionals and occupational health workers, among many others. In this sector, many designers are involved who are interested in the design of complex medical equipment for hospital staff, equipment which supports total hospital systems and comprises integrated sets of machines such as MRI and CAT scanners, among many others. Therefore, healthcare as a group is broad and involves a number of different roles for HFE professionals.

The thirteenth technical group has a rather interesting focus: individual differences in human performance. We know that people may perform similarly, but they are not the same; i.e., differences exist between individuals and their performance. When we design experiences for particular people, we must take these idiosyncrasies of behavior into account. Thus, the focus of this group is quite helpful in those circumstances.

Safety is another important technical group in HFE. This group's main emphasis is on using HFE principles to ensure safety for all segments of the population, as well as for products, services, and systems. Safety involves not only the design of products but, in large-scale organizations, also the design of organizational elements. Successful and proactive prevention of accidents means that safety has to be addressed not only in terms of personal protective equipment for individual safety, but also as systems safety in terms of heterogeneous means of interlocks that prevent overall damage.

The fifteenth technical group focuses on systems development. Systems development issues related to HFE integration can occur many times, and iteratively, throughout the development process. HFE in systems development also involves a number of tools for assessment and prediction as well as for specifying user requirements. In addition, there is the need for creating principles and tools for human-machine interactions and for addressing the effects of new technology integration, among many other challenges associated with systems development. The main emphasis of this technical group is on total human factors integration at all stages of systems development.

While the groups discussed so far largely involve adults, a number of HFE professionals work in the area of children's issues. Imagine, as a designer, you were to design a school bag such that the weight on the back of the child is optimally distributed. How would you go about this problem while designing for the child? The effective aim of this group is to address all issues surrounding the design and incorporation of HFE for children.

The seventeenth technical group addresses surface transportation challenges. The main aim of this group is to address all aspects of HFE for transportation, including passenger rail, mass transit, pedestrian, and bicycle traffic in any form, as well as traffic services and the design of intelligent transport systems.
Along with the focus on transportation, a key area of HFE also addresses issues related to training. For example, if you were employed in the defense sector and were asked to train soldiers for battle-readiness, how would you, in conjunction with the military instructor, devise a training program that takes into account the capabilities and limitations of the new recruits and at the same time sets them on a path toward becoming experts? Such issues, as well as other general issues related to training, are addressed by HFE professionals in this sector.

The next technical group is that of test and evaluation. This group's primary emphasis is on all aspects of testing and evaluation of diverse kinds of products, services, and systems from an HFE perspective. Its professionals may study the challenges associated with evaluating system performance as well as testing various parts and subsystems. Oftentimes, this group devises metrics to evaluate system performance in both normal and adverse conditions.

The twentieth and twenty-first technical groups are the HFES Perception and Performance group and the Human Performance Modeling group. The perception and performance group focuses on the links between human perception and the related performance in a variety of situations and under different sets of conditions. In addition, this group explores systemic performance as well as human-system interaction. Its main emphasis is on intelligence, autonomy, and sense modalities in systems design. Using a multi-modal approach to human knowing and acting, this group tries to optimize human-system interaction. The Human Performance Modeling technical group focuses on human abilities and tries to develop models for them. In this realm, models can be built for perception, attention, memory, and action, as well as for the cognitive control of these various phenomena. The idea is that these models can be used to comprehend how humans use task-related information to produce certain actions. Therefore, the emphasis is on formulating and adapting new models as well as assessing the success of these models' predictions.

Along with the individual, HFE caters to a number of large-scale, extra-personal issues through the areas of Macroergonomics and Occupational Ergonomics. The emphasis of Occupational Ergonomics is to address workplace issues, including injuries. Challenges arising from occupations, such as falls and slips, are often addressed by this group. The emphasis is on designing the task such that musculoskeletal disorders are averted. Alongside occupational ergonomics' focus on musculoskeletal disorders, macroergonomics adopts a macro point of view on HFE. This group takes an organizational perspective and addresses issues such as jobs and work that are typically not addressed by groups focusing on individuals. The main focus of this group is to take into account all aspects of the psychosocial and cultural factors related to the interaction between humans and technology. Therefore, the scope of this group covers people, jobs, workstations, organizations, and management-related systems. In this
group, the emphasis is on industry and on challenges related to the improvement of safety, productivity, and overall quality. This includes the activities of workers on the assembly line in an automobile plant as well as of those handling dangerous chemicals. All these various facets of industrial work are addressed by this technical group. Thus, this group aims at the total improvement of the human and technology in terms of productivity, quality, health, and safety, along with the quality of workers' lives. In short, the focus of macroergonomics is not just the individual worker and technology but the overall work system.

Finally, the twenty-fourth and last technical group addresses virtual environments and all their applications. In this group, the aim is to address HFE challenges in using virtual environments or augmented reality applications. Imagine that you were to build a simulator to train young doctors in surgery. How would you design such a virtual training environment? Or, better still, imagine you are playing a video game in virtual reality with your friends. The virtual environment group focuses on the use of HFE theory, principles, and data to design vocational and recreational environments that support human performance as well as safety and leisure.

All of these technical groups demonstrate the breadth of human factors as a discipline globally. More importantly, they suggest that a strong base in human factors can be used to develop a human reliability program. In the next subsection, we situate human factors in the Indian context by showcasing the state of the art in India.
11.3 What is the Current State of the Art in India?

Up to this point, the breadth of human factors as a discipline has been introduced at the global level. Here, we introduce Indian efforts in this sector and their associated linkages. Before we begin, the term ergonomics has to be introduced. This term was not introduced until now because of its existing connotation in the Indian context. In most cases, whenever I mention the term casually in conversation, people use it to refer to products that are well designed. Often, people give examples of products such as chairs or the seat design of cars; in other words, predominantly body-based examples. Even though the broader dimensions of ergonomics, such as the cognitive, physical, and organizational, are present globally, this wider view of ergonomics is not manifested in India currently. Ergonomics in India has primarily remained body-based (physical, physiological, and/or biomechanical). Apart from this reason, there is another reason for not introducing ergonomics earlier. Notably, some researchers argue that the scope of human factors cannot be equated with the scope of ergonomics. In other words, human factors and ergonomics are not mutually interchangeable. Not surprisingly, the professional body HFES, introduced earlier, keeps human factors together with ergonomics without interchanging them (viz., the Human Factors and Ergonomics Society).

In India, there has been a prominent emphasis on physiological and physical ergonomics. Starting from the 1950s at the University of Calcutta (Growth of Ergonomics in India, https://www.ise.org.in/fountainhead.shtml), these efforts were
initiated at different academic institutions dealing with human labor as well as design. Currently, physiological ergonomics is well-entrenched in Indian academic institutions dealing with human labor and design. In fact, it is so well-entrenched that, in public perception, physical and physiological ergonomics have become a stand-in for the discipline of ergonomics as a whole. This could be attributed to the slow growth of offshoots of the psychological sciences in India due to a predominant orientation toward clinical practice (closely associated with the field of psychiatry). This state of affairs has persisted irrespective of the various formal subfields flourishing in America and Europe. Regardless of their individual approaches, these subfields subscribe to a common disciplinary approach that is currently not completely represented in India (Dul et al. 2012).

A second stream of activity that is closely aligned with human factors is that of human reliability and safety. With the advent of large-scale technological sectors, such as the nuclear and chemical industries, among others, there is a need to understand the role of human errors and their mitigation. In these areas, researchers use a variety of quantitative models, such as probabilistic risk assessment, to determine the role of the human in technological setups. Along with safety-critical industries, defense is a mission-critical sector that recognizes the role of human factors and instantiates it through a physiological-ergonomics-based setup.

The above provides a few broad sketches of how human factors, and their closely aligned partner ergonomics, have been established in India and how they appear in their current form. The next section highlights the need to address human reliability on the broader basis of human factors and a systems approach; in other words, to address the human as a systems component and to recognize the system as a broader construct.
11.4 Human Reliability as a Sociotechnical Problem

Over the centuries, the skills and knowledge of humankind have evolved to meet changing requirements; this is reflected in how new tools and techniques materialize to fulfill various utilities. In this process, much of what we once needed tends to become obsolete and subsequently disappears (lost or changed) from the day-to-day repertoire of modern life. This inevitable process (sometimes an uncertain shift) is often mediated by technological advancement, and computers have been the most transformative advancement in this history as we move toward a technology-based society. Science and technology have reached a level where we seek to create (machine) technologies capable of thinking and making choices independent of human operations (i.e., automation), as well as technologies that seamlessly adapt to their users rather than requiring users to adapt to the systems. When we deploy these technologies, it is highly probable that we will see an even more rapid change in human knowledge, skills, and the way we envision ourselves as capable of moving further in technological advancement.

In light of the basic orientation of human factors and their breadth as a discipline discussed above, this section emphasizes the need to understand the ways in which
human factors can be used as a basis for a unified human reliability engagement. Specifically, if a program for human reliability is to be initiated, what aspects should it really consider? As we encountered earlier, colloquially, "human factors" is taken to mean factors of the human, such as motivation, skills, and willingness, among other human attributes. As a result, any failure in the system was blamed on the human and termed "human error." In the past two decades, significant research has been conducted to unpack the notion of "human error" and to recognize that the human is not always to blame. Thus, the strategy is to ask how human factors play out in the technological context rather than how human factors belong to the human.

Human factors researchers have highlighted that, to adequately address the human in technological setups, there is a need for "joint optimization" of the human and machine in context. In other words, the human and the immediate technological setup are encapsulated together as the basic unit of analysis. This ensures that the "factors" can only be understood in terms of the immediate operational environment, which, in turn, is constrained by supra-human factors, i.e., organizational and other related variables. With the growth of large-scale systems and the rapid convergence of technologies, the need for a consolidated view of the human is becoming apparent. Further, human factors for national industries should recognize that humans are not just individuals but also parts of teams and organizations. Therefore, any consideration of the human should span the entire gamut of human factors approaches, ranging from the physical and cognitive basis to the large-scale macro basis.

Figure 11.2 depicts a generalized view of any technological sector in terms of multiple layers of abstraction. At the lower levels, which are more concrete, the engineering and technical dimensions of these sectors are more prominent. As we go higher up the levels, the social dimension becomes more prominent. At those higher levels, organizational and governmental constraints are more visible and important. These higher levels act on lower levels by providing constraints on what is acceptable. Therefore, if a particular organization favors transparency and the well-being of its employees, then the engineering-based levels will have designs that properly accommodate these values.

Further, at its central core, there is a need to recognize the human and the immediate operational environment as a central construct that is delimited by large-scale constraints (Fig. 11.3). Therefore, as Fig. 11.3 demonstrates, there is a need to recognize that human factors depend on a variety of issues, ranging from government policies to management rules. In complex systems, these must be taken together in a unified manner. Therefore, any program addressing human reliability and security on the basis of human factors will have to take humans into account in a multifaceted manner. This involves considering humans as individuals, teams, and organizations at a variety of levels and in their various engagements. Thus, developing a human reliability engagement would mean appropriate design at the lower levels, management practices at the intermediate levels, and policies at the higher levels. Conducting interventions at any one level will not be enough for total human reliability engagement. At the same time, the interventions should ensure that the
Fig. 11.2 Human factors in terms of multiple layers of engagements (Kant 2016)
interventions at the various levels are mutually integrated. Oftentimes, this requires a change management program in the industry to ensure successful human factors integration across all levels depicted in Figs. 11.2 and 11.3.
11.5 Discussion and Conclusion: The Way Forward

The main aim of this chapter was to provide directions for holistically engaging human factors to support a human reliability program for industries of national importance. Toward this end, the chapter started with the breadth and scope of human factors as a discipline. Based on this coverage, it highlighted the need to address technological sectors as sociotechnical constructs. In other words, humans need to be involved in different ways at multiple levels of abstraction, ranging from individuals to teams and organizations. The sociotechnical angle is necessary to recognize that people are involved in different ways in these technological sectors and that these various instantiations should be cogently addressed. Thus, in order to build a comprehensive program for human reliability, technical sectors have to be treated as sociotechnical sectors with a strong foundation in human factors as a discipline. Based on the current scenario in India, there is a need for five interrelated directions for developing human factors:
Fig. 11.3 Basic unit of performance with immediate operational environment and long-term constraints (Kyriakidis et al. 2018)
(1) Need for awareness-building: beyond common-sense notions of human reliability and human factors.

• As mentioned earlier, in one approach, human reliability is limited to common-sense notions of a person being reliable or unreliable. However, the notion of human reliability is developed in a more mature manner in the areas of Human Factors, Human Systems Engineering, and Safety Science, among other disciplines that address human reliability as an important construct. Thus, there is a genuine need for building awareness of the current state of the art in human reliability across all industrial sectors deemed nationally important.
• Further, there are a few industry verticals, such as nuclear and chemical, that employ human factors and safety concepts and existing state-of-the-art insights. However, these are largely confined to their individual sectors, and there is a need for further awareness-building for cross-sectoral engagement.

(2) Need for capacity building: undergraduate and master's programs in Human Factors, Human Reliability, and Systems Engineering.

• There is a need to develop undergraduate and master's programs in human factors and human reliability that draw upon both engineering and the cognitive/behavioral/social sciences.
• Courses in reliability, safety, quality, and security should be introduced as core subjects in all branches of engineering that cater to the heavy industry sector.

(3) Interdisciplinary research and focus groups:

• There is a need for a problem-driven approach involving interdisciplinary groups from engineering and management as well as the cognitive/behavioral/social sciences. Currently, there is a lack of such focused groups in academia that specifically address problems deriving from industrial practice in human factors, in its mature form, as well as in human reliability.

(4) Moving beyond training: towards successful design, support, and integration of equipment, teams, and organizations.

• Simply training personnel will not ensure human factors compatibility. Design is equally, if not more, important than training for achieving safer outcomes. Currently, the notions of the design of equipment, teams, and organizations have not been brought together coherently to form a unified view of human factors and human reliability in industry. To support human reliability programs, a conjoined understanding of meaningful design, maintenance, testing, and integration is required. This conjoined understanding requires a phased intervention in industries and academia.

(5) The need for a conjoined government-academia-industry interaction.

• Finally, there is a need for a conjoined approach of government, industry, and academia for developing human reliability through adequate public policies that support a comprehensive program of design and training at all levels of the sociotechnical system.

The above are a few directions that may help build avenues for a human reliability engagement program with a strong grounding in the discipline of human factors. One may conclude that in India there is a need to develop the next generation of human factors professionals (involving various stakeholders collectively) who will provide leadership in various roles (in industry, academia, and government policymaking). The fundamental understanding regarding the essence of the field is that wherever humans work, that area warrants the application of human factors knowledge. In case of any mishap, the search is often for those closest to the accident. Attributing accidents to the actions of front-line operators is not only an oversimplification but also self-defeating and counterproductive. In reality, complex systems are characterized by various sub-components involving interactions between people, technologies, and their environment. In order to understand more about interacting with technology and supporting safety, well-being, and productivity, there is a need for human reliability programs based on a strong grounding in human factors.
References

Capshew J (1999) Psychologists on the march: science, practice, and professional identity in America, 1929–1969. Cambridge University Press, Cambridge, UK
Corrigan S, Kay A, Ryan M, Ward ME, Brazil B (2019) Human factors and safety culture: challenges and opportunities for the port environment. Saf Sci 119:252–265
Dul J, Bruder R, Buckle P, Carayon P, Falzon P, Marras W, Wilson J, van der Doelen B (2012) A strategy for human factors/ergonomics: developing the discipline and profession. Ergonomics 55(4):377–395
Haber S (1964) Efficiency and uplift: scientific management in the progressive era, 1890–1920. University of Chicago Press, Chicago, Illinois
Helander M (1997) Forty years of IEA: some reflections on the evolution of ergonomics. Ergonomics 40(10):952–961
Hughes T (2004) American genesis: a century of invention and technological enthusiasm, 1870–1970. University of Chicago Press, Chicago, Illinois
Indian Society of Ergonomics. Growth of Ergonomics in India. https://www.ise.org.in/fountainhead.shtml. Accessed 25 Mar 2019
Kant V (2016) Cyber-physical systems as sociotechnical systems: a view towards human–technology interaction. Cyber-Phys Syst 2(1–4):75–109
Karwowski W (2005) Ergonomics and human factors: the paradigms for science, engineering, design, technology and management of human-compatible systems. Ergonomics 48(5):436–463
Karwowski W (2006) The discipline of ergonomics and human factors. In: Handbook of human factors and ergonomics. Wiley, Hoboken, New Jersey, pp 3–31
Karwowski W (2008) Building sustainable human-centered systems: a grand challenge for the human factors and ergonomics discipline in the conceptual age. In: Corporate sustainability as a challenge for comprehensive management. Physica-Verlag, Heidelberg, Germany, pp 117–128
Kolbachev E, Kolbacheva T (2019) Human factor and working out of NBIC technologies. In: Advances in manufacturing, production management and process control. Springer, New York, pp 179–190
Kuorinka I (2000) History of the International Ergonomics Association: the first quarter of a century. The IEA Press, Paris, France
Kyriakidis M, Kant V, Amir S, Dang V (2018) Understanding human performance in sociotechnical systems—steps towards a generic framework. Saf Sci 107:202–215
Maier CS (1970) Between Taylorism and technocracy: European ideologies and the vision of industrial productivity in the 1920s. J Contemp Hist 5(2):27–61
Norros L (2004) Acting under uncertainty: the core-task analysis in ecological study of work. VTT Publications, VTT, Finland
Vicente K (2006) The human factor: revolutionizing the way people live with technology. Routledge, New York
Wolfe A (2012) Competing with the Soviets: science, technology, and the state in Cold War America. Johns Hopkins University Press, Baltimore, Maryland
Chapter 12
Human Reliability Analysis in PSA and Resilience Engineering: Issues and Concerns Vipul Garg, Mahendra Prasad, and Gopika Vinod
12.1 Introduction

Probabilistic Safety Assessment (PSA) is an analytical technique used to estimate the risk emanating from a facility. PSA is widely used in the nuclear, chemical, petroleum, and aerospace industries to perform risk analysis (The International Atomic Energy Agency 2010; Cepin 2015). In the nuclear industry, PSA is performed at three levels, as described below. Level 1 PSA starts with the identification of the events that may put the plant's safety at risk. The effect of every such event on plant safety is then analyzed in detail through Event Trees (ETs). In the ET, all the safety systems deployed to mitigate the abnormal event are accounted for in the analysis. The availability/failure of the safety systems is further analyzed in detail at the component level using Fault Trees. The target of Level 1 PSA is to calculate the Core Damage Frequency (CDF) (The International Atomic Energy Agency 2010). Similarly, the Level 2 PSA analysis begins with the plant's core in a damaged condition. The ability of the containment safety systems to prevent the release of radioactivity outside the containment is the area of focus in Level 2 PSA. The target of Level 2 PSA is to estimate the frequency of early release of radioactivity. The evaluation of the early release frequency is done by developing Containment Event Trees (CETs). The availability or failure of the containment safety systems upon demand is evaluated using Fault Trees. Level 3 PSA aims to estimate the risk to the public, in terms of radioactive dose, given that the early release of radioactivity has occurred.
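To make the Level 1 quantification concrete, the following is a minimal sketch of an event-tree calculation. All numbers (the initiating event frequency and the branch failure probabilities, which a real PSA would obtain from fault trees) are hypothetical and chosen only for illustration.

```python
# Minimal Level 1 PSA event-tree quantification (illustrative only).
ie_freq = 1.0e-2        # initiating event frequency per reactor-year (hypothetical)
p_eccs_fail = 1.0e-3    # emergency core cooling fails on demand (hypothetical)
p_dhr_fail = 5.0e-3     # decay heat removal fails on demand (hypothetical)

# Core-damage sequences: ECCS fails, or ECCS succeeds but DHR fails.
cdf = ie_freq * (p_eccs_fail + (1.0 - p_eccs_fail) * p_dhr_fail)
print(f"Core Damage Frequency: {cdf:.2e} per reactor-year")  # ~6.0e-5
```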
As seen above, the safety systems play a crucial role in mitigating the impact of an abnormal event on the plant and its safety. However, the successful operation of the plant and its safety systems depends on the health of the hardware, components, and structures, as well as on the human operator in the control room or field area who actuates these systems. Thus, a PSA study is incomplete without accounting for the possibility of human operator failures that may aggravate the situation or prevent the safety systems from performing their intended function. Human Reliability Analysis (HRA) is an integral part of PSA that accounts for errors by human operators in the control room or in the field by estimating the Human Error Probability (HEP). This HEP value can then be plugged into the PSA models in the same manner as a safety system's unavailability (Pyy 2000).
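The following sketch illustrates how an HEP enters the system quantification exactly like a hardware unavailability, here via a rare-event approximation over minimal cut sets. The cut-set logic and all numbers are hypothetical.

```python
from math import prod

u_pump = 2.0e-3      # standby pump fails to start (hardware, hypothetical)
u_valve = 1.0e-3     # motor-operated valve fails to open (hardware, hypothetical)
hep_manual = 1.0e-2  # operator fails to initiate backup injection (from HRA, hypothetical)

# Minimal cut sets of a mitigating function; the first contains the HEP.
cut_sets = [[u_pump, hep_manual], [u_valve]]

q_system = sum(prod(cs) for cs in cut_sets)  # rare-event approximation
print(f"Mitigating-system failure probability: {q_system:.2e}")  # ~1.02e-3
```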
12.2 Human Reliability Analysis Methods

Human Reliability Analysis methods have been around for three to four decades and, over this period, a variety of HRA methods have been developed. They are generally classified into two categories:

(1) First-Generation HRA Methods
(2) Second-Generation HRA Methods
12.3 First-Generation HRA Methods

These methods are based on a task-based approach. The action performed by the human operator to mitigate the abnormal event is broken down into tasks. An HEP is then evaluated for every such task. Finally, an Operator Action Tree (OAT) is developed wherein all the task HEPs are plugged in to arrive at the operator response HEP (Swain and Guttmann 1983). First-generation methods focus on the knowledge-, skill-, and rule-based levels of human action and are often criticized for failing to consider such things as the impact of context, organizational factors, and errors of commission. Some of the widely practised first-generation HRA methods are shown in Table 12.1. These methods are used by the US Nuclear Regulatory Commission (USNRC) and the International Atomic Energy Agency (IAEA).
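The task-based aggregation can be sketched as follows. The decomposition, the task HEPs, and the recovery probability are all hypothetical; in a method such as THERP, such values would be taken from the NUREG/CR-1278 lookup tables.

```python
# THERP-style aggregation of task HEPs (illustrative sketch).
task_heps = {
    "diagnose alarm": 1.0e-2,     # hypothetical task HEPs
    "select procedure": 5.0e-3,
    "operate valve": 3.0e-3,
}
p_not_recovered = 0.5  # chance a checker fails to catch the valve error (hypothetical)

# The operator response fails if any task fails (the valve error may be recovered).
p_success = 1.0
for task, hep in task_heps.items():
    effective_hep = hep * p_not_recovered if task == "operate valve" else hep
    p_success *= (1.0 - effective_hep)

print(f"Operator response HEP: {1.0 - p_success:.2e}")  # ~1.6e-2
```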
12.4 Second-Generation HRA Methods

The second-generation HRA methods were developed to address and overcome the limitations of the first-generation HRA methods. In particular, the second-generation methods try to incorporate the relationship between the impact/influence of the environment, i.e., the context, and the operator's performance. Some of the second-generation techniques in practice are given in Table 12.2; a simplified sketch of one of them (CREAM) follows the table.
Table 12.1 Some of the first-generation HRA methods (Garg et al. 2023)

1st-generation HRA method | Reference document
THERP (A Technique for Human Error Rate Prediction) | NUREG/CR-1278 (USNRC)
ASEP (Accident Sequence Evaluation Program) | NUREG/CR-4772 (USNRC)
HCR (Human Cognitive Reliability) | IAEA TECDOC 592
SPAR-H (Standardized Plant Analysis Risk—HRA) | NUREG/CR-6883 (USNRC)
Table 12.2 Some of the second-generation HRA methods (Garg et al. 2023)

2nd-generation HRA method | Reference document
ATHEANA (A Technique for Human Event Analysis), 2007 | NUREG-1880 (USNRC) (Forester et al. 2007)
MERMOS (Method for assessing the completion of operator's action for safety), 1997 | EDF (not available in the public domain)
CREAM (Cognitive Reliability and Error Analysis Method), 1999 | Available in the open literature (Hollnagel 1998)
CAHR (Connectionism Assessment of Human Reliability), 1997 | Available through the author (Strater 2000)
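As an illustration of how a second-generation method ties context to the HEP, the sketch below caricatures CREAM's basic method (Hollnagel 1998): the nine common performance conditions (CPCs) are scored, and the balance of reducing versus improving conditions selects a control mode with a broad HEP interval. The interval values are approximate readings of Hollnagel (1998), and the mode-selection rule is a crude simplification of his two-dimensional diagram, used only to convey the idea.

```python
# Simplified CREAM basic-method sketch (illustrative, not the full method).
HEP_INTERVALS = {                 # approximate intervals from Hollnagel (1998)
    "strategic":     (0.5e-5, 1e-2),
    "tactical":      (1e-3, 1e-1),
    "opportunistic": (1e-2, 0.5),
    "scrambled":     (1e-1, 1.0),
}

def control_mode(cpc_scores):
    """cpc_scores: nine values of +1 (improving), 0 (not significant), or -1 (reducing)."""
    reduced = sum(1 for s in cpc_scores if s < 0)
    improved = sum(1 for s in cpc_scores if s > 0)
    # Crude mapping (an assumption): the original method uses a region diagram.
    if reduced >= 6:
        return "scrambled"
    if reduced >= 4:
        return "opportunistic"
    if improved > reduced:
        return "strategic"
    return "tactical"

# Hypothetical assessment: two reducing CPCs, one improving, six neutral.
mode = control_mode([-1, -1, +1, 0, 0, 0, 0, 0, 0])
print(mode, HEP_INTERVALS[mode])  # -> tactical (0.001, 0.1)
```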
12.5 Role of Human Factors Data in HRA

Human Factors (HF) data form the backbone of human reliability analysis. In order to evaluate the HEP, the context in which the actions are performed, and the relationship of that context with the operator's performance, must be known. This information is gathered and then understood through the HF data or the Performance Shaping Factors (PSFs). Some of the sources for HF data are: (1) operating experience; (2) simulator data; and (3) expert judgment. HF data from operating experience is either unavailable or sparse, if available. Expert opinion is prone to subjectivity. A good alternative is to use simulators (Prasad and Gaikwad 2015) to mimic the real plant environment for suitable scenarios (Boring 2009). Simulators can be categorized in the following ways: (a) real simulators or (b) virtual simulators.
12.6 Real Simulators Here, the actual working environment is simulated such that training and testing can be carried out in this real environment. This can be achieved by constructing a prototype of the actual system which can facilitate training and testing. Once set up, it is difficult to make any changes in a real simulator.
12.7 Virtual Simulators Instead of building an exact replica of the plant’s actual control room to mimic its functionality with high fidelity, the control room, along with its functionality, is developed in a Virtual Environment (VE) utilizing advanced animation and virtual 3D technology. The HF data for plant scenarios could be generated in this VE.
12.8 Case Study for HRA with a Virtual Simulator

The aim of the study was to simulate a scenario in the virtual environment and thereby collect HF data. An in-house facility at the Bhabha Atomic Research Centre (BARC), along with its control panel, was modeled in the virtual environment, as shown in Fig. 12.1 (Garg et al. 2017). To begin with, the operators were trained to make them familiar with the VE and capable of performing the necessary actions in this environment. The VE was designed to allow the operator to perform field actions as well as actions through the control panel. The operator is represented in the VE as an animated avatar.
Fig. 12.1 Animated mimic of the control panel of the facility
The actual operator can control the movement of the avatar using the keyboard and mouse. Thereafter, experiments were conducted to gather the HF data. The target of the study was to utilize the HF data collected through the experiments to predict the HEP for the desired contexts using an existing robust HRA method that is relatively specific to the in-house facility simulated in the VE.
12.9 Selection of a Suitable HRA Method

A literature survey was conducted to identify the most widely practised HRA methods; these are the methods listed in Tables 12.1 and 12.2. Among them, SPAR-H was selected as the suitable candidate based on certain desirable features as well as the simplicity of the method. SPAR-H, developed by the USNRC, has the following features (Gertman et al. 2005), illustrated numerically in the sketch that follows the list:

• Decomposes the probability into contributions from diagnosis failures and action failures;
• Establishes the context associated with human failure events (HFEs) through eight performance shaping factors (PSFs) and a dependency assignment used to adjust a base-case/nominal HEP;
• Uses nominal HEPs for action (0.001) and diagnosis (0.01); and
• Provides guidance on HRA for both full-power and low-power/shutdown conditions.
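The sketch below applies the SPAR-H adjustment as we read it from NUREG/CR-6883: the nominal HEP is multiplied by the composite PSF multiplier, with a correction applied when three or more PSFs are negative so that the adjusted HEP cannot exceed 1.0. The PSF multiplier values assigned here are hypothetical placeholders, not the official SPAR-H table entries.

```python
# Illustrative SPAR-H-style HEP calculation (sketch, after NUREG/CR-6883).
NOMINAL_HEP = {"diagnosis": 1e-2, "action": 1e-3}  # nominal HEPs stated above

def spar_h_hep(part, psf_multipliers):
    """Adjust the nominal HEP by the composite PSF multiplier."""
    nhep = NOMINAL_HEP[part]
    composite = 1.0
    for m in psf_multipliers.values():
        composite *= m
    negative = sum(1 for m in psf_multipliers.values() if m > 1)
    if negative >= 3:  # correction for multiple negative PSFs
        return (nhep * composite) / (nhep * (composite - 1.0) + 1.0)
    return nhep * composite

# Hypothetical context: high stress and moderate complexity, all else nominal.
psfs = {"available_time": 1, "stress": 2, "complexity": 2, "experience": 1,
        "procedures": 1, "ergonomics": 1, "fitness_for_duty": 1, "work_processes": 1}

hep = spar_h_hep("diagnosis", psfs) + spar_h_hep("action", psfs)
print(f"Total HEP (diagnosis + action): {hep:.2e}")  # 4.40e-2
```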
12.10 Incorporating the Simulated HF Data for HEP Estimation

Two sources of information on HEP data are required for the risk analysis: (i) an existing HRA model (here, SPAR-H) and (ii) the simulated HF data. In order to integrate both, Bayesian Networks were used. A Bayesian Network represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). Formally, Bayesian Networks are DAGs whose nodes represent random variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters, or hypotheses (Mkrtchyan et al. 2015; Groth and Swiler 2013; Garg et al. 2017). Edges represent conditional dependencies among the nodes. Nodes that are not connected (there is no path from one of the variables to the other in the Bayesian Network) represent variables that are conditionally independent of each other. Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables and gives, as output, the probability of the variable represented by the node.
Fig. 12.2 BN model for the case study
node’s parent variables and gives (as output) the probability of the variable represented by the node. The degree of relationship between the nodes is incorporated into the network through the Conditional Probability Table (CPT). The HF data obtained was plugged into the CPT in an appropriate manner. The BN model developed for the case study is shown in Fig. 12.2 (Garg et al. 2017). It can be seen that the Bayesian network could depict all human factors identified from the SPAR-H model.
12.11 A Comparative HRA Study

A typical scenario was chosen from an NPP wherein the control room operator injects fire water into the Steam Generator (SG) by remotely opening two motor-operated valves. The HEP was evaluated for the same context with HCR, ASEP, and SPAR-H. The comparison of the results obtained is shown in Table 12.3. From Table 12.3, it can be seen that the HEP estimate of SPAR-H is somewhat conservative in comparison to those of ASEP and HCR.

Table 12.3 HEP results comparison

SPAR-H | ASEP (HEP for action) and HCR (HEP for diagnosis)
HEP (Diagnosis) = 5E–4 | HEP (Diagnosis) = 1E–4
HEP (Action 1) = 6.25E–5 | HEP (Action 1) = 5.5E–5
HEP (Diagnosis + Action) = 6.25E–4 | HEP (Diagnosis + Action) = 3.66E–4

Also, SPAR-H is applicable to a much
wider variety of scenarios (full-power and low-power/shutdown PSA) and evaluates the HEP for diagnosis as well as action. In addition, SPAR-H has a larger number of PSFs, which allows it to account for context information in a better manner.
12.12 Some Challenges in HRA Methods

There are some limitations of the current HRA methods (Boring 2007):

1. Current HRA methods do not account for the latency effect of the PSFs; i.e., the onset of an abnormal situation has an impact on the PSFs, but this impact does not appear abruptly, and, similarly, it does not diminish suddenly after the situation is resolved.
2. Present HRA methods do not incorporate the impact of the dependence between tasks on the PSFs.
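The first limitation can be pictured with a purely conceptual sketch (not drawn from any published HRA method): a stress PSF multiplier that ramps up after an abnormal event begins and decays after the event is resolved, rather than switching instantaneously. The peak value and the time constant are hypothetical.

```python
import math

def stress_multiplier(t, t_event, t_resolved, peak=5.0, tau=120.0):
    """Hypothetical stress-PSF multiplier at time t (seconds); tau is an
    assumed first-order rise/decay time constant."""
    if t < t_event:
        return 1.0                      # nominal before the event
    if t <= t_resolved:
        # first-order rise toward the peak while the event is ongoing
        return 1.0 + (peak - 1.0) * (1.0 - math.exp(-(t - t_event) / tau))
    # first-order decay back to nominal after resolution
    m_res = stress_multiplier(t_resolved, t_event, t_resolved, peak, tau)
    return 1.0 + (m_res - 1.0) * math.exp(-(t - t_resolved) / tau)

for t in (0, 60, 300, 700):
    print(t, round(stress_multiplier(t, t_event=30, t_resolved=600), 2))
```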
12.13 Resilience Engineering

Resilience engineering has existed largely as a conceptual framework, and a formal definition that is acceptable to the resilience/risk community at large has remained elusive. Efforts at managing risk at different engineering facilities have converged on a path of reducing the likelihood of disruptive events and the consequences of such events. These strategies have focused on mitigation features in the form of prevention and protection: designing systems to avoid or absorb undesired events. The word resilience is derived from the Latin word resilire, which means to "bounce back." It implies, generally speaking, the ability and capacity to return to the original unperturbed state in which the system was operating nominally and as desired. In its common use, the word resilience implies the ability of an entity or system to return to a normal condition after the occurrence of an event that disrupts its state. Such an interpretation applies to diverse fields such as power technology, engineering systems, the aircraft industry, rail transport, etc.

In the literature, various definitions of resilience spanning multiple disciplines have been proposed. Allenby and Fink (2000) define resilience as the "capability of a system to maintain its functions and structure in the face of internal and external change and to degrade gracefully when it must." Disaster resilience is characterized by The Infrastructure Security Partnership (2006) as the capability to prevent or protect against significant multi-hazard threats and incidents and to recover and reconstitute critical services with minimum devastation to public safety and health. Haimes (2009) defines resilience as the "ability of the system to withstand a major disruption within acceptable degradation parameters and to recover within a suitable time and reasonable costs and risks." Vugrin et al. (2010) define system resilience as: "Given the occurrence of a particular disruptive event (or set of events), the resilience of a system to that event (or events) is that system's ability to reduce efficiently both the
magnitude and duration of deviation from targeted system performance levels." A disruptive event imposes an undesirable impact on a system (measured by the difference between the targeted and the disrupted performance levels of the system), and it consumes resources in the recovery of the system through repair, maintenance, and testing. Hosseini et al. (2016) give a detailed review of the qualitative and quantitative methods for resilience estimation in various types of applications. A new method for engineering system resilience has been developed by Prasad et al. (2022).
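A minimal deviation-area metric in the spirit of the Vugrin et al. (2010) definition, where resilience is high when both the magnitude and the duration of the shortfall from the targeted performance are small, can be sketched as follows. The performance trace and the normalization are hypothetical choices, not a standardized formulation.

```python
# Deviation-area resilience index (illustrative sketch).
def resilience(performance, target):
    """1 minus the shortfall area below target, normalized by the target area."""
    shortfall = sum(max(0.0, target - p) for p in performance)
    return 1.0 - shortfall / (target * len(performance))

# Hypothetical hourly performance: disruption at t=3, recovery by t=8.
trace = [100, 100, 100, 40, 50, 65, 80, 95, 100, 100]
print(f"Resilience index: {resilience(trace, target=100.0):.3f}")  # 0.830
```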
12.14 Resilience and PSA

Resilience engineering has emerged, in a sense, from the application of safety features, reliability principles, and human reliability analysis in PSA. It can provide a robust and sensible input to the safety of a system. The quantification of resilience at the component level and the system level would provide a structured means to prioritize the factors that influence it. Hosseini et al. (2016) outline the methods used to quantify resilience; however, these have been proposed in different fields, and a consistent mathematical procedure and formulation suitable for universal application is not yet established. Quantification in PSA and HRA also brings with it decision-making about changes appropriate to the system. PSA deals with quantification and is the basis for risk-informed decision-making for high-consequence events. Currently, there is no analogous procedure for decision-making based on resilience engineering. However, there is potential to develop one if formal methods are developed for NPP systems. Resilience assessment includes a path for the recovery of equipment, and herein lie factors analogous to the performance shaping factors, or human factors, in HRA. If the factors shaping the resilience of a system are identified, articulated, and used in resilience quantification, then there can be a direct benefit of resilience to safety.

In reliability, the unknowns are treated as uncertainty; these events might not have been anticipated. Reliability is the ability to predict performance under known circumstances. There are, unfortunately, many grey areas, and focusing only on the knowns may limit the safety of a system. Resilience, on the other hand, includes the ability of a system or organization to withstand such unanticipated occurrences.
12.15 Conclusions

We can thus conclude that HRA is a vital part of PSA and helps in accounting for the contribution of human operator error to the core damage frequency and the early release frequency. HRA depends heavily on HF data. In the absence of operating experience from the plant, real/virtual simulators are a good alternative source for generating HF data. However, present HRA methods do not incorporate the latency effects of the PSFs or the impact of the dependence between tasks on the
PSFs. Resilience engineering, for its part, represents a paradigm shift from reliability-based safety principles. It encompasses reliability/failure, recovery, and a larger scope of events not known in advance; such unknown events are modeled as uncertainty in reliability analysis. Resilience quantification is not yet universalized, i.e., there is no consistent mathematical formulation for system or component resilience, even though some methods have been proposed in the literature. Nevertheless, resilience-shaping factors analogous to the PSFs in HRA can be defined.
References

Allenby B, Fink J (2000) Social and ecological resilience: toward inherently secure and resilient societies. Science 24(3):347–364
Boring R (2007) Dynamic human reliability analysis: benefits and challenges of simulating human performance. In: European safety and reliability conference (ESREL 2007), Stavanger, Norway
Boring R (2009) Using nuclear plant training simulators for operator performance and human reliability research. In: Topical meeting on nuclear plant instrumentation, control, and human-machine interface technologies (NPIC&HMIT), Knoxville, Tennessee
Cepin M (2015) Evolution of probabilistic safety assessment and its application in nuclear power plants. In: IEEE international conference on information and digital technologies, Zilina, Slovakia
Forester J, Kolaczkowski A, Cooper S, Bley D, Lois E (2007) ATHEANA user's guide. NUREG-1880: final report. U.S. Nuclear Regulatory Commission, Washington, D.C.
Garg V, Santhosh T, Anthony P, Vinod G (2017) Development of a BN framework for human reliability analysis through virtual simulation. Life Cycle Reliab Saf Eng 6:223–233
Garg V, Vinod G, Prasad M, Chattopadhyay J, Smith C, Kant V (2023) Human reliability analysis studies from simulator experiments using Bayesian inference. Reliab Eng Syst Saf 229
Gertman D, Blackman H, Marble J, Byers J, Smith C (2005) The SPAR-H human reliability analysis method. NUREG/CR-6883. U.S. Nuclear Regulatory Commission, Washington, D.C.
Groth K, Swiler L (2013) Bridging the gap between HRA research and HRA practice: a Bayesian network version of SPAR-H. Reliab Eng Syst Saf 115:33–42
Haimes Y (2009) On the definition of resilience in systems. Risk Anal 29(4):498–501
Hollnagel E (1998) Cognitive reliability and error analysis method (CREAM). Elsevier Science, Amsterdam, The Netherlands
Hosseini S, Barker K, Ramirez-Marquez J (2016) A review of definitions and measures of system resilience. Reliab Eng Syst Saf 145:47–61
Mkrtchyan L, Podofillini L, Dang V (2015) Bayesian belief networks for human reliability analysis: a review of applications and gaps. Reliab Eng Syst Saf 139:1–16
Prasad M, Gaikwad A (2015) Human error probability estimation by coupling simulator data and deterministic analysis. Prog Nucl Energy 81:22–29
Prasad M, Gopika V, Andrews J (2022) Application of multi parameter path length method for resilience (MP-PLMR) to engineering systems. ASME J Risk Uncertainty Eng Syst Part B: Mech Eng. Accepted for publication
Pyy P (2000) Human reliability analysis methods for probabilistic safety assessment. VTT Publications. https://www.vttresearch.com/sites/default/files/pdf/publications/2000/P422.pdf
Strater O (2000) Evaluation of human reliability on the basis of operational experience. GRS, Cologne, Germany
Swain A, Guttmann H (1983) Handbook of human reliability analysis with emphasis on nuclear power plant applications. NUREG/CR-1278. https://www.nrc.gov/docs/ML0712/ML071210299.pdf
The Infrastructure Security Partnership (TISP) (2006) Regional disaster resilience: a guide for developing an action plan. American Society of Civil Engineers, Reston, VA
The International Atomic Energy Agency (2010) Development and application of level 1 probabilistic safety assessment for nuclear power plants, Specific Safety Guide No. SSG-3. https://www.iaea.org/publications/8235/development-and-application-of-level-1-probabilistic-safety-assessment-for-nuclear-power-plants
Vugrin E, Warren D, Ehlen M, Camphouse R (2010) A framework for assessing the resilience of infrastructure and economic systems. In: Sustainable and resilient critical infrastructure systems. Springer-Verlag, Berlin, Germany, pp 77–116
Chapter 13
Human Reliability: Cognitive Bias in People—System Interface Manas K. Mandal and Anirban Mandal
13.1 Introduction

The investigation of human reliability dates back to the 1950s and has evolved from the reliability technology of hardware systems. Human Reliability Analysis (HRA) is an estimate of the contribution of human performance to system reliability and safety (Swain 1990). The analysis attempts to create a program (HRP: Human Reliability Program) that ensures the highest standards of reliability of human performance by examining employees' physical and mental appropriateness. The program is designed to analyze human ability as per the safety and security guidelines of the system. A variant of this program, called the Personnel Reliability Program (PRP), is designed to select and retain individuals who may be trusted in security-critical systems or industries.

While attempts have been made to model human cognition as a form of human reliability analysis (Cacciabue 1992), less attention has been given to understanding the issue of mental appropriateness based on cognitive elements. The science of human cognition has been thoroughly studied in domains that relate to human performance, but not to the extent that fits uniquely to HRP for security-critical systems. Such study is important since cognitive science reveals that certain forms of human error are hard-wired, non-reversible, and part of human cognition. Unless examined in a system-specific manner, human performance may be subject to systemic error due to inherent cognitive styles of functioning. These errors vary widely among individuals and are embedded in our cognitive system, which makes them not easily discernible or modifiable by training. To examine these types and forms of error more deeply, it is important to understand human cognitive systems and their biases.
Before we discuss the relevance of cognitive systems in eliciting human error, it is important to note that a human reliability program is not limited to the evaluation of cognitive systems. Several factors other than cognition may contribute to human error, for example, personality disorders, stress or anxiety, the use of drugs such as alcohol, hostility or aggression towards authority, deceitful or delinquent behavior, self-defeating tendencies, etc. In this chapter, the focus will be on examining those cognitive factors that may predispose an individual towards committing an unintentional error. No attempt will be made to examine the other factors that may also lead to human error, whether arising from "content" (cognitive factors) or from "intent" (affective or motivational factors). Studies have revealed that cognitively guided performance factors play a major role in dynamic operational conditions, especially in security-critical industries. Takano and Reason (1999) examined psychological bias, along with internal and external performance-shaping factors, in accidents at US and Japanese nuclear power plants. In a multivariate analysis of the data, they found psychological biases to be important determinants of accidents in dynamic operational conditions. The study also suggested that higher cognitive abilities, such as situational awareness and response planning, are important determinants. This chapter does not deal with these higher-order cognitive factors affecting human performance-based reliability; rather, it highlights the cognitive primitives that are embedded in our cognitive system and remain invariant despite higher-order cognitive abilities. However, it is important to note that cognitive bias is a derivative of our cognitive system. Therefore, before we discuss cognitive bias, it is important to understand the basic tenets of cognition. The following section deals with this construct.
13.2 Operationalizing Cognition

Cognitive science deals with how information is registered, acquired, stored, processed, transferred, and retrieved by the brain. It is presumed that information is generally registered in some "formation," acquired by some "representation," processed by some form of "transformation," and retrieved as "cognition." Behavioral scientists (for example, psychologists) believe that cognitive processing depends on internal representations (sensory input) and that mental representations undergo transformation (stored memory). Thus, psychologists investigate the representation of information, along with the processing of information, by animals and human beings. Neuroscientists believe that (a) the transformation in the cognitive process has a neural basis and (b) the process has a locus at the higher centers of the brain. Thus, neuroscientists try to establish links between the brain and cognition. Computational scientists aim at understanding the intelligent agent in computational terms; they therefore view the variety of intelligent behavior as (a) a function of a general-purpose cognitive architecture and (b) a functional division of that architecture into modules for perception, learning, memory, attention, etc. Linguists believe that the study of language is central to cognition, as our thoughts
get expressed through language, and, therefore, language is the first form of higher cognitive ability. Traditionally, the science examines the architecture of human cognition through its basic processes, ranging from sensation, perception, attention, memory, and learning to intelligence, problem-solving, and decision-making.
13.3 The Architecture of Human Cognition

All forms of human behavior have an element of cognition. The cognitive process entails several stages, for example:

• Sensation: Inputs from the external world are accepted in various forms.
• Perception: These inputs are interpreted to create a perception of the external world.
• Attention: Perceptions are guided by attention.
• Memory: Some products of perception are stored in memory.
• Learning: Stored memory modifies subsequent perception and helps us learn.
• Problem-solving: Learned behavior subsequently influences how problems are solved and decisions are made.
• Intelligence: Problem-solving and decision-making behaviors are moderated by intellectual capacity.
• Cognition: The outcome of decisions results in experience (environment), which, coupled with inner constraints (heredity), results in cognition.

Based on the inputs of basic sciences like behavioral science, neuroscience, and computational science, cognitive scientists have expanded their scope of investigation into application-oriented basic research in fields like knowledge modelling and simulation, cognitive assessment, decision-making and metacognition, language and communication, cognitive neuroscience, humanoids and augmentics, neuro-engineering and cognition, cognition in intelligent networks, visualization and simulation, etc.
13.4 Technology Interfacing Cognition

The purpose of the present chapter is to highlight certain cognitive features that are relevant to Human Reliability Analysis. As indicated, human behavior is prone to a variety of errors at the interface with technology. This proneness arises because technology develops at a faster pace than human skill, leaving a gap between technology development and human skill development. The gap creates two possibilities: (a) technology-enabled cognition (Human-as-a-System, or automation) or (b) cognition-enabled technology (System-as-Human, or autonomy). The major purpose of Human-as-a-System is to reinforce, reproduce, or replace human capability, while the purpose of System-as-Human is to perform extraordinary
missions, or work, without human intervention, or to follow rules of engagement without error. A human reliability program is designed to evaluate Human-as-a-System based on cognitive models. These cognitive control-based models, such as the Contextual Control Model (COCOM) and the Cognitive Reliability and Error Analysis Method (CREAM) (Hollnagel 1993, 1998), are capable of deciphering human information processing for human reliability analysis (HRA) and human–machine interface design. Recent technological advances have made it possible to reduce human error to a great extent; however, complete system reliability cannot be achieved wherever the human factor plays a role, especially in dynamic operational conditions. In fact, human error accounts for a large share, around 60–90%, of the overall incidences of system failure (Pasquale et al. 2013) and correlates positively with the components of system error. Therefore, it is important to examine the nature of human error, which varies with the dynamics of safety and accidents. The following sections deal with various human error probabilities that are akin to the cognitive architecture and are considered important as performance-shaping factors. Due consideration of such possible failures arising out of cognitive functions may help develop better cognitive control-based techniques. Unsafe human acts may be classified into two types: one that relates to the human operator's internal cognitive state (perceptual or attentional bias) and the other that relates to the human operator's way of working, which causes routine violations. Other factors may also be responsible, but they do not fall under the purview of this chapter, for example, adverse physiological conditions (like sickness) or the mental state (like personality, motivation, stress) of the operator. The following sections will deal with human cognitive bias as well as with cognitively guided ways of working in safety behavior.
13.5 Cognitive Bias

A cognitive bias refers to a tendency to make similar mistakes repeatedly in thinking and decision-making. Kahneman and Tversky (2000) proposed that human beings often make decisions based on imperfect cues, which leads to errors and fatalities. These errors are not random in nature; they follow a predictable direction. Therefore, cognitive bias may be designated as a systematic form of deviation from the normative pattern in judgment, information processing, or decision-making. Because of their adaptive nature, individuals who indulge in such forms of bias create their own subjective reality and fail to develop insight into their bias (https://en.wikipedia.org/wiki/Cognitive_bias). In this section, an attempt will be made to discuss these biases. In addition, cognitive errors that are not typically classed as judgment biases will also be discussed, since these errors pose serious human reliability issues. More often than not, these errors, which are part of cognitive domains like attention, memory, and thinking, go unnoticed and are embedded in our style of functioning
with a high degree of specificity. The concept of cognitive style, referred to as field-dependence or field-independence, is related to this issue of style of functioning (Witkin et al. 1962). People who possess a field-dependent style tend to rely more on the external world (the frame of a situation) for information processing, while people with a field-independent style rely heavily on internal cues or referents. Since these cognitive styles are discussed mostly with reference to individual differences, this chapter will not examine their implications for human reliability. The following sections discuss four major types of cognitive bias: attentional bias, memory bias, judgmental bias, and decision-making bias (see Wikipedia for further details).
13.6 Attentional Bias

(a) Change blindness: This is considered a fundamental flaw in the human attentional system, in which an individual fails to notice a "change," or the major difference between two visual objects or patterns shown in brief succession. Eye-tracking research has observed that change blindness is less common when the visual object is processed holistically, like a human face. However, for objects that need not be processed holistically (for example, a change in machine architecture), change blindness is more prevalent.
(b) Hemi-space neglect: This is referred to as a "One-Sided World" for those who have a neurological condition in the right parietal lobe. The individual is unable to process information from both sides of the visual field, and this inability is linked to the side contralateral to the damaged site. The attentional deficit is not limited to the visual system and may extend to the auditory, olfactory, or somatosensory domains.
(c) Rubber gloves illusion: This is an illusory sensation in which a person believes that a rubber glove kept in front of their hand is their own. The illusion comes into effect when the real and rubber hands are stimulated at the same time and at a given speed. After some time, the visual information combines with the tactile sensation, and the person starts believing that the rubber hand is transmitting sensation. Italian scientists have found experimentally that the electrical impulses that normally travel down to the real hand through the spinal cord drop dramatically. The illusion is found to be very strong in some people, which may lead to serious human reliability problems.
(d) Attentional blinks: This is a normal phenomenon in which attention takes a pause in order to help the brain ignore distractions from another target presented in quick succession. The blink, or attentional gap, occurs to avoid perceptual confusion while identifying targets. The attentional blink is largely affected by the visual or conceptual similarity between two objects. Similarity increases the attentional lag, and this has implications for attentional span and vigilance behavior. A related but somewhat different phenomenon is known as
repetition blindness, in which a person fails to recognize that a target has appeared twice in quick succession.
(e) Attentional lapse: A low level of cognition (hypo-focus), unwarranted distractions, or an intense level of attention to one given target (hyper-focus) may generate an attentional lapse. These lapses create absent-mindedness or forgetfulness. In such cases, an individual is susceptible to frequent distractions and shows weak memory and poor recollection of recently occurring events, at times resulting in serious consequences, injury, or loss during routine behavior.
(f) Visual field bias: Visual field bias may result from reading bias, which runs from left to right for English and most Indian languages, and from right to left for Urdu or any other language written in that direction. Similar to this horizontal scanning bias, a vertical bias (top to bottom) has been observed for languages written and read in that direction. Face recognition studies have shown that right-handed individuals have a typical tendency to attend to left-side facial information more than the right. The hemispheric specialization of the human brain is considered responsible for this effect.
13.7 Memory Bias

(a) Memory blocking: This is a phenomenon in which individuals fail to retrieve a piece of information that has already been registered. There are many reasons for a memory block, such as certain medications taken to suppress traumatic experiences, brain damage, Vitamin B-12 deficiency, alcoholism, etc. Misattribution, suggestibility, or cognitive bias can also block memory.
(b) Confabulation: This is a cognitive error in which individuals fail to provide accurate details of events and instead fill memory gaps with inaccurate stories, which they believe to be true to the best of their knowledge. The inaccuracies may range from alterations of minor details to drastic changes, depending on the kind and nature of the experience. Confabulations may be provoked (a normal response to a faulty memory) or spontaneous (occurring without a cue and more involuntary in nature). Neuroscientists believe that confabulation is primarily a problem of memory storage in the brain or a difficulty in encoding strategy.
(c) Imagination inflation: This is a form of cognitive distortion in which a person recollects an event in an exaggerated version or imagines an event that never happened. Imagining a false event, therefore, increases the probability of unforeseen familiarity or misattribution. This form of distortion occurs due to a person's inability to locate source information, that is, source monitoring errors. In many cases, people recognize the content of a memory but not its source, and thereby generate information already stored in memory to fill in the gaps.
(d) Memory intrusion error: This form of cognitive error occurs when a piece of information becomes associated with an original memory theme without actually being part of it. With this form of association, it becomes difficult to distinguish between the original and the tainted memory. Scientists attribute this phenomenon to the
loss of "recall inhibition," which normally filters irrelevant or unnecessary information out of awareness while one attempts to remember. The other possible explanation is the failure to integrate new context into an already existing memory trace (Stip et al. 2007).
(e) Schematic error: Schemas are utilized by a person when parts of an experience are not remembered. Some people use parts of a schema that did not actually take place, or use a schema that is typical of a situation (Kleider et al. 2008). These schemas are general guidelines for carrying out daily activities. A person may forget to pay the shopkeeper yet believe that he or she has done so. In effect, this error refers to a mental script that has wrongly reconstructed an event.
13.8 Judgmental Bias

(a) All-or-none thinking: This form of bias occurs when an individual fails to take a differentiated view of an event. Such polarized thinkers see everything in black and white, with no grey shades in between. Generally, such people are perfectionists and view a minor mistake as a complete failure.
(b) Magnification or minimization: This refers to giving extra weight to negative issues, failures, or threats while ignoring positive aspects or perceived elements of success. By magnifying weaknesses or possible threats, such individuals ensure that their judgment stands in sharp contrast with that of others who see otherwise.
(c) Filtering & disqualifying the positive: A closely related phenomenon is filtering, in which a person filters out all positive aspects of an event despite contradictory evidence and dwells only on the negative aspects. By disqualifying the positives, they maintain a negative outlook and try to gain the status of "always being right."
(d) Fallacy of fairness & mislabelling: This is a form of belief in which a person always looks for fairness in every walk of life. Any perceived deviation from fairness is considered an injustice, which results in an attempt to change the situation through emotional outbursts. In the process, they attack a person's attributes rather than the issue that caused the perceived unfairness.
(e) Cognitive dissonance: This occurs when a person's belief clashes with a new set of information. The presence of two or more contradictory ideas or beliefs results in some form of dissonance, and the person experiences stress because of the coexistence of the contradictory ideas or beliefs. Festinger proposed that human beings strive to achieve internal consistency and thereby reduce cognitive dissonance (Westermann 1989).
13.9 Decision-Making Bias

(a) Anchoring bias: This occurs when someone makes a decision based on their initial impression of an event. All subsequent estimates or arguments made by the individual are based on the impression secured at the outset. Researchers have found the anchoring effect to be a common human tendency affecting decisions.
(b) Confirmation bias: In this form of bias, a person relies heavily on input that confirms their pre-existing belief and discounts all evidence that contradicts it. Such individuals tend to overvalue evidence that matches their belief system.
(c) Hindsight bias: This refers to a tendency, after an event has taken place, to assume that it was more predictable than it actually was (for example, the outcome of a safety investigation). This tendency causes people to overvalue their ability to predict an outcome.
(d) Halo effect: With this effect, a person creates an overall impression of others based on one or two positive traits. This is considered a reasoning error, although it helps in making quick decisions in daily life.
(e) Optimism bias: This form of bias pushes a person to believe that they are at lesser risk of a negative outcome than others. The bias distorts objective reality and is attributed to temporal discounting (overvaluing the present over the future). People with this form of bias believe that things or events are under their direct control (Sharot 2011).

Apart from these cognitive biases, several other forms of bias are reported in the literature (Wilke and Mata 2012), such as illusory correlation (perceiving a correlation between two apparently unrelated events), the conjunction fallacy (a tendency to assume a specific condition is more probable than a general one), the exposure effect (a tendency to prefer some action simply because of familiarity), and the bias blind spot (a tendency to remain unaware of, or fail to recognize, one's own bias).
13.10 People-System Criticalities

A review of cognitive biases suggests that human performance is differentially affected by these tendencies, whether through underestimating a potential threat or through overestimating the capacity of safety tools to mitigate possible vulnerabilities, in the work environment as well as in daily life. In high-stake environments, the presence of such biases becomes extremely critical.

A decision is an act of choosing among possible solutions to a complex problem. In general, decisions are taken logically and, in some cases, intuitively. In logical decision-making, the individual gets an opportunity to use evidence, break problems down into smaller units, and utilize arguments to draw conclusions. However, there are situations in which logical decision-making does not work, for example, when
(a) situations involve a high level of uncertainty, (b) a rapid response is required, (c) the problem is poorly structured, (d) there are several plausible alternatives, (e) factors and rules are ambiguous, or (f) there is no precedent. Under such conditions, it is important to gain a sense of why decision-making errors take place; such errors are, for example, the outcome of faulty judgment or of inadequate problem-solving (Klein 1993). The introduction of autonomy-based or transformative technologies has increased the level of uncertainty even further. In the era of automation, cognitive-motor performance has become even more important, since there are (a) an uneven distribution of work between machine and human being, (b) a conflict between automation complacency and manual override in emergency conditions, (c) the possibility of a breakdown in mode awareness (intent opposite to machine mode), (d) out-of-the-loop loss of situational awareness involving automation "surprises" or unintended confusion, (e) a need for new approaches to training (from manual to automation, requiring de-skilling and re-skilling), (f) a false sense of security and trust in automation, and (g) a need to transition from passive to active cognitive control (Salvendy 1997). These issues become paramount because the gap between technology development and human cognitive skill development keeps widening, with consequences for safety. Moreover, in emergency situations, human performance becomes even more vulnerable, since decision-making suffers from some critical issues. Under such uncertain conditions, human performance tends to follow certain forms of bias. For example, people (a) seek an analogy that does not exist, (b) develop a fixation based on an initial impression, (c) give undue importance to expert opinion, (d) discount contradictory evidence, or (e) develop a perception that everything that can be done has been done. Faced with such a situation, some people believe that whatever readily comes to mind is probably correct and, therefore, rely more heavily on gut feeling (the automatic thinking process on which the brain spends the least energy, running on autopilot) than on logical decision-making, which is considered hard and inefficient (for details of the two systems of decision-making, see Kahneman and Tversky 2000). When the brain is on autopilot, safety risks increase manifold. At times people anchor on initial impressions about the danger, which obscures the true causes of the emergent event. In some emergency situations, people try to seek an analogy for a problem where none exists, or give undue importance to expert opinion instead of trying to develop insight into the problem. In some cases, people consciously discount contradictory evidence or accept a solution that has not been fully verified or validated.
13.11 Psychometric Testing

Although several experiments have been conducted to examine the nature of various forms of cognitive bias (Ramchandran 2012), relatively little effort has been made to assess these biases with psychometric tools. Most studies of cognitive bias were conducted to establish the phenomenon of bias and its robustness in the general population. Very few attempts have been made to develop psychometric tests in this direction. End-users of cognitive tests, especially in the field of human reliability programs, often confuse a general-purpose cognitive test with a clinical-purpose cognitive function test or a cognitive bias test. General-purpose cognitive tests are used to examine core cognitive competencies in individuals as a baseline measure of the functional architecture of the brain, covering crystallized intelligence (numerical ability, verbal ability, logical reasoning, problem-solving, decision-making) and fluid intelligence (abstract reasoning, spatial reasoning, creativity). Clinical-purpose cognitive function tests are used for the screening and staging of mental health conditions in individuals with anxiety, depression, or other psychiatric conditions. Cognitive bias tests, on the other hand, are primarily meant to detect inherent tendencies that cause systematic error in behavior or performance. These biases are robust in nature, as individuals do not develop insight into them and fail to unlearn them in dynamic operational conditions without special training. Since cognitive biases are not easily detected and are found to have some relationship with personality (Ramos 2018; Witkin et al. 1972), it is important to utilize psychometric tests as part of human capability analysis. General- and clinical-purpose cognitive tests are widely available, whereas very few cognitive bias tests are found in the literature. While general-purpose cognitive tests are important for examining human reliability in performance, cognitive bias tests are vital for ensuring an accident-free level of performance. Cognitive bias manifests in dynamic operational conditions; therefore, tests developed around the job or the operational conditions will be the most authentic for a human reliability program. Such tests need to follow standardized test development procedures after identifying the type of bias inherent in the task. Some of the tests of cognitive bias that are invariant to task characteristics or operational conditions and available in the literature include: (a) the Assessment of Bias in Cognition by Gartner et al. (2016), which measures (1) confirmation bias (CB), (2) the fundamental attribution error (FAE), (3) the bias blind spot (BBS), (4) anchoring bias (ANC), (5) representativeness bias (REP), and (6) projection bias (PRO); (b) the Cognitive Reflection Test by Frederick (2005), designed to measure "a person's tendency to override an incorrect 'gut' response and engage in further reflection to find a correct answer"; and (c) the Stroop Tests by Eysenck et al. (1987) and Golden (1978), which examine the ability to inhibit the cognitive interference that occurs during the processing of two stimulus features.

In sum, a cognitive bias is a systematic pattern of deviation from a normative standard that affects our behavioral outcomes, especially in high-stake environments involving a people–system interface. At times, such biases are treated as mental shortcuts that result in decision-making errors, although some researchers do not consider these
as errors and treat them as "rational deviations from logical thought" (Gigerenzer 2006). To identify and overcome such mental shortcuts or perceptual biases, psychometric tools may be used to produce reliable and reproducible results. However, such tools are not widely available in the scientific literature and therefore need either to be customized or adapted from available psychometric instruments, or to be developed around the operational conditions or system requirements.
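Where an operational HRP needs such a measure, interference of the Stroop type can be quantified very simply. The sketch below scores a Stroop-style test from raw reaction times; it is only an illustration, and the trial data, function name, and any cut-off logic are hypothetical rather than part of any published instrument.

# Sketch: scoring a Stroop-type interference measure, one example of the
# cognitive-bias tests discussed above. Data and thresholds are invented.

def stroop_interference(congruent_ms, incongruent_ms):
    """Mean reaction-time difference (ms) between incongruent trials
    (e.g., the word RED printed in blue ink) and congruent trials."""
    mean_c = sum(congruent_ms) / len(congruent_ms)
    mean_i = sum(incongruent_ms) / len(incongruent_ms)
    return mean_i - mean_c

# Hypothetical reaction times (milliseconds) from one candidate
congruent = [610, 580, 645, 600, 590]
incongruent = [820, 790, 860, 805, 840]

effect = stroop_interference(congruent, incongruent)
print(f"Interference effect: {effect:.0f} ms")
# A large, stable effect signals difficulty inhibiting cognitive
# interference; cut-off values would have to be standardized per task.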
References

Cacciabue P (1992) Cognitive modelling: a fundamental issue for human reliability assessment methodology? Reliab Eng Syst Saf 38:91–97
Eysenck M, MacLeod C, Matthews A (1987) Cognitive functioning and anxiety. Psychol Res 39:189–195
Frederick S (2005) Cognitive reflection and decision making. J Econ Perspect 19(4):25–42
Gartner A, Zaromb F, Schneider R, Roberts R, Mathews G (2016) Development and evaluation of an assessment instrument for the measurement of cognitive bias, June 2016. https://www.mitre.org/publications/technical-papers/the-assessment-of-biases-in-cognition
Gigerenzer G (2006) Bounded and rational. In: Contemporary debates in cognitive science. Wiley-Blackwell, Hoboken, New Jersey, p 129
Golden C (1978) Stroop color and word test: a manual for clinical and experimental uses. Stoelting Co., Chicago, Illinois
Hollnagel E (1993) Models of cognition: procedural prototypes and contextual control. Le Travail Humain 56:27–51
Hollnagel E (1998) Context, cognition & control. In: Cooperation in process management: cognition and information technology. Taylor and Francis, London, UK, pp 1–28
https://en.wikipedia.org/wiki/Cognitive_bias. Accessed 21 August 2021
Kahneman D, Tversky A (2000) Choices, values, and frames. Cambridge University Press, New York
Kleider H, Pezdek K, Goldinger S, Kirk A (2008) Schema-driven source misattribution errors: remembering the expected from a witnessed event. Appl Cogn Psychol 22:1–20
Klein G (1993) Sources of error in naturalistic decision-making tasks. In: Human Factors and Ergonomics Society 37th annual meeting, Santa Monica, CA, USA
Di Pasquale V, Iannone R, Miranda S, Riemma S (2013) An overview of human reliability analysis techniques in manufacturing operations. In: Operations management. IntechOpen
Ramchandran V (2012) The encyclopedia of human behaviour. Academic Press, London, UK
Ramos V (2018) Analysing the role of cognitive bias in the decision making process. IGI Global, Hershey, Pennsylvania
Salvendy G (1997) Handbook of human factors & ergonomics. Wiley, New York
Sharot T (2011) The optimism bias. Robinson, London, UK
Stip E, Corbiere M, Bouay L, Lesage L, Lecomte T, Leclerc C, Richard N, Cyr M, Guillem F (2007) Intrusion errors in explicit memory: their differential relationship with clinical & social outcome in chronic schizophrenia. Cogn Neuropsych 12:112–127
Swain AD (1990) Human reliability analysis: need, status, trends, and limitations. Reliab Eng Syst Saf 29:301–313
Takano K, Reason J (1999) Psychological biases affecting human cognitive performance in dynamic operational environments. J Nucl Sci Technol 36:1041–1051
Westermann R (1989) Festinger's theory of cognitive dissonance. In: Psychological theories from a structuralist point of view. Recent research in psychology. Springer, Berlin, Germany, pp 33–62
Wilke A, Mata R (2012) Cognitive bias. In: The encyclopedia of human behavior, vol 1. Academic Press, London, UK, pp 531–535 Witkin H, Dyk R, Faterson H, Goodenough D, Karp S (1962) Psychological differentiation. Wiley, New York Witkin H, Lewis H, Hertzman M, Machover K, Meissner P, Wapner S (1972) Personality through perception: an experimental and clinical study. Greenwood Press, Westport, Connecticut
Chapter 14
A System Theoretic Framework for Modeling Human Attributes, Human–Machine-Interface, and Cybernetics: A Safety Paradigm in Large Industries and Projects

Kallol Roy

K. Roy (B), Former CMD, Department of Atomic Energy, Bharatiya Nabhikiya Vidyut Nigam Limited (BHAVINI), Kalpakkam, India, e-mail: [email protected]
14.1 Introduction

For any critical infrastructure project,1 there is a need for large-scale deployment of skilled manpower, requiring individuals to work in multiple teams. Thus, it is essential to form multiple teams based on domain specializations and then identify the tools and equipment to be used to execute specific jobs. In order to further assess the execution modalities of the various job modules in such large infrastructure projects, it becomes important to identify the levels of automation in the various tools being used and in the underlying cybernetics systems, which provide the secondary layer of HMI. Thus, if an effective model of human reliability is to be formulated, it must be formulated in cascade with the associated cybernetics and their reliability models. At the outset, this requires a proper outline of the work profile of an individual, along with the associated attributes, viz., prior training in a given area (both in terms of skill sets and theoretical knowledge), the tools and tackles being used, the level of automation of the equipment used for carrying out field-level jobs and
1 The author's ideas on this topic were informed by: Lee, Higgins, and Tillmann (Lee et al. 1990); Sheridan (Sheridan 1982); Dougherty (1996); Ekanem (2013); Mosleh et al. (Mosleh et al. 2010); Dragan and Isiac-Maniu (Dragan and Isiac-Maniu 2014); the International Atomic Energy Agency (1990); Curtis et al. (2001); Kruschke (2010a, b); and Pentland and Liu (Pentland and Liu 1999).
obtaining associated measurements, the ergonomics and anthropometrics of the job-specific equipment to be used, the in-stage cognitive feedbacks for a job in progress, etc. Broadly, most critical infrastructure projects have (1) a construction, equipment-erection, and installation phase, followed by (2) a commissioning phase and then (3) an operations & maintenance phase, each requiring totally different human skill sets and thus different attributes for assessment. While the first phase, pertaining to construction, erection, installation, etc., requires extensive usage of mechanical-handling equipment with specific visual/tactile feedbacks as well as hand-held equipment, the second and third phases require skill sets pertaining to process engineering and controls. Thus, for the second and third phases, there is a need for the effective design of plant control rooms and of the entire set of paradigms governing their HMI aspects. In order to ensure an effective human reliability approach, there is a need to continuously update all the personnel. For this purpose, it is important to consider the following aspects: (1) the theoretical basis of every job undertaken; (2) the various analyses carried out for job hazards; (3) the intuitive reasoning of the field worker prior to the commencement of an important job; and (4) making the entire team aware of similar experiences elsewhere and of near-miss incidents. A proper combination of skill sets, along with a minimum knowledge base pertaining to the scientific understanding of the underlying risks involved in carrying out a particular job, goes a long way in providing the necessary amount of confidence in the field personnel. Wherever possible, the use of simulation tools, or actual scaled-down mock-ups, provides the necessary preparedness for critical infrastructure jobs, as well as the necessary a priori knowledge base for averting errors. In projects involving the erection/installation or commissioning of nuclear facilities, there is also a need to sensitize the personnel to the safety category of the Systems, Structures & Equipment (SSE) on which work is being executed. Presently, for safety equipment/instrumentation, one of the governing criteria is categorization, vide IEC-61226, into safety systems (category IA), safety-related systems (category IB), and other auxiliary systems that may only have an indirect bearing on safety (category IC). This sensitization during the period of installation of SSEs ensures the necessary precision work culture along with safety culture, which contributes towards enhanced safety, availability, and asset management during the operating life of plants. To further the cause of human reliability during the total life cycle of critical infrastructure projects, it is always essential, as mentioned earlier, to provide a wide knowledge base to each individual over and above their individual specialization, in order to ensure an increased appreciation of the total plant and better co-operation among workforces across the different SSEs. For this, a list of suggested areas directly pertaining to commissioning & operational safety in the nuclear industry follows:
• Domain Knowledge & Choice of Sensor & Inst. Systems
• Understanding/Appreciating Safety Envelope/Boundaries
• Knowledge of Postulated Initiating Events (PIE)
• Rudimentary Knowledge of PIE Dynamics
• Adequacy of Sensor Type & Position/Mounting
• Sensor Accuracy & Time Constants
• Signal Conditioning Error & Processing Time
• Safety Assessment of Network Protocols/WSN
• Data Processing, Data Mining, Big Data Analytics
• Plant-wide EMI Assessment & Inst. Design for EMC
• Human Machine Interface Perspective

Fig. 14.1 There is a need for an analytical understanding for deciding on the optimum location of sensors in order to capture the maximum information from a process
While an in-depth knowledge of each of the listed domains (which comes with long years of theoretical & practical training) is not required of everyone, an appreciation of each of the suggested domains brings about the necessary work culture and safety culture, by effectively blending individual & collective skill sets and enhancing total teamwork and a co-operative approach to the various plant commissioning and Operation & Maintenance (O&M) jobs (Fig. 14.1).
14.2 Safety Paradigms

Construction Phase: During the initial construction phase, the predominant industrial safety aspects pertain to civil engineering construction and the job hazards associated with it. The typical safety issues, which are well documented and essentially adhered to at any major construction site, pertain to working at heights, working in confined spaces, working below the pathway of heavy loads transported by cranes, working in deep dug-out pits, etc. For all these issues, there is a strong need for safety awareness on the part of each individual worker, each of
whom must be trained on the operational aspects of the various earth-moving and mechanical-handling equipment and machinery. Thus, there is a need to address all aspects of the safety paradigms pertaining to the mechanical handling of heavy equipment/components, wherein the parametric break-ups, listed below, are of prime consideration.

Situation & systems: Domain identification (heights, confined spaces, heavy equipment mobilization/erection, etc.); mechanical handling equipment (long-boom cranes, tower cranes, earth-moving machinery, etc.)

HRP indexing: • Sensitization • Training • Procedural aspects • Performance limitations • Basic trade & field training

HMI & cybernetics (mechanical, electro-mechanical, cyber-physical; panel-mounted/hand-held, etc.): • Visual feedback • Tactile feedbacks • Sensor-based information on parameters

Safety margin identification (safety integrity levels, procedural margins, operational regimes, etc.): • Based on JHA • Based on limits of equipment operation • Performance margins & de-rating factors
Further, there are special training requirements for workers handling over-dimensional consignments using specialized equipment and semi-automatic cyber-physical systems controlling equipment/machinery operation. For many such activities, the HRP improves manyfold with adequate job briefings, along with simulated exercises (either using sophisticated computer-based simulators or temporary mock-up facilities) prior to initiation of the actual job. Improvement in HRP for such jobs is a continuous process, since the effective operational status may be monitored and the feedback incorporated as the job progresses. The figure referenced below indicates the need for estimating the various safety margins while executing major jobs as part of the construction/erection/installation campaign in large infrastructural projects. It also provides an insight into the need for total domain awareness, along with the necessary skill sets and the associated cybernetics of the various mechanical handling equipment (Fig. 14.2).

Fig. 14.2 In this case, the operating procedures of the HMI devices determine the job execution regime (JER), the procedural margins (PM), and the safety integrity levels (SIL). For example, in a crane, the CG is indicated to the crane operator along with the other relevant safety parameters, thereby ensuring that crane operation is halted if the PM boundaries are exceeded

Commissioning and Operation and Maintenance Phase: During this phase, the basic training for personnel involved in system-wise plant commissioning, followed by O&M of the total systems/sub-systems of an NPP, comprises understanding the system/equipment operational limits, the various alarm parameters & their significance/set-points, the correlation of the various alarm parameters with the behavior of the connected SSEs, and the need for an effective response towards mitigating the source causing the alarm. However, in addition to such basic training, sensitization to the safety limits and the limiting conditions of operation (LCO) of the different systems/processes provides the individual with an appreciation for respecting the alarms and understanding the significance of each vis-à-vis plant safety. This, together with a knowledge of the various postulated initiating events
(PIEs), the process transient characteristics, and their corresponding limiting safety settings (LSS), or trip set-points, further enables the appropriate safety sensitization and thereby improves the human reliability factor. The figure below indicates the need for assessing the margins between the LSS and the safety limits, considering the overshoots in a transient prior to the mitigating device taking effect and containing it (Fig. 14.3).

Fig. 14.3 The design parameters govern the plant transients, and it is necessary to ensure that the worst-case transient, after crossing the LSS (and thereby having actuated the mitigating devices), does not overshoot and cross the safety limits (SL). Thus, it is important to quantify the time constants & response times of the safety (mitigating) devices, in order to ensure that the transients are suppressed well before reaching the SL

HMI and Cybernetics Associated with the Plant Control Rooms: The evolution of plant control room designs, with an effective blending of the parameters of cognitive engineering, ergonomics, and anthropometrics, has been a major subject of research by plant operators, HMI specialists, and industrial psychologists (Ivergard and Hunt 2009; Hugo et al. 2017; Rejas et al. 2011). (1) First-generation control rooms provided analog and electro-mechanical indications of a limited set of plant parameters (wherever telemetry from the SSEs was possible, considering the limitations of pneumatic tubing & electrical cable layouts from the field to the control room instrumentation), along with push-button actuators. These led to (2) next-generation control rooms with increased electronic read-outs (with improved telemetry connections through signal cables) and, thereafter, to (3) third-generation control rooms having computer-based menu-driven concepts and digital protocols (replacing the electrical cables). Throughout these phases, the layout of control rooms and the corresponding
cognitive engineering aspects have undergone a paradigm shift (Ivergard and Hunt 2009). Very often, operators accustomed to the earlier-generation control rooms were unable to gravitate to the changed indicating methodology and either required retraining or were progressively replaced by next-generation operators familiar with the newer concepts of HMI and the associated cybernetic tools. A major shift in cognitive engineering requirements had to be addressed when the fixed locations of indicating and alarm meters on control panels were progressively replaced with menu-driven computer screens, and the corresponding push-buttons on a physical mimic panel were replaced with menu-driven graphic screens using a mouse-driven protocol for the actuation of SSEs. Thus, it became imperative to impart the appropriate training to control room operators right from the design stage of the control room of a plant under construction, so that the human operator who is to operate the plant becomes familiar in advance with the appropriate HMI devices and the cybernetic tools required in the upcoming new plant (Fig. 14.4).

Fig. 14.4 As the control room layout changed from indicating and alarm meters (analog) to digital indicators and thereafter to menu-driven displays (computer based), there was a need for complete retraining of the operators (Images courtesy of Pacific Northwest National Laboratory, Wikicommons, IAEA Imagebank, Idaho National Laboratory)
14.3 Alarm Management and System Alerts

The further growth of intelligent alarm systems, with operator aids for identifying the root cause, brought a radical change in the training skill sets of operators for fault finding & locating parameter upsets in plant SSEs. In the earlier-generation control rooms, plant information was limited, owing to limitations in telemetry & communication from field equipment, since the signal transmission was essentially
pneumatic (3 to 15 psi, or 0.2 to 1 kg/cm²) or involved direct voltage signals, which could not be taken over long distances. However, as current transmission (10 to 50 mA and later 4 to 20 mA signals) became possible, an increased amount of plant information was made available to the control room, which eventually resulted in an "alarm flood," leaving the control room operator unable to prioritize events based on the importance of the parameters. Especially during a system upset condition, there would be a burst of alarms requiring prioritized mitigation, which depended entirely on the level of the operator's training (Fig. 14.5).
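To make the idea of prioritizing such an alarm burst concrete, the following minimal sketch matches an incoming alarm sequence against known event signatures, anticipating the intelligent alarm management discussed below. It is an illustration only: the alarm tags, event names, and EOP identifiers are invented, and a real system would use validated plant-specific signatures.

# Sketch: correlating an alarm burst with known event signatures.
# All tag names, events, and EOP numbers are hypothetical.

EVENT_SIGNATURES = {
    "loss_of_feedwater": ["FW-FLOW-LO", "SG-LEVEL-LO", "FW-PUMP-TRIP"],
    "condenser_vacuum_loss": ["CND-VAC-LO", "TURB-BACKPRESS-HI"],
}
EOP = {"loss_of_feedwater": "EOP-07", "condenser_vacuum_loss": "EOP-12"}

def match_event(alarm_sequence):
    """Return the (event, EOP, score) whose signature best matches the
    alarm burst, scoring by the fraction of the signature seen in order."""
    best, best_score = None, 0.0
    for event, signature in EVENT_SIGNATURES.items():
        it = iter(alarm_sequence)          # consumes tags, preserving order
        hits = sum(1 for tag in signature if tag in it)
        score = hits / len(signature)
        if score > best_score:
            best, best_score = event, score
    return (best, EOP.get(best), best_score) if best else (None, None, 0.0)

burst = ["FW-FLOW-LO", "CND-VAC-LO", "SG-LEVEL-LO", "FW-PUMP-TRIP"]
print(match_event(burst))   # ('loss_of_feedwater', 'EOP-07', 1.0)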
Fig. 14.5 Intelligent alarm management system, wherein a group of alarms is processed based on their sequence of occurrence, and the corresponding possible event & its mitigation requirements are provided as alerts
superimpose the acquired data in line with the standard paradigms of pre-processing, as well as classification, clustering, association-rule-mining, and data-warehousing, as shown in the figure above. The figure indicates a typical methodology, wherein the mapping of various activities is cast in the framework of the available control and instrumentation (C&I) systems in large plants with field-level embedded instrumentation, local control panels, and distributed computer control systems and storage historians for data partitioning and segmentation. Such a data-oriented approach clearly brings out the HRP components, the contributions of the HMS, and the automated C&I systems (Fig. 14.6).
14.4 Human Reliability & Trustworthiness in Quantitative Form In order to evolve a quantitative comprehension of the effectiveness, safety perceptions, trustworthiness, etc. of an industrial worker with high-end skill sets, an approach towards the use of mathematico-statistical models, based on a set of attributes with associated uncertainties, pertaining to the various intellectual and physical dynamics of a human worker, is presented. Since precision jobs, typical in energy, aviation, & marine sectors, require either physical/manual or intellectual or both (considering planning & procedure preparation, followed by site execution) sets of attributes, along with applicable handling equipment with associated cybernetics, the mathematical models for the different work execution regimes could be different. Hence, a realistic model, wherein the human & HMI systems are coupled, may essentially require a framework of differential algebraic equations (DAE) containing linear algebraic equations, non-linear quadratic or higher order equations, linear homogenous or non-homogenous ordinary differential equations (ODEs) & partial differential equations (PDEs), or even higher degree/order ODEs/ PDEs. For the ease of mathematical formulation, all such attributes, which represent coupled human–machine interface (HMI) models, are considered to be linear & time-invariant (LTI)—the systems connected through algebraic formulations offer direct solutions; systems governed by ODEs/PDEs require analytical or numerical
148
K. Roy
solvers; and systems having a combination of deterministic and stochastic factors may require a math-stat approach, with iterative/recursive solutions of Bayesian & Markovian models. In the classical system theory domain, using the transfer function approach for a single-input & single-output (SISO) model, an equivalent concept for the human, the HMI, & the cybernetics behaving as a feedback control loop can be considered, wherein, by use of classical system-theoretic techniques in the time domain or in the frequency domain, aspects pertaining to transient and steady-state performance, along with stability, can be analyzed. In the present context, while performance relates to the measure of the correctness of the work executed, the transient characteristics represent the initial field trials and task initiation, with the steady-state characteristics determining the error-free completion of a job. Extending the system-theoretic paradigms to a multi-input & multi-output (MIMO) modeling & analysis approach, by casting all the attributes in a state-space framework, requires defining the various attributes of human skill sets and handling tools/systems with associated cybernetics as states, and casting the system as state equations and measurement equations with system and measurement uncertainties as linear combinations. Various systems and control paradigms can thereby be addressed through a state-transfer approach, controllability & observability criteria, and pole-placement criteria. Further, issues pertaining to the estimation of non-measurable or poorly defined parameters of human attributes may be addressed by a Bayesian approach for both linear-time-invariant (LTI) & non-linear-time-varying (NLTV) formulations, and issues pertaining to interrupt-driven formulations of HMI & cybernetics may be addressed by appropriate discrete-event framework solvers, viz., automatons, Petri nets, etc. In both the transfer-function approach and the state-space approach, the concept of bounded-input & bounded-output (BIBO) system stability is interpreted as precision and quality of job execution within the realms of safety and availability, wherein error postulations and their mitigation actions, specific to different jobs, contribute to the stability margins. In addition to developing first-principle mechanistic models based on the parametric dynamics of human attributes, applications of Bayesian Belief Networks (BBN) and big-data analytics are considered for evolving and quantifying behavioral patterns and the safety-related trustworthiness of individuals, to ensure the seamless cascading of the human skill sets with the corresponding HMI & cybernetics. The model for the industrial worker, the models for the human–machine interface in erection, installation & commissioning of SSEs, the transfer function approach, and the state-space framework have been elaborately explained in an earlier paper (Roy 2019). The system-theoretic approach is indicated in Fig. 14.7 below, and the transfer function approach in the HRP framework is indicated in Fig. 14.8 (Fig. 14.9).
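A minimal numerical sketch of the SISO feedback idea follows, with the human approximated as a gain with a first-order lag and the machinery as another first-order lag, discretized by the Euler method. All time constants and gains are invented for illustration; they are not calibrated human models.

# Sketch: human + machine as a unity-feedback loop (illustrative values)

dt = 0.1                 # time step (s)
tau_h, k_h = 0.8, 1.2    # assumed human "time constant" and gain
tau_m = 2.0              # assumed machinery time constant

u_h = 0.0                # human actuation
y = 0.0                  # machine output (job-progress measure)
setpoint = 1.0           # demanded output

for step in range(300):
    error = setpoint - y                       # summing point
    u_h += dt / tau_h * (k_h * error - u_h)    # human: first-order lag
    y += dt / tau_m * (u_h - y)                # machine: first-order lag

print(f"steady-state output = {y:.3f}")        # settles near k_h/(1+k_h)

The run shows both notions from the text at once: the transient (how quickly the loop settles after task initiation) and the steady-state performance (the residual error of this simple type-0 loop).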
Fig. 14.7 A system theoretic approach essentially identifies the entire project as a set of math-stat models, wherein the various SSEs, HMS and the Human are cast as either input–output models or as state-space models. Various modeling uncertainties are factored in as statistical models and the same may be validated a priori through the usage of simulators or physical mock-up procedures
Fig. 14.8 Transfer Function Approach in the Framework of HRP. The block diagram comprises a summing point forming the error signal, a math model of the human as a polynomial fraction (transfer function), the dynamic models of the machinery handled as polynomial fractions (transfer functions), the output, and the measurement from the devices/machinery/tools being fed back to the summing point
The state-space concept, shown in the figure above, is available in any standard control theory text on the subject and has been presented in an earlier paper (Roy 2019). In this state-space framework, the state equation defines the specifics of the job to be executed (in critical infrastructure development or commissioning) and
thereby provides an effective description of the configuration management and procedural aspects of the different sub-tasks, along with the human models coupled with the appropriate tools, material handling mechanisms, and associated cyber-physical interfaces. The forcing function is obtained from the work plan, the procedures for job execution, and the industrial safety practices in place. The figure below shows the state equation as difference-equation models with the appropriate state-transfer mechanisms [available in standard control theory textbooks] (Fig. 14.10).

Fig. 14.9 Similar to the concept of system states, various human attributes can be cast in the form of a state space, with inputs (drivers) and a measurement space (performance assessor). This is a parametric approach, wherein various internal physiological/medical parameters are considered as states and are used for taking decisions with regard to qualifying an individual for specialized jobs with finite hazard levels, viz., working in closed spaces, at heights, etc.
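A small sketch of such a difference-equation model follows, with two illustrative states (skill proficiency and job progress) and a controllability check of the kind cited in the text. The matrices are assumptions chosen only to make the mechanics visible, not a calibrated human-machine model.

# Sketch: x[k+1] = A x[k] + B u[k], y[k] = C x[k], with illustrative
# states (skill proficiency, job progress) and input u = resources.

import numpy as np

A = np.array([[0.95, 0.00],     # skill decays slowly without input
              [0.10, 0.90]])    # progress is driven by skill
B = np.array([[0.05],
              [0.00]])          # training resources raise skill
C = np.array([[0.0, 1.0]])      # only job progress is measured

x = np.zeros((2, 1))
u = np.array([[1.0]])           # constant training/resourcing effort
for k in range(200):
    x = A @ x + B @ u
print(f"progress after 200 steps: {(C @ x).item():.3f}")

# Controllability check: rank [B, AB] must equal 2 for the input
# to be able to steer both states (the criterion cited in the text).
ctrb = np.hstack([B, A @ B])
print("controllable:", np.linalg.matrix_rank(ctrb) == 2)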
Fig. 14.10 In line with the concept of system state transfer, the combined human–machine system is modeled in the framework as shown. Here, the deterministic efforts result in job progress, while controllability & observability are guaranteed for ensuring performance and stability
The proposed state estimator could make use of a Bayesian approach, as shown in the figure below. The Bayesian estimator computes the state “x” and uses the same in the framework of a state-feedback algorithm, where additional resources are used for augmenting the system performance and stability. [The detailed theory, of a Bayesian estimation paradigm, is available in multiple control theory texts]. This approach (a Bayesian estimation concept), explained in an earlier paper (Roy 2019) and shown in Fig. 14.12 below from the perspective of HRP, considers the prediction of human performance (predictor) at the (k-1)th instant, based on utilizing the state-equation dynamics (training, skill-sets, etc.), and then perfecting the job (corrector), based on the in-stage performance measures (true measurement) at the kth instant. In a Bayesian sense, this represents the propagation of the human attribute density function (xk ) conditioned on the available measurements (zk ) from the (k-1)th state to the kth state. Such model-based techniques for human performance may have large errors owing to various uncertainties, and it may be necessary to consider appropriate uncertainty factors, wherein the state uncertainty represents the modeling limitations in the state equation (which describes the human attributes & dynamics) and the measurement uncertainty represents the limitations in the assessment of the instage performance measures (represented by the measurement equation). Since most of the uncertainties being considered are Gaussian, the use of a Kalman filter for the prediction of job completion/performance may be considered appropriate. It may be worthwhile to note that for effective design of HMI and cyber-physical systems, such
Fig. 14.11 As in the case of systems, the human–machine interface (HMI) model, cast in a state-space framework, may be analyzed as a pole-placement (eigenvalue placement) problem, wherein the states may be estimated and utilized to improve the job execution process. As shown in the figure, the additional resources act as the gain for a state-feedback concept, with the states being estimated by an external estimator
human models are formulated and made use of in automated systems in the manufacturing industry (wherein repeated operations are carried out by cyber-physical systems). However, wherever special jobs, which could be first-of-a-kind, are being executed, such human performance models may be difficult to formulate or envisage and may require extensive estimation-based strategies (such as Bayesian estimators, viz. the Kalman filter and its variants), along with big-data analytics. As a further improvement on Bayesian estimation strategies, methods employing Monte-Carlo techniques are considered better suited, since many of the human models could be non-linear and both the state and measurement equations could have non-Gaussian uncertainties. Towards this, a sample set of humans is subjected to certain pre-determined tests (response studies in handling abnormal events as observed from plant control room parameters, or performance studies of human operators while executing large erection/installation activities, etc.), and effective model parameters are obtained for further application of Monte-Carlo techniques.
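As one concrete, hedged illustration of the predictor–corrector cycle described above, the sketch below implements a scalar Kalman filter for a single hypothetical performance state. The dynamics coefficient, noise variances, and measurement values are placeholders, not data from the chapter.

```python
# Minimal scalar Kalman filter for a hypothetical "performance" state x_k:
#   state:       x_k = a * x_{k-1} + w,  w ~ N(0, Q)  (training/skill dynamics)
#   measurement: z_k = h * x_k + v,      v ~ N(0, R)  (in-stage performance score)
a, h, Q, R = 0.98, 1.0, 0.01, 0.25

def kalman_step(x_est, P, z):
    # Predictor: propagate the estimate from instant k-1 to k via the model.
    x_pred = a * x_est
    P_pred = a * P * a + Q
    # Corrector: blend in the true measurement z_k using the Kalman gain.
    K = P_pred * h / (h * P_pred * h + R)
    x_new = x_pred + K * (z - h * x_pred)
    P_new = (1.0 - K * h) * P_pred
    return x_new, P_new

x_est, P = 0.5, 1.0  # initial guess of performance and its uncertainty
for z in [0.62, 0.58, 0.71, 0.66]:  # illustrative in-stage measurements
    x_est, P = kalman_step(x_est, P, z)
print(f"estimated performance: {x_est:.3f}, variance: {P:.3f}")
```

For the non-linear, non-Gaussian cases noted above, the same predict–correct loop is typically realized with a particle (Monte-Carlo) filter, propagating a sample set of state hypotheses instead of a single mean and variance.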
Fig. 14.12 The approach here is all about propagating the conditional probability density function (pdf), using the dynamics of the process (in this case the coupled dynamics of the HMS), and then correcting it based on the latest input (measurement)
14.5 Data Oriented Approach for Quantitative Modeling of Human Reliability Parameters
Using human performance data from both the construction/erection/installation phase and the commissioning/O&M phases, cast in the standard paradigms of classical data-mining techniques (as is done by the human operator for plant data), an effective metric for the assessment of human reliability parameters can be obtained. As can be seen in the diagram below, the HRP umbrella encompasses the total data and collates it with the domain knowledge of the human entity, and the satisfaction (or dissatisfaction) factors are incorporated for obtaining the quantitative value of human trustworthiness and reliability for safety and security (Fig. 14.13). The People Capability Maturity Model (PCMM) is an HR practice presently followed extensively for improving the overall human satisfaction factor and organizational management. Based on the above HRP umbrella, a more precise model of the human along with the associated HMI may be obtained using a dynamic feedback mechanism, wherein a precise assessment of the human performance measure in terms of parametric attributes may be made available (Fig. 14.14).
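The chapter does not prescribe a formula for collating these data into a single quantitative value, so the sketch below shows only one plausible way such a composite index could be computed: a weighted aggregate of normalized attribute scores. All attribute names, scores, and weights are invented for illustration.

```python
# Hypothetical composite human-reliability index; names and weights are
# illustrative assumptions, not taken from the chapter.
attributes = {
    "construction_performance": 0.82,  # normalized scores in [0, 1]
    "o_and_m_performance":      0.75,
    "training_effectiveness":   0.90,
    "industrial_safety_record": 0.95,
    "satisfaction_factor":      0.70,  # PCMM-style satisfaction measure
}
weights = {
    "construction_performance": 0.25,
    "o_and_m_performance":      0.25,
    "training_effectiveness":   0.20,
    "industrial_safety_record": 0.20,
    "satisfaction_factor":      0.10,
}

# Weighted aggregate as one possible quantitative reliability metric.
hr_index = sum(weights[k] * attributes[k] for k in attributes)
print(f"composite human-reliability index: {hr_index:.2f}")
```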
Fig. 14.13 For large infrastructure projects, the human reliability aspects are based on the collective wisdom (data) of the construction, erection, and installation stages, followed by the operation & maintenance phase. This, together with a proper blend of effective on-the-job training, industrial safety, and the HMI & cybernetics, ensures a high degree of HRP. This is further strengthened based on satisfaction attributes
The above assessment methodology provides a good comprehension of the safety index, thereby enabling the project planners to decide a priori on the details of the required training, the expected domain knowledge, the HMI to be used, etc. Such a formulation for obtaining an a priori estimate, apart from ensuring the HRP, also helps in improved project scheduling, time planning, and budget planning (Roy 2019).
Fig. 14.14 This provides a basis for obtaining a quantitative value for human performance, depending on the type of job being executed, irrespective of whether the phase is construction, erection & installation, commissioning, or O&M. Here the response times, performance index, etc. are monitored, and such data can be used for the choice of manpower for subsequent enterprises
14.6 Conclusion
The paper provides an overview of the HRP aspect, considering the various stages of progress of a large infrastructure project and the need for a collective assessment of the human along with the associated tools & cybernetics. As automation progressively plays an increased overlapping role between the human operator and the job being executed, there is a need to consider all attributes as a coupled model, wherein the user-friendliness of the cybernetics plays a significant role. Further, considering the continuous upgrade of technology and the associated HMI, it becomes all the more important to ensure effective training of the human, at all stages of job execution, along with the device or cybernetics being used. Thus, with regard to major infrastructure projects, it would be preferable to form an overall risk-based model, wherein a hierarchy of risk-based models considering the plant, systems, and equipment, along with the human–machine systems (HMS), is studied, so as to arrive at an appropriate safety integrity level (SIL), as explained below (Fig. 14.15). The estimation of the HRP eventually brings about a total understanding of the SIL of the various sub-activities of a large infrastructure project; it also provides the necessary knowledge base with respect to risk assessment vis-à-vis the job execution/operating regimes of large plants, and thereby helps in providing the necessary guidelines for their trouble-free completion.
Fig. 14.15 As can be noted, there is a need for a detailed assessment of the various risk factors pertaining to the phase of the project, along with the details of the HMS being used and their JHA & PSA studies
References
Curtis B, Hefley W, Miller S (2001) People capability maturity model (P-CMM), version 2.0, 2nd edn. Carnegie Mellon Software Engineering Institute
Dougherty E (1996) Is human failure a stochastic process? Reliab Eng Syst Saf 55:209–215
Dragan I, Isaic-Maniu A (2014) Human reliability modeling. J Appl Quant Methods 9(1):3–21
Ekanem N (2013) A model based human reliability analysis methodology (Phoenix method). University of Maryland, College Park, Maryland
Hugo J, Slay LI, Hernandez J (2017) Human factors and modeling methods in the development of control room modernization concepts. In: ANS NPIC&HMIT 2017, San Francisco, California
Ivergard T, Hunt B (2009) Handbook of control room design and ergonomics. CRC Press, Boca Raton, Florida
Kruschke J (2010a) What to believe: Bayesian methods for data analysis. Trends Cogn Sci 14(7):293–300
Kruschke J (2010b) Bayesian data analysis. [Online]. Available: https://doi.org/10.1002/wcs.72
Lee K, Higgins J, Tillman F (1990) Stochastic models for mission effectiveness. IEEE Trans Reliab 39(3):321–324
Mosleh A, Forester J, Boring R, Langfitt-Hendrickson S (2010) A model-based human reliability analysis framework. In: 10th international probabilistic safety assessment & management conference, NRC human reliability assessment, Seattle, Washington
Pentland A, Liu A (1999) Modeling and predicting human behaviour. Neural Comput 11:229–242
Rejas L, Larraz J, Ortega F (2011) Design and modernization of control rooms according to new I&C systems based on HFE principles. In: International nuclear Atlantic conference, Belo Horizonte, MG, Brazil
Roy K (2019) Risk informed approach for project management and forecasting, during commissioning phase of a first-of-a-kind (FOAK) nuclear plant: a system theoretic and Bayesian framework. In: Proceedings of ICRESH 2019, Madras, India
Sheridan T (1982) Measuring, modeling and augmenting reliability of man–machine systems. IFAC Proc 15(6):337–346
The International Atomic Energy Agency (IAEA) (1990) Human reliability, data collection and modeling. [Online]. Available: https://inis.iaea.org/collection/NCLCollectionStore/_Public/23/008/23008575.pdf
Chapter 15
Human Reliability and Technology Disruption
Ajey Lele
Human Reliability Analysis (HRA) could be said to have a history of around five decades. The subject mainly came into the limelight in the early 1970s, when various aspects of reactor safety were being debated. There is a possibility of humans taking risky actions (unintentionally or otherwise) while working at units in nuclear power plants, requiring plants to undertake probabilistic risk assessments (PRAs). However, modelling human behaviour is a very challenging task. There could be various reasons for humans to undertake unsafe actions: a lack of training, negligence, accident, or design. While a nuclear power plant is one such risk area, there could be various other industrial units where such things could happen; the danger is particularly significant in industrial units where critical technologies are in use. Presently, we are in the era of Industry 4.0, and some industrial units have already started functioning autonomously. Hence, the involvement of humans is slowly diminishing, but at the same time, the man-to-machine interface (M2M) is increasing. It is therefore important to develop a context for HRA for modern-day industrial setups. The purpose of this chapter is to take the HRA debate to the uncharted area of technology disruption, which is being witnessed in this era of Industry 4.0.
There is a long history of the Industrial Revolution from the commencement of the industrial age. The First Industrial Revolution is known to have begun in England around the 1750–1760 period. It lasted about 80 to 100 years, during which human and animal labor was transformed into machine work. This phase saw the arrival of the steam engine, coke smelting, and the processing of iron ore. The use of animals (horses, camels, etc.) improved, and rail travel began. Banking and other financial
A. Lele (B) Consultant Manohar Parrikar Institute for Defence Studies and Analyses (MP-IDSA), New Delhi, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. S. Chirayath and M. Sai Baba (eds.), Human Reliability Programs in Industries of National Importance for Safety and Security, Risk, Reliability and Safety Engineering, https://doi.org/10.1007/978-981-99-5005-8_15
systems emerged, and factory work started in a big way. On the flip side, the revolution created a wide gap between the rich and the poor (Mohajan 2019). From 1780 to 1850, the economic transformation that took place in Europe was due to the First Industrial Revolution. Now we are in the age of the Fourth Industrial Revolution, dating from the beginning of the twenty-first century (around the year 2000), and we call this phase Industry 4.0. There is a general recognition that the initial Industrial Revolution was characterized by steam and water. The Second Industrial Revolution was about electricity being used for mass production. The third was about the Internet, communication technologies, and the adoption of major digitalization processes (Lele 2019). It is said that the First through Third Industrial Revolutions liberated humankind from animal power. It was an era of mass production and a period of digital capabilities becoming available to billions of people. The Fourth Revolution is about the emergence of a range of new technologies, which are fundamentally challenging ideas of human progress and impacting various disciplines, economies, and industries. Today, the Fourth Revolution appears to be in the middle of a very interesting phase, where the concept of blurring the real world with the technological world is showing signs of becoming reality (Lele 2019). In the present era, humans and machines are expected to live in harmony as peers. They are likely to reinvent themselves through the application of advanced and future technologies. The future is expected to be shaped by stimulating technologies that would blur the gap between humans and machines.
The ingenuity of the human race has ensured remarkable growth in technology over the centuries. There has been a good amount of backing for scientists from the political and policymaking community in their quest for research, development, and innovation. Largely, societies have been quick to absorb new technologies, barring some exceptions. Over the periods of the various industrial revolutions, humans have understood the importance of technologies for the development of society and also the commercial advantages of investing in technology. Humans have been using technologies in various sectors, from health, education, the military, and infrastructure development to disaster management (Lele 2019).
Disruptive innovation is all about creating new markets. Disruptive technologies are technologies known to uproot existing technology. Obviously, there is a major commercial angle associated with them, and the arrival of such technologies is known to influence decision-making processes. Various technologies inherently come with a dual-use nature, and the same could be the case with disruptive technologies. Many such technologies could come with interchangeable usage in both commercial and military arenas. These are ground-breaking technologies that have the potential to cause sudden and unexpected dislodgment of an established technology market. The most common examples are disruptions such as digital cameras replacing film cameras, mobile phones substituting for wireline telephones, and portable computing devices replacing desktops (Lele 2019). In the realm of defense and security, a disruptive technology represents a technological development that significantly changes the rules or conduct of conflict within one or two generations. Such a change forces the planning process for various industrial
establishments and, hence, they must align their long-term goals with the ongoing technological developments. For the last few decades, the innovation and operationalization of some enhanced or completely new technologies have replaced and disrupted some of the existing technologies, rendering them obsolete. This disruption is happening in various sectors, including hardware, software, networks, and combined technologies. Presently, not every country is in a position to benefit from all of the emerging technologies. In general, the challenges for human reliability could be at various levels, from society to country to factory. Presently, some new technologies are making inroads in both the civilian and military domains. Many of these technologies are the by-products of various science innovation programmes undertaken globally. Technologies bringing in technology disruptions include robotics, quantum technologies, Artificial Intelligence (AI), the Internet of Things (IoT), blockchain technology, additive manufacturing (3D printing), next-generation genomics, new materials, and new energy technologies.
HRA is normally defined as a structured approach used to identify potential human failure events (HFEs) and to systematically estimate the probability of those errors using data, models, or expert judgment (U.S. xxxx). Many industrial accidents are known to have happened owing to human error. However, it is important to note that human error need not necessarily be viewed as only an error by an individual. It could be the result of contextual and situational factors affecting human performance. In a broader sense, HRA could be viewed as addressing a combination of human, system, and environmental failure. Hence, even in the absence of direct human involvement, reliability would be an issue. HRA must be viewed through a different prism while debating technology disruption. Hence, the basic question could be how well people will perform, and what they are supposed to do, in normal and abnormal situations. HRA generally has three basic functions: the identification of human errors; the prediction of their likelihood; and the reduction of their likelihood, if required. There should be requirements in place to assess human error, but it could also be the case that no human factor is present. However, it is important to note that at some level the human factor would exist. Hence, present-day HRA needs to be more innovative.
It is a reality that modern-day disruptive technologies would bring in more and more autonomy, and, hence, the dependence on humans is likely to reduce in the overall industrial setup. At the same time, it is important to note that the level of autonomy would vary from situation to situation. In order to develop a perspective for this argument, a typical example from the military domain could be of some interest. There are Lethal Autonomous Weapon Systems (LAWS) that can independently search for and engage targets. Since such systems are pre-programmed to undertake this task, no human intervention takes place, and the systems identify the military target automatically and mount an attack on it to neutralise it. LAWS could operate in various dimensions, from land to air (space) and on water (under water). The US Department of Defense divides LAWS into three types (Autonomous Weapon Systems: Technical, Military 2014) according to the level of autonomy and the level of human control:
• Autonomous weapon system (also referred to as human 'out-of-the-loop'): "A weapon system that, once activated, can select and engage targets without further intervention by a human operator." Examples include some 'loitering' munitions that, once launched, search for and attack their intended targets (e.g., radar installations) over a specified area and without any further human intervention, or weapon systems that autonomously use electronic 'jamming' to disrupt communications.
• Supervised autonomous weapon system (also referred to as human 'on-the-loop'): "An autonomous weapon system that is designed to provide human operators with the ability to intervene and terminate engagements, including in the event of a weapon system failure, before unacceptable levels of damage occur." Examples include defensive weapon systems used to counter incoming missile or rocket attacks. They independently select and attack targets according to their preprogramming. However, a human retains supervision of the weapon operation and can override the system if necessary, within a limited time period.
• Semi-autonomous weapon system (also referred to as human 'in-the-loop'): "A weapon system that, once activated, is intended to only engage individual targets or specific target groups that have been selected by a human operator." Examples include 'homing' munitions that, once launched to a particular target location, search for and attack pre-programmed categories of targets (e.g., tanks) within the area.
The above classification indicates that autonomy needs to be viewed in a 'relative' sense. In fact, not only in defense but in many other industrial sectors, various critical functions have been automated for a long time, and an industrial activity does not necessarily need to be highly complex for it to be autonomous. Under these circumstances, HRA would still be relevant; however, its applicability should be thought of beyond the existing, conventional way it is designed and applied. In the twenty-first century, the domain of technology has expanded enormously. Almost every sphere of science, from physics to biology, is witnessing some form of a scientific discovery revolution. Obviously, there are increasing challenges from the human reliability point of view. There are many unknown factors and, hence, developing an architecture for HRA is becoming more challenging. It is not possible to derive any specific "one size fits all" structure for HRA, since there are many technology-specific challenges. To a certain extent, even technology disruption could be broadly predicted, but the real test would be to address specific human reliability challenges. Since the beginning, the origin and focus of the HRP debate have been dominated by nuclear issues, particularly during the Cold War era, when that was possibly the need of the hour. But in the twenty-first century, the challenges extend well beyond the nuclear domain. Unfortunately, the HRP debate is still restricted to the nuclear domain, where there is much talk about the possible impact of technologies like AI, M2M, etc. It is important to discuss the impact of these technologies on other high-tech industrial sectors, too. Most of the new emerging technologies could be
considered support technologies. For example, AI as a technology must be juxtaposed onto other technology platforms to increase their effectiveness. Researchers have shown that an AI algorithm could be trained to classify COVID-19 pneumonia in X-ray scans with up to 90% accuracy, an advance that can lead to the development of a complementary test tool, particularly for vulnerable populations. Their study has shown that the new technique can correctly identify positive COVID-19 cases 84% of the time and negative cases 93% of the time. This study has demonstrated that a deep-learning-based AI approach can serve as a standardized and objective tool to assist healthcare systems as well as patients (PTI 2020). Generally, this example indicates that there could possibly be very little or no scope for HRA when it comes to disruptive technologies. Hence, there is a need to dig further into the applicability of the HRA angle in the case of disruptive technologies.
HRA is an analytical study of human errors in any major industrial system. Human errors could be owing to any action or inaction by an individual that decreases the safety of the system with which he/she is interacting. Human error can happen owing to individual shortcomings as well as the culmination of contextual and situational factors that impinge on human performance. Such factors are performance-shaping factors (PSFs), which can enhance or degrade human performance relative to a baseline. Now the question is, "In the case of disruptive technologies, what is the relevance of HRA when minimal human interaction is taking place?". One of the most important aspects of HRA involves reading the mind and predicting behavior. Here, human reliability could be seen as a measure of human performance, which mostly depends on human behavior. Today, a high standard of safety needs to be maintained in critical sectors like the aviation, aerospace, petroleum, chemical process, and nuclear industries. Disruptive technologies like big data, AI, cloud computing, 3D printing, etc. have already started making an impact on such critical sectors. Scientists are modelling the potential contributors to human error through a variety of task analyses and the identification of those contributors. It would be of interest to find how HRP fits in under such settings.
The aviation sector is one sector where HRA has been found to be particularly useful. Human error is one of the most important risk factors affecting aviation safety. The original Cognitive Reliability and Error Analysis Method (CREAM), developed for the nuclear industry, is reliable for human reliability quantification, but it is not fully applicable to human reliability analysis in aviation because it neglects the characteristics of long-duration flights. What is required is to identify a set of performance influencing factors (PIFs), such as flight procedures and ground support, to reflect operational scenarios in flight. The probability of human error must be found for each operation in the approach and landing phases. Researchers have found that the most important cognitive function influencing human reliability needs to be factored into such an assessment. Also, researchers are working on issues like dynamic human error assessment methods with time sequences and physiological parameters (Guo et al. 2019).
At present, the aviation sector is experiencing an infusion of disruptive technologies, and there is a need to look at the factoring of HRA beyond the aviator alone. The modern-day aerospace industry as a whole needs an assessment of HRA.
Aerospace engineering and the aerospace industry represent one of the most important industrial sectors globally. It involves the development of aircraft, helicopters, and spacecraft. There are two major branches of engineering associated with this sector, namely, aeronautical engineering and astronautical engineering. Avionics engineering deals with the electronics side of aerospace engineering. Presently, this sector is using all technologies from 3D printing to AI. The expanse of this sector is huge in every respect, from technology to infrastructure to financial investment. This industry has both civilian and defense components. There are many geopolitical implications associated with various activities related to this industry, and there could be both genuine and intentional human errors happening in its various sectors. HRA generally has three basic functions (Aalipour et al. 2016): the identification of human errors; the prediction of their likelihood; and the reduction of their likelihood, if required. The aerospace industry is both technology- and human-capital-intensive. The human would remain "in the loop" in spite of this sector incorporating various new technologies. Hence, at some level, the human factor will always exist in the aerospace industry, and the relevance of human reliability would remain an issue. The investigation of human error is very case-specific; the context of the industrial arena should be considered for this purpose. In the case of the aerospace industry, human reliability factors could make a considerable contribution to the maintenance performance, safety, and cost-efficiency of any production process. To increase human reliability, the sources of human errors should be recognized and the probability of human errors should be measured.
In the case of new emerging technologies, some work is available, from a human reliability point of view, on understanding human–robot interaction. There could be some amount of learning from this sector for various other associated sectors. Interestingly, the initial human experiences with robots have not been that good. There have been communication failures, perception and comprehension failures, and problem-solving failures while modelling M2M and other similar systems. Actually, human reliability is much more relevant when humans are trying to interact with the missions, not only in the pre-programmed mode, but possibly also in the cognitive mode. Just because disruptive technology demands much less human interaction does not mean that aspects of human reliability have no relevance. In fact, most of these new technologies are in some way or other viewed as an upgrade to some of the existing technologies or ideas. Globally, industries have been using computer numerical control (CNC) machines and Flexible Manufacturing Systems (FMS, a manufacturing system based on multi-operation machine tools, incorporating automatic part handling and storage (Flexible Manufacturing System (FMS) and Automated Guided Vehicle System (SGVS) xxxx)) for many years. Looking at how such industries have been handling human reliability aspects for all these years could assist in understanding and designing new HRA practices for industries using 3D printing technology. Similarly, in the case of the disruption happening in new materials and inexhaustible energy sources, the industries could learn from the petroleum sector.
It would give them an idea about possible HRA processes for identifying potential human errors, estimating a system's total risk, etc.
In the case of technologies like big data, cloud computing, AI, IoT, and blockchain, they could learn a lot from the information technology and communications sector. Issues like hacking, data theft, and system manipulation would definitely have a human angle associated with them, and there would be a need for great introspection in this regard, because such threats could arise from an insider's mischief, too.
The idea of 'smart factories' is all about running a factory without any significant human intervention. Industry 4.0 is known for replacing the concept of automation with autonomy. Obviously, smart factories are expected to work in autonomous mode. This means humans are replaced by intelligent machines, with robotic systems communicating via innovative software. It is much more than any simple machine-to-machine collaboration. There are advanced software units, AI-based systems, and autonomous industrial processes associated with it. Transparency is the key for smart factories, and various manufacturing processes and related information are freely made available across the system. This immensely helps in the management of supply chains. The smart factory is all about smart manufacturing. Here the system integrates various functions, identifying raw material needs and undertaking the process of logistics management accordingly. Various processes like production facilities, warehousing systems, transportation schedules, customer feedback, and assessment for capturing new markets become part of the smart factory system. Here the aim is to optimize the manufacturing processes by using different tools like computer controls, CNC machines, 3D printing, modelling techniques, big data, AI algorithms, and blockchain technologies (Lele 2019). It has been predicted that such smart factories would have zero human intervention. Obviously, HRA may not have any relevance in such situations. So the question is, "Has the time come to develop models to predict how well robots/AI would perform?" However, the main ethical issue would remain, that is, "Could robots decide on what they are supposed to do in normal and abnormal situations?".
Finally, it is important to note that a new thinking process is required for the possible human response to emerging technology disruption. The nature of these technologies, which could be regarded as the cornerstone of Industry 4.0, demands the development of a new paradigm for human reliability programmes.
References
Autonomous weapon systems: technical, military, legal and humanitarian aspects, International Committee of the Red Cross, Geneva, Switzerland, 2014
Aalipour M, Ayele Y, Barabadi A (2016) Human reliability assessment (HRA) in maintenance of production process: a case study. International J Syst Assur Eng Manag 7:229–238
Flexible Manufacturing System (FMS) and Automated Guided Vehicle System (SGVS). Available: https://www.srividyaengg.ac.in/questionbank/Mech/QB114734.pdf. [Accessed 28 September 2020]
Guo Y, Sun Y, Yang X, Wang Z (2019) Flight safety assessment based on a modified human reliability quantification method. Int J Aerosp Eng 2019:1–13
Lele A (2019) Disruptive technologies for the militaries and security. Springer, Singapore
Mohajan H (2019) The first industrial revolution: creation of a new global human era. J Soc Sci Humanit 5(4):377–387
PTI (2020) Scientists develop AI tool to identify COVID-19 cases using CT scans, 1 October 2020. Available: https://www.financialexpress.com/lifestyle/health/scientists-develop-ai-tool-to-identify-covid-19-cases-using-ct-scans/2096052/. [Accessed 7 October 2020]
U.S. NRC Office of Nuclear Regulatory Research (RES) & Electric Power Research Institute (EPRI), Principles of Human Reliability. Available: https://www.nrc.gov/docs/ML1025/ML102560372.pdf. [Accessed 6 October 2020]
Chapter 16
Human Reliability Design—An Approach for Nuclear Power Plants in India
Amal Xavier Raj
16.1 Introduction
Over the decades, reliance on nuclear energy, a clean energy compared to fossil fuels, has led to an increase in the number of plants. This has required an enormous increase in qualified, trained, highly motivated, and reliable personnel. Centralization, corporatization, and private sector involvement have all increased the complexity of hiring, training, retaining, and motivating this workforce. Though technology, processes, and protocols for safety improved, socio-economic, cultural, and ideological/political diversification brought in newer challenges. The geopolitical situation introduced a newer dimension of vulnerability, amplifying the potential for intentional sabotage by determined individuals. Nuclear energy, being a complex technology, must be safeguarded from being used for terrorist activities. All these issues have focused attention on human aspects for improving reliability in critical industries.
The 21st century witnessed newer developments in technologies, materials used, digital systems, the volume of data & usage, and human–machine interfaces. IR4.0 has had a considerable influence on the way machines and humans augment each other's capabilities using data—historical, concurrent, and transient—for production, providing services, and monitoring/tracking. Some of the critical industries have embraced and adopted it to their advantage—aerospace, automobile, shipping, and even power generation. It is pertinent that the atomic and nuclear industries shrug off their latency in adopting newer developments in technologies, especially the convergence of human, social, and digital systems, to come up with appropriate engineering designs that leverage the IR4.0 perspective to improve the reliability of machines, humans, processes, and outcomes.
A. X. Raj (B) National Institute of Advanced Studies, Bengaluru, India e-mail: [email protected] Formerly of Loyola Institute of Business Administration, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. S. Chirayath and M. Sai Baba (eds.), Human Reliability Programs in Industries of National Importance for Safety and Security, Risk, Reliability and Safety Engineering, https://doi.org/10.1007/978-981-99-5005-8_16
Embracing the convergence of systems, which can effectively leverage data, machine learning, cognitive/behavioural engineering, group dynamics, and control interfaces, is a requirement to build a more resilient and adaptive "system of systems". It is increasingly important to take a long-term view of the technological advancement useful for improving the reliability of systems: machines, humans, and information/data systems. Of special interest to critical industries are some of the emerging practices, such as Design 4.0, Human Centered Design, Resilience Engineering, and Socio-technical systems. The emphasis is on improving, or augmenting, human reliability by effectively utilizing convergence options in Industry 4.0. This paper presents relevant information to focus attention on the need to identify a science of "Human Reliability Design" that puts human factors at the center of designing machines, plants, interfaces, processes, and digital systems; that screens for and continuously evaluates the reliability of personnel with suitable physical, mental, and behavioral attributes; and that establishes an engagement system for personnel for overall safety. Human Reliability Design shall focus on converging emerging practices such as human-centric design, resilience engineering, socio-technical systems, digital & data systems, and supportive AI for improving reliability. Humanizing machines, interfaces, responses, recovery, and the continuous evolution of systems is possible now, both in existing plants and in newer reactor plants.
16.2 Context
16.2.1 Contextualizing Human Errors
In this context, the concept of human errors assumes importance. At times, this term could be somewhat misleading, as "error" cannot be attributed simply to cognitive and physical abilities. Flaws introduced during design, manufacturing, installation, maintenance, etc., have the potential to amplify unknown parameters and confound the situation. A mission-critical operation must account for the failure of machines, processes, systems, and scenarios, and personnel must be trained for such eventualities. For a long time, and even to this day, data pertaining to critical processes remain too vast, requiring a number of controls to be operated and coordination among a number of teams to be effective. During incidents, these imperfections get amplified in unknown ways, posing a considerable strain on operators. To prepare for such incidents, one of the critical elements involves improving the interface of humans with controls, systems, and machines. Here we are referring to the interface in terms of information, feedback, and access to controls. Such a design requires a perspective, which is often not the greatest strength of engineering, technology, or management processes, more so in complex, system-driven operations such as nuclear reactors, aviation or space travel, shipping, and so on.
Any mishap in such industries is catastrophic in nature. Existing engineering, technology, and systems have been successful in reducing such catastrophes, but the human response has been responsible for averting a significant number of them as well. Human decisions will continue to be critical in ensuring the safety of operations in industries that involve materials and processes that could be hazardous and harmful to human lives and other species.
16.2.2 Engineering Dilemmas—Risk Versus Usable Solution
Despite advancements in technology, engineers are yet to design a machine, process, or system with an assurance of 100% safety. In contrast, the complex systems deployed in mission-critical operations require layers of highly trained personnel to operate, maintain, and mitigate disasters. Engineers have found that human intervention is a necessity to improve precision, effectiveness, continuity, and safety. This amplifies the role of humans in operating a system, various machines, and their parts. Engineers adopt a variety of protocols to reduce risk, including accounting for various human errors. The limitation, however, is acceptable risk, which is a requirement as it is near impossible to build any machine without some level of acceptable risk. This could be a risk that affected people no longer feel apprehensive about, or one for which there is a series of systems, processes, and protocols that can avoid the risk. These systems, processes, and protocols are dependent on human cognitive abilities, perception, and the ability to respond. In instances that involve rapidly changing parameters, the human decision becomes critical to avert major disasters.
16.2.3 Data for Optimizing Preparedness
Available data suggest that an incident on the scale of Fukushima has a 50% probability of occurring in 60 to 150 years (see the worked example after the list below). The data also suggest that large events with a likely impact of USD 20 million have a possibility of occurring annually. Data availability within the industry has facilitated a systematic analysis of various factors contributing to near misses, false negatives, minor mishaps, and residual risk. Nuclear reactor administrators can now better anticipate and effectively manage extreme events (large and mega events) such as Fukushima. Better understanding through data also improved the science of:
• predicting failures;
• the frequency of faults;
• technology and human interaction;
• decisions and behavior;
• systems, actions, and management; and
• response options.
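To unpack the 50% figure above: assuming a constant-rate (Poisson) model for such extreme events (an assumption, since the underlying model is not stated here), a 50% chance of at least one event within a horizon of T years implies an annual rate of ln(2)/T:

```python
import math

# If P(at least one event within T years) = 0.5 under a Poisson model,
# then 1 - exp(-lam * T) = 0.5, so the implied annual rate is lam = ln(2) / T.
for T in (60, 150):
    lam = math.log(2) / T
    print(f"T = {T:>3} years -> implied annual rate ~ {lam:.4f} "
          f"(about one event per {1 / lam:.0f} years)")
```

For T between 60 and 150 years this works out to roughly one Fukushima-scale event per 87 to 216 years, which is one way to read the probability statement in the text.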
16.2.4 Usability Issues Affecting Human Response Effectiveness
Don Norman is a pioneer in the field of user experience. In his book, The Design of Everyday Things, he writes, "The control room and computer interfaces at Three Mile Island could not have been more confusing if they had tried." Furthermore, he argues that "pinning the blame on the person may be a comfortable way to proceed, but why was the system ever designed so that a single act by a single person could cause calamity?". In a 2014 article, Norman argues that "in the majority of instances it is inappropriate design of equipment or procedures that leads to human error. It is time to launch a revolution, time to insist on a people-centered approach to technology." He cites a U.S. National Transportation Safety Board report about a ferry accident that injured 80 people. In this report, the bureau chair concluded, "Yes, our report identified that the final error was made by the captain on the day of the accident, but the first vulnerabilities were designed into the system, years before. Accidents, like a fraying rope, are always a series of missed opportunities" (Norman 2018). There is a need to instill a people-centered approach in the training of technical staff. In this regard, it is important to design for people, instead of blaming them for deficiencies in the design of equipment or procedures. It is critical to discover and fix the real, underlying problems in our equipment and procedures.
16.2.5 Risk in Nuclear Reactors
Nuclear reactor designs have consistently addressed various risk factors and provided for withstanding and containing mishaps. Operators are key to realizing this consistency, specifically in correctly interpreting and acting on various feedbacks from machines and systems. Advancements in understanding, protocols, simulations, and training have reduced the number of incidents over the years. Information sharing has further enabled cross-learning and improved systems and processes. The following table contains data from 1960 to 2014 that show a declining and then stabilizing number of incidents, some of which could have led to extreme events such as Chernobyl or Three Mile Island, if not an event like Fukushima.

| Data (1960–2014) | 1960 | 1980 | Catalyst | Requirement |
|---|---|---|---|---|
| Rate of events | Dropped | Stabilized | Chernobyl | Collaborate to understand near misses, false negatives, minor mishaps and residual risk |
| Frequency | Declined | Stabilized | Chernobyl | Collaborate to understand near misses, false negatives, minor mishaps and residual risk |
| Extreme events (substantial damage) | Larger (>$20 million) | Dragon Kings (Chernobyl/Fukushima—$331.6 billion) | | Better anticipated, effectively managed by using human-optimized interfaces |
16.2.6 Newer Options
Advancements in digital systems, sensors, data science, materials, and technologies have opened up newer options for efficient and effective touch points. A reduction in clutter in the operators' room is clearly visible. Considering the scale involved, retrofitting often cannot fully utilize the various options possible. Also, many of the changes continue to assume reliance on the adaptability of operators, systematic training, standardization, and elaborate protocols. So, errors leading to minor incidents and major events continue to be relevant. One solution is to build a user-centered design. That is, designing the machines in a way that reduces the incidence of situations requiring human intervention: improving touch interfaces for appropriate operations; improving precision in information for decisions; building in redundancy; and eliminating or reducing operators' exposure to hazards.
16.3 Human Reliability Programme
16.3.1 Sixty Years of Human Reliability Analysis
The Technique for Human Error Rate Prediction (THERP) was the first HRA approach, developed in the 1960s to address the reliability of nuclear weapons assembly. After the Three Mile Island accident, THERP was fully integrated into Probabilistic Risk Analysis (PRA) (Swain and Guttmann 1983). Over the past 40 years, many types of HRA methods have been developed and are in use. The first, second, and third generations of HRA methods evolved by adopting developments in modeling human performance, cognitive aspects, and artificial intelligence techniques (Boring 2012). In 1987, the IAEA identified several "general situations" or common causes and steps that could avert or reduce errors. They are: (1) during routine testing and maintenance; (2) systems with low levels of availability or redundancy, or those not sufficiently automated; (3) during abnormal conditions, especially after alarms go off; (4) bad design—system engineering, control-room layout, and ergonomic principles; and (5) transfer of information during shift changes. A possible solution
for these is the implementation of automatic procedures to predict, prevent, and respond to incidents, including those that are unpredictable or unforeseen.
16.3.2 Limitations of Traditional HRA Methods in Averting and Managing Incidents
In nuclear power plants, human errors contribute to a substantial number of incidents (Reason 1990). Some of these arise owing to bad design, which requires a complex process of continuous testing, maintenance, and replacement. Despite the advancement and robustness of various PSA methods, the complexity of the data limits their use in decision-making, redesigning plants, or improving systems. Information, data, and insights, especially on human behavior under abnormal conditions, are limited even to this day. "Traditional HRA methods are not suitable for today's working environments due to the transformation of tasks and systems" (Guglielmi et al. 2022). Considerable literature has pointed to the vulnerability, or limitations, of current risk analysis systems, specifically in the context of climate-change-induced extreme weather conditions, computer-controlled autonomous systems, cyber-attacks, and terrorist threats. Jonghyun Kim lists the following five of the several limitations of conventional safety analysis systems: (1) they primarily focus on the technical dimension, (2) the analyses are linear and sequential, (3) they are dominated by static models, (4) they do not take a systemic view into account, and (5) they focus primarily on why accidents happen and not how success is achieved (Kim et al. 2016).
16.3.3 Number of Plants, Capacity, and Complexity
Safety, security, and reliability are three important dimensions of any mission-critical operation, such as a nuclear reactor. All three aspects are managed by humans interfacing with various machines, flow systems, the quantum of materials, storage, and disposal through a variety of controls. Predefined control parameters provide the guiding framework for timely actions depending on the inputs from various systems. While safety could be largely assured through engineering design, it is greatly enhanced by having personnel with the required skill sets, appropriate and iterative training, and improved interfaces between humans and controls; security and reliability, in turn, address the largely behavioral factors and underlying motivations. With an increase in the number of units, the complexity of systems, and regulatory compliances, the number of personnel involved in the operation of nuclear reactors has increased manyfold.
16.3.4 Insider Threat
Insider threats in the form of sabotage, violence, theft, espionage, terrorism, and other acts need proactive management, starting with personnel who have access, present or past. Screening is an important step in ascertaining the suitability of personnel who will be provided access in a nuclear plant. Continuing such an evaluation on an ongoing basis is a requirement. The human factor, from an insider threat point of view, is ever present, more so in a dynamic and ever-evolving context for the personnel involved (CISA 2022). The worsening global terrorism outlook has become a major source of concern, casting a shadow on the reliability of personnel, which hitherto had been taken for granted. Reliability has become the cornerstone of operational integrity, as it can adversely impact the safety of operations (sabotage with an intention of harm or large-scale disruption) and security (in the larger sense). Human reliability, thus, is no longer one of the boxes to be ticked; it is an essential aspect of operational safety and security, as well as national and international security.
16.3.5 Increase in the Number of Plant Personnel
Plant personnel, whether full-time or contractor personnel, have been increasing over the years. For many tasks, especially supply, servicing, maintenance, and certain specialized and unique activities, contractor personnel are involved on a short-term or long-term basis. Even full-time personnel may opt for shorter tenures, resulting in new sets of personnel being hired on a regular basis. The sheer increase in the number of personnel has required systematic processes, procedures, and documentation to identify, recruit, and retain personnel with the highest standards of reliability, trustworthiness, and appropriate physical and mental attributes. It is in this context that security and safety reliability programmes have been designed by respective countries to ensure that plant personnel meet the required standards of reliability.
16.3.6 HRP Varies Across Countries
In countries prone to terrorism, the reliability of personnel has assumed significant importance. Countries are at varying stages of their Human Reliability Programmes in terms of extent, rigor, and implementation. They fall along a continuum, with developed nations in the lead; for instance, the USA has taken a considerable lead in terms of metrics, systems, processes, implementation, and seriousness. The US has a long history of personnel assurance programmes, which were consolidated into the Human Reliability Programme in 2002. The same cannot be said about other countries. The rest are
driven by their country-specific context, cultural predisposition, and the evolution of their atomic energy programmes.
16.3.7 Codification of HRP Rules and Regulations—US as an Example
The DOE in the USA uses a federal register of personnel suitable for working in nuclear establishments, along with certification and decertification procedures, which require annual evaluation (Coates and Eisele 2014). The US has an elaborate set of rules that governs security clearances and HRP compliance. Such regulations are used to consistently identify individuals who may compromise the safety and/or security of a plant. "The paramount intent of the HRP is to protect national security via the identification of individuals whose judgment and reliability may be impaired by any condition or circumstance that raises safety and/or security concerns" (Federal Register 2017). Given the criticality of the safe operation of a nuclear plant, a set of rules and regulations is a requirement. The federal register provides the required visibility on personnel, taking into account past issues/concerns that may adversely affect the physical, mental, or behavioral suitability of personnel.
16.3.8 Concept of Human Reliability Design
Across the board, industries have experienced newer developments in technologies, digital systems, data usage, ergonomics, and human–machine interfaces. It is pertinent that the atomic and nuclear industries come up with appropriate engineering designs that leverage the IR4.0 perspective for improving the reliability of machines, humans, and processes. It is increasingly important to take a long-term view of the technological advancement necessary to bring about the required transformation in nuclear energy agencies. It is in this context that Human Centred Design can radically transform and simplify human–machine interactions. A human-centered design, leveraging the latest advancements in sensors, computing, machine learning, AI, materials, modular forms, etc., could enable operators to effectively manage critical processes. By extension, this advancement in the convergence of engineering, technology, the sciences, and the understanding of human intent and behaviors can be used to devise surveillance systems that track anomalies that crop up owing to imperfect or impaired machines, humans, and processes/management. Connected devices with humanized interfaces enable appropriate decisions when required to avert or manage incidents. Human Centred Design, be it in the nuclear industry, other critical industries, or commonplace mass devices, will usher in innovation and transformative & adaptive technologies, and will enable operators to effectively monitor functioning rather than decode unknowns under stressful settings.
16.3.9 Human Reliability Design
In critical industries, such as nuclear power generation, when Human Centred Design (HCD) is applied to improve the reliability of machines, personnel, processes, and protocols, additional elements, for example, safety, security, endurance, the integrity of systems, redundancy, surveillance/monitoring, self-checks, etc., assume importance. Human Reliability Design innovatively integrates sensors, data, communication, decision, control, command, etc., to greatly reduce or eliminate instability, human errors, deviations, distractions, and deliberate sabotage. Resilience Engineering is another emerging field of engineering. Resilience engineering focuses on the prevention of incidents, the integrity of structures when an incident occurs, and recovery from such incidents, without human intervention, as well as accounting for errors introduced into the system by humans (Westrum 2006). The machines, in difficult and extreme conditions, stay within a safe envelope and avoid incidents (Heijer and Hale 2006). Resilience Engineering integrates into its design concepts in use in high-reliability organizations, safety culture, and security measures at critical industries. That is, moving from reactive to proactive designs that optimize safety, tackle unexpected challenges, and thwart determined sabotage attempts. "Resilience engineering focuses on designing systems, processes, and organizations that optimize safety. The latter is the basis of resilience engineering" (Boring 2012). Unlike PRA and HRA, which rely on sequential event progression with a certain tolerance for deviations (dynamic event trees in PRA, and dynamic performance-shaping factors in HRA), resilience engineering relies on a dynamic series of responses aimed at protecting the integrity of the system (Boring 2012). In many ways, resilience engineering shifts the focus from complex modelling based on PRA and HRA for optimizing the machine–human interface to simple responses of machines and human interventions (recovery processes) that self-preserve the integrity of the system. That is, moving from the anticipation of events to responding to a wide variety of unfolding conditions. Human Reliability Design in many ways converges resilience engineering, transformative technologies, human-centric design, PRA and HRA, behavioral aspects, surveillance, IoT, AI, machine learning, and decision systems. By combining these elements, Human Reliability Design can optimize various aspects of a plant—machines, structures, systems, and controls—leveraging the IR4.0 convergence of physical, digital, and human systems. The humanizing of control systems is one such example, wherein control rooms, control suites, and peripheral units can reduce human error.
16.4 Human Reliability Design—A Framework
16.4.1 Industry 4.0 Approach and Nuclear Power Plants
Human Reliability Design as a practice for improving reliability in critical industries can take advantage of emerging practices such as Design 4.0, Resilience Engineering, Socio-technical Systems, and Cognitive Systems (Human and AI). Human Centred Design, which has been part of the consideration in critical industries, including nuclear plants, shall continue to focus on improving man–machine interfaces, especially ergonomics, ease of using control systems, and unconfounding information flow for the operators, specifically during incidents/accidents/unknown situations. Underpinning all these shall be the Industry 4.0 approach, where physical, digital, and human systems converge. Such a convergence necessitates a rethinking of design, control, recovery, containment, and human reliability, and the incorporation of the latest developments in materials, modular options, size and scale, human factors, cognitive aspects, the social dimension, data (both historic and dynamic), and supportive AI. Human Reliability Design, in many ways, can support the transformation of the nuclear industry. Currently, various standalone experiments are underway to improve NPPs. Converging these will be useful, not only for existing NPPs but, importantly, for coming up with innovative nuclear reactor designs. IR 4.0 provides this opportunity, both in terms of the enormous energy required to power an IR 4.0 society and in the options for innovation through a convergence-of-systems framework. Human Reliability Design, which relies on the convergence of various disciplines, systems, and approaches, supports the nuclear industry in reinventing itself by transforming its approach. HRD can provide the required concepts and instruments, which can support the adoption of newer designs, emerging technologies, and the convergence of human, machine, digital, data, and AI systems. The elements of Human Reliability Design are:
1. Taking advantage of the Industry 4.0 approach—the convergence of physical, digital, and human systems
2. Adapting emerging practices such as Design 4.0, Resilience Engineering, Socio-technical Systems, and Cognition Systems (Human and AI)
3. Leveraging advancements in using Human Centered Design in NPPs
4. Focusing on simple, modular, and adaptive Nuclear Power Plant (NPP) designs for new reactors
5. Balancing size, scale, and sustainability for safety, security, and safeguards
In the sections below, key elements of Human Reliability Design are discussed, specifically IR 4.0, Resilience Engineering, and Socio-technical systems, which make Human Reliability Design a distinct approach with high relevance for transforming nuclear power plants: be it design, cost, scalability, or reliability.
16.4.2 IR 4.0 is All About Convergence

Industry 4.0 is all about the convergence of systems, innovating in this space, and designs that accommodate such convergence. An Industry 4.0-inspired design philosophy puts humans at the center of design, operations, and outcomes. Industry 4.0 enables a high degree of automation and rich data streams while minimizing routine human intervention. In the event of incidents with rapidly changing contexts or scenarios, the transfer of control to humans improves the ease of control, reduces the number of parameters to deal with, and reduces or eliminates cognitive or anthropometric access issues. NPPs designed during IR 2.0 were built using analog systems. Structures, machines, and humans were three distinct layers. Very little integration was possible. Therefore, reliance was on controls to prevent accidents. Though control rooms provided a notion of a well-integrated system, for all practical purposes the three systems (physical structures, machines and materials, and humans) operated independently. Much of the design change at NPPs is visible in the control room, and the trend is continuing. For example, the Brazilian Nuclear Energy Commission (CNEN) developed an experimental facility for human system interface design and human factors research and development, the Human System Interface Laboratory (LABIHS). Here the focus has been on improving the interface systems. LABIHS focused on types of fonts, reducing the cluttering of displays, ways of emphasizing the information presented, deviations, past dynamics, trends, etc. (Oliveira et al. 2015). Such developments could be further enhanced by effectively combining data. IR 4.0 provides options for effectively utilizing data, both past and current, and presenting information that is far more reliable and actionable. AI and machine learning could automate some of these processes, removing clutter and providing actionable choices for operators. That is, instead of leaving operators to assimilate, comprehend, and visualize an emerging crisis unaided, such capable data systems could provide possible scenarios and various options with their associated risks. It is important that the convergence of systems in IR 4.0 be adequately harnessed and applied to improve the reliability of nuclear power plants: that is, the reliability of the machines, humans, and processes that govern operations, utilizing advancements in design, materials, digital and data systems, resilience engineering, socio-technical systems, and cognitive systems, both human and AI.
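As a minimal illustration of the kind of data-driven de-cluttering described above (the sensor trace, window size, and threshold below are assumptions made for this sketch, not anything prescribed by LABIHS or IR 4.0 practice), a simple rolling-statistics filter can suppress routine fluctuations and surface only statistically unusual readings to the operator:

```python
# Minimal sketch: flag only statistically unusual sensor readings so the
# operator sees a short list of anomalies instead of raw, cluttered data.
import random
from collections import deque
from statistics import mean, stdev

def anomalies(readings, window=30, threshold=4.0):
    """Yield (index, value, z-score) where a value deviates strongly
    from the recent rolling baseline."""
    history = deque(maxlen=window)
    for i, x in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                yield i, x, (x - mu) / sigma
        history.append(x)

# Simulated coolant-temperature trace with one injected excursion
# (all numbers hypothetical).
random.seed(1)
trace = [300 + random.gauss(0, 0.5) for _ in range(200)]
trace[150] += 8.0  # hypothetical transient
for i, value, z in anomalies(trace):
    print(f"sample {i}: {value:.1f} K (z = {z:+.1f})")
```

A production system would of course use validated plant data and qualified software; the point of the sketch is only that simple statistics already convert a raw stream into a short, actionable list.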
16.5 Design 4.0—A Perspective Inspired by Industry 4.0

Industries are adjusting and adapting to the transformative effects of Industry 4.0. Computing capabilities, connectivity, data generation, and automation are quickening innovation and the development of newer technologies. AI-assisted prototyping, digital twinning, augmented realities, and IoT sensors are powering this transformation.
Industry 4.0 also requires newer ways of collaboration and new skill sets. These trends have influenced the way a machine or process is designed. Design 4.0 uses a set of parameters that generally govern the design perspective in Industry 4.0. These parameters are:

• Involving stakeholders during the design stage: viewpoints of politicians, private sectors, communities, regulators, ethical and environmental bodies, and others as appropriate
• Including all or many touchpoints: man–machine interfaces, integration of other systems, industrial IoT, data generation and utilization, maintenance, etc.
• Empowering operators, for example in NPPs (intuitive and easy to use)
• Designing for optimal performance in a connected interface (automation and human)
• Questioning and disrupting existing assumptions
• Contributing to the innovation process to improve technology, process, and management

Specifically, in the context of NPPs, it is important to consider the above Design 4.0 parameters to continuously innovate newer options, for example, SMRs with dynamic capabilities to predict, identify, and mitigate risks, faults, and human-induced errors, or simple, innovative, and automated devices/systems added to existing NPPs to improve safety, security, and safeguards. Collaborative frameworks are essential to come up with such transformative systems, in both new and existing NPPs. The minimum requirement is for designers to work with engineers, scientists, and nuclear experts to streamline the convergence of mechanical, electronics, electrical, and software disciplines, extended intelligence (AI-assisted human operators), robotics, data analysts, cognitive systems, psychologists, social system specialists, and other concerned stakeholders. Design 4.0 also focuses on how to effectively integrate control, command, and communication to minimize risks. Recognizing patterns through AI-powered analytics, machine learning, and the efficient use of time series data will be a requirement for such an effective command system, one that supports humans in taking effective decisions even during rapidly changing situations. Design 4.0 will also usher in openness at all levels to effectively adapt and utilize newer systems. Design 4.0 requires looking beyond the current set of expertise for designing nuclear reactors or nuclear power plants. Darren Yeo talks about four ways to become a Designer 4.0. Taking a cue from these, a team involved in designing NPPs could consider how to effectively utilize the convergence of systems in IR 4.0. To accomplish this, the team should be open to bringing in people from elsewhere and from emerging practices, who may provide a different or refreshingly different perspective. The four pathways to leverage Design 4.0 for NPPs are:

1. Look for design innovators on the fringes: Design 4.0 will require expanding the design team with people who have a propensity to experiment, more like innovators in their respective disciplines. In the era of IR 4.0, it is now possible to identify innovators who work at the intersection of disciplines, truly converging systems. They may not be part of the existing NPP design eco-system. Finding
them, though hard, could bring in greater value through their newer perspectives and ideas. They are, however, often at the fringes.
2. Call into question previous design standards: The emergence of digital systems, the internet, the capacity to analyze large volumes of data, capabilities for identifying patterns, simulations based on machine learning, embedded industrial IoTs, reduced time taken to prototype, etc., provides opportunities to reframe existing designs of nuclear reactors and NPPs. With newer technologies, some of the old premises that governed the design of nuclear reactors and NPPs could be reframed, transformed, or replaced with newer innovative premises, which is feasible considering technological advancements. Until IR 3.0, engineers built massive and chunky systems with many layers of safeguards, which made everything bulky, be it a gadget like a computer, a thermal power plant, or an NPP.
3. Be good with what you know but be better with what you do not know: This requires a transdisciplinary mindset and technology collaborations: an "X-shaped thinking" to innovate at the intersection of the known and the unknown, and a convergence of multiple disciplines for optimizing human–machine collaborations.
4. Be a scientist: experiment, learn, and repeat: Design 4.0 will be a new way of developing and testing various components and ensuring that they work well together. One of the advantages of Industry 4.0 is that the prototype development cycle is very short. Rapid prototyping using 3D modeling along with 3D printing is well advanced. The same applies to materials, nuclear science, nuclear engineering, resilience engineering, etc. All of these provide opportunities to continuously experiment and learn, repeating the process until a suitable and viable design materializes.

One good example of how Design 4.0 concepts are utilized in coming up with innovative nuclear reactors is the Transformational Challenge Reactor Programme at Oak Ridge National Laboratory. "The nuclear industry is still constrained in thinking about the way we design, build and deploy nuclear energy technology," ORNL Director Thomas Zacharia said. "DOE launched this program to seek a new approach to rapidly and economically develop transformational energy solutions that deliver reliable, clean energy" (Oak Ridge National Laboratory 2020). Another initiative is Lu et al.'s framework for NPPs: a Nuclear Power Plant Human-Cyber-Physical System (NPPHCPS) as the top-level design in IR 4.0. Lu et al. recognize challenges in adapting the latest in digital, engineering, or social/psychological/cognitive systems in NPPs (Lu et al. 2020), especially considering the fact that the conceptual design framework for NPPs has undergone very few changes. All technological additions have been at the fringes or peripheral to the core functioning of NPPs' complex systems. It is in this context that the Design 4.0 approach can effectively leverage and incorporate emerging practices, such as resilience engineering, socio-technical systems, cognitive systems, industrial IoT, etc., in transforming NPPs, both existing and new.
16.5.1 Resilience Engineering—Transformation to System 4.0

The resilienceengineering.net website lists the following as key principles of resilience engineering:

• Complex systems may produce outcomes that cannot be predicted by analyzing individual components (such as people, teams, equipment, processes, or subsystems). In other words, complex systems have 'emergent properties' that cannot be predicted by traditional risk analysis.
• The environment in which organizations operate is dynamic and may produce conditions, situations, or disruptions that are not foreseen by traditional risk analysis.
• Humans and human organizations can adapt to unforeseen conditions, situations, or disruptions. These behaviors or responses may not have been planned or designed. These adaptations are an example of how systems can be resilient.
• Traditional accident and incident analyses are based on linear models of cause-consequence chains and fail to model the true complexity of systems.
• If we are able to understand the properties of a system that support those beneficial adaptations to internal and external disruptions, we will be able to enhance the adaptive capabilities of a system (and therefore increase its resilience) (What is Resilience Engineering 2022).

Resilience engineering relies on recognizing the interconnectedness of complex systems, the requirement of monitoring, learning from incidents, anticipating situations, and adapting continuously. The focus here is on returning to stability as quickly as possible, despite the recurrence, co-occurrence, or frequency of incidents or human factors, with a holistic perspective on complex systems, situations, and responses. The traditional approach to identifying causes relies on systematically listing adverse events, sequences, and missteps. Often these point to human factors, regularly reduced to "human error", or to the inability of the people in charge to respond adequately to adverse events. Analysis of these leads to recommendations to eliminate the causes (of adverse events), vulnerabilities, hazards, "threats", and responses that acted as tipping points, through appropriate modifications/additions to machines, processes, monitoring, and controls. On the one hand, such additions/modifications, accumulated over the years, are large in number and have contributed to complexity. On the other hand, future adverse events may involve unknowns that have not come up so far or been anticipated. The complexity of the systems also adds to the strain and often tests the limits of the capacity of machines and humans. Given the complexity of responses during an adverse event in an NPP, the concept of resilience is being considered. Resilience engineering approaches the stability of a system differently: as an integrated system of man, machines, and processes that continuously learns, adapts, anticipates, intervenes, and recovers autonomously, all with higher visibility of dynamic changes, responses, and effects, by using advancements in sensors, programming, and visualization. Passive monitoring, the anticipation of hazards and insider threats, and a focus on restoring stability (rather than
tackling the causes) are a possibility with resilience engineering. Integration of the resilience engineering concept will open opportunities for designing components, industrial IoTs, learning systems, and better integration of man, machine, digital, data, and connectivity at NPPs. Despite interest, there is a general lack of literature on how resilience could be incorporated into existing systems for improving safety or security. It is feasible at existing NPPs as well as in designing futuristic NPPs. Innovation in nuclear reactor design, in terms of size, modularity, scale, self-containment, cost of setting up and running, reduced setup time, nuclear waste management, and the handling of various risks, is possible through such a transformative framework that applies resilience engineering.
16.5.2 Socio-Technical Systems—The Missing Perspective

Socio-technical systems point to the fact that social factors, reflected in organizational culture, are important for safety, security, performance, and responses during an adverse event. Current Physical Protection Systems (PPS) rely on making necessary changes to components and processes for improving performance and quality. An elaborate system of scrutiny, action, and accountability is considered sufficient. This is, in many ways, an oversimplification. For instance, the structures, machines, equipment, tools, and replacements, that is, the hardware, form part of the PPS. Hardware is a passive element in an NPP. These high-risk engineering systems have been designed to be durable rather than resilient, and their working requires continuous monitoring, controlling, and reporting. The monitoring, controlling, and reporting, however, depend on the people at the NPP. The diversity of roles and responsibilities at NPPs requires that all teams work in sync and share a common set of values. The shared values provide a philosophical framework, reflected as risk aversion at one end of the continuum and action & accountability at the other (Williams 2019). The process, responses, and evaluation stem from where the organizational philosophy lies along this continuum of risk aversion and risk response. In the age of convergence, this assumes importance, specifically for strengthening safety and security at NPPs by applying socio-technical systems tools. The socio-technical systems perspective in many ways reverses the reliance on equipment and emphasizes the contribution of culture: good safety, security, efficiency, and effectiveness are 20% equipment and 80% culture. The organization's culture, in addition to shared values, requires shared perceptions, norms (rules), attitudes, behaviors, and beliefs. Behavior operates at three levels: organizational, group, and individual. Belief in such an organizational culture—the values, norms, attitudes, and behaviors—ensures commitment through a shared worldview of the work. Culture assumes particular importance in critical industries when staff are not being supervised or monitored, which is often the case in such industries, and the tasks assigned must still be completed as required. Such behavioral attributes are important not only for appropriate responses but also for learning, recovering, and adapting. Socio-technical systems strengthen
the functioning of NPPs by effectively complementing the design (which is a given) and improving resilience and the effectiveness of responses during incidents.
16.5.3 Simple, Modular, and Adaptive Nuclear Power Plant (NPP) Designs for New Reactors

Nuclear Power Plants across the world are atypical systems of machines, man, and control. Their integrations are complex, and so are maintenance, ensuring safety, and managing security. All these systems, developed in the 1950s and 1960s primarily with an emphasis on 'defense in depth', provided very little scope for introducing transformative technologies, especially in terms of improving man–machine-digital integration or automation. Much of the change occurred in control rooms, in terms of ergonomics, display systems, and some analytics. While the rest of the critical industries managed to adopt digital, and now AI-based, predictive analysis for optimizing performance by designing better man–machine systems, NPPs continued to rely on older technological systems and approaches. Small Modular Reactors (SMRs), in their early design stage, should take advantage of this by adopting IR 4.0 approaches. That is, a Human Reliability Design that transforms all aspects of reactor design, including the 'defense in depth' approach, which has created atypical, large, complex systems that are not flexible and adaptive in nature and thereby miss the latest developments in technologies; in other words, moving from a 'defense in depth' approach to a 'risk-based approach' for nuclear power plants. In many ways, the 'defense in depth' approach has added to the complexity of systems, operations, and regulatory processes, which are often not in sync with each other. Simple, small, self-contained, digitally enabled, AI-augmented, autonomous, and modular reactor designs are possible in IR 4.0 and IR 5.0; these could transform where SMRs are located and how they are managed and serviced, with built-in resilience. The latest technologies can be used to reduce the cost of SMRs: designing, setting up, operating, maintaining, and shutting down. AI-based, industrial IoT, and human cognitive system based algorithms are expected not only to bring down the cost of operations but also to contribute to the longevity of NPPs and to waste management. Socio-technical systems theory recognizes the inter-connectedness of social dimensions—people (actors), tasks, and data—and technological systems, which include infrastructure, machines, equipment, and digital devices. This approach also improves participation through understanding and recognition of complex systems. Though social systems and technical systems are viewed as interdependent parts of complex systems, in critical industries such as NPPs, social systems prevail over technological systems. In many ways, NPPs depend on people (actors) to perform well, both in routine tasks and during mishaps. The spatio-temporal socio-technical risk analysis methodology, with an application in emergency response, is one of the theoretical causal frameworks that "explicitly integrates
both the social aspects (e.g., safety culture) and the structural features (e.g., safety practices) of organizations into a technical system PRA model” with spatial attributes (Bui et al. 2017).
16.6 Contextualizing Human Reliability Design in India

16.6.1 Public Service Approach

In India, the reliance for human reliability has been on recruitment, training, and retaining the trained staff. Being government-run establishments, human resource management for critical industries has remained comparable to that of various government services. Qualified science, engineering, and technology personnel are hired through a common recruitment process, which screens the candidates on various aspects, including a psychological profile. Once a person joins a service, the progression is more or less automatic. Training, retraining, and deployment have established systems, procedures, and norms. Apparently, this system holds the team together and makes them sufficiently motivated and eager to contribute to the safe running of nuclear plants and to continuously improving technology, safety, and safeguards. The question that needs to be answered is, "Is this sufficient?" This is especially pertinent in the context of an increase in the number of plants, rising threats of geo-political terrorism, and an increase in the number of contractual options. It is in this context that learning from Human Reliability Programmes elsewhere, in countries like the US or in Europe, would be useful for managing human resource complexity. Specific examples include applying psychological metrics, customized for the local socio-cultural context, at the time of recruitment, during periodic assessments, at the annual continuation of terms, etc. The most urgent requirement is to create a space for collaboration between India and stakeholders in other countries to launch a systematic process to utilize advancements in Human Reliability Programmes globally.
16.6.2 Dependence on Technical and Probabilistic Models

The conceptual framework for a workable Human Reliability Design should recognize the limitations of Probabilistic Safety Assessment models, i.e., that perception, judgment, discernment, and discretion influence critical individual and group decisions during incidents. Such decisions are taken by operators, often relying on a perceived reality, which rapidly shifts and is created by an interplay of technology, people, and other organizational and external factors. Apart from the technology and the feedback obtained from instruments, decisions are largely influenced by human perception and judgment. A qualitative interpretation of the rules, processes,
procedures, competence, alertness, etc., impels the actions of the people involved. There is a need to optimize the interface between the machines, processes, data, interpretation, and decisions. Advancements in technology, data science, digitalization, and automation in the twenty-first century should be leveraged to ensure optimum human decisions and to reduce incompatibilities in interfaces. A human-user-centered interface design is a possibility now. Such designs, and a holistic approach, should enhance the reliability of inputs, overcome perceptual flaws, and support appropriate responses in a rapidly changing context, leading to better judgment and timely decisions. A human-centered and user-contextualized human reliability programme is thus feasible in the twenty-first century.
16.6.3 Aligning with Emerging Needs

Thus, the need in India in the twenty-first century, considering the likelihood of an increase in the number of nuclear plants, is for setting up a robust, effective, user-centric human reliability programme. To realize this, institutions such as NIAS should devise a scalable programme with projects as modules. To start with, the focus should be on a Human Reliability Programme (soft elements) leading to an appropriate user-centered Human Reliability Design utilizing emerging practices.
16.7 Role of NIAS—Imagining Newer Viable Possibilities

NIAS shall focus on the following interlinked modular steps, some in parallel and some in sequence. As a first step, NIAS could form a core group with a diverse set of experts drawn from industry, government, design, resilience engineering, psychology, the social sciences, technology, policymaking, etc. These could be from India and elsewhere. Those identified elsewhere should have considerable exposure, expertise, and experience in working on emerging practices such as socio-technical systems, resilience engineering, cognitive systems, the latest in nuclear engineering, and small modular reactors. The core group should facilitate a quick mapping of the extent of human reliability practices in nuclear power plants and other mission-critical sectors in India. To start with, this mapping could cover nuclear energy, the power sector, oil & gas, and aviation. To enhance and strengthen human reliability practices, it is also important to map, identify, and engage with stakeholders, both in India and elsewhere. NIAS could consider identifying suitable partners in the US, Europe, and Asian countries, including Japan, Russia, and China. The core group, in consultation with various stakeholders, should set up an appropriate mechanism for the exchange of information and the documentation of certain practices. Systematic consultations, round tables, research briefs, and policy briefs could inform
decision-makers and elicit support from governments and various regulatory bodies. The core group should continuously focus on converging various emerging practices across multiple disciplines to shape knowledge and practice systems for Human Reliability Design. One of the key contributions from the NIAS core group shall be a universally adaptable solution for the world of Human Reliability Programmes, using a Human Reliability Design that puts humans at the center of various processes, procedures, metrics, and action protocols. Once a workable Human Reliability Design system prototype is available, an implementation road map should be developed in association with the Government of India, regulators, and agencies. After testing of the Human Reliability Design, a blueprint should be prepared to roll out a human reliability programme in India, meeting international standards, for implementation in identified critical sectors, starting with the nuclear energy sector.
References

Boring RL (2012) Fifty years of THERP and human reliability analysis. Idaho National Laboratory
Bui H, Pence J, Mohaghegh Z, Reihani S, Kee E (2017) Spatio-temporal socio-technical risk analysis methodology: an application in emergency response. In: ANS international topical meeting on probabilistic safety assessment and analysis, PSA 2017, Pittsburgh, Pennsylvania
Coates CW, Eisele GR (2014) Human reliability implementation guide. Oak Ridge National Laboratory
Cybersecurity & Infrastructure Security Agency website, CISA (2022). https://www.cisa.gov/. Accessed 30 Sept 2022
Federal Register (2017) Human reliability program. https://www.federalregister.gov/documents/2017/06/22/2017-12810/human-reliability-program. Accessed 30 Sept 2022
Guglielmi D, Paolucci A, Cozzani V, Mariani M, Pietrantoni L, Fraboni F (2022) Integrating human barriers in human reliability analysis: a new model for the energy sector. Int J Environ Res Public Health 19(5):2797
Heijer A, Hale T (2006) Defining resilience. In: Resilience engineering precepts and concepts. CRC Press, Boca Raton, FL, pp 35–40
Kim J, Park J, Kim T (2016) Modeling the resilience of nuclear power plant for unexpected reactor trip. Trans Am Nucl Soc 115:362–364
Lu C, Lyu J, Zhang L, Gong A (2020) Nuclear power plants with artificial intelligence in Industry 4.0 era: top-level design and current applications—a systemic review. IEEE Access 8:194315
Norman D (2018) Human error? No, bad design. https://jnd.org/stop_blaming_people_blame_inept_design/. Accessed 30 Sept 2022
Oak Ridge National Laboratory (2020) 3D-printed nuclear reactor promises faster, more economical path to nuclear energy. https://www.ornl.gov/news/3d-printed-nuclear-reactor-promises-faster-more-economical-path-nuclear-energy. Accessed 30 Sept 2022
Oliveira M, Almeida J, Jaime G, Augusto S (2015) Design of new overview screens for the LABIHS simulator. In: Associação Brasileira de Energia Nuclear (ABEN), São Paulo, Brazil
Reason J (1990) Human error. Cambridge University Press, Cambridge, UK
Swain A, Guttmann H (1983) Handbook of human reliability analysis with emphasis on nuclear power plant applications. NUREG/CR-1278. https://www.nrc.gov/docs/ML0712/ML071210299.pdf
Westrum R (2006) Typology of resilience situations. CRC Press, Boca Raton, FL
What is Resilience Engineering? (2022). https://resilienceengineering.net/what-is-resilience-engineering/. Accessed 30 Sept 2022
Williams A (2019) Socio-technical interactions: a new paradigm for nuclear security. https://www.osti.gov/servlets/purl/1640683. Accessed 30 Sept 2022
Part III
Human Reliability Program in Practice
Chapter 17
Management and Human Reliability: Human Factors as Linchpin of Institutions and Organizations

Karanam L. Ramakumar
17.1 Introduction

The prosperity and well-being of any organization or institution depend on many factors that, apart from the product portfolio, include contemporary relevance, institutional infrastructure, good management practices, prompt rectification of the business approach to take care of market directions and preferences, quick response to grievance redressal, and human factors. Among all these, human factors is perhaps the most sensitive and critical component. The literature available on human reliability assessment generally speaks of the human–machine interface and how to minimize, if not avert or avoid, failures. In contrast, the human–human interface or interaction is not given its due credit. Human reliability is only one of the cogs in the whole process and cannot be treated in isolation. One should speak of human relations development followed by human resources management and then proceed to human reliability assessment. The terms human reliability, human relations, and human resources, in the context of realizing the sustained prosperity and well-being of institutions and organizations, are perceived as having different connotations when they have, in fact, a common thread interlinking them, namely, improving efficiency, creativity, productivity, and job satisfaction with the goal of minimizing errors. The phrase human factors may be preferred to imply any or all of the three terms mentioned above. All the stakeholders—owners, management, and the staff—have equal responsibility and obligation in realizing this objective. Figure 17.1 illustrates the link between human factors and the different hierarchical levels of an organization. The term human factors
essentially describes the interactions among three interrelated aspects: individuals at work, the task at hand, and the workplace itself. Human factors play a very important role in deciding performance in the workplace. The assessment of these factors, particularly of the staff and people involved in critical operations and sensitive assignments, has been dealt with extensively, and various models are now available. In contrast, the attitude of the management professionals responsible for coordination and decision-making has not received the same level of attention or scrutiny. These attitudes are also critical for overall growth and need close scrutiny. This "indifferent" attitude or "blind spot" of the management, though not intentional, may also result in potentially undesirable consequences. We will discuss how human factors in connection with the management influence the overall functioning of organizations by taking some typical examples from the literature. All these examples highlight cases in which the management abjectly failed in its responsibilities—be it through a trust deficit, indecisiveness, or an inability to take the staff into confidence—thus eroding human values. Lastly, based on the author's experience, an attempt has been made to demonstrate how an ambiance and environment could be created to ensure human factors are nurtured in a research laboratory, leading to the overall growth of the organization. It is shown that no quick solutions are available and that the process is multi-stage. At no point in time is the staff stressed into believing that they are under reliability assessment; this assessment is inherently included in daily routine operations. The emphasis is not on the human reliability assessment but on the whole process of realizing the desired analytical output. Periodic work assessment, followed by a well-structured approach to motivate the staff to contribute wholeheartedly, results in a congenial work environment with the desired output.
17.2 Human Factors

There are various definitions of human reliability and human reliability assessment. The IAEA Training Course on 'Safety Assessment of NPPs to Assist Decision Making' defines human reliability as the probability of successful performance of only those human activities necessary to make a system reliable or available (IAEA 2020). Compare this with another definition (Collins and Najafi 2017) from the American Institute of Chemical Engineers (AIChE): "Human reliability analysis (HRA) is defined as a structured approach used to identify potential human failure events (HFEs) and to systematically estimate the probability of those events using data, models, or expert judgment. The probabilities used to evaluate HFEs are known as human error probabilities or HEPs." From these definitions, one may get the impression that human reliability, in a majority of cases, deals with the human–machine interface and how to minimize, if not avert or avoid, failures. There is yet another dimension to human reliability, namely that it should be assessed based on the human–human interface in addition to the human–machine interface. Because of this, human reliability assessment cannot be done in isolation. The sustainability and prosperity of any organization or institution depend on many factors. It is a two-way process: both management and the staff contribute in equal measure. For growth, the working culture of any organization should, in the opinion of the author, start with cultivating human relations, followed by human resources development. Only then, as the third step, should human reliability assessment be considered. All three of these human-related qualities may be combined in a single phrase, Human Factors. Human Factors may be defined as "the study of all the factors that make it easier to do the work in the right way by all the stakeholders for the overall growth of the organization or institution."
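As a deliberately simplified illustration of the AIChE definition above, the following sketch combines hypothetical human error probabilities (HEPs) for the steps of a task, in the spirit of THERP-style quantification. The step names and HEP values are invented for this example, and a real analysis would also model dependence between steps and performance shaping factors.

```python
# Simplified HEP combination: if the steps of a task are assumed
# independent, the task fails if any single step fails.
from math import prod

# Hypothetical human error probabilities per step (illustrative only).
step_heps = {
    "read_procedure": 3e-3,
    "select_correct_valve": 1e-3,
    "operate_valve": 1e-3,
    "verify_indication": 5e-3,
}

# Task succeeds only if every step succeeds.
p_task_success = prod(1.0 - hep for hep in step_heps.values())
p_human_failure_event = 1.0 - p_task_success

print(f"P(task success)        = {p_task_success:.5f}")
print(f"P(human failure event) = {p_human_failure_event:.2e}")
```

The point of the sketch is only the structure of the estimate: the human failure event probability emerges from per-step HEPs, which is precisely the human–machine framing that the chapter argues must be complemented by attention to the human–human interface.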
17.3 A Brief History of Human Factors

In the modern era, the term Human Reliability Assessment (HRA), or human reliability evaluation or analysis, was first introduced in 1962 by Munger et al. (1962). These authors considered human reliability with respect to the human–machine interface (electronic equipment operability). It is interesting to note that all the aspects of different traits of human character—such as a moral code of conduct, how to wield power, the role of human beings in different types of government, etc.—were elaborated by various historical figures, such as Aristotle, Plato, Machiavelli, Confucius, and Karl Marx. Specifically, human factors describing human relations, human resources, and human reliability assessment were covered in ancient Indian texts such as Srimad Valmiki Ramayana, Kautilya's Niti Shastra, and Arthashastra (Valmiki Ramayana 2009; Chanakya Arthashastra 1915; Chanakya Niti Shastra 2013). It would be quite challenging to adapt those practices to suit current-day requirements. If not all, at least some practices might be easily adopted. These include
rigorous recruitment procedures, testing and identifying the different skills among people suited to a particular profession, followed by periodic assessment of their commitment and integrity through ethical spying and ethical enticements. It is not the intention of the author to either advocate or recommend these practices; this is an attempt only to highlight the ancient practices of assessing human factors.
17.4 Human Factors and the Management

Human factor management is part of incident prevention and overall health and safety management. Human factors have contributed to various incidents in the aviation, nuclear, chemical, and process industries. It is essential to understand how people affect the safety of operations in order to effectively manage the overall health and safety of a plant. The role of management professionals responsible for decision-making is critical in this area. Traditionally, performance has been managed by setting goals for performers (employees) to achieve. These goals may be related to safety, quality, or production. When the goals are not achieved, a series of actions invariably follows. The worker is then trained, counseled, retrained, admonished, possibly punished, demoted, or let go. Generally, the interventions are directed at the worker, ignoring the fact that in the operational model there are two sources of failure risk: people and processes. Since producers work within the system (interfacing with it), the system may influence them to take actions and make choices that result in errors or discrepancies leading to underachievement. In addition, the organizational systems could affect the producers as well. Now let us look at some examples where the management was found wanting. These cases have been critically scrutinized many times by different experts. They are included here only to bring out what the author perceives as a lack of will by the management to make the right decision based on the facts presented by the staff, resulting in catastrophic consequences.
17.4.1 Kodak Company

There are few corporate blunders as staggering as Kodak's missed opportunities in digital photography, a technology that it invented (Mui 2012). Steve Sasson, the Kodak engineer who invented the first digital camera in 1975, was advised by the management, "That's cute—but don't tell anyone about it." Kodak management not only presided over the creation of technological breakthroughs by Kodak Research Labs but was also presented with an accurate market assessment by market researchers about the risks and opportunities of such capabilities. There was, however, little appreciation for the effort being conducted in the
Kodak Research Labs with digital technology. The top management merely took note of the market research. There was a complete disconnect between the two organs of the organization: management and staff. Holistic thinking was missing altogether. Does this fall under the HRA requirement? If yes, how do we apply human reliability here?
17.4.2 Nokia

In October 1998, Nokia became the best-selling mobile phone brand in the world. Operating profit went from $1 billion in 1995 to $4 billion by 1999. In 2007, Apple introduced the iPhone. By the end of 2007, half of all smartphones sold in the world were Nokias, while Apple's iPhone had a mere 5% share of the global market. But subsequently, the quality of Nokia's high-end phones declined. In just six years, the market value of Nokia declined by about 90%. Nokia's decline accelerated by 2011, and it was acquired by Microsoft in 2013. According to Tim O. Vuori and Quy Huy (2018), Nokia suffered from an organizational fear that was grounded in a culture of temperamental leaders and frightened middle managers. The middle managers were scared to tell the truth about the dearth of innovations in the organization to meet the expectations of a new generation of customers, for fear of being fired, and top managers intimidated middle managers by accusing them of not being ambitious enough to meet their goals. Nokia's culture of status led to an atmosphere of shared fear, which influenced how employees interacted with each other. This human factor generated a state of "temporal myopia" that hindered Nokia's ability to innovate.
17.4.3 The Accident of the Space Shuttle Challenger, Mission 51-L

The flight of the Space Shuttle Challenger began at 11:38 a.m. Eastern Standard Time, January 28, 1986. Just about 72 s after liftoff, Challenger burst into flames, and all seven astronauts were killed. The Rogers Commission's report (1989), "Report to the President by the Presidential Commission on the Space Shuttle Challenger Accident," stated that managers and engineers had known in advance of the O-ring danger that ultimately led to the tragedy. The accident was principally caused by a lack of communication between engineers and management and by poor management practices. In looking over the NASA calculations, Dr. Feynman, who was one of the members of the Rogers Commission, said, "I saw considerable flaws in their logic. I found they were making up numbers not based on experience. NASA's engineering judgment was not the judgment of its engineers."
What was sacrificed was proper judgment. Added to this, a complete lack of proper communication resulted in the catastrophe.
17.5 Observations

The above three examples indicate that there was no insider threat in any of the three instances discussed, and, moreover, each of the stakeholders had the well-being of the institution in focus. But still, things went wrong! One question naturally crops up: "Could the concept of Human Reliability Assessment be applied as we perceive it?" If yes, how does one go about applying and studying the concept? Is there any compromise on human factors? Close scrutiny of the events clearly indicates that human factors did play their role. Management was found wanting in its approach to the problem at hand. There was a disconnect between the management and the staff, resulting in a complete trust deficit among all the stakeholders.
17.6 Ensuring Quality Measurements in an Analytical Laboratory in a Nuclear Research Institute

Chemical quality control measurements are one of the critical requirements in the nuclear industry for a number of purposes. A number of analytical methodologies, both classical and instrumental, are used for carrying out measurements on the:

• Purity of feed materials;
• Chemical characterization of nuclear fuel and other materials:
  – Chemical composition
  – Trace elements (metallic and non-metallic)
  – Homogeneity
  – Stoichiometry;
• Development of new analytical methodologies.

It is needless to emphasize the utmost importance of trained and skilled staff in this context. There is also a primary focus on Human Factors. In fact, the organization undertakes a very long, multi-stage exercise spanning almost two years to realize its goal. The whole exercise involves:

Step-1:
• Batch recruitment of qualified people with the necessary educational background through a written examination followed by a personal interview;
• Background personal verification, medical and psychological examination.
Step-2:
• Rigorous training (both theoretical and practical) to hone their skills to suit the requirements for chemical quality control;
• Periodic assessment of the trainees and a final examination at the end of training;
• Absorption and initial hands-on training in classical and instrumental techniques;
• Ensuring quality measurements in an analytical laboratory in a nuclear research institute.

Step-3: Periodic assessment of the analysts through analysis of
• Blind samples (samples that were previously analyzed but are given again under a changed code for testing); or
• Reference materials; or
• Repeat analyses of the same sample (under different identification) by more than one analyst, and also by the same analyst at different time periods; or
• Interlaboratory comparison experiments; or
• Proficiency testing.

In addition, the laboratory maintains trend analysis charts for both the analysts and the samples (a minimal sketch of such a chart is given at the end of this section), maintaining archives of the analyses for carrying out internal consistency testing. It may be seen that the reliability assessment is embedded in the operational framework of the laboratory as a part of the routine analytical output, enabling each of the analysts to recognize the importance of their work in contributing to the activities of the organization. There is no need for a separate module. Further, the organization is committed to motivating the staff for fruitful work output through:
• Regular career progression opportunities;
• Encouraging the staff to acquire additional higher educational qualifications;
• Participation in conferences/symposia to present their research work.
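A minimal illustration of the trend analysis charts mentioned above is sketched below. The certified value, accepted standard deviation, and measurement results are hypothetical, and a real laboratory would follow its own documented control-chart procedure.

```python
# Minimal control-chart check for an analyst's repeat measurements of a
# reference material (all numbers are hypothetical).
CERTIFIED_VALUE = 88.15   # e.g., % uranium in a reference oxide, assumed
SIGMA = 0.05              # accepted measurement standard deviation, assumed

measurements = [88.13, 88.18, 88.16, 88.21, 88.10, 88.27, 88.14]

for run, x in enumerate(measurements, start=1):
    z = (x - CERTIFIED_VALUE) / SIGMA
    if abs(z) > 3:
        status = "ACTION: outside 3-sigma limit"
    elif abs(z) > 2:
        status = "warning: outside 2-sigma limit"
    else:
        status = "in control"
    print(f"run {run}: {x:.2f} (z = {z:+.1f}) -> {status}")
```

Tracked over time, such a chart flags drift for an analyst or a sample stream without the staff ever perceiving it as a separate reliability assessment, which is exactly how the chapter describes the assessment being embedded in routine operations.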
17.7 Conclusions

Human reliability is only one of the cogs in the whole process of the smooth running of an organization and cannot be treated in isolation. One should speak of human relations development, followed by human resources management, and then proceed to human reliability assessment. The phrase "human factors" may be preferred to imply any or all of the three terms mentioned above, describing the essential interactions among three interrelated aspects: individuals at work, the task at hand, and the workplace itself. One should realize the dynamic nature of the assessment of human factors. The requirement could differ across space and time, across institutions, or even within an institution, depending on circumstances. (One size does not fit all!) Fundamental human factors are invariant for Management as well as
Staff. All the stakeholders (owners, management, and staff) must meet the human factors criteria. Communication channels are very critical. Most current assessments of human reliability focus on the man–machine interface; the human–human interface deserves equal attention. When machines/software and AI take operations unto themselves, the extent of human intervention should be given due consideration before passing judgment. Human reliability assessment should be seamlessly integrated into the working environment as a part of the procedural requirement to achieve the desired goals. Discussion meetings on such themes would have more utility if they included typical case studies (both real-life and hypothetical) as tabletop exercises.
References

Chanakya Arthashastra (1915) Book 1. Translated by R. Shamasastry. Government Press, Bangalore
Chanakya Niti Shastra (2013) Chapter 5. Viswamitra Sharma (Commentator). Manoj Publications, New Delhi
Collins E, Najafi B (2017) Human reliability analysis for evaluation of conduct of operations and training. In: AIChE spring meeting and global congress on process safety, San Antonio, Texas
Mui C (2012) How Kodak failed. Forbes, January 18
Munger SJ, Smith R, Payne D (1962) An index of electronic equipment operability: data store
Presidential Commission on the Space Shuttle Challenger Accident (1989) Report to the president by the presidential commission on the space shuttle challenger accident (Rogers Commission report). Washington, DC
The International Atomic Energy Agency (IAEA) (2020) Human reliability analysis. IAEA training course on safety assessment of NPPs to assist decision making. [Online]. Available: Initiating Event Analysis (iaea.org)
Valmiki Ramayana (2009) Book II: Ayodhya Kanda. [Online]. Available: https://www.valmikiramayan.net/utf8/ayodhya/sarga100/ayodhya_100_frame.htm
WAIO (2018) Why did Nokia fail? [Online]. https://brandminds.ro/why-did-nokia-fail/
Chapter 18
Safety and Security in Nuclear Industry—Importance of Good Practices and Human Reliability

Gorur R. Srinivasan
18.1 Introduction

For any endeavor, three "Ms" are required: Men, Material, and Money. Of these, in my opinion, the most important is men. After all, man creates the other two as well. "Men" includes human resources, which further contains the areas of manpower, human relations, human factors, Human Reliability (HR), etc. This discussion meeting focuses on HR. While HR is rightly directed towards high levels of safety and security, its impact on viability (good business, making profits, sound economics, and commercial performance) and reliability (continuous operation, high plant capacity factor) cannot be neglected. The following sections are devoted to discussing: (1) the wholesome approach to safety, nuclear security, viability, and reliability in the nuclear industry; (2) good practices in the nuclear industry to reduce dependence on Human Reliability (HR); (3) nuclear security; (4) human reliability aspects in the nuclear industry; (5) elements of safety culture that have huge impacts on human reliability; (6) the impact of the World Association of Nuclear Operators (WANO) and the International Atomic Energy Agency's (IAEA) Operational Safety Review Team (OSART) on HR; and (7) conclusions.
18.2 A Wholesome Approach to Safety and Security in the Nuclear Industry

HR is important in any endeavor, more so in industries, and it is vital in some industries, like the nuclear, chemical, and aircraft industries. There is a good chance that, with good HR, the Bhopal accident could have either been prevented or its consequences mitigated. Safety and nuclear security require multipronged efforts; HR is one of these. Improvements in HR need to be complemented by other efforts to achieve excellence. Prevention is better than cure. While prevention is achieved by other efforts and good practices, HR takes the role of curing the consequences. Being confident of these steps, the operator is advised to sit back and watch, keeping himself ready for any unlikely intervention he may need to make. This is similar to the autopilot coupled with inputs from navigational systems in aircraft. The following approach is taken in the nuclear industry. Firstly, steps are taken in the siting, design, manufacture, and construction, as well as during the subsequent operation of nuclear power plants, to prevent or reduce the severity of accidents through good practices. This greatly reduces the dependence on HR to a bare minimum. Notwithstanding this, monumental efforts are made to independently strengthen HR. The nuclear industry is unforgiving; an accident anywhere is an accident everywhere, not radiation-wise but in terms of public acceptance. It is observed that global nuclear growth was temporarily retarded after each of the three major nuclear accidents. This trend is unique to the nuclear industry and hence calls for the above approach.
18.3 Good Practices in the Nuclear Industry to Reduce Dependence on HR

While the nuclear industry has borrowed several good practices, it can also provide some to other industries, especially to the chemical, process, and aircraft industries. The good practices to reduce dependence on HR, prevent accidents/incidents, or mitigate consequences cover not only the areas of safety and security but also viability (economics/profitability) and reliability (high plant load factors, continuous operation, very few shutdowns or outages). HR has been shown to contribute in all these areas. It is very important that these good practices cover all stages from the cradle to the grave. The operation of nuclear power plants and other nuclear facilities is the penultimate stage of the cycle, just prior to decommissioning. Human reliability has to be achieved through steps taken from the beginning of the cycle, i.e., site selection and siting, and ensured until decommissioning and waste disposal. Human reliability has to be achieved through both "software" (culture, concepts, practices, systems, procedures, regulations, etc.) and "hardware" (design, equipment, etc.). A brief listing of good practices follows.
Good safety and security cultures in organizations and individuals. These, being important elements in HR, are described in detail later.
18.3.1 Defense-In-Depth (DID)

This is a good practice borrowed from the army. The concept of DID has evolved from just having physical barriers to prevent the spread of radiation to having measures at all stages of NPPs, applied in all disciplines, like design, manufacture, construction, operation, etc. DID is now applied to both hardware and "software," like practices, systems, procedures, etc., and from cradle to grave. As mentioned earlier, a two-fold strategy is incorporated: first, prevent accidents; and second, if one occurs, mitigate the consequences. We say, "If it matters, it should not happen. If it happens, it should not matter!" There are now five levels of DID: first, prevent abnormal operations and failures; second, control these failures if they still occur; third, prevent or control accidents within the design basis; fourth, control severe accidents that are beyond the design basis; and fifth, initiate an off-site emergency response to mitigate the radiological consequences. As can be seen, this layered DID takes advantage of both "hardware," like systems/equipment as designed, as well as "software," like human effort, procedures, human reliability, the characteristics and culture of staff, their training, their effectiveness, etc.
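The value of layered DID can be illustrated with a toy calculation. In the sketch below, the conditional failure probabilities assigned to the five levels are invented for illustration and the levels are assumed independent (a real PSA would model dependencies and common-cause effects); the probability that a challenge defeats all levels is then the product of the individual probabilities.

```python
# Toy defense-in-depth calculation with hypothetical, assumed-independent
# conditional failure probabilities for the five DID levels.
from math import prod

did_levels = {
    "1: prevent abnormal operation": 1e-1,
    "2: control abnormal operation": 1e-1,
    "3: control design-basis accidents": 1e-2,
    "4: control severe accidents": 1e-2,
    "5: off-site emergency response": 1e-1,
}

p_all_levels_fail = prod(did_levels.values())
print(f"P(all five DID levels fail) = {p_all_levels_fail:.0e}")
# With these illustrative numbers: 1e-7 per challenge, versus 1e-1
# if only a single level were provided.
```

Even with modest per-level reliability, the stacked layers drive the residual risk down by orders of magnitude, which is why DID reduces dependence on any single barrier, including the human one.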
18.3.2 Siting and Design to Reduce Dependence on HR

Siting of a nuclear facility (NF), including a nuclear power plant, is done as per the regulatory code on siting. A majority of countries have adopted the standards documented by the International Atomic Energy Agency (IAEA), with their country-specific requirements added. Siting reduces the risks due to extreme weather conditions, earthquakes, sabotage (nuclear security), etc., thus putting less dependence on HR. Siting should be avoided in terror-prone and sensitive regions. Good practices in design greatly reduce dependence on HR. A list of Postulated Initiating Events (PIEs) is made. A thorough accident analysis is performed to determine the elements to which the five levels of DID need to be applied. Important systems that take care of the three safety functions—namely, control of reactivity/reactor power, ensuring cooling of fuel, and confinement of radioactivity—are designed with redundancy, independence, diversity (elimination of common cause/mode failures), fail-safe behavior, single failure criteria, physical separation, etc. These greatly reduce the probability of an incident/accident. The design shall ensure that the nuclear installation is suited for reliable, stable, and easily manageable operation. The design takes care of nuclear security, minimum radiation exposure, fire protection, operability, maintainability, capability to perform in-service inspections,
and capability to test all important logic and safety equipment even during reactor operation to uncover any latent problems, thus ensuring they act as and when needed, etc. However, there is a conflict in taking care of all these simultaneously, and some balancing becomes necessary. The objective of operator-friendly design is to promote the success of operator actions considering the time available, the expected physical environment, physiological pressure/operator stress, etc. The design relies on the operator to take any further action after the last automatic action without any time pressure, and typically only after 30 min. Detailed design review by regulatory boards ensures that all the requirements mentioned in this paragraph are taken care of.
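As a small illustration of how redundancy reduces dependence on any single channel or early operator action, the sketch below computes the per-demand failure probability of a two-out-of-three voting logic, of the kind typical of safety instrumentation. The channel failure probability is a made-up number, and common-cause failures (which the text notes are addressed through diversity) are ignored.

```python
# Reliability of a 2-out-of-3 voting channel versus a single channel,
# assuming independent channel failures (common-cause failure ignored).
from itertools import combinations

q = 1e-2  # hypothetical per-demand failure probability of one channel

# The voted system fails only if 2 or 3 of the 3 channels fail.
n_pairs = len(list(combinations(range(3), 2)))  # 3 ways to pick 2 of 3
p_exactly_two = n_pairs * q**2 * (1 - q)
p_all_three = q**3
p_2oo3_failure = p_exactly_two + p_all_three

print(f"single channel failure : {q:.0e}")
print(f"2-out-of-3 failure     : {p_2oo3_failure:.1e}")
```

With these illustrative numbers, the voted arrangement is roughly thirty times less likely to fail on demand than a single channel, while still tolerating spurious trips of one channel, which is the design balance the section describes.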
18.3.3 Human Factors in Design for Optimal Operator Performance

In a report on the basic safety principles for nuclear power plants, the IAEA International Nuclear Safety Advisory Group explains that plants must be "user friendly":

'User friendly' is a term more commonly encountered in connection with computers, but it is also appropriate in describing properties of the plant sought for purposes of good human factors. The design should be user friendly in that the layout and structure of the plant are readily understandable so that human error is unlikely. Components should be located and identified unambiguously so they cannot easily be mistaken one for another. Operations should not be required simultaneously at points distant from each other. The control room and its artificial intelligence system should be designed after a failure modes and effects analysis of the plant, with information flow and processing that enable control room personnel to have a clear and complete running understanding of the status of the plant (International Nuclear Safety Advisory Group 1988).
The IAEA also advises that the design of nuclear power plants should be "operator friendly" and "aimed at limiting the effects of human errors" (The International Atomic Energy Agency 2000). In defining what is meant by "operator friendly," they advise that the interface between the "operating personnel and the plant" can be improved by paying attention to "plant layout and procedures (administrative, operational, and emergency), including maintenance and inspection" (The International Atomic Energy Agency 2000). The working environment should be designed according to ergonomic principles, and there should be a "systematic consideration of human factors and the human–machine interface" at an early stage of the design process, and throughout the entire process, to "ensure an appropriate and clear distinction of functions between operating personnel and the automatic systems provided" (The International Atomic Energy Agency 2000). The IAEA further suggests that the aim of nuclear power plant design should be focused on improving the likelihood of success in the case of a required operator response:
The design shall be aimed at promoting the success of operator actions with due regard for the time available for action, the physical environment to be expected, and the psychological demands to be made on the operator. The need for intervention by the operator on a short timescale shall be kept to a minimum. It shall be taken into account in the design that the necessity for such intervention is only acceptable provided that the designer can demonstrate that the operator has sufficient time to make a decision and to act; that the information necessary for the operator to make the decision to act is simply and unambiguously presented; and that, following an event, the physical environment in the control room or in the backup control room and on the access route to that backup control room is acceptable. Adequacy of space, low noise, good illumination, and a good operating environment are inputs to operating efficiency (The International Atomic Energy Agency 2000).
Changes made in the more advanced designs of nuclear power plants are intended to make them more user-friendly. The use of advanced electronics in the control room and artificial intelligence in the monitoring of plant operations affects human factors in fundamental ways (The International Atomic Energy Agency 2000). However, the IAEA cautions:

Some of the new designs also include software endowing the instrumentation and control circuitry with a diagnostic capability, to guide the operating staff in responding to any abnormality. Even where there is a greater degree of automatic response to abnormal conditions, with the operator informed immediately of the action taken (this is the practice at some operating plants), evolutionary designs rely on operators to take any further action after some specified time, typically beyond 30 minutes following the first automatic response. The possibility remains that human error could occur in the course of these subsequent actions, even though the extended period provided for reflection would remove time pressure (The International Atomic Energy Agency 2000).
A small possibility of software error leading to "inappropriate automatic action" exists, but these types of errors can be reduced significantly in number and impact through "sound software design practices, prior testing, including testing using plant simulators, and adopting good verification and validation procedures, but the elimination of errors can never be absolutely guaranteed" (The International Atomic Energy Agency 2000). With human error, however, there is always an "inherent limit to improvement" (The International Atomic Energy Agency 2000). Because of this limit, designs must be "forgiving," and defense-in-depth remains important to ensure that no harm is done in the case of failure at any level (The International Atomic Energy Agency 2000). The IAEA recognizes that attention to human factors "will ensure that the installation is tolerant of human errors" (The International Atomic Energy Agency 2005). The elements it suggests must undergo a systematic application of ergonomic principles include: "(a) Engineered systems; (b) The provision of automatic control, protection, and alarm systems; (c) The elimination of human actions that jeopardize safety; (d) The clear presentation of data; and (e) Reliable communication within the installation" (The International Atomic Energy Agency 2005). It further emphasizes that design should "reduce dependence on early operator action" (The International Atomic Energy Agency 2000). Errors by nuclear plant staff are rare, but they are most likely to occur when staff are asked to make decisions under time pressure. Therefore, "any required immediate response to an abnormal situation should be
automatic" (The International Atomic Energy Agency 2000). A safe design is "operator friendly" and is aimed at accommodating human error. The artificial intelligence system should clearly inform control room personnel of any such automatic action and why it is being taken. The automated response should continue for at least a reasonable predetermined time, dependent on prior assessment, but the opportunity should remain for the operators to override automatic actions if the diagnosis shows that they need supplementing or correcting (International Nuclear Safety Advisory Group 1988). At the user interface, and where there is a relatively high potential for error, the operator should be presented with information in a way that is manageable, allowing sufficient time for decision-making and action. The IAEA advises:

Where prompt action is not necessary, manual initiation of systems or other operator actions may be permitted, provided that the need for the action be revealed in sufficient time and that adequate procedures (such as administrative, operational, and emergency procedures) be defined to ensure the reliability of such actions (The International Atomic Energy Agency 2015).
In an accident condition, according to the IAEA, "operator actions that may be necessary to diagnose the state of the plant and to put it into a stable long-term shutdown condition in a timely manner shall be taken into account and facilitated by the provision of adequate instrumentation to monitor the plant status and controls for manual operation of equipment" (The International Atomic Energy Agency 2007). In this case, "Any equipment necessary in the manual response and recovery processes shall be placed at the most suitable location to ensure its ready availability at the time of need and to allow human access in the anticipated environmental conditions" (The International Atomic Energy Agency 2007). Modern practice includes a dedicated safety parameter display system (SPDS), a "Core Cooling Healthy" indication, and other operator-friendly comprehensive displays.

Equipment manufacture, construction, and commissioning are carried out to the highest Quality Assurance (QA) standards. Equipment must be classified as per regulatory requirements, with vital safety equipment placed in the top class. Rigorous pre-service inspection, generally at the factory prior to dispatch, ensures adherence to requirements. Shake-table tests prove seismic qualification. Safety equipment that is required to operate in a hostile environment is tested in LOCA (Loss of Coolant Accident) chambers to qualify it for those conditions. Good life cycle management and routine testing ensure the equipment can act as required by design until decommissioning. Many other such steps ensure equipment reliability. All of these greatly reduce the human effort required. The reliability of the equipment can be gauged from its performance in one of the Kaiga Nuclear Power Plant units during its world record-breaking 960 days of continuous operation: the equipment ran for nearly 1,000 days without anyone even looking at it! Such equipment greatly reduces the human effort required, as well as the need for HR, to operate and maintain it. Thirty-five such continuous runs of more than one year, and over 600 reactor-years of safe, secure, viable, and reliable operation, bear eloquent testimony to the over 3,000 indigenous manufacturers. Many of them have mentioned that by supplying equipment
to Indian Nuclear Power Plants (NPPs) with the required QA, their credibility has greatly increased, with a few even obtaining ASME stamps.

Good practices in operation and in the safety and security cultures, which need to be incorporated at all stages and in all involved organizations, including regulatory bodies, are among the strongest contributors to efficient management. There exists an effective Safety Management System (SMS); the elements of this SMS are described later. The Technical Specifications for operation bring out the operational policies and principles (OPP) and establish a safe envelope (LCO, Limiting Conditions of Operation) within which the NPP shall be operated. The Technical Specifications are treated as a Bible: much revered, respected, and obeyed. The performance in industrial safety in NPPs is much higher than the cross-industry average. It is, however, a challenge to meet the required level of industrial safety during the construction stage. For maintenance work during the operation stage, the nuclear industry has borrowed the Standard Protection Code from conventional plant practice; this prevents injuries through an effective work permit system.

Strong regulation. Most of the 30-odd countries that have NPPs also have a strong regulatory system with a legal framework, adequate governmental support, fierce independence, well-written documents (codes, standards, rules, etc.), and, above all, very competent, experienced, and knowledgeable staff. Safety is the responsibility of the operator, and the regulator should not diminish this. The reviews by regulatory bodies should be exhaustive, complete, and detailed; however, they should avoid prescriptive stipulations and instead state the overall requirements for the safety parameters involved. Independence comes from two factors: first, from the legal framework, and, second, from the competence of the regulators, who are held in high esteem by the regulated so that their decisions are respected and obeyed. The Indian regulatory body, the Atomic Energy Regulatory Board (AERB), amply possesses the second requirement. However, I feel the NSRA (Nuclear Safety Regulatory Authority) bill should be passed to give the AERB statutory status and the ability to withstand international scrutiny during reviews in bodies like the IAEA's Convention on Nuclear Safety. That said, the AERB is held in good esteem internationally and in the IAEA. An example of an exhaustive review can be seen in the licensing reviews for various activities at the Narora NPPs: the regulatory review lasted 10 years and consumed one million technical man-hours.

The above are some good practices, among many, which reduce dependence on HR. However, phenomenal work has been, and will continue to be, done in the nuclear industry in the areas of human resource development, human factors, ergonomics/man–machine interface, HR, etc. A few of these are described later.
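To make the "safe envelope" idea concrete, the sketch below expresses hypothetical LCOs as allowed parameter ranges and checks the current plant state against them. The parameter names and limits are invented for illustration and are not taken from any plant's actual Technical Specifications.

```python
# Illustrative only: hypothetical Limiting Conditions of Operation (LCOs)
# as allowed ranges, with a check that the plant state lies inside the
# safe envelope defined by the Technical Specifications.
LCO_LIMITS = {  # parameter: (min_allowed, max_allowed) -- invented values
    "primary_coolant_pressure_MPa": (9.0, 11.0),
    "coolant_outlet_temp_C": (250.0, 310.0),
    "reactor_power_percent": (0.0, 100.0),
}

def lco_violations(plant_state):
    """Return the parameters that fall outside the LCO envelope."""
    violations = {}
    for param, (lo, hi) in LCO_LIMITS.items():
        value = plant_state[param]
        if not lo <= value <= hi:
            violations[param] = (value, lo, hi)
    return violations

state = {
    "primary_coolant_pressure_MPa": 10.2,
    "coolant_outlet_temp_C": 318.0,  # deliberately outside the envelope
    "reactor_power_percent": 92.0,
}
for param, (value, lo, hi) in lco_violations(state).items():
    print(f"LCO violated: {param} = {value} (allowed {lo}-{hi})")
```

In practice, of course, the Technical Specifications also prescribe the actions and completion times that follow any excursion outside the envelope; the sketch shows only the envelope check itself.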
18.4 Nuclear Security

Nuclear Security is covered in my talk for two reasons: first, it is of national importance, the emphasis of this discussion meeting; and second, it is greatly impacted by HR. Nuclear Security (NS) has to prevent (a) the unauthorized removal of nuclear material and (b) sabotage.
18.4.1 Historical Development

There has been a sea change in attitudes, perceptions, and responses, and an increased need for nuclear security measures, since September 2001. After this terrorist attack, a small group of people asked for the closure of NPPs; that would amount to punishing the victims, not the aggressors. Even before this act, an elementary physical protection system (PPS) existed. Most countries have incorporated the provisions of the IAEA documents INFCIRC/225 on the physical protection of nuclear material and nuclear facilities, the guidance for implementing it, the Handbook on the physical protection of nuclear materials and facilities, and INFCIRC/274, the Convention on the Physical Protection of Nuclear Material (CPPNM). However, nuclear security is much bigger than the PPS. Many countries have invited and hosted an International Physical Protection Advisory Service (IPPAS) mission. Very few industrial facilities have the larger provisions of nuclear security, much greater than a PPS, that are found in nuclear facilities: the combination of robust design, a tough PPS, well-trained armed security forces, defense protection (against war-like acts), administrative and operational measures, emergency response capability, and, above all, security culture.

Long ago, one of the main objectives of security was to prevent pilferage and theft; there has been a vast change since then. One of the most important nuclear materials is Highly Enriched Uranium (HEU), even though dirty bombs made from radiation sources are also a concern. Slam together two chunks of HEU, in what is commonly known as a "gun-type nuke," and you have a repeat of Hiroshima. A report says there are more than 2,500 metric tonnes of HEU globally, and more than 100 civilian nuclear facilities around the world still run on weapons-grade HEU. These may not disappear anytime soon, although, fortunately, some are being converted to run on low-enriched uranium (LEU). These facts indicate the importance of NS. NS requires a top-to-bottom, comprehensive, multi-disciplinary, multi-operational, and holistic approach, which needs to be initiated from the stage of siting the nuclear facility; backfitting would be impossible. It can also be seen that NS is the responsibility of both the state and the nuclear organization.
18.4.2 What Is Nuclear Security and Why Is It Important?

Nuclear security involves establishing conditions that would minimize the possibilities of unauthorized removal of nuclear material and/or sabotage. Sabotage is defined as any deliberate act directed against a nuclear facility or nuclear material in use, storage, or transport which could directly or indirectly endanger the health and safety of personnel, the public, and the environment by exposure to radiation or release of radioactive substances. It could have both domestic and international repercussions.

The sensitive nature of nuclear command, control, and communications systems, and of nuclear facilities, calls for a powerful security architecture that accounts for the consequences of physical and cyber threats and attacks. Similarly, understanding the concept of cybersecurity can guide policymakers in determining its connection with nuclear security measures. A useful definition of cybersecurity is the following: "the body of technologies, processes, and practices designed to protect networks, devices, programs, and data from attack, damage, or unauthorized access. Cyber security may also be referred to as information technology security." The term cyber threats, meanwhile, encompasses a range of threats, including cyber terrorism, cyber espionage, malware attacks, and distributed denial of service.

The fuels for nuclear reactors, uranium-235 and plutonium, are the very materials for making atomic bombs. A few kilograms would suffice to change the benign fuel for power generation into a fearsome weapon of mass destruction. Considering that tons of these materials are needed to fuel reactors, how can we be assured that a few kilograms do not get stolen? Fortunately, in spite of over 450 reactors operating in different parts of the globe, this has not happened. For one thing, the physical security systems that are in place, and the internationally accepted procedures that have been erected to account for fissile materials, are robust and effective and have secured nuclear fuels from diversion for decades. There is also a technical safeguard inherent to nuclear fuels.

Nuclear Security consists of a Physical Protection System (PPS) as part of the NPP design, a nuclear material accounting system, a defense system to tackle terrorism, and, above all, a good security culture among all those involved. Thus, NS is much bigger than just the PPS. The PPS needs to be integrated at the initial design stage itself, as it is impossible to build it in later. Design Basis Threats (DBTs) need to be identified, and the design should be directed towards satisfactorily addressing these DBTs. For example, physical separation and redundancy can tackle sabotage, and protecting vital areas can prevent the unauthorized removal of nuclear material. These considerations need to be built into the initial design, as backfitting is impossible. The insider threat is a difficult issue to address. The IAEA gives guidance for establishing NS systems, and most countries have adopted it. Security culture and HR requirements have a great impact on NS. NS is kept in top gear by periodic checking: testing and drills/exercises. Exercises should ensure that adversaries are neutralized in the stipulated time. Despite the provision of all the above, another vital input is security culture. It is impossible for a
handful of security (and safety) professionals to take care of NS and safety by themselves. In one city in India, a strange plastic bag was deliberately placed in a busy bus stand where hundreds of people move around; even after three days, nobody thought it was their duty to notice it and report it to the concerned officials. This is not a satisfactory security culture and gives plenty of latitude to adversaries. Fortunately, the security culture is much better in NPPs. An experienced operator has a good memory/snapshot of each area and can easily recognize strange or new material. As mentioned earlier, the insider threat is also an issue; many human factor elements are needed to address it.
18.4.3 How to Achieve Good Nuclear Security

Security needs to be addressed as early as possible in the design process and integrated with the safety systems. In addition, effective security by design will require multi-disciplinary teams of technologists, engineers, and security experts to address the challenge of implementing security by design during the siting, design, construction, operation, and decommissioning of advanced nuclear reactors. It implies adopting an integrated approach from the start that ensures an inherently secure design, passive security (to the extent possible), and adequate resilience to evolving threats. These attributes are easy to set out but will require a fundamental shift in attitude towards security, both by designers and technologists and by the security community. Working out how to do this, including sharing design basis threat information, developing common vocabularies and definitions, and agreeing on a common approach to safety and security, will be challenging.

The following is a brief outline of the steps to be taken to achieve satisfactory nuclear security. To achieve the objectives of NS stated above, we need a "multi-pronged attack" as described earlier. This attack needs to address the following areas: (1) sabotage by an insider (possibly due to a threat from an outsider) or by an outsider in collusion with an insider; (2) malevolent acts by outsiders; and (3) war-like acts and Beyond Design Basis Threats (BDBTs). NPP organizations are not expected to tackle war-like acts; that is the responsibility of states, and I have not covered this area due to a lack of knowledge. With respect to BDBTs, we can list them and address them using best estimates. The basic approach to NS can be graded and follow the sequence of Deterrence, Detection, Assessment, Delay, Response, and Neutralization.

1. Prepare a list of credible terrorist scenarios (both external and insider driven), including cyber security issues.
2. Prepare a Design Basis Threat against which the systems should be designed.
3. Analyze the scenarios and come up with design changes.
4. Maintain satisfactory nuclear security during operation, as described elsewhere.
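As a rough illustration of the Detection-Delay-Response sequence above: a common adequacy test is that the delay remaining after the first likely detection must exceed the response force's arrival time, so that adversaries can be neutralized within the stipulated time. The layer names, delay times, and detection probabilities below are invented for illustration only.

```python
# Illustrative-only check of the Detection-Delay-Response timeline: after
# the first layer likely to detect an intrusion, the barriers still ahead
# of the adversary must delay longer than the response force needs to
# arrive. All numbers are hypothetical, not from any real facility.

# (layer name, delay the layer imposes in seconds, detection probability)
LAYERS = [
    ("perimeter fence", 30, 0.9),
    ("vital-area door", 120, 0.8),
    ("equipment hatch", 300, 0.7),
]
RESPONSE_TIME_S = 240  # assumed guard-force response time

def timeline_is_adequate(layers, response_time_s):
    """True if, at the first layer where detection is likely, the delay
    still to be overcome exceeds the response time."""
    for i, (name, _delay, p_detect) in enumerate(layers):
        if p_detect >= 0.5:  # crude stand-in for "detection likely here"
            remaining = sum(delay for _, delay, _ in layers[i:])
            print(f"first likely detection at {name}: "
                  f"remaining delay {remaining}s vs response {response_time_s}s")
            return remaining > response_time_s
    return False  # the adversary may never be detected in time

print("adequate" if timeline_is_adequate(LAYERS, RESPONSE_TIME_S) else "inadequate")
```

A real vulnerability analysis would, of course, treat detection probabilistically along every credible path; the sketch only shows why detection placed early, with delay placed after it, is what makes the timeline work.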
18.4.4 WINS

The World Institute for Nuclear Security (WINS) is working with developers, technologists, and regulators to encourage them to collaborate and share their know-how on security matters. Last year, more than 100 international subject matter experts attended two international workshops organized by WINS. These were held in Vienna, Austria, and Ottawa, Canada, and focused on best security practices and new solutions relating to the design, commissioning, and operation of these new reactors (both workshop reports can be downloaded for free by members at www.wins.org). Topics addressed during the events included:

• Effective security by design: interfaces of nuclear security with nuclear safety, safeguards, and security;
• Cybersecurity challenges;
• Security issues in the supply chain and fuel manufacturing facilities; and
• Successful engagement with stakeholders on SMR security.

WINS will be publishing a Special Report on the Security of Advanced Reactors, supported by the Nuclear Threat Initiative (NTI). This report will encapsulate the research and findings available at that time.
18.5 HR Aspects in the Nuclear Industry

What is HR? It is "the probability that a human will perform a required mission under given conditions in a given time interval." HR is considered very important because it can still mitigate consequences and plays an important role in safety. In addition, improvements in security, viability, and reliability need HR inputs. Strengthening HR pays rich dividends in improvements in all four of these areas. No wonder many organizations and industries, including nuclear, put a lot of effort into strengthening HR. The following are some of the many efforts by the nuclear industry to develop a strong HR and maintain it at a high level:

• Safety and security cultures have a huge impact on HR. However, safety as a value is better still; this will be explained later.
• Organization, staffing, qualification, training, and licensing of operators and maintainers. Recognizing these as valuable inputs to achieving high levels of HR, a lot of due diligence goes into implementing them. There is an IAEA document on organization and staffing for NPPs. Having an optimum number of staff at the predetermined levels/positions, and having the right staff at each level in any organization, is an essential initial input to achieving good levels of HR. Staffing should address the availability of people for training: many NPPs have an extra shift crew to permit training, and not sending staff to training while quoting job requirements as an excuse is considered bad safety culture.
Resources spent on qualification and training have yielded rich dividends in improving HR. Providing training opportunities is management's responsibility; getting trained is the individual's. The nuclear industry considers it a top priority for individuals across many disciplines to develop skills, knowledge, and expertise. SAT, a Systematic Approach to Training, is adopted in NPPs. Briefly, SAT has the following elements. First, the organization's mission, vision, goals, and the objectives/tasks to achieve them are drawn up, and the staffing details to carry out these objectives/tasks are worked out. After this, the required competencies for each position, and the training to gain those competencies, including the appropriate courses, are prepared. On-the-job training is an important component. Continuous review is also part of SAT: the review needs to be dynamic, and all failures, performance shortfalls, and lessons from Root Cause Analysis are used as inputs for corrections and additions to the training. Lessons are drawn not only from national incidents but also from international systems like the IAEA Incident Reporting System, operational experience feedback, and advice from various fora, such as light water reactor operators' groups and the World Association of Nuclear Operators (WANO), all of which are added to the training material. This makes the training relevant and focused and cuts out unwanted areas; only job-related training is conducted. SAT has been a vital input in strengthening HR. Planning for training is important, including consideration of the gestation period: for example, it takes six to eight years to produce a licensed shift charge engineer. Training operators for complex operations and for accidents is a difficult task, owing to high plant load factors, the inability to do on-the-panel training, accidents luckily not happening, etc. For such cases, a full-scope simulator is used, and many nuclear simulators are programmed for most types of accidents. In India, it is mandatory to have a simulator installed and commissioned before an NPP is started. Training is important for maintainers too. There is some critical and complex equipment which is not accessible during plant operation, and any wrong step in maintaining this equipment could prove costly; many NPPs therefore have extra sets of equipment, such as reactor pumps, which can be used for mock-up training of maintainers. In NPPs, the operations staff (which includes maintenance staff) are put in place before commissioning and get extensively involved in the commissioning of the plant. Commissioning is a unique situation, and the vital experience staff gain during it stands them in good stead during the subsequent stages of operation. From the above, one can visualize the importance of staffing and training in the march towards excellence in HR.
• Human-induced events are listed and analyzed, and siting, design, and operational provisions are made to avoid them or mitigate their effects. Both utilities and regulatory bodies prepare safety guides on these.
• Lessons from national and international events, OPEX, RCA, low-level events/near misses, etc. are imbibed. There are classical methods of performing root cause analysis of events. As per the elements of quality culture/Total Quality Management, there is no incident/event without a report, no report without analysis, no analysis without lessons/feedback being drawn, no lesson without corrective
actions; and, finally, the loop is closed by confirming that the corrective action is right and that a similar event does not repeat. If it does, it indicates an inferior safety culture. Accurate RCA requires a blame-free culture. Classical RCA does not permit human error to be designated as a root cause: it is a cause, but not the root cause. The analysis must be continued until the root cause of that human error is determined. Is the human error due to lack of training, wrong or obsolete procedures, improper design, negligence, etc.? The Incident Reporting System (IRS) and the International Nuclear Event Scale (INES), both from the IAEA, offer platforms for open, frank, and detailed technical discussion and sharing of events. I am amazed at the transparency and depth of these discussions. Improving HR through the international exchange of knowledge, experience, and lessons is unique to the nuclear industry. Operational Experience (OPEX) feedback is another useful tool for strengthening HR. This is done through the IAEA, WANO, and various users' fora, like the light water reactor forum; many vendors also continue to send periodic reports of experience across all the reactors they have supplied, throughout the life of the NPPs. In the march towards excellence, many organizations in the nuclear industry also report low-level events and near misses; there are almost 10-15 such reports each day. The importance of reporting low-level events emerges when one looks at the event pyramid: for every 10,000 low-level events/near misses there are 1,000 minor incidents, 100 major incidents, 10 accidents, and one severe accident (a simple scaling of these ratios is sketched after this list). Hence, one must start by reducing low-level events. Near misses are also important. If a crane hook falls but just misses a person's head and does not damage any other equipment, it may never get reported, since reporting it is not required as per the guidelines; however, it could have caused a fatality or damage to critical equipment, and the lessons to be drawn are the same in both cases. In one NPP, the lack of parking space for the cars of staff driving to work was reported as a low-level event! The staff claimed it caused stress, anxiety, and restlessness, which could lower their reliability in performing their duties. A system of low-level event reporting is quite difficult to institute and practice, but there is no substitute if we are to achieve excellence.
• Documentation. In general, there are three types of procedure checklists in NPPs. The first type is carried to the field, used to carry out complex operations step by step, and verified by ticking a box after each step and noting any observations. The second type is similar to the first but is used for less critical operations; it is followed stepwise but not filled in or ticked. The third type is a procedure that must be remembered and performed from memory; such procedures arise when there is no time to refer to a written procedure, and they are practiced several times in the simulator. An example from everyday life is driving a car: one cannot read the manual and drive at the same time! Adherence to the requirements of these three types of procedures is vital, as it can eliminate or reduce errors.
Documentation is useful in many other ways as well. The nuclear industry is famous for documentation: before computers made a paperless office possible, it was said that one could run a 100 MWe fluidized bed plant on the documentation from an NPP. Operating and maintenance manuals are prepared before commissioning and used during this period, which allows the manuals to be checked out and updated.
• Behavioral research. There is a lot of research on this topic, and many of the results are being implemented. The effect of the "biological clock" on round-the-clock shift workers has been studied extensively, and environmental conditions are set up to make graveyard (night) shifts look like day shifts. It is observed that around 03:00 AM is a vulnerable period for operators; all three major accidents took place around this time, and critical operations are generally avoided during it. In addition, many excellent organizations determine when their staff are going through stressful or anxious periods, such as when a spouse is going through a complicated pregnancy or when children are ill, and try to avoid giving mentally taxing jobs to individuals during these periods. A blame-free culture is established. In my NPP, I used to make the staff extremely comfortable: during meetings where serious issues or setbacks were discussed, I would crack a few jokes, and staff wanted to come to my office for a few hours even when they were on leave. An environment like this enables each individual to perform to their full potential, thus improving HR. The behaviors of all operating personnel result in safe, reliable, and viable operations. Behaviors that contribute to excellence in human performance are reinforced to continuously strive for the event-free operation of NPPs. Managers must develop strategies, policies, processes, and practices that increase human reliability toward attaining excellence.
– Staff, especially operators, should have a calm temperament: not impulsive or jittery, quick to respond, fit, and not given to excessive alcohol use.
– Staff should be "super communicators."
• Periodic assessment of HR. To ensure that all the parameters required for an organization to be efficient, including HR, are maintained at the desired level, it is very important to perform periodic assessments/audits. The methodology for such audits is as follows. First, a target is set for each parameter; targets should be challenging, achievable, and measurable, and performance indicators need to be developed to facilitate measurement. The periodicity of the audit must be decided, and the audit should be conducted by an expert group. From the findings, corrective actions should be determined and implemented, and monitoring of the corrected items should be increased. These steps are not easy for the various parameters of HR, since many of them are intangible; however, they can still be assessed through the tangible results attributable to them. In NPPs such audits are conducted, combined with audits of the Safety Management System (SMS) and the safety, security, and quality cultures. A more detailed audit is done during the mandatory Periodic Safety Review every ten years.
• Accident management: Emergency Operating Procedures (EOPs), off-site emergency procedures, and drills/exercises. Notwithstanding the various provisions to prevent accidents or to mitigate their effects before operator actions are required, operator action is useful in the longer term (hours/days). Detailed plans for these actions are drawn up by various groups, and the design provides data and indications to aid the operator. In the early period, the operator is advised to sit, watch, and observe the automatic actions as they take place; only in very rare cases may the operator intervene. After a few hours, he may have to carry out the steps in the EOPs. EOPs are fed into the simulator, enabling periodic training. EOPs are prepared for each of the Postulated Initiating Events, as well as for several others as per a prepared list. While the world started preparing EOPs in 1979 after the Three Mile Island accident, we in India prepared them in 1977. During the Narora NPP fire incident, the operator performed exactly the steps in the corresponding EOP, which prevented even minor damage to the reactor and the fuel. Off-site emergency preparedness procedures involve public officials, and a full-scale exercise is done every two years; drills are done more frequently to check packages like communication, emergency sirens, etc. The nuclear industry has also developed a methodology to insert HR inputs into Probabilistic Safety Analyses. While the paragraphs above describe various aspects and practices of HR in the nuclear industry, a majority of HR items are embedded in the SMS and in the safety, security, and quality cultures; these are brought out in the next few pages.
• Non-performance control. The following is quoted from an AERB document:

The management should empower personnel in the organization with the authority and responsibility to report non-conformances at any stage of a process in order to ensure timely detection and disposition of nonconformances. Authority should be clearly defined to ensure appropriate disposition of non-conformance to meet item/process requirements (Atomic Energy Regulatory Board 2006).
Management shall establish and maintain a process or processes that provide for identifying, reporting, reviewing, and physically controlling items, services, or processes that do not conform to specified requirements. The above elements are incorporated into staffing and training in NPPs.
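The accident-pyramid ratios quoted in the list above lend themselves to a simple scaling exercise. The sketch below applies them to an assumed near-miss count (a rough annualization of the 10-15 reports per day mentioned earlier); the ratios come from the text, while the observed count is illustrative only.

```python
# The accident pyramid as stated above: per 10,000 low-level events/near
# misses there are ~1,000 minor incidents, 100 major incidents, 10
# accidents, and 1 severe accident.
PYRAMID = {
    "low-level events/near misses": 10_000,
    "minor incidents": 1_000,
    "major incidents": 100,
    "accidents": 10,
    "severe accidents": 1,
}

def implied_counts(near_misses_observed):
    """Scale the pyramid to an observed near-miss count."""
    base = PYRAMID["low-level events/near misses"]
    return {level: count * near_misses_observed / base
            for level, count in PYRAMID.items()}

# 10-15 reports/day is roughly 4,500 per year (assumed, for illustration)
for level, n in implied_counts(4_500).items():
    print(f"{level}: {n:g}")
```

The point of the pyramid is the management lever it implies: shrinking the base of near misses is the only systematic way to shrink the expected number of severe events at the top.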
18.5.1 Safety Management System: What Is It and What Are Its Elements?

The nuclear industry has a fairly good performance record in the areas of safety, security, viability, and reliability. This is due to a good SMS and good HR. The SMS consists of all the arrangements to manage safety, promote a good safety culture, and achieve excellent safety performance. It promotes self-regulation and proactive
habits. The role of the regulator is to ensure the SMS is in place while avoiding diminishing the responsibility of the regulated organization. Figure 18.1 gives the elements of the SMS; these elements are an important input to strengthening HR.
Fig. 18.1 Elements of the Safety Management System (SMS) and their intersection with safety and security cultures
18.6 Safety Culture and Its Characteristics/Ingredients

18.6.1 What Is the Culture of an Organization and Safety Culture?

The culture of an organization can be described as a mix of shared values, attitudes, and patterns of behavior that gives the organization its particular character. Some companies have a good image; this image is of their people and the perceived manner in which they conduct business. It is the organizational values and the way it treats its people that determine the image of an organization. While buildings, systems, procedures, etc. definitely add to an organization's image, it is mainly its people and its value system that make the major contribution.

The IAEA definition of safety culture: "The assembly of characteristics and attitudes in organizations and individuals which establishes that, as an overriding priority, protection and safety issues receive the attention warranted by their significance" (https://www.iaea.org/topics/safety-and-security-culture).
Another definition is as follows: The Safety Culture of an organization could be described as the ideas and beliefs that all members of the organization share about risk, accidents and ill health (Confederation of British Industry 1990).
However, I have come up with a mundane definition of safety culture: it is when every individual does the right thing for safety when nobody is looking. Can a culture be changed? Yes; however, it is a slow process and requires extremely careful handling. That is not today's subject and will not be dealt with here. I would go a step higher than safety culture and incorporate safety as a value of the organization. There is no question of giving priority to safety! Safety is so intermingled with many activities, including production, that it must be taken care of simultaneously with the other activities. Even a housewife cannot say, "Let me cut the vegetables (a production activity) first and take care of safety later"; she is bound to cut her fingers. Hence, safety has to be a parallel activity. If a person is hungry, his priority is to eat, but after a seven-course meal, eating will no longer be a top priority; it could be 7th or 8th. Priorities keep changing, but good nourishment as a value is permanent. Thus, safety and security need to be part of the values of the organization and must be built into its systems, procedures, etc., and, above all, into its people: a part of their brains should be ingrained with safety and security. As with my mundane definition of safety culture, such embedding needs to be inherent, not forced or induced; inherent habits are more powerful than induced ones. Even a child will not do forbidden acts while being observed, but the moment the elder turns away, the child will do them; if, however, the child itself does not want to do something, it will not, irrespective of onlookers. Inherent values also make staff proactive rather than reactive. A small group of safety or security staff cannot achieve high levels alone, but with 5% of the brains of all staff devoted to safety and security, a large contribution is added,
enabling much higher levels to be achieved. All of the above ensures that our efforts eliminate nearly 95% of the causes affecting safety and security, leaving only 5% to luck or God, and not vice versa.
18.6.2 Safety and Security as Values of the Organization

I inculcated a practice higher than the safety and security cultures: that these should be VALUES of the organization. There is no question of giving priority to safety and security, because priorities keep changing. For example, when one is hungry, one's top priority is a good buffet lunch; once one is full, further eating is the last priority. But good nourishment as a value is forever. Safety, security, production, and many other activities need to be achieved simultaneously, as they are intensely intermingled. One cannot say, "I will give priority to completing a production activity and take care of safety and security later." Even a housewife cannot say she will cut the vegetables first and take care of safety later; she is bound to cut her finger. The safety of her finger and the production activity of cutting vegetables must be achieved simultaneously. Just as cutting vegetables and the safety of one's fingers are intermingled activities, many production activities in industry are closely knit with safety. The only way to achieve safety, security, and production satisfactorily is to achieve the objectives of all of them simultaneously. This can only be done if safety and security, as values, are embedded in each individual in the organization. Good safety and security practices must be built into all systems, procedures, and production activities and, above all, embedded in each and every individual; almost every staff member should have a few percent of their brain allotted to safety and security. It is observed that inherent safety and security are much stronger than induced ones (by force or regimentation). Even a child who should not do certain things against his parents' wishes will not do them in front of the parents, but the moment the parents go away, the child will do the unwanted act, because he wants to do it; of course, if the child himself does not want to do a thing, he will not. Thus, inherent qualities are stronger than induced ones. Each staff member should be "involved" in maintaining safety and security, as against merely participating: involvement is voluntary, with commitment, and with pride. An example of how to engage people is given here. In our nuclear plant, a workshop was conducted for drivers, organized and run by them and not by management. In their presentations, the drivers and conductors covered safe driving, the objectives of transport, how to take care of travelers, etc. There was a marked improvement that simply telling them could not have achieved. Management should create an environment in which the staff inherently acquire safety and security as values.
18.6.3 Characteristics and Ingredients of Safety Culture

These are described below; each of these ingredients has a tremendous impact on HR. A "good" safety culture can be said to comprise:

• At the societal level, a commitment by the government and a strong regulatory body with adequate powers; there should be good policy and legal frameworks. (It is also seen that the safety level in each country depends on how much attachment/feeling its people have for life, injury, etc.)
• Licensee policy-level commitment comprising a statement of safety policy, management structures, adequate resources, and self-regulation, but monitored (proactive rather than reactive). Good organizations have, as a permanent first item on the agenda of Board meetings, a review of the safety and security status.
• Management commitment comprising the definition and control of safety practices, audit, review, and comparison (top-level commitment is a must).
• At the individual level, defined responsibilities, a questioning attitude, a rigorous and prudent approach, and good communication.

Practices should cover these ingredients:

• Top management commitment, including:
– A mission statement;
– A safety policy document;
– Visible commitment;
– Serving as a "role model"; and
– Being a leader cum manager. If top management pays only lip service to safety and security, their body language will reveal their lack of commitment.
• Well-structured training, including a qualification and licensing system.
• A good organizational framework, including good procedures, documentation, etc.; a systematized approach.
• Clarity of roles and responsibilities. It is a good practice to display the organizational chart, drawn up to the last tradesman, on the notice boards in each maintenance shop.
• Well thought-out awards and sanctions. While there should be a blame-free culture, at the same time some pressure on the staff should prevail to make them perform at the highest level. The organization should balance caring for its staff with being exacting towards them; thus, a system of rewarding excellent performance and penalizing below-par performance should be developed and put into place.
• A good dissemination system, consisting of Root Cause Analysis, ASSET review as prescribed by the IAEA, and low-level event/near-miss review, should be in place.
• Involvement of all, rather than just participation. In my dictionary, involvement is voluntary and performed with pride and commitment.
• A highly effective, motivated team.
• Communication. This is very important; in fact, many three-day management seminars are conducted on communication alone. As an example, in NPPs it is very important for the last supervisor to be aware of the outage plan, including the restart plan, lest he take up, for instance, a two-day job when start-up is planned for the next day.
• Openness and transparency; the ability to give and take negative feedback.
• A majority of people doing the majority of jobs right the first time. This is a Total Quality Management requirement. These and others allow for excellence and the achievement of six-sigma performance.

To achieve excellence in safety, the following three characteristics of safety culture are vital: a questioning attitude, plus a rigorous and prudent approach, plus communication.

(1) Questioning attitude
• Do I understand the task? What are my responsibilities?
• How do they relate to safety?
• Do I have the necessary knowledge to proceed? What are the responsibilities of others?
• Are there any unusual circumstances?
• Do I need any assistance?
• What can go wrong?
• What could be the consequences of failure or error? What should be done to prevent failures? What do I do if a fault occurs?

(2) Rigorous and prudent approach
• Understanding the work procedures.
• Complying with procedures.
• Being alert for the unexpected.
• Stopping and thinking if a problem arises.
• Seeking help if necessary.
• Devoting attention to orderliness, timeliness, and housekeeping.
• Proceeding with deliberate care.
(3) Communicative approach
• Obtaining useful information from others.
• Transmitting information to others.
• Reporting on and documenting the results of work, both routine and unusual.
• Suggesting new safety initiatives (Fig. 18.2).
Fig. 18.2 Ingredients for SMS, safety, and security cultures: indicators of desired characteristics and culture to achieve excellence
18.7 Impact of the WANO Peer Review and the IAEA OSART on HR

WANO conducts peer reviews at the request of host nations. Similarly, the IAEA conducts OSART (Operational Safety Review Team) missions and SALTO reviews (safety assessments of long-term operation). These reviews cover many items that are related to HR. While I was in the nuclear power corporation, I combined the first two, added our own items, and called the result ISROS (Internal Safety Review for Operating Stations). I led teams to conduct ISROS at all our NPPs every 5 years.
18.7.1 Effect of These Factors, Including HR, on the Nuclear Industry

Globally, the safety and security performance during more than 17,000 reactor-years of operation has been extremely satisfactory. A World Health Organization study indicated that nuclear is the safest energy source. The NPPs are making profits; many NPPs are run by private utilities, which would not run them unless they were
economical. The average plant load factors of NPPs are higher than those of conventional plants. A similar performance can be seen in the Indian nuclear industry. The AERB and the MOEF/Pollution Control Boards mandate limits for radiation exposure of plant personnel and the general public, radioactivity in the various effluent releases (gaseous, liquid, and solid), the release of chemicals to the environment, etc. As a matter of fixing challenging targets, the Indian NPPs choose 10% of the statutory limits as their targets and have invariably met them. For example, the AERB limit for radiation exposure of the public surrounding an NPP is 100 units (a conveniently chosen unit). Actual measurements indicate that a member of the general public (assuming he is sitting at the plant fence 24/7, i.e., every minute of all 365 days) gets around one unit, i.e., 1% of the statutory limit. At most of the NPP locations in India, the natural background radiation (to which the public is exposed whether the NPP is there or not) is around 200 units; thus, NPP operation adds one unit to the 200 units already existing. All the NPP units are making profits, with the Tarapur plant supplying power cheaper than hydroelectric plants (viability). Kaiga NPP operated for 960 days continuously, establishing a world record, before being shut down for maintenance. The average plant load factor for the 22 NPPs in India is around 80%. HR has been a vital input in these performances of global and Indian NPPs.
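In the convenient units used above, the arithmetic works out as follows:

$$ \frac{D_{\text{NPP}}}{D_{\text{limit}}} = \frac{1}{100} = 1\%, \qquad \frac{D_{\text{NPP}}}{D_{\text{background}}} = \frac{1}{200} = 0.5\% $$

so the hypothetical fence-sitter's total exposure rises from 200 units of natural background to 201 units, an increase of half a percent.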
18.8 Feedback and Recommendations on the Discussion Meeting

In my talk, I listed the ingredients/characteristics that are required for satisfactory levels of HR, as did many other speakers. The speakers also brought out the steps taken in their respective industries/organizations to audit and strengthen HR. As a future action, it may be a good idea to collate these into a complete set and use them to strengthen HR across industries and organizations. I also brought out the importance of maintaining HR at the highest level. This requires periodic auditing, and it is very advantageous to develop performance indicators for each ingredient to assist in the auditing; by comparing these indicators over a fixed duration, gaps can be determined for timely corrective actions. Auditing was also covered by other speakers. Summarizing the actions to be pursued, I consolidated the following list from the talks:

• The ingredients/characteristics required for excellence in HR;
• Steps to strengthen them:
– (a) Develop performance indicators for each ingredient above;
– (b) Audit using the above performance indicators and determine the gaps, so as to take timely corrective actions to maintain HR at the highest levels.
The above can form the basic input for further action to improve HR; organizations can then come up with suggestions and recommendations for individuals and organizations to move towards excellence in HR and stay there. There were comments that the talks were biased, with more topics on the nuclear industry; such larger coverage is justified considering the phenomenal work done on HR in the nuclear industry. There was also a mention that there was too much coverage of safety. This again is justified: safety and production are not mutually exclusive. It is observed that the safest plants have the best production, profits, and commercial gains, because the ingredients required for safety, as brought out in my talk, are the same as those required for excellent business, high production, and large profits.
18.9 Conclusion and Recommendations

1. Safety and nuclear security are of the utmost importance in the nuclear industry, as well as in many other endeavors. The dual approach in the nuclear industry is to reduce dependence on HR by engaging in good practices from cradle to grave, while simultaneously and independently taking steps to strengthen HR.
2. The nuclear industry realizes the importance of HR in mitigating consequences after accidents and in bringing improvements in safety, nuclear security, viability, and reliability. Hence, monumental efforts are put into improving HR.
3. The safety and security cultures provide vital inputs for strengthening HR.
4. The global nuclear industry has completed more than 17,000 reactor-years with satisfactory safety, nuclear security, viability, and reliability.
5. This is also the case with the more than 600 reactor-years in India. For both items 4 and 5, HR has been a vital input.
6. Generation 3, 3+, and 4 reactor designs further reduce the need for human intervention; in these designs, even after severe accidents, there will be either no need, or only minimal need, for emergency countermeasures in the public domain.

It is recommended to collate the following from the various talks:
(a) A list of items required to strengthen HR.
(b) Performance indicators for each of the above.
(c) A sound audit system using these performance indicators. Also, determine how to benefit from such audits in bringing in corrective actions and maintaining excellence in HR.

The march towards excellence is a continuous journey, not a destination.
References

Atomic Energy Regulatory Board (2006) Non-conformance control, corrective and preventative actions for nuclear power plants. AERB Safety Guide No. AERB/NPP/SG/QA-8. Atomic Energy Regulatory Board, Mumbai
Confederation of British Industry (1990) Developing a safety culture: business for safety. Confederation of British Industry, London
International Nuclear Safety Advisory Group (1988) Basic safety principles for nuclear power plants: a report by the International Nuclear Safety Advisory Group, 75-INSAG-3. International Atomic Energy Agency, Vienna
The International Atomic Energy Agency (2000) IAEA safety standards: safety of nuclear power plants: design, NS-R-1. The International Atomic Energy Agency, Vienna
The International Atomic Energy Agency (2005) Safety of research reactors, NS-R-4. The International Atomic Energy Agency, Vienna
The International Atomic Energy Agency (2007) Proposal for a technology-neutral safety approach for new reactor designs. The International Atomic Energy Agency, Vienna
The International Atomic Energy Agency (2015) Basic professional training course: design of a nuclear reactor. The International Atomic Energy Agency, Vienna
The International Atomic Energy Agency (IAEA) Safety and security culture. [Online]. https://www.iaea.org/topics/safety-and-security-culture
Chapter 19
Human Performance Excellence in the Nuclear Power Corporation of India Limited

Ravi Satyanarayana
19.1 Introduction

The Nuclear Power Corporation of India Limited (NPCIL) operates 22 commercial power reactors at seven sites. Striving for safe and reliable operations is the prime goal of the organization. Over five decades of reactor operating experience, the hardware and equipment performance of NPCIL reactors has gradually improved as a result of focused attention to maintenance practices, including equipment condition monitoring, reliability-centred maintenance, and the training and qualification of maintenance personnel. As a result, NPCIL was able to convert the annual maintenance shutdown into a biennial shutdown. Human performance at NPCIL has also improved over the years; however, there is still scope for further improvement. Given the very nature of human performance initiatives, the prime target is to prevent any negative impact on safety and reliability attributable to human error. Many high-performing nuclear power plants have developed and implemented comprehensive human performance enhancement programmes, and these plants use a set of performance indicators to assess human performance. NPCIL therefore adopted a vision of excellence in human performance to proactively prevent events due to human error, with "Do it right the first time!" as its implementation theme. In view of this, there is an increased focus on human performance improvement at all operating stations of NPCIL. To achieve this vision, the following primary actions were initiated: (1) provide enhanced training, and (2) provide the necessary tools and knowledge of the techniques to be used, so that employees are empowered to think, plan, and act in all their tasks while preventing human errors. These actions were implemented by identifying human performance-related issues, spot coaching at the workplace, classroom briefings,
and monitoring human performance trends. These actions are envisaged to eventually improve the safety and reliability of the nuclear power plants by keeping in mind, at all times, the three basic operational levels: (a) the human level, (b) the machine level, and (c) the procedural level, i.e., the human–machine interface. This chapter elaborates on the policy and procedural aspects of the human performance improvement initiatives at NPCIL.
19.2 Human Performance Objectives

The steps needed to achieve excellence in human performance include: understanding coherent strategic approaches to improving human performance; recognizing the manageable elements of human performance; adopting leadership behaviors that align organizational processes and values to optimize human performance at the workplace; communicating and thinking proactively; and applying techniques to identify and eliminate error-likely situations and/or flawed defenses. These steps involve anticipating and preventing active errors at the workplace, in addition to identifying and eliminating any latent organizational weaknesses. At the design stage of the reactor, the necessary redundancies (hardware and software) are to be provided in accordance with the average level of quality expected from the employees, equipment, and procedures, to ensure safe operation. At the preparation stage prior to operation, a quality assurance program must be deployed that ensures safe and reliable operation through the quality of operating personnel, equipment, and procedures. At the operation stage, a preventive maintenance program is required to stop any degradation of the quality level of personnel, equipment, and procedures. Also at the operation stage, a surveillance program (detection and restoration) is required, capable of identifying and correcting any latent weakness affecting the expected quality level of personnel, equipment, and procedures. Timely detection of latent weaknesses in human, machine, and human–machine interfaces, and their timely correction, provides the ultimate barrier of the defence-in-depth concept dedicated to the prevention of undesired incidents. The detection program should aim at thoroughly assessing the proficiency of personnel, the usability of procedures, and the operability of equipment, to detect latent weaknesses that may lead to personnel, equipment, or procedure failure under adverse circumstances. The end goal of human performance is to minimize the frequency and severity of incidents/events.
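As an illustration of the surveillance and trending idea described above, the sketch below counts human-error events per quarter and flags an adverse trend for management attention. The indicator, the counts, and the trending rule are assumptions made for illustration, not NPCIL's actual programme.

```python
# Minimal sketch of a human-performance trend check: compare the average
# of the most recent quarters against the preceding baseline and flag an
# adverse trend. Data and threshold rule are hypothetical.
from statistics import mean

quarterly_he_events = [7, 5, 6, 9, 11, 12]  # assumed station data

def adverse_trend(series, window=3):
    """Flag when the recent average exceeds the preceding baseline average."""
    if len(series) < 2 * window:
        return False  # not enough history to compare
    baseline = mean(series[-2 * window:-window])
    recent = mean(series[-window:])
    return recent > baseline

if adverse_trend(quarterly_he_events):
    print("Adverse human-performance trend: initiate coaching and RCA follow-up.")
```

A real programme would track several indicators at once and normalize counts by work volume; the sketch only shows the detection-and-restoration loop in its simplest form.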
19.3 Facts About Human Performance If a plant has not experienced any incidents/events, it may simply have been lucky. Incidents/events allow plant managers to think about improving human performance if such incidents were caused by human error. It should be understood by the
plant managers that complacency can build into daily activities, latent weaknesses might persist, and defenses could degrade. It is not a good idea to manage by crisis. If the plant managers do not plan to train their employees regularly, incidents are bound to happen. It is worth noting that mere seminars and training courses accomplish very little and are helpful only in bringing brief attention to the subject. Without follow-through by the leadership, training accomplishes little. Continuous improvements in human performance are a result of active follow-up carried out by leadership. It is a fact that human errors cannot be eliminated by attentive employees alone, because errors are specific to unique combinations of activity and personnel circumstances. Hence, there is more to human performance than just self-checking or attentive employees at the workplace. Undue reliance on "accountability" (interpreted as discipline and punishment) is narrow-minded and dangerous. Accountability is indeed a necessary element in managing human performance, but it is not as simple as adding negative consequences to honest mistakes. Isolated, honest mistakes, regardless of results, should not be "punished" (using the broadest sense of the word). The whole system must also be held accountable: human performance is a system, not an individual. Root cause analysis of an incident/event should not simply determine why a person made the mistake but should also find out how the incident could have been prevented, taking into consideration the level of "error tolerance." Employees do not necessarily want a blame-free environment, but they do want a just environment, where events are examined from a prevention perspective instead of the question of why a person erred. Actions to punish employees for "honest" mistakes could backfire. Managers must think about what they want and what will happen when punishment is used. The severity of an event caused by human error depends on the gravity of the error, even though people may have been making the same error over and over. Management has to realize that severe events are an indication of flaws in their ability to maintain appropriate defense-in-depth. An important way to improve human performance is to reward employees for good performance. Celebrations and recognition occur infrequently, even though they take little time and require little effort from a manager. Positive reinforcement by the manager, through his or her presence at the workplace and direct interactions with the employees, is another way to improve human performance. Positive reinforcement definitely requires preparation on the manager's part. A fairly large number of managers, supervisors, and workers believe that expertise prevents errors. The issue is not always competence; human nature plays a large role in performance. It is not always aptitude; attitude also matters in human performance. Human nature being what it is, controls, or defenses, are required to reduce the chances of error or to prevent specific errors altogether. Workers typically do not correct their peers for similar reasons: they do not want to embarrass or insult fellow workers and friends, not understanding the need to control human nature. The bottom line is that human errors cause significant events, and hence they are bad for the organization.
19.4 Desired Attitudes of High-Reliability Organizations It is a desirable characteristic for an organization to feel uneasy about human fallibility and the human tendency to err at times, because this type of uneasiness reflects a better understanding of common human limitations and of the tendency towards complacency. On the contrary, reluctance to acknowledge the presence of hazards, known as the Pollyanna effect, is undesirable in an organization. Similarly, understanding and analysing the state of the plant in a holistic way is equally desirable for an organization to remain vigilant against any vulnerabilities and to be ready with remedial measures. This concept of a holistic view was developed from the Vietnam conflict, based on which fighter pilots are taught to keep the big picture in mind, not only in the sky but also on the ground. Organizations shall strive for the development and maintenance of error-prevention tools, techniques, and programs in order to identify and defend against error-likely situations at the workplace. While developing such programs, a key thing to be conscious of is that human error is a major contributor to incidents/events/accidents. Finally, communication throughout the organization is another desired characteristic to be inculcated to improve human performance. The will to communicate problems is one of the most important elements of continuous improvement in human performance. Obstacles in communication are a common-mode failure: in more than 80% of all aviation accidents, somebody knew something somewhere that could have prevented the disaster. The constant flow of information about hazards (error-likely situations) is highly desirable to improve human performance.
19.5 Human Performance Principles and Their Elements Striving for excellence in human performance shall be an ongoing effort in an organization to significantly reduce plant events caused by human error. Human errors are caused by various conditions related to individual behaviour, management and leadership practices, and organizational processes and values. Behaviour at all levels needs alignment and harmony to improve individual performance, reduce errors, and prevent events (U.S. Department of Energy 2009).
19.5.1 Principles To achieve excellence in human performance, the following underlying principles need to be understood: (a) humans are fallible, and even the best people can make mistakes, (b) error-likely situations are predictable, manageable, and preventable, (c) organizational processes and values influence individual behaviour, (d) behaviour can be
influenced by encouragement and reinforcement, and (e) events can be avoided by applying lessons learned.
19.5.2 Elements Human performance is a series of behaviours executed in sequence to accomplish a specific task objective (result). Behaviour is what people do. Results are achieved by behaviour, which is the mental and physical effort to perform a task. Although results that add value are important, the desired behaviours must be deployed for the improvement of results. To achieve optimized, successful performance at the job site, appropriate individual behaviour and leader behaviour must go hand in hand with appropriate organizational processes and values. All three elements (individual, leader-level, and organizational) should work together during all phases of a task, i.e., from work identification through the completion of documentation (U.S. Department of Energy 2009).
19.5.2.1 Individuals
The collective behaviours of individuals in a plant are indicators of the level of safety and performance achieved. The execution of work by an individual depends on the capability of the individual and the associated mental process (U.S. Department of Energy 2009). The mental process is influenced by diverse factors such as the work environment and the demands of the task. At high-performing plants, individuals at all levels take responsibility for their behaviours and are committed to improving themselves as well as the task and work environment (Nuclear Industry Association 2019). In general, individuals are expected to communicate to create a shared understanding, anticipate error-likely situations, improve their capabilities, report near-miss events together with explanations of direct and indirect causes, and regularly use human performance improvement techniques, all of which lead to a successful organization (Nuclear Industry Association 2019).
19.5.2.2 Leaders
Leadership is a set of behaviours continually practiced to direct and focus individual and team efforts toward accomplishing the goals of the organization. An effective leader must understand what influences both individual and organizational performance (Nuclear Industry Association 2019). To optimize the execution of jobs at the workplace, it is important to align organizational processes and values. Leaders should promote
positive outcomes in the work environment to encourage desired behaviours and results. All individuals in a leadership role need to demonstrate a passion for the goal of preventing plant events and the errors that cause them (U.S. Department of Energy 2009). In general, leaders should set clear expectations; promote open and unambiguous communication; promote teamwork to eliminate error-likely situations and strengthen defences; search for and eliminate organizational weaknesses that create conditions for error; reinforce desired job-site behaviour; and value the prevention of errors, the reporting of near misses, and the use of human performance techniques.
19.5.2.3 Organization
Organizational processes and values support the human activities involved in plant design, construction, operation, and maintenance. These factors also establish an environment that accepts that people make mistakes. The goals, policies, and priorities of the organization directly influence individual and leader behaviour by generating a pattern of shared understanding, processes, and values. All individuals within the organization should take it upon themselves to improve organizational processes and promote values of excellence (U.S. Department of Energy 2009). The organization and individuals at all levels should encourage a culture that values the prevention of events; strengthen the integrity and visibility of defences to prevent or mitigate the consequences of an error; restrict the development of error-likely situations; and create a learning environment that encourages continuous improvement (World Association of Nuclear Operators 2006).
19.5.3 Potential Reasons for Human Errors The human mind sees things and tries to put them into perspective based on what is already known to it, and it tries to remember information in terms of keywords, phrases, and pictures, which prevents remembering details due to a limited working memory (Collins 2004). Human beings tend to take the path of least resistance because it is easier, and we also tend to focus more on what we want to accomplish (the goal) and less on what needs to be avoided, because we are primarily goal-oriented by nature. As such, we tend to "see" only what the mind expects, or wants, to see. The human mind creates order, and once a mental model is established, it ignores anything outside that model. Information that does not fit a mindset may not be noticed, and vice versa (Collins 2004). Since we tend to select the path of least resistance to accomplish a task, it is extremely hard to go against the rest of the group's thinking in a meeting or work activity. We want to keep the harmony of the workgroup. We go with the flow! Group thinking is a reluctance to share information about a problem for the sake of maintaining the harmony of the workgroup (Collins 2004).
Social loafing occurs when individuals are not held personally accountable for a group's performance: some individuals in the group may then not actively participate, or may "loaf," in the team's "social" activities (Collins 2004).
19.6 Human Performance Error Traps Engineers and supervisors should be able to identify potential human performance error traps and should use, encourage, and coach on the use of specific human performance tools to address the error traps. Some of the common error traps identified are time pressure, distraction/interruption, multiple tasks, overconfidence, vague guidance, first shift / late shift, peer pressure, change / off-normal conditions, physical environment, and mental stress.
19.6.1 Error Precursors Precursors identified for an error bound to happen are demanding tasks, hostilities in the work environment, and a lack of the aptitude and attitude needed by individuals to perform tasks at the workplace. Unsafe attitudes include pride (do not insult my intelligence), a heroic nature (I will get it done by hook or by crook), fatalism (what is the use?), summit fever (we are almost done), invulnerability (that cannot happen to me), Pollyanna (nothing bad will happen), and bald tire (have gotten 90,000 km out of this tire and not had a flat yet). The only way to determine someone's attitude is by observing their behaviour. The ability to detect error-likely situations, and thereby head off preventable events, depends largely on the extent to which the role of these factors in human error is understood. Pride prevents an employee from asking for a peer check or positive feedback from peers. Extreme focus on the goal at the workplace, without giving consideration to what must be avoided, is part of an employee's heroic behaviour, something to watch for. Fatalistic behaviour is the belief that all events are predetermined and inevitable and that one cannot do anything about them, which is also problematic. Sometimes, when an employee is about to accomplish a goal at the workplace, he or she tends to forget or ignore important conditions of safety. This is dangerous and is known as summit fever behaviour; running a red light is an example. An employee's sense that he or she has immunity to error, or invulnerability, is another behaviour to observe in employees. Pollyanna is the belief that all is well at the worksite, which could lead to not paying attention to off-normal or unusual conditions. The belief that the past is sufficient justification for not changing (improving) existing practices or conditions, known as the bald tire attitude, is an error precursor and needs to be observed.
19.7 Classification of Errors Errors are classified as skill-, rule-, and knowledge-based. A skill-based error is an inappropriate act due to a lack of attention: a low level of attention is given to the performance of a task, leading to unintentional slips or lapses; the performance of the task is routine and accomplished with little or no thought. A rule-based error is an inappropriate action performed without the use of, or in opposition to, learned rules and/or procedures; a higher level of attention is given to the performance of the task, and the performance of the task is achieved with the use of learned or referenced rules. A knowledge-based error is an inappropriate action performed due to improper analysis of the situation: there were no learned rules or procedures to apply; the task is undertaken with an analysis of the information available; and the learned or referenced rules do not apply to the situation at hand. The consequences of human errors are substantial. To give some examples: 100,000 people die every year due to medication errors, diagnostic errors, and surgical errors; 17,000 people are seriously injured or die every year due to human errors while driving; a typical corporation loses 15–20% of revenue due to decision-making errors; 80% of all aircraft hull-loss accidents are due to pilot error; and 75% of all reportable events at nuclear power plants are caused by human error.
19.8 Error Prevention Techniques/Tools All workers, supervisors, and engineers should regularly practice human performance tools in all critical activities. The human performance tools which are to be used by workers are conservative decision-making, change management, communication practices, concurrent verification, independent verification, meetings, peer-checking, place-keeping, pre-job briefing, problem-solving, procedure use and adherence, questioning attitude, self-checking, and collaboration. (Collins 2004). In the human performance fundamentals guidance produced by the U.S. Nuclear Regulatory Commission (NRC) for training purposes, there is an elaboration on these error prevention techniques: Conservative decision-making is a rule-based and knowledge-based performance strategy that places the safety needs of the physical plant, in particular the reactor core, above the near-term production goals of the organization. Change management is typically reserved for large-scale organizational change and is usually not considered for day-to-day management activities. However, most day-to-day management involves change. The aim of communication is to achieve mutual understanding between two or more individuals involving speaking and listening. Concurrent verification is the act by a second qualified individual of verifying a component’s position before and during component repositioning. (Collins 2004)
Concurrent verification aims to prevent errors and should be used with irrecoverable acts. According to the NRC,
Peer-checking and concurrent verification are designed to catch errors before they are made. Independent verification, on the other hand, catches errors after they have been made. Meetings are conducted to solve problems that cannot be handled by an individual. Errors can be made during meetings (knowledge-based and rule-based), mostly due to inaccurate mental models and misinterpretation of information. Peer checking provides an individual the opportunity to have a second qualified person, on an informal basis, verify that the correct component is selected for manipulation before the act (Collins 2004).
Place keeping involves the physical act of reliably marking steps in a procedure that have been completed or are not applicable (skipped). The NRC cites two primary purposes of the pre-job briefing: 1) to prepare workers for what is to be accomplished, and 2) to sensitize them to what is to be avoided (Collins 2004). Otherwise, without this type of guidance, human beings "do not usually solve problems rigorously, methodically, and painstakingly. Consequently, the chance for error increases dramatically in a knowledge-based work situation. Therefore, people need to work with others and apply a disciplined approach to problem solving" (Collins 2004).
19.9 Defence-In-Depth Concept in Operations The defence-in-depth concept has been well established and practiced in several industries. One of its key beliefs is to always assume that errors are bound to happen; hence, there shall be procedures, barriers, and practices in place that will catch and correct errors before they snowball. Practices and procedures must be embedded into every human activity, not just operations and maintenance. These activities at a nuclear power plant include engineers designing temporary modifications, making changes to the design basis, and performing root cause analyses; health physics technicians conducting radiation and contamination surveys, filling out radiation work permits, and erecting rope barriers around contaminated surfaces; managers conducting planning meetings, analyzing human performance problems, budgeting, and making task-allocation decisions; and administrators revising procedures, updating drawings, communicating messages between managers, etc.
19.10 Latent Organizational Weaknesses Latent organizational weakness can be broadly classified into two categories, which are weaknesses in processes and in values. The following aspects come under the processes weakness category: work control, training, accountability policy, reviews and approvals, equipment design, procedure development, and human resources. The aspects that come under the values category are priorities, measures and controls, critical incidents, coaching and teamwork, rewards and sanctions, reinforcement, promotion, and terminations.
Some examples of latent errors are design and construction deficiencies, management weaknesses, maintenance errors, component deficiencies, inadequate procedures, and repetitive procedure violations. These latent errors can exist in a dormant/hidden mode and not show until some event during operation triggers them. These triggers include environmental conditions, unusual system states, operator errors, and component failures. Hence, a key step in improving human performance is to identify the latent weaknesses.
19.10.1 Methods to Identify Organizational Weaknesses Some of the methods for identifying latent organizational weaknesses are self-assessments, benchmarking, post-job critiques, trending, surveys and questionnaires, observations, and root cause analysis. Self-assessment can be performed by focusing on processes and/or programs. Benchmarking can be performed by asking others, outside your utility, what the best practices are and comparing them to your own practices. Trending determines what is declining in performance and searches for organizational weaknesses that are preventing good performance. Observations help to identify better ways to perform work activities. Finally, root cause analysis supports determining the problem and implementing the corrective action to avoid the recurrence of the event.
19.10.2 Programs to Eliminate Organizational Weaknesses Programs that can help in eliminating organizational weaknesses are soliciting and acting on feedback from workers, monitoring trends, performing process mapping, and conducting task analysis. Useful organizational attributes for achieving the best human performance are expectations for excellence and safety culture, healthy relationships and teamwork, priority given to communication by reinforcing expectations, and intolerance of error traps and events. The organization provides the context for performance: what is to be accomplished and what is to be avoided, strategies and plans, goals, objectives, resources, and functions. Senior managers are expected to set strategic perspectives and work towards the accomplishment of station goals while avoiding safety hazards. Healthy relationships are necessary to minimize obstacles or barriers to interpersonal and interdepartmental communications. Communication is one of the greatest defense mechanisms against events; studies of major disasters have shown communication flaws to be a "common mode" failure mechanism. Communication and reinforcement are absolutely critical to the production and prevention of errors and events; feedback shall be actively sought by managers and supervisors. Effective reinforcement strategies include providing feedback specifically and frequently; verbally reinforcing preferred behavior specifically and frequently (personally); removing obstacles or
giving workers a strategy for going around obstacles; letting workers know the work priorities; removing negative consequences for preferred behavior; removing negative reinforcers and rewards for at-risk behavior; enforcing penalties only for consistently undesirable behavior (progressive discipline); and accommodating those with personal problems, if practicable, or otherwise giving the work to someone else (Collins 2004). According to the NRC's guidance on Human Performance Fundamentals: Rewards and reinforcers for front-line workers come from three sources: the work, their peers, and the boss. It is uncommon for workers to compliment or correct their peers. So, do not count heavily on peer reinforcement except from role models. Aside from the personal satisfaction people may get from their work, employees simply want to know that their bosses "saw what you did and know who you are." This is important. Personal, or social, reinforcers are the most powerful, the easiest, and the most cost-effective means of reinforcing behavior. Managers and supervisors must know how to reinforce expectations. The above strategies go a long way toward improving performance without time-consuming programmatic changes being made (Collins 2004).
The intolerance attribute of the organization is an attitude of avoiding error traps and events (a prevention mindset) in addition to pursuing production. Error precursors influence the error rate and event frequency; flawed defenses drive the event severity.
19.11 Lessons Learned from Major Nuclear Accidents 19.11.1 Three Mile Island The major lessons learned from the Three Mile Island nuclear accident in the US are (1) the need for improvement in nuclear reactor operator training; before the accident, training focused on diagnosing the underlying problem, but afterward, it focused on reacting to the emergency by going through a standardized checklist to ensure that the core is receiving enough coolant under sufficient pressure, (2) the need for improvements in quality assurance, engineering, operational surveillance, and emergency planning, and (3) the need for improvements in control room habitability (Three Mile Island Accident 1979).
19.11.2 Chernobyl The Chernobyl accident resulted from a combination of external circumstances, engineering design flaws, and errors made by insufficiently trained operators. The major lessons learned are the need to ensure proper job planning for critical activities; the necessity of adequately training manpower to perform the planned task; that the organisation must have safety as its paramount goal, which must be reflected in every decision and action taken by the management; and the enforcement of adherence to procedures for all activities at the plant.
19.11.3 Fukushima The two Boiling Water Reactors (BWRs) in India, which are of the Fukushima type, have been renovated and upgraded, and additional safety features have been provided in line with the latest state-of-the-art safety standards derived from the lessons of the Fukushima nuclear accident. At the time of the incident, the NPCIL chairman explained that the Indian Pressurized Heavy Water Reactors "are of different design than that of BWRs and have multiple, redundant and diverse shutdown systems as well as cooling water systems". Furthermore, he assured the public that "In depth review of all such events has been done for all the plants and necessary reinforcement features based on the outcome of these reviews after the Fukushima nuclear accident have been incorporated as a laid down procedure" (TeluguPeople 2011).
19.12 Good Practices Adopted in NPCIL A set of good practices adopted in NPCIL are: low-level event monitoring and trend analysis; a committee for human performance review and enhancement; routine field visits by plant management; job observation programmes; human performance training; root cause analysis; safety culture assessment and update training; an operating experience sharing program; reinforcement of behaviours and attitudes; electronic surveillance; simulator training for licensed operators; dynamic learning activity training on human performance improvement tools for field operators; continuous upgrading of procedures and emergency operating procedures based on field feedback and system upgrades; equipment and system mockups for all critical activities; reward and appreciation programs for high performers; and international and corporate peer-review programs.
References Collins D (2004) Human performance fundamentals: course reference. U.S. Nuclear Regulatory Commission Nuclear Industry Association (2019) NIA Essential Guide: The Essential Guide for the Nuclear New Build Supply Chain. Available: https://www.niauk.org/wp-content/uploads/2021/10/NIA_EGuide-2019_web.pdf [Accessed 2 February 2023] TeluguPeople (2011) No 'nuclear' danger to India: NPCIL chief, 19 March 2011. Available: http://www.telugupeople.com/news/News.asp?newsID=64675&catID=59 [Accessed 2 February 2023] Three Mile Island Accident, McGill University. Available: https://www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/t/Three_Mile_Island_accident.htm [Accessed 2 February 2023] U.S. Department of Energy (2009) Human Performance Improvement Handbook, Volume 1: Concepts and Principles. U.S. Department of Energy, Washington, DC World Association of Nuclear Operators (2006) Inside WANO 14(3)
Chapter 20
HRA Techniques Used in Nuclear Power Plants C. Senthil Kumar
20.1 Introduction Human actions play a vital role in mitigating potential situations that may arise in any industrial application. Human errors contribute to over 90% of failures in the nuclear industry, ~80% of failures in the chemical and petrochemical industries, ~75% of marine casualties, and ~70% of aviation accidents (Kim and Jung 2003; Alvarenga et al. 2014; French et al. 2011). In nuclear power plants, during different phases of operation (startup, normal operation, shutdown, etc.), timely human action reduces the plant risk to a large extent. Therefore, the contribution of human errors is very important and needs to be properly included in the risk assessment process. This involves the identification of human failure events as well as the estimation of their probability of occurrence. Probabilistic safety assessment (PSA) is a systematic and comprehensive methodology to evaluate the risks associated with complex nuclear systems, and it addresses all potential human errors by adopting suitable human reliability models for quantification. To arrive at a realistic human error probability, several models are used. This paper discusses some of the human reliability analysis (HRA) models used in the risk assessment process in Nuclear Power Plants (NPPs), with special emphasis on human cognitive reliability models. In NPPs, factors such as skill-, rule-, and knowledge-based behaviors and the training of the operators, as well as recovery or correction factors depending on the time available for human action, the annunciations available, etc., are important and require proper modeling. However, several cognitive, sociotechnical, and human factors, decisions, management systems, structures, etc. are difficult to model as they are not easily predictable. To reduce these uncertainties in human reliability estimation, several research and optimization studies have been carried out.
20.2 Importance of HRA in NPP Risk assessment in NPPs is incomplete without proper integration of human reliability. Important human tasks that could prevent or mitigate undesired consequences during all major hazards, internal to the plant or during external events such as floods, cyclones, seismic events, etc., require modeling. HRA is a highly interdisciplinary field of study, and it is challenging to model and quantify human behaviors and attributes such as cognitive thinking, stress, knowledge, training, etc. There are methods reported based on simulator training, operating experience, questionnaires, etc. to quantify these factors. Nevertheless, large uncertainty remains in establishing a universally acceptable model to be adopted for HRA. In NPPs, HRA includes the identification of human actions important from the safety point of view, modeling them, and assessing their probabilities. It is an integral activity and part of risk assessment in a nuclear plant, and, ideally, a plant-specific HRA with an appropriate source of human error data is required for a realistic risk assessment. Human actions in NPPs are classified as pre- and post-initiators. Pre-initiator actions are those performed during calibration, testing, and maintenance, and post-initiator actions include diagnosis and corrective actions. For risk assessment in NPPs, various techniques are adopted to model human actions for estimating the probabilities. There may be plenty of human factors, viz., attention, perception, memory, etc., and organizational factors, viz., shiftwork, decision-making, communication, etc., that influence the performance of an operator. Some of these factors have a stronger impact and exert more influence than others and are deemed critical (Cepin 2008). These factors, which have a direct influence on human performance, are called Performance Shaping Factors (PSFs). These factors are either external or internal. External factors are equipment, procedures, environment, etc., and internal influences are related to psychological, physiological, and cognitive factors, which include attention, monotonous work, memory, fatigue, perception, etc. (Swain and Guttmann 1983). Other PSFs are training, motivation, personal issues, sleep activity, work schedule, equipment design, etc. (Shappell and Wiegmann 2000; Kim et al. 2014). The role of these factors is a central issue in accidents, and the risk assessment process should include the appropriate factors in HRA modeling.
20.3 Evolution of HRA Methods In NPPs, several typical HRA models are adopted for human error estimation. The quantitative models provide a numerical estimate of error probabilities by analyzing and evaluating the various human factors that influence human performance. The first-generation HRA methods (1970–1990) focused on human error probabilities (HEPs) and operational errors. These methods are: the Technique for Human Error Rate Prediction (THERP); the Accident Sequence Evaluation Program (ASEP) (Swain 1987); the Success Likelihood Index Methodology (SLIM); Human Cognitive Reliability (HCR); the Operator Action
Tree (OAT); the Human Error Assessment and Reduction Technique (HEART); and the Standardized Plant Analysis Risk—Human Reliability Analysis (SPAR-H) method. THERP is resource intensive, as it involves a large number of HEPs for the many human actions in NPPs. THERP models predict the HEP using a fault-tree approach and also account for performance-shaping factors (PSFs) that may influence these probabilities. The human reliability analysis event tree is another method wherein the probabilities are calculated from the THERP data book. ASEP is an abbreviated and slightly modified version of THERP (Swain and Guttmann 1983) and is a specific tool that has been successfully applied in the nuclear industry. The second-generation HRA methods (1990–2005) focused on human performance factors and cognitive factors such as workload, mental state, stress, etc. Additionally, second-generation models consider the context in which human error occurs and the related PSFs. The Cognitive Reliability and Error Assessment Method (CREAM), A Technique for Human Error Analysis (ATHEANA), and Information Decision Act in Crew Context (IDAC) are some of the second-generation models. The third phase, which started in 2005, is still in progress, and some of its HRA models are Nuclear Action Reliability Assessment (NARA) and Bayesian Belief Networks (BBNs). NARA and BBNs attempt to overcome the limitations of the first- and second-generation methods, and they emphasize the relations and dependencies among human performance factors (Swain and Guttmann 1983). A new computational model was developed for cognitive parameters, such as attention, memory decay, and the operator's mental model, using Bayesian networks. The Bayesian Belief Network (BBN) is an emerging model to assess human errors in NPPs as well as in many other sectors.
20.4 HEP Algorithms 20.4.1 Accident Sequence Evaluation Program (ASEP) ASEP provides a shorter route to human reliability analysis than THERP by requiring less training to use the tool, less expertise for screening estimates, and less time to complete the analysis (Bell and Holroyd 2009). Pre-accident tasks, post-accident tasks, nominal HRAs, and screening HRAs are the main elements in ASEP.
20.4.1.1 ASEP Algorithm
Step 1: Check whether the critical actions required during an abnormal event are covered in a written procedure; if not, then HEP = 1.0 and no further analysis is required; else, proceed with the HRA process.
Step 2: Using system analysis/thermal-hydraulic analysis, estimate the maximum time (Tm) available for correct diagnosis and completion of the action for an abnormal event.
Step 3: Identify the further actions required to successfully cope with the abnormal event once the correct diagnosis is made.
Step 4: For the post-diagnosis actions to be performed in the control room area, estimate by walkthrough the time needed to complete each action.
Step 5: For actions outside the control room, use a walkthrough to estimate the time required to get to the appropriate location to perform the action. If the estimated times are obtained from operating personnel, double them.
Step 6: Calculate the total action time Ta = Step 4 + Step 5.
Step 7: Estimate the time available for diagnosis, Td = Tm − Ta.
Step 8: Using Td, calculate the detection and diagnosis HEP from Fig. 8.3 in the THERP Handbook (Swain and Guttmann 1983).
Step 9: Calculate the action HEP for control room actions using Tables 8.1, 8.2, 8.3, and 8.4 in the THERP Handbook; calculate the HEP for recovery factors based on dependency, task level, and stress levels, with upper-bound or lower-bound adjustment, for all post-diagnosis actions.
Step 10: Compute the final HEP by adding all post-action HEPs; this is the median HEP.
Step 11: Estimate the mean HEP by adding the diagnosis and action HEPs and using the Beta distribution model to arrive at the final HEP for the scenario.
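As a worked illustration of the timing arithmetic in Steps 2–7 and the screening logic of Steps 1 and 10, the following Python sketch may help. It is illustrative only: the function names are invented for this example, and the diagnosis and action HEPs themselves must still be read from the THERP handbook figures and tables cited above.

```python
# A minimal sketch of the ASEP screening flow (Steps 1-7 and 10), assuming
# the diagnosis/action HEPs are looked up separately in the THERP handbook
# (Fig. 8.3 and Tables 8.1-8.4); no table values are reproduced here.

def diagnosis_time(t_max, t_control_room, t_outside, times_from_operators=False):
    """Steps 4-7: time left for diagnosis after subtracting action time.

    t_max           -- T_m from system/thermal-hydraulic analysis (Step 2)
    t_control_room  -- walkthrough time for control-room actions (Step 4)
    t_outside       -- walkthrough time for actions outside the control room (Step 5)
    times_from_operators -- Step 5: double estimates supplied by operating personnel
    """
    if times_from_operators:
        t_outside *= 2.0
    t_action = t_control_room + t_outside   # Step 6: T_a
    return t_max - t_action                 # Step 7: T_d

def screening_hep(in_written_procedure, hep_diagnosis, post_action_heps):
    """Steps 1 and 10: screen out unproceduralized actions, then combine."""
    if not in_written_procedure:
        return 1.0                          # Step 1: HEP = 1.0, stop here
    # Step 10: median HEP; the Step 11 beta-distribution mean is omitted.
    return hep_diagnosis + sum(post_action_heps)

# Example with illustrative numbers (minutes):
t_d = diagnosis_time(t_max=60.0, t_control_room=10.0, t_outside=15.0,
                     times_from_operators=True)
print(t_d)  # 20.0 minutes left for diagnosis; enter Fig. 8.3 with this value
```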
20.4.2 Human Cognitive Reliability (HCR) The HCR methodology provides time-dependent error rate estimates for the non-response probability for a given time window (Hollnagel 1998). The model does not strictly produce a HEP but uses the terminology developed by Rasmussen to describe the levels of cognitive processing as well as PSFs that may influence task performance. A normalized time reliability correlation (Fig. 20.1) is used, as the time for action is the most important variable. The model includes performance-shaping factors as well as skill, rule, and knowledge frameworks for categorizing the crew response (Williams 1986). A major assumption of this model is that the normalized response times for all tasks of a given type (skill-, rule-, or knowledge-based) follow a single distribution (Weibull). It is also reported that beyond a certain critical time, non-response probabilities depend only very weakly on the amount of time available to act. Wherever required,
Fig. 20.1 Time reliability curves (derived from simulator data and small-scale tests)
a thermal hydraulic analysis is used to determine the available time for each human action in the NPP, and expert opinion is adopted to estimate the median response time for human actions (Felice et al. 2012). PSFs considered in this method are stress level, operator experience, and man–machine interface, and this model requires a thorough assessment of the time window available, cognitive processing type, and PSFs. The median time taken by the crew is one of the major factors in HCR and is determined under normal conditions. The initial screening model of estimated HEPs for diagnosis within time T of one abnormal event by NPP control room personnel is given in Fig. 20.2.
Fig. 20.2 Nominal diagnosis model for HEP of NPP control room personnel
20.4.2.1 HCR Algorithm
Step 1: Identify the situation as skill-based, rule-based, or knowledge-based for the human action.
Step 2: Estimate the time available (t) to complete the action.
Step 3: Determine the coefficients of the performance shaping factors (PSFs), viz., K1, K2, and K3, from IAEA TECDOC-592 (IAEA 1991) (table below) for the respective PSFs (K1 = operator experience, K2 = stress level, and K3 = quality of the operator/plant interface) selected for the human action.

Performance shaping factor coefficients:
K1 (operator experience): Expert = −0.22; Average = 0; Novice = 0.44
K2 (stress level): Grave emergency = 0.44; Potential emergency = 0.28; No emergency = 0; Low activity = 0.28
K3 (quality of operator/plant interface, HMI): Excellent = −0.22; Good = 0; Fair = 0.44; Poor = 0.78; Extremely poor = 0.92
Step 4: Obtain the correlation coefficients A, B, and C from IAEA (1991) for the selected skill-based (SB), rule-based (RB), or knowledge-based (KB) action.

Interim values of the parameters A, B, and C:
Skill-based: A = 0.407, B = 0.7, C = 1.2
Rule-based: A = 0.601, B = 0.6, C = 0.9
Knowledge-based: A = 0.791, B = 0.5, C = 0.8

Step 5: Estimate the adjusted median response time T0.5 = (1 + K1)(1 + K2)(1 + K3) × T0.5,nominal. (Here T0.5,nominal is the nominal median value and is estimated using two different methods. One is expert judgment, operator experience, or a simulator environment; the other is deriving it numerically. The numerical method is as follows: if the crew action times are available as a set of N data points, first arrange the data in ascending order and take the value at the middle position; the value at position (N + 1)/2 is the median if the data set is odd.
If even, the median is the average of the data at the N/2 th position and the ((N/2) + 1)th position.)
Step 6: Compute the Human Error Probability (HEP) using

\[ P(t) = \exp\left\{ -\left[ \frac{(t/T_{0.5}) - B}{A} \right]^{C} \right\} \]

where
t = time available for the crew to complete a given action
T0.5 = estimated median time taken by the crew to complete the action
A, B, and C = correlation coefficients associated with SB, RB, and KB behaviour
P(t) = HEP (non-response probability) for a given system time window t.
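Since the chapter supplies all the constants, Steps 5 and 6 can be evaluated directly. Below is a minimal Python sketch, assuming the interim A, B, C values and the K1–K3 coefficients tabulated above; the function names and the worked example numbers are illustrative only.

```python
import math

# Interim correlation coefficients A, B, C for skill-, rule-, and
# knowledge-based behaviour (IAEA TECDOC-592, tabulated above).
HCR_COEFFS = {
    "skill":     (0.407, 0.7, 1.2),
    "rule":      (0.601, 0.6, 0.9),
    "knowledge": (0.791, 0.5, 0.8),
}

def adjusted_median_time(t_half_nominal, k1, k2, k3):
    """Step 5: adjust the nominal median response time for the PSFs
    (K1 = operator experience, K2 = stress level, K3 = HMI quality)."""
    return t_half_nominal * (1 + k1) * (1 + k2) * (1 + k3)

def hcr_hep(t, t_half, behaviour):
    """Step 6: non-response probability P(t) for the time window t."""
    a, b, c = HCR_COEFFS[behaviour]
    x = (t / t_half - b) / a
    if x <= 0:
        # Below the time-delay parameter B the correlation does not apply;
        # conservatively treat non-response as certain.
        return 1.0
    return math.exp(-(x ** c))

# Example: rule-based action by an average crew under a potential emergency
# (K1 = 0, K2 = 0.28, K3 = 0), nominal median time 60 s, 300 s available.
t_half = adjusted_median_time(60.0, k1=0.0, k2=0.28, k3=0.0)  # 76.8 s
print(hcr_hep(300.0, t_half, "rule"))  # ~1e-2
```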
20.4.3 Standardized Plant Analysis Risk—HRA Algorithm (SPAR-H) The SPAR-H method is used to determine human failure probability from the influence of human performance factors (Forester et al. 2006). It was originally called the Accident Sequence Precursor (ASP) method and is used for HRA by the US Nuclear Regulatory Commission (NRC). SPAR-H was born out of THERP and ASEP, with further simplification and generalization of the two approaches, and it features nominal HEPs for processing/diagnosis and response/action (Felice et al. 2012). It considers eight PSFs, viz., complexity, stress, workload, available time, ergonomics, experience, procedures available, and fitness for duty. PSF multiplier data are available for each PSF for both diagnosis and action. HEP estimation is possible for the full-power conditions or low-power/shutdown (LP/SD) conditions of the nuclear power plant. Researchers have observed that more failure events are initiated by human activity during LP/SD (Forester et al. 2006). The stepwise procedure of the SPAR-H algorithm is presented below.
20.4.3.1 SPAR-H Algorithm

Step 1: Identify the initiating event to be analyzed and segregate the Human Factor Events (HFEs) into diagnosis HFEs and action HFEs.
Step 2: Assign nominal HEP (NHEP) values of 0.01 and 0.001 for diagnosis HFEs and action HFEs, respectively.
Step 3: Obtain the PSF value for each of the eight PSFs from the SPAR-H framework reported in Gertman et al. (2005).
Step 4: Compute PSFcomposite as the product of the eight PSF multipliers.
Step 5: Estimate the total human error probability for diagnosis, (HEP)d = NHEP × PSFcomposite.
Step 6: Check whether the number of negative multipliers (an assigned PSF value greater than one is referred to as a negative multiplier) is 3 or more. If yes, go to Step 7; else, go to Step 8.
Step 7: Re-compute the (HEP)d.
Step 8: Repeat Steps 3–7 to compute the HEP for the action HFE, namely (HEP)a.
Step 9: Calculate the task failure probability without formal dependence, Pw/od, by adding (HEP)d and (HEP)a.
Step 10: Using (HEP)d and (HEP)a, calculate the task failure probability with formal dependence, Pw/d, based on the five-level dependence model in THERP (Swain and Guttmann 1983).
Step 11: If the study did not consider formal dependence, the final HEP is Pw/od; else, it is Pw/d.
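The arithmetic of Steps 2–9 can be captured in a short sketch. The Python below is illustrative only: the eight PSF multipliers must come from the SPAR-H worksheets (Gertman et al. 2005), the example values are invented, and the Step 7 adjustment shown is the composite-PSF correction from NUREG/CR-6883.

```python
# A minimal sketch of the SPAR-H calculation (Steps 2-9). PSF multipliers
# are caller-supplied from the SPAR-H worksheets; the example values below
# are purely illustrative.

NHEP_DIAGNOSIS = 0.01   # Step 2: nominal HEP for diagnosis HFEs
NHEP_ACTION = 0.001     # Step 2: nominal HEP for action HFEs

def spar_h_hep(nhep, psf_multipliers):
    """Steps 3-7: composite PSF and (possibly adjusted) HEP for one HFE."""
    assert len(psf_multipliers) == 8, "SPAR-H uses exactly eight PSFs"
    psf_composite = 1.0
    for m in psf_multipliers:        # Step 4: product of the eight multipliers
        psf_composite *= m
    hep = nhep * psf_composite       # Step 5
    negative = sum(1 for m in psf_multipliers if m > 1.0)
    if negative >= 3:                # Steps 6-7: adjustment keeps HEP <= 1.0
        hep = (nhep * psf_composite) / (nhep * (psf_composite - 1.0) + 1.0)
    return hep

# Step 8: repeat for the action HFE; Step 9: combine without dependence.
hep_d = spar_h_hep(NHEP_DIAGNOSIS, [1, 2, 1, 1, 10, 1, 5, 1])  # 3 negative PSFs
hep_a = spar_h_hep(NHEP_ACTION,    [1, 1, 1, 1, 1, 1, 1, 1])
p_without_dependence = hep_d + hep_a   # Step 10 dependence handling omitted
print(hep_d, hep_a, p_without_dependence)
```

With the example multipliers, the unadjusted diagnosis HEP would be 0.01 × 100 = 1.0; the adjustment brings it down to about 0.5, which is why the correction matters once several negative PSFs stack up.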
A simple comparison of the three models is given below:

ASEP
Characteristics: • Quantification of pre-/post-initiator Human Failure Events (HFEs) • Expert judgment is used whenever there is a shortage of data • Nominal HEPs are calculated and modified by PSF multipliers • A large range of potential PSFs is used
Merits/Demerits: • The large number of PSFs covers all contexts • A modified version of THERP, it is simpler and easier to use for HEP estimation than THERP • Widely applied, with large data available • Requires less training than THERP

HCR
Characteristics: • Estimates the nominal likelihood of operator error based on HCR curves • Operator response time is estimated using simulator data • Best used as a supporting tool with other quantification methods • Uses simulator measurements for estimating the operator's response time • Uses 3 different PSFs
Merits/Demerits: • Not a standalone method • Requires several simulations to construct a dataset for analysis • Expert judgment can introduce analyst-to-analyst variability • Simulator efficacy influences the results • The median time taken is based on expert and simulator data

SPAR-H
Characteristics: • Eight PSFs are used • Nominal error rates are available for both diagnosis and action • Uses designated worksheets to ensure analyst consistency • Employs a beta distribution for uncertainty analysis
Merits/Demerits: • Less resource intensive • PSF resolution is inadequate • Chronologically a new method, with low usage to date • Does not account for the dependence of human performance reliability on time; uses a generic TRC
20.5 Conclusion Even though a number of quantitative and qualitative methods are available to estimate HEPs, only a few methods, such as HEART, THERP, and SLIM, have been attempted so far in the NPP domain in various countries. In this paper, the procedures for three HRA models, viz., ASEP, HCR, and SPAR-H, are discussed. The HCR model is expected to produce realistic estimates when the skill levels and knowledge of the operator are well defined and there is no ambiguity in the assumption of the coefficients A, B, and C. However, when there is less time available for action, the chance of committing an error may be high, and HCR models are not suitable in such conditions. Further, PSFs of human behavior, such as psychological/mental behavior and dependency, are not accounted for in HCR. Under such conditions, where dependency factors need to be considered, the SPAR-H and ASEP models are recommended. SPAR-H uses only a limited number of PSFs compared to ASEP. Due to data scarcity, both ASEP and SPAR-H follow the THERP data book. A significant advantage of the SPAR-H model compared with the other two methods, viz., ASEP and HCR, is that the SPAR-H model considers work ergonomics, the fitness of the worker, and the work process. These are important factors to be considered in HRA. For example, poor lighting and annunciations may result in error-causing situations. However, the main disadvantage of SPAR-H is its treatment of uncertainty in PSFs, as no explicit guidelines are provided for addressing a wide range of PSFs when needed. The prediction of human error through the use of expert judgment for PSFs in SPAR-H is known to produce varied results. In view of the above discussion, it is observed that qualitative methods may also be considered for better estimation, and a comparison of HEPs helps to identify suitable methods for the event. When plant-specific information on dependency level, task, and stress factors is available, the HCR method may provide realistic estimates; in all other cases, SPAR-H may be suitable for HEP estimation in risk-informed decision-making.
References
Alvarenga MA, e Melo PF, Fonseca RA (2014) A critical review of methods and models for evaluating organizational factors in human reliability analysis. Prog Nucl Energy 75:25–41
Bell J, Holroyd H (2009) Review of human reliability assessment methods. Health and Safety Executive
Cepin M (2008) Importance of human contribution within the human reliability analysis (HRA). J Loss Prev Process Ind 21(3):268–276
Felice F, Carlomusto APA, Ramondo A (2012) Human reliability analysis: a review of the state of the art. Int J Res Manag Technol 2(1):35–41
Forester J, Kolaczkowski A, Lois E, Kelly D (2006) Evaluation of human reliability analysis methods against good practices. NUREG-1842, U.S. Nuclear Regulatory Commission, Washington, DC
French S, Bedford T, Pollard S, Sloane E (2011) Human reliability analysis: a critique and review for managers. Saf Sci 46(6):753–763
Gertman D, Blackman H, Marble J, Byers J, Smith C (2005) The SPAR-H human reliability analysis method. NUREG/CR-6883, U.S. Nuclear Regulatory Commission, Washington, DC
Hollnagel E (1998) Cognitive reliability and error analysis method (CREAM). Elsevier Science, Amsterdam
Kim J, Jung W (2003) Taxonomy of performance influencing factors for human reliability analysis of emergency tasks. J Loss Prev Process Ind 16(6):479–495
Kim S, Lee Y, Jang T, Oh Y, Shin K (2014) An investigation on unintended reactor trip events in terms of human error hazards of Korean nuclear power plants. Ann Nucl Energy 65:223–231
Shappell S, Wiegmann D (2000) The human factors analysis and classification system—HFACS. U.S. Department of Transportation, Federal Aviation Administration, Springfield, VA
Swain A (1987) Accident sequence evaluation program—human reliability analysis procedure. NUREG/CR-4772, U.S. Nuclear Regulatory Commission
Swain A, Guttmann H (1983) Handbook of human reliability analysis with emphasis on nuclear power plant applications. NUREG/CR-1278, August 1983. [Online]. https://www.nrc.gov/docs/ML0712/ML071210299.pdf
The International Atomic Energy Agency (IAEA) (1991) Case study on the use of PSA methods: human reliability analysis. IAEA-TECDOC-592, IAEA, Vienna
Williams J (1986) HEART—a proposed method for assessing and reducing human error. In: Ninth advances in reliability technology symposium, Bradford
Chapter 21
HRP in the Aviation Sector Kishan K. Nohwar
Aviation is a serious 'business,' but it is a 'business' nonetheless, as it earns revenue for the owners of the airline. However, it also has the greatest risk of failure, since it relies on the human element in terms of the cockpit crew that is placed 'in charge' of the lives of the passengers for the duration of the flight—from take-off to landing. In the early days of aviation—in the middle of the last century—passenger aircraft were flown with very few technical aids to assist the pilot; the pilot's complete attention was required to ensure the safety of the flight. On trans-continental flights, this led to fatigue, which at times resulted in a lack of attention at crucial stages of flight, e.g., either while negotiating bad weather en-route, or landing in poor visibility or heavy crosswind conditions. To reduce the undue burden on pilots, the autopilot was invented. This relieved the pilot from the often-monotonous task of being 'hands-on' on the controls in long-duration flights. With further improvements in avionics, the autopilot could be coupled to the navigation system, and the aircraft could 'fly' autonomously along the pre-fed route with minimal pilot intervention. The natural tendency of a gyro (used for direction indicators, in addition to the magnetic compass) to 'drift' complicated the issue and often led to navigation errors. This required the pilots to frequently cross-check their actual ground position, either by cross-referencing with ground features, or by cross-referencing with radio beacons en-route, to obtain their exact position in the air at any given time. This was a laborious task that was often entrusted to a navigator, who was always on board such intercontinental flights or domestic flights of long duration. There was also a flight signaler on board (particularly on military transport aircraft) who would carry out en-route communications tasks on long cross-country or international flights.
With further improvement in the accuracy of navigation systems due to the introduction of the satellite-based Global Positioning System (GPS) and the ring-laser gyro, the need for the navigator and flight signaler was obviated, and most airlines removed them from the cockpit crew. The onus was now squarely on the pilots to navigate safely and accurately from one place to another under all weather conditions, both by day and by night. Did this increase the workload on the pilot? Obviously, it did, but their training ensured that pilots were ready to undertake the rigors of intercontinental flights under all conditions with just the two-person crew. For longer flights, there was a stand-by crew available. Aviation disasters are not a new phenomenon, especially since flying is not a natural activity for humans. Therefore, handling trying situations in the air requires professionally trained pilots who can respond to the situation and recover the aircraft safely. For this reason, the entire Human Reliability Program that is presently adopted independently by individual airlines should, in my view, be universally applicable. This includes ensuring the quality of training and selecting pilots and cabin crew with due emphasis on emotional quotient, spatial orientation, and psychometric tests. Crew resource management and the extent of emphasis on simulator training are examples of other areas that require continuous attention by airlines the world over if 'avoidable' accidents—particularly those attributed to human error—are to be prevented. Air India uses Emotional Quotient (EQ) as an essential attribute when selecting its pilots. However, during the pilot selection interviews, the word of the psychologist is final. If the psychologist says that a pilot candidate is not cleared (due to lack of emotional quotient), then Air India will not hire that pilot. EQ is also checked through a questionnaire, while simulator flying is essentially used to check one's basic aptitude to undertake the given mission. Psychometric tests have nine different modules to evaluate the pilot's skills:

• Basic math;
• Test for short-term memory;
• Multitasking capability;
• Visual motor skills coordination;
• Spatial perception (to check whether you are situationally aware);
• Technical test;
• Basic understanding of physics;
• Aviation English (this module assesses the pilots' ability to understand what is happening around them, to guard against situations like the Tenerife accident, where verbal communication was an issue); and
• Verbal comprehension, meaning the ability to analyze regulatory information. Sometimes, the words that are written are very difficult to comprehend. The language that is used is very official, and a pilot might not be able to comprehend what exactly it means.

With this background, I will now cover a few accidents and open a discussion about what happened in each case, why it happened, and what can be done to prevent
similar accidents in the future. We will first cover the Tenerife accident. Then, to demonstrate a total failure of crew resource management, I will give you the examples of Air France 447, where the basics of flying were ignored due to an over-reliance on automation, and Air Asia Flight QZ8501, which crashed in December 2014. I will also cover industry shortcomings stemming from not adhering to human reliability programs amongst senior management, e.g., the recent Lion Air and Ethiopian Airlines crashes of the Boeing 737 Max. Of course, everything is not gloom and doom in aviation history. Captain Sully Sullenberger's ditching on the Hudson River will be covered at the end. We will examine what this incident tells us. Does it tell us to get back to flying the aircraft, or does it tell us to rely more on automation? But first, the Tenerife accident. The Tenerife Collision - March 27, 1977 How did this collision occur? Reports show that the KLM aircraft was allowed to backtrack and, thereafter, await further instructions. At the same time, the PAN AM aircraft was allowed to enter the runway and clear off on Link 3. All aircraft that had landed at Tenerife had been diverted because the neighboring airfield—where they were supposed to land—had a bomb scare. As these pilots were operating from Tenerife for the first time, the PAN AM crew missed Link 3; it was extremely foggy at the time, and visibility was very poor. They would now have to clear the runway on Link 4. While the PAN AM aircraft was still on the runway, the captain and co-pilot suddenly saw the KLM aircraft approaching them. It had not been cleared to take off, because the ATC had only read out departure instructions (which does not mean that the aircraft is cleared to take off). It is believed that the KLM pilots only heard the ATC controller say, 'Okay'; whereas he had very categorically said, 'Okay', followed by, 'Standby for take-off; I will call you'. The PAN AM crew had also mentioned that they were still on the runway. Now, in analyzing this accident further, one realizes that, in the KLM cockpit, there was a pilot who was considered a hero in the Netherlands. He was a check pilot, a very senior pilot who, supposedly, could do no wrong. His co-pilot was a person junior to him, and this senior check pilot had given the co-pilot his clearance and various ratings, instrument ratings, etc. The first time the captain attempted to take off and began opening the throttles, the co-pilot actually throttled back; he did not allow the captain to open the throttles. When the captain asked for take-off the second time and the ATC responded by saying, 'Okay', he opened the throttles again. The captain had probably ignored the ATC's caution, 'Standby for take-off; I will call you', after first saying 'Okay'. The crew in the cockpit of the KLM aircraft—except possibly the captain—were aware that the PAN AM aircraft had still not called clear of the runway. But they did not dare tell this to the captain, their senior—a clear example of adverse power distance on the flight deck. So, once the captain began opening the throttles, they just let it happen, realizing full well that the PAN AM aircraft was still on the runway. The inevitable happened, and they collided with the PAN AM aircraft that was still on the runway. 583 people died; this was the worst disaster in aviation history.
Therefore, one might ask, ‘Why is Crew Resource Management so important?’ Crew Resource Management is nothing more than training to help crew
members work together more effectively. While it can never eliminate error and assure safety in a high-risk endeavor such as aviation, it is an array of tools that organizations can use to manage errors. Power distance involves one's feelings toward hierarchy. It was a factor in the KLM flight, where there was a large power distance between the Captain and the co-pilot. The higher the power distance between leaders and their subordinates, the greater the responsibility of the Captain, who is expected to be decisive and self-sufficient. In countries like China, India, and several Latin American countries, this cultural power distance is high; it is at a medium level in a country like the Netherlands. The co-pilot thought he was in absolutely safe hands; he left everything to the experience and better judgment of the Captain, a figure he revered. In cultures where subordinates are expected to know their place and not question their superiors, these types of errors can happen. Furthermore, it is not a healthy situation at all in certain industries—especially in aviation—where subordinates should be permitted, if things are getting out of hand, to take over the controls from the Captain. To say that they are not permitted to take over the controls in critical situations would be to undermine the aviation industry, which trains its pilots for exactly such situations. However, there are stray cases when this is not followed, and that is when disasters happen. In May 2010, an Air India Express Boeing 737 overshot the tabletop runway in Mangalore, India, and crashed. The Captain was an expatriate. He had been asleep for most of the flight and had just woken up for the landing. The co-pilot had correctly assessed that the approach angle for landing was too steep and had warned the Captain thrice to "go around"; however, the co-pilot did not take the final step of actually opening the throttles or selecting TOGA ("take-off and go around"). Ultimately, the aircraft overshot the runway, went into a ravine, and crashed. 159 passengers and crew died; there were only seven survivors. I am talking about Air India Flight 812.

Air France Flight AF447

Now we come to Air France Flight 447, which was traveling from Rio to Paris on 1st June 2009. The details of the events that took place are captured succinctly in the widely available CBS News video, whose transcript is given below. Transcript of video from CBS News: Harry (the news presenter): Nearly two years ago, Air France Flight 447 mysteriously crashed in the Atlantic, killing 228 people. Now, with the flight recorders finally recovered from the ocean floor, we know more about what happened in the final, harrowing moments of that Rio to Paris flight. Here is Nancy Cordes (Reagan National Airport, Arlington, Virginia). Nancy Cordes: The plane's black boxes reveal that the Air France pilots, faced with conflicting speed readings, didn't know whether to speed up or slow down. At 11:10 p.m., the Airbus 330 was heading into a powerful storm, 700 miles off the coast of Brazil, when the autopilot disengaged, and the plane began to roll to the right. The cockpit display showed inconsistent speed readings, a sign that the external speed sensors, called pitot tubes, were malfunctioning, or perhaps had iced over. "I have the controls," called one of the plane's two co-pilots. A stall warning went off. And he climbed briefly before pointing the nose down to pick up some speed. Then, he leveled out and began to climb a bit, perhaps not realizing he was still going too slow.
That triggered a second stall warning. But instead of pointing the nose back down, he continued to pull up. Mark Rosenker: The aircraft begins to roll. It begins to go from one side to the other side.
Nancy Cordes: Mark Rosenker was Chairman of the NTSB when the plane crashed in June 2009. Mark Rosenker: What we've seen in this cockpit voice recorder is, two pilots extremely confused about what's happening on their flight deck, what's happening to their instrumentation. Nancy Cordes: It was now a minute and a half into the crisis, and the Captain, who had been on a scheduled rest break, made it back into the cockpit. Twenty seconds later, the co-pilot flying the plane said, "I don't have any more indications," most likely referring to speed readings. "We have no valid indications," confirmed the other co-pilot. As it pitched and rolled, the plane slowed even more and stalled, falling from 5 miles up for three and a half minutes at a rate of 10,000 ft per minute, slamming belly-first into the Atlantic. What would those three and a half minutes have been like for the pilots and for the passengers? Mark Rosenker: Horrific is the only way to describe it. The aircraft was rolling. The aircraft was descending at a high rate of speed, tremendous G-force [sic]. It was horrific. Nancy Cordes: The last recorded speed, right before impact, was just over 120 miles per hour. A jet of this size cruising at altitude should have been flying at more than 500 miles an hour. Harry? Harry: Nancy, are officials in Brazil assigning blame? Nancy Cordes: Well, officials in Brazil and in France, where this investigation is taking place, say all of it is to blame: the faulty pitot tubes, which have now been replaced on all Air France jets, and also the actions of the pilot. They'll also be taking a look at the training of the pilots. Were they trained to deal with a stall in these kinds of situations? This investigation is going to be going on for about another year, Harry.
What do the experts recommend in this case? The last thing this commentator talked about was training. The first step is to identify the stall, a deep stall, based on indications. The indications, as far as this aircraft is concerned, are that in a deep stall the nose is above the horizon—giving the impression that the aircraft is climbing—whereas the aircraft is actually descending. The pilot needed to look at the vertical speed indicator, which clearly indicated that the aircraft was in a descent. And if he had looked at the altimeter, he would have found that from 30,000 ft the altimeter had started unwinding as the aircraft lost altitude. The aircraft was coming down, but the nose was still above the horizon and the wings were rocking. Training to recognize this situation was probably not carried out for any pilot, whether from Air France or from other airlines. While we do not know definitively whether this training was taking place at other airlines, we know that Air France was not doing it, because they admitted that this training (for recovery from a deep stall at high altitude) should have been carried out in a simulator. We previously mentioned automation; the autopilot was supposed to take over to a certain extent. But because the aircraft had passed through a thunderstorm, ice crystals had formed on the pitot tubes, and the speed sensors were iced over. Since there were no speed indications, the autopilot cut off. This moved the aircraft's fly-by-wire flight control system from Normal Law to Alternate Law, which meant that the autopilot had been disconnected and the aircraft had to be controlled manually by the pilot.
Now, if the pilot flying the aircraft gets into a situation where the aircraft's nose is high while the aircraft is actually descending rapidly—and he has never encountered a situation like this in his simulator practice—what does he do? He did not recognize this situation (of the aircraft being 'stalled') because he appears to have forgotten the basics of any flying training, where the very first exercise is 'Stall and Recovery'. The aircraft had reached a stalled condition—which was corroborated by the audio warnings of 'Stall, Stall, Stall'. To recover from that situation requires the pilot to push the nose down and assist the aircraft to build up speed, un-stall the wings (by getting more airflow over the wings), and thereafter start flying the aircraft once again. But when the Captain—who had more than 30,000 hours of total flying experience—came back from resting and tried to assess the situation, he should have realized at one glance that the aircraft was descending even though the nose was above the horizon. These are typical symptoms that the aircraft is in a state of 'deep stall'. He should have immediately instructed the pilot on the controls to push the nose down, get the bank to one side, and let the aircraft build up speed. He never did that. He only realized that the aircraft was in a stalled condition once the pilot on the controls said that he had had the stick back the whole time. By the time the Captain realized this, it was too late to recover the aircraft, as it was only 5000 ft above sea level; the aircraft requires at least 7000–8000 ft to recover from the situation it was in. There was a cumulative failure on the part of all the pilots in the cockpit of Air France 447 to comprehend the actual flying situation the aircraft was in (a deep stall), and this led to the accident. The correct sequence of action in the cockpit of Air France 447 should have been: identify the aircraft's attitude (state of flight); initiate correct recovery actions; fly the aircraft; do not get mesmerized. A natural corollary is that pilots should learn everything about the aircraft and also study its technical capabilities. In the Indian Air Force, "Recovery from unusual attitudes" is a standard exercise performed during instrument flying sorties. In these exercises, the instructor tells the pilot to close his eyes and then takes the aircraft through high-G maneuvers before handing control back to the pilot to 'recover' the aircraft from the unusual attitude it is in. At that point, the pilot takes over the controls and recovers the aircraft from the unusual attitude by following certain basic guidelines—a 'must-know' for all pilots. The phenomenon we see in the Airbus, and potentially the Boeing—a nose-high attitude, a high rate of descent, and wing-to-wing rocking setting in—also occurs in the MiG-21. This is a potentially dangerous situation in combat, where the pilot trying to follow the 'adversary' may, in the heat of the moment, ignore the rate of descent that has set in. With the nose still above the horizon, the pilot would think everything is okay, without realizing that the aircraft is actually descending toward the ground. This is a potential hazard, and it can lead to an accident.
Once again, a pilot is encouraged to understand this and 'off-load' the aircraft (meaning the aircraft is put into a gentle dive so that sufficient speed can be built up, the aircraft is 'un-stalled', and it is 'flying' once again), get the wings level, and thereafter, once speed has built up, recover to level flight.
In the MiG-21, an angle-of-attack indicator is provided, besides an audio warning whenever critical angles of attack are exceeded in flight.

Air Asia Flight QZ8501

This brings us to Air Asia Flight QZ8501, flying from Surabaya to Singapore, which crashed into the Java Sea on December 28, 2014. All 162 people on board the Airbus A320-200 were killed in the crash. In this accident, exactly the same situation occurred as in Air France 447. The aircraft also encountered a thunderstorm. Since thunderstorms near the Equator are very severe—because of the 'inter-tropical convergence zone'—the chances of the pitot tubes freezing are high. The Air Asia pilot also wanted to fly above the weather, and his pitot tubes also froze up. As he climbed, he found he had lost all airspeed indications because of the freezing of the pitot tubes. This pilot responded in exactly the same way as the Air France 447 pilots did. At that time, I was still on the board of Air India as an Independent Director. I wrote a letter to the newly appointed DGCA (who had also been, until then, a member of the Board of Directors) to tell her that there were many similarities between the two events. I pointed out that Air India flew the same aircraft, the Airbus. I asked her whether our pilots were being trained in the same manner (i.e., with inadequate attention given to flying in 'Alternate Law' at high altitudes). I wanted to bring home the point that we must do something about training. I believe that some action was taken and that training (for simulating a 'deep stall' condition and recovery from it while flying in 'Alternate Law' conditions) was indeed carried out subsequently. Although I had retired by then, training was always a big concern for me.

Boeing 737 Max Accidents (Lion Air Flight 610 and Ethiopian Airlines Flight 302)

Very recently, in view of these accidents, the Boeing 737 Max has been grounded. The MCAS (Maneuvering Characteristics Augmentation System), which essentially prevents the aircraft from stalling, was responsible for the accidents. To make the aircraft's engines more efficient, the Boeing company increased the engines' fan size and obtained about a 15% improvement in fuel economy. But because of the size of the engines, the aircraft displayed a tendency to pitch up on take-off and also, sometimes, in flight. So, the MCAS system was used to pitch the nose down. However, if the angle-of-attack indicator malfunctioned, this pitch-down could happen without warning in flight. That is exactly what happened in these Boeing 737 Max accidents, and it did not permit the pilots to take over control of the aircraft in time. No matter how much force was applied, the pilot could not take over control of the aircraft. It takes about 45 seconds to select the correct switches off and then trim the aircraft to neutral. Forty-five seconds is a long time, especially if this occurs after take-off; you do not have the luxury of that time. Unfortunately, the Boeing Company did not mention this in its flight manuals for the pilots; as a result, most pilots were not even aware of this phenomenon. They were hearing about it for the first time after the first accident. As per the April 18, 2019 report, the new software will prevent the triggering of the erroneous anti-stall system that resulted in both crashes, and this will be tested over 90 days.
In this case, you may well ask if the real culprit was the Human Reliability Program of the senior leadership at Boeing. Did the greater emphasis on aircraft
production—to defeat a rival company for large orders—compromise the safety aspects of flying? The flight crew operational manual did not spell out the actions to be taken in case of a loss of angle-of-attack indications. And, finally, there was no simulator training for pilots.

'The Miracle on the Hudson' (US Airways Flight 1549)

I have described a couple of tragic accidents to drive home the point about HRP—or the lack of it—in the aviation industry. I will now cover an incident that should bring cheer—and hopefully a smile—to all your faces. On January 15, 2009, US Airways Flight 1549 lost both engines after they were hit by Canada geese after take-off, and the aircraft was hurtling toward the ground. But, in this case, the pilot in command, Captain Sully Sullenberger, found the Hudson River on his left. He assessed that he could not turn around and land back at LaGuardia airport, so he decided to carry out a ditching on the Hudson River. He told the ATC that he might end up in the Hudson. Although Teterboro airfield was on his right, he assessed that he would not be able to make it to Teterboro safely. Captain Sully Sullenberger carried out a copybook 'ditching' on the Hudson; no lives were lost. This incident tells us that we must begin to learn to fly the aircraft all over again, and not rely so heavily on automation.
Chapter 22
Myriad Ways in Which Things Could Have Gone Wrong but Did Not

Dinesh Srivastava
22.1 Introduction

The Variable Energy Cyclotron Center (VECC) at Kolkata has a K = 130 room-temperature cyclotron that has been operating since 1977. It was indigenously built and has since undergone several changes, e.g., the replacement of its power supply system and the RF system, and the addition of a provision for heavy-ion beams, at first using a PIG ion source and then an ECR ion source. It has played a very important role in training generations of accelerator scientists and engineers and in providing a variety of energetic beams for many pioneering experiments of fundamental importance in nuclear physics, nuclear chemistry, isotope production, analytical chemistry, and radiation damage studies of nuclear materials (Fig. 22.1). The Center also houses a K = 500 superconducting cyclotron, which is undergoing field corrections for extraction of the beam (at the time of this writing) (Figs. 22.2 and 22.3). The VECC is currently developing a Radioactive Ion Beam Facility based on the principle of on-line separation of isotopes and is working on "A National Facility for Unstable and Rare Ion Beams." It is also installing a medical cyclotron for isotope production at an extension campus about 20 km away. In addition, it runs an iodine facility at a cancer hospital for treatment and diagnostics (Fig. 22.4). The VECC has been an important nodal point for the development of detectors for in-house use, as well as for international collaborations at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory, at the SPS and the Large Hadron Collider at CERN, and at the Facility for Antiproton and Ion Research (FAIR). It is developing accelerator components, especially superconducting RF cavities, in collaboration with Fermilab and TRIUMF.
Fig. 22.1 The variable energy cyclotron at Kolkata
The Center also serves as the nodal center for the handling of radioactive emergencies for the entire east of India. These activities require a vast array of power supplies providing DC currents on the order of one thousand amperes, RF systems, static voltages of up to 60–80 kV, very high and ultra-high vacuum, vast quantities of liquid nitrogen and liquid helium to cool superconducting magnets, clean rooms, a very sophisticated workshop, and several high-performance computing facilities, including a Tier-II Grid Computing facility for experiments at CERN.
22.2 Sophisticated Systems Require Extreme Care and Devotion to Duty

As mentioned in the Introduction, building such sophisticated facilities requires extreme care and expertise, and maintaining them requires us to be constantly alert. The extremely high voltages with ppm stability, extremely high currents with ppm stability, extremely high magnetic fields covering large areas with ppm stability, very high vacuums, and liquid helium plants and transfer lines demand an unprecedented level of surface finish and cleanliness and the strict maintenance of humidity and temperature. Three liquid helium plants on a small campus, along with
Fig. 22.2 K = 500 superconducting cyclotron at VECC, Kolkata
other expensive equipment and machines, require that our technicians be regularly re-trained. The vast computing and computer control of the facility requires us to be alert to viruses, malware, and other more dangerous cyber-attacks by terrorists and hackers. The institute has radiochemistry labs and clean labs for making detectors and cavities. The large and ever-increasing demand for electricity has necessitated the setting up of three electrical substations. The unreliable and fluctuating power supply has necessitated the installation of a large uninterruptible power supply and several diesel generators, the latter requiring the storage of diesel on site. Further expansion of the lab is stalled, as we do not have space on the campus, which is quite overcrowded now. Thus, we are developing a new campus, about 10 km away, especially to house the National Facility for Unstable and Rare Ion Beams. The successful operation of all these facilities for several decades without any mishap so far is no mean achievement. It has become possible due to an ever-vigilant, compassionate, and rigorous safety culture. This culture is sensitive to the pressure under which our scientists, engineers, and technicians work around the clock under difficult conditions. For example, in the 1970s, the city of Kolkata had frequent power failures, often lasting several hours. These outages jeopardized hours of hard work, which had to be started all over again. Realizing that such occurrences were less frequent during
Fig. 22.3 Radioactive ion beam facility at VECC, Kolkata
the night, it was decided to work at night, and in fact, the first beam was seen very early in the morning of June 16, 1977. This devotion to duty and the willingness to sacrifice personal comfort to complete a project on time imply the highest degree of commitment to the organization. The management has worked hard to honor this by creating avenues for career advancement for its employees, providing them training, opportunities for higher education, medical facilities for their families, comfortable living accommodations in an expensive metro like Kolkata, and a sympathetic ear for their grievances.
22.3 A Culture of Safety

The success mentioned above has been possible because of the development of a culture of safety from the formative days of the institute. From the very beginning, strict adherence to radiation safety guidelines, recommendations, advice, and rules has been maintained at all times. Employees are required to keep all protective gear in perfect condition, and no relaxation is ever given regarding its usage. These verifications also involve regular checks of grounding for high voltage, earthing for suppression of noise, proper boots and overalls, glasses to cover the eyes, masks for low
Fig. 22.4 Medical cyclotron operated by VECC, Kolkata
temperatures, gases, and hazardous chemicals, earmuffs for high-noise areas, gloves for the hands, non-magnetic tools, and absolute cleanliness at all times. Other precautions that are enforced include strict regulations on the dress and ornaments worn by employees deployed in such areas, e.g., no sarees, dupattas, stoles, loose kurtas, or dhotis, and no ornaments. Those who work in areas with magnetic fields are not allowed to carry magnetic materials and must not use pacemakers. Of course, expectant mothers and children under 18 are not allowed in radioactive areas. In order to keep track of ongoing activities and to be able to analyze any accident, however minor, or any lapse in safety, the entire area has been divided into zones and covered by closed-circuit television cameras. Access control is strictly enforced. At all times, a minimum of two persons work together, and all areas are provided with emergency phone lines and emergency switches to switch off the power supplies in case of a perceived or actual emergency. A public address system from the control room keeps personnel informed of developments. Warnings are posted clearly and prominently, and faded, over-written, torn, or damaged warning signs are immediately replaced. The safety procedures are not relaxed for anyone, as we have noticed that if we relax them even once, others too tend to ignore them, for example, when they are in a hurry or when no one is watching. Any changes in the systems or their layout are clearly recorded.
22.4 Upgrading Professional Competence

We provide regular training to employees through theoretical classes, practical classes, written tests, and, finally, grueling interviews. The operators of the accelerators and the plants need a license for operations, which must be renewed every 3–5 years after rigorous retraining and re-examination. They are trained to maintain detailed logbooks and to be able to reconstruct events in detail based on the entries in the logbook. Strict adherence to checklists is always followed. We have independent safety officers, who maintain and continuously upgrade safety manuals based on recommendations from safety committees and the Atomic Energy Regulatory Board. They perform regular tests of every alarm and gauge, and they are required to examine and record every alarm, even if it is a false alarm.
22.5 Common Human Failings

Now let me describe some precautions we take to combat common human behaviors and tendencies. We try to avoid putting people who are under stress, or who get nervous easily, on sensitive equipment. We have noticed that people instinctively tend to avoid extended periods of concentration. Very often they tend to assume, "All must be ok; what can go wrong by itself?" They do some things unconsciously, out of habit. When there is a problem, people often tend to avoid laborious, final corrections if some makeshift solution works. For example, a leak was allowed to persist for 30 years because a makeshift solution was available, which often worked, though not quite satisfactorily. Every time we experience a problem, we must go through the entire check without skipping any steps. We tend to think in terms of "same symptom, same solution." We tend to try the solutions that readily come to mind and look easy to implement, without weighing all the other options. We tend to give more importance to speed than to perfection. We cannot remember all that we need to; our working memory is limited. We are not able to pay attention to too many tasks at the same time. We want to see that "all is well", so we end up trusting that "all is well" and proceed. It is not easy for us to see our own mistakes. We are also scared or reluctant to bring mistakes committed by seniors and other colleagues to their notice. It is not always possible to see all possible solutions or all possible problems. Generally, it is difficult to stay calm and avoid frustration when things are not working well, or when problems arise one after the other (a bad hair day!). We may feel embarrassed to report a mistake we have made, to avoid feeling or being humiliated. We could be overworked and tired. We want to be seen by the boss!
Complacency is one of the biggest issues, especially in repetitive work, and can be a source of many severe troubles. People tend to argue, "Well, everything was fine, so it must be fine." Rivalry among workers can also complicate issues, as people tend to think, "I can do it fast and report it to earn praise." There can be deliberate mischief, either by a disgruntled employee or by someone who deliberately loosens a board or a bolt whose location only he or she knows. No amount of care can overcome a lack of discipline. It is also important to be alert to the family tensions of employees, and it is important that they have access to understanding (but not condescending) counseling, talking, and sharing. It also helps to give them short leave to go on a vacation and sort things out. One of the most common sources of frustration is promotion and comparison with others. Frustrated employees tend to argue, "Let those who are being rapidly promoted do the work." It is important to provide training to these people to improve their chances of advancement. The management should also be alert to the onset of depression in employees. This is most often seen in self-neglect and can be handled in its initial stages by sharing and by involving them in social get-togethers. More serious cases may require professional help. The management should also remain alert to employees getting into bad habits, living beyond their means, or taking on too much debt, each of which could make them vulnerable to greed.
22.6 Precautions

As our institute is run by the Department of Atomic Energy, it faces a variety of security threats. To deal with these, all our employees undergo strict background checks. We do not permit temporary workers in radioactive areas and access-controlled areas, even under supervision. Very strict security regulations are in place to stop any breach of the computing system and the data network. Our employees undergo regular health checks, as required by the Atomic Energy Regulatory Board. My one piece of advice to all my colleagues is to ensure that no one is humiliated, and definitely never in public, as people hold on to grudges for years. This affects their well-being as well as their creativity and productivity. The management should ensure that people are proud to be working for the organization. They should feel wanted and cared for. They should have faith and trust that they and their families are secure and safe.
22.7 Conclusions

The Variable Energy Cyclotron Center at Kolkata is a national institute, and we have employees from all parts of the country with vastly different cultural backgrounds. A sensitive, understanding, and considerate leadership with an eye for detail has, over decades, created an environment where employees feel safe, cared for, and valued. This encourages them to give their best and achieve their highest intellectual and technical potential. This has been done by following the rules without any relaxation and without being complacent, and it has ensured that there have been no serious accidents, injuries, or incidents in over five decades of very complex construction, fabrication, and operations.
Chapter 23
Human Reliability in Launch Vehicle Systems

Rajaram Nagappa
23.1 Introduction

Human reliability analysis (HRA) is a systematic technique for assessing the risk arising from human error, and it has been widely used in various industries for enhancing the safety and reliability of complex socio-technical systems. This paper provides a perspective on HRA in the launch vehicle industry. Satellites play a ubiquitous role in our everyday life. They keep us connected, enable international communication, tell us where we are, guide us to where we want to go, keep us informed about weather conditions, enable us to observe and manage our natural resources, provide early warning of impending natural disasters, and deliver a host of societal and security services. For satellites to provide such services, they need to be placed in designated orbits at altitudes and inclinations appropriate for the execution of the intended mission. Satellite Launch Vehicles (SLVs) are employed for launching and placing satellites in the required or intermediate orbits. The satellite needs to be imparted a certain velocity, termed ΔV (Delta V), for it to attain orbit, and the value of ΔV depends on the orbital altitude. The value of ΔV varies from 7.8 km/s for low earth orbit (LEO) satellites to 10.2 km/s for satellites in geosynchronous transfer orbit (GTO). Launch vehicles comprise a number of stages, with each stage providing an incremental addition to the ΔV and the final stage providing the final injection velocity. Access to space is expensive, with costs ranging upwards of $2500 for placing a kilogramme of payload into orbit. The cost of launch on the Indian carriers PSLV and GSLV Mk-III can be surmised from the financial sanctions provided for this purpose (Singh 2018). The PSLV cost works out to ₹219 crore ($29.2 million)
per launch, and the GSLV Mk-III cost works out to ₹433.8 crore ($57.8 million) per launch. Add to this the cost of the communication or earth observation satellite the launch vehicle will carry to get an idea of the magnitude of the expense involved. The communication satellite GSAT-7, for example, is reported to have cost ₹185 crore in 2013 (Bagla 2013). Typically, GSAT communication satellites are designed for a service life of 15 years, and clients in the government as well as the private sector depend upon them to provide continuous service. All these factors reinforce how essential the reliable operation of the launch vehicle is in fulfilling mission needs—not once but in repeated missions. The quality assurance and safety procedures in this industry are robust and have led to an excellent record of successful launches. Those problems or failures that do occur can be largely attributed to human factors. A general feeling of familiarity can breed complacency on the job and a failure to follow all procedures completely. Training and change management can help to address this for current employees and for those just joining the workforce.
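The staging arithmetic sketched above can be made concrete with the Tsiolkovsky rocket equation, Δv = Isp · g0 · ln(m0/mf), summed over the stages. The following is a minimal sketch only; the specific impulses and mass ratios are invented for illustration and are not actual PSLV stage data.

import math

G0 = 9.81  # standard gravity, m/s^2

def stage_delta_v(isp_s: float, mass_ratio: float) -> float:
    """Tsiolkovsky rocket equation: delta-v contributed by one stage.

    isp_s      -- specific impulse in seconds
    mass_ratio -- stage ignition mass / burnout mass (m0/mf)
    """
    return isp_s * G0 * math.log(mass_ratio)

# Hypothetical 4-stage vehicle; these numbers are illustrative only.
stages = [(260, 2.8), (295, 2.5), (295, 2.2), (310, 2.0)]

total = 0.0
for i, (isp, ratio) in enumerate(stages, start=1):
    dv = stage_delta_v(isp, ratio)
    total += dv
    print(f"Stage {i}: {dv:6.0f} m/s (cumulative {total:6.0f} m/s)")
# The cumulative delta-v comes to roughly 9-10 km/s, the order needed for
# orbit and consistent with the 7.8-10.2 km/s range quoted in the text
# (gravity and drag losses mean the vehicle must generate more than the
# orbital speed alone), which is why multiple stages are required.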
23.2 Satellite Launch Vehicle

A satellite launch vehicle is a system of systems, with each system required to perform to certain requirements and the interfaces integrated and managed so that the whole performs as a single entity. While this is easy to state, it requires a great degree of planning, systems engineering, and execution to obtain a reliable outcome. The Polar Satellite Launch Vehicle (PSLV) of ISRO has proven to be a reliable and dependable launch vehicle, placing into orbit not only Indian satellites but also satellites for international customers. The vehicle had, by the end of 2019, chalked up 50 flights, with 48 of them proving successful. The PSLV shown in Fig. 23.1 is made up of 29 subsystems and demands that a number of events be managed in its typical flight time of 12 minutes, the time required for injecting the satellite into orbit. The cutaway view in Fig. 23.2 shows the vehicle's stage and payload fairing features. During the active flight period, a number of events have to be executed sequentially, and any departure may result in anomalies—some of which may be manageable and some catastrophic, possibly resulting in the failure of the mission. These events include initiating the ignition sequence of the appropriate stage, initiating the separation sequence by commanding the ordnance chain, steering the vehicle on the required path, sensing errors and correcting them, monitoring vehicle performance/health parameters, and, in the case of a major anomaly, terminating the flight (a toy sketch of such an event sequencer is given after the component list below). A typical flight sequence of a PSLV flight is shown in Fig. 23.3. In addition to the major subsystems, a launch vehicle is a labyrinth of components, devices, pumps, actuators, electronic modules, sensors, connectors, and wiring. For example, the earlier Augmented Satellite Launch Vehicle (ASLV), which was a 4-stage vehicle employing solid propellant rocket stages, included the following:
Fig. 23.1 Polar Satellite Launch Vehicle (PSLV) (Picture courtesy ISRO)
• Sensors of different types: 253 Nos
• Electronic modules: 647 Nos
• Onboard measurements: 771 parameters
• Cells of different ratings: 426 Nos
• Connectors: 1255 Nos
• Length of wire used: 32 km
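As a toy illustration of the sequencing requirement flagged above: the sketch below steps through a hypothetical event list in strict time order, runs a health check before each event, and terminates the flight on a major anomaly. The event names and times are invented for illustration and do not represent an actual PSLV timeline.

# Hypothetical flight event sequencer; names and times are illustrative only.
events = [
    (0.0,   "ignite stage 1"),
    (110.0, "separate stage 1 / ignite stage 2"),
    (200.0, "jettison payload fairing"),
    (260.0, "separate stage 2 / ignite stage 3"),
    (710.0, "inject satellite into orbit"),
]

def healthy(t: float) -> bool:
    """Stand-in for onboard health monitoring (always OK in this sketch)."""
    return True

def run_sequence() -> None:
    for t, action in events:          # events must execute strictly in order
        if not healthy(t):
            print(f"T+{t:5.1f}s: major anomaly -> terminate flight")
            return
        print(f"T+{t:5.1f}s: {action}")
    print("Mission sequence complete")

run_sequence()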
Fig. 23.2 Cutaway view of PSLV (Picture courtesy ISRO)
The launch vehicle design and the choice of systems are governed by the mission requirements—mainly the payload mass and the orbit placement. To meet these needs, PSLV employs a combination of solid propellant and liquid propellant-fueled stages. In the case of GSLV, an upper stage using more energetic cryogenic propellants is employed. As these and connected vehicle complexities grow, there are more parameters to monitor and more sensors to put in place, enlarging the scope of quality management.
Fig. 23.3 PSLV typical flight profile (Courtesy PSLV C36 Mission Brochure (PSLV-C36 Brochure))
There is active human involvement in the total cycle of design, development, facility establishment, material selection, procurement and acceptance, fabrication, processes, assembly, testing/evaluation/acceptance, system realisation, integration, and flight operations. Consequently, ensuring the reliability of the product at every stage depends to a large extent on the skill set of the players, the checks that the skill sets are being applied correctly, documentation, the practice of logging all operations however trivial, and the establishment of quality protocols with an anticipation that something can go wrong.¹ In such a complex and interrelated system of systems, the probability of something performing outside its specification is always present, and good practice demands providing design resiliency to account for such departures from the normal. Incorporating redundancy in the design is an evident and accepted practice. However, this can apply only to systems that are critical to the success of the mission and can be accommodated within the limits of dimensions, mass, and power (these are critical resources of a launch vehicle with multiple claimants). Reliability is defined as the probability of a product/device performing its defined purpose satisfactorily for a specified period of time under the prescribed operating/service conditions. It is evident that to establish the reliability of the product, an adequate number of tests have to be carried out under prescribed operating conditions. Electronic and mechanical products on a launch vehicle are subjected to temperature excursions and must withstand vibration, noise, and acceleration loads during the thrusting phase of the stage propulsion system. System characteristics change over time due to storage conditions, humidity, thermal cycles, corrosion, chemical diffusion, chemical reaction, and other reasons. These could degrade or affect the performance of the system. The absence of failure can, therefore, be considered an index of reliability. In terms of the probability of failure F, reliability R can be expressed as

R = 1 − F

The probability of failure, like R, is defined over a specified period of service and a given environment severity. Information on the failure rates of the constituents of a system is required to determine the probability of failure. The value of F lies between 0 and 1 and, therefore, the value of R also lies between 1 and 0. With so many constituent parts, even if the part reliability is of the order of 0.9 or more, the overall launch vehicle reliability could start in the range of 0.6 to 0.65.
¹ The author had the opportunity to visit the solid rocket propellant plant of Morton Thiokol (now known as Orbital ATK), the manufacturers of the Solid Rocket Booster (SRB) powering the Space Shuttle. In the shop where the nozzle liners were being machined, there were signboards indicating the name of the component being prepared, its function, and its cost. I was told that the signboard informed the machinist of the importance of the component, and the cost emphasised the value and the care that should be taken in turning out the component to avoid any mistake and loss.
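The compounding effect mentioned above follows directly from R = 1 − F when independent constituents act in series: the system reliability is the product of the part reliabilities. A minimal sketch follows; the per-subsystem figure of 0.985 is an assumed illustration, not a value from the text.

def series_reliability(part_reliabilities):
    """System reliability for independent parts in series: product of R_i."""
    r = 1.0
    for ri in part_reliabilities:
        r *= ri
    return r

# 29 subsystems (the PSLV count quoted earlier), each assumed 98.5% reliable:
r_sys = series_reliability([0.985] * 29)
print(f"System reliability: {r_sys:.3f}")    # ~0.645, i.e. in the 0.6-0.65 range
print(f"Failure probability F = 1 - R = {1 - r_sys:.3f}")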
23.3 Launch Vehicle Reliability—Practices

The Vikram Sarabhai Space Centre (VSSC) of the Indian Space Research Organisation (ISRO) is the nodal centre for launch vehicle development. Launch vehicle reliability practices include decentralised design and development, together with common oversight review functions. The subsystem design and development agencies are required to incorporate measures which assure the accomplishment of the performance specification and to demonstrate the same through analysis and testing, within the constraints of interfaces and safety requirements. The impact of off-specification performance on the overall mission and/or another subsystem demands real-time or offline flagging, depending upon the seriousness of the anomaly. Good reliability practice in launch vehicle systems demands trust and transparency among subsystem developers, which must be encouraged and nurtured by Project Management. The following review practices aid the quality and reliability process, especially when project goals and deadlines on the one hand and technology development on the other proceed side-by-side.

(1) System Concept Review: This review is carried out when the project/mission is conceptualised.² The review helps in assessing the preliminary system definition and its adequacy. Examined through the lens of the state of the art, the review helps in gauging the appropriateness and adequacy of the proposed technology approaches to satisfy the mission requirements. The review will also assess the support infrastructure needed and its timely availability.

(2) Preliminary Design Review (PDR): This review is scheduled at the end of the initial development and preliminary design phase, which will have enabled the drawing up of specifications for the major elements of the subsystem. The review covers, for mechanical systems, material selection, the fabrication process, and inspection; for chemical systems, the material system, process cycle, and in-process checks; and for electronic systems, the evaluation of breadboards and mock-ups. The documentation provided for PDR confirms the close-out of actions from the System Concept Review and details the functional requirements, interface specifications, technology/process specifications, test specifications,
² Once the top management takes a call on a new mission requirement (for example, a remote sensing satellite of 1000 kg nominal mass needed in a 600–800 km polar orbit), a task team is formed to study the feasibility, suggest candidate architectures, and identify available technologies, new technologies, and the infrastructure needed. The team also provides an estimate of the cost and time frame required to achieve the mission. The Task Team report, after acceptance, forms the basis for the detailed design of the mission and generates specifications for key subsystems and other downstream activities.
quality assurance plan, and the configuration control management plan.³ The system safety plan is an important element for PDR.⁴

(3) Critical Design Review (CDR): This review establishes the completion of the detailed design, the reconciliation of changes, and the freezing of drawings and processes. Completion of CDR establishes the baseline for the production of prototypes and flight units. CDR is done at the subsystem level as well as for the integrated flight unit.

In addition to the three review stages of a launch vehicle, a review of all flight subsystem readiness and acceptance is done prior to the commencement of integration activity. Completion of the Flight Readiness Review (FRR) and the Mission Readiness Review (MRR) signifies the flightworthiness of the vehicle and its readiness for launch. At the subsystem level, starting from specifications and going to product realisation, an elaborate detailing of materials, material properties, material storage, material testing and acceptance, weighment, process-related equipment, equipment calibration, processes, in-process checks, coupon-level testing, NDT, and final product quality and acceptance is carried out and reviewed. Prior to taking up a process, the process document is formally reviewed and accepted, and this forms the basis for preparing the process logs to be followed by the processing agency, as well as by the inspection and quality people. When all components of a subsystem are processed and integrated, the Test/Flight Article Review Board (TARB/FARB) is convened to review and give clearance for the test of the article. The Board checks the quality of the product for testability and for meeting the test objectives. Particular attention is given to out-of-specification waivers, process deviations, and the resolution of performance anomalies during environment tests.
23.4 Quality Organisation

It is evident that in the realisation of a launch vehicle there is human involvement at practically every stage of the process. A quality protocol appropriate to the material/device/component/subassembly/subsystem needs to be documented and followed. To ensure that the quality protocol is observed, inspection and quality control measures
³ For aerospace systems, configuration and weight budget management is a critical task. A change in dimension, tolerance, mass, volume, or performance can have an impact on an adjacent system and needs to be kept in check. When a change has to be implemented, it has to be notified to the other system developers to enable them to check for any consequences and make appropriate alterations in their systems.

⁴ Launch vehicle constituents include propellants, ordnance systems, and electrical systems, and they pose certain explosive and corrosive hazards. While a safety plan is a must for many operations, for launch vehicle systems it is an essential requirement to lay down the safety procedures to be followed during constituent preparation as well as during the integration and assembly process. Safety procedures must cover storage, transportation, and handling practices, as well as mitigation/remedial procedures in case of a mishap.
Fig. 23.4 Quality organisation example
and quality assurance practices must be observed, along with periodic quality audits. For carrying out the quality protocol, an appropriate quality organisation, similar to the one shown in Fig. 23.4, needs to be in place. With proper documentation of the process, implementation by a trained and disciplined crew, and quality control procedures, the realised product should be acceptable and fully usable. In actual practice, however, deviations do creep in. The constituents of a launch vehicle system cannot always be treated under a simple accept/reject criterion. Cost and effort have gone into the realisation of the product, and an outright rejection has an impact not only on the product in question but also on other interfacing elements, with consequent project delays. The salvageability of the product is therefore seriously examined. As an example, take the case of a machining tolerance not being met. During inspection, this non-conformance is brought out. The matter is discussed in the local salvage committee (LSC) formed at the workshop floor level. The LSC clears the acceptance if the deviation (a) is of a minor nature, (b) has no impact on the subassembly or any other system, and (c) has a precedent. Sometimes, the LSC may clear the element in question to proceed conditionally, and the final clearance may come after a review of the acceptance test of the subassembly. The LSC may also decide that the deviation may impact performance or an interface and refer it to a higher forum for decision. The next forum is the Waiver Board (WB), which examines the deviation in more detail. The WB may refer the issue to the specialist division for expert analysis or subscale experimentation and take a view based on the analysis/experiment outcome. The WB may also suggest a corresponding modification in the interfacing element to accommodate the deviation. Alternatively, the WB may, based on the analysis results, find the deviation unacceptable and advise rework or rejection. In case the materials and processes are such that there is a likelihood of recurrence of the deviation, the WB may refer it to the Configuration Management Board to accept the changes and incorporate any corresponding changes in the interfacing elements. In spite of all these measures, sometimes a critical inspection is not carried out because it has been missed in the drawing; such a lapse is not caught until it causes a failure.
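The escalation logic just described can be summarised in code form. The sketch below is a hypothetical encoding of that flow; the boolean predicates are placeholders for judgments that, in practice, the boards make on detailed evidence.

from dataclasses import dataclass

@dataclass
class Deviation:
    minor: bool                  # (a) minor in nature
    impacts_other_systems: bool  # (b) affects a subassembly or interface
    has_precedent: bool          # (c) a similar deviation was accepted before
    analysis_acceptable: bool    # outcome of expert analysis, if referred

def disposition(d: Deviation) -> str:
    # Local Salvage Committee: accept only if (a), (b), and (c) all hold.
    if d.minor and not d.impacts_other_systems and d.has_precedent:
        return "LSC: accept (possibly conditional on the subassembly test)"
    # Otherwise the Waiver Board examines the deviation in detail.
    if d.analysis_acceptable:
        return ("WB: accept, possibly modifying the interfacing element; "
                "recurring deviations go to the Configuration Management Board")
    return "WB: rework or reject"

print(disposition(Deviation(True, False, True, True)))
print(disposition(Deviation(False, True, False, False)))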
23.5 Reliability and Decision Making

Design resilience and redundancy add to the reliability of the launch vehicle. Redundancy can be provided for critical avionics and ordnance systems, but many primary systems of the vehicle cannot have the luxury of backups. The methodology adopted is to subject the system to qualification tests and to adopt a system of acceptance tests for every component, device, and subsystem selected for ground test or flight test. Obviously, the qualification test levels are kept higher than the acceptance levels. For example, pressurised devices like solid rocket motor chambers, liquid propellant tanks, and gas bottles are subjected to burst tests (limited to a few units) for design qualification purposes. However, every unit is subjected to acceptance tests to proof levels, which are 10% higher than the design pressure. Similarly, the airframe structure is tested to a level higher than the flight loads. All such tests are heavily instrumented (100 or more channels of measurement are common), and the recorded data are compared with the pre-test predictions. In rare instances, some strengthening of the structure may be required, in which case the test is repeated after correction. Material selection is an important process. Selection is often governed by prior knowledge of the material's pedigree and application/usage. The selected material should meet the functional requirements, including environmental conditions, and should function with positive safety margins during the service life of the component. Chemical and structural characterisation (for both metallic and polymeric composites) and acceptance tests at the material level, as well as at the derived component level, are important requirements. Electronic parts are selected from a 'Preferred Parts List (PPL)' and go through prescribed tests and evaluations (T&E). The T&E levels are generated considering the manufacturer's information, the procurement specification, application guidelines, and the service environment. Any non-listed part used has to meet a separate qualification procedure drawn up for the acceptance of such a non-standard part. The overall process for solid propellant processing, including quality control and review steps, is shown in Fig. 23.5. The non-destructive testing (NDT) of the finished product is an important marker of product quality. In a process-specific operation like solid propellant processing, involving active human participation, deviations are likely to occur. Some could be due to operator carelessness (example: touching and contaminating a clean surface), some beyond the control of the operator (example: a power outage during a critical operation), and some due to a lack of understanding (example: the flow characteristics of thixotropic propellant slurry). NDT is likely to show blowholes, interface debonds, and resin-starved areas. Cracks in the cured propellant are a possibility, but with the presently used composition and the achieved mechanical properties, this is remote. Flaws falling beyond the acceptance standard are analysed for their impact on the integrity of the system, as well as on performance, before clearing for use. In the early days of propellant processing for the PSLV first stage motor, large clusters of blowholes were seen in the NDT. When the flaw was repeated in the subsequent casting, two actions were undertaken. One involved analysis and a series of subscale experiments to study the salvageability of
Fig. 23.5 Solid propellant processing/quality protocol
the product. The other involved understanding the process parameters and equipment that needed modification to realise a flaw-free product. The cartons, control rounds, and tag-end coupons are indicators of stage motor performance, and their properties are used to check the system integrity and predict the stage motor performance.
23.6 Launch Vehicle Failures and Reliability Issues

Large subsystems of launch vehicles, like integrated propulsion systems, are qualified through ground tests called static tests. The stage control systems are integrated with the propulsion system during such tests, and a duty cycle representative of the flight regime is simulated. A detailed post-test analysis confirms the design's robustness. In case a potential design weakness becomes apparent, corrective measures are put in place. The question is how many ground tests are needed to qualify the stage propulsion system. The practice in the US is close to a dozen ground tests for large rocket propulsion stages. In India, the criterion adopted is 3 consecutively successful tests in flight hardware. If an anomaly or failure occurs, tests have to be repeated. Doing a large number of ground tests does not necessarily add to the overall reliability. Data on international launch vehicle flight failures also support this. More than 50% of the failures are due to the propulsion system, with smaller proportions contributed by other system failures, as summarised in Table 23.1 (Salgado et al. 2018). Over a span of 36 years, 157 launch vehicle failures is not a significant number, and the percentage success over the total number of vehicles launched by a
Table 23.1 Launch vehicle failure causes 1980–2016 (Salgado et al. 2018)

Country    Propulsion  Avionics  Separation  Electrical  Structural  Other    Unknown   Total by country
USA        22          4         10          1           1           3        3         44
Russia     49          3         2           –           –           8        19        81
Europe     8           1         …           …           …           …        …         10
China      6           1         …           …           …           …        …         9
Japan      3           …         …           …           …           …        …         5
India      3           …         …           …           …           …        …         8
Failures   91 (58%)    11 (7%)   14 (9%)     2 (1%)      3 (2%)      13 (8%)  23 (15%)  157 (100%)
country would exceed 95%. For instance, India's time-tested and proven PSLV has had a success rating of 48 out of 50 flights. After the failure of the maiden developmental flight on September 20, 1993, the PSLV had a string of thirty-nine successful flights, but suffered a failure in the 41st flight on August 31, 2017. After review and rectification, the PSLV has continued its successful streak. Once the design is proven in the development flights, subsequent failures can mostly be attributed to workmanship problems, problems related to modifications/improvements, problems related to changes of materials or components (due to discontinuation by manufacturers), and lack of oversight. In the operational/production phase of launch vehicles, familiarity breeding a sense of complacency cannot be ruled out. Some individuals feel they know the drill well and do not refer to or follow the process logs. Slip-ups can happen unless the supervisor corrects the situation. Also, when old hands are replaced with new members, the rigour of training and tutoring them in the working drill is important. This is even more so if the work is contracted out. Proper documentation, handholding, and supervision are important elements during such transfers. As launch vehicles reach the production stage, this element becomes all the more important.
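The flight statistics quoted above can be turned into a crude reliability estimate. The following minimal sketch uses the Wilson score interval; the choice of interval is an assumption made here for illustration, not something the chapter prescribes.

import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial success probability."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

lo, hi = wilson_interval(48, 50)   # PSLV: 48 successes in 50 flights
print(f"Point estimate: {48/50:.2f}, 95% interval: [{lo:.3f}, {hi:.3f}]")
# -> roughly [0.87, 0.99]: even 50 flights leave sizeable uncertainty, which
#    is why qualification relies on more than flight statistics alone.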
23.7 Conclusion

Launch vehicles are complex, multi-disciplinary systems. The subsystems they carry are explosive and corrosive in nature and involve high-pressure systems, calling for adequate safety in operations. Launch vehicle subsystem realisation, integration, and testing call for all tasks to be carried out with due diligence, following the documented procedures and inspection steps, and there is intense human interaction in every step of the subsystem realisation process. Zero-defect capability is a difficult goal to achieve in launch vehicle systems, especially where chemical/polymer-related processes are involved. While the non-destructive test set-up can be
automated, the interpretation and the acceptance/salvageability decision need human intervention. Artificial Intelligence tools could be used to automate the acceptance process, but this requires some dedicated effort. Shortly, the space programme plans to undertake a human space flight programme. In addition to the launch vehicle, the crew module is being engineered with attention to crew working ease and crew safety during launch and crew recovery. The space programme has worked out elaborate test, evaluation, modelling, and simulation exercises for successfully implementing the mission, including rigorous checks and reviews involving experienced and expert members. In this context, it is relevant to recall the words of the great rocket pioneer Konstantin Tsiolkovsky, who said, "The earth is the cradle of humanity, but mankind cannot stay in the cradle forever." It is time to step farther from the cradle and explore new space vistas using new concepts in launch vehicles. It is proven that most industrial accidents are related to some kind of human failure, sometimes with catastrophic consequences. The Space Shuttle Challenger accident of January 1986 (STS 51-L), in which all seven astronauts died, is a classic example. The present paper refers to human error probability in the satellite industry, but the same human reliability factors can be applied to the nuclear industry. It is fundamental to identify the tasks, actions, or activities that depend on human behaviour and to determine the conditions that influence human error and thus increase risk. With this goal, the most important methods, techniques, and tools to assess human failure (error) are referenced, along with their potential applicability. In the last two decades, there has been a major effort to create methods, techniques, and tools that help analysts understand and reduce human failures when performing an activity. For obvious reasons, the nuclear industry has been the motor for investigating and developing new models. The satellite industry provides another fertile source of information on how human failures arise in critical industries.
Chapter 24
Improvement of Human Reliability & Human Organisational Factors at NPPs Natarajan Kanagalakshmi
24.1 Introduction Accidents and disasters leave footprints showing that human factors play a predominant role under emergency conditions. Researchers analyse incidents and accidents to identify the contributing factors behind the root cause of an accident. Human factors were identified as the root cause of major nuclear industry disasters such as Chernobyl and Three Mile Island, and an estimated 50–70% of accidents at nuclear power plants are due to human error (Trager and Trager 1985). The terms human error and human factors are often used interchangeably to refer to the root cause of accidents wherever a human-machine interface exists. It is therefore important to analyse and investigate all accidents and incidents, and their structure, to avoid recurrence.
24.2 Structure of an Accident Analysis of any accident or incident will reveal numerous contributing factors. The causes of an accident can be divided into immediate causes and root causes. Immediate Causes: These are the unsafe acts and conditions that resulted in, or could have resulted in, an accident. They explain why an accident happened. Root Causes: These are the causes that result in unsafe acts being committed or unsafe conditions existing. Root causes affect the whole system; thus, their correction
addresses not only the single accident being investigated but many potential future accidents as well. The root cause is the most fundamental cause that can be reasonably corrected to prevent recurrence of the error.
24.3 Human Error Factor Human error is defined by Gordon as "human acts which are judged by somebody to deviate from some kind of reference act - they are subjective and they vary with time" (Gordon 1998). These are specific acts which can either directly (active errors) or indirectly (latent errors) cause an accident (Gordon 1998). People make mistakes through carelessness, but sometimes they knowingly deviate from the rules for their convenience, for example by taking shortcuts. When the majority take a shortcut and nothing unusual happens, others tend to follow; such latent behaviours should be identified as human errors. Not every deviation from the rules is a human error, however: there may also be wilful or intentional acts that are destructive. To reduce human errors, it is important to understand the characteristics of a person. Russell Ferrell, a professor of human factors at the University of Arizona, identified three major factors which affect human characteristics and cause human errors in the workplace:
1. Overload:
• Environmental factors: noise, distractions.
• Internal factors: personal problems, emotional stress.
• Situational factors: unclear instructions, risk level.
2. Inappropriate response:
• Detecting a hazard but not correcting it.
• Removing safeguards from machines/equipment.
• Ignoring safety.
3. Inappropriate activities:
• Performing tasks without the requisite training.
• Misjudging the degree of risk involved in a given task (Reason 1990).
Human error taxonomy has been studied in detail by various industrial psychologists, viz. Swain and Guttmann (1983), Rasmussen (1982), and Reason (1990). Rasmussen's analysis of 200 significant events in the nuclear industry, categorised into human error categories, indicates that the majority of errors made in the nuclear industry are omissions and errors that were made previously but not detected (Rasmussen 1982). Still, little is known about how individual error tendencies interact within complex organisations of people working in high-risk industries. The research builds an understanding of
the basis of human error and its impact on the causation of accidents. Human error taxonomy models from various industrial psychologists are summarised in Figs. 24.1, 24.2 and 24.3. Overall, the unsafe acts in an event are mainly due to human error, which may be intentional or unintentional. By combining the taxonomies of the various experts, a comprehensive human error classification can be derived, as shown in Fig. 24.4. The majority of accidents and incidents are due to unsafe acts that result when human factors are influenced by various other conditions, causing human error. The input/output model (Fig. 24.5) is detailed below and analysed in terms of the latent and active errors that cause an event.
Fig. 24.1 Adapted from Swain and Guttmann's taxonomy (1983):
• Errors of omission: failing to do something required.
• Errors of commission: doing something you shouldn't do.
• Sequence errors: doing something in the wrong order.
• Timing errors: doing something too slowly or too quickly.
Fig. 24.2 Adapted from Rasmussen's taxonomy (1986):
• Skill-based errors: attentional failure to monitor progress; forgetfulness; misrecognition of events.
• Rule-based errors: misapplication of good rules; application of wrong rules.
• Knowledge-based errors: lack of knowledge; incorrect or incomplete mental model of the problem.
Fig. 24.3 Adapted from Reason's error taxonomy (1990):
• Slips: good intentions, right mental model, but doing something wrong (an error of commission).
• Lapses: good intentions, right mental model, but failing to do something (an error of omission).
• Mistakes: good intentions, wrong mental model.

Fig. 24.4 Human error classification (combining the above taxonomies):
• Violation: wilful circumvention. Not necessarily a violation in the sense of malevolent intent; it can also be "heroism" or a "there's a better way to do something" mentality.
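Where such a combined taxonomy is put to practical use, for instance when coding events in an incident database, it helps to represent the categories explicitly. The Python sketch below is one hypothetical way to encode the classification of Figs. 24.1–24.4; the enum names and the `classify` helper are illustrative assumptions, not part of any cited methodology.

```python
from enum import Enum

class ErrorMode(Enum):
    """Unsafe-act categories combined from Swain and Guttmann, Rasmussen, and Reason."""
    SLIP = "slip"            # right intention and model, wrong execution (commission)
    LAPSE = "lapse"          # right intention and model, omitted step (omission)
    MISTAKE = "mistake"      # wrong mental model (rule- or knowledge-based)
    VIOLATION = "violation"  # wilful circumvention, not necessarily malevolent

def classify(good_intentions: bool, right_model: bool,
             wrong_action_taken: bool) -> ErrorMode:
    """Toy decision rule mapping three observations onto the combined taxonomy."""
    if not good_intentions:
        return ErrorMode.VIOLATION    # deliberate deviation from the reference act
    if not right_model:
        return ErrorMode.MISTAKE      # the plan itself was wrong
    # Good intentions and right model: commission vs. omission
    return ErrorMode.SLIP if wrong_action_taken else ErrorMode.LAPSE

# Example: an operator meant to follow the procedure and understood the plant
# state, but omitted a required step -> a lapse (error of omission).
print(classify(good_intentions=True, right_model=True, wrong_action_taken=False))
```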
24 Improvement of Human Reliability & Human Organisational Factors …
277
Fig. 24.5 Elements of output and human contribution (Reason 1990)
In an input/output model, the following failures may affect the whole process and cause latent errors:
1. Failure in decision making by the management;
2. Failures in goal setting by the management; and
3. Failure in planning and allocating resources and reliable equipment by line managers.
Latent errors result in latent conditions in the system that may become contributing causes of an accident. They are present within the system as unnoticed conditions well before the onset of a recognisable accident sequence. Human interaction with a machine or system, under human factors such as overload, inappropriate response, inappropriate activities, and violations such as not using Personal Protective Equipment (PPE), may be the immediate cause of an event; these are termed active errors. Active errors are unsafe acts, failures of technological functions, or human actions that become the local triggering events later identified as the immediate causes of an accident. A minimal numerical illustration of how latent conditions and active errors combine is sketched below.
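To make the latent/active distinction concrete, the sketch below models an accident as requiring both an active triggering error and the failure of every defensive barrier, in the spirit of Reason's model; latent conditions are represented as raised barrier failure probabilities. All probability values are invented for illustration and are not data from this chapter.

```python
# Toy model: an accident needs an active error (the trigger) AND every
# defensive barrier to fail. Latent conditions raise barrier failure
# probabilities long before any trigger occurs. Numbers are illustrative only.

def accident_probability(p_active: float, barrier_fail_probs: list[float]) -> float:
    p = p_active
    for p_fail in barrier_fail_probs:  # barriers assumed independent
        p *= p_fail
    return p

healthy_barriers = [0.01, 0.02, 0.05]   # well-maintained defences
degraded_barriers = [0.20, 0.30, 0.50]  # latent conditions present

p_trigger = 0.001                       # chance of an active error per task

print(f"healthy plant: {accident_probability(p_trigger, healthy_barriers):.2e}")
print(f"latent decay:  {accident_probability(p_trigger, degraded_barriers):.2e}")
# The same active-error rate is ~3000x more dangerous once latent
# conditions have eroded the barriers.
```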
24.4 Human Error—An Optimistic Perspective The old view of human error in event investigations focussed mainly on finding fault with a particular person or a specific cause of the trouble. The newer view holds that the complex and dynamic systems in which people work are not basically safe, and that human behaviour must be understood in that context; the emphasis is on reconstructing the human contribution to a system failure or event. Following The Field Guide to Understanding "Human Error" by Sidney Dekker, the old and new views of human error can be contrasted as follows (Dekker 2014):
1. Old view: "Human error" is the cause of the trouble.
   New view: "Human error" is a symptom of trouble deeper inside the system, systematically connected to features of tools, tasks, and the operating environment.
2. Old view: "Human error" is a conclusion: get rid of unreliable people, write more procedures, appeal to vigilance, add more technology.
   New view: "Human error" is a starting point: understand how people create safety through practice.
3. Old view: Says what people failed to do, or what they should have done to prevent the mishap.
   New view: Tries to understand why people did what they did and why it made sense to them; if it made sense to them, others may do it too.
4. Old view: Somebody didn't pay enough attention; if only somebody had caught the error, nothing would have happened.
   New view: Safety is never the only goal of a worker; people do their best to reconcile goals and make trade-offs (efficiency vs. safety). Nobody comes to work to do a bad job; bona fides should be verified.
24.5 How Do We Achieve Sustainable Results to Improve Human Reliability? 24.5.1 Decision Makers Decision-making plays a crucial role in any organisation, helping to build strategies that promote and achieve the organisation's goals; poor decisions, however, can deeply affect the continuity of operations. Many findings from recent investigation reports relate to poor managerial decisions triggering operational sequences that result in undesirable outcomes. To reduce this uncertainty, decision makers
shall follow the suggested measures to improve human reliability in their organisation. Human factors can cause latent errors that are not readily apparent as the cause of an event but that lead to deviations from intended actions in the long run. Measures that decision makers could initiate to mitigate such latent errors include:
• Planning strategies to measure where the organisation stands and setting concrete objectives for improvement;
• Training and educating staff to provide an understanding of key concepts;
• Providing a methodology for human reliability improvement;
• Certifying personnel for key activities;
• Implementing psychological assessment, including of cognitive skills, during recruitment;
• Team building and integrity; and
• Motivation.
24.5.2 Line Managers Line managers are the bridge between the employees and the management. The strategies and goals set by the decision makers must be implemented by the line managers, who must understand the goals, adopt the strategies, plan the implementation measures, train the employees, and arrange a conducive work environment. Line managers could adopt the following measures:
• Identifying suitable human resources with adequate skills and motivating people;
• Allocating suitable resources required for the work;
• Capacity building;
• Machinery/system updating;
• Hazard identification and risk assessment/standard operating procedures;
• Maintaining databases;
• Constituting committees for critical review;
• Giving feedback to decision makers;
• Forming an investigation team for human reliability analysis aspects;
• Investigating events at all levels;
• Striving to understand why people's assessments and actions made sense at the time, given their context;
• Not being over-critical or pointing out what they should have done;
• Focussing attention on all the possible contributing factors of a problem, not just the people;
• Making every person feel like a stakeholder;
• Considering the circadian system (e.g., in shift scheduling);
• Providing simulator training for operators and the maintenance team;
• Arranging a human-friendly system;
• Adopting new technologies; and
• Considering human factors in engineering design.
24.5.3 How Do We Achieve Improvement on Human and Organisational Factors? As stated by H. W. Heinrich, and largely accepted globally, the main vulnerabilities in industrial safety come from Human Organisational Factors (HOF) (Heinrich 1931). Despite this, most organisations find it difficult to give priority to human and organisational factors; emphasis on HOF came only after major disasters such as Three Mile Island, Bhopal, and Chernobyl. There is a need to improve HOF at all levels in an organisation, and priority must be given to HOF for safety. The decision makers and leaders of an organisation should be inclined towards understanding HOF implementation, and the strategy must be framed with a focus on the involvement and participation of all in the daily life of the organisation. Figure 24.6 on human organisational factor improvement illustrates the requirements and considerations with respect to the human and organisational aspects.

Fig. 24.6 Human and organisational factor improvement [flow diagram, not reproduced: human aspects (recruitment process; periodic evaluation through performance appraisal, process intervention, and social behaviour; event analysis; training/skill development) and organisational aspects (workplace layout/process design/tools; policy/decision making/procedures) feed into psychological/physiological reports, event investigation/root cause analysis, assessment/validation/retraining, and ergonomics/human factors engineering, with review/update loops applying HRA tools to active and latent errors]
24.6 Human Factors 24.6.1 Pre-employment/Post-employment Evaluation Human aspects should be evaluated at the time of recruitment itself. The individual, organisational, and social behaviours should be monitored and evaluated prior to employment, and the evaluation remarks and pre-employment verification details kept on record in the form of a psychological and physiological report.
24.6.2 Periodic Evaluation At defined intervals of employment, the performance of the individual should be assessed by their reporting officials in the form of a performance appraisal report. The person's attributes regarding group activity and process intervention are recorded in this report.
24.6.3 Event Analysis If a person is involved in any incident, accident, or event, the contributing human factors should be analysed and recorded in an event investigation report.
24.7 Organisational Factors 24.7.1 Framing Policy/Decision Making/Procedures While defining policy, making decisions, or preparing procedures, targets and goals should be set in consideration of human capabilities.
24.7.2 Workplace/Process Layout The workplace/process layout shall be designed ergonomically to avoid human error arising from unsafe conditions in the surrounding environment.
24.7.3 Training/Skill Development According to the target and process requirements, the personnel deployed should be adequately trained in the process parameters, operating flow, equipment handling, and emergency management.
24.7.4 Event Analysis In the case of any event/mishap, detailed investigations should be carried out and the HOF should be given priority.
24.8 Human Error Based on the detailed event investigation, the human and organisational factors that cause human error need to be further analysed and categorised.
24.8.1 Active Error If it is an active error, human reliability analysis (HRA) tools should be used, together with a situational analysis, to identify and quantify the causes of the human error. A simplified example of such quantification is sketched below.
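As one illustration of HRA quantification, the sketch below follows the general pattern of SPAR-H-style methods, in which a nominal human error probability (HEP) is adjusted by performance shaping factor (PSF) multipliers; the specific multiplier values here are invented for the example, and real analyses would take both the nominal HEPs and the PSF levels from the method's published tables.

```python
# Simplified SPAR-H-style human error probability (HEP) calculation.
# The nominal HEPs and the adjustment formula follow the general SPAR-H
# pattern; the PSF multipliers below are illustrative assumptions.

NOMINAL_HEP = {"diagnosis": 1e-2, "action": 1e-3}

def hep(task_type: str, psf_multipliers: list[float]) -> float:
    nhep = NOMINAL_HEP[task_type]
    composite = 1.0
    for m in psf_multipliers:
        composite *= m
    negative = sum(1 for m in psf_multipliers if m > 1.0)
    if negative >= 3:
        # Adjustment keeps the result below 1.0 when several
        # performance shaping factors are simultaneously negative.
        return (nhep * composite) / (nhep * (composite - 1.0) + 1.0)
    return min(nhep * composite, 1.0)

# Operator action under high stress (x2), high complexity (x2),
# and barely adequate time (x10): three negative PSFs.
print(f"adjusted HEP: {hep('action', [2.0, 2.0, 10.0]):.4f}")  # ~0.0385
```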
24.8.2 Latent Error If it is a latent error, a detailed root cause analysis should be carried out to identify and quantify the contribution of the human error; a minimal fault-tree sketch of such an analysis follows.
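Root cause analyses of latent conditions are often supported by fault trees, in which basic events combine through AND/OR gates up to the undesired top event. The short sketch below evaluates such a tree for invented probabilities; the event names and numbers are purely illustrative.

```python
# Minimal fault-tree evaluation for a root cause analysis.
# Basic-event probabilities and the tree structure are illustrative only.

def p_and(*probs: float) -> float:
    """All inputs must fail (independence assumed)."""
    out = 1.0
    for p in probs:
        out *= p
    return out

def p_or(*probs: float) -> float:
    """Any input failing is enough (independence assumed)."""
    out = 1.0
    for p in probs:
        out *= (1.0 - p)
    return 1.0 - out

# Hypothetical latent conditions uncovered by an investigation:
p_procedure_gap = 0.05      # procedure never updated after a modification
p_training_gap = 0.10       # refresher training not conducted
p_supervision_lapse = 0.02  # review committee did not meet

# Top event: a latent condition defeats the administrative barrier.
p_admin_barrier_fails = p_and(p_or(p_procedure_gap, p_training_gap),
                              p_supervision_lapse)
print(f"P(administrative barrier fails) = {p_admin_barrier_fails:.4f}")  # ~0.0029
```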
24.9 Review Based on the above inferences, the required information can be obtained from various records, such as annual performance appraisals, medical reports, service records, etc. A detailed review should be carried out and areas for improvement assessed. Accordingly, the records may also be updated.
24.10 Conclusion Reducing human error and influencing human behaviour are the key factors that can eliminate unsafe acts, thereby addressing the roughly 88% of accidents that Heinrich attributed to unsafe acts (Heinrich 1931). It is high time we realised that the human error perspective should focus on why the person committed the error and what made that action make sense to them at the time. In addition to technical faults, human and organisational factors should be given priority for safety during event investigation. It is evident that strategies to improve human performance must be implemented at all levels of the organisation. Human reliability assessment must be carried out at all stages, starting from recruitment, and the need and scope for improvement must be identified, with improvement measures practised and monitored.
References Dekker S (2014) The field guide to understanding 'human error'. CRC Press, Boca Raton, FL Gordon RPE (1998) The contribution of human factors to accidents in the offshore oil industry. Reliab Eng Syst Saf 61:95–108 Heinrich HW (1931) Industrial accident prevention: a scientific approach. McGraw-Hill, New York Rasmussen J (1982) Human errors: a taxonomy for describing human malfunction in industrial installations. J Occup Accid 4(2–4):311–333 Rasmussen J (1986) Information processing and human-machine interaction: an approach to cognitive engineering. North Holland, New York Reason J (1990) Human error. Cambridge University Press, Cambridge, UK Swain AD, Guttmann HE (1983) Handbook of human reliability analysis with emphasis on nuclear power plant applications. NUREG/CR-1278, August 1983. https://www.nrc.gov/docs/ML0712/ML071210299.pdf Trager TJ, Trager TA Jr (1985) Case study report on loss of safety system function events. AEOD/C504, US NRC