Human Error Reduction in Manufacturing [2 ed.] 1636940897, 9781636940892

English Pages 272 Year 2023

Author: Jose (Pepe) Rodriguez-Perez

Human Error Reduction in Manufacturing Second Edition José Rodríguez-Pérez

Milwaukee, Wisconsin

American Society for Quality, Quality Press, Milwaukee 53203
© 2023 by ASQ Quality Press
All rights reserved. Published 2023

Library of Congress Cataloging-in-Publication Data
Names: Rodriguez-Perez, Jose, 1961-, author.
Title: Human error reduction in manufacturing, second edition / Jose Rodriguez-Perez.
Description: Includes bibliographical references and index. | Milwaukee, WI: Quality Press, 2023.
Identifiers: LCCN: 2022952229 | ISBN: 978-1-63694-089-2 (paperback) | 978-1-63694-090-8 (pdf) | 978-1-63694-091-5 (epub)
Subjects: LCSH Accidents--Prevention. | Industrial safety. | Human engineering. | Quality control. | BISAC BUSINESS & ECONOMICS / Industries / Energy | BUSINESS & ECONOMICS / Industries / Manufacturing | BUSINESS & ECONOMICS / Industries / Pharmaceutical & Biotechnology | BUSINESS & ECONOMICS / Production & Operations Management | BUSINESS & ECONOMICS / Quality Control
Classification: LCC HD7262 .R63 2023 | DDC 658.4/013--dc23

ASQ advances individual, organizational, and community excellence worldwide through learning, quality improvement, and knowledge exchange.

Bookstores, wholesalers, schools, libraries, businesses, and organizations: Quality Press books are available at quantity discounts for bulk purchases for business, trade, or educational uses. For more information, please contact Quality Press at 800-248-1946 or [email protected].

To place orders or browse the selection of all Quality Press titles, visit our website at: http://www.asq.org/quality-press.

Printed in the United States of America.
27 26 25 24 23 SWY 7 6 5 4 3 2 1

Table of Contents

List of Figures and Tables
Preface
Acknowledgments
List of Acronyms

Chapter 1 About Human Error
  Introduction and Some Statistics
  Human Error and Safety: The Importance of Human Factors
  Human Reliability and Error Prediction
  Why Do People Err?
  Why Don’t They Follow Procedures?
  Human Factors in the FDA-Regulated Industry

Chapter 2 Psychology and Classification of Human Error
  Human Errors and Human Factors
  Classification of Human Failures
  Human Factor Categories: Learning from Human Failures
  Skill-based Errors: Recognition, Commission, and Omission
  Mistakes: Rule-Based and Knowledge-Based
  Normalcy and Impairment
  Errors and Error-Provoking Conditions

Chapter 3 Intentional Noncompliance
  Why Do People Violate Rules?
  Sabotage
  Actions to Reduce Intentional Noncompliance

Chapter 4 Human Factors
  Behavior-Based Compliance and Quality Culture
  Management of Human Factors
  People Engagement
  Workplace Involvement: Motivation and Attention
  Adequate Supervision and Staffing
  Task Design
  Procedures and Forms
  Training, Competence, and Performance
  Facilities and Equipment Design and Maintenance
  Examples of Human Factors in Manufacturing Operations

Chapter 5 How Organizations Deal with Human Errors
  Personal Accountability

Chapter 6 Investigating Human Errors
  The Investigation Framework
  A Diagnostic Tool

Chapter 7 Root Causes Related to Human Performance
  Root Causes and Examples of How to Fix Them

Chapter 8 Risk Assessment of Human Errors
  Is Prevention Possible? Problems with Error Reduction and Prevention
  Risk Management Tools
  Human Reliability Analysis (HRA)
  Quantitative and Qualitative Analysis
  THERP

Chapter 9 How to Reduce the Probability of Human Error
  Hierarchy of Actions
  The Checklist Manifesto: Another Checklist?
  Mistake-Proofing
  Examples of Effective Actions

Chapter 10 Selected Topics
  Human Errors and Retraining
  Working from Memory
  Multitasking and Human Errors
  Good Documentation Practices: Data Integrity and Human Error

Chapter 11 Final Thoughts

Endnotes
Bibliography
Index

List of Figures and Tables

Figure 1.1 Characteristics associated with a positive quality culture
Figure 1.2 Reasons for human failures in quality systems
Table 1.1 Meaning of procedures
Figure 2.1 Characteristics of the three levels of performance
Figure 2.2 Types of human failures
Figure 2.3 Types of human errors
Table 2.1 Slips and lapses of memory
Figure 4.1 Human factor domains
Table 4.1 Comparison between typical QMS and behavior-based quality culture
Figure 4.2 Aspects associated with an effective quality culture
Table 4.2 Examples of nontechnical supervisory competences
Figure 4.3 Best practices for electronic documentation
Table 4.3 The best fonts to use for different types of media
Table 4.4 Good and poor procedure titles for facilitating understanding
Figure 4.4 Table of contents example
Figure 4.5 Example procedure
Table 4.5 Example of if/and/then instruction
Table 4.6 Example of complex if/and/then instruction
Table 4.7 Rules for using visual aids in procedures
Table 4.8 Example of comprehensive instruction
Table 4.9 UDA question and scale values
Table 4.10 The four levels of the Kirkpatrick Model
Figure 4.6 Human factors affecting maintenance
Table 4.11 Environmental ergonomic factors and causes
Table 4.12 Comparison between human and machine capabilities
Table 6.1 Barrier controls
Table 6.2 Barrier controls analysis example
Table 6.3 Element of human error investigations
Figure 8.1 Fault tree analysis example
Table 9.1 Control strategies
Table 9.2 Comparison of mistake prevention and mistake detection controls
Table 9.3 Omission and commission error in laboratory documentation
Table 10.1 ALCOA principles for good documentation practices
Table 11.1 Human error investigation and prevention do’s and don’ts

Preface

For many years, we considered human errors or mistakes to be the cause of mishaps or problems. In the manufacturing industries, human error, under whatever label (procedures not followed, lack of attention, or simply error), was the conclusion of any quality problem investigation. Very often it was coupled with some kind of training activity (most frequently retraining) as corrective action. We even have an old adage, “To err is human,” to explain it.

The way we look at the human side of problems has evolved during the past few decades. Industrial psychologists and human reliability professionals took command during the investigation of catastrophic accidents, such as the Chernobyl, Challenger, and aviation accidents, and our view on human error changed. Now we see human errors as the symptoms of deeper causes. In other words, human errors are consequences, not causes.

Humans vary considerably in their capabilities and limitations. It is not easy to predict how they will behave, although an extensive body of knowledge regarding this subject has developed that helps us understand how humans interact with machines and systems.

The scope of this book may seem broad, and it does have widespread applications. However, its intent is to apply those known general principles to help prevent worker failures and managerial mistakes within any kind of industry or sector. Managers and organizations must incorporate in their decision-making processes considerations of why errors occur, what errors are likely to occur, and what can be done about them. Managing human errors is one of the main areas on which to focus when trying to improve safety and business performance within any kind of operation.

The primary objective of this book is to provide readers with useful information on theories, methods, and specific techniques that can be applied to control human failure. It is a book of ideas, concepts, and examples, many of them from the manufacturing sector. It presents a comprehensive overview that focuses on the practical application of the subject, specifically the human side of quality and manufacturing errors. For readers who are interested in how human errors and mistakes create accidents and huge disasters, there are numerous authoritative sources, some of which are included in the Bibliography of this book.

In other words, the primary focus of this book is human failure, including its identification, its causes, and how it can be reasonably controlled or prevented in the manufacturing industry setting. In addition to including a detailed discussion of human error (the inadvertent or involuntary component of human failure), we also devote a whole chapter to analyzing voluntary (intentional) noncompliance.

The topics throughout this book are interconnected. An effective approach to achieving a significant reduction in human failures requires both a profound knowledge of the subject and a holistic approach.

Chapter 1 provides an introduction to the human error topic, covering areas such as the importance of the subject and some astonishing statistics.

Chapter 2 focuses on the nature and variety of human error. To limit the occurrence of errors and improve their chances of detection and recovery, we must understand their cognitive origins and the circumstances likely to promote them.

Chapter 3 covers the aspect of intentional noncompliance or trying to understand why people violate rules. We also discuss sabotage.

Chapter 4 focuses on the main human factors affecting human performance, from organizational culture and adequate supervision and staffing to the design of the workplace and work documentation, including the influence of training, and explains how performance is directly influenced. This updated chapter includes two new sections. One discusses the importance and benefits of a strong and positive quality culture in the reduction of human failures, including errors. Another discusses the importance and benefits of people engagement to enhance the quality culture.

Chapter 5 discusses how organizations deal with human errors, and it has been enhanced with new sections about using consequences to modify behaviors and for commitment to resilience.

Chapter 6 covers the critical element of the investigation of human errors. A helpful diagnostic tool has been completely revised and updated with new categories.

Chapter 7 describes the root causes associated with human performance.

Chapter 8 presents a review of human errors and human factors within risk management and the benefits of an integrated approach with other risk management tools and techniques.

Chapter 9 explains how we can reduce the probability of human errors. A hierarchy of actions is presented along with an appeal to the use of checklists within our processes. Elements for mistake-proofing our processes are also discussed.

Chapter 10 presents a selection of topics, including the use of retraining as corrective action for human errors, and the effect of working from memory or multitasking on the occurrence of human errors. A section of this chapter is devoted to discussing a very popular topic within regulated industries: data integrity and good documentation practices.

Finally, Chapter 11 contains important closing remarks that serve as a summary of the most significant content of this book.

I am convinced that human failure reduction is a necessary objective that will improve individual, organizational, and social well-being. This book has been written in a direct style, using simple “industry” language with abundant applied examples and practical references. I had three goals when preparing the manuscript of this book. All of them focus on you, dear reader. I wanted to:

1. Help you understand the human error concept.

2. Allow you to identify human factors affecting the performance of your processes.

3. Enable you to establish effective barriers and corrective and preventive actions related to those human factors.

Your comments are more than welcome at [email protected].

José (Pepe) Rodríguez-Pérez
Puerto Rico

Acknowledgments

This book is dedicated to those in various disciplines who established the basics of understanding human failures. Their work has been integrated into this volume.

List of Acronyms

AAMI    Association for the Advancement of Medical Instrumentation
ANDA    Abbreviated New Drug Application
CAPA    Corrective action and preventive action
CDC     U.S. Centers for Disease Control and Prevention
FDA     U.S. Food and Drug Administration
FMEA    Failure mode and effects analysis
FMECA   Failure mode, effects, and criticality analysis
FTA     Fault tree analysis
HCI     Human–computer interaction
HEP     Human error probability
HRA     Human reliability analysis
HRO     High-reliability organizations
IOM     Institute of Medicine
ISO     International Organization for Standardization
NASA    U.S. National Aeronautics and Space Administration
NTSB    U.S. National Transportation Safety Board
PSF     Performance shape factor
RLD     Reference listed drug
THERP   Technique for human error-rate prediction
TNA     Training needs analysis
UDA     U.S. Army Research Institute’s User’s Decision Aid
WHO     World Health Organization

Chapter 1 About Human Error

INTRODUCTION AND SOME STATISTICS

To err is human.
Human error is a symptom, not a cause.
Good people plus bad systems = A recipe for error.

All of us have experienced human errors and mistakes. When we interact with machines or complex systems, we often do things that are different from our intentions. People make errors and mistakes because to err is human, as the adage explains it. Does this mean errors will inevitably happen from time to time and there is nothing we can do about it? Can something be done to better understand, and indeed control, this subject?

Prevention of human error is generally seen as a major contributor to the reliability and safety of processes and systems. On the other hand, it is necessary to understand that eliminating human errors is almost impossible without first eliminating human beings. Therefore, the focus must be on their control and reduction, and hopefully the mitigation of their consequences. Errors are symptoms, and they do have causes. Understanding this concept is fundamental to controlling and reducing the frequency of human errors.

Sometimes, we can detect consistent relationships between the frequency of some kinds of errors and specific circumstances. For example, at the beginning of each year, most of us find that we use the old year instead of the new year on documents, including personal documents such as checks and letters, but also in work documents and our verbal communication. As time passes, however, this tendency fades, and by February, we are aware of the year we are living in. Problem fixed? Just wait until next January.

There are many other interesting and necessary questions, such as: How many possible causes are there for human errors? If every error has its own unique cause, each imaginable error would require its own analysis, and the remedy for one error would not apply to others. However, if there are few (general) causes, we can apply general rules repeatedly to them and thus effectively control and reduce the rate of human errors and mistakes.

For the sake of clarity, it is worth describing here the meaning of a few concepts that we are going to extensively use across these pages. An in-depth description and evaluation of these terms can be found in Chapter 2:

• Human failure refers to any time a human activity deviates from accepted standards, rules, or procedures.
• Human error refers to an action or decision that was not intended, that involved an involuntary deviation from an accepted standard, and that led to an undesirable outcome.
• Human violation is a deliberate deviation from a rule or procedure. When the objective is to harm, it becomes sabotage.
• Human factor is any factor that influences behavior at work in a way that can negatively affect the output of the process the human is involved with.

In other words, if human errors and violations are the symptoms, then human factors are the causes.

Unfortunately, and on many occasions also tragically, we have plenty of examples to illustrate the significance of human failures in general and human errors specifically. (See the section “Human Error and Safety: The Importance of Human Factors,” which includes descriptions of those cases.)
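
The definitions of human failure, human error, human violation, and sabotage given above amount to a small decision rule. The following Python sketch is only an illustration of that rule, not something taken from the book: the names `Deviation`, `FailureType`, and `classify` are hypothetical, and the sketch simplifies matters by assuming that every recorded deviation is already a human failure with an undesirable outcome.

```python
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    """Kinds of human failure named in the definitions (hypothetical labels)."""
    HUMAN_ERROR = "human error"
    VIOLATION = "human violation"
    SABOTAGE = "sabotage"


@dataclass(frozen=True)
class Deviation:
    """A human activity that deviated from an accepted standard, rule, or procedure."""
    description: str
    deliberate: bool              # was the deviation intended?
    intent_to_harm: bool = False  # only meaningful when the deviation was deliberate


def classify(deviation: Deviation) -> FailureType:
    """Apply the definitions: unintended -> error, deliberate -> violation,
    deliberate with the objective to harm -> sabotage."""
    if not deviation.deliberate:
        return FailureType.HUMAN_ERROR
    if deviation.intent_to_harm:
        return FailureType.SABOTAGE
    return FailureType.VIOLATION


if __name__ == "__main__":
    wrong_year = Deviation("wrote last year's date on a batch record", deliberate=False)
    skipped_step = Deviation("skipped a verification step to save time", deliberate=True)
    print(classify(wrong_year).value)
    print(classify(skipped_step).value)
```

Human factors are deliberately left out of the sketch: in the terms used above they are the causes behind a deviation, not a fourth kind of failure.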

The High Price of Human Error and Medical Safety

In this introduction, we discuss some examples from the healthcare field, where medical errors have received a lot of exposure and attention for the last 15 years. There are many forms of medical error that can put the health and safety of patients at risk. Improper processing of medications, surgical mistakes, misuse of medical equipment, or inaccurate clinical laboratory results are some examples of common medical errors.

In the United States, a 1999 report by the Institute of Medicine (IOM)¹ indicated that hospital medical errors killed between 44,000 and 98,000 people each year. In this report, the IOM defined medical error as the failure of a planned action to be completed as intended (error of execution) or the use of a wrong plan, including failure to use a plan to achieve an aim (error of planning).

Specific types of medical errors highlighted in the IOM report included errors in the administration of treatment, failure to employ indicated tests, and avoidable delays in treatment. The IOM report agreed that many healthcare-acquired infections are preventable.

In Minnesota, a 2005 report² analyzing data from 139 hospitals during 2004 found that there were 13 surgical operations on the wrong body part, 31 cases of foreign objects left inside surgical patients, and 21 preventable deaths. Another study about patient-controlled analgesia pumps concluded that an improvement of the pump interface, which focused on human factors, reduced the frequency of human errors by 55%.³

Analyzing medical death rate data over an eight-year period, Johns Hopkins’s patient safety experts have calculated that more than 250,000 deaths per year in the United States are due to medical errors.⁴ Their figure surpasses the U.S. Centers for Disease Control and Prevention’s (CDC) third leading cause of death, respiratory disease, which kills nearly 150,000 people per year. In their study published in 2016, the Johns Hopkins researchers examined four separate studies that analyzed medical death rate data from 2000 to 2008, including one by the U.S. Department of Health and Human Services’ Office of the Inspector General and the Agency for Healthcare Research and Quality. Then, using hospital admission rates from 2013, they extrapolated that based on a total of 35,416,020 hospitalizations, 251,454 deaths stemmed from a medical error, which the researchers say translates to almost 10% of all deaths each year in the United States.

According to the CDC, in 2013, 611,105 people died of heart disease, 584,881 died of cancer, and 149,205 died of chronic respiratory disease, the top three causes of death in the United States. The newly calculated figure for medical errors puts this cause of death behind cancer but ahead of respiratory disease.
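
The ranking claim above can be checked directly from the figures quoted in the text. The short Python sketch below is an illustration only: the variable names are hypothetical, and it uses nothing beyond the 2013 CDC counts, the Johns Hopkins estimate, and the hospitalization total already cited.

```python
# Figures quoted in the text (CDC counts for 2013 and the Johns Hopkins estimate).
cdc_deaths_2013 = {
    "heart disease": 611_105,
    "cancer": 584_881,
    "chronic respiratory disease": 149_205,
}
medical_error_deaths = 251_454      # Johns Hopkins extrapolation
hospitalizations_2013 = 35_416_020  # admissions used for the extrapolation

# Rank the medical-error estimate against the three CDC causes, largest first.
all_causes = {**cdc_deaths_2013, "medical error (estimate)": medical_error_deaths}
ranking = sorted(all_causes, key=all_causes.get, reverse=True)
rank_of_medical_error = ranking.index("medical error (estimate)") + 1

# Deaths attributed to medical error per 1,000 hospital admissions.
deaths_per_1000_admissions = medical_error_deaths / hospitalizations_2013 * 1000

print(ranking)
print(rank_of_medical_error)
print(round(deaths_per_1000_admissions, 2))
```

With these inputs the estimate ranks third, behind heart disease and cancer, and corresponds to roughly 7.1 deaths per 1,000 admissions. The "almost 10% of all deaths" share cannot be recomputed here because the total number of U.S. deaths is not quoted in the text.
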
In a very interesting experiment performed at a Swiss hospital,⁵ 30 nurses and 28 anesthesiologists had to prepare medications for 20 patients using 22 syringes of various drugs, respectively. Both groups had to perform 22 calculations relating to the preparation of drugs. The study assessed human error probabilities (HEPs), distribution of HEPs, and dependency of HEPs on individuals and task details. In the preparation tasks, overall HEP was 3% for nurses and 6.5% for anesthesiologists. In the arithmetic tasks, overall HEP was 23.8% for nurses and 8.9% for anesthesiologists. A statistically significant difference was noted between the two groups. In both preparation and arithmetic tasks, HEPs were dependent on individual nurses but not on individual anesthesiologists. In every instance, HEPs were dependent on task details.

During a March 2017 summit held in Germany, the World Health Organization (WHO) launched a new campaign titled “Global Patient Safety Challenge on Medication Safety”⁶ aimed at reducing severe and avoidable medication-associated damage across the globe by half over the next five years. WHO estimates that every year one million patients die in hospitals across the world because of avoidable clinical mistakes. The magnitude of this figure places this problem alongside hypertensive heart disease and road deaths as one of the top causes of death in the world today. The global cost associated with medication errors has been estimated at $42 billion annually, or almost 1% of total global health expenditure. In the United States, medication errors cause at least one death every day and injure approximately 1.3 million people annually.

The High Price of Human Error Across Industries

For manufacturing industries, there are several very good, comprehensive reference books regarding this subject, although they are focused on the relationship between human factors and process safety. The reason is that process industries, especially chemical ones, have a large share of disasters provoked by human failures and errors. Major incidents have highlighted the importance of addressing this crucial aspect of performance. For example, safety culture became a key focus within the offshore oil and gas industry after the Piper Alpha disaster in 1988. Human safety analysis has always been a key area for the nuclear industry as well.

The rate of development of human factors has accelerated over the recent years due to a mix of elements including major accidents, increased complexity of industrial systems, regulatory efforts, and social expectations. It has also been recognized that “engineering humans out” through full automation does not work. What is increasingly required is a proportionate consideration of human capabilities and limitations within any work system.⁷

Manufacturing industries, especially those involved in the manufacture of medical products such as medicines and medical devices, are also plagued with human errors with very diverse consequences. A big solid-dosage pharmaceutical manufacturing plant with approximately 1000 employees had more than 1100 documented deviations (nonconformances) in one year (2017). One-third of them (375) were classified as human errors. In other words, one deviation occurred every eight hours (a work shift), and, on average, they documented one human error every single day of the year. Human factors and the FDA-regulated industry are introduced later in this chapter.

The cost of human errors in the food industry can also be very high. Most of these situations end with a high-level recall of the products involved because those errors typically jeopardize the safety of the food. Adding sugar to a sugar-free product or adding undeclared tree nuts or undeclared milk can risk the life of sensitive consumers. Risking consumer health with contaminated (or potentially contaminated) food can be disastrous. In the United States alone, dozens of products are recalled each week due to these reasons. Most of the time, human error is the primary causal agent of the situation. When properly investigated, affected companies often discover real root causes associated with a wide range of human factors, such as workload and inadequate supervision, design of the task, inadequate procedures, lack of competence due to ineffective training, and so on.

Most medical product manufacturers have tried to address the “plague of human error,” as one vice president of operational excellence once described it to me. My opinion, based on direct knowledge of the corrective and preventive actions (CAPA) system from some of the biggest regulated companies in the United States and Europe, is that no one has been successful in reducing the “plague” to a controlled stage. Human errors continue to be an epidemic for regulated companies. This lack of success comes despite some companies undertaking huge investments in technology (for example, making more graphical procedures with many color images, and so on). Their lack of success comes from a single factor: they did not change the quality culture, starting from the top of the organization. Some of the required changes include:

• Promoting the quality of processes over the yield of processes
• Promoting and requiring personal accountability at all levels
• Using risk management tools to avoid nonconformances, not to justify the acceptability of using nonconforming products

I strongly recommend that interested professionals study some of the references included in the Bibliography.
Although the oldest ones refer to aviation, nuclear, and industrial accidents, several recent books cover the hospital and healthcare industry, where human errors cost tens of thousands of lives every year in the United States alone.
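
The pharmaceutical plant figures quoted earlier in this section (more than 1100 deviations in 2017, of which 375 were classified as human errors) can be turned into the rates the text mentions with a few lines of arithmetic. The Python sketch below is a rough check under stated assumptions, not a calculation taken from the book: it treats 1100 as the deviation count even though the text says "more than 1100," and it assumes the plant operates around the clock on all 365 days.

```python
DAYS_PER_YEAR = 365
HOURS_PER_YEAR = DAYS_PER_YEAR * 24  # assumes round-the-clock operation

deviations = 1100    # text says "more than 1100", so this is a lower bound
human_errors = 375   # deviations classified as human error

# Average time between documented deviations, in hours.
hours_between_deviations = HOURS_PER_YEAR / deviations

# Average number of documented human errors per calendar day.
human_errors_per_day = human_errors / DAYS_PER_YEAR

# Share of all deviations attributed to human error.
human_error_share = human_errors / deviations

print(f"{hours_between_deviations:.1f} hours between deviations")
print(f"{human_errors_per_day:.2f} human errors per day")
print(f"{human_error_share:.0%} of deviations classified as human error")
```

Under these assumptions the numbers agree with the text: just under eight hours between deviations, slightly more than one human error per day, and about one-third of deviations attributed to human error.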

Errors versus Defects

We must differentiate between errors (mistakes) and defects (also known as nonconformances). Regulated companies do not recall products because there were human errors during the manufacturing process. They recall products because their quality systems were unable to detect the human errors, and the nonconforming products were distributed, becoming adulterated and/or misbranded items once they reached patients. As stated by Reason8 when discussing how to eliminate affordances for error, “most human beings enjoy the experience of free will. They form

6 Chapter 1

intentions, make plans, and carry out actions, guided by what appears to them to be internal processes. Yet these apparently volitional activities are subtly constrained by the character of the physical objects with which they interact. The term affordance refers to the basic property of objects that shape the way in which people react to them.” Norman also explored this concept in his book The Design of Everyday Things.9 He explored, for example, how man-made objects and procedures offer affordance for error: I began to realize that human error resulted from bad design. Humans did not always behave so clumsily. But they do so when the things they must do are badly conceived, badly designed. Does a commercial airliner crash? Pilot error, says the report. Does a Soviet nuclear power plant have a serious problem? Human error, says the newspaper. Do two ships at sea collide? Human error is the official cause. But careful analysis of the events that transpired during these kinds of incidents usually gives the lie to such a story. At the famous nuclear power plant disaster, Three Mile Island, the blame was placed on the humans, on the plant operators who misdiagnosed the problems. But was it human error? Consider the phrase ‘operators who misdiagnosed the problems.’ Aha, the phrase reveals that first there was a problem, in fact a series of mechanical failures. Then why wasn’t equipment failure the real cause? What about the misdiagnoses? Why didn’t the operators correctly determine the cause? Well, how about the fact that the proper instruments were not available, that the plant operators did the action that had always been the reasonable and proper ones to do. How about the pressure relief valve that failed to close…. To me it sounds like equipment failure coupled with serious design error. 
To finish this introduction, it is necessary to remark that human errors cannot be eliminated nor even significantly reduced simply by telling the person who made the error or mistake to be more careful. A general admonition or advisory to stop such behavior is a simplistic approach and it does not work because we are not addressing any root cause. Errors cannot be eliminated by simply disciplining the people who make the mistakes. A factor very often neglected when considering the cause of human failures is the high frequency of conduct disorders such as attention deficit hyperactivity disorder (ADHD). A report by the National Institutes of Health10 concluded that almost 25% of U.S. inhabitants had a psychiatric disorder, and almost 60% of them never sought treatment. This study established that the severity of the disorder was 40% mild, 37% moderate, and 22% severe. It included 18% anxiety disorders, 9% mood disorders, and

About Human Error 7

9% impulse control disorders. This is the picture of the general population from which U.S. companies are recruiting their managers and line workers. More discussion about this topic is included in Chapter 4.

Some Statistics Related to Human Error

• Ninety-nine percent of accidental losses (except for natural disasters) begin with a human error.
• Root causes of the vast majority of accidents are management system weaknesses.
• Eight percent of men are color blind, while only 0.5% of women have the condition.
• Eighty percent of medical product recalls due to incorrect expiration date or incorrect lot/batch number are caused by a transposition of digits.
• One and a half million Americans are injured every year by drug errors in hospitals, nursing homes, and doctors’ offices (patients’ own medication mix-ups are not included), costing the health system more than $3.5 billion (1999).
• The global cost associated with medication errors has been estimated at $42 billion annually.
• On average, every hospitalized patient is subject to (at least) one medication error per day.
• Seventeen hours of work without a break is operationally the same as being legally drunk.
• The worst period for human errors is 2 a.m. to 5 a.m.
• About 15% of human errors are due to acquired habits.
• Human error accounts for 90% of road accidents.
• Ten percent of all U.S. deaths are due to medical errors.
• The third highest cause of death in the United States is medical error.
• An average of 26% of the babies in a neonatal intensive care unit were found to be at risk of being mistaken for another baby in the same unit on any given day.
• The rate of errors and mistakes for most procedure-based tasks is 1/100.
• The average worker is interrupted every 11 minutes and then spends almost a third of his/her day recovering from these distractions.
• Twelve percent of the world’s population is left-handed, with twice as many men as women. Thirty percent of us are mixed-handed and switch hands during some tasks. Ambidextrous people can do any task equally well with either hand, but it’s exceptionally rare. However, most of the pieces of manufacturing equipment and utilities are designed for right-handers.
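Because transposed digits account for such a large share of the date and lot-number recalls listed above, this specific pattern lends itself to an automated comparison. The following is a minimal sketch rather than a validated control. The function name `is_adjacent_transposition` and the sample values are hypothetical, and the sketch assumes that the expected value is available for comparison at the time of entry.

```python
def is_adjacent_transposition(expected: str, entered: str) -> bool:
    """Return True if `entered` differs from `expected` only by
    two neighboring characters that have swapped places."""
    if len(expected) != len(entered):
        return False
    # Positions at which the two strings disagree
    diffs = [i for i, (a, b) in enumerate(zip(expected, entered)) if a != b]
    if len(diffs) != 2 or diffs[1] != diffs[0] + 1:
        return False
    i, j = diffs
    # The two mismatched characters must be each other's swap
    return expected[i] == entered[j] and expected[j] == entered[i]


# Hypothetical expiration dates written as YYYYMMDD
print(is_adjacent_transposition("20231204", "20232104"))
print(is_adjacent_transposition("20231204", "20231204"))
```

A check of this kind only classifies a mismatch after the fact. It does not replace a second-person verification or a system-generated date.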

HUMAN ERROR AND SAFETY: THE IMPORTANCE OF HUMAN FACTORS

Most of the bibliographic references mentioned in this book cover the safety implications of human errors. Incredible tragedies such as those described below can be traced back to some type of human failure. Numerous statistics link human error to a vast majority of accidents and accidental losses. An investigational work published in 1999 indicated that at-risk behavior is the root cause of 85%-90% of all workplace injuries.11 Between the mid-1970s and the late 1980s there was a series of major accidents (some of them discussed in this section) whose investigations pointed toward organizational and social factors in addition to technical and engineering factors. Behavior was identified as one of these nontechnical factors. People at all levels of organizations were not doing what they were meant to do. Human failures uncovered during these investigations were a mix of both unintentional (errors and mistakes) and intentional (violations). Since the beginning of the 1990s, process safety management has become a scientific field, and today process safety is inextricably linked to human factors. Previously, regulators and safety professionals were convinced that putting all their focus on the elimination of hazardous conditions was the best way to prevent workplace injuries. Today, organizations with safety-critical operations recognize the value of developing a robust safety culture because it will influence the way people (at all levels) behave at work. In other words, human error and process safety incidents are directly linked to the safety culture of the organization. Safety culture is recognized as a significant element of human factors.
Investigations into major disasters in nuclear plants (Three Mile Island and Chernobyl), chemical process industries (Piper Alpha), transportation (Space Shuttle and Exxon Valdez), and gas distribution (San Juan Gas) concluded that systems broke down catastrophically despite the provision of complex technical safeguards. The primary cause of those disasters was not


an engineering failure, but the action (or inaction) of the designers, managers, and maintenance/operating workers. Taking inappropriate risks and not following procedures are indicators of a weak safety culture. Elements associated with a positive quality culture are described in Figure 1.1 (modified from Center for Chemical Process Safety [2007]).

Hardware
• Good plant design, working conditions, and housekeeping
• Perception of low risk due to confidence in engineered systems

Management systems
• Quality as a priority over profits and production
• Confidence in quality rules
• Good on-the-job communication
• Good organizational learning
• High training satisfaction

People
• High level of employee participation in quality systems
• High level of management quality concern, involvement, and commitment
• Trust in workforce to manage risks to quality

Behavior
• Acceptance of personal responsibility for quality
• Frequent informal quality communication
• Willingness to speak up about quality
• A cautionary approach to risk

Organizational climate factors
• Low levels of job stress
• High levels of job satisfaction

Figure 1.1 Characteristics associated with a positive quality culture.


• Improper design
• Improper construction
• Improper installation
• Improper positioning
• Improper maintenance
• Improper testing
• Improper calibration
• Incorrect specification
• Improper operation
• Failure to restore operation after maintenance
• Failure to recognize the need for a quality or process control measure

Figure 1.2 Reasons for human failures in quality systems.

Worker safety behavior can be described as either safe or at-risk. The reason that motivates a well-trained worker to take risks is well documented and explained by behavioral science. There is strong evidence that consequences, meaning what happens after a behavior, are the driving force. Research indicates that the main reason people continue unsafe or nonquality behavior, regardless of knowledge, is because of the positive, immediate, and certain consequences associated with the unsafe behavior.12 Texting while driving can be a perfect example of this. Figure 1.2 depicts a list of human failures, and while none of these events will necessarily cause a quality incident immediately, they represent conditions that may eventually allow one to occur. Each failure may result from a variety of underlying human factors that need to be addressed to minimize the probability of those failures happening. Among them, we can mention inadequate or insufficient training, distraction, fatigue, multitasking, carelessness, inadequate procedures, inadequate supervision and staff, miscommunication, workload, and so on. The application of human factors can have a significant impact on reducing the probability of quality incidents and, sometimes, catastrophic accidents. Human factors can be used to create more efficient and safe work systems. If practically all incidents include human failures, addressing the causes of human failures is fundamental to achieving an improvement in both quality and safety. Following is a brief description of several of the major accidents reported in the last 40 years. Originally, almost all of them were attributed to human error, but at the end of their corresponding


investigations, key recurring human factors were identified as the root causes of those disasters. The goal is to learn and understand from past disasters how human factors can contribute to improving the performance of the system, thus creating safer and higher-quality processes and systems.

Flixborough 197413 The Flixborough disaster was an explosion at a chemical plant close to the village of Flixborough, England. It killed 28 people and seriously injured 36 out of a total of only 72 people on-site at the time. Two months prior to the explosion, the number 5 reactor was discovered to be leaking. It was decided to install a temporary pipe to bypass the leaking reactor to allow continued operation of the plant while repairs were being made. In the absence of a “normal” 28-inch nominal bore pipe, a thinner, 20-inch nominal bore pipe was used to fabricate the bypass pipe for linking the reactor 4 outlet to the reactor 6 inlet. The new configuration was tested for leak-tightness at working pressure by pressurization with nitrogen. For two months after fitting, the bypass was operated continuously at temperature and pressure and gave no trouble. At the end of May, the reactors had to be depressurized and allowed to cool in order to deal with leaks elsewhere. The leaks having been dealt with, early on June 1 workers attempted to bring the plant back up to pressure and temperature. On June 1, 1974, there was a massive release of hot cyclohexane in the area of the missing reactor 5, followed shortly by the ignition of the resulting cloud of flammable vapor and a massive explosion in the plant that demolished the site. The conclusion of the investigation was that this disaster was caused by a well-designed and constructed plant undergoing a modification that destroyed its technical integrity. Although the causes of the disaster were complex, the conclusion of the investigation into the accident found that the bypass pipe had failed because of unanticipated stresses in the pipe during a pressure surge. The bypass pipe was inadequately supported, and the modification was made without a full assessment of all the potential factors.

Tenerife Airport 197714 On March 27, 1977, two Boeing 747 passenger jets (one operated by the Dutch airline KLM and the other by the U.S. airline Pan Am) collided on the runway at Los Rodeos Airport on the Spanish island of Tenerife, Canary Islands. The crash killed 583 people, making it the deadliest accident in aviation history. With a complex interaction of organizational influences, environmental conditions, and unsafe acts leading up to this aircraft mishap, the disaster at Tenerife had a lasting influence on the industry, particularly in the area of


communication. An increased emphasis was placed on using standardized phraseology in air traffic control communication by controllers and pilots alike, thereby reducing the chances for misunderstandings. As part of these changes, the word “takeoff” was removed from general usage and is only spoken by air traffic control when clearing an aircraft to take off or when canceling that same clearance. Less experienced flight crew members were encouraged to challenge their captains when they believed something was not correct, and captains were instructed to listen to their crew and evaluate all decisions in light of crew concerns. This concept was later expanded into what is known today as crew resource management, on which training is now mandatory for all airline pilots. The investigation concluded that the fundamental cause of the accident was that KLM’s captain took off without clearance. The investigators suggested the reason for this was a desire to leave as soon as possible in order to comply with KLM’s duty-time regulations and before the weather deteriorated further. Other major factors contributing to the accident included environmental conditions (a sudden fog greatly limited visibility to the extreme that the control tower and the crews of both planes were unable to see one another) and interference from simultaneous radio transmissions, with the result that it was difficult to hear the message.

Three Mile Island 197915 One of the two nuclear reactors of this complex had a partial meltdown, resulting in the release of radioactive material into the environment. It is considered among the three worst nuclear accidents in history (Fukushima and Chernobyl complete this tragic group). A mechanical failure happened, and plant operators were unable to detect it for several hours due to a poorly designed control room indicator. Lack of adequate training was identified as the main factor in this accident.

Bhopal 198416 A gas release of methyl isocyanate occurred at a pesticide plant in Bhopal, India, killing almost 4000 due to the initial gas release and affecting several hundred thousand inhabitants of surrounding areas. Among the main factors contributing to the disaster: poor maintenance, inadequate storage of hazardous chemicals, some safety systems switched off to save money, understaffing, and a very poor safety culture, among others.


Herald of Free Enterprise 198717 MS Herald of Free Enterprise was a roll-on/roll-off (RORO) ferry that capsized moments after leaving the Belgian port of Zeebrugge on the night of March 6, 1987, killing 193 passengers and crew. The modern eight-deck car and passenger ferry was designed for rapid loading and unloading on the competitive cross-channel route, and there were no watertight compartments. When the ship left the harbor with its bow door open, the sea immediately flooded the decks, and within minutes it was lying on its side in shallow water. Although the immediate cause of the sinking was found to be negligence by the assistant boatswain, who was asleep in his cabin when he should have been closing the bow door, the official inquiry placed more blame on his supervisors and a general culture of poor communication in the owner company. Since the disaster, improvements have been made to the design of RORO vessels, with watertight ramps, indicators showing the position of the bow doors, and the banning of undivided decks. Passenger details now have to be recorded before a ship sails, so the harbor authorities know who is on board. Also, cameras have been fitted to the front of ships so the crew can see from the bridge whether the doors have been closed before sailing.

Piper Alpha 198818 Piper Alpha was a North Sea oil production platform that began operations in 1976, first as an oil-only platform and then later converted to add gas production. An explosion, and the resulting oil and gas fires, destroyed it on July 6, 1988, killing 167, including the two crewmen of a rescue vessel. The UK government inquiry into the accident concluded that the initial condensate leak was the result of maintenance work being carried out simultaneously on a pump and related safety valve. The inquiry was critical of Piper Alpha’s operator, Occidental Petroleum, which was found guilty of having inadequate maintenance and safety procedures, but no criminal charges were ever brought against the company.

Chernobyl 198619 The Chernobyl disaster was a catastrophic nuclear accident. It occurred on April 26, 1986, at the Chernobyl nuclear power plant in what was then part of the Ukrainian Soviet Socialist Republic of the Soviet Union (USSR). During a late-night safety test that simulated power failure and in which safety systems were deliberately turned off, a combination of inherent reactor design flaws—together with the reactor operators arranging the core


in a manner contrary to the checklist for the test—eventually resulted in uncontrolled reaction conditions that flashed water into steam, generating a destructive steam explosion and a subsequent open-air graphite fire. This fire lofted plumes of fission products into the atmosphere, and practically all of this radioactive material would then go on to precipitate onto much of the surface of the western USSR and Europe. The accident caused two deaths within the facility, and 29 firemen and employees died of acute radiation syndrome in the following months. Estimates of the total number of deaths potentially resulting from the Chernobyl disaster vary enormously. One estimate counts 56 direct deaths (47 accident workers and nine children with thyroid cancer) and projects that there may eventually be 4,000 extra cancer deaths among the approximately 600,000 highly exposed people. The Chernobyl accident is regarded as the most disastrous nuclear power plant accident in history in terms of both cost and casualties. It is one of only two nuclear energy accidents classified as a level 7 event (the maximum classification) on the International Nuclear Event Scale, the other being the Fukushima Daiichi nuclear disaster in Japan in 2011. Among the causes of this accident, we can include operator errors due to their lack of knowledge of nuclear reactor physics and engineering, as well as a lack of experience and training. The reactor was being operated with many key safety systems turned off. Also, personnel had an insufficiently detailed understanding of the technical procedures involved with the nuclear reactor. Operating instructions and reactor design deficiencies were also discovered during several investigations.

Exxon Valdez 198920 The Exxon Valdez oil spill occurred in Prince William Sound, Alaska, on March 24, 1989, when the Exxon Valdez, an oil tanker owned by Exxon Shipping Company bound for Long Beach, California, struck Prince William Sound’s Bligh Reef and spilled 10.8 million U.S. gallons of crude oil over the next few days. It is considered to be one of the most devastating human-caused environmental disasters. The oil covered 1300 miles of coastline and more than 11,000 square miles of ocean. Among the factors that contributed to this environmental disaster, the U.S. National Transportation Safety Board (NTSB) found that Exxon Shipping Company failed to supervise the sailing master and provide a rested and sufficient crew for Exxon Valdez. Moreover, the NTSB found this was widespread throughout the industry, prompting a safety recommendation to Exxon and to the industry. The investigation also found that Exxon Shipping Company failed to properly maintain the Raytheon


Collision Avoidance System (RAYCAS) radar, which, if functional, would have indicated to the third mate an impending collision with the Bligh Reef by detecting the “radar reflector,” placed on the next rock inland from Bligh Reef for keeping ships on course. Staffing issues, overloaded crew, and inadequate training were also mentioned.

Río Piedras, Puerto Rico (San Juan Gas) 199621 The Río Piedras explosion was a gas explosion that occurred on November 21, 1996, at the Humberto Vidal shoe store located in Río Piedras, within the municipality of San Juan, capital of Puerto Rico, in which 33 people were killed. The investigation by the U.S. NTSB revealed that several people had reported an alleged gas leak in the building in the days leading up to the explosion. The store had no gas supply, so another nearby gas line looked like the culprit. It was discovered that a gas pipe that carried heavier-than-air propane gas was broken. A few years earlier, a water main had been installed below, which bent the pipe in the process. When the pipe was installed, it was already in a tight bend, adding to its stress levels. The addition of the water main caused it to break. The NTSB determined that the probable cause of the propane gas explosion, fueled by an excavation-caused gas leak in the basement of the Humberto Vidal office building, was the failure of San Juan Gas Company: (1) to oversee its employees’ actions to ensure timely identification and correction of unsafe conditions and strict adherence to operating practices and (2) to provide adequate training to employees. Also contributing to the explosion were: (1) the failure of the Research and Special Programs Administration/Office of Pipeline Safety to effectively oversee the pipeline safety program in Puerto Rico; (2) the failure of the Puerto Rico Public Service Commission to require San Juan Gas Company to correctly identify safety deficiencies; and (3) the failure of Enron Corporation to oversee adequately the operation of the San Juan Gas Company. Contributing to the loss of life was the failure of San Juan Gas Company to adequately inform citizens and businesses of the dangers of propane gas and the safety steps to take when a gas leak is suspected or detected. 
In its investigation of this accident, the NTSB addressed the following safety issues:
• Adequacy of employee training
• Need for an excavation-damage prevention program
• Adequacy of maps and records of buried facilities
• Adequacy of public education on what to do when the odor of gas is detected
• Adequacy of the oversight of the San Juan Gas Company, from Enron Corporation, the Puerto Rico Public Service Commission, and the Office of Pipeline Safety

Challenger Space Shuttle 198622 On January 28, 1986, the NASA shuttle Challenger broke apart 73 seconds into its flight, killing all seven crew members. The spacecraft’s disintegration began after an O-ring seal in its right solid rocket booster (SRB) failed at liftoff. The O-ring was not designed to fly under unusually cold conditions as in this launch. Its failure caused a breach in the SRB joint it sealed, allowing pressurized burning gas from within the solid rocket motor to reach the outside and impinge on the adjacent SRB aft field joint attachment hardware and external fuel tank. This led to the separation of the right-hand SRB’s aft field joint attachment and the structural failure of the external tank. Aerodynamic forces broke up the orbiter. The conclusion of the Presidential Commission on the Space Shuttle Challenger Accident, also known as the Rogers Commission after its chairman, was that the disaster was caused by the failure of the O-rings, and this failure was attributed to a faulty design whose performance could be too easily compromised by factors including the low temperature on the day of launch. More broadly, the report also considered the contributing factors to the accident. Among them was the failure of both NASA and a contractor company (Morton Thiokol) to respond adequately to the danger posed by the deficient joint design. Rather than redesigning the joint, they came to define the problem as an acceptable flight risk. The report found that managers had known about the flawed design since 1977 but never discussed the problem outside their reporting channels with Thiokol—a flagrant violation of NASA regulations. Even when it became more apparent how serious the flaw was, no one considered grounding the shuttles until a fix could be implemented. On the contrary, managers went as far as to issue and waive six launch constraints related to the O-rings. 
The report also strongly criticized the decision-making process that led to the launch of Challenger, saying it was seriously flawed: “Failures in communication…resulted in a decision to launch 51-L based on incomplete and sometimes misleading information, a conflict between engineering data and management judgments, and a NASA management structure that permitted internal flight safety problems to bypass key Shuttle managers.”23


Texas City Refinery 200524 Fifteen workers were killed in this accident, which was caused by the release and subsequent explosion of a hydrocarbon vapor cloud. Equipment issues were identified as among the main factors, along with lack of adequate supervision, inadequate procedures, and lack of training. Staffing issues and excessive workload were also identified as primary contributors to this disaster.

We can conclude this section by mentioning, as Reason (1990) stated, that the greatest risk of accident in a complex system is “not so much from the breakdown of a major component or from isolated operator error, as from the insidious accumulation of delayed human errors.” In each of the disasters described above, contributing factors that alone would not necessarily have led to the unfortunate outcome aligned like the holes in Reason’s Swiss cheese model (described in Chapter 2), allowing a failure to pass every potential barrier.
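The alignment of holes can be illustrated numerically. The following minimal sketch assumes that each barrier fails independently of the others and that its failure probability is known. Both assumptions are simplifications introduced here for illustration, and the function name `breach_probability` and the barrier values are hypothetical rather than taken from Reason's work.

```python
from math import prod


def breach_probability(barrier_failure_probs):
    """Probability that an error passes every barrier,
    assuming the barriers fail independently of one another."""
    for p in barrier_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    # The error gets through only if all barriers fail at once
    return prod(barrier_failure_probs)


# Hypothetical barriers: written procedure, second-person
# verification, and final inspection
barriers = [0.1, 0.05, 0.02]
print(f"{breach_probability(barriers):.6f}")

# The same system with the verification step switched off
print(f"{breach_probability([0.1, 1.0, 0.02]):.6f}")
```

In real systems, barriers often share latent causes such as understaffing or weak supervision, so the independence assumption tends to understate the risk.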

HUMAN RELIABILITY AND ERROR PREDICTION

Human failures result from the normal operation of the human information-processing system, along with effects from numerous conditions (known as performance shaping factors, or PSFs, and more generally as human factors) that affect human failure rates. Hypothetically, if we knew all these factors, we could predict failures and errors precisely. However, the reality is that since we cannot know all the factors, we will always have to use some kind of statistical prediction. Different humans interacting with the same process could suffer different failures and errors. When someone makes an error at the beginning of his/her employment, it is typically attributed to the learning curve period. When the same person fails after months or years doing the same tasks, the error could be attributed to the human being working in autopilot mode. The study performed in a hospital in Switzerland in 2007 and discussed previously is an interesting example of experimental determination of human error probabilities. Many people regard errors as random occurrences, so unpredictable that they are beyond effective control. Humans are usually the weakest links in most of our processes and systems. This lack of reliability makes the possibility of predicting an error a powerful tool in reducing the rate of human failures. Even if an exact prediction of error is not possible, can we predict the probabilities of error? This would still be of great value. If we can determine when and where an error will occur, and who will commit it, then there is at least the possibility of preventing it or of responding more effectively to it when it occurs. There are too many factors affecting


people’s behavior to allow for exact predictability. To understand and predict errors, we must first understand all the circumstances surrounding them, which requires, in particular, a detailed task analysis, as described in Chapter 4. For example, we know that errors are more likely between 2 a.m. and 5 a.m., but it is almost impossible to predict exactly when a particular error, or any error at all, will occur. Human reliability analysis requires data on human error rates. Seminal information was published on this topic by Swain and Guttmann.25 The available estimates of error rates are probably inaccurate because they are based on the estimates of experts, rarely on real data collected from real tasks or even simulations. If we establish an appropriate training program including some kind of training effectiveness verification (see Chapter 4), then we should be able to predict whether a human will know what to do. However, this does not mean we can also predict on each occasion whether he/she will actually do as learned. Unfortunately, there are many variables in the human mental and physical state that make such prediction almost impossible. Additionally, those tasks are typically performed continuously or repeatedly over a long period of time and by teams whose composition varies from shift to shift. Even so, if we analyze enough historical data, we might be able to predict the rate of some omission errors while documenting process information. For example, I have personal data showing that errors in documenting the date of an activity increase more than tenfold during the month of January each year, and that the errors correlate to the data from the previous year (in January) when documenting activities. It will be possible to predict both the forms and the relative rate of error if we understand the details of the task and the circumstances in which it is to be performed.
To do this, a process analysis, including both task analysis and examination of the design of the workplace, is necessary. Process analysis should also address the human factors that influence the likelihood of human failures occurring. Risk analysis can be qualitative or quantitative, and those aspects that deal with human failures are often named human reliability analysis (HRA), which is further discussed in Chapter 8. As described by the Center for Chemical Process Safety (2007),26 HRA involves:
1. Identification of tasks performed by personnel
2. Task analysis to identify potential human failures
3. Identification of conditions that affect human failure rates (Those conditions are named PSFs and include human factors such as training and its effectiveness, environmental conditions, readability of controls and displays, and so on.)
4. Application of data and/or expert opinion on human failure rates and PSFs to determine human failure rates and probabilities

A variety of HRA tools exist, often known by acronyms such as THERP (technique for human error rate prediction) or HEART (human error assessment and reduction technique), to mention two of the most widely used. Their pros and cons are described in various specialized texts, including guidelines published by the U.S. Nuclear Regulatory Commission.27 THERP is discussed in Chapter 8.
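Step 4 of the list above can be sketched in a few lines. The example below is a simplified illustration and not the THERP or HEART procedure: it assumes that a nominal human error probability is scaled by one multiplier per PSF and capped at 1.0. The function name `adjusted_error_probability` and the multiplier values are hypothetical, and the nominal rate of 0.01 reuses the 1/100 figure quoted earlier for procedure-based tasks.

```python
from math import prod


def adjusted_error_probability(nominal_hep, psf_multipliers):
    """Scale a nominal human error probability (HEP) by a set of
    PSF multipliers and cap the result at 1.0.

    A multiplier above 1 worsens performance; below 1 improves it.
    """
    if not 0.0 <= nominal_hep <= 1.0:
        raise ValueError("nominal_hep must be between 0 and 1")
    if any(m <= 0 for m in psf_multipliers.values()):
        raise ValueError("PSF multipliers must be positive")
    return min(1.0, nominal_hep * prod(psf_multipliers.values()))


# Hypothetical task: documenting a lot number on a batch record
nominal = 0.01
psfs = {
    "ineffective training": 2.0,  # assumed value
    "fatigue": 3.0,               # assumed value
    "clear displays": 0.5,        # assumed value
}
print(f"{adjusted_error_probability(nominal, psfs):.3f}")
```

The value of such a calculation lies less in the absolute number than in comparing scenarios, for example the same task with and without a fatigue factor.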

WHY DO PEOPLE ERR?

Much of human error appears to result from inadequacies in system design that create favorable conditions for error occurrence. Therefore, to build reliable systems, design factors that induce human errors should be scrutinized and eliminated methodically from our processes. Our emphasis should be to improve processes and systems rather than focus exclusively on “getting rid of bad apples,” or individuals with patterns of poor performance. There are three main elements that induce people to err:

1. Task complexity. Tasks differ in the amount of mental processing required. This causes people to make more errors in more complex tasks. Capacity limitations in short-term memory and recall problems in long-term memory strongly affect human performance reliability. Complex task sequences in a specific order overload human memory. However, adequate written procedures and detailed checklists can be used to unburden the workers of memorizing all the task elements and their correct sequential order.

2. Error-prone situations. These are work situations where inadequate designs increase the probability of errors. For example, any design that violates a strong population tendency could be considered error-likely. These situations overload workers in a manner that is not compatible with their capabilities. Systems should be fitted to the human, not vice versa. Error-proneness applies to the work situation, not people. Some situational task and equipment characteristics that predispose workers to increased errors include the following:

• Inadequate work space and layout
• Poor environmental conditions
• Inadequate human engineering design
• Inadequate training and job aid procedures
• Poor supervision

3. Behavioral characteristics. Age, sex, intelligence, physical conditions, strength/endurance, task knowledge, training, experience, skill level, motivation, attitude, emotional state, stress level, and social factors are some of the behavioral characteristics that can be related to human errors. Two of the most important influential behavioral factors are inexperience and stress. This combination can increase a worker’s error probability by a factor of as much as 10.28

A fundamental question is whether the person who errs is responsible for the error. Errors and mistakes are, by definition, unintentional, and if there is no evident cause (or if one cannot be found), then the error was not something for which the actor was responsible.

A single mistake rarely causes an accident or other undesirable outcomes because most systems, especially complex ones, have enough safeguards and controls to make single-error consequences highly unlikely. Safety accidents and quality incidents typically result from a combination of latent failures, active errors, and breaching of barriers.29 The breaching of barriers or defenses occurs when latent failures and active errors line up to permit a trajectory of accident opportunity, as demonstrated by James Reason’s Swiss cheese model (discussed in Chapter 2).

Literature evidence30 from major accident investigations, and our own evidence from multiple human error investigations in manufacturing settings, strongly suggests that bad events are more often the result of error-prone situations than they are of error-prone people. Therefore, the desired error management approach must focus on system improvement (including adequate barriers) rather than individual (or collective) discipline because:

• Human fallibility is part of the human condition. It can be controlled to a point, but it can never be eliminated.
• Management cannot change the human condition of workers. However, they can control the conditions under which people work.
• Human beings will always make errors.
• Corrective actions involving sanctions, threats, fears, and the like have very limited effectiveness, and, in many cases, these actions can harm morale.
• Errors happen at all levels of the system.
• Direct causal factors of the error (for example, momentary inattention) are often the last and least manageable links in the causal chain.

WHY DON’T THEY FOLLOW PROCEDURES?

People do not always follow procedures. We can easily conclude this when watching people at work in any kind of organization. People typically work from memory rather than from procedures. Analysis and investigations after a mishap keep returning the finding that procedure violations precede accidents and incidents. Operating procedures, with the aim of standardization, play a critically important role in shaping better practices, specifically in the safety field.

However, when rules are violated (procedure not followed), can we conclude that bad people are ignoring the rules? Or are bad rules ill-matched to the demands of real work? If people must constantly deviate from procedures and established rules in order to properly (including safely) accomplish a task, then we are witnessing a classic management-controlled human factor as the primary source of such errors. Chapter 4 describes the human factor significance of procedures and forms.

As procedure violations are judged to be such a large ingredient of mishaps, one of the typical corrective actions is to introduce even more procedures, or to change existing ones. Introducing more procedures does not necessarily avoid the next incident, nor do exhortations to follow procedures more carefully necessarily increase compliance.

From the point of view of safety, as described in Dekker (2005),31 there are two schools of thought or models that outline what procedures mean, and what they in turn mean for safety. These two opposing models are described in Table 1.1.

Model 1 closely resembles the environment of the process industries, specifically manufacturing. Replace safety with quality and all these postulates equally apply. Procedures and rules (how to perform specific tasks, for example, how to manufacture a product) are key elements of the industrial process, including, for example, bills of materials and manufacturing formulas of products. By following procedures and work instructions, companies consistently provide quality products and services to their customers. In the case of medical products, this is the only way to ensure that our products are safe and effective.

Table 1.1 Meaning of procedures.

Model 1: Procedures represent the best thought-out, and thus the safest, way to carry out a job.
Model 2: Procedures are resources for action. Procedures do not specify all circumstances to which they apply. Procedures cannot dictate their own application.

Model 1: Procedure following is mostly simple if-then rule-based mental activity: If this situation occurs, then this algorithm (for example, a checklist) applies.
Model 2: Applying procedures successfully across situations can be a substantive and skillful cognitive activity.

Model 1: Safety results from people following procedures.
Model 2: Procedures cannot, in themselves, guarantee safety. Safety results from people being skillful at judging when and how (and when not) to adapt procedures to local circumstances.

Model 1: For progress on safety, organizations must invest in people’s knowledge of procedures and ensure that procedures are followed.
Model 2: For progress on safety, organizations must monitor and understand the reasons behind the gap between procedures and practice, and additionally, organizations must develop ways that support people’s skills at judging when and how to adapt.

However, real work takes place in a context of limited resources and multiple goals and pressures. Procedures and work instructions assume that there is time to do them and that all necessary resources (humans, machines, instruments, and so on) are readily available. When procedures are not adequate, sometimes unofficial, self-made documentation and informal work systems emerge.

There is always a distance between a written instruction and an actual task. This gap is not constant and depends on many factors, such as work shift. First, following a procedure requires cognitive tasks that are not specified in the procedure itself. Some kind of human interpretation is needed. As established by Suchman (1987),32 “procedures are inevitably incomplete specifications of action: they contain abstract descriptions of objects and actions that relate only loosely to particular objects and actions
that are encountered in the actual situation.” This is especially worrisome in the manufacturing industry when workers are supposed to make products following a detailed and specific set of instructions.

During the root cause investigation of many human errors (see Chapter 6), we will discover the existence of simply incorrect or inaccurate instructions or procedures as the source of those human errors. In many other instances, there is incomplete information or the worker needs to make a personal interpretation of the instructions. Some real examples of such instructions from medical product manufacturing are:

• Mix well
• Stick together the two pieces for a few seconds
• Mix for at least one hour
• Verify all parameters
• Several batches can be used

The aforementioned instructions leave a lot to the imagination, or to the operator’s initiative. Here is another example from a real manufacturing instruction:

Since the percent actual yield efficiency calculation is an internal manufacturing process guidance, which is monitored in order to have additional technical data for the manufacturing process, it is required to gather more manufacturing data in order to establish this manufacturing guidance in the manufacturing batch record of item code XYZ123 (production version number 1) that uses the ABC manufacturing facility. As a consequence, it is recommended to eliminate this internal manufacturing guidance from the aforementioned manufacturing batch record for all yield stages until new guidance limits are established based on manufacturing experience and equipment capabilities.

Not related to manufacturing is the famous “the unknown known” statement that comes from Donald Rumsfeld in 2002 while serving as George W. Bush’s secretary of defense:

As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don’t know we don’t know.33

Another classic example of lack of clarity came to light in 1951. It is the text from a British Royal Navy instruction manual on the proper storage of torpedo warheads:34

It is necessary for technical reasons that these warheads should be stored with the top at the bottom, and the bottom at the top. In order that there may be no doubt as to which is the bottom and which is the top for storage purposes, it will be seen that the bottom of each warhead has been labeled with the word top.

Typically, procedures, checklists, and other similar rule documents are written for near-perfect situations. A critical managerial assumption regarding procedures is that they are adequate. Almost every human error investigation has discovered that this assumption is far from reality. For this reason, I’ve included the following questions as part of the human error investigation questionnaire described in Chapter 6:

• Does the procedure or working instruction have a sufficient level of detail?
• Does the procedure or working instruction use specific details rather than a qualitative description (slowly, soon, few, well, and so on)?
• Can procedures or working instructions be considered adequate in terms of format, content, level of detail, and so on?
• Are procedures or working instructions written clearly, without ambiguity as to what is required?
• Does the employee need to perform processing or interpretation of the information to execute this task?
• Does the employee clearly understand the applicable procedure or working instruction?
• Is there consistency in how employees are performing this task?
• Is the error related to numbers or alphanumeric information (for example, specifications, batch numbers, and so on)?
• Is there any job aid or checklist to perform this task? Is the format/content appropriate for clear interpretation?
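The question about qualitative descriptions lends itself to a simple automated screen. The following Python sketch flags vague wording in instruction text; it is only an illustration, and the function name and word list are hypothetical. The terms "slowly," "soon," "few," and "well" come from the questionnaire, and "several" is taken from the instruction examples given earlier. A real review would still need a human reader, since an instruction can be ambiguous without containing any of these words.

```python
import re
from typing import List, Sequence

# Hypothetical starter list of qualitative terms; extend it for a given site.
VAGUE_TERMS = ("slowly", "soon", "few", "well", "several")


def find_vague_terms(instruction: str, terms: Sequence[str] = VAGUE_TERMS) -> List[str]:
    """Return the vague terms that appear as whole words in an instruction."""
    found = []
    for term in terms:
        # \b restricts the match to whole words; the search ignores case.
        if re.search(rf"\b{re.escape(term)}\b", instruction, flags=re.IGNORECASE):
            found.append(term)
    return found


instructions = [
    "Mix well",
    "Stick together the two pieces for a few seconds",
    "Mix for 60 minutes at 200 rpm",  # hypothetical rewrite with specific values
]
for text in instructions:
    print(text, "->", find_vague_terms(text))
```

An empty result does not prove that an instruction is adequate; it only means that none of the listed terms were found.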
Companies, in particular those in the medical product field, are recommended to use their self-inspection or internal audit program to monitor the gap between procedures and practices across all elements of their quality system. And more importantly, they should try to understand why it exists.

In the medical product regulated industry, the safety and efficacy of each batch of product rest on the strict following of manufacturing (including testing) instructions. Thus, a very common finding during regulatory inspections is “your firm failed to follow the procedure.”

This discussion cannot be considered complete without including the topic of goal conflicts and procedural deviation. Multiple goals mean conflicts. In the end, some corners are cut and quality suffers. Medical product manufacturers also exist to achieve economic gain and maximize capacity utilization. In one way or another, all companies and organizations (even government ones) are guided by conflicting goals. NASA’s “Faster, Better, Cheaper” organizational philosophy in the late 1990s exemplified how multiple, contradictory goals are simultaneously present and active in organizations and systems.

The problem is that those conflicting goals pass from the boardroom to individuals, for example, to manufacturing sites’ staff, in the form of performance bonuses. Typical goals at this level can be:

• Reduce by x% the cycle time of manufacturing products (faster)
• Reduce by x% the number of deviations from last year (better)
• Reduce by x% the cost of manufacturing goods (cheaper)

The result of trying to do all three simultaneously is the creation of constraints such as inadequate time, personnel, parts, tools, and so on. Similarly, the misapplication of lean principles leads to an excessive reduction of resources or, as many companies state, “do more with less.” The problem is that the principle of “do more with less” cannot be applied infinitely. To maintain a working and effective quality system, a minimum quantity of overhead resources is needed to cover all those nonmanufacturing functions associated with quality systems. Just think about internal and external (supplier) audits, management review, validations, nonconformance investigations, complaint investigations, and so on. Having individuals multitask is the typical solution, which increases the opportunity for more human errors, as described in Chapter 10.

HUMAN FACTORS AND THE FDA-REGULATED INDUSTRY

In the case of medical devices, the U.S. FDA and foreign regulators require the consideration of human factors during their design and development. In the landmark 1996 guidance Do It By Design: An Introduction to Human Factors in Medical Devices, the FDA established design requirements to avoid user errors. More recently, the FDA introduced those concepts of human reliability in a guidance published February 3, 2016, titled Applying Human Factors and Usability Engineering to Medical Devices.

The FDA recommends that manufacturers follow human factors or usability engineering processes during the development of new medical devices, focusing specifically on the user interface, where the user interface includes all points of interaction between the product and the user(s), including elements such as displays, controls, packaging, product labels, instructions for use, and so on. While following these processes can be beneficial for optimizing user interfaces in other respects (for example, maximizing ease of use, efficiency, and user satisfaction), the FDA is primarily concerned that devices are safe and effective for the intended users, uses, and use environments. The goal is to ensure that the device user interface has been designed such that use errors that occur during use of the device that could cause harm or degrade medical treatment are either eliminated or reduced to the extent possible. Although the FDA’s guidance documents, including this one, do not establish legally enforceable responsibilities, they describe the agency’s current thinking on this topic.
In a draft guidance published in January 2017 titled Comparative Analyses and Related Comparative Use Human Factors Studies for a Drug-Device Combination Product Submitted in an ANDA, the FDA provides recommendations to focus on the analysis of the proposed user interface (components that the user interacts with), including the delivery device and any associated controls and displays, as well as product labeling and packaging.35 In terms of minimizing the differences between the generic and reference listed drug (RLD), the FDA further explains general principles, including how to conduct three types of threshold analyses for the identification and assessment of the differences to ensure the same clinical effect and safety profile as the RLD under the conditions specified in the labeling:

• Labeling comparison. “FDA recommends a side-by-side, line-by-line comparison of the full prescribing information, instructions for use, and descriptions of the delivery device constituent parts of the generic combination product and its RLD.”
• Comparative task analysis. “FDA recommends that potential applicants systematically dissect the use process for each product, that is, both the proposed generic product and the RLD, and analyze and compare the sequential and simultaneous manual and intellectual activities for end users interacting with both the products. FDA recommends that sponsors analyze the differences with the goal to characterize the potential for use error.”
• Physical comparison of the delivery device constituent part. “FDA recommends that the potential applicant of the proposed generic combination product acquire the RLD to examine (for example, visual and tactile examination) the physical features of the RLD and compare them to those of the delivery device constituent part for the proposed generic combination product.”

If the threshold analyses determine that a user interface’s design difference may not be minor, potential applicants should first consider modifying the design, the FDA says, noting that it may also request data “to support that the user interface design difference(s) will not preclude approval of the generic combination product in an ANDA. Such data may be gathered in a comparative use human factors study that evaluates user performance of the critical tasks related to the external critical design attributes that are found to be different.”

In terms of labeling issues, the FDA notes:

There has been some confusion regarding whether the FDA expects for ANDA approval that a generic combination product be used in accordance with the labeling for the RLD. The FDA does not necessarily expect for approval that a generic combination product can be used according to the RLD labeling per se, but rather it is critical that the generic combination product can be substituted for the RLD without additional physician intervention and/or retraining prior to use. To this end, a comparative use human factors study as described in this guidance could be designed to account for how a particular proposed generic combination product might be used when substituted for the RLD.

In the United States, the main organization (alongside the FDA) that deals with medical device standards is the Association for the Advancement of Medical Instrumentation (AAMI).
A search of the human factors standards currently published by AAMI36 identified the following:

• ANSI/AAMI HE75:2009(R)2018 Human factors engineering – Design of medical devices
• ANSI/AAMI/IEC 62366-1:2015/A1:2020 Medical devices Part 1 – Application of usability engineering to medical devices – Amendment 1
• ANSI/AAMI/IEC 62366-2:2016 Medical devices Part 2 – Guidance on the application of usability engineering to medical devices
• AAMI TIR49:2013 Design of training and instructional materials for medical devices used in non-clinical environments
• AAMI TIR50:2014/R(2017) Post-market surveillance of use error management
• AAMI TIR51:2014/R(2017) Human factors engineering – Guidance for contextual inquiry
• AAMI TIR55:2014/R(2017) Human factors engineering for processing medical devices
• AAMI TIR59:2017 Integrating human factors into design controls
• AAMI TIR61:2014 Generating reports for human factors design validation results for external cardiac defibrillators

To finish this chapter, here is information about some of the worst nuclear accidents related to medical use of radioactivity. In all cases, human error was considered to be the main cause of these accidents:

• Seventeen fatalities: Instituto Oncológico Nacional of Panama, August 2000–March 2001. Patients receiving treatment for prostate cancer and cancer of the cervix received lethal doses of radiation.
• Thirteen fatalities: Radiotherapy accident in Costa Rica, 1996. One hundred fourteen patients received an overdose of radiation from a cobalt-60 source that was being used for radiotherapy.
• Eleven fatalities: Radiotherapy accident in Zaragoza, Spain, December 1990. Twenty-seven cancer patients receiving radiotherapy were injured.
• Ten fatalities: Columbus (Ohio) radiotherapy accident, 1974–1976. Eighty-eight injuries were attributed to a cobalt-60 source.
• Eight fatalities: Radiation accident in Morocco, March 1984.
• Seven fatalities: Houston (Texas) radiotherapy accident, 1980.
• Five fatalities: Lost radiation source, Baku, Azerbaijan, USSR, October 5, 1982. Thirteen injuries.
• Four fatalities: Goiânia (Brazil) accident, September 13, 1987. Two hundred forty-nine people received serious radiation contamination from a lost radiotherapy source.
• Three fatalities: Samut Prakan, Thailand, radiation accident, February 2000. Three deaths and 10 injuries resulted when a radiation therapy unit was dismantled.
• Two fatalities: Meet Halfa, Egypt, May 2000. Two fatalities occurred due to a radiography accident.
• One fatality: Mayapuri, India, April 2010. Radiological accident.

2 Psychology and Classification of Human Error

HUMAN ERRORS AND HUMAN FACTORS

Very often we use the term human error to implicate the human as the cause of an undesirable result, whether it is an accident or, in a manufacturing environment (the focus of this book), a manufacturing error. Another widely used expression is unintentional human error. The result of a human error is not only undesirable but also unexpected because the person does not set out to do something wrong. However, human error rarely refers to a single incorrect action by an actor. Although individuals make errors and mistakes, it is usually because human factors issues were not properly addressed and controlled.

Psychologists have tried to understand human error and to develop methods to prevent or reduce it. Although many models of human error exist, two of the most renowned authors of this discipline are James Reason (1990) and Jens Rasmussen (1983).1 Both references are a must-read for anyone interested in the field of human error.

Human error can be defined as a departure from acceptable or desirable practices on the part of an individual resulting in an unacceptable or undesirable result. As defined by Reason and Hobbs (2003),2 an error is “the failure of planned actions to achieve their desired goal, where this occurs without some unforeseeable or chance intervention.”

Human factors is defined as the discipline concerned with designing machines, operations, and work environments to match human capabilities, limitations, and needs. Human factors can be further defined as any factor that influences behavior at work in a way that can negatively affect the output of the process the human is involved with. This is pretty much the concept of standard work used for decades at Toyota.

When an operator does not properly execute a manufacturing step, we immediately label it as human error. When we investigate the situation, inadequate training and supervision and lack of clarity in the working instruction can be factors behind the operator’s mistake. Human errors and mistakes are the symptoms of causal (human) factors associated with root causes that we must discover prior to solving them. This topic of human factors falls within the field of human reliability engineering. It deals with the person–process interface and how this interaction influences the performance of people. Some authors refer to human factors as performance shaping factors (PSFs). Performance influencing factors (PIFs) and error producing conditions (EPCs) are also used by different authors when referring to human factors.

CLASSIFICATION OF HUMAN FAILURES

Human failures can be divided into two broad categories: errors and violations.

• A human error is an action or decision that was not intended, that involved a deviation from an accepted standard, and that led to an undesirable outcome. Example: A laboratory technician performing two tests simultaneously uses the wrong sample for one of the tests.

• A violation is a deliberate deviation from a rule or procedure. Example: A production operator fills out a cleaning record without performing the task.

Human errors are not a uniform collection of unwanted acts. Errors are a predictable consequence of basic and normally useful cognitive mechanisms, not random or arbitrary processes. As error expert James Reason suggests, “Correct performance and systematic errors are two sides of the same coin.”

Jens Rasmussen defined three levels of human performance termed skill-based, rule-based, and knowledge-based, and their main properties are summarized in Figure 2.1. At the top, there is the skill-based level of performance in which we deal with familiar and nonproblematic tasks in a largely automatic way.

In the middle, there is the rule-based level where we modify our largely automatic behavior because we have become aware of some change or problem. But this is a situation that we have been trained to handle or have experienced before, or we have procedures that tell us what to do. We apply stored rules of the kind “if X (some situation), then do Y (some action).” In applying these stored rules, we operate largely by automatic pattern matching where we unconsciously match the available signs and indications to some stored solution. Only after this, and not always, do we use conscious thought to verify that we have adopted the right solution.

At the bottom, we have the knowledge-based level of performance in which we recognize that we face a novel problem and must think how to solve it because we do not have any ready-made solution.

Although humans vary considerably in their capabilities and limitations, there are three ways in which planned actions may fail to achieve their current goals, as depicted in Figure 2.2.

Skill-based
• Well-practiced tasks
• Little cognitive effort, almost automatic tasks

Rule-based
• Established rules available
• More complex than skill-based

Knowledge-based
• Novel situation with no learned routine or rules
• Using knowledge to find a solution
• Totally conscious activity

Figure 2.1 Characteristics of the three levels of performance.

Figure 2.2 Types of human failures (skill-based errors, mistakes, and violations).

1. The plan of action may be appropriate, but the actions themselves do not go as planned (execution errors). These error types are called skill-based errors and include attention (slips of actions), memory (lapses), and recognition errors.

2. The actions may go completely as planned, but the plan is inadequate to achieve the desired goal. These error types are called mistakes and can be split into two categories: rule-based mistakes and knowledge-based mistakes. The failure here is at the level of making a plan (plan errors).

3. The actions can deviate intentionally from the established method of working. Such violations involve disobeying formal rules and procedures.

Putting it in simple words, we can distinguish between unintentional errors (slips, lapses, recognition failures, and mistakes) and intentional ones (violations). Focusing on the unintentional arena, we have the structure described in Figure 2.3.

Skills are repeatedly practiced behaviors that we perform routinely with little conscious effort. They are literally automatic. Rule- and knowledge-based performance requires more mental involvement and conscious deliberation.

Slips and lapses occur in familiar tasks we usually perform without much need for conscious attention. They are errors in the performance of skill-based behaviors, typically when our attention or memory is diverted and we fail to closely monitor the actions we are performing.

Recognition errors include the misidentification of objects, messages, and signals, as well as the lack of detection of a problem (inspection or monitoring failures): An inspector fails to detect a defective unit.

Slips (commission or action errors) are failures in carrying out the actions of a task. They can be described as “actions not as planned.” Slips are errors in which the intention is correct, but failure occurs when carrying out the activity required: A production operator documents the wrong date in a production document.
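The three-way distinction above can be written as a small decision rule. The Python sketch below is a simplified illustration and not a validated classification tool; the names are hypothetical, and it assumes the investigator has already established that a failure occurred, whether the deviation was intentional, and whether the actions went as planned.

```python
from enum import Enum


class FailureType(Enum):
    SKILL_BASED_ERROR = "skill-based error (slip, lapse, or recognition failure)"
    MISTAKE = "mistake (rule-based or knowledge-based)"
    VIOLATION = "violation"


def classify_failure(intentional: bool, executed_as_planned: bool) -> FailureType:
    """Map two investigation findings to one of the three failure types."""
    if intentional:
        # Deliberate deviation from the established method of working.
        return FailureType.VIOLATION
    if not executed_as_planned:
        # The plan was appropriate, but the actions did not go as planned.
        return FailureType.SKILL_BASED_ERROR
    # The actions went as planned, but the plan itself was inadequate.
    return FailureType.MISTAKE


# The operator intended the correct date but wrote the wrong one.
print(classify_failure(intentional=False, executed_as_planned=False).value)
```

The two yes/no inputs are usually the hard part of an investigation; the sketch only makes explicit how the answers lead to a category.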

Figure 2.3 Types of human errors. Human errors divide into skill-based errors (recognition failures, slips of actions, and lapses of memory) and mistakes (rule-based mistakes and knowledge-based mistakes).

Lapses (omission or memory errors) cause humans to forget to carry out an action. They are defined as an error in operator recall and can be reduced by minimizing distractions and interruptions and by providing effective reminders, especially for tasks that take some time to complete or involve periods of waiting: A production operator forgets to document the date in a production document.

Mistakes are errors in rule- or knowledge-based performance. They are a more complex type of human error where we do the wrong thing believing it to be right. Mistakes include errors in perception, judgment, inference, and interpretation. Two types of mistakes exist:

Rule-based mistakes occur when our behavior is based on remembered rules or familiar procedures. We have a strong tendency to use familiar rules or solutions even when these are not the most convenient or efficient.

Knowledge-based mistakes result from misdiagnosis and miscalculation. Planning or problem solving requires that we reason from previous knowledge or use analogies.

Research in the nuclear power industry3 has found that the chance of a skill-based error is less than 1 in 10,000 and that such errors account for only 25% of all errors. The chance of a rule-based error is about 1 in 1000, and such errors account for approximately 60% of all errors. Finally, the study revealed that in the nuclear industry, 15% of all errors are knowledge-based.

Attention plays a significant role in all categories of human errors. Slips, lapses, and mistakes are all more common when situational factors such as fatigue, sleep loss, alcohol, drugs, illness, workload, stress, work pressure, multitasking, boredom, frustration, fear, anxiety, and anger play a role.

Violations can be defined as deliberate deviations from rules, procedures, instructions, and regulations. Violations sometimes are called intentional errors: A production operator documents cleaning work that was not performed.

The breaching or violating of health and safety rules or procedures is a significant cause of many accidents and injuries at work, and in the manufacturing sector there are frequent purposeful violations of the quality system procedures and instructions. Violations are divided into three categories: routine, situational, and exceptional.

Psychology and Classification of Human Error 37

We can identify a fourth class of violations: necessary. This type of violation involves situations where noncompliance is necessary to complete the job. Example: A filling operator is forced to transcribe data (a violation of the contemporaneous rule of data integrity) because there is no computer terminal in the weight monitoring station.

In addition to this classification, there are many other ways to categorize human error. Almost every author involved in this field has developed his or her own list. Several of them are so complex as to require a graduate degree in human psychophysiology to be understood. To simplify, we will discuss just two of the most frequently used classifications: Swain and Guttman, and Reason.

Swain and Guttman (1983) divide human error into four main groups:
1. Errors of omission (forgetting to do something)
2. Errors of commission (doing the task incorrectly)
3. Sequence errors (doing something out of order)
4. Timing errors (doing the task too slow, too fast, or too late)

Reason (1990) distinguishes between two types of errors:
1. Active failures have an immediate consequence and are often made by frontline workers
2. Latent failures are made by people whose tasks are removed from operational activities

Active and Latent Failures
Active failures have an immediate consequence and are usually made by frontline workers such as drivers, control room staff, or machine operators. In a situation where there is no room for error, active failures have an immediate impact on quality or health and safety.

Latent failures are made by people whose tasks are removed in time and space from operational activities (for example, designers, decision makers, and managers). Latent failures are typically failures in management systems such as design, implementation, or monitoring. Examples of human factors behind latent failures are:
• Poor design of a plant and equipment
• Inadequate procedures and work instructions

38 Chapter 2

• Ineffective training
• Inadequate supervision
• Inadequate staff and resources
• Ineffective communications
• Uncertainties in roles and responsibilities

In the medical products manufacturing environment, most active failures trace back to some precondition (latent failure). We need a good tracking and trending analysis system to be able to discover what in many cases is a true cause-and-effect relationship. One of my favorite examples can clarify this point.

A medical device manufacturer has a production room with two dozen identical machines producing the same kind of subassembly product. They work 24/7, and each workstation is attended by only one operator per shift, who also verifies the quality of his or her work prior to sending the pieces to the storage room. The next operation suffers from frequent defective subassembly products sent from the production room, and this situation is jeopardizing the productivity of the whole plant. For a couple of years, the level of defective subassemblies has been between 2% and 3%, and the impact on scrap, productivity, and rework time is in the millions of dollars.

Nonconformance investigations always pointed to inadvertent human errors (the company even created a form to document the human error in the area), which were corrected with retraining, awareness, and occasional termination. Nothing seemed to work, and the rate of error remained steady. A simple analysis (data segmentation) revealed the real situation: more than 80% of the bad units were produced by two workstations, independent of operator, shift, or day of the week. This concentration chart of defects also revealed that these two stations were the ones situated next to the two doors of the manufacturing room. They were all-glass doors opening to the main corridor of the plant. The consequence of this location was that both corner workstations received many social visits from coworkers, and obviously this represented an enormous source of distraction for operators attending to these two machines.

As you can see, the latent factor creating those human errors was the layout of the facility. A simple substitution of the all-glass doors with metal doors effectively reduced the rate of defective products by more than two-thirds in just the first month. From this moment on, the defect rates for these two stations were not statistically different from the other 22 stations in the same room.
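The data segmentation described in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the station names (WS-01, WS-24), the operator labels, and all the counts are hypothetical and are not data from the case itself. The idea is simply to count defects by one attribute at a time (station, operator, shift) and look for a concentration.

```python
from collections import Counter


def defect_share(defect_records, key):
    """Group defect records by `key`; return (value, count, share), largest first."""
    counts = Counter(record[key] for record in defect_records)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical defect log: every name and count here is invented for
# illustration. Two stations carry most of the 100 defects.
defect_log = (
    [{"station": "WS-01", "operator": "op-A", "shift": 1}] * 42
    + [{"station": "WS-24", "operator": "op-B", "shift": 3}] * 40
    + [{"station": f"WS-{i:02d}", "operator": "op-C", "shift": 2}
       for i in range(2, 20)]
)

by_station = defect_share(defect_log, "station")
top_two_share = sum(share for _, _, share in by_station[:2])

for station, count, share in by_station[:3]:
    print(f"{station}: {count} defects ({share:.0%})")
print(f"Top two stations: {top_two_share:.0%} of all defects")
```

Running the same function with "operator" or "shift" as the key gives the other segmentations. In a real investigation the concentration would have to be confirmed against production volume per station before drawing conclusions.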


Human error models describe different types of human error, using different taxonomies. They also differ in their explanations of how an error might occur and how to reduce its likelihood.

Reason (2004) proposed what is known as the Swiss cheese model of system failure. Every step in a process has the potential for failure. The system is analogous to a stack of Swiss cheese slices. Each hole is an opportunity for a process to fail, and each of the slices is a “defensive layer” in the process against potential error impacting the results. An error may allow a problem to pass through a hole in one layer, but in the next layer the holes are in different places, so the problem should be caught. For a catastrophic error to occur (a plane crash or the distribution of a pharmaceutical product with incorrect label information), the holes must align for each step in the process. This allows all defenses to be defeated and results in an error. If the layers are set up with all the holes aligned, it becomes an inherently flawed system that will allow an error to become a final product defect.

Each slice of cheese is an opportunity to stop an error. The more defenses we put up, the better. The fewer and smaller the holes, the more likely you are to catch errors that do occur. It is important to note that the presence of holes in any one slice of our quality system does not normally cause a final defect. Usually, this happens only when the holes in many slices momentarily line up to permit the result of the error to escape our controls. Reason establishes that those holes in the defenses arise for two reasons, active failures and latent preexisting conditions, and that nearly all adverse events result from a combination of these two sets of factors.
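A rough numerical reading of the Swiss cheese model may help. The sketch below assumes that each defensive layer misses an error independently of the others, which is an assumption of this illustration and not a claim made by Reason's model; the miss rates are hypothetical. Under that assumption, the chance that an error escapes every layer is the product of the individual miss rates.

```python
from math import prod


def escape_probability(layer_miss_rates):
    """Chance that an error passes every layer, ASSUMING independent layers."""
    return prod(layer_miss_rates)


# Hypothetical miss rates: each layer lets 10% of errors through.
three_layers = escape_probability([0.1, 0.1, 0.1])
four_layers = escape_probability([0.1, 0.1, 0.1, 0.1])

print(f"Three layers: {three_layers:.4%} of errors escape")
print(f"Four layers:  {four_layers:.4%} of errors escape")
```

The independence assumption is exactly what fails when the holes are aligned: if the same latent condition (for example, time pressure) weakens several layers at once, the real escape rate can be far higher than this product suggests.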

HUMAN FACTORS CATEGORIES: LEARNING FROM HUMAN FAILURES
Much can be learned from human failures in major accidents such as those previously described in Chapter 1. They reflect failures in relevant human factors and provide an opportunity to understand how human failures can be avoided or, at least, minimized. Most major accidents have involved multiple human factors failures, some more than others. Chapter 4 contains a detailed discussion of the most relevant human factors in the manufacturing environment. From these accidents, we can identify the following recurrent human factors categories.


Human Factors in Design
The design of tasks, tools and equipment, workspaces, and environments in which work is performed is a key factor in minimizing the occurrence of human failures.

Procedures and Documentation
Deficiencies in document development, organization, management, review, and presentation can have a significant impact on the resulting behavior and performance of the people using them.

Organizational Culture
If safety is not prioritized as part of the organizational culture, then unsafe behaviors are more likely to occur. The same reasoning applies to quality: if quality is not prioritized over production goals, for example, then corner-cutting behaviors will emerge.

Organizational Change
How the organization manages changes is another key human factor. Most lean initiatives are, at a minimum, poorly received by operators.

Staffing
Staffing is about having the optimal number and type of people to consistently perform at the required standard in all operational scenarios. As frontline leaders, supervisors have a key role in enabling safe and efficient operations. Trying to do more with less inevitably leads to multitasking and other error-prone situations and can push people to the brink of failure.

Time Pressure
Time is one of the rarest commodities in any industry. In aviation, for example, maintenance personnel have faced pressure to get aircraft back into service since the beginning. Missing or inadequate maintenance of manufacturing equipment, shortcuts that skip line clearance or equipment setup steps, and inadequate training are some symptoms of time pressure. A common finding in audits we performed is that scheduled maintenance, including calibration of manufacturing equipment, is not performed as per schedule due to production schedules that barely allow the performance of minimum between-batch cleanings.


Training and Competence
Competence is about the ability to meet role responsibilities and consistently perform to a specified standard. Meeting the competence standard requires training and development of the required knowledge, skills, and attitudes. The standard should align with the hazards under control and enable a sufficient understanding of them and their associated control measures.

Coordination and Communications
This relates to the transmission and receipt of critical information throughout the organization but is particularly relevant for shift handover and permit-to-work systems.

Fatigue and Shift Work
Performance is adversely affected when humans are fatigued. This can be caused by poor sleep quality, an insufficient amount of sleep, or excessive wakefulness. Poor design or management of shift patterns and other factors can lead to fatigue.

SKILL-BASED ERRORS: RECOGNITION, COMMISSION, AND OMISSION
As mentioned earlier in this chapter, at the skill-based level of performance we deal with familiar and nonproblematic tasks in a largely automatic way. Using very simple terms, there are two ways people can go wrong: they can do something they should not have done (commission errors) or fail to do something they should have done (omission errors). Using a more common terminology, we can refer to them as slips and lapses, respectively, and the failure involves either attention (slips of action) or memory (lapses). Reason and Hobbs (2003) introduced the concept of recognition error, applicable, for example, to inspections. Therefore, skill-based errors can be identified with three related aspects of human information processing: recognition, memory, and attention.

Recognition Failures
Recognition failures fall into two main groups:
1. The misidentification of objects, messages, signals, and so on
2. The nondetection of problem states (inspection or monitoring failures)


A major factor in misidentification is the similarity (in appearance, location, function, and so on) between the right and wrong objects. This type of error has been the cause of many serious accidents, for example, when train drivers misread a signal aspect. The risk of misidentification is the reason why the stop sign has a unique, octagonal shape. Familiarity is another factor contributing to misidentification. In well-practiced and habitual tasks, perceptions become coarser.

In many situations, we still rely on the human eyeball for most faultfinding tasks. Nondetection errors typically involve a failure to notice a visible fault during an inspection or monitoring task. Following are some factors contributing to nondetection errors:
• Inspector was distracted, tired, or in a hurry
• Inspector was multitasking
• Inadequate inspection conditions: poor lighting, inadequate rest break, insufficient time to inspect items
• Lack of experience and proper training

Very important in many situations is that we do not take into account the physiological limitations of the human visual system. Nondetection errors also reflect the vigilance and alertness decrement that humans suffer. During World War II, it was found that after about 20 minutes at their post, radar operators became increasingly likely to miss obvious targets, even though they were attentively concentrating on the radar screen. This problem, known as the vigilance decrement, applies to monitoring tasks where “hits” are relatively few and far between. Most of the quality control inspection in factories, including all visual inspection tasks, is subject to this decrement, which may explain why the ubiquitous 100% visual inspection is never 100% accurate. Another factor that makes human inspection unreliable is that on a long, boring inspection task, the mind will tend to wander to other matters. To obtain maximum efficiency (although never 100%), shorter visual inspection shifts are highly recommended along with breaks and changes.

Errors in a product’s labeling (incorrect expiration date or incorrect batch number on the label) or a missing part within a medical device kit are typical examples of recognition failures, where up to six different people (production operators and quality inspectors) failed to identify the error.


Memory Failures (Lapses, Omission Errors)
Memory lapses can occur at one or more of three information-processing stages:
1. Input failure
2. Storage failure
3. Retrieval failure

Typical input failures include failing to remember something we were told. A classic example is not remembering information provided during previous training. Simply hearing a stream (sometimes a flood) of new information while our minds are wandering or we are multitasking (checking messages on a phone or laptop, and so on) can impair recall. This is one of the main reasons to measure training or learning effectiveness. To be able to remember something later, a fundamental precondition is that we give it the right amount of attention.

Although storage failures present in many different forms, the one most likely to have an adverse effect during manufacturing or testing processes concerns forgetting the intention to do something. An intention to perform some task is rarely put into action immediately. Typically, it has to be held in memory until the appropriate time and place for execution. Memory for intentions is called prospective memory, and it is particularly prone to forgetting. Another type of storage failure is that you remember the intention and start to carry it through, but something along the way (for example, you are preoccupied or distracted by something else) causes you to forget what it is you intended to do. How many times have you walked into another room intending to do something and then forgotten why you are there?

We have a simple remedy or “vaccine” against memory storage failure: read, execute, and document. Following this simple sequence, you will not depend on your memory for the execution of the task. Checklists are also a very powerful tool for minimizing memory failures. Checklists are discussed in Chapter 9.

Retrieval failures are among the most frequent ways our memory can let us down. A special factor in play here is that they become increasingly frequent with age. This type of memory failure can show itself as tip-of-the-tongue (TOT) situations where you realize that you cannot remember a name or a word that you know you know. Again, the read–execute–document way of working is a simple but powerful solution to this type of memory failure situation.


Omissions Following Interruptions
A failure to carry out a necessary check on progress can also be caused by some local distraction. On many occasions, the interruption causes the person to forget the subsequent actions, or allows him or her to get diverted into something else. The situation is very similar to multitasking: the person cannot simultaneously handle the current task and the interruption. A study found that, on average, workers are interrupted every 11 minutes and then spend almost one-third of their day trying to recover from these distractions.

The very common situation of a phone ringing in a hospital’s pharmacy is known to have caused many medication errors (some of them lethal) across the United States. A pharmacy technician preparing a medication would stop the process to attend to the phone and then, when he or she returned to the unfinished work, a mistake happened. Now, in many hospital pharmacies, a person is assigned to answer the phone, thus freeing all other technicians from these interruptions.

To conclude this section on memory failures, let’s present some statistics. In some well-studied fields, such as maintenance work in nuclear power plants and in the aviation industries, omission errors are the single largest class of human performance problems (Reason and Hobbs 2003). For example, an analysis of U.S. nuclear power plant operations found that 64.5% of the errors associated with maintenance-related activities involved the omission of necessary steps.⁴ Similar figures are found in aviation maintenance. As mentioned in Reason and Hobbs (2003), the analysis of 122 maintenance errors showed that omissions accounted for 56% of the total. Here is the breakdown of those omission errors:
• Fastening left undone or incomplete (22%)
• Items left locked or pins not removed (13%)
• Caps loose or missing (11%)
• Items left loose or disconnected (10%)
• Items missing (10%)
• Tools and/or spare fastenings not removed (10%)
• Lack of lubrication (7%)
• Panels left off (3%)
• Other (14%)
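The breakdown above lends itself to a simple Pareto-style reading. The following sketch takes the percentages exactly as listed and adds a running total; the function name and the variable names are illustrative choices, not anything from Reason and Hobbs.

```python
from itertools import accumulate

# Percentages as listed above (share of all omission errors).
omission_breakdown = [
    ("Fastening left undone or incomplete", 22),
    ("Items left locked or pins not removed", 13),
    ("Caps loose or missing", 11),
    ("Items left loose or disconnected", 10),
    ("Items missing", 10),
    ("Tools and/or spare fastenings not removed", 10),
    ("Lack of lubrication", 7),
    ("Panels left off", 3),
    ("Other", 14),
]


def with_running_total(breakdown):
    """Return (category, percent, cumulative percent) in the listed order."""
    running = accumulate(percent for _, percent in breakdown)
    return [(name, pct, cum) for (name, pct), cum in zip(breakdown, running)]


for name, pct, cum in with_running_total(omission_breakdown):
    print(f"{name:<45} {pct:>3}%  cumulative {cum:>3}%")
```

Read this way, the first four named categories, all of which involve something left unfastened, locked, or loose, already account for more than half of the omissions, which suggests where reminders and checklists might be targeted first.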


Attention Failures (Slips)
Much of our behavior is guided by “automatic” routines. The more skilled and experienced a worker is, the more he or she will be able to perform even complex tasks on “mental autopilot.” Many of our daily tasks, in both personal and work environments, can be delegated unconsciously to skill-based action routines. It is very important to understand when investigating those slips that people do not necessarily choose to perform tasks in this way.

Action slips happen when our automated routines take control of our actions in ways that we never intended. A typical example of this kind of failure is when some procedure or instruction is changed but operators continue “automatically” to execute the task as they had been doing for the last several weeks or months. As a personal example, the month after I quit a job I had held for 12 years, I drove to the work site twice when I meant to go to another place. Some authors have labeled this “taking the wrong route,” and in my case, the name could not be better suited. The wrong route is almost invariably more familiar and more frequently traveled than the one that is currently intended.

Absent-minded slips of action are not random events. They fall into predictable patterns and can be associated with three causal factors:
1. We are performing a routine, habitual task in familiar circumstances.
2. Our attention is “seized” by some unrelated preoccupation or distraction. Therefore, we are not well focused on the task.
3. There is some change, either in our plan of action or in the surroundings.

Absent-mindedness is the toll we pay for being skilled, that is, for being able to control our routine actions in a largely automated fashion. In terms of manufacturing industries, experienced workers are more prone to have those slips because they execute well-practiced habitual tasks in familiar surroundings.

Attention is a limited commodity. If we devote it to one thing, it is necessarily withdrawn from other things or tasks. Again, this is the problem with multitasking: our attention is divided, and most of the time we do none of the multiple tasks we were engaged in correctly. Attention to some critical step (for example, a decision point) is what goes absent in absent-mindedness. In the process and manufacturing industries, slips and lapses occur in familiar tasks we can perform without much need for conscious attention (see Table 2.1).


Table 2.1 Slips and lapses of memory.

Slips (commission or execution errors)
• Operating the wrong switch control or valve
• Misordering a sequence of steps
• Transposing digits when printing a lot number or expiration date
• Mixing up products (incorrect label, incorrect product, both incorrect)

Lapses (omission errors)
• Not recording equipment identification in the production record
• Omitting information that must be recorded
• Omitting a step or series of steps from a task
• Not detecting incorrect expiration date, incorrect lot number, incorrect size, or other defects during an inspection

MISTAKES: RULE-BASED AND KNOWLEDGE-BASED
There are many definitions of mistake. A mistake is an incorrect decision or choice. A mistake is an error whose result was unintended, as opposed to an action that was intended. Perhaps the most useful definition is that a mistake is an incorrect intention or an incorrect value judgment. Although “mistake” has a loose meaning in common language, it has come to have a rather precise technical meaning among human error specialists following the works of Norman (1988) and Reason (1990). Following their now standard terminology, we can distinguish between “slips” and “mistakes,” where mistakes are planning failures (errors of judgment): the actions go as planned, but the plan is bad. Two types of mistakes exist: rule-based and knowledge-based.

Rule-Based Mistakes
Rule-based mistakes occur when our behavior is based on remembered rules or familiar procedures. We have a strong tendency to use familiar rules or solutions even when these are not the most convenient or efficient. In the process and manufacturing industries workers are extensively trained, and their work is almost completely proceduralized. This means that most of their mistakes need to be understood as a deviation from the appropriate rule or procedure. There are two primary ways in which rule-based mistakes can arise:
1. Misapplying a good rule (assumptions)
2. Applying a bad rule (habits)

A good rule or principle is one that has worked in the past. Misapplying good rules can happen in circumstances that share many features with those for which the rule was intended, but where significant differences are overlooked. An old adage can be used to describe a typical rule-based mistake: white liquid in a bottle has to be milk. Another example comes from an investigation I reviewed many years ago, where the root cause for the presence of brown specks in a batch of tablets was assigned to a degrading silicone gasket. The reason the investigator gave me was that a couple of years before, he had investigated exactly the same situation, and this was the root cause. Therefore, now it had to be the same. The only problem? The corrective action from the first incident was to modify the machine to eliminate the use of a silicone gasket.

Many people pick up undesirable habits when learning a job, and those bad habits become part of the person’s established work routines. They get the job done, and most of the time there are no bad consequences, until the person encounters circumstances that expose the flaws in his or her habit or rule. Perhaps one of the most common examples of those bad habits relates to good documentation practices. Backdating and incorrect transcription of data are two of the most common violations of the data integrity principles discussed in Chapter 10. Although everyone is aware that good manufacturing practices (GMP) information must be documented at the time it is generated or observed, many people (at all levels) working in this regulated environment have the habit of documenting the information later. A very powerful tool that can and must be used to detect and eliminate these bad habits is the internal auditor program (self-auditing).

Knowledge-Based Mistakes
Knowledge-based mistakes result from misdiagnosis and miscalculation. Planning or problem-solving requires that we reason from previous knowledge or use analogies. The errors that occur during knowledge-based problem-solving can arise for two reasons: failed problem-solving and/or a lack of knowledge. Knowledge-based errors are particularly likely if the person is performing a task for the first time. This happens even in training-intensive industries, such as medical product manufacturing, and is a symptom of ineffective training at the least. In the entire human error universe, this type of mistake is the only one for which training may be the adequate corrective action. A typical example in our industry is to have people performing quality deviation investigations without appropriate problem-solving training.

NORMALCY AND IMPAIRMENT
Those involved in the design of things (equally applicable to the design of a kitchen appliance or to the design of a manufacturing procedure or testing instruction) very often forget that not all humans are equal. From a psychometric perspective, one can distinguish between normal and impaired ranges. Of course, extreme care should be taken in labeling a part of the population as “impaired” to any degree because of the social, personal, and legal considerations and consequences. However, anyone desiring to understand human error should take into consideration that even those judged normal on a behavioral attribute can commit human errors, although those who are impaired probably commit errors more frequently. A “normal” person may become confused by even simple instructions, may become inattentive easily, may misinterpret procedures, may ignore warnings, and may forget what she or he has done in the past. For example, in terms of intelligence scores, half of the general population is, by definition, below average in IQ because intelligence scores follow the normal (Gaussian) distribution.

The correct execution of any task is a very complex process consisting of many inputs: design of the task, design of the documentation (including instructions), design of the equipment used in the task, materials availability, and so on. Processes and their supporting documentation must be designed to allow for adequate execution and performance by all individuals who are supposed to perform the given task. If we design our processes (including their documentation) only with inputs from some top or average performers, we are leaving out, in the best case, 50% of our personnel.
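The statement that half of the population falls below the average of a normally distributed score can be checked with the Python standard library. The mean of 100 and standard deviation of 15 used here are the conventional scaling for IQ scores; they are an assumption of this sketch and are not stated in the text.

```python
from statistics import NormalDist

# Conventional IQ scaling (assumed for illustration): mean 100, SD 15.
scores = NormalDist(mu=100, sigma=15)

share_below_mean = scores.cdf(100)    # fraction scoring below the mean
share_below_minus_1sd = scores.cdf(85)  # fraction more than one SD below

print(f"Below the mean:        {share_below_mean:.1%}")
print(f"More than 1 SD below:  {share_below_minus_1sd:.1%}")
```

The practical point for document design is the same one made above: a procedure validated only with the strongest performers has, by construction, not been tested on a large share of the people who will have to follow it.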


ERRORS AND ERROR-PROVOKING CONDITIONS
Human errors do not occur randomly but are shaped by factors that are part of the environment in which the person is functioning or working. Error-producing conditions in the workplace are commonly referred to as human factors, meaning that they are present in the immediate surroundings at the time of the error. There are many of these factors that can affect worker performance, for good or ill. However, experience shows that a relatively limited number of factors appear over and over again when investigating specific human error incidents. This means that when considering your own workplace and the factors that increase the probability of error, you can focus on a relatively short, manageable list of issues.

Reason and Hobbs (2003) mention an example regarding maintenance errors in the aviation industry. The International Civil Aviation Organization (ICAO) lists more than 300 influencing factors ranging from heat and cold to boredom, nutritional factors, and even dental pain. However, a short list of recurrent factors emerged from analyzing maintenance accident and incident reports from the aviation industry. Among those persistent factors: documentation issues, time pressure, poor housekeeping and tool control, inadequate coordination and communication, inadequate tools and equipment, fatigue, inadequate knowledge and experience, and problems with procedures.

Continuing with our topic of maintenance in the aviation industry, an analysis of more than 600 maintenance incidents demonstrated the links between errors and a range of workplace factors. This analysis showed that:
• Memory lapses, one of the most common types of maintenance errors, are closely associated with time pressure and fatigue.
• Rule-based errors are linked with inadequate procedures and coordination.
• Knowledge-based errors are strongly associated with training (as we would expect).
• Slips are closely related to equipment deficiencies.
• Violations are linked with time pressure.

3 Intentional Noncompliance

Intentional noncompliance is defined as a situation where, for whatever reason, an individual or group of individuals consciously decide to perform a task in a way they know is not correct. For example, they may do something that is not written in the procedure, and they know they are not doing the task in the way it is meant to be done. To reduce intentional noncompliance in the workplace, it is necessary to understand what was going on from the point of view of the person(s) who engaged in the intentional noncompliance.

In theory, distinguishing noncompliant behavior from human error is relatively easy. It is necessary to determine two aspects to classify a behavior as intentionally noncompliant:
1. The person knew the correct way to behave.
2. There was intentional deviation from the correct way to behave.
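The two-part test above can be written as a small decision rule. This is a minimal sketch: the function name, the parameter names, and the returned labels are hypothetical, and treating every other combination as human error is a simplification made for illustration.

```python
def classify_behavior(knew_correct_way: bool, deviated_intentionally: bool) -> str:
    """Apply the two criteria: both must hold for intentional noncompliance."""
    if knew_correct_way and deviated_intentionally:
        return "intentional noncompliance"
    # Simplifying assumption: anything else is treated as (unintentional) human error.
    return "human error"


# An operator who knows the procedure and knowingly skips a step:
print(classify_behavior(knew_correct_way=True, deviated_intentionally=True))
# An operator who knows the procedure but forgets a step:
print(classify_behavior(knew_correct_way=True, deviated_intentionally=False))
```

In practice, establishing each input is the hard part of an investigation; the rule itself only makes explicit that both conditions must be demonstrated before a behavior is labeled intentional.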

WHY DO PEOPLE VIOLATE RULES?
Violations are another kind of human failure. They differ significantly from errors in that they are deliberate deviations from rules and procedures, while errors are, by definition, unintentional. In some cases, the distinction between errors and violations can be unclear, particularly when the violator does not understand the consequences of noncompliance. Only in a sabotage situation do the saboteurs intend both the noncompliant behavior and its bad consequences. Violations arise mainly from motivational factors; beliefs, attitudes, and norms; and organizational culture.


52 Chapter 3

The prevalence of violations in the manufacturing industries is largely unknown due to the lack of formal studies or statistics. However, when product and process failures are investigated in highly regulated industries (such as the manufacture of medical products), violations are a significant causal factor of many of those failures. Common manufacturing and testing violations include:
• Deviating from formal procedures
• Deviating from good documentation practices
• Signing for or documenting actions (execution of an activity, verification of an activity performed by others, and so on) that were not actually performed

Like errors, violations show up in several different forms. There are four main categories:
1. Routine violations
2. Situational violations
3. Exceptional violations
4. Necessary violations

Routine Violations
These are committed to avoid unnecessary effort, to get the job done quickly, or to circumvent what seem to be unnecessarily difficult procedures. Typically, these corner-cutting violators do not understand the negative consequences of their noncompliance. Typical examples of this type of violation are most of the data integrity cases described in Chapter 10. Good manufacturing practices require that activities be documented immediately as they are executed or observed. Performing an activity and documenting it later (known as backdating) is a habitual violation in our industry.

It seems that in other industries it is also the norm. For example, routine violations were one of the most common forms of unsafe acts discovered during a survey of aircraft maintenance personnel carried out in Australia (Reason and Hobbs 2003). In fact, it was found that more than 30% of maintenance personnel had signed off on a task before it was completed. Another finding of this survey was that more than 90% of surveyed maintenance personnel confessed to having done a job without the correct tools or equipment. The biggest problem with these corner-cutting routine violations is that they become established at the skill-based level of performance.

Situational Violations
The primary aim when committing a situational violation is simply to get the job done. In some situations, it is very difficult to get the job done if one sticks to the procedure. Situational violations arise from a mismatch between work situations and procedures. For example, procedures require that the process of weighing components for a pharmaceutical product be witnessed by a second individual to avoid human error in this critical step of the manufacturing process. When staff is short, waiting until a second person becomes available can delay the production schedule. Therefore, the operator sometimes proceeds (in violation of procedures and regulations) and completes the weighing process without the required witness. Another example of a situational violation is when workers decide not to wear the required gloves, safety glasses, or lab coat (to mention just a few typical examples). They may want to do the job well but believe that they can do the job better without the mandatory protective equipment.
In a recent case, a company discovered that the technician responsible for the environmental monitoring of one of the production and warehouse buildings did not take all the required samples. In fact, she walked to this isolated building, located in a corner of the manufacturing complex, and took the required samples on only some days of each week. On other days, she simply filled out the required documentation without taking any samples. As she was also the person who tested those samples, she used samples from other buildings to generate environmental monitoring results for the isolated building. The investigation concluded that she only cheated with samples from that specific building, probably due to its location, isolated and relatively far from the rest of the buildings in the manufacturing complex.

Exceptional Violations
As the name says, these are not typical, everyday violations. A sum of factors creates the “perfect storm.” These are exceptional circumstances, relatively rare events. In another recent example, a manufacturing supervisor falsified the signature of the laundry room group leader. The manufacturing crew needed clean gowns to perform line clearance and begin manufacture during a special shift called in on a Sunday morning. Nobody had coordinated support from the laundry area. The manufacturing supervisor took the initiative to request that the security guard open the laundry room with the master key. Then he took the gowns and signed off the area’s logbook with both his signature, as requestor of the clean gowns, and the falsified signature of the laundry room leader. Both signatures are required in the logbook by procedure. He could have generated a deviation for the missing laundry room signature; instead, his gross signature falsification ended his employment.

Necessary Violations
We can identify a fourth class of violations: necessary. This type of violation involves situations where noncompliance is necessary to complete the job. For example, the lack of a computer station where torque testing is performed requires the operator performing the testing to transcribe data to a piece of paper (which is a violation of the original principle of data integrity) and then walk several dozen feet to the nearest computer station where she can input the torque testing results into the software.

SABOTAGE
Sabotage comprises rule-breaking actions in which the perpetrators intend for their violations to have damaging consequences. The person breaking the rules does so with full knowledge of the damage he/she will cause. Sabotage ranges from vandalism (for example, an employee uploading a computer virus into the work network after being disciplined) to gross acts of a criminal nature, such as the fire that occurred at the Hotel Dupont Plaza in San Juan, Puerto Rico, on New Year’s Eve, December 31, 1986. The fire was set by three disgruntled employees of the hotel who were involved in a labor dispute with the owners. It claimed 97 lives and caused 140 injuries. It is considered the most catastrophic hotel fire in Puerto Rican history and the second deadliest in the history of the United States.

ACTIONS TO REDUCE INTENTIONAL NONCOMPLIANCE
To fight intentional noncompliance, a detailed analysis needs to be performed to understand the true root causes of the behaviors. It is important to understand what motivates humans to break the rules.
There are some basic principles that can be applied to reduce intentional noncompliance. These principles influence attitudes and behavior controls, and they can be used proactively to reduce the likelihood of intentional noncompliance or reactively by choosing the appropriate solution based on analysis:
• Set a culture of quality and compliance.
• Enhance your human failure investigation program to clearly distinguish between violations (intentional) and errors (unintentional).
• Make it easy for people to comply.
• Be careful with slogans such as “whatever it takes.” Some people will take it literally and cut some quality corners.
• Always prioritize quality and compliance over production quotas.
• Make everyone aware of the positive behaviors of personnel.
• Reward desired behaviors and discourage undesired behaviors.
• Establish clear quality and compliance standards, and have top managers “walk the talk.”
• Highlight the risk and impact of noncompliance.
• Involve staff in improving work conditions and behaviors.
• Improve supervision (when supervisors talk more about quality and reinforce supporting behaviors, quality compliance improves).

4 Human Factors

Human errors do not appear randomly. They are shaped by situations and factors that are part of the environment in which the person is functioning. People are at the heart of any system, even the most highly automated ones. Despite significant technological development within the manufacturing industries, systems do not always work as effectively as planned. And most of the time, this can be attributed to the failure to consider human interactions and behaviors.
Human factors can be further defined as any factor that influences behavior at work in a way that can negatively affect the output of the process the human is involved with. We can distinguish between local error-provoking factors and “upstream” factors such as the company’s culture. There are many error-producing conditions in the workplace, from time pressure to inadequate equipment, illness, or sleepiness. However, the Pareto principle also applies here, and experience shows that a relatively limited number of local factors appear more frequently than others.
The primary focus of human factors is to optimize safety and performance; this is done by focusing on the human interactions that occur within a work system. A basic model for human factors relative to the process industries is shown in Figure 4.1. This model identifies three main domains for human factors: facilities and equipment, people, and management systems.
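The Pareto observation above can be checked against a site's own investigation records. The following is a minimal sketch, assuming each closed investigation has been tagged with one local factor; the function name, the factor labels, and the counts are made up for illustration and do not come from any real data set.

```python
from collections import Counter


def pareto_table(factors):
    """Return (factor, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(factors)
    total = sum(counts.values())
    rows = []
    running = 0
    for factor, count in counts.most_common():
        running += count
        rows.append((factor, count, round(100 * running / total, 1)))
    return rows


# Hypothetical tags: one local factor assigned per closed investigation.
investigations = (
    ["time pressure"] * 9
    + ["inadequate equipment"] * 6
    + ["sleepiness"] * 3
    + ["illness"] * 2
)

for factor, count, cumulative in pareto_table(investigations):
    print(f"{factor:22s} {count:3d} {cumulative:6.1f}%")
```

In this made-up sample, two of the four factors account for 75% of the investigations, which is the kind of concentration the Pareto principle describes.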

Figure 4.1 Human factors domains (people, facilities and equipment, management systems). Source: Center for Chemical Process Safety 2007.

BEHAVIOR-BASED COMPLIANCE AND QUALITY CULTURE
Improving the quality of our products and the compliance of our processes requires going beyond traditional training, testing, and inspectional approaches to manage risks. It requires a better understanding of the company’s culture and the human dimension of quality and compliance in our highly regulated industry. If we want to improve the quality of our products and the compliance of our processes, we must change the way people do things. We must change people’s behaviors by implementing a behavior-based quality management system and by extension a behavior-based quality and compliance culture.
If you want to manage the quality culture of your organization or you simply need to improve the quality and compliance level of your group, you must first understand what quality culture is, what content it covers, and how to assess it. Culture and leadership are two sides of the same coin, and one cannot understand one without the other.1
The importance of a compliance and quality culture in the prevention and mitigation of human failures (errors, mistakes, and violations) cannot be ignored. A good organizational culture will serve to reduce human error. The effects of cultural and even demographic factors in system operations are evident.
Behavior-based management can be viewed as a system of management based on the sciences of human behavior and organizational culture that is used by an organization to produce results. Principles of managing people that aim to shape and influence behavior and performance are complex. Moreover, they are rarely integrated into approaches for enhancing performance and into pharmaceutical quality system management practices. Quality behaviors can be defined as:

Behaviors observed at the site or organization that are associated with a strong quality culture in areas such as clear communication and transparency, commitment and engagement, technical excellence, and standardization of requirements.

We develop and manufacture medical products under a quality management system (QMS), for example, ICH Q10 for pharmaceutical and biopharmaceutical or ISO 13485 for medical devices. We do not call them quality leadership systems. And it is very likely that everyone reading this book appreciates that management and leadership are different. Managers oversee and optimize processes to deliver results. Leaders change (improve) those processes to deliver greater results. Companies need both managers and leaders, as they are both necessary to produce and deliver safe and effective medical products to patients worldwide. In other words, we need better quality system management and more quality and compliance culture leadership.
Table 4.1 depicts the differences between a traditional QMS and a behavior-based quality and compliance culture for medical product manufacturers. A QMS is defined as a formalized system that documents processes, procedures, and responsibilities for achieving quality policies and objectives. A QMS helps coordinate and direct an organization’s activities to meet customer and regulatory requirements and improve its effectiveness and efficiency on a continual basis.

Table 4.1 Comparison between typical QMS and behavior-based quality culture.
Medical Products Quality Management System | Behavior-based Medical Products Quality and Compliance Culture
Focused on processes | Focused on process and people (human factors)
Primarily based on GMP | Based on CGMP, risk management, behavioral science, and organizational culture
Simplistic view of behavior change | Complex view of behavior change
Linear cause and effect thinking | Systematic thinking
Develop a quality/CGMP program | Develop and sustain a quality and compliance culture

In the case of the pharmaceutical industry, many of the quality system elements correlate closely with the current good manufacturing practices (CGMP) regulations. It is a very process-focused system and yes, well-defined processes and standards are critical, but they are not sufficient to guarantee that medical products are safe and effective. A behavior-based quality and compliance culture is process focused, but it is also people focused.
Traditional medical products’ QMSs are primarily based on meeting CGMP requirements. Under a behavior-based quality and compliance culture, we know that this is not enough. We understand that achieving quality success requires not only an understanding of the CGMP but also the behavioral sciences and organizational culture. This is the main difference between having a quality program (for example, one based on CGMP requirements) and having a culture of quality and compliance. Leaders must figure out a way to get employees at all levels of the organization to do the right things, not because they are being held accountable for them, but because they believe in and are committed to quality and compliance.
There was, and still is, an overemphasis on training and quality control activities, including inspections, to modify behaviors and achieve better results. This is largely based on the belief that the desired behavior change can be achieved by simply training employees and inspecting processes. Leaving aside the fact that most of the training programs in our industry are ill-designed and rarely include appropriate measurement of effectiveness, it should be recognized that both activities are important, but they are not enough to effectively change behaviors and achieve quality success. We must understand the complexity of behavior and analyze the cause of the performance problem (lack of skill, lack of motivation, ineffective work system, and so on) before proposing the right solution.
A vast majority of medical product manufacturers address specific issues in isolation or as individual components, not as a whole or complete system. This can be observed by looking at their corrective and preventive action (CAPA) programs and noticing the lack of true preventive action. They only have corrections and corrective actions, and they approach quality and quality issues with a sort of linear cause-and-effect thinking. This linear thinking is not adequate to address complex issues. When a company implements a behavior-based quality and compliance culture, it looks at its problems as a whole and understands that there are multiple factors (including the soft ones related to personal and organizational behaviors) that affect performance. A very positive consequence of this systematic thinking is the shift from mostly corrective CAPA programs to ones where systemic preventive actions are predominant.

Organizational culture can influence a wide variety of operator–equipment interactions and can make the difference between effective and erroneous performance. Even among companies seemingly devoted to enhancing operational performance and safety, organizational culture can have a potentially adverse effect on achieving quality and safety. As an example, The New York Times reported on poor organizational practices in the National Aeronautics and Space Administration (NASA) in an article entitled “Poor Management by NASA is Blamed for Mars Failure.”

The Mars Polar Lander spacecraft probably failed last year because its descent engine shut down prematurely, but the mission’s loss can ultimately be attributed to inadequate management, testing, and financing, independent experts told NASA today. In candid reports assessing recent problems with NASA’s program to explore Mars, two panels concluded that pressures to conform to the agency’s recent credo of “faster, cheaper, better” ended up compromising ambitious projects. To meet the new constraints, the report said, project managers sacrificed needed testing and realistic assessments of the risk of failure.2

The article also indicates that NASA management had been criticized for many of the same management practices and norms demonstrated after the 1986 Space Shuttle Challenger accident, noting that: “Several recent panels have suggested the new approach may have gone too far by emphasizing cost-cutting and tight schedules at the price of quality.”
On the other hand, in an article about the retail giant Costco Wholesale published December 15, 2016, in Fortune,3 the author mentioned that “everyone at Costco will tell you that its culture comes directly from Jim Sinegal,” Costco’s cofounder and CEO from 1983 to 2012. The author also mentioned that for Sinegal, “Culture is not the most important thing. It’s the only thing.” The article credited Costco’s cofounder with creating a management style that was based on an informal, unintimidating environment in which no one was afraid of making mistakes. Sinegal had another inviolate values proposition: “Inexpensive couldn’t mean cheap, because he knew Costco would lose customers that way. Quality, quality, quality.”
If we analyze how organizations are dealing with safety culture to prevent accidents, we will have a great benchmark for preventing errors affecting quality as well. Inquiries into major accidents have found faults in the organizational structures and procedures. These were judged to be as important as the technical and individual human failures. There is now an emphasis on the need for organizations to improve their safety cultures.
Safety culture can be divided into two parts. The first comprises the beliefs, attitudes, and values (often unspoken) of an organization. The second is more concrete and embraces the structures, practices, controls, and policies that an organization possesses and employs to achieve a safer environment. The following statement regarding safety culture (to prevent accidents) translates directly to our quality and compliance environment when trying to prevent errors and mistakes:

Safety culture. The effectiveness of self-reporting and behavioral observation programs depends greatly upon the safety culture at a site. For example: If self-reporting of impairment or reporting an impairment concern about another staff member even occasionally results in disciplinary action, then supervisors and workers will naturally be reluctant to report other staff members who appear to be impaired. On the other hand, if individuals who have come to work under some form of stress are treated fairly and with concern, personnel will report more frequently. If the company’s culture emphasizes safety over other goals, personnel may be willing to turn down overtime and monitor their own fatigue levels, even if turning down the opportunity results in a loss of income.4

Earlier in my career, I was involved in an analysis of several thousand batches of medical products produced at one FDA-regulated plant. Not a single event was ever documented related to component(s) spilled prior to charging the mixing tanks. However, more than 40 batches had documented deviations (out-of-specification analysis results) because the concentration of one or more of their components was below specifications. Many (if not most) of these failures were a consequence of some material being spilled prior to its addition to the bulk tank.
Failures and subsequent investigations and product rejection could have been avoided if the operator(s) had notified supervisors about the spill and requested more components. Instead, fear of retaliation and punishment led them to hide these situations. Needless to say, each of those 40 failure investigations was a nightmare; laboratory results showed a lack of adequate quantity of components, while pristine production batch documentation showed that everything was perfect!

For many companies, the use of electronic records can have an unexpected benefit: exposing cheaters. A few examples can illustrate this. Five components (raw materials) must be added in a preestablished order (as stated in the batch record) to a mixer to produce a batch. The review of the electronic batch data revealed that all components (nearly 759 kg) were added within a mere 26-second period, which is nearly impossible. The fact is that instead of adding each component and recording its addition to the bulk tank, manufacturing operators added all five components and then proceeded to document those additions. This is a violation of the batch record instructions and the regulatory requirements.
Another example relates to the documentation of activities. A cleaning equipment documentation form was printed at 16:23:37 (as per its footer information). However, this form was used to document activities performed several hours before, which is a violation of the contemporaneous and direct data recording attributes (see Chapter 10 for more details about data integrity).
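The first example above, five additions recorded within 26 seconds, is the kind of pattern a simple review script can surface from electronic batch data. The sketch below is only an illustration under stated assumptions: the record layout (component name plus ISO timestamp), the function name, and the two-minute threshold are all hypothetical and would have to be replaced by the site's own data format and a process-justified limit.

```python
from datetime import datetime, timedelta

# Assumed minimum plausible time between two component additions (hypothetical).
MIN_GAP = timedelta(minutes=2)


def flag_implausible_additions(events, min_gap=MIN_GAP):
    """Return consecutive additions recorded closer together than min_gap.

    events: list of (component_name, iso_timestamp) tuples.
    Each result is (earlier_component, later_component, gap_in_seconds).
    """
    parsed = sorted(
        ((name, datetime.fromisoformat(stamp)) for name, stamp in events),
        key=lambda item: item[1],
    )
    flagged = []
    for (prev_name, prev_time), (name, time) in zip(parsed, parsed[1:]):
        gap = time - prev_time
        if gap < min_gap:
            flagged.append((prev_name, name, gap.total_seconds()))
    return flagged


# Made-up records mimicking the case described: five additions in 26 seconds.
batch_events = [
    ("component A", "2023-01-10T08:15:02"),
    ("component B", "2023-01-10T08:15:09"),
    ("component C", "2023-01-10T08:15:15"),
    ("component D", "2023-01-10T08:15:21"),
    ("component E", "2023-01-10T08:15:28"),
]

for first, second, seconds in flag_implausible_additions(batch_events):
    print(f"{first} -> {second}: recorded only {seconds:.0f} s apart")
```

A flag raised by a check like this is a prompt for investigation, not proof of a violation; the threshold only encodes what the process makes physically plausible.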

Building a Culture of Safety
On November 29, 1999, the Institute of Medicine (IOM) released a report called To Err is Human: Building a Safer Health System (Kohn et al. 2000). The report urged healthcare organizations to create an environment in which patient safety becomes a top priority. This report stressed the need for leadership by executives and clinicians and for accountability for patient safety by boards of trustees. In particular, it urged that safety principles known in other industries be adopted, such as designing jobs and working conditions for safety; standardizing and simplifying equipment, supplies, and processes; and avoiding reliance on memory. The report stressed medication safety in part because medication errors are so frequent and in part because a number of evidence-based practices were already known and needed wider adoption. Though at the time of publication the levels of evidence for each category varied, the members of the committee believed that all were important places to begin to improve safety.
The committee recognized that some actions could be taken at the national level, as described in the recommendations contained in Parts 1–3 of the report. Yet if patient safety were really to improve, the committee knew it would take far more than reporting requirements and regulations. Creating and sustaining a culture of safety (Part 4) is needed, which would require continuing local action by thousands of healthcare organizations and individuals working in these settings at all levels of authority. Hospital leadership must provide resources and time to improve safety and foster an organizational culture that encourages recognition of and learning from errors. A culture of safety cannot develop without trust, keen observation, and extensive knowledge of care processes at all levels, from those on the front lines of healthcare to those in leadership and management positions.
Every group of people develops a “culture”: shared attitudes, beliefs, and ways of behaving. In an organization with a good compliance and quality culture, everyone puts those elements high on the list. Everyone shares accurate perceptions of the risks to compliance and quality, and adopts the same positive attitudes to compliance and quality. This influences the ways in which individuals in the group handle new events and decisions. The compliance and quality culture of an organization is an important factor in controlling human failures. Some key aspects of an effective and positive quality and compliance culture are depicted in Figure 4.2.

Figure 4.2 Aspects associated with an effective quality culture.
Organizational environment
• Positive, blame-free attitude toward errors and mistakes
• Commitment by top management to involve the workforce
• Low levels of job stress
• High levels of job satisfaction
Management systems
• Availability of resources
• Quality prioritized over production and profits
• Open two-way communications
• High quality of training
• Good ways of informing and consulting the workforce about quality
People
• Trust in the workforce to manage quality
• Management commitment and leadership toward quality
• Cooperation between employees
• High level of workforce participation in quality
Behavior
• Recognizing that everyone has a role to play
• Acceptance of personal responsibility for quality
• Frequent formal and informal communication about quality
Hardware
• Good plant design and maintenance
• Good working conditions
• Low risk to quality due to adequate engineering systems

Returning to our manufacturing environment, why would an experienced manufacturing operator not properly document CGMP data he or she generated? What would motivate a 10-year veteran employee to enter a clearly labeled manufacturing area without wearing the required gowning vest? Following is an excerpt from a recent FDA warning letter to a company missing any sign of effective quality and regulatory compliance culture:

The garbing (consisting of face masks, hair nets, gloves, and suits without hoods) used at your facility is not adequate to protect the drug product from microbiological contamination during sterile processing. During your demonstration of cleaning and disinfection practices for your aseptic processing room, our investigators observed an operator who wore eye makeup with no eye protection. The operators wore clothing that allowed for exposed skin on their faces and necks. Furthermore, personnel reused these suits on multiple aseptic processing production days, with no cleaning or sterilization between uses. Rather than instructing operators to dispose of used suits after each use, the procedure your firm used at the time of the inspection instructed cleanroom operators to “store the suit in a clean place for next entry.” Your failure to ensure that personnel wear clothing appropriate to protect the drug product from contamination increases the significant risk to product sterility in your aseptic processing operation.

High-reliability organizations (HROs) are organizations that operate for long periods of time under difficult conditions and have few major safety incidents. As defined in Weick et al. (1999),5 “the processes found in the best HROs provide the cognitive infrastructure that enables simultaneous adaptive learning and reliable performance.” The five key elements discussed there are summarized below:
1. Preoccupation with failures rather than success. Managers are wary of long periods of success and encourage identification of early signs of failures.
2. Reluctance to simplify interpretations. Steps are taken to create a more complete and detailed understanding of what is going on.
3. Sensitivity to operations. Managers are sensitive to the experiences of their frontline operators and discuss their perceptions of the operations with them.
4. Commitment to resilience. Errors will occur, and the organization should have systems to identify, correct, and learn from errors, and be focused on continually developing people’s skills and knowledge.
5. Deference to expertise. Decisions are made by people with the greatest expertise, even if they are low in the organizational hierarchy.

MANAGEMENT OF HUMAN FACTORS
Almost every manufacturing organization is already addressing at least some aspects of human factors. For example, there may be processes for considering facilities design, effectiveness of training efforts, or adequacy of working instructions and procedures, to cite only a few. While some human factors may already be receiving attention, they may require a deeper focus to attain higher control. Any organization will likely benefit from developing a management framework of human factors that provides a structure to apply in the human factors effort. As described elsewhere,6 there are four main approaches or models that can be used to structure the human factors framework:
1. Topic-based approaches
2. Risk-based approaches
3. Maturity model
4. Human factors in design
Although human failures in the medical product manufacturing industry can lead to dramatic consequences, human errors and human factors are only superficially covered during specific quality incidents. From time to time, you can see a pharmaceutical or medical device manufacturing plant become engaged in some type of human error reduction program, typically after a huge mistake (in terms of economic and/or regulatory impact) has happened. But most of the time, unfortunately, they continue using the combination of human error (as root cause) and retraining (as corrective action).
Within the process industry, the chemical industry is probably the most advanced in terms of human factors concepts, and the reference previously cited is one of the most comprehensive the interested reader will find. Curiously, even though the manufacture of active pharmaceutical ingredients (API) is largely based on chemical processes, you’ll rarely find any API plant where special attention is given to human factors management.
Although very necessary, it is probably not wise to expect a wide deployment of effective human factors management programs across the medical product manufacturing industry anytime soon. For now, a more realistic (and achievable) expectation is to retire the human error/retraining combination and, at least, perform an adequate investigation after each human failure (this topic is covered in Chapter 6). Under the concept of adequate investigation of human failures, we can include the following:
• Perform an unbiased investigation.
• Try to discover the different situational factors involved in each failure.
• Develop and implement realistic, risk-based CAPA plans focused on human factors controls, which typically include controls to increase the detectability of the error’s consequences.

PEOPLE ENGAGEMENT
In a more general context, people engagement is the emotional commitment that people have to the organization and its goals. This emotional commitment means engaged people actually care about their work and their organization. They do not work for just a paycheck or the next promotion, but work toward the organization’s goals. When we consider engagement with quality, it is an extension of this emotional commitment. A strong, positive quality culture where people agree upon and care deeply about organizational values can improve organizational performance, motivate people, and coordinate their behaviors toward a vision and specific performance goals.
Engagement with the quality of products and services and the QMS has many facets. Without genuine alignment, quality remains a disconnected component of the organization. Alignment transforms this situation and shows the high-level value that can be contributed. Engagement with those at operational levels is also key. The actions at that level should serve to provide far more relevance to the activities of people and the requirements of the QMS. Many challenges with people engagement arise from the lack of relevance. There are many examples of the QMS being “those files in the computer” and people being in charge of audits and correcting “people” when things go wrong.

As established in the international standard ISO 10018:2020 Quality management—Guidance for people engagement, an organization can benefit from contributions to the development of its vision and strategy from a wider range of people, not only top management. Clear benefits of people engagement to the organization’s strategic direction and overall success include:
• Greater involvement and contributions of the organization’s staff
• Greater clarity to personnel in understanding their individual roles in implementing the strategy
• Improved competence of staff
• Achievement of the organization’s vision and strategy
• Improved performance
• Better engagement
• Higher levels of customer and employee satisfaction
• Improved productivity

WORKPLACE INVOLVEMENT: MOTIVATION AND ATTENTION
Attention and motivation are often identified as causes for human error. “Inattention to detail” is frequently cited as a root cause or causal factor in human error investigations. The evidence supporting this conclusion is often weak, and determining the role of lack of attention or motivation in a human error is very difficult. Attention and motivation are internal states that cannot be measured directly. During an investigation, real-time, objective measures of attention or motivation cannot be obtained because the investigation necessarily occurs after the fact. As a result, the investigator must rely on self-reports and inference, which are subject to bias and inaccuracies.
Attention, sometimes called conscious workspace, is limited. If attention is strongly drawn to one specific thing, it is necessary to withdraw it from other competing concerns. We can only attend to a very small proportion of available sensory data, and unrelated matters can capture our attention. Also, it is important to notice that the attentional focus of the average human being is hard to sustain for more than a few seconds.

Human Factors 69

Attention deficit hyperactivity disorder (ADHD) is one of the most common childhood disorders and can continue through adolescence and into adulthood. Symptoms include difficulty staying focused and paying attention, difficulty controlling behavior, and hyperactivity (overactivity). The percentage of children estimated to have ADHD in the United States has changed over time and can vary by how it is measured. The American Psychiatric Association (APA) states in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; APA 2013) that 5% of U.S. children have ADHD.⁷ The Anxiety and Depression Association of America (ADAA) estimates that “about 60 percent of children with ADHD in the United States become adults with ADHD; that’s about 4 percent of the adult population, or 8 million adults.”⁸ Based on those figures, and statistically speaking, every manufacturing site should have its share of employees with ADHD.

A concept that must be considered when planning quality inspections during the manufacturing process is the problem known as vigilance decrement. It was first observed during World War II, when it was noticed that after a mere 20 minutes at their posts, radar operators became increasingly likely to miss obvious targets. This happened even though the radar operators were concentrating attentively on the screen. The problem affects many monitoring tasks where “hits” are relatively few and far between. Quality control inspections in factories are a typical scenario where vigilance decrement occurs.

Attributing accidents and quality incidents to workers’ lack of attention, attitudes, or motivations is a common practice.
In the absence of compelling evidence that some characteristic of the work environment affected the workers’ actions, investigators may resort to the “default” explanation and conclude that the workers were not paying attention or lacked the motivation to perform their work correctly. We never ask the next question: “Why did the operator lack attention?”

Many company programs, policies, and practices are intended to reduce errors associated with attention and motivation. Some programs, such as human factors engineering programs, directly focus on these potential causes and contributors to error. Others may indirectly affect attention and motivation during task performance. Company elements or programs that may be implicated in errors caused by attention or motivation include:
• Poor or inadequate human factors design
• Lack of accurate and easily accessible procedures
• Inadequate performance evaluation process/human resources
• Weakness in supervision
• Weakness in problem identification and resolution programs
• Inattention to employee concerns

In this section, we must also discuss the concept of motivational human error, which is a form of organizational error. We will discuss it using two examples.

1. To motivate employees, some organizations embrace incentive programs where excessively large monetary rewards are given with the intention to promote some desired performance results. When the incentive is high, it may serve to motivate unethical and even unlawful behavior. The high reward fosters self-interest over company interest and can develop corner-cutting behaviors that will affect quality and compliance and even the safety of the processes. On the other hand, reasonable incentives motivate reasonable behavior that is competitive and acceptable, while unreasonably low incentives typically fall short of getting any improvement because workers are not motivated.

2. In the second example, errors arise when enthusiastic managers promote specific goals (most of the time, production quotas). These managers try to encourage or influence the audience using motivational pep talks, such as the “whatever it takes” speech, and think primarily about end-of-year production goals, which are probably tied to management bonuses. The message can be transmitted so effectively that it is taken literally by some operators and supervisors. They meet the production goals, but in January there is a huge spike in manufacturing deviations and failure investigations tied to “procedure not followed” and/or “human error.” My only suggestion to those whatever-it-takes managers is to be equally emphatic, intense, and emotional when talking about compliance and quality of work, which includes always following procedures and rules.

ADEQUATE SUPERVISION AND STAFFING

Nowadays, many organizations are structured in ways that leave jobs insufficiently supervised. Supervision can and normally does play a key role in the selection of the right workers for the job, scheduling of workers to match the required tasks for the day/week, and generally overseeing task execution to ensure policies and procedures are followed. Supervisors are not always trained in all of their key roles in support of control of human factors, such as detecting issues in workers related to fitness for duty or fatigue, to mention just a few.

An adequately staffed organization ensures that personnel are available with the proper qualifications for both planned and foreseeable unplanned activities. Staffing is a dynamic process in which plant management monitors personnel performance to ensure overall organizational performance goals are met or exceeded. The result of an effective staffing process is a balance between personnel costs and the achievement of organizational goals. Issues with staffing may include:
• Selecting the right staff for a job
• Avoiding staff overload
• Rotating staff on tasks that require high concentration, such as quality inspections

Each organization requires the proper amount and type of expertise to operate competently under a variety of conditions. The term expertise includes the attributes of talent, effectiveness, knowledge, skills, abilities, and experience necessary to operate and maintain plant systems, structures, and components.
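The staffing point above about rotating staff on high-concentration tasks can be sketched as a simple round-robin schedule. This is only an illustration: the function name `rotation_schedule`, the inspector names, and the 20-minute slot length are assumptions made for the example (the slot length merely echoes the vigilance decrement observation earlier in the chapter), not a recommended rotation interval.

```python
from itertools import cycle


def rotation_schedule(inspectors, shift_minutes, slot_minutes):
    """Assign one inspector per time slot in round-robin order.

    Returns a list of (start_minute, end_minute, inspector) tuples.
    All names and durations are hypothetical inputs.
    """
    if shift_minutes <= 0 or slot_minutes <= 0:
        raise ValueError("shift_minutes and slot_minutes must be positive")
    if len(inspectors) < 2:
        raise ValueError("at least two inspectors are needed to rotate")

    schedule = []
    rota = cycle(inspectors)  # endlessly repeats the list of inspectors
    start = 0
    while start < shift_minutes:
        end = min(start + slot_minutes, shift_minutes)  # last slot may be shorter
        schedule.append((start, end, next(rota)))
        start = end
    return schedule


# Hypothetical example: three inspectors sharing a 2-hour inspection task.
for start, end, name in rotation_schedule(["Ana", "Ben", "Chen"], 120, 20):
    print(f"{start:3d}-{end:3d} min: {name}")
```

With at least two inspectors, no one is assigned two consecutive slots, which is the property the rotation is meant to provide. Appropriate slot lengths would need to be set from the task and the site's own data.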

Fatigue and Shift Work

Many individuals work shift systems, work at night, or work extended hours. Such working patterns can lead to adverse effects on health, particularly for night shift workers. Reduced levels of performance have been associated with nighttime work, which can also increase the likelihood of human errors. Some people experience severe fatigue at work. This can lead to poorer performance on tasks that require attention, decision-making, or high levels of skill. Too often, fatigue (as happens with multitasking) is seen as a familiar and acceptable part of everyday life. Working long hours may even be accepted in the culture of a workplace as “the thing to do.”

Shift scheduling may also affect the likelihood that personnel will show performance decrements due to fatigue. Job performance may be poorer on shift work, especially when working night shifts. Tasks tend to be completed more slowly at night, although this can be balanced by altering the workload. In general, the early hours of the morning (between 2 a.m. and 5 a.m.) present the highest risk for fatigue-related incidents. Sleep loss can lead to lowered levels of alertness. Cumulative sleep loss over a number of days can result in a “sleep debt,” with greatly reduced levels of productivity and attention. Such sleep loss results not only from working night shifts but also from working morning shifts with very early start times and from on-call situations where it may be difficult to plan when to sleep. A change in the assigned shift or a rotating shift schedule will disrupt circadian rhythms and may increase the likelihood of errors.

Although companies establish limits for work hours to reduce on-the-job fatigue, there is a lot of room for improvement. Studies have shown that after about 17 hours without sleep, performance is impaired to a degree comparable to a blood alcohol concentration of 0.05%, which is the legal driving limit in many countries.

Workload and Staffing Levels

Staffing is concerned with having the optimal number and type of personnel to consistently perform at the required standard, and obviously, it is intrinsically tied to workload. Workload relates to the total demand placed on a person as they perform a task. It refers to both the quantity of work and the quality or complexity of the work. There is a clear understanding that both excessive and very low workloads lead to human errors. In the first case, the worker is overwhelmed by the activity. In the second case, workers will mentally disengage from the process and be less effective.

Workload is one of the most common and controversial issues in human factors because of its relationship to staffing levels. When asked, workers will claim that they are overworked, while management will claim that there is too much unproductive idle time. The right staffing level provides adequate resources to do the necessary tasks, and those tasks should be distributed in a way that keeps each worker near an optimum stress level. The ideal workload should be challenging enough to maintain the worker’s attention and interest without overloading them.

Adequate staffing and workload are critical factors in achieving safe and effective performance. These factors can be easily overlooked during times of staff reduction, resulting in situations where the remaining staff try to work harder, quicker, or longer (and probably take shortcuts) to compensate. In addition to delaying some tasks (or simply not doing them), error rates will increase. Sometimes, regulated companies see a spike in human error after the implementation of lean programs because a typical outcome of these programs is some kind of staff reduction.

Regulators consider staffing a critical issue in producing safe and effective medical products. For example, the FDA CGMP regulation for finished pharmaceuticals requires under §211.25, Personnel qualifications,⁹ that “(c) There shall be an adequate number of qualified personnel to perform and supervise the manufacture, processing, packing, or holding of each drug product.”


The European pharmaceutical CGMP regulations require under the Principle of Chapter 2: Personnel¹⁰ that “The correct manufacture of medicinal products relies upon people. For this reason there must be sufficient qualified personnel to carry out all the tasks which are the responsibility of the manufacturer.”

It is important to recognize that workload does not maintain a simple linear relationship with performance. Both sustained low workload and high workload may adversely affect performance, as explained by the Yerkes-Dodson law, which establishes that there is an empirical relationship between arousal and performance.¹¹ Psychologists Robert M. Yerkes and John Dillingham Dodson formulated this law in 1908. It states that performance increases with physiological or mental arousal (pressure), but only up to a point. When levels of arousal become too high, performance decreases. The relationship is often illustrated graphically as an inverted-U (bell-shaped) curve that increases and then decreases with higher levels of arousal.

Without enough pressure, a worker can become bored, unmotivated, or frustrated and can lose situational awareness and alertness, among other undesirable effects. On the other hand, under too much pressure, the worker will become anxious, fatigued, or emotionally drained. Both situations are considered to be stress states, which make errors more likely.

Workload modeling is a complex activity typically undertaken by human factors experts in fields such as air traffic control. Many of the human errors and other frequent problems within the regulated industries can be, in some way or another, related to workload and staffing issues.
Possible solutions to these problems include improvements to the following areas:
• Work area
• Allocation of functions
• Work environment
• Work equipment
• Task design
• Resource allocation
• Recruitment and selection
• Skill, knowledge, and experience

One of the most important aspects influencing the physical and mental condition of a person is the degree to which employees are able to recover from the fatigue and stress of work. Work breaks can potentially be disruptive to the flow of work and impact the completion of a task. However, breaks can serve multiple positive functions for the person being interrupted,¹² such as:
• Stimulation for the individual performing a job that is routine or boring
• Opportunities to engage in activities that are essential to emotional well-being
• Sustained job satisfaction
• Productivity
• Time for the subconscious to process complex problems that require creativity

In addition, regular breaks seem to be an effective way to control the accumulation of risk during the industrial shift. A 2006 study by Folkard and Lombardi¹³ showed the impact of frequent pauses in different shift systems. The results confirm that breaks, even short ones, have a positive effect on the operator’s work from both a physical and a psychological standpoint. Proper design of a work/rest schedule that involves frequency, duration, and timing of rest breaks may be effective in improving workers’ comfort, health, and productivity. But today, work breaks are often not taken into proper consideration.

Competence Management

The competence of personnel is defined as the ability to perform tasks according to expectations. In other words, competence is the ability of an individual to do a job properly, or what people need to be successful in their jobs. Job competencies are not the same as job tasks. Competencies include all the related knowledge, skills, abilities, and attributes that form a person’s job. This set of context-specific qualities is correlated with superior job performance and can be used as a standard against which to measure job performance, as well as to develop, recruit, and hire employees.

ISO 9000:2015, clause 3.10.4 defines competence as “the ability to apply knowledge and skills to achieve intended results.” The benefit of training and development is the increase in competence, which leads to an increase in a person’s ability to create value for the organization and its customers.


Training and development are essential factors in people engagement, including the management of industrial/labor relations and formal grievances. Successful organizations apply the knowledge and skills of their people in a way that creates value for the organization and its customers.

Learning is the process of acquiring knowledge or skills through experience, from study, or from instruction. Formal learning will often result in a person receiving qualifications. Learning processes may apply to a person or collectively to an organization. An organization should recognize that people learn in different ways. Some people are more suited to a classroom environment, others prefer a mentoring environment, and still others learn better in a web-based environment.

A learning organization focuses on increasing and retaining its knowledge to enhance the organization’s capacity for performance. The organization needs to have competent staff in order to be competitive. To achieve the necessary flow of information and knowledge and become a learning organization, the organization’s processes need to be combined into a management system. An organization’s ability to learn enables it to be more competitive.

The benefits of an effective learning process are increased achievement, job satisfaction, and job security. These lead to an improvement in attitude and motivation. Improvements in competencies such as communication lead to improvements in product quality and better customer service. For the organization, this leads to increased competitiveness and profitability.

Training is the process by which people learn skills and competencies. Development is the process by which people change and become more competent. The intent is to engage people with the journey toward a personal connection with strategic direction and outcomes. Developing competence is part of organizational design and is critical to performance.

Providing the right training and developing the required competencies (technical and nontechnical) have a direct influence on the reliability of human performance. Workers are expected to perform a wider range of tasks with less supervision, thus increasing the need to manage competence effectively. Competence is much more than training. It implies appropriate education, qualifications, training, skills, technical knowledge, experience, physical and mental capabilities, understanding, behavior, and attitudes. Competence management must be an integral part of an organization’s overall management system, and it should apply to all personnel (regular employees, contractors, and so on) from the top to the bottom.


Competence management involves:
• Identifying competence requirements
• Selecting and recruiting personnel
• Assessing competence
• Certifying competence
• Maintaining, reassessing, and monitoring competence

When considering competence needs, organizations should determine the competence required to achieve intended results, at the organizational, team, group, and individual levels, taking into account:
• The context of the organization: changes to external/internal issues and the needs and expectations of relevant interested parties significantly affecting competence needs
• The potential impact of lack of competence on the processes and the effectiveness of the management system
• Recognition of individual levels of competence in relation to the ability to perform specific roles
• Opportunities to utilize specific available competence in the design of work-related functions, processes, and systems

Competence management should consider all processes, functions, and levels of the organization. The determination of what is needed should begin by evaluating the current levels of competence, including any limitations, and maintaining documented information on specified competence needs as appropriate. The organization should determine its competence needs at planned intervals and in response to changes in its context.

Organizations may choose to use external providers to carry out any of these activities, including an analysis to determine competence needs and assess current competence levels. If an organization uses an external provider, it should ensure appropriate monitoring and evaluation of the activities.

Determining competence needs and organizational competence

Competence is directly affected by the context of the organization. When determining the types and levels of competence needed, the organization should consider, for example:
• External issues (for example, statutory and regulatory requirements, technological advances)
• Internal factors (for example, mission, vision, strategic objectives, values and culture of the organization, range of activities or services, resource availability, organizational knowledge)
• Needs and expectations of relevant interested parties (for example, regulators, customers, society)

Documented information should be maintained and/or retained as appropriate to support and demonstrate:
• Competence needs
  – Organizational (related to the organization)
  – Team (established team or more informal group training achievements)
  – Individual (qualifications, performance/appraisal outcomes)
• Development programs and other initiatives
• Evaluation of the impact of competence development and associated actions

Team or group competence

Within the organization, different teams or groups will need different competencies according to the activities they perform and the intended results. When determining differing team or group needs, the organization should consider:
• Leadership
• Team or group objectives and intended results
• Activities, processes, and systems
• Structure of the team or group: hierarchy, number of people, and roles and responsibilities
• Team or group culture and the ability to cooperate, collaborate, and cultivate respect

Individual competence

Individual competence requirements should be determined at all levels of the organization to ensure each different role or function is effective. To determine individual competence, the organization should consider:
• External competence requirements
• Roles and responsibilities
• Activities related to roles or function
• Behaviors (for example, emotional intelligence, ability to remain calm in a crisis, ability to maintain concentration during monotonous work, ability to work cooperatively within a direct team and across the organization or with customers)

The concept of competence in relation to training and performance is discussed later in this chapter.

Effective Supervision

Providing an effective level of supervision is critical for reliable operations in the manufacturing industries. Supervision can be considered a management function, a control in the organization to manage performance and quality of work. People in supervisory functions need appropriate technical and nontechnical competencies in order to perform effectively. However, they are often insufficiently prepared for this critical role, especially for the nontechnical features. Supervision involves controlling, influencing, and leading a group of people to ensure activities are performed correctly.

In addition to the lack of nontechnical competencies (which influence how a supervisor performs the role), one of the biggest problems is where supervisors perform their function. In today’s industrial environment, supervisors spend most of their time far from production lines or work sites. A day full of meetings, from planning production to discussing human resources issues, is the norm for most supervisors. Supervisors need to be on the floor, directly controlling, influencing, and leading the team.

The English verb supervise originated from the medieval Latin supervisus, past participle of supervidēre, from Latin super- + vidēre (to see). Therefore, the original definition of the function includes the need for physical contact between the supervisor and supervisees. When supervisors are far from work sites, they cannot supervise effectively.

Earlier, we mentioned that technical and nontechnical competencies are necessary to become an effective supervisor or team leader. Typically, in the regulated industry, supervisors are developed and promoted from within, and they have the required technical knowledge and experience. However, very often they lack many of the nontechnical competencies (behaviors) necessary to become a successful supervisor. Some examples of these nontechnical competencies required by today’s supervisors are depicted in Table 4.2.


Table 4.2 Examples of nontechnical supervisory competencies.

Supervisory competency: Ensure compliance
Example behaviors:
• Positive: Monitor performance and check compliance. Emphasize quality and safety over production quotas and schedules. Set an example, and explain to the team that compliance is expected and required at all times.
• Negative: Set a poor example by breaking rules or procedures, cutting corners, and so on.

Supervisory competency: Encourage the team
Example behaviors:
• Positive: Seek the team’s ideas for quality improvements. Act on quality concerns or ideas for improvement. Manage and develop the team.
• Negative: Ignore the team’s ideas for improvement or concerns about quality or safety.

Supervisory competency: Involve the team
Example behaviors:
• Positive: Support and encourage quality activities. Help the team learn from incidents. Initiate discussions about performance improvements.
• Negative: Focus on punitive actions in response to human error. Impose production quotas.

TASK DESIGN

A task that is designed with human limits in mind is much more likely to work effectively than one that assumes humans can and will “always” do what is written. The task designer must consider that humans think and remember and factor in prior data and experiences. Take into consideration the complexity of each task. If the task is too complex, then humans can forget their place in the task, fail to understand the goal of each step or substep, or fail to notice when something isn’t going right. Task complexity is a function of many factors, among them:
• Number of choices available for making a wrong selection of similar items (such as number of similar switches, number of similar valves, number of similar size and shaped containers)
• Number of parallel tasks that may distract the worker from the task at hand (leading to either an initiating event or failure of a protection layer)
• Number of staff involved (more staff = higher complexity)
• Number of adjustments or steps necessary to achieve the goal
• Amount of mental math required (as a rule, no math in anyone’s head should be required when accomplishing a standardized task)
• The amount of judgment required to know when the goal has been accomplished within the task
• The amount of feedback in the process required to allow the operator or lab technician to realize (in time) that they made a mistake
• Where the task is to be performed: the location may be too noisy, too cold, too hot, too distracting, or too dark

Task Analysis

In simple terms, a task analysis is a systematic process used to establish the ordered list of activities that people perform in a task and identify the human factors issues associated with each activity. It is often helpful to look at a particular job, task, or activity in a given work setting. One approach is to understand what the job actually consists of and what risks are involved. This is known as task analysis or activity analysis. Task analysis serves to identify and then break down a job or task into its smaller component elements. It is a valuable asset whether it is formal or informal, systematic or fragmented, or even performed only when problems arise. It also has considerable value in helping to formulate detailed procedures, checklists, warnings, and cautions. To try it, select a particular activity or task and find the answers to the following questions:
• Who does this activity?
• Exactly what tasks/actions do they do?
• What tools or equipment are needed?
• What decisions are made?
• What information is needed to do the task?
• Where does this information come from (people/paper/computers/displays)?
• How is the task learned?
• How is competence assessed?
• How often is the activity carried out?
• Where is the task carried out?
• What is the working environment like (temperature, noise, lighting, and so on)?
• Are there time constraints on the task?
• What can go wrong?
• Where is there potential to make errors?
• How can failures be detected and corrected?

It is important to use multiple sources, including documented procedures and a walk-through or talk-through by an experienced operator, to determine what they actually do when completing the task. You will find it easier if you ask someone who does the activity to walk through it with you. The aim is to find out what really happens, not just what should happen. Working through the questions, you will identify problems that need attention, and you will be able to feed the results of your analysis into a risk assessment.

Although human performance reliability is important for total system reliability, not all errors have the same effect on system performance. Thus, another task parameter to be considered might be criticality. The systematic identification and elimination of sources of critical human-induced failures in complex systems is important in the early developmental stages of large systems.

Error analysis is another important activity to be carried out. Once it is established that there is enough time for a given task to be accomplished, that task can be evaluated in terms of the probability of errors occurring and the potential severity of their effects. For this analysis, human error can be defined as the failure to perform a task within a designated time and under specified conditions. Each task is analyzed to identify probable operator or lab analyst errors. The probable effects of those human errors are determined and classified, based on available information, into categories such as catastrophic, critical, marginal, or negligible.
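As a rough illustration of the error analysis just described, the sketch below ranks probable errors by combining an estimated probability with a severity category. The four category names come from the text; everything else is an assumption made for the example, including the numeric weights, the probability values, the task names, and the choice to multiply probability by weight. It is one simple way to order errors for attention, not a prescribed method.

```python
from dataclasses import dataclass

# Hypothetical ordinal weights for the four severity categories named above.
SEVERITY_WEIGHT = {
    "negligible": 1,
    "marginal": 2,
    "critical": 3,
    "catastrophic": 4,
}


@dataclass(frozen=True)
class TaskError:
    task: str           # task being analyzed
    error: str          # probable operator or analyst error
    probability: float  # estimated chance of the error per task execution
    severity: str       # one of the SEVERITY_WEIGHT keys


def risk_score(item: TaskError) -> float:
    """Combine probability and severity into one number (assumed scheme)."""
    if not 0.0 <= item.probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if item.severity not in SEVERITY_WEIGHT:
        raise ValueError(f"unknown severity category: {item.severity}")
    return item.probability * SEVERITY_WEIGHT[item.severity]


def rank_errors(items):
    """Return the errors sorted from highest to lowest risk score."""
    return sorted(items, key=risk_score, reverse=True)


# Illustrative, made-up entries; real values would come from the site's own data.
analysis = [
    TaskError("Weigh raw material", "wrong quantity weighed", 0.01, "critical"),
    TaskError("Label container", "label omitted", 0.05, "marginal"),
    TaskError("Record batch data", "entry skipped", 0.02, "negligible"),
]

for item in rank_errors(analysis):
    print(f"{item.task}: {item.error} ({item.severity}), score {risk_score(item):.2f}")
```

A multiplicative score can hide rare but catastrophic errors behind frequent minor ones, so an analyst might prefer to review the highest severity categories separately rather than rely on a single ranking.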
The error analysis uses probability of errors and error effects information to estimate the extent of quality impact as a consequence of human errors.

An important element of a task is the physical area where a person performs the task. Following are some general principles in workplace and workstation design:¹⁴
• Design workplaces to accommodate the extremes of the user population.
• Design workplaces to adjust to the characteristics of the user populations (for example, adjustable-height workstations).
• Design equipment to be physically accessible.
• Avoid holding tensed muscles in fixed positions for long periods of time.
• Locate frequently accessed items within easy reach from the working position.
• Locate hand work at approximately elbow height, depending on the task.
• Minimize highly repetitive tasks.
• Consider ergonomics and human factors standards in designing workstations and seating.
• Ensure that the working environment (light, temperature, humidity, and so on) is properly designed.

Improving Job Design

When thinking about improving job satisfaction and reducing stress levels, organizations often focus on the individual worker through the provision of stress management courses and employee assistance programs. Throughout the years, jobs have tended to become increasingly monotonous and controlled. Many jobs are designed to minimize skill requirements, maximize management control, and minimize the time required to perform a task. Jobs designed like this have a human cost in terms of negative attitudes toward the job and poor mental and physical health. Many attempts have been made to redesign such work to improve the quality of working life. Such redesign is based on increasing one or more of the following job characteristics:
• Variety of tasks or skills (increased use of capabilities)
• Autonomy (higher control over when and how tasks are done)
• Completeness (whether a job produces an identifiable end result, which makes the task more significant and meaningful for the worker)
• Feedback from the job (improved knowledge of the results of the work activities)


Other characteristics of work that are also thought to be important for job satisfaction are the amount and quality of social interaction with coworkers, responsibility for technology and output, and the mental demands of a job, including the need to pay close, constant attention to a task, and the need to diagnose and solve problems.

Work Redesign

Typical ways to redesign jobs include job rotation and horizontal or vertical job enlargement. Employee involvement and participation of staff in job, task, and equipment design and redesign is an important tool in the reduction of both stress levels and safety risks. Individuals are often able to identify and propose solutions to some of the ergonomic problems in their workplace. However, such initiatives need to have the support of management to make them work. Extensive use of participation can create raised expectations for employees that may be difficult to meet, and employee involvement can appear threatening to managers who are used to making their own decisions.

Job redesign usually has a positive impact on job satisfaction, motivation, employee mental health, and performance, as long as it is not restricted to just increasing job variety. Such redesign usually occurs in combination with other changes, such as to staffing levels, pay rates, or management style, which are likely to also affect these outcomes.

Error-Proof Operation Design Considerations

When incorporated into the design, error-proofing mechanisms are very powerful in improving system reliability. These mechanisms, by design, will not allow a user to execute an incorrect operation. For example, if a user enters a value that is outside the accepted range of operation, the control logic will not accept the value. Chapter 9 covers this topic.
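The range-check example above can be sketched in a few lines. This is a minimal illustration, not the control logic of any particular system: the function name `accept_setpoint`, the exception `OutOfRangeError`, and the 20.0 to 25.0 limits are all hypothetical.

```python
class OutOfRangeError(ValueError):
    """Raised when an entered value falls outside the accepted operating range."""


def accept_setpoint(value: float, low: float, high: float) -> float:
    """Return the value only if low <= value <= high; otherwise reject it.

    The comparison is also False for NaN, so a not-a-number entry is rejected.
    """
    if low > high:
        raise ValueError("low limit must not exceed high limit")
    if not (low <= value <= high):
        raise OutOfRangeError(
            f"{value} is outside the accepted range {low} to {high}"
        )
    return value


# Hypothetical limits for an illustrative temperature setpoint.
LOW_LIMIT, HIGH_LIMIT = 20.0, 25.0

print(accept_setpoint(22.5, LOW_LIMIT, HIGH_LIMIT))  # value is accepted

try:
    accept_setpoint(30.0, LOW_LIMIT, HIGH_LIMIT)
except OutOfRangeError as exc:
    print(f"Rejected: {exc}")  # the out-of-range entry never reaches the process
```

The design choice here is to refuse the entry outright rather than warn and continue, which matches the idea that an error-proofed step does not let the incorrect operation proceed.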

Human–Computer Interaction The human-computer interaction (commonly referred to as HCI) deals with how people interact with computer systems. It covers the design and use of computer technology, focusing on the interfaces between people (users) and computers. Humans interact with computers in many ways; the interface between humans and computers is crucial to facilitating this interaction. Poorly designed human-machine interfaces can lead to many unexpected problems. A classic example of this is the Three Mile Island nuclear meltdown accident (see Chapter 1), where investigations concluded that the design of the human–machine interface was at least partly responsible for the disaster. Similarly, accidents in aviation have resulted

84 Chapter 4

from manufacturers’ decisions to use nonstandard flight instruments; even though the new designs were proposed to be superior in basic human–machine interaction, pilots had already ingrained the “standard” layout, and thus the conceptually good idea actually had undesirable results.

The key aspect of HCI for manufacturing and processing plants is the human interaction with the control systems that control plant processes. Human failures in interacting with those control systems may result in loss of control and serious quality and/or safety incidents. Human interactions with control systems involve several kinds of tasks, such as monitoring, detection, data entry, diagnosis, and control. All these tasks are subject to different types of human errors, including:

• Incorrectly reading information
• Incorrectly diagnosing situations
• Not responding to system prompts or alarms
• Incorrectly responding to system prompts or alarms
• Incorrectly entering data or not entering data when required
• Incorrectly issuing commands or instructions, or not using them when required

HCI differs from human factors and ergonomics in that HCI focuses more on users working specifically with computers rather than other kinds of machines or designed artifacts. HCI also focuses on how to implement the computer software and hardware mechanisms to support human–computer interaction. Thus, human factors is a broader term; HCI could be described as the “human factors” of computers.

Display Design

Displays are human-made artifacts designed to support the perception of relevant system variables and to facilitate further processing of that information. Before a display is designed, the task that the display is intended to support must be defined (for example, navigating, controlling, decision-making, learning, entertaining, and so on).
A user or operator must be able to process whatever information a system generates and displays; therefore, the information must be displayed according to principles in a manner that will support perception, situational awareness, and understanding. Thirteen Principles of Display Design Christopher Wickens et al. defined 13 principles of display design in their book An Introduction to Human Factors Engineering.15


These principles of human perception and information processing can be utilized to create an effective display design. A reduction in errors, a reduction in required training time, an increase in efficiency, and an increase in user satisfaction are a few of the many potential benefits that can be achieved through the utilization of these principles. Certain principles may not be applicable to different displays or situations. Some principles may seem to be conflicting, and there is no simple solution to say that one principle is more important than another. The principles may be tailored to a specific design or situation. Striking a functional balance between the principles is critical for an effective design. Perceptual Principles 1. Make displays legible (or audible). A display’s legibility is critical and necessary for designing a usable display. If the characters or objects being displayed cannot be discerned, then the operator cannot effectively make use of them. 2. Avoid absolute judgment limits. Do not ask the user to determine the level of a variable on the basis of a single sensory variable (for example, color, size, or loudness). These sensory variables can contain many possible levels. 3. Top-down processing. Signals are likely perceived and interpreted in accordance with what is expected based on a user’s experience. If a signal is presented contrary to the user’s expectation, more physical evidence of that signal may need to be presented to assure that it is understood correctly. 4. Redundancy gain. If a signal is presented more than once, it is more likely that it will be understood correctly. This can be done by presenting the signal in alternative physical forms (for example, color and shape, voice and print, and so on), as redundancy does not imply repetition. A traffic light is a good example of redundancy, as color and position are redundant. 5. Similarity causes confusion: use distinguishable elements. 
Signals that appear to be similar will likely be confused. The ratio of similar features to different features causes signals to be similar. For example, A423B9 is more similar to A423B8 than 92 is to 93. Unnecessarily similar features should be removed, and dissimilar features should be highlighted.
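The ratio in principle 5 can be made concrete with a small Python sketch. It treats each character position as one "feature", which is an assumption made for illustration; the function name feature_counts is hypothetical.

```python
def feature_counts(a: str, b: str) -> tuple[int, int]:
    """Count shared and differing character positions of two equal-length codes.

    Assumption: each character position counts as one feature.
    """
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    shared = sum(x == y for x, y in zip(a, b))
    return shared, len(a) - shared


if __name__ == "__main__":
    for first, second in [("A423B9", "A423B8"), ("92", "93")]:
        shared, different = feature_counts(first, second)
        print(f"{first} vs {second}: {shared} shared, {different} different")
```

Under this assumption, the long codes share five positions and differ in one, while the short codes share one and differ in one, which is why the long pair is the easier one to confuse.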


Mental Model Principles 6. Principle of pictorial realism. A display should look like the variable it represents (for example, higher temperature on a thermometer shown as a higher vertical level). If there are multiple elements, they can be configured in a manner that looks like it would in the represented environment. 7. Principle of the moving part. Moving elements should move in a pattern and direction compatible with the user’s mental model of how they actually move in the system. For example, the moving element on an altimeter should move upward with increasing altitude. Principles Based on Attention 8. Minimizing information access cost or interaction cost. When the user’s attention is diverted from one location to another to access necessary information, there is an associated cost in time or effort. A display design should minimize this cost by allowing for frequently accessed sources to be located at the nearest possible position. However, adequate legibility should not be sacrificed to reduce this cost. 9. Proximity compatibility principle. Divided attention between two information sources may be necessary for the completion of one task. These sources must be mentally integrated and are defined to have close mental proximity. Information access costs should be low, which can be achieved in many ways (for example, proximity, linkage by common colors, patterns, shapes, and so on). However, close display proximity can be harmful by causing too much clutter. 10. Principle of multiple resources. A user can more easily process information across different resources. For example, visual and auditory information can be presented simultaneously rather than presenting all visual or all auditory information. Memory Principles 11. Replace memory with visual information: knowledge in the world. Users should not need to retain important information solely in working memory or retrieve it from long-term memory. 
A menu, checklist, or another display can aid users by easing the load on their memory. However, the use of memory may sometimes benefit users by eliminating the need to reference some type of knowledge in the world (for example, an expert computer operator would rather use direct commands from memory than refer to a manual). The use of knowledge in the user’s head and knowledge in the world must be balanced for an effective design. 12. Principle of predictive aiding. Proactive actions are usually more effective than reactive actions. A display should attempt to eliminate resource-demanding cognitive tasks and replace them with simpler perceptual tasks to reduce the use of the user’s mental resources. This will allow the user to focus on current conditions and to consider possible future conditions. An example of a predictive aid is a road sign displaying the distance to a certain destination. 13. Principle of consistency. Old habits from other displays will easily transfer to support processing of new displays if they are designed consistently. A user’s long-term memory will trigger actions that are expected to be appropriate. A designer must accept this fact and utilize consistency between different displays.

PROCEDURES AND FORMS

The interaction of people with documents involves human factors issues that can have a major impact on the quality of the work. Procedures not followed, diagrams that are misleading, or records that are not completed properly can all increase the likelihood of product failures and process deviations in the manufacturing industry.

Procedures are a core part of every manufacturing operation. They provide rules to follow and approved operational practices. In the field of medical products manufacturing, there are regulations requiring written procedures for production and process control, quality, and so on. Following is an example taken from U.S. FDA CGMP for finished pharmaceuticals (21 CFR 211).16 Specifically, subpart F, Production and Process Controls, establishes that:

§211.100 Written procedures; deviations.

(a) There shall be written procedures for production and process control designed to assure that the drug products have the identity, strength, quality, and purity they purport or are represented to possess. Such procedures shall include all requirements in this subpart. These written procedures, including any changes, shall be drafted, reviewed, and approved by the appropriate organizational units and reviewed and approved by the quality control unit.


(b) Written production and process control procedures shall be followed in the execution of the various production and process control functions and shall be documented at the time of performance. Any deviation from the written procedures shall be recorded and justified.

Some studies of accidents in a major petrochemical company show that 60% of incidents related to human performance were due to ineffective, incorrect, or missing procedures.17 When developing documentation, it is important to consider how humans sense and perceive information, and how they access information. Among the key issues for document design are:

• Medium used in the document. How users access the information: paper, computer screen, and so on.
• Navigation of the document. How users move around the documentation.
• Content of information. What information is available to users.
• Presentation of information. How information looks on a page or screen.

Documents need to be designed from the perspective of users. Documentation developers must understand document users, their needs, and their expectations, and work with them during the process of developing documentation. Documents should be tailored to their users wherever possible. If only one version of a document is generated, it should be tailored to the lowest common denominator users. The following is a breakdown of issues that should be noted during document design.

Medium

• Consider user needs.
• Match the medium to the information.

Navigation

• Provide navigation clues that are clear, recognizable, and consistent, such as page numbers, running headers/footers, tables of contents, references, and so on.
• Use no more than three layers of information (no, it is not OK to have section number 6.2.1.2.1.4.5.6).
• Avoid circular references.
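Two of the navigation guidelines above (no more than three layers, no circular references) can be checked automatically. The following Python sketch shows one possible way to do it; the function names and the document identifiers SOP-001 to SOP-003 are hypothetical, and the reference map is assumed to be available as a simple dictionary.

```python
def section_depth(number: str) -> int:
    """Number of layers in a section number such as '6.2.1'."""
    return len([part for part in number.strip(".").split(".") if part])


def too_deep(numbers, max_depth=3):
    """Return the section numbers that use more than max_depth layers."""
    return [n for n in numbers if section_depth(n) > max_depth]


def find_cycle(refs):
    """Return one circular chain of references, or None if there is none.

    refs maps a document ID to the list of document IDs it references.
    """
    visiting, done = set(), set()

    def visit(doc, path):
        if doc in done:
            return None
        if doc in visiting:                      # we came back to an open document
            return path[path.index(doc):] + [doc]
        visiting.add(doc)
        for target in refs.get(doc, []):
            cycle = visit(target, path + [doc])
            if cycle:
                return cycle
        visiting.discard(doc)
        done.add(doc)
        return None

    for doc in refs:
        cycle = visit(doc, [])
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(too_deep(["6.2", "6.2.1", "6.2.1.2.1.4.5.6"]))
    # Hypothetical procedures that reference each other in a loop.
    print(find_cycle({"SOP-001": ["SOP-002"],
                      "SOP-002": ["SOP-003"],
                      "SOP-003": ["SOP-001"]}))
```

A check like this could run whenever a procedure is revised, so that a newly added cross-reference or subsection is flagged before the document is approved.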


Content • Provide sufficient and accurate information. • Identify clearly what the user must do. • Use imperative tone when providing instructions. • Provide only the information that is really needed by the user. • Use appendices for supporting information and details. Presentation • Use the primary language of the users. • Use appropriate terminology, based on the user’s level. The Pitfall of Bilingual Documents In many low-wage countries, manufacturing plants from big companies maintain certain documentation in English for regulatory or certification (for example, ISO) purposes, among others. Some documents are also maintained in bilingual format, and in this situation, there is a good way and a bad way to accomplish this. Having successive paragraphs in each language is a real nightmare. I strongly recommend having the complete document in one language and after that including the translation of the complete document in the second language, if needed.

How to Design and Write Effective Procedures and Forms to Minimize Human Errors Many procedures, working instructions, and reference documents do not follow best practices for controlling human error, and so the written process actually contributes to increasing error rates. Many organizations have lengthy procedures that are poorly written and disorganized. Having deficient procedures is one of the most prevalent problems in manufacturing industries since procedures have not traditionally been developed from the perspective of optimizing human factors. Instead, procedures have traditionally been developed to meet a compliance requirement to have written procedures. For procedures to be effective, they must be used. Organizations must also address the reasons that cause workers not to use the written procedures. Examples of problems with procedures that prevent their use include: • Procedures are difficult to use in the work environment. • Procedures are difficult to understand.


• Procedures are incorrect or incomplete (users need more information than the procedures contain).
• Procedures are formatted poorly.

Writing Better Procedures

Written procedures play a critical role in maintaining consistency and in ensuring that everyone has the same basic level of information and instructions. They are a key element of the QMS and an important training tool. However, poor procedures can be a reason for people to not complete required actions. In addition to being technically accurate, procedures need to be well written, usable, and up-to-date. Ask yourself:

• Are procedures accessible?
• Do staff members actually follow them?
• Are procedures written in a way that makes them easy to understand and follow?
• Do they reflect the tasks as they are actually carried out?
• Do they include all required information and/or instructions?
• Are they current and reviewed periodically?

Procedures must:

• Be accurate and complete
• Be clear and concise, with an appropriate level of detail
• Be accessible, current, and up to date
• Be supported by training
• Use adequate and comprehensible language
• Use consistent terminology
• Reflect how tasks are actually carried out
• Promote ownership by users

Start by collecting information about the task and the users. To do this, you could carry out an activity analysis. Here are some issues to think about:

• Consider both the difficulty and importance of the task to be documented.
• Find out how often the task is carried out.


• Think about who will use the procedure and the level of information they will need. • Establish the skills, experience level, past training, and needs of the users of the procedure. • Look at the level of training needed to support the understanding and effective use of the procedure. • Try to involve users in the preparation and maintenance of the procedure. Procedures can appear in different forms, for example, as printed text documents, electronically, or as quick job aids. It is important that users know where the procedures can be found and that this location is convenient for them. If it takes too long to find a procedure, users will be more reluctant to use it. Writing style is a very important factor. As a general guideline, keep sentences short and avoid complex sentence structure. This will make the procedure easier to read and understand. Try to write the required actions that users need to do in positive, active sentences. For example, “Add component A and then mix for 10 minutes.” This is easier to follow than the more complicated “After adding component A, start to mix” or “Do not mix until component A has been added.” Write actions in the order in which they need to be carried out. Divide longer procedures into shorter pieces. This helps users to go back to a particular step if they are interrupted or if the task takes some time to carry out. AVOID USING ALL CAPITAL LETTERS FOR THE TEXT. Research shows that this is slower and more difficult to read than lowercase text. Decide how features such as capitals, bold, italics, and underlining will be used. Overuse of these features is very distracting for users. Avoid using very small fonts (for example, eight points or smaller), as they are very difficult for users to read. Make good use of open space in the printed text. Cluttered pages are more difficult to read. Although the procedure may have more pages, providing spaces between steps on the page will make it more usable. 
Try to use the same format and structure for all procedures. An inconsistent format could confuse the user. Why Procedures Fail Procedures fail when they are not used. They go unused because they are: • Missing or not accessible • Inaccurate or incomplete


• Poorly written • Poorly presented Hard Copies or Electronic Documentation Companies are more frequently using electronic media for documentation. Hard (paper) copies have distinct advantages, including navigation schemes and presentation formats that are universally understood. Also, they are portable and can be easily annotated. On the other hand, electronic documentation is cheaper to produce, distribute, and maintain, is very easy to update, and offers very efficient navigation techniques. In addition, manufacturing companies typically store a large quantity of documents, including manufacturing records, for very long periods of time. Therefore, the cost of storage for electronic media is very low when compared to hard copies. In order to convert hard-copy information into electronic media, it needs to be restructured and probably rewritten. Some recommendations for electronic documentation in manufacturing and process industries are included in Figure 4.3. • Use more white space than for hard-copy information. • Provide users with the most direct access to information that is possible. • Use lists, tables, and graphs to gain more white space and group the information. • Use text and graphic hyperlinks to improve access to information. • Avoid overuse of hyperlinks. • Provide for keyword searches. • Provide interactive tutorials. • Use color but not more than six colors on one panel, including colors of background, text, and hyperlinks. • Be consistent in the use of color coding. • Design to the smallest screen size employed by users. • Ensure users do not have to scroll to see all of a table or graphic. • Use fonts designed for electronic media.

Figure 4.3 Best practices for electronic documentation.


The Best Fonts to Use in Print, Online, and in Email

In addition to the considerations listed in Figure 4.3, transforming a paper document into a computer-based one also requires the selection of the appropriate font styles and sizes to ensure readability and comprehension of the electronic information.18 There are two types of fonts: serif and sans-serif. Serif fonts have little “feet” and embellishments at the ends of the strokes in each letter (serifs), making them more distinct and recognizable. Popular serif fonts are Times New Roman, Palatino, Georgia, Courier, Bookman, and Garamond. Nearly all books, newspapers, and magazines use a serif font. It’s popularly accepted that, in print, serif fonts are easier to read. The idea is that the serifs make the letters flow together and are therefore easier on the eyes.

As the name states, sans-serif fonts are fonts without serifs. It’s been said that serif fonts are for “readability,” while sans-serif fonts are for “legibility,” which is why, in print, sans-serif fonts are often used as the headline font and serif fonts are used for the body text. Some popular sans-serif fonts are Helvetica, Arial, Calibri, Century Gothic, and Verdana.

Readability is more than simply legibility, which is a measure of how easily a reader can distinguish individual letters or characters from each other. Readability refers to how you lay out your selected font on printed or electronic media. You can make your words easy to read by choosing a legible font and then arranging that font in a readable way. For example, white space in between letters and lines, as well as paragraph breaks and images or offset quotes, are all techniques used to make fonts readable. There are a variety of methods available to assess the readability of written materials.
The Flesch reading ease (FRE) score, the earliest of the commonly used tools to assess readability, gives a score on a scale ranging from 0 to 100, with 0 being unreadable and 100 being the most readable.19 Table 4.3 depicts the best selection of fonts for different media.

Table 4.3 The best fonts to use for different types of media.

Best fonts for print | Best fonts for online | Best fonts for email
Serif | Sans serif | Sans serif
Garamond 12 points | Arial 12 points and larger | Arial 12 points for body text
Times New Roman 11 or 12 points | Verdana 10 points | Verdana 9 or 10 points for body text
Use dark color against a white background | Georgia 10 or 12 points | Verdana bold 12 or 14 points for headlines
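For readers who want to try the Flesch reading ease score on their own procedures, the standard formula is FRE = 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The Python sketch below applies it. The formula is the published one, but the syllable counter is a rough vowel-group heuristic assumed here for illustration, so scores will differ somewhat from those of commercial tools; all function names are hypothetical.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch reading ease from raw counts (standard published formula)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def score_text(text: str) -> float:
    """Score a passage; numbers and symbols are ignored when counting words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


if __name__ == "__main__":
    short = "Add component A. Mix for ten minutes."
    long_form = ("After the addition of component A has been completed, "
                 "initiate the mixing operation.")
    print(round(score_text(short), 1), round(score_text(long_form), 1))
```

Short, direct instructions score noticeably higher than the long passive version of the same step, which is consistent with the guidance on writing style given earlier in this chapter.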


Pictures on the Left, Text on the Right

This layout is consistent with the way our brain perceives information. For visual stimuli, the left visual field is analyzed in the right hemisphere of our brain, and the right visual field in the left hemisphere. Also, it is known that the right side of our brain is responsible for the reception of visual information, while the left side is responsible for analysis. Therefore, diagrams, pictures, drawings, and so on should be placed on the left side of process instructions to transmit information to the right hemisphere. Likewise, the text of the process instructions should be placed on the right side of the document.

Implement Visual Aids and Controls

“A picture is worth a thousand words,” says the proverb. Eighty-three percent of the information from our surroundings is acquired by sight; therefore, it makes a lot of sense to implement visual rather than numeric controls. For example, a pressure check control on a manometer could have the specification as a numeric range (such as 0–2.5 bar), or the proper range could be marked in green. The operator does not have to memorize the correct range, just verify that the gauge is in the green area. Moreover, you could implement the traffic light principle: the manometer can be color-coded with green, amber, and red zones representing the safe, caution, and dangerous areas, respectively.

Visual aids can be helpful to convey large amounts of information more clearly than text. Consider the following:

• Use diagrams or flowcharts to describe the process.
• Be careful when using photographs with regard to picture quality and image detail.
• Use color sparingly and consistently. Never rely solely on color due to the use of black and white printing and because a significant part of the population (especially males) is colorblind.
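The traffic light principle for the manometer example can be sketched as follows for an electronic display. The green range of 0 to 2.5 bar comes from the example above; the width of the amber band (0.5 bar) and the names Zone and classify_pressure are assumptions made only for illustration. Each zone also carries a text label, so the indication does not rely solely on color.

```python
from enum import Enum


class Zone(Enum):
    """Traffic-light zones; the text label backs up the color."""
    GREEN = "OK"
    AMBER = "CAUTION"
    RED = "DANGER"


def classify_pressure(bar: float,
                      green: tuple[float, float] = (0.0, 2.5),
                      amber_margin: float = 0.5) -> Zone:
    """Map a gauge reading in bar to a zone.

    green is the specified range; amber_margin is an assumed caution band
    on either side of it. Everything beyond the band is red.
    """
    low, high = green
    if low <= bar <= high:
        return Zone.GREEN
    if low - amber_margin <= bar <= high + amber_margin:
        return Zone.AMBER
    return Zone.RED


if __name__ == "__main__":
    for reading in (1.2, 2.8, 3.5):
        zone = classify_pressure(reading)
        print(f"{reading} bar: {zone.name} ({zone.value})")
```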

Who Should Write Procedures and Forms?

In most organizations, almost any technical person, administrator, or person with experience in writing or revising procedures is given the task. Such a person is not always the most suitable for the job, although some experience in documenting procedures does help. Procedures are written for those who have to execute the tasks described in the procedure. They normally can detect how good, and especially how bad, the procedure is. So, the question to answer is: Who is the best person to write this procedure? Is it the expert of the process, the person who writes well, a team of people, the engineer, the technician, the supervisor? Who should write it? It may depend on the situation. Whoever writes a procedure must be a person who asks questions, is logical, likes to write, and writes well. Preparation, background, and so on do not define this; it is a combination of factors. Once the writers have been selected, they should be provided with training on the human factors principles.

Before You Start Writing

Here are five basic steps to consider before writing a procedure or work instruction.

1. The audience. The procedure you are about to write is intended for a specific audience. Who will use the procedure? How much do they know about this subject? Will they understand the technical issues? Will you expect them to execute this procedure correctly the first time? How much training will they need to execute it consistently? These questions will guide you on what to do next. Remember that the procedure tells you how to do a task. Each step in a procedure can be supplemented by training that helps each audience understand why they are doing each step the way it is defined.

2. Process map. Once you know your audience, you need to think about the process you are going to write. If it is something new, you have the option of defining it from scratch. If you are rewriting or modifying a procedure, don’t even look at the existing procedure before thinking about and writing down the major steps of the process you are about to write. A good recommendation is to think about what needs to be done before, during, and after the process. This segregation helps to make sure no loose ends are left in the procedure. In the pharmaceutical industry, “before” implies what documents, preparation, materials, verifications, and personnel are needed. When you consider the actions required for the “during” phase of the process, just think about what instructions you will provide for executing the task in the procedure. This is the how. The “after” on a process map considers all actions to be done after you complete the process. This covers documentation, cleaning, and any activities required to leave the area ready for the next activity.


3. Your documentation system. Almost all procedures require some sort of documentation, particularly in the regulated industry. What needs to be documented before, during, and after the process? Before starting to write, think about what documentation will be completed at each step of the process. Clarify what information is to be collected and in what media. Define it on your map. If you are in a regulated environment, you may need to document equipment use, environmental conditions verification, materials used, names of people involved in the process, timing of actions, second verifications, and so on. 4. Why is this procedure needed? Do not write until you are sure the procedure is needed. Is there a similar procedure? Can we make use of another procedure with minimum modifications? 5. The outline. Once the process map is completed, you are ready to prepare your procedure outline or table of contents. This is where you use the suggested format defined by your organization and set up the headings of the sections on your procedure. This involves indicating who will be involved (audience). Normally, several positions (job functions) are involved in the execution of a procedure.

Sections in Your Procedures

Almost every company has a different procedure format. The sections mentioned below are the most common sections in the procedures for the regulated industries. We will discuss what is recommended under each section.

Title

This includes the heading and brief indication of what is included in the procedure. This title will be descriptive, precise, and concise, indicating exactly what is included. Since the procedure always indicates how to do a task, the language indicates what will be done and to what. A poor example would be the “Batch Record.” This title does not indicate what will be done to the batch record. However, a title indicating “Preparation, Revision, and Approval of a Master Batch Record” indicates what will be done (preparation, revision, and approval) and to what (the master batch record). Table 4.4 shows some good and poor titles for facilitating understanding.


Table 4.4 Good and poor procedure titles for facilitating understanding.

Poor titles | Better titles
Standard Operating Procedures | Writing Standard Operating Procedures
Nonconforming Materials | Handling Nonconforming Materials
XXX Equipment | Operation of XXX Equipment

Table of Contents

This section is recommended for procedures that have more than four or five pages. If used, it will help the user to find what he/she is looking for quickly. For this purpose, if a table of contents is used, keep section numbers, section names, and page numbers correct. A table of contents will be brief, included on one page if possible, and hyperlinked to page numbers in the electronic version. An example of a table of contents is shown in Figure 4.4.

Section number   Section heading ............ Page number
1.0              Purpose ............................ 2
2.0              Scope .............................. 2
3.0              Process flow ....................... 2
4.0              References ......................... 3
5.0              Key terms/definitions .............. 3
6.0              Safety precautions ................. 4
7.0              Responsibilities ................... 4
8.0              Procedure .......................... 5
9.0              Attachments ........................ 11
10.0             Revision history ................... 11

Figure 4.4 Table of contents example.


Purpose

A well-written purpose reiterates and expands a well-written title. It will answer two questions: “Why was this procedure written?” and “What information does it contain?” There is normally only one purpose; you could probably include two purposes, but more than two implies that you need to separate your content into more than one procedure. An example of a good purpose might be “To describe the specific steps to follow for the selection and qualification of a supplier.”

The Scope

This section is intended to help the reader focus on where specifically the procedure is applicable. The section starts with the words “This procedure is applicable….” It answers the question “What and who is controlled by this procedure?” It also answers the question “Where is this procedure effective?” An example of a scope might be “This procedure applies to the line clearance of packaging activities at ABC Company.”

References

This section lists other documents that are relevant for the execution of the procedure. You need to be very careful with what you include here. Too many references are an indication of a complicated procedure, one in which the user must go back and forth looking for information elsewhere to be able to execute the action steps. You could list forms and other procedures needed to execute the procedure here; you do not need to list all policies and parent documents used for reference when generating the procedure. Some companies, however, like to maintain a list of all policies that require the existence of any given procedure. This could help to avoid the elimination of a procedure or section of the procedure without prior verification with the policy.

Key Terms/Definitions

This section describes the most significant words and terms associated with its content.
It is used to facilitate the understanding of terms that may be of common use in a company but need definition for a new hire or clarification for a user. If used, this section needs to be brief and should really support and facilitate understanding. The following rules apply to this section: 1. Never use the term being defined in the definition. 2. Never refer to the company glossary.


3. Never define what is obvious.
4. Only define the terms used in the procedure.

Following is a real example of a poor definition (this one violates several recommendations shown above):

Master Batch Record: Master Batch Record (MBR) represents the documents included in the process of production, starting with the manufacturing process Master Manufacturing Records (MMRs) and finishing with the Master Packaging Records (MPRs) or Generic Master Production Records (GMPRs). The Master Manufacturing Records (MMRs) can be divided into segments or parts (for example, Weight/Formula, Granulation, Mixing, Encapsulation, Inspection, Packaging). Each segment can be developed, revised, approved, or used independently.

Safety Precautions

Procedures need to:

• Include a separate section at the start of the document to highlight hazards and precautions when applicable
• Use consistent colors and/or recognizable symbols to highlight hazards
• Embed the hazard or precaution information at the relevant process step
• Include a brief description of what can happen and the consequences of ignoring the precaution, thus encouraging compliance

Responsibilities

In this section, you must indicate very briefly who is involved and what responsibilities each person has. No specific task will be defined here. This is not the procedure. It is a section to quickly identify who has responsibilities and a brief description of the tasks that the person executes.

The Procedure

Finally, this section describes the action steps needed to complete the tasks for which the standard operating procedure (SOP) is being written. Indicate the materials and special equipment at their point of usage. Also highlight precautions and warnings before the point they are needed. This section follows a logical sequence and can be done with tables, diagrams, and


pictures. It could be written in different formats. For example, this section may include a few steps that could be grouped together, actions to be taken, what happens when you execute each step, a photo to aid understanding, and a reference document that is needed to complete execution. To minimize human errors, it is recommended not to have more than three hierarchical levels for each section, such as:

Section 1
Subsection 1.1
Step 1.1.1

Use bullets if the elements on the list can be performed in any order. Figure 4.5 shows an example of this procedure.

| Section | Action | Effect of action | Picture or diagram |
|---|---|---|---|
| Start equipment | Press the green button on the left of the equipment. | The equipment will start to operate. | Picture of location of green button. |
| | Press the handle on the right of the equipment. | The equipment bowl will lower. | Picture of handle on right. |
| Load / document materials | Load materials on the bowl as indicated on product batch record. | Materials are loaded to start operation. | Picture of bowl. |
| | Document the lot number of each material loaded. | Generate evidence on product batch record. | Form: product batch record. |

Figure 4.5 Example procedure.

Attachments These are added at the end of the procedure and may include tables, forms, formulas, flowcharts, or examples to help sustain the procedure. To minimize human errors, when using an attachment, be careful that the user is not forced to look back and forth on the procedure to be able to execute. Whenever possible, include the information (table, formula, or example) at the point of use. Too many attachments cause confusion and may be detrimental for the execution of the procedure.


Revision History

This section should be clear, brief, and scrupulously maintained. It is used to help track the changes performed and the main reason for each change. The best practice is to include in each version only the history of the most recent (latest) modification. Once, I reviewed a document in which the information occupied five pages, and the history of the 36 previous editions took 14 additional pages.

Procedure Writing Principles to Minimize Human Error

There is a lot to consider when writing a good procedure. The five principles mentioned below represent my experience in writing and correcting procedures, especially when focusing on human error minimization. If you are writing from scratch with a blank document, first prepare your format as defined in the process map section. If you are correcting an existing procedure, do the same, but remember that you have something written and your task in this case is to improve what is already written. These principles can be applied in any sequence you wish; however, they are presented here in the sequence I find easiest to complete. The five principles that will be discussed are:

1. Clarity
2. Readability
3. Coherence
4. Economy
5. Correctness

Clarity

Clarity in a procedure means freedom from ambiguity, a style of writing that is clean and easy to understand. When writing a procedure, nothing is more important than giving clear and precise instruction, which can be accomplished by the “rule of the king.” Kings give commands and are specific in what they want their subjects to do. Instructions start with a verb, specifically an active verb. To understand the concept of active verbs, let’s refresh the concept of sentence types:

• Declarative sentences make a statement: “The label is green.”
• Imperative sentences give a command: “Place a label.”
• Interrogative sentences ask a question: “Who will place the label?”


For procedures, we recommend using imperative statements. These statements start with a verb, communicate clearly, and are written in active voice. Passive verbs and statements are a weaker method of expressing action and can introduce a great deal of ambiguity into a procedure. Passive verbs are formed with the verb “to be.” If you see them in your procedures, consider rewriting them into active, imperative voice. By starting the action step with a verb, the user immediately recognizes the action to be done, which results in better attention and comprehension of what is expected. For example, a poor action step for a procedure may read like this:

“This action step requires pieces X and Y and screw Z; immediately after locating piece X and B, insert Y into X while pressing, and turning the tall handle to the left, and tightening the screw.”

This run-on sentence has far too many verbs and introduces erroneous information. To provide clarity, each action must have only one verb. Notice the difference in impact from the following easy-to-read checklist format:

1. Locate pieces X and Y
2. Locate screw Z
3. Insert Y into X
4. Press Y into X
5. Turn the tall handle to the left
6. Tighten the screw

Clarity also implies freedom from ambiguity. Sometimes, we write a procedure having in mind what we know about the process. However, we need to assume nothing. The following statement starts with an active verb and can be separated into three statements, but the portion related to “make sure…” has a high level of ambiguity:

“Inspect the area and equipment and make sure it is in adequate condition for production.”

Any “make sure” needs to have a criterion defined. Do we use a checklist? What is the meaning of “adequate condition for production”? To minimize confusion and human error, we need to be clearer. The instruction above will be converted into three specific commands:

1. Inspect the area.
2. Inspect the equipment.
3. Evaluate whether both are visibly clean.
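As a rough illustration of the checks described above, the following Python sketch flags action steps that contain a form of “to be” (a possible sign of passive voice) or a vague phrase that lacks a criterion. It is only a heuristic under stated assumptions: the function name `review_step` and both word lists are hypothetical choices made for this example, not part of any standard or tool, and a flagged step still needs a human reviewer to decide whether it is actually a problem.

```python
import re

# Hypothetical word lists, chosen only for illustration.
TO_BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}
VAGUE_PHRASES = ("make sure", "adequate", "as needed")


def review_step(step):
    """Return a list of warnings for a single action step (heuristic only)."""
    warnings = []
    text = step.lower()
    words = re.findall(r"[a-z']+", text)

    # A form of "to be" may indicate passive voice; a person must confirm.
    if any(word in TO_BE_FORMS for word in words):
        warnings.append("contains a form of 'to be'; check for passive voice")

    # Vague phrases need an explicit acceptance criterion.
    for phrase in VAGUE_PHRASES:
        if phrase in text:
            warnings.append("vague phrase '" + phrase + "'; define a criterion")
    return warnings


if __name__ == "__main__":
    poor = ("Inspect the area and equipment and make sure it is in "
            "adequate condition for production.")
    for step in (poor, "Inspect the area.", "Inspect the equipment."):
        print(step, "->", review_step(step))
```

A check like this does not replace review by a trained writer; it only points at steps worth a second look.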


Readability

This refers to using standard terms in the native language, being positive, emphasizing important information, and properly using quantitative information and conditional statements. Complex words may look nice to high-level people, but the people who normally execute the procedure need common words, something easy to understand. Avoid fancy words and you will minimize errors.

Negative sentences are harder to understand, and they are typically wordier. When writing procedures, you must focus on what the reader needs to do instead of what you want them not to do. Readers find it easier to understand positive statements. As an example, note that each of the following negative statements is easier to read when written positively:

• “Do not press the start button until the door is not open” or “Press the start button when the door is closed.”
• “Do not overfill the tank” or “Fill the tank to the mark labeled as maximum.”
• “Do not operate the equipment without safety guards” or “Place safety guards before operating the equipment.”

There are situations when negative statements can be emphatic. A concise, nonambiguous negative statement can communicate effectively. This is particularly true for safety issues, especially in “caution” and “warning” statements. Use “not” instead of negating prefixes such as “in-” or “un-.”

Emphasizing Important Information

When writing procedures, there are words or phrases we like to emphasize. For this you can use capitals, bold letters, italics, colors, highlighting, or any other means your software allows. The important thing to consider is that we cannot abuse the use of any emphasis, because eventually everything may look the same, and the highlight loses its purpose. Also, it is necessary to be consistent throughout the document. A phrase like “NO smoking allowed” presents several interesting construction clues. It is a negative statement, but on occasion negative statements have a place, particularly when dealing with safety or health issues. The word “NO” is written both in capital letters and bold type. This is a great use of a NO. You want to highlight that it is not allowed, so you use double highlighting.


Quantitative Information

I have seen many deviations associated with the following kind of instruction: “Keep the room temperature at 72 °F ±5 °F.” Consider a change to the following format:

“Keep the room temperature between 67 °F and 77 °F.”
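The conversion above is simple arithmetic (72 − 5 = 67 and 72 + 5 = 77), and doing it once when the procedure is written spares every user from doing it mentally at the point of execution. A minimal Python sketch of that conversion follows; the function names `explicit_range` and `range_instruction` are hypothetical and exist only for this illustration.

```python
def explicit_range(nominal, tolerance):
    """Convert 'nominal ± tolerance' into explicit (low, high) limits."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return nominal - tolerance, nominal + tolerance


def range_instruction(nominal, tolerance, unit="°F"):
    """Build the instruction text using the explicit limits."""
    low, high = explicit_range(nominal, tolerance)
    return ("Keep the room temperature between "
            f"{low:g} {unit} and {high:g} {unit}.")


if __name__ == "__main__":
    # Uses the values from the example in the text: 72 °F ±5 °F.
    print(range_instruction(72, 5))
```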

Conditional Statements

Conditions are one of the highest contributors to human errors when procedures are written. These are statements that use the terms “if…then,” “when…then,” or “if…and…or…then.” Conditions could get complicated, and the way you write them could guide the user to make mistakes. For this reason, the better you write the conditions, the less interpretation is left to the imagination.

First, let’s emphasize that conditions are the only statements in your procedure where you are free to start with something different than an active verb. These commands guide you to detect a condition, and if the subject condition applies, then you execute a command. This means you must write the word then because what follows is the command for action. Let’s see some good and poorly written examples:

“The supervisor must be notified whenever the pH is less than 6 and greater than 10.”

Is this a conditional statement? Yes! Is this statement written as a command with an active verb? No! Can a number be less than 6 and greater than 10? Never! Let’s see this statement written as a conditional statement:

“If the pH is less than 6 or greater than 10, then notify a supervisor.”

It is recommended to highlight the words if, when, and then to make it very evident that this is a conditional statement. Table 4.5 shows the suggested format for writing conditional statements in procedures.

Table 4.5 Example of if/and/then instruction.

| If . . . | Then . . . |
|---|---|
| If during environmental monitoring in the filling process, there is an excursion . . . and the HVAC system is operational and stable. | Stop the filling process and segregate the filled vials without stoppers. |
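The pH example above can also be checked mechanically. The short Python sketch below contrasts the faulty wording (“less than 6 and greater than 10”) with the corrected one (“less than 6 or greater than 10”). The function names are hypothetical, and the fallback text returned when the condition does not apply is an assumption added only so the example runs; the source does not say what the user does in that case.

```python
def faulty_condition(ph):
    # "less than 6 AND greater than 10": no number satisfies both parts,
    # so this condition can never trigger the notification.
    return ph < 6 and ph > 10


def corrected_condition(ph):
    # "less than 6 OR greater than 10": out of range on either side.
    return ph < 6 or ph > 10


def next_action(ph):
    """IF the pH is less than 6 or greater than 10, THEN notify a supervisor."""
    if corrected_condition(ph):
        return "Notify a supervisor"
    # Assumption for illustration only: the source does not define this branch.
    return "Continue with the next step"


if __name__ == "__main__":
    for ph in (5.5, 7.0, 10.5):
        print(ph, faulty_condition(ph), next_action(ph))
```

Running the sketch shows that the faulty version never fires for any reading, which is exactly the kind of silent failure a poorly written condition produces on the floor.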


The more complex the conditional statements are, the easier it is to follow them in a table format, as shown in Table 4.6.

Table 4.6 Example of complex if/and/then instruction.

| If | And | Then |
|---|---|---|
| pH higher than 7.5 | Conductivity lower than 1850 | Start mixing |
| pH lower than 7.5 | Conductivity lower than 1850 | Add XYZ |
| pH higher than 7.5 | Conductivity higher than 1850 | Add 2 gallons of water |
| pH lower than 7.5 | Conductivity higher than 1850 | Notify supervisor |

Note: Be careful with confusion derived from the use of < or > symbols. It’s a lot easier to interpret the use of words such as higher than/lower than.
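An if/and/then table such as Table 4.6 maps directly onto a lookup structure, which also makes gaps visible. The Python sketch below is one possible encoding, not a prescribed implementation; the names `PH_LIMIT`, `CONDUCTIVITY_LIMIT`, `ACTIONS`, `compare`, and `decide` are hypothetical. Note that the table as written does not say what to do when a reading is exactly 7.5 or exactly 1850, so the sketch returns `None` for those cases instead of guessing.

```python
PH_LIMIT = 7.5
CONDUCTIVITY_LIMIT = 1850  # units are not stated in the source table

# (pH vs. limit, conductivity vs. limit) -> action, copied from Table 4.6.
ACTIONS = {
    ("higher", "lower"): "Start mixing",
    ("lower", "lower"): "Add XYZ",
    ("higher", "higher"): "Add 2 gallons of water",
    ("lower", "higher"): "Notify supervisor",
}


def compare(value, limit):
    """Describe a reading in words, as the note on symbols recommends."""
    if value > limit:
        return "higher"
    if value < limit:
        return "lower"
    return "equal"


def decide(ph, conductivity):
    """Return the action from the table, or None if the table has no row."""
    key = (compare(ph, PH_LIMIT), compare(conductivity, CONDUCTIVITY_LIMIT))
    return ACTIONS.get(key)


if __name__ == "__main__":
    print(decide(8.0, 1000))   # row 1 of the table
    print(decide(7.5, 1000))   # boundary value: the table defines no action
```

The boundary case is worth noticing when writing such a table: if “equal to” is possible in practice, the procedure should say which row it belongs to.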

In summary, this principle assists you in making your procedure easy to read and understand. You need to look for the careful use of negative and conditional statements. Say what you want the user to do, not what you want them not to do. Also, be very specific on segregating the condition from the action. Coherence This is a property of well-written texts, and in a procedure, coherence means a way by which all steps take the user into the execution of something that not only makes sense but also is consistent, logical, and integrated. The coherence principle also focuses on anything we incorporate in procedures that may divert our attention from what we are doing; this includes “notes,” “warnings,” “references,” and so on. If notes must be used in procedures, they need to be written immediately where needed. Notes can be used to expand an explanation, present a definition, or present an example. They are an excellent way to clarify safety precautions. But do not abuse the use of notes. My recommendation is to use no more than one note per action step. Referencing and branching are the principal reasons behind human errors in procedures. Let’s analyze these two factors that make procedures lose their coherence. These two concepts guide us to move into another place out of the current step in the procedure. The “reference” guides the reader inside the same procedure or into other procedures or documents but always brings us back to the departure


point. One example of a reference might be “Execute the cleaning operation following Procedure 123.” Here you are expected to pull out Procedure 123 and execute the operation, then continue with the next step. There are specific words that indicate that you have a reference. These are “following,” “in accordance with,” and “refer to.” Be very careful with references and the way you write them. The users must not feel they are constantly going back and forth from the procedure to the reference to get the information they need to execute a task. You write to make the job easier for the user, never to make it easier for you. Thus, a procedure might seem short and easy to write, but the user may find that to execute he/she needs 10 or 20 additional SOPs or documents outside the procedure. Avoid this!

Branching is an extremely dangerous term when it refers to procedures. This is when you ask the user to jump steps (to go outside to another procedure to execute some steps only), usually when you want to avoid duplicating something from another procedure in your procedure. Once again, consider your SOP users, and make it easy for them, not for you. The branch asks the user to go to another step or document, but the user is not expected to continue the same step. Example: “If major cleaning applies, go to section 3, step 13.” What you need to be very careful about is that if you add another step, then this branch may become incorrect. Section 3 may later be section 4, or step 13 may become step 15. Imagine your user executing the wrong steps. For this reason, we do not recommend referencing any specific step number. There are very specific words that indicate a branch: “go to…” “proceed to…” “return to…” “repeat…”

Referencing and branching create errors and require that the procedure writer pay attention in future revisions to guarantee that:

• References are still valid.
• Procedures are still logical.
• The go tos are still valid.
• Users are clear about the next action they need to do.

Be careful that the next SOP to which you forward the user does not forward him/her back (this is called a circular reference or branch). A good way to enhance coherence in a procedure is to use visual aids such as pictures, flowcharts, maps, diagrams, tables, and graphics. You may follow the simple rules shown in Table 4.7.


Table 4.7 Rules for using visual aids in procedures.

| If you wish to . . . | Then the proper visual aid is a . . . |
|---|---|
| Illustrate how something looks | Photo or diagram |
| Illustrate location | Map or photo |
| Illustrate assembly process | Diagram or process flowchart |
| Illustrate relationship among data points | Table or graph |
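The earlier warning about circular references and branches (an SOP that forwards the user to another SOP, which eventually forwards the user back) can be checked mechanically if the references between documents are recorded. The Python sketch below is one way to do that with a depth-first search; the function name `find_cycle`, the dictionary layout, and the document identifiers are all hypothetical and used only for illustration.

```python
def find_cycle(references):
    """Return one circular chain of documents, or [] if none is found.

    `references` maps each document to the list of documents it sends
    the user to (by reference or by branch).
    """
    visiting = []   # documents on the path currently being followed
    done = set()    # documents already fully checked

    def visit(doc):
        if doc in done:
            return []
        if doc in visiting:
            # The path has come back to a document it already passed.
            return visiting[visiting.index(doc):] + [doc]
        visiting.append(doc)
        for target in references.get(doc, []):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(doc)
        return []

    for doc in references:
        cycle = visit(doc)
        if cycle:
            return cycle
    return []


if __name__ == "__main__":
    # Hypothetical document identifiers.
    sops = {
        "SOP-001": ["SOP-123"],
        "SOP-123": ["SOP-456"],
        "SOP-456": ["SOP-001"],  # forwards the user back to the start
    }
    print(" -> ".join(find_cycle(sops)))
```

Such a check is only as good as the reference list it is given, which is one more reason to keep references few and to review them at every revision.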

There is a current tendency in industry to write procedures in a way that supports training and facilitates coherence. In this methodology, you write procedures in a table format, you define the action following the clarity principle, you define the effect of the action and place a photo next to the action step to make the action clearer, and finally you define why this step is necessary. Normally, a procedure only answers the “how,” and the “why” is left out for training. Under this new methodology, the use of the “why” within a procedure helps to make it more coherent. You can also define what could go wrong and what can be done if something goes wrong. See the example in Table 4.8. Whatever process you use to enhance the coherence of your procedure, be careful that you do not create circular flowcharts.

Table 4.8 Example of comprehensive instruction.

| Responsibility | Action step | Photo | What is the consequence of the action? | Why is this action needed? |
|---|---|---|---|---|
| Operator | Turn the equipment on by pressing the green button on the front of the equipment. | Picture of equipment | When the green button is on, the tumbler will start to move. | The operation cannot start until the equipment is on; tumbler movement implies the mixing process has started. |


Economy This principle relates to eliminating redundancy, unnecessary prepositions, jargon, and dead construction. Procedures need to be lean; we want them lean, specific, short, and precise. A procedure is not a book. You do not need to write with fancy words or an excess of words. A procedure does not need paragraphs. Sentences should be brief and easy to follow. Long sentences tend to have more than one verb, and they tend to confuse. Here is an example from a procedure: “Toward the end of the process, add about 10 kg of material.” Imagine that you are an operator and you see this instruction. Think of a process that takes three hours. What is meant by “toward the end of the process”? Is it 15 minutes before, five minutes, one minute? And then what exactly does “add about 10 kg of material” mean? Isn’t this complex? Is 8 good? Is 12 good? What material are we referring to? When we use prepositions this way, leaving the options too open, the instruction becomes ambiguous and confusing, and the final result may be disastrous. For this reason, I recommend that a procedure writer say exactly what he/she wants as an action—very specific and without any opportunity to assume. In summary, when we talk about economy, my recommendation is to examine your procedure for any paragraph that can be cut. Rephrase sentences by using only one active verb per command. Separate the sentences to make them easier to read, and say what you want people to do, not what you want them not to do. Correctness Now that you have applied all the other principles, it is time to evaluate whether what you have left of your procedure is written properly, with adequate grammar, and so on. As I say in all my seminars, there are many rules for proper grammar, as well as many books on how to use and select the proper words. The difficulty with encountering a grammar error in a procedure is that it stops the brain process. 
We immediately think about the person who wrote the procedure and those who approved it; we cannot believe they missed the error. The reality is that there are grammatical errors that could create a human error at the moment of execution, and there are others that have no consequence on the procedure. In the discussion in the following section, we will emphasize those grammar errors that can affect the actions of the person following the procedure.


In summary, this principle refers to using correct grammar, using simple words and terms, writing in the proper sequence, and supporting the users in their execution. In many situations, the SOP writer makes it easy for himself/herself, but forgets that every time the user uses the procedure, it may be a nightmare trying to understand and follow what is written. When the brain detects an error, it stops—first to try to understand and decode the error, second to assume what is expected, and third to execute what the person believes must be done.

TRAINING, COMPETENCE, AND PERFORMANCE The term training has evolved to mean establishing competence through education, training, skills, and experience as defined, for example, in the international standard ISO 13485:2016. The FDA requirements are not explicit, but the agency’s expectation is that a manufacturer must ensure the people assigned to particular functions possess the necessary education, background, training, and experience to perform their functions correctly. Based on this, it is clear that the FDA and QMS expect competency as a requirement. Effective training and competence development are essential to achieving effective performance. Training provides skills and/or knowledge to adequately perform a job. Personnel should be qualified to do the operations that are assigned to them in accordance with the nature of, and potential risk of, their operational activities. On the other hand, managers should define appropriate qualifications for each position to help ensure individuals are assigned the appropriate responsibilities. Personnel should also understand the effect of their activities on the product and the customer. Job descriptions should include requirements such as scientific and technical knowledge, process and product knowledge, and/or risk assessment abilities to appropriately execute certain functions. Training plays a key role in the reduction of errors. Well-trained operators whose skills and knowledge are appropriate to the task will make fewer errors than unskilled operators. There is a difference between skill acquisition and skill maintenance. Operators will initially be trained to the required skill level, after which it is assumed they will retain their skills. That is probably true for skills that are exercised all the time, but not for those rarely used. 
If operators are required to exercise important or critical skills at longer intervals, a major reduction in human error may be expected from the implementation of a skill maintenance program. An example would be


the compulsory periodic retraining program of airline pilots in simulators to ensure they maintain their ability to deal with emergencies.

The competence of personnel is defined as the ability to perform tasks according to expectations. In other words, competence is the ability of an individual to do a job properly, or what people need to be successful in their jobs. The concept of competencies includes all the related knowledge, skills, abilities, and attributes that form a person’s job. This set of context-specific qualities is correlated with superior job performance and can be used as a standard against which to measure job performance, as well as to develop, recruit, and hire employees. Developing competence is part of the organizational design and is critical to performance. Providing the right training and developing the required competencies (technical and nontechnical) has a direct influence on the reliability of human performance. Competence is much more than training. It implies appropriate education, qualifications, training, skills, technical knowledge, experience, physical and mental capabilities, understanding, behavior, and attitudes.

Continued training is critical to ensure that employees remain proficient in their operational functions. Typical training should cover the policies, processes, procedures, and written instructions related to operational activities, the product or service, the quality system, and the desired work culture. Training should focus on both the employees’ specific job functions and the related regulatory requirements. Managers are expected to establish training programs that include the following:

• Evaluation of training needs
• Provision of training to satisfy these needs
• Evaluation of the effectiveness of training
• Documentation of training and/or retraining

When operating in a robust quality system environment, it is important that managers verify that skills gained from training are implemented in day-to-day performance.

As an example of the importance of the concept of competence, following are some of the requirements that the new international standard ISO/IEC 17025:2017 General requirements for the competence of testing and calibration laboratories contains under clause 6.2, “Personnel”:20

• 6.2.1 All personnel of the laboratory, either internal or external, that could influence the laboratory activities shall act impartially, be competent and work in accordance with the laboratory’s management system.


• 6.2.2 The laboratory shall document the competence requirements for each function influencing the results of laboratory activities, including requirements for education, qualification, training, technical knowledge, skills and experience.
• 6.2.3 The laboratory shall ensure that the personnel have the competence to perform laboratory activities for which they are responsible and to evaluate the significance of deviations.
• 6.2.4 The management of the laboratory shall communicate to personnel their duties, responsibilities and authorities.
• 6.2.5 The laboratory shall have procedure(s) and retain records for:
(a) determining the competence requirements;
(b) selection of personnel;
(c) training of personnel;
(d) supervision of personnel;
(e) authorization of personnel;
(f) monitoring competence of personnel.

The current version of the international standard ISO 13485:2016 Medical devices — Quality management systems — Requirements for regulatory purposes also includes under clause 6.2 Human resources:21

Personnel performing work affecting product quality shall be competent on the basis of appropriate education, training, skills and experience. The organization shall document the process(es) for establishing competence, providing needed training, and ensuring awareness of personnel. The organization shall:
(a) determine the necessary competence for personnel performing work affecting product quality;
(b) provide training or take other actions to achieve or maintain the necessary competence;


(c) evaluate the effectiveness of the actions taken; (d) ensure that its personnel are aware of the relevance and importance of their activities and how they contribute to the achievement of the quality objectives; (e) maintain appropriate records of education, training, skills and experience In the case of the international standard ISO 9001:2015, Quality management systems—Requirements, there is a clause devoted to this topic.22 Specifically, clause 7.2, Competence, establishes that: The organization shall: (a) determine the necessary competence of person(s) doing work under its control that affects the performance and effectiveness of the quality management system; (b) ensure that these persons are competent on the basis of appropriate education, training, or experience; (c) where applicable, take actions to acquire the necessary competence, and evaluate the effectiveness of the actions taken; (d) retain appropriate documented information as evidence of competence. NOTE: Applicable actions can include, for example, the provision of training to, the mentoring of, or the reassignment of currently employed persons; or the hiring or contracting of competent persons. Most of the cases mentioned in Chapter 1 concluded that inadequate training and lack of competence were main contributing factors.

Typical Errors in Training Programs Human errors are typically associated with a lack of training, or more often, poor-quality training. Without a doubt, training can be considered as one of the main root cause categories for human errors and, in a wider sense, for inadequate human performance in the manufacturing industries. The requirements of an effective training program are: • Develop the program based on the needs of the participants. • Set clear learning objectives.


• Schedule the program at the right time and place.
• Select the right people to attend.
• Select effective instructors.
• Use effective techniques and learning aids.
• Evaluate the program, and measure training effectiveness.

Causes of training deficiencies are multiple, including:

• Training not required
• Missing training
• Content not adequate
• Training method not adequate
• Language barriers
• Training environment not adequate
• Instructor not adequate
• Insufficient practice or hands-on experience
• Frequency not adequate (insufficient refresher training)
• Effectiveness of the training not measured

Chapter 7 provides a detailed discussion of the root causes related to training. Training problems are related to two main areas: (1) the content and delivery process of the training; and (2) the trainer/instructor’s capabilities. Most companies lack a formal and robust process for instructional design, or even an adequate train-the-trainer program. On the other hand, on-the-job training programs in the manufacturing industries are typically run by supervisors, subject matter experts (sometimes), or even experienced operators, normally called instructors or trainers. Those instructors often have extensive expertise in a specific job, but their teaching/training skills are not necessarily developed at the required level. The lack of measurement of the effectiveness of training is another major deficiency in training programs, which is discussed later in this chapter.

Training Needs Analysis Training needs analysis (TNA) is the process of identifying the gap between employee training level and the company’s training needs. TNA is the first stage in the training process and involves a procedure to determine whether


training will indeed address the problem that has been identified. A TNA looks at each aspect of an operational domain so the initial skills, knowledge, and attitudes of the human elements of a system can be effectively identified and appropriate training can be determined and established. TNA is a structured approach, initially developed in the military setting to assess training requirements and appropriate training methods to meet them. Typically, it is used to identify and support training needs created by the introduction of new or modified systems and equipment. TNA is an iterative process and provides an audit trail for training-related decisions. It consists of five stages:

1. Scoping document
2. Operational task analysis
3. Training gap analysis
4. Training option analysis
5. Training plan

Based on the scope of the TNA (for example, a new role or a new equipment item), there may be a need to prepare a scoping document to identify what the TNA aims to achieve.

An operational task analysis evaluates the tasks undertaken within a specific role to identify the tasks associated with the new system or equipment being introduced.

The training gap analysis is used as a measure of the gap between existing skills, knowledge, and attitudes and those required. Skills, knowledge, and attitudes required may vary between different roles performing the task.

The training option analysis reviews available training methods and media for each task and evaluates the advantages and disadvantages of each delivery method.

The training plan presents the detail of the analysis and should include the following:

• Implementation plan
• Delivery schedule
• Explanation of how the training will be evaluated

Skill Fading

There is also a need to consider skill fade (the degree to which learning decays over time). Complex cognitive skills, such as performing a calculation, tend to be more prone to skill fade than psychomotor skills, such as learning to ride a bicycle. Tasks performed infrequently are more prone to skill fade, especially if they are difficult or complex. High skill-fade activities should be selected for more intensive training, practice, and refresher training activities. Looking for the optimal balance, it is important to conduct not more (and not less!) refresher training than is necessary to keep performance at the desired skill level.

The military [23] is one of the fields where many studies have been conducted regarding potential models for predicting skill retention and trying to determine when refresher training should be provided. Skill retention can be defined as the maintenance or sustainment of skills as learned behaviors and procedures over long periods of time without practice. Degradation in performance can be observed because the perceptual, motor, and cognitive processes that underlie skilled performance decay or break down, or because the individual loses the ability to access or perform those processes. Eight main factors seem to be related to skill retention:
1. Retention interval
2. Opportunity to practice
3. Degree of learning
4. Method of training
5. Similarity of training and performance environments
6. Type of task
7. Method of testing
8. Individual differences

Several classes of models for predicting skill retention have been published. Some of them are subjective and qualitative approaches that involve some kind of self-assessment regarding retention and/or the need for refresher training performed by the trained individual. Other qualitative models help to understand how skills fade over time in relation to some key factors. However, they do not allow the prediction of when competence will decline below an established threshold or criterion or when refresher training will be needed.

One of the most widely recognized quantitative models of skill retention is the U.S. Army Research Institute’s Users’ Decision Aid (UDA) model. This model was developed to provide quantitative predictions of skill retention for military tasks, and it is based on empirical studies documenting factors that affect skill retention. The model is based on the following specific factors:
• The number of steps
• Whether the steps must be performed in a set sequence
• Whether the task contains feedback that indicates the correct performance steps
• The number of facts or information chunks that must be recalled
• Execution demands
• Whether the skill is cognitive or perceptual/motor
• Whether there are job and/or memory aids for the task
• The time limit for the task (if any)

The UDA was developed through an iterative process of determining the empirical relationship between the set of factors and observed retention of certain military skills, and determining the best-fitting function describing that relationship. The UDA contains 10 questions that raters answer based on the task summary and their knowledge of the task. Raters select the appropriate answer and note the scale (points) value associated with the selected answer. When all 10 questions have been answered, the raters compute the total of the scale values, which constitutes the task’s retention value. Scores are interpreted with performance prediction tables, which are used to convert the total retention score into a prediction of performance for the rated task. The UDA model is used to determine:
• How quickly task skills will be forgotten
• Which tasks among several will be forgotten or remembered after a specified interval
• What percentage of soldiers will be able to perform a task after up to one year without practice
• When to conduct refresher training to keep a group at a criterion level

Table 4.9 lists the UDA questions and scale values.

Table 4.9 UDA question and scale values.

Question | Scale values | Factor
1. Are job or memory aids used by the soldier in performing this task? | 1 (Yes), 0 (No) | Task characteristics; presence of job aid
2. How would you rate the quality of the job or memory aid? | 56 (Excellent), 25 (Very good), 2 (Marginally good), 1 (Poor) | Task characteristics; presence of job aid
3. Into how many steps has the task been divided? | 25 (One), 14 (Two to five), 12 (Six to ten), 0 (More than ten) | Task characteristics; number of steps
4. Are the steps in the task required to be performed in a definite sequence? | 10 (None), 5 (All), 0 (Some are and some are not) | Task characteristics; organization
5. Does the task provide built-in feedback so you can tell if you are doing each step correctly? | 22 (For all steps), 19 (For greater than 50% of steps), 11 (For up to 50% of steps), 0 (For none) | Task characteristics; availability of feedback
6. Does the task or part of the task have a time limit for its completion? | 40 (No time limit), 35 (Easy time limit), 0 (Difficult time limit) | Task characteristics; stress
7. How difficult are the mental processing requirements of this task? | 37 (Almost no requirements), 28 (Simple requirements), 3 (Complex requirements), 0 (Very complex requirements) | Task characteristics; difficulty
8. How many facts, terms, names, rules, or ideas must a soldier memorize in order to do the task? | 20 (None), 18 (One to three), 13 (Four to eight), 0 (More than eight) | Task characteristics; difficulty
9. How hard are the facts and terms that must be remembered? | 34 (Not applicable, none to remember), 31 (Not hard at all), 12 (Somewhat hard), 0 (Very hard) | Task characteristics; difficulty
10. What are the motor control demands of the task? | 2 (None), 0 (Small), 16 (Considerable), 3 (Very large) | Task characteristics; difficulty
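The scoring step described above, in which raters total the scale values of their ten answers, can be sketched as follows. This is a minimal illustration that assumes the scale values of Table 4.9. The answer labels are shortened, the `retention_score` helper and the example task are hypothetical, and the performance prediction tables needed to interpret the total are not reproduced here.

```python
# Scale values transcribed from Table 4.9: question number -> {answer: points}.
# The short answer labels are an assumption made for this sketch.
UDA_SCALE = {
    1: {"yes": 1, "no": 0},
    2: {"excellent": 56, "very good": 25, "marginally good": 2, "poor": 1},
    3: {"one": 25, "two to five": 14, "six to ten": 12, "more than ten": 0},
    4: {"none": 10, "all": 5, "some": 0},
    5: {"all steps": 22, "more than 50%": 19, "up to 50%": 11, "none": 0},
    6: {"no time limit": 40, "easy": 35, "difficult": 0},
    7: {"almost none": 37, "simple": 28, "complex": 3, "very complex": 0},
    8: {"none": 20, "one to three": 18, "four to eight": 13, "more than eight": 0},
    9: {"not applicable": 34, "not hard": 31, "somewhat hard": 12, "very hard": 0},
    10: {"none": 2, "small": 0, "considerable": 16, "very large": 3},
}


def retention_score(answers):
    """Total the scale values for one rater's answers to all ten questions."""
    missing = sorted(set(UDA_SCALE) - set(answers))
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    return sum(UDA_SCALE[q][answers[q]] for q in UDA_SCALE)


# Hypothetical rating of a single task.
example_task = {
    1: "yes", 2: "very good", 3: "six to ten", 4: "all", 5: "up to 50%",
    6: "easy", 7: "simple", 8: "four to eight", 9: "somewhat hard", 10: "small",
}
print(retention_score(example_task))  # 142
```

The sketch requires an answer to every question, following the description that the total is computed once all 10 questions have been answered.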

Training Effectiveness

Training is a critical component in any organization’s strategy, but regulated companies rarely evaluate the impact of their training programs. The management of effective training provides the overall structure needed to ensure that training programs have processes in place to support regulated operations. Organizations that monitor training effectiveness and strive to improve weaknesses are consistently the best performers. It is important to develop methodologies to measure, evaluate, and continuously improve training.

Very often, the training function is seen as an expenditure center rather than as one of the most critical activities in any organization, especially in highly regulated environments such as nuclear, aerospace, medical, and pharmaceutical. In these industries, training results must be measured. Incorporating selected training metrics into a reporting strategy can help demonstrate the real value of training. Measurements that consider performance improvements can provide a benchmark for training effectiveness.

An important consideration is that most corrective or preventive actions include some training efforts, and therefore, the effectiveness of these training actions must be evaluated. However, for most companies, the only record generated from training activities is the attendance sheet itself. When evaluating the possible impact of training during nonconformance investigations, investigators merely use these sheets to determine whether the personnel involved in the failure signed the corresponding training roster. If so, they conclude that training can be discarded as a root cause of the situation.

Training effectiveness is not an explicit requirement of FDA regulations, but the FDA has expectations regarding these topics that are included in several guidance documents. The FDA’s expectation is that firms must evaluate the effectiveness of their personnel training because it is a direct indicator of the robustness of the firm’s quality system. Quality data (complaints, failure investigations, audits, record reviews, and so on) must be used to assess both training needs and effectiveness. Human errors must be detected, trended, investigated, and corrected. Do not use retraining as a corrective action.

The 2006 FDA Guidance for Industry: Quality Systems Approach to Pharmaceutical CGMP Regulations [24] states that “under a quality system, managers are expected to establish training programs that include the following:
• Evaluation of training needs
• Provision of training to satisfy these needs
• Evaluation of effectiveness of training
• Documentation of training and/or re-training”

And the standard ISO 9001:2015 Quality management systems - Requirements includes, under clause 7.2, Competence, that the organization shall: [25]
(a) determine the necessary competence of person(s) doing work under its control that affect the performance and effectiveness of the quality management system;
(b) ensure that these persons are competent on the basis of appropriate education, training, or experience;
(c) where applicable, take actions to acquire the necessary competence, and evaluate the effectiveness of the actions taken;
(d) retain appropriate documented information as evidence of competence.

Evaluation of effectiveness of training is also a requirement of ISO 13485:2016 Medical devices - Quality management systems - Requirements for regulatory purposes, [26] which also includes under clause 6.2, Human resources:

Personnel performing work affecting product quality shall be competent on the basis of appropriate education, training, skills and experience. The organization shall document the process(es) for establishing competence, providing needed training, and ensuring awareness of personnel. The organization shall:
(a) determine the necessary competence for personnel performing work affecting product quality;
(b) provide training or take other actions to achieve or maintain the necessary competence;
(c) evaluate the effectiveness of the actions taken;
(d) ensure that its personnel are aware of the relevance and importance of their activities and how they contribute to the achievement of the quality objectives;
(e) maintain appropriate records of education, training, skills, and experience

The requirement to measure the effectiveness of training is also part of most foreign regulations pertaining to this type of industry. As if we need more reasons for the evaluation of training, here are a few others:
• To justify the existence and budget of the training department by showing how it contributes to the organization’s objectives and goals
• To decide whether to continue or discontinue specific training programs
• To gain information on how to improve future training programs: physical facilities, schedule, materials, food, material contents, instructors, and so on

The Kirkpatrick Model for Training Effectiveness Evaluation

More than half a century ago, Donald L. Kirkpatrick introduced a four-step approach to training evaluation. [27] His four steps have become commonly known in the training field as level one, level two, level three, and level four evaluation. Table 4.10 reflects these four levels of evaluation.

Table 4.10 The four levels of the Kirkpatrick model.

Level | What | When
Reaction | Did they like it? | Upon completion of the training
Learning | Did they learn it? | Before and after training
Behavior | Did they use it? | Before and after training
Results | Did they produce measurable positive business results? | Before and after training

Level One: Reaction

Kirkpatrick defines this first level of evaluation as determining “how well trainees liked a particular training program,” “measuring the feelings of trainees,” or “measuring the customer satisfaction.” He outlines the following guidelines for evaluating reaction:
1. Determine what you want to learn.
2. Use a written comment sheet covering those items determined in step 1.
3. Design the form so reactions can be tabulated and quantified.
4. Obtain honest reactions by making the forms anonymous.
5. Encourage the trainees to write additional comments not covered by the questions that were designed to be tabulated and quantified.

Kirkpatrick also suggests measuring the reaction of the training managers and other qualified observers. An analysis of these two groups would give the best indication of the effectiveness of the program at this first level of training evaluation.

Level Two: Learning

Kirkpatrick defines learning as “attitudes that were changed, and knowledge and skills that were learned.” He outlines the following guidelines to evaluate learning:
1. The learning of each trainee should be measured so quantitative results can be determined.
2. A before-and-after approach should be used so any learning can be related to the program.
3. Where practical, a control group not receiving the training should be compared with the group that received the training.
4. Where practical, the evaluation results should be analyzed statistically so learning can be proved in terms of correlation or level of confidence.

In addition to using written and oral examinations and performance tests, Kirkpatrick suggests that if a program is carefully designed, learning can be fairly and objectively evaluated while the training session is being conducted. For example, individual performance of a skill being taught and discussions following a role-playing situation can be used as evaluation techniques.

Level Three: Behavior (the Transfer of Training)

Realizing that “there may be a big difference between knowing principles and techniques and using them on the job,” Kirkpatrick suggests that the following five requirements must be met for change in behavior to occur:
1. Desire to change
2. Knowledge of what to do and how to do it
3. The right job climate
4. Help in applying what was learned during training
5. Rewards for changing behavior

Kirkpatrick outlines the following guidelines for evaluating training programs in terms of behavioral changes on the job:
• A systematic appraisal should be made of on-the-job performance on a before-and-after basis.
• The appraisal of performance should be made by one or more of the following groups (the more the better):
  – The person receiving the training
  – The person’s supervisor
  – The person’s subordinates (if any)
  – The person’s peers or other people thoroughly familiar with his or her performance
• A statistical analysis should be made to compare performance before and after and to relate changes to the training program.
• The post-training appraisal should be made three months or more after the training so the trainees have an opportunity to put into practice what they have learned. Subsequent appraisals may add to the validity of the study.
• A control group (not receiving the training) should be used.

Kirkpatrick states that “measuring changes in behavior resulting from training programs involves a very complicated procedure.” Nevertheless, it is worthwhile if training programs are to increase in effectiveness and their benefits are to be made clear to top management. He also recognizes that few training managers have the background, skill, and time to engage in extensive evaluations, and he suggests they call on specialists, researchers, and consultants for advice and help.

Level Four: Results (the Impact of Training on the Business)

Based on the premise that “the objectives of most training programs can be stated in terms of results such as reduced turnover, reduced costs, improved efficiency, reduction in grievances, increase in quality and quantity of production, or improved morale,” Kirkpatrick concludes, “it would be best to evaluate training programs directly in terms of results desired.” He recognizes that there are so many complicating factors that it is extremely difficult, if not impossible, to evaluate certain kinds of programs in terms of results. He recommends that training managers evaluate in terms of reaction, learning, and behavior first, and then consider tangible business results. He also cautions that due to the difficulty in the separation of variables (that is, how much of the improvement is due to training as compared to other factors), it is very difficult to measure results that can be attributed directly to a specific training program. From Kirkpatrick’s experience with level four evaluations, he concludes that it is probably better to use the personal interview rather than a questionnaire to measure results. Also, measures taken on a before-and-after basis can provide evidence (but not necessarily proof) that the business results are directly attributable to the training even though other factors might have been influential.
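The before-and-after comparison with a control group that Kirkpatrick recommends for levels two and three can be sketched as the difference between the mean gains of the two groups. This is only an illustrative calculation: the scores are made-up example data, the `mean_gain` and `training_effect` helpers are hypothetical, and a real evaluation would add a statistical significance test before drawing conclusions.

```python
from statistics import mean


def mean_gain(before, after):
    """Average per-person change between paired before and after scores."""
    if not before or len(before) != len(after):
        raise ValueError("before and after must be non-empty and paired")
    return mean([a - b for b, a in zip(before, after)])


def training_effect(trained_before, trained_after, control_before, control_after):
    """Mean gain of the trained group minus mean gain of the control group."""
    return mean_gain(trained_before, trained_after) - mean_gain(
        control_before, control_after
    )


# Made-up assessment scores (0 to 100) for three people per group.
trained_before, trained_after = [60, 50, 70], [80, 75, 85]
control_before, control_after = [62, 58, 66], [65, 60, 70]

effect = training_effect(trained_before, trained_after, control_before, control_after)
print(effect)  # 17
```

Subtracting the control group's gain is one simple way to reflect the caution about separating variables: improvement that also appears in the untrained group is not credited to the training.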

FACILITIES AND EQUIPMENT DESIGN AND MAINTENANCE

The design of the physical facilities, from their layout to the construction materials used, is a significant element in controlling opportunities for human failures. The primary users of manufacturing plant interfaces are the plant operators and maintenance personnel. These users have distinctly different tasks from each other, but all of them interact, in one way or another, with different aspects of the facilities and equipment. For example, a manufacturing operator may be responsible for taking an instrument reading (for example, the temperature of a production room), while an instrumentation technician is responsible for calibrating the same equipment. They both interact with the same equipment but in a different way.

In the fabrication of medicinal products, those aspects are even more relevant, as they can be directly related to the contamination of products, including the critical cross-contamination of one product with another that might even result in the death of patients. For example, the U.S. FDA GMP regulation for the manufacture of human finished drugs established under 21 CFR 211 [28] states that:

§211.42 Design and construction features.
(a) Any building or buildings used in the manufacture, processing, packing, or holding of a drug product shall be of suitable size, construction and location to facilitate cleaning, maintenance, and proper operations.
(b) Any such building shall have adequate space for the orderly placement of equipment and materials to prevent mixups between different components, drug product containers, closures, labeling, in-process materials, or drug products, and to prevent contamination. The flow of components, drug product containers, closures, labeling, in-process materials, and drug products through the building or buildings shall be designed to prevent contamination.

This regulation also states regarding maintenance that:

§211.58 Maintenance. Any building used in the manufacture, processing, packing, or holding of a drug product shall be maintained in a good state of repair.

Regarding the equipment used in a manufacturing plant, the same FDA regulation establishes the following:

§211.63 Equipment design, size, and location. Equipment used in the manufacture, processing, packing, or holding of a drug product shall be of appropriate design, adequate size, and suitably located to facilitate operations for its intended use and for its cleaning and maintenance.

§211.65 Equipment construction.
(a) Equipment shall be constructed so that surfaces that contact components, in-process materials, or drug products shall not be reactive, additive, or absorptive so as to alter the safety, identity, strength, quality, or purity of the drug product beyond the official or other established requirements.
(b) Any substances required for operation, such as lubricants or coolants, shall not come into contact with components, drug product containers, closures, in-process materials, or drug products so as to alter the safety, identity, strength, quality, or purity of the drug product beyond the official or other established requirements.

§211.67 Equipment cleaning and maintenance.
(a) Equipment and utensils shall be cleaned, maintained, and, as appropriate for the nature of the drug, sanitized and/or sterilized at appropriate intervals to prevent malfunctions or contamination that would alter the safety, identity, strength, quality, or purity of the drug product beyond the official or other established requirements.

Maintenance is a key activity in a manufacturing plant, and many tasks undertaken are critical to safety and/or quality and prone to error. Sometimes, maintenance is performed under a rush order or even an emergency state, which may involve even more pressure on the maintenance staff. Therefore, equipment should be designed to make the task easier and reduce the potential for human error.

The proportion of human performance problems associated with maintenance-related activities far exceeds that relating to other types of human performance. Maintenance errors have been among the principal causes of several major accidents in a wide range of fields. For example:
• The Apollo XIII oxygen tank blowout (1970)
• The loss of coolant near-disaster at the Three Mile Island nuclear power plant in Pennsylvania (1979)
• The Bhopal disaster in India (1984)
• The crash of a Japan Airlines B747 into the side of Mount Osutaka (1985)
• The explosion on the Piper Alpha oil and gas platform in the North Sea (1988)
• A blocked pitot tube contributing to the total loss of a B757 at Puerto Plata, in the Dominican Republic (1996)

Human factors issues for manufacturing equipment relate to how people interact with and use equipment. Process equipment includes displays, alarms, controls, and so on. The diversity of equipment used in production, their characteristics, and those of people using the equipment, result in a wide array of human factors design issues. Among them are:
• Suitability
• Simplicity
• Accessibility
• Identifiability
• Detectability
• Availability
• Logic and consistency
• Flexibility
• Conformity with user expectations

ISO published a multipart standard (ISO 9355) on ergonomic requirements for the design of displays and control actuators. ISO 9355 consists of the following parts, under the general title Ergonomic requirements for the design of displays and control actuators:
• Part 1: Human interactions with displays and control actuators
• Part 2: Displays
• Part 3: Control actuators
• Part 4: Location and arrangement of displays and controls

Note: Part 4 is at the draft international standard (DIS) status at the time of this publishing.

ISO 9355-1:1999 Ergonomic requirements for the design of displays and control actuators - Part 1: Human interactions with displays and control actuators applies to the design of displays and control actuators on machinery. It specifies general principles for human interaction with displays and control actuators, to minimize operator errors and to ensure an efficient interaction between the operator and the equipment. It is
particularly important to observe these principles when an operator error may lead to injury or damage to health. This standard was last reviewed and confirmed in 2021. ISO 9355-2:1999 Ergonomic requirements for the design of displays and control actuators—Part 2: Displays gives guidance on the selection, design, and location of displays to avoid potential ergonomic hazards associated with their use. It specifies ergonomics requirements and covers visual, audible, and tactile displays. ISO 9355 also applies to displays used in machinery (for example, devices and installations, control panels, operating and monitoring consoles) for occupational and private use. Specific ergonomics requirements for visual display terminals (VDTs) used for office tasks are given in the standard ISO 9241-11:2018. This standard was last reviewed and confirmed in 2021. ISO 9355-3:2006 Ergonomic requirements for the design of displays and control actuators—Part 3: Control actuators gives ergonomic requirements for, and guidance on, the selection, design, and location of control actuators adapted to the needs of the operator, suitable for the control task in question, and taking into account the circumstances of their use. It is applicable to manual control actuators used in equipment for both occupational and private use. This standard was last reviewed and confirmed in 2021. The process control system links the human operator to the process equipment. The designer must decide which functions are allocated to the process control system and which are allocated to the operator. In normal operation, the dominant issue for process control is designing the system to minimize the potential for human error. For example, if 400 kg have already been added to a 500 kg blender, the control system can reject an operator input calling for the addition of 200 kg of the component. 
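The blender example above amounts to a capacity check performed by the control system before it accepts an operator input. A minimal sketch follows, assuming a hypothetical `Blender` class that tracks the mass already loaded; an actual process control system would implement the same check in its own control logic.

```python
from dataclasses import dataclass


@dataclass
class Blender:
    """Hypothetical model of a blender with a fixed load capacity."""

    capacity_kg: float
    loaded_kg: float = 0.0

    def request_addition(self, amount_kg: float) -> bool:
        """Accept the addition only if it is positive and fits in the blender.

        A rejected request leaves the recorded load unchanged.
        """
        if amount_kg <= 0:
            return False
        if self.loaded_kg + amount_kg > self.capacity_kg:
            return False
        self.loaded_kg += amount_kg
        return True


blender = Blender(capacity_kg=500, loaded_kg=400)
print(blender.request_addition(200))  # False: 400 kg + 200 kg exceeds 500 kg
print(blender.request_addition(100))  # True: the blender is now full
```

Rejecting the input at the point of entry allocates the error check to the control system instead of relying on the operator to notice the overfill.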
The process control system must be designed to provide enough information for the operator to quickly diagnose the cause of a problem and respond to it. If operators use the overrides repeatedly because equipment failure remains uncorrected, an error will occur.

Maintenance errors fall into highly predictable and repetitive clusters. Therefore, these errors are relatively easy to identify and predict. In the maintenance environment, there are many situational and environmental factors that can affect human performance and increase the likelihood of error. Some of them are depicted in Figure 4.6.

• Task difficulty
• Poor environmental conditions
• Poorly designed documentation/procedures
• Lack of correct tools and equipment
• Lack of knowledge and experience
• Time pressure
• Fatigue

Figure 4.6 Human factors affecting maintenance.

Environmental Ergonomics

Environmental ergonomics focuses on the interaction between people and their physical environment, with specific emphasis on:
• Thermal comfort
• Lighting
• Noise
• Vibration

The key opportunity to provide effective solutions to these factors is during the design of the facilities and/or equipment. Poorly designed working environments may have a significant impact on the quality and safety of the work, including significant adverse effects on health and well-being. Poor design can have both psychological and physical effects and can lead to lower productivity, increased potential for human failures, and physical discomfort. Table 4.11 depicts typical environmental ergonomic factors (and possible causes) to be considered in a manufacturing plant setting.

Table 4.11 Environmental ergonomic factors and causes.

Factors | Possible causes
Thermal comfort | Extreme temperatures; poor air quality; humidity; dryness; heavy manual work in hot areas; sedentary work in cold areas; exposure to rain, ice, high winds
Lighting | Lighting not compatible with the task; glare and reflective sources
Noise | Noisy equipment; use of noisy portable tools; impact noise; surrounding external noises (road, train, and so on) without adequate soundproofing
Vibration | Use of handheld equipment that creates vibration; rough driving due to surface or vehicle condition

The following example demonstrates the impact of adverse environmental conditions on the quality of processes. A quality inspector started to miss defective units when he was moved to control a new production line. It was located in a new building where everyone was complaining that the temperature was very cold. The reason for the inspector’s failures was that the cold environment forced him to visit the bathroom more frequently to urinate. This is a physiologic mechanism of defense known as cold diuresis, diuresis being the production of more dilute urine. The idea is that when it’s cold, your body tries to conserve heat by constricting the blood vessels in the skin. When you constrict the blood vessels in the skin, more blood accumulates in the interior of the body, and that tends to raise blood pressure. In response to the raised pressure, the levels of the antidiuretic hormone will fall, provoking the kidney to produce more dilute urine, which then translates to an increase in urine production.

EXAMPLES OF HUMAN FACTORS IN MANUFACTURING OPERATIONS

Human error is much more likely than equipment failure. Instead of trying to adapt the human to the facility, it is essential to design plants that meet the capabilities of the human. The design of control systems in process and manufacturing plants directly impacts the likelihood of human errors by operators. Consequently, the development of an optimized interface between the processes and equipment being controlled and the operators is vital for ensuring safety and operability to avoid adverse impacts on people, the process, or the company. There are various important aspects of the control system, including equipment design, process control design, the
human/computer interface, operating procedures and the documentation used, and operator competence.

People control processes by interacting with equipment. Process equipment includes displays, alarms, controls, computers, manual equipment, and personal protective equipment. Human factors issues for process equipment relate to how people interact with and use the equipment, and the characteristics of the equipment that may increase the likelihood of human failures when people use it. This entails studying the match between the attributes of the people and those of the equipment involved in the interactions.

The human-computer interface deals with how people interact with computer systems with the objective of ensuring that computer system designs are functional, easily operable, efficient, and safe. Many plants today utilize computer systems to control processes. Human failures in interacting with control systems can result in loss of control and serious accidents and/or quality incidents.

A variety of documentation is used, including manuals, guidelines, checklists, data sheets, logs, records, work orders, and so on to develop procedures that account for human factors issues. Documentation design can have a major impact on process safety and operability. Procedures that are not followed, guidelines that are not used, diagrams that are misleading, and records that are not completed properly can all increase the likelihood of errors.

The ability of personnel to perform tasks according to expectations, or the competence of personnel, is fundamental to every organization because of the role it plays in ensuring that tasks are carried out satisfactorily. Table 4.12 (modified from the Center for Chemical Process Safety, 2007) depicts a comparison between human and machine capabilities.

Table 4.12 Comparison between human and machine capabilities.

Humans excel in:
• Detection of certain forms of very low energy
• Sensitivity to an extremely wide variety of stimuli
• Perceiving patterns and making generalizations about them
• Ability to store large amounts of information for long periods, and recalling relevant facts at appropriate moments
• Ability to exercise judgment where events cannot be completely defined
• Improvising and adopting flexible procedures
• Ability to react to unexpected low-probability events
• Applying originality in solving problems (alternative solutions)
• Ability to profit from experience and alter the course of action
• Ability to perform fine manipulations, especially where misalignment appears unexpected
• Ability to continue to perform when overloaded
• Ability to reason inductively

Machines excel in:
• Monitoring (both people and machines)
• Performing routine, repetitive, or very precise operations
• Responding very quickly to control signals
• Storing and recalling large amounts of information in short time periods
• Performing complex and rapid computations with high accuracy
• Sensitivity to stimuli beyond the range of human sensitivity (infrared, radio waves)
• Doing many things at one time
• Exerting large amounts of force smoothly and precisely
• Insensitivity to extraneous factors
• Ability to repeat operations very rapidly, continuously, and precisely the same way over a long period
• Operating in environments that are hostile and beyond human tolerance
• Deductive process

In the United States, the FDA includes human factors as a critical element to consider during the medical device design process. The FDA’s human factors pre-market evaluation team ensures that new medical devices have been designed to be reasonably safe and effective when used by the intended user populations. The effort primarily involves reviewing new device submissions, and promoting effective and focused human factors evaluation and good design practices for medical devices.

Human factors/usability engineering focuses on the interactions between people and devices. The critical element in these interactions is the device user interface. To understand the human–machine system, it’s important to understand the ways that people:

• Perceive information from the device
• Interpret the information and make decisions about what to do
• Manipulate the device, its components, and/or its controls

It’s also important to understand the ways that devices react to input from the user, and then provide feedback to the user about the effects of their actions.


Human factors/usability engineering is used to design the machine– human (device–user) interface. The user interface includes all components with which users interact while preparing the device for use (for example, unpacking, setup, calibration), using the device, or performing maintenance (for example, cleaning, replacing a battery, making repairs). For medical devices, the most important goal of the human factors/usability engineering process is to minimize use-related hazards and risks, and then confirm that these efforts were successful and that users can use the device safely and effectively.

5 How Organizations Deal with Human Errors

PERSONAL ACCOUNTABILITY

An increasing number of manufacturing companies, especially in highly regulated sectors, are establishing specific procedures to deal with the plague of human errors. Some divert the human error investigation to the human resources department. Others develop a checklist to search for specific human factors that could be considered precursors to the incident under investigation. The main point here is not who is in charge of the investigation, but rather what tools and knowledge they have to perform a good and effective investigation. Most regulated companies still do not “get” the human factors message and focus their investigations solely on the carelessness of their associates.

Manufacturing errors are usually costly in the medical product industry, especially when the products involved reach the customers and must be recalled. A recent case of a pill coated with the incorrect color resulted in the dismissal of the entire crew involved with the manufacture of the batch.

An FDA inspector discovered some backdated information. In response to the finding, the company stated that “the employees involved will be retrained and warned that a future recurrence will have zero tolerance resulting in severe action, including possible immediate termination.” Several questions arise: Why did the associates backdate the information? Did management control exist to prevent or even detect this behavior? The warning letter indicates that this was a repeat observation following two previous inspections. We may conclude that many regulated companies are not dealing appropriately with human error, and some of them are not dealing with it at all.


Advancements in human failures management are based on two main elements:

1. Taking a systems perspective. Errors are not caused by the failure of individuals; they emerge from the alignment of multiple contributory system factors, each necessary and only jointly sufficient. The source of the error is the system, not its component parts.
2. Moving beyond blame. Blame focuses on the supposed defects of individual workers and denies the impact of systemic contributions. Moreover, blame has all kinds of negative side effects. It automatically leads to defensive posturing, obfuscation of information, protectionism, polarization, and mute reporting systems.

Learning from failures is a key element of improvement. This makes punishment and learning almost mutually exclusive activities. Organizations can either learn from errors or punish the individuals involved in them, but hardly do both at the same time. Punishment, rather than helping people avoid or better manage conditions that are conducive to error, actually conditions people not to get caught when an error does occur. Also, punishment is about moving beyond and away from the adverse situation. Learning, on the other hand, is about continuous improvement, about closely integrating the event into what the system knows about itself.

The concept of accountability or responsibility in the sense of “make everyone accountable (responsible) for their own actions” has profound philosophical roots whose discussion is beyond the scope of this book.

Data integrity issues deserve special mention and attention. At the time of this writing, good documentation practices and data integrity are among the top priorities of regulatory agencies all over the world. To defend themselves after an inspector finds lapses in good documentation practices or data integrity, companies continue to justify those situations as mere human errors and emphasize the lack of intentionality. Here is the opinion of the FDA, represented in a 2015 warning letter:

In correspondence with the Agency, you indicate that no malicious data integrity patterns and practices were found. Also, you state that no intentional activity to disguise, misrepresent, or replace failing data with passing data was identified and no evidence of file deletion or manipulation was found. Your response and comments focus primarily on the issue of intent, and do not adequately address the seriousness of the CGMP violations found during the inspection.


I’d like to end this short chapter by reinforcing the expectations that the FDA has established for regulated companies when they need to deal with data integrity situations. Following is an example taken from a warning letter given to a drug manufacturer:

We highly recommend that you hire a third-party auditor, with experience in detecting data integrity problems, to assist you with this evaluation and to assist with your overall compliance with CGMP. It is your responsibility to ensure that data generated during operations is accurate and that the results reported are a true representation of the quality of your drug products. In response to this letter, provide a list of all the batches of drug products shipped to the U.S. market that relied upon missing, inaccurate, or unreliable test data. Your data integrity consultant should:

1. Identify any historical period(s) during which inaccurate data reporting occurred at your facilities.
2. Identify and interview your current employees who were employed prior to, during, or immediately after the relevant period(s) to identify activities, systems, procedures, and management behaviors that may have resulted in or contributed to inaccurate data reporting.
3. Identify former employees who departed prior to, during, or after the relevant period(s) and make diligent efforts to interview them to determine whether they possess any relevant information regarding any inaccurate data reporting.
4. Determine whether other evidence supports the information gathered during the interviews, and determine whether additional facilities were involved in or affected by inaccurate data reporting.
5. Use organizational charts and SOPs to identify the specific managers in place when the inaccurate data reporting was occurring and determine the extent of top and middle management involvement in or awareness of data manipulation.
6. Determine whether any individual managers identified in item 5 above are still in a position to influence data integrity with respect to CGMP requirements or the submission of applications; and establish procedures to expand your internal review to any other facilities determined to be involved in or affected by the inaccurate data reporting.

Using Consequences to Increase or Decrease Behaviors

Consequences are important because they increase or decrease the likelihood of behavior occurring again. Behavioral consequences are those things and events that follow a behavior and change the probability that the behavior will be repeated in the future. Consequences must be used to shape and reinforce adequate positive behaviors toward quality and compliance.

The word consequence is often interpreted to have a negative connotation, but consequences can be negative, and they can also be positive. Remember that quality performance is the result of people’s behavior. If a company is not seeing improvements in quality, then one contributing factor may be that it is not effectively using consequences to manage performance.

From the point of view of behavioral sciences, positive reinforcement and negative reinforcement are two behavioral consequences that increase the probability of a behavior occurring again. On the other hand, punishment and penalty are two behavioral consequences that decrease behavior. We can simplify the discussion by calling them either positive or negative consequences. Positive consequences are consequences that increase the likelihood of the behavior occurring again. Negative consequences generally decrease the likelihood of the behavior or, alternatively, are viewed as useful in trying to sustain certain desired behaviors out of fear of receiving a negative consequence. For example, a person driving might choose to follow the speed limit (desired behavior) and not speed (undesired behavior) out of concern about receiving a speeding ticket (negative consequence).

Management-created consequences do not occur naturally. They only happen when a manager causes them to. They require consistent management observation, commitment, and follow-through. For example, if a manager gives employees a small reward and thanks them when they demonstrate a certain behavior, this is an intentionally created positive consequence.

Our industry has been focused historically on creating negative consequences for substandard quality performance. From internal audits to management reviews, violations of CGMP are escalated to senior management. Regulatory authorities may threaten establishments with warning letters and other regulatory enforcement actions. Fears of punishment and penalties have been some of the primary tools used by regulatory agencies to enhance compliance. However, if we place an overreliance on negative consequences, it demonstrates that we really do not understand how to utilize consequences to drive enhanced performance. A work environment driven by the fear of negative consequences is not a very nice work environment.

Although negative consequences certainly have their place in managing quality and compliance performance, they are not the only consequences that should be used. An adequate balance between positive and negative consequences, sometimes referred to as positive and negative reinforcement, with emphasis on positive ones, generally leads to enhanced performance and results. Companies should consider two basic questions:

1. What should the company positively reinforce?
2. What are the types of positive consequences or reinforcements the company should consider?

Specific desired behaviors should be the target of frequent recognition in order to reinforce them. This type of recognition is viewed as on-the-spot, individualized, and informal reinforcement. It includes a simple verbal “thank you.” Most employees sincerely appreciate verbal recognition by their managers or leaders. Other types of positive reinforcement can range from more formal tokens of appreciation, such as quality cards, to small gift certificates worth monetary value. You want to reach the right balance between creating positive consequences and recognition for a job well done or going the extra mile versus simply recognizing employees for what they are expected to do. Sites and departments that are showing significant improvements, meeting established targets or goals, or whose performance is considered best in class for the company should be recognized.

As summarized by Michael LeBoeuf in his book The Greatest Management Principle in the World (1985), managers do not get what they hope for, train for, beg for, or even demand. Managers get what they recognize and reward through positive consequences.
We have reviewed how management-created positive consequences can be used to increase the likelihood of desired behaviors reoccurring. However, sometimes the behavior that is occurring is an undesired one, and we want it to stop. In these situations, negative consequences can be used to decrease the likelihood of an undesired behavior occurring again.

Although negative consequences may be effective in influencing short-term behavioral change, there are doubts about their ability to produce real, long-term behavioral change. When management has a pep talk with employees about recent mishaps and warns them that the next employee who has such a mishap will be terminated, there will certainly be an immediate, but temporary, change. But if the real root causes of those mishaps are not corrected, they will occur again. On the other hand, the intentional violation of procedures (think about data integrity problems) must be integrated into the disciplinary management process already established at the company.

In summary, remember that behavioral change is complex and that consequences are only one small component of a comprehensive behavior-based quality and compliance culture.

Commitment to Resilience: Learn from Errors

Resilience is a combination of keeping errors small, improvising workarounds that keep the system functioning, and absorbing change while persisting. It is concerned with the ability of organizations to not only effectively anticipate errors but also to cope with and bounce back from errors and unexpected events.

One of the most common descriptions of the high-reliability organization (HRO) is that it is “resilient.” Here is how the Oxford English Dictionary defines resilience:

1. The capacity to recover quickly from difficulties, toughness

2. The ability of a substance to spring back into shape; elasticity

The definition points directly to two important characteristics of organizational resilience. First, organizations show resilience in response to a difficulty or deformity. Resilience is reactive, not predictive. Thus, it is not the kind of capacity that is based on a careful analysis of potential faults, with mitigating solutions prepositioned to cope. In fact, the resilient organization will generate solutions to unexpected problems on the fly.

The second feature is that when an unexpected problem occurs, the elastic (resilient) organization will continue to function normally. It continues to produce desired outcomes despite the problem (and internalizes the solution so a future response to the problem is even faster). Weick and Sutcliffe (2007) summarize the resilient organization very clearly:

In moments of resilience, conditions vary yet the effect remains the same. That difference lies at the heart of a commitment to resilience.

The “commitment to resilience” implies that the company’s leadership and culture have the proper attitude toward unexpected conditions or failures. It emphasizes the central point that high-reliability organizations are not organizations that do not experience failure. Rather, they continue to generate the main outcomes of their mission despite failures. To adapt to something unexpected, the people in the company are ready to recognize the event for what it is, avoid complacent assumptions, and refuse to oversimplify or routinize the problem before an effective solution is identified. This is a capacity that organizations with a commitment to resilience will develop over time.

Employees in resilient organizations will create innovative responses to failures as needed, almost improvising in real time. However, they are not working in an unstructured system when they do this. They need to have both expertise regarding the portion of the organization affected by an event and the confidence to act as developed by prior empowering support from leaders and managers.

The best area to demonstrate resilience is the handling of human errors and mistakes in our operations, with the ultimate objective of reducing them to as low a level as humanly possible. To be able to do this, first we need to:

1. Clearly understand the human error concept
2. Identify human factors affecting the performance of processes
3. Establish effective barriers and corrective and preventive actions related to those human factors

When an operator does not properly execute a manufacturing step, we immediately label it as human error. When we investigate the situation, inadequate training and supervision, lack of clarity in the work instruction, and multitasking can be factors behind the operator’s mistake. Human errors and mistakes are the symptoms of causal (human) factors associated with root causes that we must discover prior to solving them.

To conclude this chapter, my personal reflection is that most manufacturing companies are not adequately dealing with human failures. Here are some areas in urgent need of improvement for the vast majority of regulated companies:

• Distinguishing between human errors (unintentional) and violations (intentional)
• Enhancing the process used during human failure investigations
• Making people (at all levels) accountable

6 Investigating Human Errors

After any incident involving human failure, there may be an investigation into the causes and contributing factors. Very often, little attempt is made to understand why the human failures occurred, and the tendency is to blame someone for the error. The obvious but unfortunate solution, most of the time, is to retrain, counsel, discipline, or dismiss that employee. However, finding the human factors that cause these incidents is the key to preventing similar reoccurrences through the design of effective control measures.

We must realize that practically all incidents attributed to human error are symptoms of a breakdown in management systems. We must investigate “human error” as we do any other nonconformance or deviation within our quality system. We have a symptom, and we must discover its causal factors (human factors in this case) in order to identify the root cause. The main point to consider is that we are finger-pointing at some of our associates as being responsible for the situation, and we must allow them to express their opinions.

Interviewing the workers involved is the most important method used to investigate human errors. When interviewing the personnel involved in the event, we are not just asking questions; we are trying to discover why this person did not follow the procedure or why this person made a decision that later created the problem.

The main objective of the investigation is to gather all relevant facts. Later in this chapter there is a diagnostic tool composed of 50 questions that can be used as guidance during the investigation of human errors. The purpose of these questions is to obtain a better understanding of the human factors surrounding the issue under investigation. Answers to the questions will provide valuable information regarding the following elements of the investigation:

• Identifying contributing root causes
• Identifying situational factors
• Identifying latent factors
• Identifying absent or insufficient control barriers

THE INVESTIGATION FRAMEWORK

The aim of any human error investigation is to understand what caused the operator to make an error. To understand what made the operator perform erroneously, it is critical to understand the context in which the mistake occurred. Why did control barriers not prevent the error? Which latent failures contributed to the error? The steps of an adequate human error investigation are shown in the following list and described in subsequent sections:

1. Prepare the investigation plan.
2. Establish the chronology.
3. Conduct on-site investigation/perform interviews.
4. Identify contributing root causes.
5. Identify situational factors.
6. Identify latent factors.
7. Identify absent or insufficient control barriers.
8. Prepare a CAPA plan covering the findings of the investigation.

Prepare the Investigation Plan

The investigation plan is probably the most important element of your investigation. An investigation is:

• A systematic process of collecting relevant evidence, followed by
• An assessment of the evidence gathered, followed by
• A logical and reasonable determination or conclusion


To be meaningful, the investigation should be thorough, timely, unbiased, well-documented, and scientifically sound. The investigator (alone or as a member of an investigating team) is responsible for gathering all the relevant evidence or information and then using this to find the facts. But an investigation is not a trial. You are not a prosecutor or plaintiff but an impartial fact gatherer. You have a duty both to collect the information and to assess it. At the end of the process, you must report your findings in an independent and objective way. Planning is essential to ensure that:

• The investigation is carried out methodically and professionally
• Resources are used efficiently
• The focus is maintained; you are not going back and forth or employing trial and error as you look for causes
• Additional resources (for example, subject matter experts) can be made available if required
• Potential factors and root causes are not overlooked

The main (and best) planning tool available to an investigator is an investigation plan. There are a number of ways in which you may develop an investigation plan. While it is important that you start with a plan, investigations rarely proceed as originally predicted. You should therefore be ready to revise your plan, perhaps drastically, as new information emerges during the course of an investigation. Always follow the facts, rather than trying to make the facts fit into your plan.

An investigation plan will define what you do, why you do it, and when you do it. It should include at a minimum the following elements:

1. A clear statement of the reason for the investigation
2. A summary of the factors (potential causes) of the process that may have caused/contributed to the problem
3. The type of documentation review that will be performed, including the time frame for such historical data evaluation
4. The interviews that will be performed

Establish the Chronology

We recommend the use of a chronology, or timeline. It can be defined as an arrangement of events in their order of occurrence. It must include a detailed description of events leading to the problem as well as those actions taken in reaction to the problem. Time is perhaps the most important element of any investigation because causal factors and root causes act at a specific, determinate moment. Even if they seem to appear randomly, it is important to consider all pieces of information. The objective of this analysis is to determine when the problem began. Ordering the facts of an investigation by time has two main purposes. It helps you:

1. Understand the problem
2. Write the investigation report

During our workshops, participants work with case studies containing dozens of facts and dates. When completing the chronology analysis, most of them ignore at least half of those dates. Ordering the facts by time is often the only tool you need to discover the key path to the root cause.
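For investigators who keep their facts in electronic form, the ordering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` record, the `build_chronology` function, and the four sample facts are hypothetical and do not come from any case in this book.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime      # when the event occurred, not when it was recorded
    description: str    # what happened
    source: str         # where the fact came from (record, interview, log)


def build_chronology(events):
    """Return the events in their order of occurrence.

    sorted() is stable, so events sharing a timestamp keep their input order.
    """
    return sorted(events, key=lambda event: event.when)


# Hypothetical facts, listed in the order an investigator might collect them.
facts = [
    Event(datetime(2023, 3, 14, 10, 40), "Out-of-limit result reported", "laboratory record"),
    Event(datetime(2023, 3, 14, 6, 0), "Shift change; new operator takes over", "interview"),
    Event(datetime(2023, 3, 13, 16, 30), "Work instruction revised", "document control log"),
    Event(datetime(2023, 3, 14, 8, 15), "Manufacturing step executed", "batch record"),
]

for event in build_chronology(facts):
    print(event.when.strftime("%Y-%m-%d %H:%M"), "|", event.description, "|", event.source)
```

Once the facts are sorted, the hypothetical revision of the work instruction appears before the shift change and the executed step, a sequence that is easy to miss when the facts are read in the order they were collected.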

Conduct On-Site Investigation/Perform Interviews

As previously mentioned, a crucial element of a human error investigation is the gathering of information through interviews of those directly and indirectly involved in the situation. Interviews are necessary to understanding the context in which the error took place. When interviewing the personnel involved in the event, we are not just asking questions, we are trying to:

• Identify contributing root causes
• Identify situational factors
• Identify latent factors
• Identify absent or insufficient control barriers

You Are Interviewing, Not Interrogating

The conversation formats usually identified with a quality audit or nonconformance investigation range from an informal conversation with witnesses to a tense confrontation with the person believed responsible for the questionable action. The division between the two extremes is not a clear-cut line of separation. The more distant people are from an incident, the less concerned they are about repercussions; the closer they are to the incident, the more stressed or concerned they become in talking about the situation. The concern can be due to knowledge of the action, actual involvement in the incident, or fear of being held responsible for the actions of others.


The objective of an effective interview is to gain knowledge and information that are pertinent to the investigation. The characteristics of an effective interview include the following:

• It is in a nonthreatening format and the tone is nonaccusatory.
• It takes a relatively short time to complete (15 minutes to one hour).
• It asks open-ended questions.
• It is followed by a formal written report of the conversation.

Prepare Yourself before Initiating the Interview

An experienced interviewer understands the correct questions to ask and grasps the answers that flow from the conversation. This particular skill is enhanced by advanced study and knowledge of the topic or process being discussed. If the investigator is ignorant of the process or the specific problem, the employee can take advantage of this lack of knowledge. The assimilation of sufficient background information will allow the interview team to quickly recognize inaccurate or inconsistent answers to pertinent questions. Knowledge and competence allow the investigator to enter the interview process with more confidence and self-assurance. Many times, the person being interviewed will phrase answers based on a perception of the knowledge possessed by the investigator.

Opening the Interview

The meeting is expected to begin on a friendly, yet professional, tone with no defensiveness or hostility anticipated from either party. This sense of cooperation begins with the interviewer. Arrogance, aggressiveness, and an air of superiority interfere with an investigator’s ability to solicit answers and assistance from the other party of the conversation. The interviewer should expect that some nervousness, even resentment, will be evident during an interview concerning an incident that may have cost a lot of money and even resulted in a regulatory action such as a product recall. This attitude generally subsides within a short time, and both parties can then get on with the task of resolving the problem.

An initial exchange with nonspecific, generic conversation allows time for both parties to adjust to the dynamics of the beginning of the interview. The investigator should inform the interviewee of the identities of the people conducting the interview and the general nature of the investigation. Inflammatory words such as “fraud” or “violation” should be replaced by “review,” “analyze,” “examine,” “unusual things,” and other phrases that generate less tension.


Control the Interview Process

While it is important to begin the interview on a friendly, nonthreatening basis, it is just as important to maintain firm control over the interview process. The investigator is in charge and has a mission to accomplish: to resolve the topic of the investigation. The interview should be conducted in a business place with a minimum of distractions or potential interruptions. The conversation should be limited to the subject matter at hand. The investigator can come prepared with a list of questions that will serve as a reminder of the topics to be discussed and specific points to be covered. The list should not become a barrier that inhibits a smooth flow of dialogue. The next section includes a diagnostic tool that can be used during the interview to help in the process of gathering facts and data.

Allow Sufficient Time to Answer the Question

Many times, the inexperienced investigator will ask a question and immediately begin to provide the answer before the other person has an opportunity to speak. The goal is to listen to the response, not to articulate a response. Information is gained by asking a question and listening carefully to the response. Silence can be a strong motivator for the flow of information. Once the question is asked, wait for the answer. Silence is an obvious cue that you are waiting for a response. Allow the person being interviewed to fill the silence with information. The more the person talks, the more information you will gain from the process. Additional conversation allows you to evaluate the completeness and accuracy of the responses and detect indicators of inconsistency.

Be Alert to Nonverbal Communication

Most people have a tendency to subconsciously react when giving intentionally fabricated answers to questions. Reactions may be subtle but easily recognized by an experienced investigator. Investigators should be alert to the following nonverbal communication clues that may be an indication of deception:

• Excessive grooming or fidgeting during the interview, especially during key questions
• Avoidance of eye contact during pertinent questions
• Preoccupation with other items on the desk or in the room
• Excessive nervousness, heavy breathing, or fast heartbeat


The effective investigator should not place too much reliance on any one factor but should view the overall communication patterns displayed by the individual. A single gesture or motion should not be taken out of context but rather viewed as a part of an individual’s total reaction pattern. The skilled investigator must observe and analyze such points as eye movement, body gestures, and even posture in an effort to capture the full meaning of the verbal responses given to the questions.

Ending the Interview

Investigators should paraphrase or repeat key points that came from the dialogue. This ensures that the facts are understood and that nothing has been misinterpreted. Hostility should be avoided, and an effort should be made to ensure that the interview ends on a positive note. In closing, the investigator should mention the possibility of a follow-up contact. This might prompt an individual to “recall an important fact.” An effective technique is to reinterview a critical witness in an effort to clarify a point; this gives the person a second chance to disclose significant information that is unknown to the investigator. Many witnesses are reluctant to discuss what they really know or strongly suspect. Asking “What do you think?” allows the person being interviewed to respond. Finally, it’s essential to document the result of the interview as soon as possible and give a copy to the person interviewed for review.

Identify Contributing Root Causes

A contributing cause represents the most obvious reason for the active failure. Often, more than one contributing cause may be present. Many typical root causes behind human failures are described in Chapter 7.

Many of the chronic and persistent problems we face within the regulated industry are not the direct result of a single root cause. Many times, they result from a combination of causes or, even worse, from the interaction of causes. This is one of the basic reasons why the classic trial-and-error methodology of problem-solving does not work most of the time. In other, less regulated environments, it could be acceptable just to fix the problem. In the regulated field, we must effectively fix root causes, verify that the actions already taken have worked, and generate a documentation trail covering all the phases of the problem-solving exercise.

Identify Situational Factors
Situational factors can be defined as unforeseen, unexpected circumstances that play a significant role in the human error. Often, these unanticipated circumstances can explain why the process went wrong this one time even though the process was carried out in the usual way. Another way of explaining it is that situational factors release the risk represented by latent failures. Examples of situational factors might be an unexpected telephone call, a car accident on your way to work, a financial problem, and so on. Under any of these circumstances, the person can be more prone to make an error.

Identify Latent Factors
Reason (1990) distinguishes between active and latent failures. Active failures have an immediate consequence and are usually made by frontline workers such as drivers, control room staff, or machine operators. In a situation where there is no room for error, active failures have an immediate impact on quality or health and safety. Latent failures, on the other hand, are made by people whose tasks are removed in time and space from operational activities (designers, decision-makers, and managers). Latent failures are typically failures in management systems (design, implementation, or monitoring). Examples of human factors behind latent failures are:
• Poor design of plant and equipment
• Inadequate procedures and work instructions
• Ineffective training
• Inadequate supervision
• Inadequate staff and resources
• Ineffective communications
• Uncertainties in roles and responsibilities
In our regulated environment, most active failures trace back to some precondition (latent failure). We need a good tracking and trending analysis system to be able to discover what in many cases is a true cause-and-effect relationship. Chapter 2 included several examples of latent failures.
Returning to Reason’s Swiss cheese model, every step in a process has the potential for failure. The system is analogous to a stack of Swiss cheese slices. Each hole is an opportunity for a process to fail, and each slice is a “defensive layer” in the process against a potential error impacting the results. An error may allow a problem to pass through a hole in one layer, but in the next layer the holes are in different places, so the problem should be caught there.


For a catastrophic error to occur (a plane crash or the distribution of a pharmaceutical product with incorrect label information), the holes must align for each step in the process. This allows all defenses to be defeated and results in an error. If the layers are set up with all the holes aligned, it becomes an inherently flawed system that will allow an error to become a final product defect. Each slice of cheese is an opportunity to stop an error. The more defenses we put up, the better. The fewer the holes and the smaller the holes, the more likely you are to notice errors that do occur. It is important to note that the presence of holes in any one slice of our quality system may cause a final defect problem. Usually, this happens only when the holes in many slices momentarily line up to permit the result of the error to escape our controls. Reason establishes that those holes in the defenses arise for two reasons: active failures and latent preexisting conditions, and that nearly all adverse events result from a combination of these two sets of factors. If latent failures are not identified and eliminated, reoccurrence is likely. Sometimes, fixing latent failures is not realistic or affordable. In those situations, an effective alternative can be to circumvent the latent failure by designing a strong barrier control.
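The arithmetic behind stacking defenses can be sketched in a few lines of Python. This is a simplified illustration, not something taken from Reason's work: it assumes that each layer misses an error independently of the others, and the miss rates and the function name `escape_probability` are invented for the example.

```python
from math import prod


def escape_probability(miss_rates):
    """Chance that an error slips through every layer.

    Assumes the layers fail independently of one another, which is a
    simplification; real defenses often share weaknesses.
    """
    for rate in miss_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each miss rate must be between 0 and 1")
    return prod(miss_rates)


# Hypothetical miss rates: each layer fails to catch 10% of errors.
two_layers = escape_probability([0.1, 0.1])
four_layers = escape_probability([0.1, 0.1, 0.1, 0.1])

# Holes fully aligned: every layer misses every error.
aligned = escape_probability([1.0, 1.0, 1.0, 1.0])

print(two_layers, four_layers, aligned)
```

With these made-up rates, two layers let through roughly 1 error in 100 and four layers roughly 1 in 10,000, while fully aligned holes let every error through. When layers share the same weakness, the independence assumption no longer holds and the real escape probability is higher than the product suggests.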

Identify Absent or Insufficient Control Barriers
Control barrier analysis is the evaluation of current process controls to determine whether all the current barriers pertaining to the problem you are investigating were present and effective (whether they worked or not). The origin of this concept relates to the safety field, where the term “barrier” is used to mean any barrier, defense, or control that is in place to increase the safety of a system. Barrier analysis can be used both proactively, for example, performing a risk assessment using failure mode and effects analysis (FMEA), or retrospectively, for example, to perform a post-mortem incident analysis using fault tree analysis (FTA). There are two main types of barriers: physical and administrative. Physical barriers are the most reliable in terms of providing fail-safe solutions to problems. Administrative barriers are considered to be the least reliable barriers in terms of fail-safe because they rely on human action and behavior. Examples of each type of control barrier are included in Table 6.1. It is important to understand that for most “typical” human errors, barriers put in place to minimize the impact of the potential errors are the best practical option we have to make our process more robust. Answering why we did not detect the problem earlier is as important as answering why it happened.


Table 6.1 Barrier controls.

Physical and natural barriers
• Separation between manufacturing or packaging lines
• Emergency power supply
• Dedicated equipment
• Bar coding
• Keypad controlled doors
• Separate storage for components
• Software that prevents going further if a field is not completed
• Redundant designs

Administrative barriers
• Training and certifications
• Clear procedures and policies
• Adequate supervision
• Adequate workload
• Use of a checklist
• Verification of critical task by a second person
• Periodic process audits

Table 6.2 depicts an example of a typical situation where a medical product reached the market with incorrect information on its label although as many as four barrier controls were in place. Why did they fail? An egregious example of this situation was a product recalled because its label stated an expiration date of 07/2109 instead of 07/2019.

Table 6.2 Barrier controls analysis example.

Problem: Incorrect expiration date on label

Current controls | Current controls evaluation: why they failed
Line clearance | Item not specifically included in the line clearance
In-process manufacturing inspection | No formal checklist for inspection
Final manufacturing inspection | No formal checklist for final manufacturing inspection
Final quality product release | No formal checklist for final quality inspection
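A software barrier of the kind listed in Table 6.1 is one way to stop a label such as 07/2109 before it is printed. The sketch below is a hypothetical plausibility check, not a description of any real labeling system: the function name, the manufacturing date, and the `MAX_SHELF_LIFE_YEARS` limit are all assumptions made for illustration, and a real limit would come from the product specification.

```python
from datetime import date

# Hypothetical upper bound on shelf life; an assumption for this sketch.
MAX_SHELF_LIFE_YEARS = 10


def expiration_is_plausible(manufactured, exp_year, exp_month,
                            max_years=MAX_SHELF_LIFE_YEARS):
    """Return True only if the expiration month falls after the
    manufacturing month and within the allowed shelf life."""
    if not 1 <= exp_month <= 12:
        return False
    months_ahead = (exp_year - manufactured.year) * 12 + (exp_month - manufactured.month)
    return 0 < months_ahead <= max_years * 12


# Illustrative manufacturing date.
made = date(2018, 2, 4)
intended = expiration_is_plausible(made, 2019, 7)    # 07/2019
transposed = expiration_is_plausible(made, 2109, 7)  # 07/2109

print(intended, transposed)
```

A check like this does not depend on anyone remembering to look at the date, which is why physical and software barriers tend to be more reliable than the checklist-based inspections in Table 6.2.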


Prepare a CAPA Plan Covering the Findings of the Investigation
Identification of root causes, situational and latent factors, and inadequate or insufficient control barriers is worthless if no adequate actions are taken to fix them. Once we arrive at the probable causes behind our problems, it is time to develop an effective plan to avoid the recurrence of those causes. These plans must cover the following four sequential elements:
1. Identify corrective and preventive actions
2. Verify and/or validate corrective and preventive actions prior to implementation
3. Implement corrective and preventive actions
4. Evaluate effectiveness
The elaboration of an adequate CAPA plan requires time. Most companies do not recognize this, which is one of the main reasons why most CAPA systems are ineffective. One or two weeks seems to be an adequate period in which to decide on the most effective way to address the identified root causes. During this time the investigator can evaluate where else the actions can be applied. If the corrective action is to clarify document X, a common but inadequate preventive action would be to evaluate whether other documents must be clarified. The correct approach is to perform such an evaluation during the one- or two-week period described above and then write the preventive action as “clarify documents Y and Z,” which were found to have the same kind of problem. Analyze, evaluate, assess, and so on are not adequate corrective or preventive actions.1

Establish Effective Corrective and Preventive Actions
For each root cause identified, we must generate an adequate corrective and/or preventive action. The key point is that the investigator must be sure that every identified root cause is covered in the CAPA plan. Many times, there are several root causes, but corrective actions address only one of
them. Another gray area of responsibility has to do with who should prepare the CAPA plan. The CAPA plan encompasses the identification of corrective and/or preventive actions, their verification and/or validation (prior to implementation), their implementation, and finally the evaluation of the plan’s effectiveness. Usually, the person best positioned to prepare a good CAPA plan is the owner of the process or system that needs to be fixed. The CAPA plan must be a team effort including subject matter experts as well as the investigator or someone from the investigation team.
Each corrective or preventive action included in the CAPA plan must include a detailed description of every single action to be taken and an explanation of how the action will help avoid the reoccurrence (or occurrence, if working with a preventive action) of the identified root cause. The plan must also include a description of how the action will be validated or verified, as well as details about its implementation (when and by whom). If the implementation is not immediate (something common in our industry), some interim actions must be included to minimize the risk of reoccurrence while the corrective action is implemented.
Finally, for each corrective action, we must always consider whether that action can be extended to other products, processes, and/or systems not yet affected by the identified root cause. If the answer is affirmative, a preventive action must be created to prevent the same cause from acting elsewhere. If you have root causes that already occurred, you must have corrective actions. If you can extend the corrective actions to other places, then you will also have some preventive actions. On the other hand, if you only have potential root causes, you cannot have corrective actions; therefore, only preventive actions can be implemented.
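Two of the checks described above lend themselves to a simple automated review: confirming that every identified root cause is covered by at least one action, and flagging actions that only analyze, evaluate, or assess. The following is a minimal sketch under stated assumptions. The `Action` record, its field names, the owners, and the due dates are all hypothetical and are not taken from any regulation or real CAPA system.

```python
from dataclasses import dataclass

# Verbs that describe study rather than a fix when used as the action itself.
WEAK_VERBS = ("analyze", "evaluate", "assess")


@dataclass
class Action:
    description: str
    root_cause: str  # the identified root cause this action addresses
    kind: str        # "corrective" or "preventive"
    owner: str       # who implements it (hypothetical field)
    due: str         # when it will be implemented (hypothetical field)


def uncovered_root_causes(root_causes, actions):
    """Root causes that no action in the plan addresses."""
    covered = {a.root_cause for a in actions}
    return [rc for rc in root_causes if rc not in covered]


def weak_actions(actions):
    """Actions whose description starts with analyze, evaluate, or assess."""
    return [a for a in actions if a.description.lower().startswith(WEAK_VERBS)]


# Illustrative data only.
root_causes = [
    "manual expiration date calculation",
    "no second-person check of the calculation",
]
plan = [
    Action("Use the validated spreadsheet for all products",
           "manual expiration date calculation", "corrective",
           "Printing supervisor", "2018-03-15"),
    Action("Evaluate whether other procedures need clarification",
           "manual expiration date calculation", "preventive",
           "Quality assurance", "2018-04-01"),
]

missing = uncovered_root_causes(root_causes, plan)
weak = weak_actions(plan)
print(missing)
print([a.description for a in weak])
```

In this made-up plan, the second root cause has no action at all, and the preventive action is flagged because it promises an evaluation instead of naming the documents to be changed.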
Validation and Verification Prior to Implementation
Once the team decides how to fix the identified root causes, it must make sure the proposed corrective and/or preventive actions will work and achieve the desired results from implementation. The CFR medical device regulation includes this requirement under §820.100(a)(4):2 “Verifying or validating the corrective and preventive action to ensure that such action is effective and does not adversely affect the finished device.” Simply stated, you do not want the cure to be worse than the disease. The medical device regulation is considered the gold standard for CAPA, yet there is a lot of confusion about the meaning of this section. A number of companies interpret it to require evaluation of the effectiveness of all corrective actions prior to implementation, which is impossible.


If the corrective or preventive action does affect any validated item (for example, a validated test method, a validated piece of equipment, or a validated process), then we must perform some validation work in order to secure permission to implement the action. Do not perform a validation simply because it is a corrective or preventive action; validate because your procedures and your quality system require that such action must be performed prior to implementation. If the preventive action is to change the current visual inspection to a sophisticated electronic inspection, we must validate the inspection device to ensure that the new inspection process will consistently produce a result that meets its predetermined specifications. In this specific case, it means the electronic eye will detect nonconformances with a predetermined confidence level. On the other hand, if the corrective or preventive action consists merely of a clarification of some written instruction, without a major change to a process, then a validation is not necessary. The document will be changed through a formal change control process that establishes who can change the document, who must approve it, and what training requirements are associated with it. This rigorous control of the proposed change can be considered the verification of the adequacy of the change. As part of the justification for the implementation, you must also discuss why the proposed change will not produce adverse effects on the product. If you decide to change a component, several kinds of studies will be needed prior to implementation depending on the product (stability, biocompatibility, and so on). When a proposed action affects the design of medical devices, some design verification and/or validation work may be required. FDA regulation also requires that all software changes shall be validated. 
The FDA’s analysis of 3140 medical device recalls conducted between 1992 and 1998 reveals that 7.7% were attributable to software failures. Of those software-related recalls, 79% were caused by software defects introduced when changes were made to the software after its initial production and distribution. Software validation and other related good software engineering practices discussed in this guidance are a principal means of avoiding such defects and resultant recalls.
Therefore, the CAPA plan document must include:
• A description of the actions to be taken
• When it will be implemented and who is responsible for the implementation
• An effectiveness evaluation: how, when, and by whom


Implementation of Corrective and Preventive Actions
A frequent observation issued by FDA inspectors is that corrective and preventive actions were not implemented. To avoid this embarrassing problem, every regulated company needs clearly assigned responsibilities as well as an adequate tracking system to verify the implementation of each corrective or preventive action.

Effectiveness Evaluation: Verifying That Solutions Worked
Finally, it is time to determine the effectiveness of corrective or preventive actions. Talking in terms of problems and solutions, we must verify that the solutions worked. Two main elements here are how and when the verification is accomplished. One of my favorite things to do at the beginning of a CAPA training session is to ask participants how the effectiveness of implemented actions can be evaluated. Most participants answer that an action is effective if the problem does not reoccur. Rarely, someone defines it correctly as the lack of reoccurrence of the root causes. Once we define what a corrective or preventive action is (the action that addresses the root cause), everyone understands that effectiveness relates to causes, not to symptoms or problems.
If similar symptoms are observed, do not jump to the conclusion that the action was not effective. To be able to conclude this, you must first identify the root causes of this repeated symptom. If you reach the same cause, then you can conclude that the previous action was ineffective. If you discover that this time the problem was the result of a different root cause (a common situation), then the effectiveness of your previous action is not in question. By the same line of reasoning, sometimes you investigate a new problem and discover that the situation was created by a root cause you already fixed. In this case, you have evidence that the previous corrective or preventive action was not effective. There are also some misunderstandings related to effectiveness verification.
Some companies document that the action was implemented but not whether the action worked. If the action is not implemented, it does not have a chance to be effective; the implementation verification (discussed in the previous section) is a different concept. At this point of the CAPA cycle, the quality system asks for evidence that the implemented corrective or preventive action was effective and that the intended objective was accomplished. Root causes are detected through the symptoms they produce. Therefore, the way to determine whether a corrective action was effective is to analyze the process that the root cause acted on. A typical question
here is how long it takes to verify the effectiveness of the actions. Some companies have a fixed period of time (three months, six months, or one year); others take a more correct approach by linking that period of time to the frequency of the process being fixed. A rule of thumb we recommend using is the “double-digit” rule. It requires having at least 10 repetitions of the process where the corrective or preventive action was applied prior to establishing whether the action was effective. If we use a fixed period (for example, three months) and the process is performed monthly, we will have only three results (in the best case) to determine such effectiveness. Statistically, there is a large probability that those first three repetitions are OK simply by chance even though the action did not work. By extending the evaluation to at least 10 repetitions, we increase our confidence level. With 10 good results, we can be confident that the action worked. The documentation of the effectiveness evaluation should be generated along with the rest of the CAPA plan. Once we document the implementation of the action, the only remaining (open) task from the plan will be the effectiveness evaluation. The vast majority of CAPA effectiveness plans are totally reactive. We always recommend establishing a verification method that proactively looks for measures of effectiveness. A typical situation is the implementation of a corrective action after we receive several complaints about a product. In this scenario, the action will be considered effective if no complaints are received during the next three months. A lack of complaints does not necessarily mean that we fixed the issue. A more adequate way to determine the effectiveness of the corrective action would be to monitor the next five or 10 batches produced using an appropriate sampling plan. This statistically sound sampling plan can provide an adequate confidence level about the effectiveness of the action. 
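The reasoning behind the “double-digit” rule can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that an ineffective action leaves the process failing at random in 20% of repetitions and that repetitions are independent of each other; the 20% figure and the function name are assumptions, not data from any real process.

```python
def chance_of_clean_run(failure_rate, repetitions):
    """Probability of seeing no failure at all in the given number of
    independent repetitions, even though the underlying failure rate
    is unchanged (that is, the action did not work)."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return (1.0 - failure_rate) ** repetitions


# Hypothetical residual failure rate of an action that did not work.
failure_rate = 0.2

after_three = chance_of_clean_run(failure_rate, 3)
after_ten = chance_of_clean_run(failure_rate, 10)

print(round(after_three, 3), round(after_ten, 3))
```

Under these assumptions, three clean repetitions would occur by chance about half the time (0.8 cubed is 0.512), whereas ten clean repetitions would occur by chance only about 11% of the time. A lower residual failure rate would require more repetitions to reach the same confidence, which is why the rule is best treated as a minimum.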
And yes, it is fine if you also include as a second element of this effectiveness verification some criterion regarding a reduction in customer complaints associated with the identified root cause. The specific topic of evaluation of training effectiveness is discussed in Chapter 4. Table 6.3 illustrates a simplified summary of the different elements of a human error investigation related to a medical device product having an incorrect expiration date on both the label and the packaging box.

Table 6.3 Elements of human error investigations.

Feb 4, 2018, 21:50–23:45
Event: Printing of product label and packaging box with the correct lot number (12345) but incorrect expiration date of July 2109 instead of July 2019.
Causes: Inexperienced operators using a totally manual process for calculation of expiration date. A second operator did not verify the information as required by procedure.
Situational factors: The last printing job of the day was performed as a rush order on overtime. Two out of the three printing area operators working this shift were from other production areas due to the absence of the two regular employees. The area’s supervisor was taking off-site training.
Latent factors: The expiration date calculation of this family of legacy products is performed manually. The newest products use a validated spreadsheet. The procedure does not require that a second operator verify that the correct expiration date has been calculated and entered into the device history record (DHR).
Control barriers: The procedure requires that a second person verify that the correct information is printed. The barrier does not have enough specific details. The procedure does not require that a second operator verify the expiration date calculation.

Feb 5, 2018, 06:45–13:49
Event: Labeling and packaging of the lot 12345 (850 units)

Feb 6, 2018, 15:30–16:35
Event: DHR and associated documentation audited by production personnel

Feb 6, 2018, 16:40–17:55
Event: DHR and associated documentation audited by quality assurance personnel

Feb 6, 2018 (both audits)
Situational factors: Audit and verification processes of the DHR and associated documents for lot 12345 were performed in a “rush” mode because a special pickup service was arranged with an overnight carrier to deliver this product to the distribution center. The latest available pickup time was 18:00.
Control barriers: Production and quality assurance procedures require the verification of the label information (lot number and expiration date) against the information included in the DHR. However, procedures do not require the recalculation of the expiration date included in the DHR by the printing operator.

Feb 22, 2018, 10:00
Event: First complaint received due to an incorrect expiration date

CAPA plan
The CAPA plan for this event has the following elements:
• Correction: Recall lot 12345 due to an incorrect expiration date on its labeling.
• Corrective action:
– Change the printing procedure to require that a second person verify the expiration date calculation.
– Change the printing procedure to use the validated spreadsheet to calculate the expiration date for all products.
– Change the quality assurance inspection procedures to require the recalculation of the expiration date information as part of the final DHR evaluation prior to product release to the market.
Additional actions should be taken to strengthen the supervisory function, such as:
• Provide supervisors enough time and resources to directly supervise their employees.
• Encourage supervisors to provide more supervision to less experienced workers.
• Encourage supervisors to give priority to their supervisory role over other tasks.

A DIAGNOSTIC TOOL
I’ve listed 50 questions that can be used as guidance during any investigation of human errors. The purpose of these questions is to obtain a better understanding of the human factors surrounding the issue under investigation. This is not a mere checklist, and it should be used alongside regular root cause analysis tools such as cause-and-effect diagrams and fault tree analysis. Answers to the questions will provide valuable information regarding the following elements of the investigation:
• Identifying contributing root causes
• Identifying situational factors
• Identifying latent factors
• Identifying absent or insufficient control barriers

Classification of Human Errors/Violations
1. Did the person unconsciously perform the task erroneously (commission error)?
2. Did the person unconsciously miss the task or any step (omission error)?
3. If this was a documentation omission, is the record designed to identify omissions easily?
4. Did the person fail to recognize a defect, message, signal, and so on (recognition error)?
5. Did the person intentionally perform the task differently than specified in the procedures or work instructions?

Procedure Quality and Content
6. Are there formal (written) instructions to perform this task?
7. Were procedures or work instructions available in the immediate area where the task was performed?
8. Did the procedures or work instructions change recently?
9. Did the person read the procedures or work instructions while executing the task?
10. Did the procedures or work instructions use specific details rather than a qualitative description (slowly, soon, few, well, and so on)?
11. Does the person need to process or interpret information to execute this task?
12. Is there consistency in how other people are performing this task?
13. Is the error related to numbers or alphanumeric information (specifications, batch numbers)?

Training
14. Was the person formally trained on the task or procedure?
15. Is the training method appropriate for this task? Describe the method used.
16. Does the training cover this specific task?
17. How long was the training for this specific task?
18. How recently did the person receive the training?
19. Who trained this person?
20. Who trained the other people who did the task correctly?
21. How was the competence of the person assessed after completion of the training?
22. Was the training effectiveness evaluated? Describe the method used.

Situational Factors
23. Are other tasks interrupting the performance of this task?
24. Have there been any recent interventions in the area affected by the nonconformity?
25. Were all the area personnel present when the error occurred?
26. Was the supervisor or group leader present when the error occurred?
27. Did the error occur during overtime?
28. Did the error occur prior to or after a break or shift change?
29. Did the error occur prior to or after a shutdown or vacation period?
30. Was it a situation with a very tight deadline?
31. Was it a situation of competing priorities?

Latent Factors
32. Did the person work from memory while executing this task?
33. Was the person working in autopilot mode because he/she was very familiar with the task?
34. Was the person performing any other task concurrently (multitasking)?
35. Does the task require processing too much information or focusing on too many things at the same time?
36. Does the task require the person to follow a specific sequence of steps?
37. Was the area’s layout or workspace overcrowded?
38. Was the area disorganized, requiring people to have to move around to perform tasks?
39. Were the working conditions comfortable (noise, temperature, humidity, illumination [well-lit with bright light], etc.)?
40. Is the system or process intuitive enough to be understood easily?

Control Related
41. Is there any control (barrier) in place to specifically avoid this failure?
42. Has the control been challenged to determine if it is working?
43. Are there any job aids or checklists to perform this task?
44. Are their format/contents appropriate for clear interpretation?
45. Did the employee use the job aid(s)?
46. Does this task include a second person verification or checking?

Skill Fading
47. Is this the first time the person performed this task?
a. If it’s not the first time, when was the last time he/she performed this task?
b. If it’s not the first time, how often does the person perform this task?
48. How many other people do the same task correctly under the same conditions?
49. Has the person performed the task correctly earlier?
50. How often did this (or similar) error occur during the last year?

7 Root Causes Related to Human Performance

The previous chapter noted that errors are not caused by the failure of individuals, but rather emerge from the alignment of multiple contributory system factors, each necessary and only jointly sufficient. Human errors are the result of contributing root causes, situational factors, latent failure factors, and inadequate control barriers. All those factors should be considered as part of the true, or real, root causes directly related to the human side of manufacturing and process industry problems. Three categories encompass many of the human factors related to human errors: personal performance, human reliability factors, and management and supervision. Other categories, such as training and procedures, are also directly related to employee performance. The purpose of this chapter (and of this entire book) is to help reduce human errors in the manufacturing sector and, specifically, in FDA-regulated industries.
Speaking in a wider sense, consider human errors as symptoms, whereas the human factors would be causes (root causes). However, some books on the subject of human error make a clear distinction between performance shaping factors (PSFs) and root causes (Edmonds 2016). For them, PSFs such as high workload, fatigue, or a noisy working environment are conditions that affect human performance, not the root causes of the error. For those authors, what PSFs do is interfere with the reliability of performing the task, making errors more likely. In other words, they make degraded performance more likely.
In the first edition (2010) and second edition (2016),1 I clearly established that one of the main pitfalls of investigations and CAPA systems is the lack of true root causes. A common problem observed in many companies is that most nonconformance investigations point to human error or procedures not followed as the root cause of the nonconformity, when these are merely symptoms of deeper causes. To establish and maintain an effective investigation and
CAPA system, companies must move beyond symptoms and reach the root cause level of the problem. What I have learned from investigating human errors is that, in addition to the true root cause(s) of each human failure incident, the following elements must be considered when preparing any effective CAPA plan, as discussed in the previous chapter:
• Identified situational factors
• Identified latent failure factors
• Identified absent or insufficient control barriers
Often, it’s very difficult to reach the true root cause level when dealing with human failures, and the best chance you have to avoid reoccurrence is by fixing situational or latent factors and/or implementing some kind of barrier. For this reason, some of the elements included in the following pages as root causes could also be classified as performance shaping factors.
Here is a personal account based on direct observation and analysis of dozens of regulated manufacturing plants. The first interesting finding, related to working instructions and records, is that medical device manufacturers have a lot to learn from benchmarking against their drug manufacturer peers. When you walk the floor of a classic drug or biotech manufacturing facility, you see that operators have in front of them the working instructions (batch record) corresponding with their given job. This interesting approach combines working instructions and manufacturing records into a single quality system document. As an example, if the working instruction requires a mix step between 30 and 60 minutes, the document will allow space to record when the mix started and when it stopped (time on/time off). This format has several advantages:
• The operator need not memorize how many minutes the mixer must be running.
• Writing down those times allows the worker and auditors to double-check that the requirement was met.
• If this step is judged to be critical, a second operator can verify its correctness.
• Further audit of the document (by manufacturing and finally by quality assurance) will provide additional opportunities to detect any errors before the product is released to the market.
By contrast, when you walk the floor of most medical device manufacturers, no matter how high-tech their devices are (from gloves or dental floss to
highly sophisticated life-sustaining machines), you rarely see an employee reading and following such written working instructions. Most of the time, the instructions are not concurrently used during manufacturing steps. An operator deprived of these critical instruments must rely on memory to perform the task. When errors occur, employees typically receive retraining along with a warning for not following procedures. Nobody asks what happened at the exact moment when the employee had the memory lapse that created the defect.
Improving working instructions and records is the first crucial step to take to reduce human errors and mistakes. Very few companies have formal training for document writers. The result is the perpetuation of ill-written procedures and working instructions filled with incorrect or incomplete information. We must recover the essence of a GMP-regulated environment. The starting point must be better manufacturing instructions. Following are just a few examples of poorly written instructions:
• Mix well.
• Stick together for a few seconds.
• Verify all parameters.
The second key observation relates to the training system. It must be the second observation, because how do you effectively train someone on a procedure or instruction that is not clear? Here are some important points:
• Today, most training provided to operators and technicians is merely the reading of less-than-perfect material (SOPs, working instructions, and so on).
• Almost never is there real training material (such as PowerPoint presentations, flowcharts, simulations, and so on), discussion of the material with trainees, or a simulation exercise.
• Training conditions are far from ideal, taking place at the end of a work shift or conducted by less-than-adequate instructors without any pedagogical background.
• Few companies have a formal process for measuring the effectiveness of training efforts, and the only factor considered during human error investigations is the existence of a training sheet sign-off.
• If the employee has signed this piece of paper, training is immediately discounted as a causal factor for this event.


The third critical item is the lack of adequate supervision. In today’s manufacturing environment within the regulated industries, it is difficult to find a supervisor who meets that definition. Supervising dozens of operators, spending hours every day in unproductive meetings, and dealing with bureaucracy (time card revision and adjustment, payroll, and so on) are just some of the reasons that explain the lack of adequate and effective supervision. We can add to this the fact that the supervisor’s office is often far from his or her workers, which makes the supervisory function even more difficult.

Regarding these last two items, there is no difference between drug and device manufacturers. Both have the same urgent opportunities related to training and effective supervision.

Trending of root cause categories is one of the most critical metrics management must periodically evaluate. To help with this process, here is a list I have been developing for years that includes more than 40 categories associated with human error and failures. It can help to reinforce your investigation and CAPA system in several ways:
• Increase consistency across all investigations.
• Facilitate consistency across the organization.
• Allow trending of categories and root causes.
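As a minimal sketch of what trending by category could look like once every investigation is coded against a fixed list, the Python below counts root cause categories per period. The record layout, the quarter labels, and the function names are assumptions made for illustration; they are not taken from any particular CAPA system.

```python
from collections import Counter

# Hypothetical investigation records: (period, factor, root cause category).
investigations = [
    ("2023-Q1", "Training", "Missing training"),
    ("2023-Q1", "Personal performance", "Working from memory"),
    ("2023-Q2", "Personal performance", "Working from memory"),
    ("2023-Q2", "Procedures and instructions", "Lack of sufficient detail"),
    ("2023-Q2", "Personal performance", "Working from memory"),
]


def trend_by_category(records):
    """Count investigations per (period, root cause category)."""
    return Counter((period, category) for period, _factor, category in records)


def top_categories(records, n=3):
    """Return the n most frequent root cause categories across all periods."""
    totals = Counter(category for _period, _factor, category in records)
    return totals.most_common(n)


if __name__ == "__main__":
    for (period, category), count in sorted(trend_by_category(investigations).items()):
        print(period, category, count)
    print(top_categories(investigations, n=1))
```

Counts like these are only meaningful when every investigation uses the same category names, which is why a consistent list matters.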

ROOT CAUSES AND EXAMPLES OF HOW TO FIX THEM

Over the years, I have developed this list of factors and their root causes, along with examples of how to fix each one. There are five main factors:
1. Personal performance
2. Training
3. Human reliability factors
4. Procedures and instructions/task design
5. Supervision and management factors

Each factor is followed by examples of how to fix the root causes. While this list is not static and may be modified to include new categories, it is not meant to be used for picking a cause or causes without performing the proper root cause analysis process. However, resist the temptation to create an “other” category. You are likely to finish with most of the root causes classified as “other,” and this defeats the purpose of the list. Once you arrive at the root cause, try to confirm it. If possible and practical, conduct a controlled experiment to verify that the root cause effectively creates the symptoms you detected.

1. Personal Performance
• Lack of attention (inattention to detail, working from memory)
• Continuous attitude problems
• Fatigue
• Lack of capability (sensory, physical, intellectual)
• Personal problems
• Medication problems

Root cause

Inattention to detail

Example

A person incorrectly enters a value in the wrong field.

How to fix root cause

• Make certain the person reads the name of each field before entering the value. • Remove any distractions that could cause the inattention to detail, especially multitasking. • Include poka-yoke features, if feasible, such as a preprinted unit of measure for the value to be entered, the quantity of significant figures to be recorded (including decimal place if applicable), and so on.

Root cause

Working from memory

Example

A person forgets to perform a specific task because of familiarity with the task.

How to fix root cause

• Do not work from memory. • Use a formal document (procedure, work instruction, or checklist) to make certain all the required tasks are completed and documented. • Follow the read–execute–document principle. • Use effective supervision to ensure people do not work from memory.


Root cause

Continuous attitude problems

Example

A person intentionally fails to follow a procedure.

How to fix root cause

• Provide adequate direct, on-site supervision. • Use progressive discipline methods and document these actions. • Develop mechanisms to identify this kind of behavior during the hiring process.

Root cause

Fatigue

Example

A person fails to detect a specific defect because of visual fatigue due to the repetitiveness of the task.

How to fix root cause

• Rotate the person from time to time (for example, every hour) to minimize the visual fatigue. • If fatigue is caused by excessive overtime, have supervisors control it. • If fatigue is caused by personal problems, see below.

Root cause

Lack of capability (sensory, physical, intellectual)

Example

A person mixes two pieces of different colors because of the inability to distinguish between the two colors (for example, he or she cannot distinguish between black and navy blue).

How to fix root cause

• Make certain people whose tasks require visual capability are subject to visual tests at least every year. • Ensure the responsibilities placed on any one individual are not so extensive as to present any risk to quality.

Root cause

Personal problems

Example

A person falls asleep during his/her shift due to a family emergency and does not detect a defect.

How to fix root cause

• Provide adequate supervision and strict rules about notifying supervisors if the person is not in optimal condition to perform the task. • Implement the rule “If you see something, say something.”


Root cause

Medication problems

Example

A person undergoes minor surgery during the morning and goes to work during the afternoon or night shift. The person then fails to detect a defect or performs the task incorrectly.

How to fix root cause

• Provide adequate supervision and strict rules about notifying supervisors if the person is not in optimal condition to perform the task. • Implement the rule “If you see something, say something.” • Be aware of inspectors and operators working under antihistamine and anti-allergy medication, especially during allergy season.

2. Training
• Training not required
• Missing training
• Inadequate content (task analysis, qualification/certification)
• Inadequate training method
• Language barriers
• Inadequate environment
• Inadequate instructor
• Insufficient practice or hands-on experience
• Inadequate frequency (insufficient refresher training)
• Training effectiveness not measured

Root cause

Training not required

Example

A change to the procedure is implemented, but no training is required.

How to fix root cause

• Establish a formal (documented) process to determine whether changes to quality system documents require formal training. • Minor changes (for example, to correct a typographical error or to change the format of a document) could be implemented without training, while other changes (for example, change of an instruction) will require formal training prior to implementation.


Root cause

Missing training

Example

A person is found without the required training. The person was relocated from a department or functional area in which that specific training was not required.

How to fix root cause

• Make certain all personnel have the required training before performing a task; this is an essential supervisory function. • Establish and maintain a system (preferably electronic) to monitor the training requirements of each employee working under a quality system. Untrained employees should not be allowed to perform GMP tasks. • Consider tying training requirements to an employee’s access card. Past-due training inactivates the access card and requires a supervisory escort of the employee until he/she completes the missing training.
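The idea of tying past-due training to access can be sketched as a simple check. The data structures (a dictionary of due dates, a set of completed courses), the course names, and the status strings below are hypothetical; a real training management or badge system would have its own interfaces.

```python
from datetime import date


def overdue_training(required, completed, today):
    """Return the required courses that are not completed and past their due date.

    required:  dict mapping course name -> due date (hypothetical structure)
    completed: set of course names the employee has completed
    """
    return sorted(
        course
        for course, due in required.items()
        if course not in completed and due < today
    )


def access_status(required, completed, today):
    """Return 'active' when nothing is overdue, otherwise 'escort required'."""
    if overdue_training(required, completed, today):
        return "escort required"
    return "active"


if __name__ == "__main__":
    required = {"SOP-001 rev 3": date(2023, 1, 15), "Gowning": date(2023, 6, 1)}
    print(access_status(required, {"Gowning"}, date(2023, 3, 1)))
```

The point of the sketch is that the decision is made from the training record itself, not from a supervisor's recollection of who has been trained.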

Root cause

Content not adequate

Example

The content of the training does not match the requirements for the task and/or the ability of the learner. For example, the content is too theoretical, providing too much information at once, and does not include hands-on practice. The learner is unable to remember such a large amount of information.

How to fix root cause

• Perform a training needs assessment. • Then develop the content of the training based on the assessment. • Match the training content to learner capacity. • Develop training certification, on-the-job training programs, and so on. • Establish a formal program to measure training effectiveness using the Kirkpatrick model: reaction, learning, behavior, and results. • Act based on measurement results.


Root cause

Training method not adequate

Example

A person required to perform very detailed steps is trained using the “read and understand” method. The learner does not receive information regarding why the task should be done in a particular way.

How to fix root cause

• Perform a training needs assessment. • Then select the training method based on the assessment. • Match the training method to learner capacity. • Develop training certification, on-the-job training programs, and so on. • Establish a formal program to measure training effectiveness using the Kirkpatrick model: reaction, learning, behavior, and results. • Act based on measurement results.

Root cause

Language barriers

Example

A person fails to perform correctly because the terminology used during the training is not the common terminology used in the operations.

How to fix root cause

• During training, use the jargon commonly used by the people performing the task. • Involve operators in the training development. • Write instructions and procedures and provide training on them using the primary language of the population using them.

Root cause

Environment not adequate

Example

The training room is in an area with excessive personnel flow, causing too many distractions to the training attendees.

How to fix root cause

• Consider using glazed glass for the training room, or move the training room to another area. • Avoid providing training to two shifts simultaneously. People from the exiting shift will be tired and their attention span will be very limited. • Provide training to each shift/group using the best possible conditions in terms of place, time, duration, and so on.


Root cause

Instructor not adequate

Example

People trained by the same instructor fail to perform a task correctly.

How to fix root cause

• Develop a process to qualify/certify instructors. • As part of the training evaluation, include questions about the knowledge of the instructor, the involvement of the instructor with the class, the delivery of training, and so on. • Act based on measurement results.

Root cause

Insufficient practice or hands-on experience

Example

Personnel fail to perform a task correctly because the training was too theoretical or did not include enough hands-on practice.

How to fix root cause

• Perform a training needs assessment. • Select the training method based on the assessment. • Establish a formal program to measure training effectiveness using the Kirkpatrick model: reaction, learning, behavior, and results. • Act based on measurement results.

Root cause

Training frequency not adequate

Example

Personnel forget how to perform a task because it depends on information learned a long time ago.

How to fix root cause

• Consider skill fading. • Perform a training risk assessment, and establish a frequency for periodic refreshing of significant training. • Verify training effectiveness using the Kirkpatrick model: reaction, learning, behavior, and results.

Root cause

Training effectiveness not measured

Example

There is no mechanism in place to measure the effectiveness of the training program.

How to fix root cause

• Establish a formal program to measure training effectiveness using the Kirkpatrick model: reaction, learning, behavior, and results. • Act based on measurement results.


3. Human Reliability Factors
• Inadequate location of equipment
• Inadequate identification of equipment, materials
• Cluttered or inadequate layout
• Inadequate environmental conditions (cold, hot, poor illumination, and so on)
• Inadequate housekeeping
• Stress conditions (rush)
• Excessive workload
• Excessive calculation or data manipulation
• Multitasking

Root cause

Inadequate location of equipment

Example

A person must go to a mezzanine level in order to read a display, memorize the value, and return to the original floor level to write the information in the record using a desktop computer.

How to fix root cause

• Relocate the display so the person does not have to access the mezzanine to read the display, or provide a portable device to record the value directly at the mezzanine.

Root cause

Inadequate identification of equipment or materials

Example

A person registers the wrong equipment used on the record because the equipment identification label has been practically erased during the cleaning process.

How to fix root cause

• Provide a permanent way to identify the equipment, such as engraving the equipment identification information. • If documentation is performed electronically, provide a drop-down menu to select the piece of equipment used instead of having to enter the entire identification information of the equipment. • Even better, identify the equipment with a barcode, and perform the documentation of the equipment used with a barcode reader connected to the electronic documentation system.


Root cause

Cluttered or inadequate layout

Example

A person fails to package all required components in a device kit because the layout of the packaging area is too cluttered and there is constant interference from other personnel in the assembly line.

How to fix root cause

• Redesign the layout to provide clear spatial separation between personnel and products. • Establish an appropriate line clearance process to avoid product mix-ups.

Root cause

Uncomfortable environmental conditions (cold, hot, poor illumination)

Example

A person is not able to detect a specific defect because of inadequate room illumination.

How to fix root cause

• Provide adequate room illumination, plus magnifier lamps at the workstation. • Provide adequate environmental conditions, or allow for frequent breaks if conditions are too extreme (very cold or very hot).

Root cause

Inadequate housekeeping

Example

GMP documents are frequently misplaced or lost in a specific area of the manufacturing department because all documents are stored in the same place without any kind of identification or indexing.

How to fix root cause

• Perform a 5S (sort, straighten, shine, standardize, sustain) kaizen event, and implement the required improvements for proper document identification and indexing.

Root cause

Stress conditions (rush)

Example

A product was shipped using an incorrect and nonvalidated configuration because a rush shipment was performed over a weekend by personnel who normally do not prepare the shipment of product.

How to fix root cause

• Provide a checklist to make certain all required components are included and all required activities are performed. • Effective supervision should minimize these situations and take control of them when they occur. • Extra control should be exercised by supervisors/managers when this kind of exceptional situation emerges.


Root cause

Excessive workload

Example

An error is produced because of absent personnel on the assembly line, and the task must be completed with fewer people than required for the task.

How to fix root cause

• Provide cross-training to other personnel in order to balance the workload in the event that people are absent at a specific moment. • When there is a requirement of verification by a second person for a specific step, instruct people to not execute unless the second person is present. • Effective supervision should minimize these situations and take control of them when they occur.

Root cause

Excessive calculation or data manipulation

Example

A person makes a wrong calculation because too many mathematical operations have to be completed in order to obtain the final value.

How to fix root cause

• Develop a template in which only the required values are entered, and the spreadsheet calculates the final value. • Validate the spreadsheet prior to implementing it.

Root cause

Multitasking

Example

A person forgets to register the value obtained from a specific task because that person was performing multiple tasks simultaneously.

How to fix root cause

• Avoid performing more than one task at the same time. • Use a checklist to document when a task is performed. • Have a second person verify that all tasks have been completed as per requirements.

4. Procedures and Instructions/Task Design
• Lack of procedure or instruction
• Procedure not required
• Unavailable or difficult to obtain procedure
• Difficult-to-use procedure
• Ambiguous or confusing procedure
• Lack of sufficient detail
• Inadequate document format
• Incomplete instructions
• Wrong instruction
• Typographical error(s)
• Obsolete document
• Document not approved

Root cause

Lack of procedure or instruction

Example

A person is performing a task, and the results have high variability. After interviewing the person, it was found that no formal procedure was developed to perform the task.

How to fix root cause

• Write a procedure.

Root cause

Procedure not required

Example

A person claims that a procedure is not mandatory because it is written using the word “should,” which means it is a recommendation.

How to fix root cause

• Avoid the use of “should” in a procedure. Instead, use the word “must” to provide an instruction.

• Evaluate other tasks lacking procedures, and implement procedures for those tasks.

• Use an imperative tone when writing procedures and instructions.

Root cause

Procedure not available or difficult to obtain

Example

Procedures and instructions are in electronic format only, but the electronic documentation system is frequently down for maintenance and repair.

How to fix root cause

• Have backups of the procedure in paper format, especially during the time the computer system is down.


Root cause

Procedure difficult to use

Example

A person finds a procedure difficult to follow because the wording used is not the common language (jargon) used by the people performing the task.

How to fix root cause

• Involve the people who are going to execute the task in the development of the procedure. • Write the procedure using the primary language and jargon of the users.

Root cause

Decision not to use or follow document

Example

A person decides not to use a procedure because it is considered to be inadequate by some of the people performing the task.

How to fix root cause

• This is not an error but a violation (intentional). • Provide adequate supervision to make certain people are following the procedures. • Instruct the workers that if they feel the procedure is inadequate, the situation must be channeled to the personnel responsible to review and make changes to the procedures.

Root cause

Ambiguous or confusing instructions

Example

A person fails to follow an instruction because the sentence is too long and includes too many action verbs.

How to fix root cause

• Use adequate grammar, and use numbered lists whenever an instruction includes many actions. • When preparing working instructions or procedures, follow the five principles described in Chapter 4 of this book:
– Clarity
– Readability
– Coherence
– Economy
– Correctness


Root cause

Lack of sufficient detail

Example

Two glued pieces fall apart because of lack of adhesiveness. The procedure says, “Stick together for a few seconds.”

How to fix root cause

• Be more precise when writing procedures. • Provide measurable instructions. • Avoid using words such as “few,” “slowly,” “well,” and so on when providing instructions.

Root cause

Document format not adequate

Example

A person fails to correctly perform the task because the instruction uses too many levels to provide the instructions. For example, one of the steps is: 4.15.3.8.16.7.10.1.22(b): Clean the workstation after….

How to fix root cause

• Try to avoid adding too many levels when providing instructions. • Keep a maximum of four or five levels. • Be sure to provide spaces to document all required information.
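A document reviewer could screen step labels for excessive nesting with a small check such as the following sketch. The limit of five levels reflects the upper end of the guideline above; the function names and the label pattern are assumptions.

```python
import re

MAX_LEVELS = 5  # assumption: upper end of the "four or five levels" guideline


def numbering_depth(step_label):
    """Count the numeric levels in a step label such as '4.15.3(b)'."""
    match = re.match(r"\d+(?:\.\d+)*", step_label.strip())
    if match is None:
        return 0  # label does not start with a number
    return len(match.group(0).split("."))


def too_deep(step_label, max_levels=MAX_LEVELS):
    """True when the label nests deeper than the allowed number of levels."""
    return numbering_depth(step_label) > max_levels


if __name__ == "__main__":
    print(numbering_depth("4.15.3.8.16.7.10.1.22(b)"), too_deep("4.15.3.8.16.7.10.1.22(b)"))
```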

Root cause

Incomplete instructions

Example

The procedure provides an instruction to do a task under a certain condition but does not provide instructions under other conditions. For example, the procedure says, “If the temperature reaches 40 °C, stop the heater.” But it does not mention what to do while the temperature is below 40 °C.

How to fix root cause

• The word “if” suggests various paths of action. The right approach is to use the “if, then, else” format. • For example, the above example could have been written in this way: “If the temperature reaches 40 °C, stop the heater and the mixer. Otherwise, keep monitoring the temperature and agitating the vessel at 25 ± 2 rpm until 40 °C is reached.”
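The “if, then, else” format maps directly onto a conditional in which every branch states an action. The sketch below restates the temperature example in Python purely for illustration; the function name and the returned wording are assumptions, and treating readings above 40 °C the same as a reading of exactly 40 °C is an interpretation of “reaches.”

```python
TARGET_TEMP_C = 40.0  # threshold taken from the example instruction


def next_action(temperature_c):
    """Return the operator action for a temperature reading; no branch is left implicit."""
    if temperature_c >= TARGET_TEMP_C:
        return "stop the heater and the mixer"
    else:
        return "keep monitoring the temperature; agitate at 25 +/- 2 rpm"


if __name__ == "__main__":
    for reading in (35.0, 40.0):
        print(reading, "->", next_action(reading))
```

An instruction written with only the “if” branch corresponds to a function that returns nothing for readings below the threshold, which is exactly the gap the operator has to fill from memory.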


Root cause

Wrong instructions

Example

Procedure calls to perform an activity that is not actually required.

How to fix root cause

• Verify the correct content of any procedure or work instruction prior to approval of the document. • Use a checklist to verify that appropriate instructions are provided. Make a double verification before releasing the instructions.

Root cause

Typographical error(s)

Example

A person incorrectly manufactured a product due to an error in the instruction: it called for adding 10 L of solution A instead of the required 10 mL. During a recent change to the manufacturing instructions, the letter “m” was inadvertently deleted, resulting in the addition of 1000 times the required amount of solution A.

How to fix root cause

• Verify all units of measures, quantities, components, and so on during the approval process of any document. • Use the spell checker before approving any document.
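A spell checker will not flag “10 L” as an error, so the unit check has to compare quantities rather than spelling. The following sketch, with an assumed two-entry conversion table and hypothetical function names, compares the quantity in the previous revision with the one in the new revision after converting both to milliliters.

```python
import re

# Volume units expressed in milliliters (assumed table; extend as needed).
TO_ML = {"mL": 1.0, "L": 1000.0}


def parse_volume_ml(text):
    """Parse a string such as '10 mL' and return the volume in milliliters."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(mL|L)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized quantity: {text!r}")
    value, unit = match.groups()
    return float(value) * TO_ML[unit]


def quantity_changed(old_text, new_text):
    """True when two revisions of an instruction specify different volumes."""
    return parse_volume_ml(old_text) != parse_volume_ml(new_text)


if __name__ == "__main__":
    print(quantity_changed("10 mL", "10 L"))   # a dropped "m" changes the amount
    print(quantity_changed("1 L", "1000 mL"))  # same amount, different notation
```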

Root cause

Obsolete document

Example

A person uses an old revision of a document kept in hard-copy format near the equipment.

How to fix root cause

• Do not allow printed copies of documents at the workstation. • If a document needs to be printed, it must carry a footer or watermark showing the date through which the printed copy may be used. • Supervisors must enforce and verify that their areas are free of old copies of any kind of GMP documentation.

Root cause

Document not approved

Example

A document without all the approval signatures is used.

How to fix root cause

• Do not allow the use of GMP documents not fully approved. • Supervisors must enforce and verify that their areas are free of unapproved GMP documentation.


5. Supervision and Management Factors • Verbal instructions/communication problem • Inadequate communication between shifts • Inadequate supervision • Improper resource allocation (lack of personnel)

Root cause

Verbal instructions/communication problem

Example

A person fails to follow a verbal instruction provided because that person is working from memory.

How to fix root cause

• Provide important information formally, using written instructions rather than verbal instructions. • Do not work from memory. • Use a checklist to make certain all the tasks are completed.

Root cause

Inadequate communication between shifts

Example

An emergency intervention on a failing piece of equipment was performed during a shift, but it was not communicated to the next shift. It was written in a logbook, but the logbook was not read by the next shift’s personnel. The equipment failed again because the new shift was not aware of the changes.

How to fix root cause

• Make certain that major issues during a shift are not only documented in a logbook but also formally communicated to the next shift.


Root cause

Inadequate supervision

Example

A manufacturing supervisor was in her office answering an audit report. She told operators to contact her if a problem arose. The operators did not want the supervisor to think they did not know how to proceed; therefore, when an unexpected situation happened, they made a decision that damaged the batch in process.

How to fix root cause

• Provide supervisors with enough time and resources to directly supervise their employees. • Encourage supervisors to discuss nonroutine job tasks such as product rework. • Encourage supervisors to provide more supervision to less-experienced workers. • Encourage supervisors to give priority to their supervisory role over other tasks.

Root cause

Lack of supervision

Example

A third-shift production crew was without a supervisor for almost two months. Several batches were damaged during this period, caused by exceeding the time limits of specific manufacturing steps.

How to fix root cause

• The supervisory function is one of the most important in the manufacturing industry. • See Chapter 4, “Adequate Supervision and Staffing.”

Root cause

Improper resource allocation (lack of personnel)

Example

A person made an error due to work overload because some personnel were absent during the shift.

How to fix root cause

• Cross-train personnel in different tasks so the production lines can be balanced with other personnel in situations when there are absences. • When there is a requirement of verification by a second person for a specific step, instruct people to not execute unless the second person is present. • Provide supervisors with an adequate number of employees to effectively complete the tasks assigned for the shift. • Effective supervision should minimize these situations and take control of them when they occur.

8 Risk Assessment of Human Errors

Managing risk is a basic requirement in every industrial sector, including manufacturing. Operational risks must be defined, evaluated, and controlled under such general principles as ALARP (as low as reasonably practicable) or, for some specific critical sectors (nuclear, medical products, and so on), AFAP (as far as possible). Inadequate or incorrect human performance is one of the most common hazards to safety and quality. There are many examples of how human performance at various levels in an organization has introduced risk. Most of the incidents discussed in Chapter 1 represent well-known examples of this. On the other hand, we also have many examples of human performance successfully overcoming unexpected events, such as the problem-solving activities carried out to successfully return the Apollo 13 spacecraft to Earth.

In a perfect world, the goal would be zero human error, but our world in general, and the manufacturing sector in particular, is far from perfect. Human error does exist, and we must recognize that it represents a risk to the quality and safety of our processes. Traditional approaches to reducing production errors relied on personnel selection and training without any remarkable emphasis on real prevention. All too often, the operators are blamed for making errors and producing defects, when, in fact, the poorly designed work system itself is error-inducing.

The first step toward reducing (and preventing) human error is to identify its causes correctly. Human error should be evaluated by assessing the frequency of each type of human error, the severity of its consequences, and the degree of risk that results. In the manufacturing environment, a risk management approach should be used to deal with human errors, just as we should do with any other source of product nonconformities or process noncompliance.
Although risk management tools have been intensively used in assessing and mitigating the impact of human failures in safety, very little attention has been dedicated to their impact on quality.
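As a rough illustration of evaluating each type of human error by frequency, severity, and resulting risk, the sketch below multiplies two ordinal ratings and ranks the error types by the product. The 1-to-5 scales, the multiplication rule, the function names, and the sample ratings are all assumptions chosen for illustration; an organization would define its own scales and acceptance criteria.

```python
def risk_score(frequency, severity):
    """Multiply two ordinal ratings (1 = lowest, 5 = highest) into a risk score."""
    for name, value in (("frequency", frequency), ("severity", severity)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return frequency * severity


def rank_errors(error_types):
    """Sort (name, frequency, severity) tuples from highest to lowest risk score."""
    return sorted(error_types, key=lambda e: risk_score(e[1], e[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings for three error types.
    observed = [
        ("value entered in wrong field", 4, 3),
        ("step skipped from memory", 2, 2),
        ("wrong label applied", 1, 5),
    ]
    for name, freq, sev in rank_errors(observed):
        print(name, risk_score(freq, sev))
```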


An important point to consider is that operators with poor motivation or emotional problems can commit numerous unintentional errors and compromise the quality and/or safety of their jobs, hence the importance of effective supervision to allow early detection of those error-inducing factors. We can use human factors knowledge to manage risk based on two main principles:
1. How potential human failures could contribute to risk
2. How to mitigate such risks, including the best utilization of people to reduce risk

IS PREVENTION POSSIBLE? PROBLEMS WITH ERROR REDUCTION AND PREVENTION

If the occurrence of error is in some sense random, it will be practically impossible to prevent a particular error on a particular occasion. However, the majority of human error situations are not random occurrences, and, even in those cases, many factors affect their frequency. By controlling those factors, we can certainly reduce the number of such errors within a given time period.

Very often the reliability of systems can be improved by redundancy and parallelism. An operator inspecting a process may fail to detect a defect once in 100 times. If we have two independent operators doing the same inspection, the chance of both failing is, theoretically, only 1 in 10,000. High system reliability can be achieved if the user is prepared to pay for redundancy.

A very interesting point is brought up by Senders and Moray.1 They established that “redundancy does not always work to reduce human error. A problem that has been largely ignored in the study of human error is that behavior is seldom solitary. Most discussion of error concentrates on the individuals, but social interactions change the equation considerably, in ways that are difficult to predict.” An example of this situation is when medical products with incorrect or erroneous labels reach the market despite a highly redundant inspection process. How can five or six different people miss an obvious error? Some explanation can be found in Chapter 2 under recognition errors.

Some companies initiate campaigns to “eliminate human error.” We must recall that eliminating human error is impossible unless we eliminate humans first.
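The two-inspector figure discussed in this section follows from multiplying the individual miss rates, which is valid only if the inspectors really are independent. The sketch below makes that assumption explicit; the function name is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def joint_miss_probability(miss_rates):
    """Probability that every inspector misses the defect, ASSUMING independence."""
    result = Fraction(1)
    for rate in miss_rates:
        result *= Fraction(rate)
    return result


if __name__ == "__main__":
    one_in_hundred = Fraction(1, 100)
    print(joint_miss_probability([one_in_hundred, one_in_hundred]))
```

As the Senders and Moray quotation warns, social interaction between inspectors undermines the independence assumption, so the real joint miss rate can be much higher than this product suggests.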


If error-proneness exists, it may not be a serious problem in complex industrial environments, because chronically error-prone people would not stay there long; they would be fired or moved. We can reduce both the frequency and the severity of human errors, although eliminating them is probably impossible. Typically, in the manufacturing industry, there is an overemphasis on documentation reevaluation and redesign, but new documentation is no cure-all. Many companies feel that a significant economic investment in new documentation leads to an errorless system, but errors do not disappear. When we redesign a process to eliminate error-inducing features, then the number and frequency of errors will decrease. That should be the case, but only with respect to the old errors. A redesigned process or system is not the old process minus the defective features. It is a new system, with new opportunities for new and different errors to occur.

RISK MANAGEMENT TOOLS

Risk management may be undertaken in varying degrees of depth and detail and using one or several methods ranging from simple to complex. The form of assessment and its output should be consistent with the risk criteria developed as part of establishing the context. In general terms, a suitable technique should have the following characteristics:
• It should be justifiable and commensurate to the situation under consideration.
• It should provide results in a form that enhances understanding of the nature of the risk and how it can be treated.
• It should be capable of being used in a manner that is traceable, repeatable, and verifiable.

Failure Mode and Effects Analysis (FMEA) Within the medical product manufacturing setting, few tools are more appropriate than failure mode and effects analysis (FMEA). In one of its guidances for industry (Q9 Quality Risk Management), the FDA summarizes some of the most common risk management tools. Among those tools, they include the FMEA. Specifically, in section I.2, they establish that: FMEA provides for an evaluation of potential failure modes for processes and their likely effect on outcomes and/or product performance. Once failure modes are established, risk reduction


can be used to eliminate, contain, reduce, or control the potential failures. FMEA relies on product and process understanding. FMEA methodically breaks down the analysis of complex processes into manageable steps. It is a powerful tool for summarizing the important modes of failure, factors causing these failures, and the likely effects of these failures.2

Furthermore, the guide mentions that FMEA can be used to prioritize risks and monitor the effectiveness of risk control activities. Besides its definition and areas of application, the guide does not provide a methodology to carry out an effective FMEA. For that reason, a systematic approach is needed. That approach will consist of three interrelated tools: the process map, the cause and effect matrix (prioritization matrix), and the FMEA.

FMEA is an analysis technique that facilitates the identification of potential problems in the design or process by examining the effects of lower-level failures. Recommended actions or mitigation provisions are made to reduce the likelihood of the problem occurring and mitigate the risk if, in fact, it does occur. The FMEA determines the effect of each failure mode and identifies single failure points that are critical. It may also rank each failure according to the criticality of a failure effect and its probability of occurring. Therefore, the failure mode, effects, and criticality analysis (FMECA) is the result of two steps:

1. Failure mode and effects analysis (FMEA)
2. Criticality analysis (CA)

FMECA is just FMEA with criticality analysis, and they are often used interchangeably. From now on, when I use the acronym FMEA I am referring to both FMEA and FMECA. There are many different “flavors” of FMEA. They are inductive techniques using the question “What happens if …?” Elements of the process (for example, components) are analyzed one at a time, generally looking at a single-fault condition.
This is done in a “bottom-up” mode, following the procedure to the next higher functional system level. FMEA provides for an evaluation of potential failure modes of processes and their likely effect on outcomes and/or product performance. Once failure modes are established, risk reduction can be used to eliminate, contain, reduce, or control the potential failures, thus creating preventive actions to minimize risks. FMEA relies on product and process understanding. FMEA methodically breaks down the analysis of complex processes into manageable steps. It is a powerful tool for summarizing the important modes of failure,


the factors causing these failures, and the likely effects of these failures. FMEA can be used to prioritize risks and monitor the effectiveness of risk control activities. FMEA can be applied to equipment and facilities, and it might be used to analyze a manufacturing operation and its effect on a product or process. It identifies elements/operations within the system that render it vulnerable. The output/results of FMEA can be used as a basis for design or further analysis, or to guide resource deployment. FMEA application in the medical product industry can be used for failures and risks associated with manufacturing processes; however, it is not limited to this application. The output of an FMEA is a relative risk “score” for each failure mode, which is used to rank the modes on a relative risk basis. FMEA can provide input into other analysis techniques such as fault tree analysis at either a qualitative or quantitative level.

Risk Management Is More Than FMEA

Since the first notions and requirements for risk analysis of medical products appeared, many regulated companies (if not most of them) have used the FMEA methodology to accomplish those new tasks. While this tool can have many good uses, there are also several typical misuses. The most notorious (and widely practiced) is to believe that using only FMEA for the entire risk management process is appropriate. FMEA is a good risk analysis tool, but alone it cannot sustain the risk management program. During audits and inspections, many regulated companies present their FMEAs to inspectors as their whole risk management system. A sound and effective risk management program definitely needs more than a good FMEA. Comment 83 in the FDA’s Preamble to QSR 21 CFR 820 states that:

When conducting a risk analysis, manufacturers are expected to identify possible hazards associated with the design in both normal and fault conditions. The risks associated with the hazards, including those resulting from user error, should then be calculated in both normal and fault conditions. If any risk is judged unacceptable, it should be reduced to acceptable levels by the appropriate means, for example, by redesign or warnings. An important part of risk analysis is ensuring that changes made to eliminate or minimize hazards do not introduce new hazards. Tools for conducting such analyses include failure mode effect analysis and fault tree analysis, among others.

The standard FMEA process addresses only fault condition hazards and not normal condition hazards. Another shortcoming of this tool is that it


addresses only single-fault condition hazards. In addition, the standard FMEA is not the best tool for documenting risks that are not failure modes. Here are two examples related to the incorrect application of the FMEA tool in the medical product manufacturing industry:

1. Lack of integration of the FMEA process with a performance test. An organization received customer complaints about its tubing, which was assembled incorrectly. As part of the investigation, the organization conducted a performance test, the result of which showed that the incorrect assembly did not affect the product. But the organization did not update its FMEA, which stated that the incorrect assembly compromised the functionality of the device. Thus, it left itself vulnerable to future occurrences of the same issue.
2. Lack of definition of the FMEA indexes for severity, occurrence, and detectability. An organization received customer complaints about device failures, so it rushed to develop a reactive FMEA based solely on the major cause of the complaints: contamination. When it developed the scales to rank severity, occurrence, and detectability, it did not provide a rationale for each scale. The organization only provided the indexes and the calculated risk priority number (RPN).

In summary, FMEA alone is not enough to guarantee either product safety or quality. While FMEA is a proactive tool within an organization’s risk management system (mainly used during the risk analysis stage), it is a dynamic component that must be handled correctly using the following steps:

• Perform an FMEA using a teamwork approach.
• Use a systematic method to complete the FMEA with help from a process map and cause and effect matrix.
• Prioritize the order in which the process inputs will be evaluated using the FMEA tool.
• Sort RPNs in descending order.
• Identify all potential root causes.
• Identify and implement CAPAs for each root cause.
• Recalculate RPNs after CAPAs are implemented, not before.
Doing so will ensure that your FMEA is as effective as possible and your products are safe in the hands of the people they are meant to help.
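The RPN ranking step can be illustrated with a minimal Python sketch. It assumes the conventional definition of the RPN as the product of the severity, occurrence, and detectability indexes, each scored on a 1 to 10 scale; the text above does not prescribe a scale, and the failure modes and scores shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA row. The 1-10 scales are an assumption, not taken from the text."""
    name: str
    severity: int       # 1 = negligible effect, 10 = most severe effect
    occurrence: int     # 1 = very unlikely, 10 = almost certain
    detectability: int  # 1 = almost always detected, 10 = almost never detected

    @property
    def rpn(self) -> int:
        # Conventional risk priority number: product of the three indexes.
        return self.severity * self.occurrence * self.detectability


def rank_by_rpn(modes):
    """Return the failure modes sorted by RPN in descending order."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes and scores, for illustration only.
modes = [
    FailureMode("Tubing assembled incorrectly", 4, 3, 5),
    FailureMode("Particulate contamination", 8, 4, 6),
    FailureMode("Label misprint", 6, 2, 2),
]

ranked = rank_by_rpn(modes)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")
```

After CAPAs are implemented, the same indexes would be rescored and the ranking rerun, so that the recalculated RPNs reflect the process as it actually operates.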


Fault Tree Analysis (FTA)

Fault tree analysis (FTA) is a type of analysis in which a failure is analyzed using Boolean logic (and/or) to combine a series of lower-level events (causal factors) until we reach their root causes.3 This analysis method was originally developed to quantitatively determine the probability of a safety hazard in the field of safety engineering. FTA provides a method of breaking down chains of failures. A key addition permits the identification of combinations or interactions of events that cause other failure events (see the example in Figure 8.1). There are two types of interaction:

1. Several items must fail together to cause another item to fail (“AND” combination)
2. Only one of a number of possible events must happen to cause another item to fail (“OR” combination)

Figure 8.1 Fault tree analysis example. [Top event: mix-up between product X and product Y. An AND gate joins “Filling machine not cleaned after filling product X” and “Line clearance prior to filling product Y did not detect the lack of cleaning.” OR gates below lead to “Cleaning procedure inadequate,” “Cleaning procedure not performed,” “Wrong cleaning method used,” “Cleaning procedure is incorrect,” and “Cleaning process performed incorrectly.”]


The AND/OR symbols are called gates. They prevent the failure event above them from occurring unless their specific conditions are met. When several factors must happen simultaneously (AND relationship), we can avoid the failure simply by controlling one of them (the easiest or the cheapest). When any of several causal factors can create the failure, then we must fix all of them.

The tree is constructed working backward from a known event or failure and asking why it happened. The answer will represent the factor that directly caused the failure. Continuing with the why questioning will allow us to reach fundamental events or root causes. In other words, FTA is a very good tool to help us understand how an event occurred. It is best used when working with complex issues with several interrelated causes of failure.

FTA is a deductive, top-down approach to failure mode analysis aimed at analyzing the possible causes of an undesired event or failure. This contrasts with FMEA, an inductive, bottom-up analysis method aimed at analyzing the effects of single-component or single-function failures in equipment or systems. In terms of the CAPA system, we can define FTA as a reactive investigation tool (the failure already happened). FMEA should, ideally, be used proactively (during the design phase of a process) to anticipate failure modes and generate preventive actions.

FTA is a technique for identifying and analyzing factors that can contribute to a specified undesired event (called the “top event”). In a deductive manner, starting with the top event, the possible causes or fault modes of the next lower functional system level causing the undesired consequence are identified. The factors identified in the tree can be events that are associated with component hardware failures, human errors, or any other pertinent events that lead to the undesired event. FTA can be used to establish the pathway to the root cause of the failure.
It can be used to investigate complaints or deviations in order to fully understand their root cause and ensure that intended improvements will fully resolve the issue and not lead to other issues (solve one problem yet cause a different problem). FTA is an effective tool for evaluating how multiple factors affect a given issue. The output of an FTA includes a visual representation of failure modes. It is useful both for risk assessment and in developing monitoring programs.
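The gate logic described above can also be evaluated numerically. The following Python sketch is loosely modeled on the mix-up example of Figure 8.1 and assumes that the basic events are statistically independent; the tree shape is one assumed reading of the figure, and every probability is hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """All input events must occur: multiply the probabilities (independence assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """At least one input event occurs: complement of none occurring (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities, for illustration only.
p_procedure_inadequate = 0.02
p_procedure_not_performed = 0.01
p_clearance_misses_it = 0.10

# Assumed reading of the tree: the machine is left uncleaned if EITHER cause occurs.
p_not_cleaned = or_gate([p_procedure_inadequate, p_procedure_not_performed])

# The mix-up (top event) requires BOTH the uncleaned machine and the missed line clearance.
p_mix_up = and_gate([p_not_cleaned, p_clearance_misses_it])

print(f"P(machine not cleaned) = {p_not_cleaned:.4f}")
print(f"P(mix-up)              = {p_mix_up:.5f}")
```

The sketch also shows why controlling a single input of an AND gate is enough: driving `p_clearance_misses_it` toward zero drives the top event toward zero, whereas every input of an OR gate must be reduced to obtain the same effect.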

HUMAN RELIABILITY ANALYSIS (HRA)

Human reliability analysis (HRA), sometimes referred to as human reliability assessment, is a general term applied to any systematic method of quantifying the impact of human factors on risk. It is not possible to address system reliability without considering the failure rate of all its components, especially the most error-prone of all: the people. Human error probabilities or human reliabilities are useful not only in estimating man–machine system reliability, but also in other activities related to human error, such as allocating functions between man and machine, quantifying the error likelihood and consequences of human-engineered equipment, and estimating the success of personnel training programs. HRA is used to:

• Predict the errors people could make while completing a task
• Quantify the impact of these errors
• Determine ways to reduce the likelihood or consequences of error

A broad methodology for HRA that defines best practices when using any HRA tool was described by Kirwan.4 It includes:

1. Problem definition
2. Task analysis
3. Error identification
4. Representation
5. Screening
6. Quantification
7. Impact assessment
8. Error reduction
9. Quality assurance
10. Documentation

QUANTITATIVE AND QUALITATIVE ANALYSIS

Specific tools and techniques for performing HRA fall into two categories:

1. Quantitative approaches attempting to estimate the probability of each error, before identifying potential error reduction measures
2. Qualitative approaches describing the potential error, the consequences, and potential error prevention measures


Both approaches can be used independently or in combination. Both have benefits and drawbacks, and the selection of one specific approach over the other will depend on the project. The availability of subject matter experts, the quantity and complexity of the tasks to be analyzed, or any particular time constraints are typical factors to consider when selecting the specific approach to be used.

Quantitative Approaches

In safety engineering, the quantification of failures is a common requirement. For hardware, one can predict the mean time between failures (MTBF) and provide probabilities to describe the reliability of the system. In addition to hardware and software, a crucial part of the system is the people who operate and maintain it. Therefore, to fully assess the reliability of the system, it is necessary to calculate the probability of human failure as well. There are several quantitative HRA tools (also known as first-generation methods), including:

• Technique for human error-rate prediction (THERP)
• Cognitive reliability and error analysis method (CREAM)
• Human error assessment and reduction technique (HEART)
• Nuclear action reliability assessment (NARA)
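To illustrate why the human contribution cannot be left out of the calculation, the Python sketch below combines a hardware reliability figure derived from the MTBF with a human reliability figure derived from a human error probability (HEP). It assumes a constant hardware failure rate (exponential model), independent human task executions, and a series system in which either kind of failure fails the system; all numbers are hypothetical.

```python
import math


def hardware_reliability(mission_hours, mtbf_hours):
    """Probability of no hardware failure, assuming a constant failure rate of 1/MTBF."""
    return math.exp(-mission_hours / mtbf_hours)


def human_reliability(hep, n_tasks):
    """Probability of no human error across n independent tasks with the same HEP."""
    return (1.0 - hep) ** n_tasks


def system_reliability(mission_hours, mtbf_hours, hep, n_tasks):
    """Series system: both the hardware and the operators must perform without failure."""
    return hardware_reliability(mission_hours, mtbf_hours) * human_reliability(hep, n_tasks)


# Hypothetical values: 100 h of operation, MTBF of 10,000 h, HEP of 0.001 over 50 manual tasks.
r_hw = hardware_reliability(100, 10_000)
r_human = human_reliability(0.001, 50)
r_system = system_reliability(100, 10_000, 0.001, 50)

print(f"hardware: {r_hw:.4f}  human: {r_human:.4f}  system: {r_system:.4f}")
```

With these assumed inputs the human term lowers system reliability more than the hardware term does, which is the point of the paragraph above.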

Qualitative Approaches

The aim of these methods is to describe what error might happen and how the potential error could be reduced. Reduction techniques include either enhancing detection and recovery of the situation or preventing the error by design or other actions. Some of the most-used qualitative tools include:

• Generic error modeling system (GEMS)
• Systematic human error reduction and prediction approach (SHERPA)
• Technique for retrospective analysis of cognitive errors (TRACEr)

Qualitative HRA methods are also referred to as second-generation methods. HRA methods originated a half-century ago, but most techniques have been developed since the mid-1980s. The first-generation methods are mostly behavioral approaches, while the second-generation methods (developed during the 1990s) are mostly conceptual. Over time, scientists gradually abandoned the quantitative approach in favor of greater attention to the qualitative assessment of human error. The focus shifted to the cognitive aspect of humans, that is, to the causes of errors rather than their frequency. Qualitative approaches include the relationship of human error with human factors, trying to explain how these human factors exert their effect on performance. Among the most popular and effectively used methods is THERP, which is described later in this chapter.

NASA Quantification of Human Error Rates for Mission Control Flight Controller

A 2010 study performed by NASA concluded that with well-trained flight controllers at mission control, the likelihood of errors in sending commands to the International Space Station (ISS) ranged from approximately 0.1 (10⁻¹) to 0.0001 (10⁻⁴). These errors included selecting the wrong procedure to use, or sending the wrong command to the ISS, and were affected by working conditions such as fatigue, time pressure, and cognitive overload (Edmonds 2016).

Measuring Human Error Probabilities in Drug Preparation: A Pilot Simulation Study

As mentioned in an earlier chapter, in a study by Garnerin et al. (2007), volunteer nurses and anesthetists from a Swiss hospital were used to analyze human error probabilities (HEP). It focused on two core activities, namely, the manual preparation of medications and the arithmetic necessary to prepare drugs. Its specific objectives were to evaluate whether HEPs could be high enough to be measurable and to determine whether these HEPs could be sensitive to individuals and task details. One experiment involved 30 nurses and 28 anesthetists who had to prepare medications for 20 patients and 22 syringes of various drugs, respectively. Both groups had to perform 22 calculations relating to the preparation of drugs. HEPs, distribution of HEPs, and dependency of HEPs on individuals and task details were assessed. The result of this experiment showed that in the preparation tasks, overall HEP was 3% for nurses and 6.5% for anesthetists.
In the arithmetic tasks, overall HEP was 23.8% for nurses and 8.9% for anesthetists. A statistically significant difference was noted between the two groups. In both preparation and arithmetic tasks, HEPs were dependent on individual nurses but not on individual anesthetists. In every instance, HEPs were dependent on task details. The study concluded that small-scale simulations represent an interesting way of generating HEPs. HEPs are, indeed, in the range of 0.01 (10⁻²) to 0.1 (10⁻¹). But in most cases, HEPs depend heavily on operators and task details.
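As a simple illustration of how an HEP is obtained from simulation data of this kind, the Python sketch below divides the number of observed errors by the number of opportunities for error, both overall and per operator. Only the figure of 22 calculations per participant comes from the study description; the operator names and error counts are hypothetical and are not the study's data.

```python
def hep(errors, opportunities):
    """Human error probability estimated as observed errors per opportunity."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return errors / opportunities


# Hypothetical results: (errors observed, calculations attempted) per operator.
observations = {
    "operator_a": (2, 22),
    "operator_b": (7, 22),
    "operator_c": (5, 22),
}

per_operator = {name: hep(e, n) for name, (e, n) in observations.items()}
overall = hep(
    sum(e for e, _ in observations.values()),
    sum(n for _, n in observations.values()),
)

for name, value in per_operator.items():
    print(f"{name}: HEP = {value:.3f}")
print(f"overall: HEP = {overall:.3f}")
```

A wide spread between the per-operator values, as in this made-up data set, is what the text means by HEPs being dependent on individuals.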

THERP

The technique for human error-rate prediction (THERP) is a technique used in the field of HRA for the purpose of evaluating the probability of a human error occurring throughout the completion of a specific task. From such analyses, measures can then be taken to reduce the likelihood of errors occurring within the system and therefore lead to an improvement in the overall levels of safety. There are three primary reasons for conducting an HRA:

1. Error identification
2. Error quantification
3. Error reduction

THERP models HEPs using a fault tree approach, in a similar way to an engineering risk assessment, but also accounts for performance shaping factors (PSFs) that may influence these probabilities. The probabilities for the human reliability analysis event tree (HRAET), which is the primary tool for assessment, are nominally calculated from the database developed by the authors Swain and Guttmann (1983), or local data may be used instead. The resultant tree portrays a step-by-step account of the stages involved in a task, in a logical order, and it simultaneously manages a number of different activities, including task analysis, error identification, and representation in the form of HRAET and HEP quantification.

The technique was developed in the Sandia Laboratories for the U.S. Nuclear Regulatory Commission. Its primary author is Allan D. Swain, who developed the THERP methodology. THERP relies on a large human reliability database that contains HEPs and is based on both plant data and expert judgments. The technique was the first approach in HRA to come into broad use and is still widely used in a range of applications even beyond its original nuclear setting. The methodology for the THERP technique is broken down into five main stages:

1. Define the system failures of interest.
These failures include functions of the system where human error has a greater likelihood of influencing the probability of a fault, and those of interest to the risk assessor; operations in which there may be no interest include those not operationally critical or those for which there are already safety countermeasures.

2. List and analyze the related human operations, and identify human errors that can occur and relevant human error recovery modes. This stage of the process necessitates a comprehensive task and human error analysis. The task analysis lists and sequences the discrete elements and information required by task operators. For each step of the task, the analyst considers the possible errors and defines them precisely. Such errors can be broken down into the following categories:

• Errors of omission. This involves leaving out a step of the task or the whole task itself.
• Errors of commission. This involves several different types of errors:
  • Errors of selection. This is an error in the use of controls or in issuing of commands.
  • Errors of sequence. The required action is carried out in the wrong order.
  • Errors of timing. The task is executed before or after when required.
  • Errors of quantity. There is an inadequate amount or an excess.

The opportunity for error recovery must also be considered, as this, if achieved, has the potential to drastically reduce the error probability for a task.

3. Estimate the relevant error probabilities. HEPs for each subtask are entered into the tree. It is necessary for all failure branches to have a probability; otherwise, the system will fail to provide a final answer. HRAETs provide the function of breaking down the primary operator tasks into finer steps, which are represented in the form of successes and failures. This tree indicates the order in which the events occur and also considers likely failures that may occur at each of the represented branches. The degree to which each high-level task is broken down into lower-level tasks is dependent on the availability of HEPs for the successive individual branches.
The HEPs may be derived from a range of sources, such as the THERP database, simulation data, historical accident data, and expert judgment. PSFs should be incorporated into these HEP calculations. The primary source of guidance for this is the THERP handbook. However, the analyst must use his/her own discretion when deciding the extent to which each of the factors applies to the task.

4. Estimate the effects of human error on the system failure events. With the completion of the HRA, the human contribution to failure can then be assessed compared to the results of the overall reliability analysis. This can be completed by inserting the HEPs into the full system’s fault event tree, which allows human factors to be considered within the context of the full system.

5. Recommend changes to the system and recalculate the system failure probabilities. Once the human factors contribution is known, sensitivity analysis can be used to identify how certain risks may be improved in the reduction of HEPs. Error recovery paths may be incorporated into the event tree, as this will aid the assessor when considering the possible approaches by which the identified errors can be reduced.
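Stages 3 to 5 can be sketched in a few lines of Python. This is a simplified illustration, not the THERP handbook procedure: it assumes that the subtasks are independent, that each PSF acts as a single multiplier on the nominal HEP, and that recovery is a single probability per subtask. The subtask names and all numbers are hypothetical and are not taken from the THERP tables.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """One branch point of a simplified event tree (all values are hypothetical)."""
    name: str
    nominal_hep: float           # baseline human error probability for the step
    psf_multiplier: float = 1.0  # assumed way of applying a performance shaping factor
    recovery_prob: float = 0.0   # chance that an error is caught and corrected

    def unrecovered_failure(self) -> float:
        adjusted = min(1.0, self.nominal_hep * self.psf_multiplier)
        return adjusted * (1.0 - self.recovery_prob)


def task_failure_probability(subtasks) -> float:
    """Task fails if any subtask fails without recovery (independence assumed)."""
    p_success = 1.0
    for subtask in subtasks:
        p_success *= 1.0 - subtask.unrecovered_failure()
    return 1.0 - p_success


steps = [
    Subtask("Select cleaning procedure", nominal_hep=0.003),
    Subtask("Perform cleaning", nominal_hep=0.01, psf_multiplier=2.0, recovery_prob=0.9),
    Subtask("Record cleaning in batch record", nominal_hep=0.005),
]

p_fail = task_failure_probability(steps)
print(f"P(task failure) = {p_fail:.5f}")
```

Changing one `psf_multiplier` or `recovery_prob` and recomputing is a crude form of the sensitivity analysis mentioned in stage 5: it shows which subtask offers the largest reduction in the overall failure probability.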

Advantages

• It is possible to use THERP at all stages of design. Furthermore, THERP is not restricted to the assessment of designs already in place, and due to the level of detail in the analysis, it can be specifically tailored to the requirements of a particular assessment.
• THERP is compatible with probabilistic risk assessments (PRAs); the methodology of the technique means that it can be readily integrated with fault tree reliability methodologies. The THERP process is structured and provides a logical review of the human factors considered in a risk assessment; this allows the results to be examined in a straightforward manner and assumptions to be challenged.
• The technique can be utilized within a wide range of different human reliability domains and has a high degree of validity.
• It is a unique methodology in the way that it highlights error recovery, and it also quantitatively models a dependency relationship between the various actions or errors.


Disadvantages

• THERP analysis is very resource intensive and may require a large amount of effort.
• Compared to some other human reliability assessment tools such as HEART, THERP is a relatively unsophisticated tool as the range of PSFs considered is generally low and the underlying psychological causes of errors are not identified.
• Large discrepancies have been found in practice regarding different analysts’ assessments of the risk associated with the same tasks. Such discrepancies may have arisen from either the process mapping of the tasks in question or in the estimation of the HEPs associated with each of the tasks using THERP tables compared to, for example, expert judgment or the application of PSFs.
• The methodology fails to provide guidance to the assessor on how to model the impact of PSFs and the influence of the situation on the errors being assessed.
• The THERP HRAETs implicitly assume that each subtask’s HEP is independent of all others.
• THERP is a “first generation” HRA tool, and in common with other such tools, has been criticized for not taking adequate account of the cognitive aspect of humans.

9 How to Reduce the Probability of Human Error

Human errors appear in many forms and have a wide variety of causes. It is therefore not surprising that no single, universally applicable error-reducing technique is available. Human reliability specialists must rely on a wide range of remedial tools to find the best, most appropriate method(s) for their immediate needs.

The study of error is basically an applied study. We would like to develop ways of reducing or eliminating errors, and of reducing or eliminating the impact of errors, on our manufacturing processes. We need to understand the nature of cognitive tasks and construct models of human–machine interfaces based on our knowledge of cognitive processes.

One of the main areas for improvement is the design of processes and systems. A properly designed system must take into account the properties of the people who use it. There should be no mismatches between the cognitive, psychological, social, and physiological characteristics of the human operators and the design and response characteristics of the system. In the 2013 revised and expanded edition of The Design of Everyday Things (first published as The Psychology of Everyday Things), Donald Norman explored how manmade objects and procedures offer affordance for error. The term affordance refers to the basic properties of objects that shape the way in which people react to them. He says:

Given the mismatch between human competencies and technological requirements, errors are inevitable. Therefore, the best designs take that fact as given and seek to minimize the opportunities for errors while also mitigating the consequences. Assume that every possible mishap will happen, so protect against them. Make actions reversible; make errors less costly.


Design principles for minimizing error affordances include:

• Put the knowledge required to operate the technology in the physical world. Don’t require that all the knowledge be in one’s head. Knowledge in the head is information contained in human memory. Knowledge in the world is information provided as part of the environment in which a task is performed. Mistake-proofing involves changing the physical attributes of a process, and mistake-proofing devices can usually be photographed. Mistake-proofing is one way of putting knowledge into the world.
• Use the power of natural and artificial constraints: physical, logical, semantic, and cultural; exploit the power of forcing functions and natural mapping.
• Bridge the two gulfs: the gulf of execution and the gulf of evaluation. Make things visible, both for execution and evaluation. On the evaluation side, provide feedback; make the results of the action apparent. Make it possible to determine the system’s status readily, easily, accurately, and in a form consistent with the person’s goals, plans, and expectations.
• Humans are generally proficient in detecting their own errors of action. Good design can help catch those errors (see the example of manufacturing documentation included in this chapter).
• The complete elimination of human error is as impossible as the complete elimination of machine failure; there will always be an irreducible residual. However, performance that is nearly error-free is certainly possible.

HIERARCHY OF ACTIONS

Organizations now employ a wide range of error-reducing and error-containing techniques, which fall into one of five main error-management strategies:

1. Error prevention aims at avoiding the occurrence of errors. This is possible only in some specific cases and, almost without exception, requires design-based solutions.
2. Error reduction aims at minimizing both the likelihood and magnitude of the error.
3. Error detection aims at making errors apparent as quickly and as clearly as possible, therefore enabling recovery. An error can be:
   a. Detected by the person who committed the error (self-monitoring)
   b. Cued by the environment
   c. Detected by another person
4. Error recovery aims at making it easy to rapidly recover the system to its safe state after an error has been committed.
5. Error tolerance aims at making the system as tolerant as possible toward error, for example, minimizing the consequences of errors.

Preventing errors is usually hard to achieve. Therefore, research efforts put more emphasis on error management. The first step in successful error management is to understand the nature of the errors experienced and the mechanisms behind them. Real solutions to human error require systemic improvements in the operation. One way consists of improving working conditions, procedures, and knowledge in order to reduce the likelihood of error and improve error detection. Another way is to build more error tolerance into the system; that is, limit the consequences of errors. Achieving such systemic solutions requires adopting an organizational focus on error management instead of focusing on the individuals committing the errors.

The following basic principles are arranged in descending order, from a high degree of effectiveness to virtually no effect in the reduction and control of human errors:

1. Eliminate the source of error.
   a. Design for error.
   b. Mistake-proof processes.
   c. Simplify the process, reduce handoffs.
   d. Standardize tasks.
2. Control the opportunity for error with physical barriers.
   a. Use constraints and forcing functions.
   b. Reduce reliance on memory.
3. Mitigate the consequences of an error.
4. Ensure detectability of errors before damage occurs.
5. Establish procedural guidance.
6. Provide instructions (written, brief, specific, clear). Remember that unsupervised employees do not always follow instructions.
7. Maintain supervisory control and monitor for errors.
8. Provide adequate training to provide general and specific information. However, training may be remote in time.
9. Have technical manuals and documentation available for reference.
10. Include warnings to alert for residual risks after the use of other remedies. This will be effective if well designed, read or heard, understood, and not disregarded.

THE CHECKLIST MANIFESTO: ANOTHER CHECKLIST? A checklist is essentially a list of action items arranged in sequential order that affords the user the ability to monitor task progress while the steps are being completed. Checklists can thus be considered a memory retrieval aid, providing guidance as to which steps must be completed to accomplish the task while simultaneously serving as a verification tool allowing the user to identify which steps remain outstanding in the task sequence.1 Existing literature is laden with research findings that suggest the importance of checklist usage in the workplace. Given that pilots and astronauts are exposed to inherently high-risk work environments, much of the research on the use of checklists has focused on the fields of aviation and aeronautics. Aviators consider the checklist as a form of flight protocol, which must be strictly adhered to. Failure to do so constitutes a protocol violation, or pilot error.2 Indeed, earlier studies have shown that 33% of jetliner accidents are caused by pilots deviating from established opera tional procedures.3 Checklists exist for various phases of a typical flight, including preflight checks, engine start checks, takeoff checks, as well as landing and shutdown checks. These ensure consistency and coherence in normal flight operations. Checklists have also been employed in emergency situations, allowing the flight crew to systematically address the problem without compromising safety.4 Given the importance of checklists in ensuring flight safety, it is no wonder why the aviation industry adopts a serious view on the use of checklists. Despite the success of checklist usage in error reduction, aircraft manufacturers have strived to constantly seek improvements to existing

How to Reduce the Probability of Human Error 203

checklist systems. For example, the implementation of the Boeing 777 Electronic Checklist further decreased errors by up to 46% when compared to previous paper-based versions of the checklist (Boorman 2001). Hence, it can be seen that the use of checklists in the aviation industry has led to improved work processes and best practice adherence, which have largely improved flight safety standards (Hales and Pronovost 2006). Similar findings have also surfaced in the medical and healthcare industry. One study5 found that using checklists in clinical care resulted in heightened compliance in various practices in response to patients admitted for acute myocardial infarction, leading to an improvement of key outcomes by as much as 55%. The use of checklists has also been found to reduce infection rates through the administration of proper antibiotics, as well as increased operation success rates through improved standardization of practices and better communication between staff (Gawande 2009). As with the aviation industry, checklists are poised to exert greater influence on work processes, amounting to more efficient practices in healthcare. It is important to consider exactly how checklists serve as cognitive aids in procedural tasks. It may explain the usefulness of checklists. Essentially, the grouping of information in an organized manner has been found to improve recall, a phenomenon commonly referred to as chunking.6 In the case of aviation, cockpit checks are broken down according to the different phases of flight, making it more manageable for the flight crew to handle all the necessary procedures without compromising safety. Additionally, instructions presented as a list are often better understood and remembered when compared to continuous prose. Thus, a checklist serves as a learning aid and a memory retrieval cue, allowing the user to execute the procedure in a more optimal and efficient manner. 
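The two roles described above (sequential guidance and verification of outstanding steps) can be illustrated with a small data structure. The following is a minimal sketch, not a description of any real cockpit or clinical system; the class name `Checklist` and the step names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """Ordered steps plus a record of which ones have been completed."""
    steps: list[str]
    done: set[int] = field(default_factory=set)

    def complete(self, index: int) -> None:
        """Mark a step as done, refusing to skip an unfinished earlier step."""
        if not 0 <= index < len(self.steps):
            raise IndexError(f"no step number {index + 1}")
        missing = [i for i in range(index) if i not in self.done]
        if missing:
            raise ValueError(f"complete step {missing[0] + 1} first")
        self.done.add(index)

    def outstanding(self) -> list[str]:
        """Verification view: the steps that still remain, in order."""
        return [s for i, s in enumerate(self.steps) if i not in self.done]


# Hypothetical flight-phase checks, loosely following the phases named above.
flight = Checklist(["Preflight check", "Engine start check", "Takeoff check"])
flight.complete(0)
print(flight.outstanding())
```

Keeping the completed steps separate from the step list is one possible design choice: the same list can be reused for every run of the task, while the `done` record belongs to a single run.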
One of my favorite books in the field of human error is The Checklist Manifesto: How to Get Things Right by Atul Gawande, published in 2009.7 In this book, Gawande explains how checklists help you focus on your work instead of remembering each upcoming task. This way, you save time and make sure you’re doing things right, with less risk of skipping any crucial steps. Gawande offers great insight, using real-world experiences, from the hospital’s operating rooms to environments with zero margins for error, such as flying a plane or constructing buildings. We humans are highly skilled, expertly trained, and hardworking. Yet our failures remain frequent. Simple mistakes (either the result of unforeseen complications or the failure to follow through on a task) cost companies millions of dollars a year. Sometimes, they even cause planes to fall out of the sky. The truth is, our overloaded brains can only hold on to so much information. The volume

204 Chapter 9

and complexity of what we know have exceeded our ability to consistently and correctly put it to use. Gawande believes that in an age of relentless technological complexity, where the most basic steps can easily be overlooked, a simple technique— making a process list to follow when completing a given task—can serve as a revolutionary tool to help us avoid error and get things right. In 2008, the World Health Organization agreed to work with Gawande’s thesis, bringing the checklist concept to eight hospitals around the world. During the next six months, deaths fell by 47% and post-surgical complications dropped by 36%. “In medicine,” Gawande writes, the issue is “making sure we apply the knowledge we have consistently and correctly.” Failure isn’t from ignorance, it’s from not properly applying what we know. Maybe you’re not sure that a checklist would be useful in your work environment. However, one thing Gawande’s research makes abundantly clear: from medicine to homeland security to investment banking to building construction, checklists can be a game changer. In addition to the book mentioned in this chapter, Atul Gawande is a very prolific author of very interesting books. Some of them are included in the bibliography section for the curious reader.

MISTAKE-PROOFING

Mistake-proofing is the use of process or design features to prevent errors or the negative impact of errors. Mistake-proofing is also known as poka-yoke, Japanese slang for “avoiding inadvertent errors.” Shigeo Shingo8 formalized mistake-proofing as part of his contribution to the production system for Toyota automobiles. Today, many manufacturing organizations have embraced mistake-proofing concepts to reduce human errors. Although it was formalized by Japanese manufacturers in the 1960s (and published in English in the 1980s), mistake-proofing did not start in Japan, and its utility was not limited to factories. Inventors, designers, and problem-solvers led by common sense implemented mistake-proofing devices long before the 1960s. There are four basic approaches to mistake-proofing:

1. Mistake prevention in the work environment
2. Mistake detection (Shingo’s informative inspection)
3. Mistake prevention (Shingo’s source inspection)
4. Preventing the influence of mistakes

Let’s explore each of these four approaches in greater detail.

Mistake Prevention in the Work Environment

This approach involves reducing complexity, ambiguity, vagueness, and uncertainty in the workplace. An example is to provide only the set of instructions for the task to be performed rather than many different instructions applying to different stages or phases of the manufacturing process. When only the applicable set of instructions is provided, workers cannot accidentally read inappropriate or incorrect instructions. In another example, similar items with right-hand and left-hand orientations can sometimes lead to wrong-side errors. If the design can be altered and made symmetrical, no wrong-side errors can occur; whether the part is mounted on the left or right side, it is always correct. The orientation of the part becomes inconsequential. Likewise, any simplification of the process that eliminates a process step ensures that none of the errors associated with that step can ever occur again.

Norman (1998; 2013) suggests several process design principles that make errors less likely. He recommends avoiding wide and deep task structures. The term “wide structures” means that there are lots of alternatives for a given choice, while “deep structures” means that the process requires a long series of choices. Humans can perform either moderately broad or moderately deep task structures relatively well. Humans have more difficulty if tasks are both moderately broad and moderately deep, meaning there are lots of alternatives for each choice, and many choices to be made. Task structures that are very broad or very deep can also cause difficulties. More of Norman’s recommendations are summarized in Table 9.1.

Table 9.1 Control strategies.

Strategy | Action
Natural mappings | Design one-to-one physical correspondence between the arrangement of controls and the objects being controlled.
Affordances | Provide guidance about the operation of an object by providing features that allow or afford certain actions.
Visibility | Make observation of the relevant parts of the system possible.
Feedback | Give each action an immediate and obvious effect.
Constraints | Provide design features that either compel or exclude certain actions. Constraints may be physical, semantic, cultural, or logical in nature.

Another method of mistake prevention in the work environment is the implementation of “visual systems,” also known as 5S. The term comes from Japanese manufacturing, in which the 5Ss are seiri (organization), seiton (orderliness), seiso (cleanliness), seiketsu (standardization), and shitsuke (discipline). Visual systems involve sharing information in the work environment visually. Individuals in the work environment should be able to “know by looking.” A visual workplace is a work environment that is self-ordering, self-regulating, and self-improving because of visual devices.

1. Seiri (organization) focuses on removing unneeded items from the workplace. Items that are actually used all the time are segregated from those that are superfluous. Unneeded items are tagged and removed to a holding area to await alternate allocation or disposal.
2. Seiton (orderliness) involves arranging needed items so they are easy to find, use, and put away. Often, the focus of these efforts is to minimize motion.
3. Seiso (cleanliness) involves making sure the workplace is clean and stays clean on a daily basis. This step reduces the visual “noise” that would impede communication.
4. Seiketsu (standardization) focuses on maintaining and institutionalizing organization, orderliness, and cleanliness. It includes preventive steps that reduce the effort required to maintain the improvements already made.
5. Shitsuke (discipline) involves avoiding a return to the comfortable behavior of the past. It focuses on aligning the culture and habits of the organization with its new approach to organizing work.

Mistake Detection

Mistake detection identifies process errors found by inspecting the process after actions have been taken. Often, immediate notification that a mistake has occurred is sufficient to allow remedial actions to be taken to avoid harm. Shingo called this type of inspection informative inspection. The outcome or effect of the problem is inspected after an incorrect action or omission has occurred. Informative inspection can also be used to reduce the occurrence of incorrect actions. This can be accomplished by using data acquired from the inspection to control the process and inform mistake-prevention efforts. Another informative inspection technique is statistical process control (SPC). SPC is a set of methods that uses statistical tools to detect whether the observed process is being adequately controlled. To learn more, I recommend the book Mistake-Proofing for Operators: The ZQC System, published by Productivity Press and based on the work of Dr. Shingo.9

Shingo identifies two other informative inspection techniques: successive checks and self-checks. Successive checks consist of inspections of previous steps as part of the process. Self-checks employ mistake-proofing devices to allow workers to assess the quality of their own work. Self-checks and successive checks differ only in who performs the inspection. Self-checks are preferred to successive checks because feedback is more rapid.

Not all mistake-proofing is equally useful. Mistake prevention is usually preferred to mistake detection. Similarly, forced control, shutdown, warning, and sensory alert are preferred, in that order. The preferred devices tend to be those that are the strongest and require the least attention and the least discretionary behavior by users. Table 9.2 depicts a comparison between mistake prevention and mistake detection controls.
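As an illustration of the SPC idea mentioned above, the sketch below derives control limits from a baseline of in-control measurements and flags later measurements that fall outside them. It is a simplified example under stated assumptions: the limits are set at three sample standard deviations around the baseline mean (practical individuals charts often estimate variation from moving ranges instead), and the function names and measurement values are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower limit, center line, upper limit) from baseline data.

    Assumption: limits are the baseline mean plus or minus three sample
    standard deviations.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(samples, lower, upper):
    """Return the positions of samples that fall outside the control limits."""
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical fill weights (grams) from a period when the process ran well.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# New measurements: the second one lies above the upper control limit.
print(out_of_control([10.1, 10.6, 9.9], lower, upper))
```

A flagged point is a signal to investigate the process, not proof that an operator made an error; the data feed the mistake-prevention efforts described above.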

Mistake Prevention

Mistake prevention identifies process errors found by inspecting the process before taking actions that would result in quality defects. The word inspection as it is used here is broadly defined. The inspection could be accomplished by physical or electronic means with or without human involvement. Shingo called this type of inspection “source inspection.” The source or cause of the problem is inspected before the effect, an incorrect action or omission, can actually occur. Donald Norman’s concept of forcing functions is also included in mistake prevention. He calls them forcing functions because they are designed to force, or ensure, that correct actions occur.

Preventing the Influence of Mistakes

Preventing the influence of mistakes means designing processes so the impact of errors is reduced or eliminated. This can be accomplished by facilitating correction or by decoupling processes.

Table 9.2 Comparison of mistake prevention and mistake detection controls.

Regulator function | Mistake prevention | Mistake detection
Forced control | Physical shape and size of object or electronic controls detect mistakes being made and stop them from resulting in incorrect actions or omissions. | Physical shape and size of object or electronic controls detect incorrect actions or omissions before they can cause harm.
Shutdown | The process is stopped before mistakes can result in incorrect actions or omissions. | The process is stopped immediately after an incorrect action or omission is detected.
Warning | A visual or audible warning signal is given that a mistake or omission is about to occur. Although the error is signaled, the process is allowed to continue. | A visual or audible warning signal is given that a mistaken action or omission has just occurred.
Sensory alert | A sensory cue signals that a mistake is about to be acted on or an omission made. The cue may be audible, visible, or tactile. Taste and smell have not proved to be as useful. Sensory alerts signal mistakes but allow the process to continue. | A sensory cue signals that a mistake has just been acted on or an omission has just occurred.

Facilitating correction could include finding easy and immediate ways of allowing workers to reverse the errors they commit. While doing things right the first time is still the goal, effortless error corrections can often be nearly as good as not committing errors at all. This can be accomplished through planned responses to errors or the immediate reworking of processes. Typewriters became obsolete technology because typing errors are so much more easily corrected on a computer. Errors that once required retyping an entire page can now be corrected with a couple of keystrokes. Software that offers “undo” and “redo” capabilities also facilitates the correction of errors. Word processors also include an auto-correct function. These features significantly increase the effectiveness of users. They did not come into being accidentally but are the result of intentional, purposeful design efforts based on an understanding of the errors that users are likely to make.

Automotive safety has been enhanced by preventing the influence of mistakes. Air bags do not stop accidents. Rather, they are designed to minimize injuries experienced in an accident. Antilock brakes also prevent the influence of mistakes by turning a common driving error into the correct action. Prior to the invention of antilock brakes, drivers were instructed not to follow their instincts and slam on the brakes in emergencies. To do so would increase the stopping distance and cause accidents due to driver error. Pumping the brakes was the recommended procedure. With antilock brakes, drivers who follow their instincts and slam on the brakes are following the recommended emergency braking procedure. What once was an error has become the correct action. Decoupling means separating an error-prone activity from the point at which the error becomes irreversible. Software developers try to help users avoid deleting files they may want later by decoupling. Pressing the delete button on an unwanted email or computer file does not actually delete it. The software merely moves it to another folder named “deleted items,” “trash can,” or “recycling bin.” If you have ever retrieved an item that was previously “deleted,” you are the beneficiary of decoupling. Regrettably, this type of protection is not yet available when saving work. Files can be overwritten, and the only warning may be a dialogue box asking, “Are you sure?”
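The decoupling described above, where pressing delete only moves an item to a holding folder, can be sketched in a few lines. This is an illustrative model only, not the behavior of any particular email or file system; the class name `Mailbox` and its methods are hypothetical.

```python
class Mailbox:
    """Toy store in which deleting is decoupled from irreversible removal."""

    def __init__(self):
        self.items = {}   # live items, keyed by name
        self.trash = {}   # "deleted items" that can still be restored

    def add(self, name, content):
        self.items[name] = content

    def delete(self, name):
        """The error-prone action: it only moves the item to the trash."""
        self.trash[name] = self.items.pop(name)

    def restore(self, name):
        """Recover from a mistaken delete."""
        self.items[name] = self.trash.pop(name)

    def empty_trash(self):
        """The irreversible step, kept separate and deliberate."""
        self.trash.clear()


box = Mailbox()
box.add("batch-report", "QC results")
box.delete("batch-report")      # the mistaken delete
box.restore("batch-report")     # the mistake has no lasting effect
print(sorted(box.items))
```

The point of the design is that the frequent action (`delete`) is reversible, while the irreversible one (`empty_trash`) requires a separate, intentional step.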

EXAMPLES OF EFFECTIVE ACTIONS

Following are some real examples of very effective (and simple) actions taken to address very frequent errors. Information contained in these examples was collected from some of my customers in the medical product manufacturing industry, both within and outside the United States.

Redesign of Documentation Forms

A company had frequent deviations during the documentation of manufacturing activities. The required information was either missed or incorrectly documented, affecting practically all batches. On average, there were 3.7 documentation errors per batch. Those documents contained a mix of instructions and spaces to document activities. The pages were very cluttered with text, making it difficult to decide where to record the required information. In some cases, there was a requirement to document an activity, but no physical space was provided in the batch record to do it, resulting in missing necessary information. My work was the redesign of these batch records, starting with a basic, common template where all spaces for required

information were placed on the right side of the paper, each one within a light-gray-colored box. The location (at the right side) and the form (a colored box) helped to easily identify where to document the required information. It also helped reviewers and process auditors to easily identify any missing information. If documents could be printed in color, it was recommended to fill these boxes with a yellow or light blue color to make them even more poka-yoke.

Improving Laboratory Documentation

A quality control laboratory within a small manufacturing company had frequent deviations in its documentation of a test performed on each batch of product. Omission and commission errors (see Table 9.3) plagued the process. Correcting these errors not only significantly delayed the approval and release of each batch but also represented an important compliance risk, given the intense oversight the FDA places on pharmaceutical laboratory activities. Our solution was to design a form in Excel that would be used to document the QC activities for each batch. The spreadsheet format allows for simple but powerful controls that avoid omissions and practically eliminate commission errors. Once the quality control activities were completed, the Excel form was printed, verified, and manually signed. Omission errors were eliminated because if any required field on the form was left empty, the form could not be printed (poka-yoke feature). The potential for incorrect information was also minimized by using drop-down menus to select information (for example, the pieces of laboratory equipment, such as scales, pH meters, and other equipment).
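The two controls described above (a form that cannot be printed while a required field is empty, and drop-down menus that restrict entries to known values) can be sketched outside of Excel as follows. This is not the actual spreadsheet used in the example; the field names, equipment identifiers, and function names are hypothetical and serve only to show the logic.

```python
# Hypothetical required fields and drop-down choices for a QC test record.
REQUIRED_FIELDS = ("technician", "balance_id", "ph_meter_id")
DROP_DOWN_CHOICES = {
    "balance_id": {"BAL-01", "BAL-02"},
    "ph_meter_id": {"PH-01"},
}


def validation_errors(form):
    """List every omission (empty field) and every entry not on its menu."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = form.get(name)
        if value is None or not str(value).strip():
            errors.append(f"{name}: required field is empty")
    for name, allowed in DROP_DOWN_CHOICES.items():
        value = form.get(name)
        if value and value not in allowed:
            errors.append(f"{name}: {value!r} is not on the drop-down list")
    return errors


def print_form(form):
    """Poka-yoke step: refuse to produce the printout while errors remain."""
    errors = validation_errors(form)
    if errors:
        raise ValueError("cannot print: " + "; ".join(errors))
    return "\n".join(f"{name}: {value}" for name, value in form.items())


record = {"technician": "A. Perez", "balance_id": "BAL-01", "ph_meter_id": "PH-01"}
print(print_form(record))
```

Blocking the printout is a forced control in the sense of Table 9.2: the omission is stopped before it can reach the signed record, rather than being caught later by a reviewer.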

Rework Information

In some cases, manufacturing process instructions include instructions and space to document the potential rework of the product. If the specific batch is not reworked, then the space is crossed out as nonapplicable. I was struck several years ago by the case of a company in which the batch record for its blockbuster drug included 45 pages to document potential rework. It brought the total page count to almost 300. The rework process was rarely used (less than 0.5% of the batches were reworked), but those pages were a constant source of deviations due to pages being missing, inappropriately crossed out, or mistakenly used to document the regular manufacturing process of the batch. The solution to the problem was easy: rework pages were removed from the batch record and included as part of the rework procedure that is used only when the batch needs to be reworked.

Table 9.3 Omission and commission errors in laboratory documentation.

Omission errors
• Lack of identification of pieces of equipment (pipettes, balances, pH meters, and so on) used to perform tests
• Lack of identification of the QC technician performing the task
• Lack of identification of materials used to perform tests (standards, raw chemicals, and so on)

Commission errors
• Incorrect identification of equipment
• Incorrect identification of materials used to perform tests
• Use of expired solutions and materials
• Use of noncalibrated pieces of equipment

To finish this chapter, we will discuss three documents related to human error reduction in the healthcare field. The first one is the report from the U.S. Institute of Medicine (IOM) “To Err is Human,” and the recommendations it presented to reduce error.10 The other two studies cover the critical field of preventing drug administration errors during anesthesia.

The message in “To Err is Human” is that preventing death and injury from medical errors requires dramatic, systemwide changes. Among three important strategies (preventing, recognizing, and mitigating harm from error), the first one (recognizing and implementing actions to prevent error) has the greatest potential effect, just as in preventive public health efforts. Opportunities to improve safety have been drawn from numerous disciplines such as engineering, psychology, and occupational health. The IOM report brought together what had been learned in these fields and then applied the opportunities to healthcare, as described in the nine categories that follow:

1. User-centered design
2. Avoid reliance on memory
3. Attend to work safety
4. Avoid reliance on vigilance
5. Train concepts for teams
6. Involve patients in their care
7. Anticipate the unexpected
8. Design for recovery
9. Improve access to accurate, timely information

1. User-Centered Design

Understanding how to reduce errors depends on framing likely sources of error and pairing them with effective ways to reduce them. The term “user-centered design” refers to design that builds on human strengths and avoids human weaknesses in processes and technologies. The first strategy of user-centered design is to make things visible (including the conceptual model of the process) so the user can determine what actions are possible at any given moment, for example, how to return to an earlier step, how to change settings, and what is likely to happen if a step in a process is skipped. Another principle is to incorporate affordances, natural mappings, and constraints into healthcare. Although the terms are unfamiliar, their meaning can be applied surprisingly easily to common, everyday tasks, both in and out of the workplace.

An affordance is a characteristic of equipment or work space that communicates how it is to be used, such as a push bar on an outward opening door that shows where to push, or a telephone handset that is uncomfortable to hold in any but the correct position. Marking the correct limb before surgery is an affordance that has been widely adopted. Natural mapping refers to the relationship between a control and its movement, for example, in steering a car to the right, one turns the wheel right. Other examples include using louder sound or a brighter light to indicate a greater amount.

Constraints and forcing functions guide the user to the next appropriate action or decision. A constraint makes it hard to do the wrong thing. A forcing function makes it impossible to do the wrong thing. For example, one cannot start a car that is in gear. Forcing functions include the use of special luer locks for syringes and indwelling lines that must be matched before fluid can be infused, and different connections for oxygen and other gas lines to prevent their being inadvertently switched.
Removing concentrated potassium chloride from patient units is a (negative) forcing function because it should never be administered undiluted, and preparation should be done in the pharmacy.
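The difference between a warning and a forcing function can be shown with the car example mentioned above. The sketch below is purely illustrative; the class name `Car` and the gear names are hypothetical, and real interlocks are implemented in hardware and vehicle control software rather than in code like this.

```python
class Car:
    """Toy model of a start interlock acting as a forcing function."""

    SAFE_GEARS = ("park", "neutral")

    def __init__(self, gear="park"):
        self.gear = gear
        self.running = False

    def start(self):
        # Forcing function: the wrong action is made impossible, not merely
        # discouraged. The engine cannot start while the car is in gear.
        if self.gear not in self.SAFE_GEARS:
            raise RuntimeError("shift to park or neutral before starting")
        self.running = True


car = Car(gear="drive")
try:
    car.start()
except RuntimeError as err:
    print(err)          # the start attempt is blocked
car.gear = "park"
car.start()
print(car.running)
```

A warning light would leave the decision to the driver; the interlock removes the decision entirely, which is why forced control ranks above warnings in Table 9.2.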

2. Avoid Reliance on Memory

The next strategy is to standardize and simplify the structure of tasks to minimize the demand on working memory, planning, or problem-solving, including the following two elements:

Standardize processes and equipment. Standardization reduces reliance on memory and allows newcomers who are unfamiliar with a given process or device to perform the process or use a device safely. For example, standardizing device displays (for example, readout units), operations, and doses is important to reduce the likelihood of error. Other examples

of standardizing include standard order forms, administration times, prescribing protocols, and types of equipment. When devices or medications cannot be standardized, they should be clearly distinguishable. For example, one can identify look-alike, but different, strengths of a narcotic by labeling the higher concentration in consistent ways, such as by shape and prominent labeling. When developed, updated, and used wisely, protocols and checklists can enhance safety. Protocols for the use of anticoagulants and perioperative antibiotics have gained widespread acceptance. Laminated dosing cards that include standard order times, doses of antibiotics, formulas for calculating pediatric doses, and common chemotherapy protocols can reduce reliance on memory.

Simplify key processes. Simplifying key processes can minimize problem-solving and greatly reduce the likelihood of error. Simplifying includes reducing the number of steps or handoffs that are needed. Examples of processes that can usually be simplified are writing an order, then transcribing and entering it in a computer, or having several people record and enter the same data in different databases. Other examples of simplification include limiting the choice of drugs and dose strengths available in the pharmacy, maintaining an inventory of frequently prepared drugs, reducing the number of times a day a drug is administered, keeping a single medication administration record, automating dispensing, and purchasing equipment that is easy to use and maintain.

3. Attend to Work Safety

Conditions of work are likely to affect patient safety. Factors that contribute to worker safety in all industries studied include work hours, workload, staffing ratios, sources of distraction, and shift changes (which affect one’s circadian rhythm). Systematic evidence about the relative importance of various factors is growing, with particular emphasis on nurse staffing.

4. Avoid Reliance on Vigilance

Individuals cannot remain vigilant for long periods of time. Approaches for reducing the need for vigilance include providing checklists and requiring their use at regular intervals, limiting long shifts, rotating staff, and employing equipment that automates some functions. The need for vigilance can be reduced by using signals such as visual and auditory alarms. Also, well-designed equipment provides information about the reason for an alarm. There are pitfalls in relying on automation, such as if a user learns to ignore alarms that are often wrong, becomes inattentive or

inexpert in a given process, or if the effects of errors remain invisible until it is too late to correct them.

5. Train Concepts for Teams

People work together throughout healthcare systems in multidisciplinary teams, whether in a practice, for a clinical condition, or in operating rooms, emergency departments, or ICUs. In an effective interdisciplinary team, members come to trust one another’s judgments and expertise and attend to one another’s safety concerns. Team training in labor and delivery and hospital rapid response teams are examples. The IOM committee believed that whenever it is possible, training programs and hospitals should establish interdisciplinary team training.

6. Involve Patients in Their Care

Whenever possible, patients and their family members or other caregivers should be invited to become part of the care process. Clinicians must obtain accurate information about each patient’s medications and allergies and make certain this information is readily available at the patient’s bedside. In addition, safety improves when patients and their families know their condition, treatments (including medications), and technologies that are used in their care. At the time of discharge, patients should receive a list of their medications, dosages, dosing schedule, precautions about interactions, possible side effects, and any activities that should be avoided, such as driving. Patients also need clear written information about the next steps after discharge, such as follow-up visits to monitor their progress and whom to contact if problems or questions arise.

Family caregivers deserve special attention in terms of their ability to provide safe care, manage devices and medication, and safely respond to patient needs. Yet they may themselves be affected by physical, health, and emotional challenges, lack of rest or respite, and other responsibilities (including work, finances, and other family members).

Attention is now being given to problems resulting from a lack of patient and family health literacy. For example, information may be too complex to absorb, or it may be presented in language that is unfamiliar (even to educated, English-speaking patients) and frightening. A simple example is rapidly giving instructions on home care of a Foley catheter when, as often occurs, the patient is being discharged shortly after surgery and knows nothing about sterile technique or the design of the device. Another ubiquitous example is the warnings

and dosage information on medication bottles, which many patients cannot understand how to apply.

7. Anticipate the Unexpected

The likelihood of error increases with reorganizations, mergers, and other organizationwide changes that result in new patterns and processes of care. Some technologies, such as computerized physician order entry systems, are engineered specifically to prevent error. Despite the best intentions of designers, however, all technology introduces new errors, even when its sole purpose is to prevent errors. Indeed, future failures cannot be forestalled by simply adding another layer of defense against failure. Safe equipment design and use depend on a chain of involvement and commitment that begins with the manufacturer and continues with careful attention to the vulnerabilities of a new device or system. Healthcare professionals should expect any new technology to introduce new sources of error. They should adopt the custom of automating cautiously, always alert to the possibility of unintended harm, and should test these technologies with users and modify them as needed before widespread implementation.

8. Design for Recovery

The next strategy is to assume that errors will occur and to design and plan for recovery by duplicating critical functions and making it easy to reverse operations and hard to carry out nonreversible ones. If an error occurs, examples of strategies to mitigate injury are keeping antidotes for high-risk drugs up to date and easily accessible and having standardized, well-rehearsed procedures in place for responding quickly to adverse events. Another strategy is to use simulation training, where learners practice tasks, processes, and rescues in lifelike circumstances using models or virtual reality.

9. Improve Access to Accurate, Timely Information

The final strategy for user-centered design is to improve access to information. Information for decision-making (for example, patient history, medications, and current therapeutic strategies) should be available at the point of patient care. Examples include putting lab reports and medication administration records at the patient’s bedside and putting protocols in the patient’s chart. In a broader context, information is coordinated over time and across settings.

Another example is the study published in 2004 by Jenson and colleagues11 that recommended a 12-point strategy to prevent medication errors during anesthesia and critical care:

1. The label on any drug ampoule or syringe should be read carefully before the drug is drawn up or injected.
2. Legibility and contents of labels on ampoules and syringes should be optimized according to agreed-on standards with respect to font, size, color, and information.
3. Syringes should always be labeled.
4. Formal organization of drug drawers and work space should be used, with attention to tidiness, position of ampoules and syringes, separation of look-alike drugs, and removal of dangerous drugs from the operating room.
5. Labels should be checked specifically with the help of a second person or a device like a barcode reader before administration.
6. Error during administration should be reported and reviewed.
7. Management of inventory should focus on minimizing the risk of drug error.
8. Look-alike packaging and presentation of the drug should be avoided where possible.
9. Drugs should be presented in prefilled syringes rather than ampoules.
10. Drugs should be drawn up and labeled by the anesthesia provider himself or herself.
11. Color coding by class of drugs should be according to an agreed-on national or international standard.
12. Coding of syringes according to position or size should be done.

In another study published in 2010, Kothari et al.12 presented a dozen measures to promote safe drug administration during anesthesia and critical care:

1. The provision of all labels in a standardized format to emphasize the class and generic name of each drug incorporating the barcode and class-specific color code as per international standard
2. The use of a barcode reader to scan the drug at the point of administration immediately before it is given, linked to an auditory prompt to facilitate checking of the drug identity
3. Integration of scanned information into an automated anesthesia record and reducing the cognitive load on the anesthetist
4. The use of devices at the point of care to automatically measure the dose of the administered drug
5. A dosing nomograph on the infusion syringe label to avoid the need for lookup tables or dose calculations
6. An automated medication dispensing system with features such as single-issue drawers and barcode scanners to facilitate safer dispensing of drugs
7. Elimination of extended physician work schedules
8. Computerized physician order entry
9. Implementation of a support system for clinical decisions
10. Computerized intravenous devices
11. Active participation of pharmacists in the ICU
12. Medication reconciliation

All these examples share the same basic components, from mistake-proofing to avoiding multitasking and reliance on memory or vigilance.

10 Selected Topics

To conclude, I’d like to discuss various topics that address the use of retraining in the CAPA context. The definition of retrain is to train again. Every time I see retraining under the corrective or preventive action sections of CAPA documentation, I ask myself the same questions: “What is the difference between this (re)training and the original training? If a person did not follow a procedure, why is retraining the solution?” Perhaps it would be better to determine why the procedure was not followed.

HUMAN ERRORS AND RETRAINING

If retraining is the corrective action, the original training must be the root cause of the problem we are trying to fix (remember the definition of root cause). In other words, our original training was not effective. If we retrain with the same content, the same instructor, and the same conditions, why would it be effective this time? The root cause of lapses and slips (see Chapter 7) is rarely associated with training.

A hypothetical example can illustrate this: One operator forgot to perform a step during the manufacture of a batch. The same employee prepared dozens of batches of the same product during the previous months, the last one only three days ago. Due to the “error” he made today, he receives retraining. Does he really need retraining? I think not. Training or retraining is the appropriate corrective action when the human error is a knowledge-based mistake (see Chapter 2).

Sometimes, staff retraining is misused as a preventive action for such incidents. It cannot be a preventive action because, at best, it avoids recurrence of the incident, which is the definition of corrective action. Discussing the situation with other employees to make them aware of it can be considered a preventive action because it tries to prevent the (first-time) occurrence of the situation (for those other employees).


During our training sessions, we use the analogy that the overuse of the human error and retraining combination is killing our training system because we are assigning blame to our training system for all those human failures without objective evidence. The word retraining is often replaced by refresher, awareness, counseling, orientation, and so on, but all of them point to the same inefficient and inadequate corrective or preventive actions. This reliance on the human error and retraining combination is not adequate. Human errors are sharp indicators of underlying problems in the quality system that cannot (and will not) be properly solved by retraining. Therefore, my recommendation is to think twice the next time you are about to conclude that a person made an error by not following a procedure and that retraining will prevent the same situation from recurring.

WORKING FROM MEMORY

The ability to remember information is an essential part of human information over a long period of time. The memory necessary for operating procedures differs from memory for recent events, such as remembering a grocery shopping list. Three distinct memory storage subsystems exist:

1. Sensory memory holds information from the eyes (iconic memory) and ears (echoic memory) for a very short period of time, less than a second, after which it must enter short-term memory or be lost.1
2. Short-term memory deals with events that have just recently occurred (within seconds or minutes). It stores transitory information temporarily and almost immediately recalls it to make decisions. For example, after reading a phone number, the digits are dialed immediately.
3. Long-term memory stores information for a very long period of time without the need for practice. It involves the integration and recall of information acquired over longer periods of experience, practice, and training. The total amount of information that can be retained in the long-term memory is very large, although there are marked individual differences. The ability to recall also deteriorates with age.

When you walk through manufacturing or quality control (QC) areas, you rarely see operators or analysts reading procedures or working from instructions. Frequently, they work from memory, which is problematic because our memories fail very often. When working from memory, operators must remember sequences of operations, the meaning of different stimuli, and the responses to make. Memory gaps and errors are frequent and can occur for several reasons. As the retention interval between memorizing and retrieval of the memory lengthens, there is an increase in both the amount of information that is forgotten and the likelihood of a memory error occurring.

Lack of attention and lapses of memory play a significant role in all categories of human errors. Slips, lapses, and mistakes are all more common when situational factors divert our attention. However, in the regulated industries, these factors should be negligible because we are not supposed to rely on our memory to remember how to do things. Batch records, device master records, and device history files exist for one purpose.

It has been found that increased working memory load leads to higher error rates.2 Moreover, specific error types have also been identified. In procedures that require subtasks to be completed in a specified order, interruptions increase the likelihood that these subtasks are executed in the wrong order, and such errors have been named sequence errors. Various types of sequence errors exist, depending on the nature of the actions executed following an interruption.3 Perseveration errors occur when a previous step is repeated, whereas anticipation errors occur when the next correct step to be executed is skipped altogether. Some procedures also require a post-completion step to be executed at the end of the task sequence, and it has been found that people are more likely to omit this step if they are interrupted just prior to its execution.4

As discussed in Chapter 2, retrieval failures are one of the three categories of memory lapses that can occur. They are among the most frequent ways that our memory can let us down. And it only becomes worse with age. This type of memory failure can show itself as tip-of-the-tongue situations when you realize that you cannot remember a name or a word that you know you know. The read–execute–document way of work is a simple but powerful solution for this type of memory failure situation.

Errors Following Interruptions

An additional failure to carry out a necessary check on progress can be caused by some local distraction. On many occasions, the interruption causes the person to forget the subsequent actions or allows him/her to get diverted into something else. The situation is very similar to a multitasking situation; the person cannot handle simultaneously the current task and the interruption. A study found that, on average, workers are interrupted every 11 minutes and then spend almost one-third of their day trying to recover from these distractions. External aids such as checklists are very useful during task execution to mitigate the effect of interruptions when workers are working from memory.
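As a minimal sketch of how an external aid can support the read–execute–document way of work, the following Python example models a checklist that accepts steps only in their specified order. The `Checklist` class, its method names, and the step names are hypothetical illustrations, not part of any real system. The labels follow the definitions given earlier: repeating a previous step is reported as a perseveration error, and attempting a later step while the next correct step is still pending is reported as an anticipation error.

```python
class Checklist:
    """Ordered checklist: a step can be documented only when it is the next one due."""

    def __init__(self, steps):
        self.steps = list(steps)   # assumption: step names are unique
        self.documented = []       # steps already executed and documented

    def next_step(self):
        """Return the step to read and execute next, or None when finished."""
        if len(self.documented) < len(self.steps):
            return self.steps[len(self.documented)]
        return None

    def document(self, step):
        """Record a step; return 'ok' or the kind of sequence error detected."""
        if step not in self.steps:
            raise ValueError(f"unknown step: {step!r}")
        if step == self.next_step():
            self.documented.append(step)
            return "ok"
        if step in self.documented:
            return "perseveration"   # a previous step is being repeated
        return "anticipation"        # the next correct step would be skipped

    def is_complete(self):
        """True only after the last (post-completion) step is documented."""
        return self.next_step() is None


# Usage with hypothetical step names: an interruption occurs after the first step.
batch = Checklist(["weigh ingredient", "add to tank", "mix 10 min", "sign record"])
results = [
    batch.document("weigh ingredient"),  # done in order
    batch.document("weigh ingredient"),  # repeated after the interruption
    batch.document("mix 10 min"),        # tries to skip "add to tank"
]
```

After the interruption, the operator does not have to recall where the task stopped: `batch.next_step()` still points at the pending step, and `batch.is_complete()` stays false until the final step has been documented.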

MULTITASKING AND HUMAN ERRORS

As someone once said, “Multitasking is merely the opportunity to screw up more than one thing at a time.” Multitasking is a contradictory concept. Almost everyone accepts it as an effective way of working, and many job descriptions include it as one of the most desirable behaviors. But the reality is that multitasking is neither effective nor efficient. There are multiple scientific papers that prove that multitasking is actually a very inefficient way to do things, and it is at the core of many errors and mistakes.

Multitasking means combining two or more activities, potentially causing at least one to receive inadequate attention. When you try to do two things at the same time, you will not be able to do either one well. The term multitasking is actually a misnomer. People can’t actually do more than one task at a time. Instead, we switch between tasks. Therefore, the term that should be used is “task switching.”

The multitasking concept was developed to describe computers, not people. But not even a CPU multitasks; it merely switches back and forth between tasks several thousand times per second, thus giving the illusion that everything is happening simultaneously. Today it’s interpreted to mean multiple tasks being done simultaneously by a person. Let’s be clear: a majority of individuals can actually do two or more things at once, such as walk and chew gum, as the saying goes. Yes, we can walk and chew gum, but not much else.

The military even has an expression for it: they call it “task saturation,” which is trying to do too many things at one time. Humans cannot focus on two or more things at once. Unlike computers, we cannot make partitions in our brains to work on different tasks at the same time. We simply are not wired for that. Repeatedly dropping and picking up a task results in greater mental fatigue and more errors than deep immersion in a single task.

When we are distracted, our brain processes and sorts information ineffectively. Multitasking negatively affects concentration. Since workers are interrupted every 11 minutes on average and then spend almost a third of their work shift trying to recover from these distractions, we lose almost 30% of an average workday due to multitasking ineffectiveness.5 When we switch between tasks, our brain jumps back and forth. It always takes some time to start a new task and restart the one you quit, and there’s no guarantee you will be able to pick up exactly where you left off. Task switching takes a tremendous toll in terms of slips and lapse errors.

A better term to define what we normally do is divide our attention. We can drive and maintain a phone conversation, but our attention gets divided. In many cases, if the conversation is really important, we will try to find a safe place, stop the car, and focus solely on the conversation. There is no such thing as dividing attention between two conscious activities. We cannot make two conscious decisions at the same time, no matter how simple they are. Do you want your surgeon multitasking (for example, texting) while operating on you? Simply put, multitaskers make more errors than nonmultitaskers.

In a 2009 series of articles, The New York Times6 reported on the dangers of driving while using cell phones to talk or text. It reported that 16% of all traffic fatalities in the United States and nearly half a million injuries are caused every year by distracted driving. Several studies suggest that the most innocent and casual phone conversation while driving takes nearly 40% of your attention, practically having the same effect as being drunk.

Doing more than one task at a time, especially more than one complex task, also takes a toll on productivity. Psychologists who study what happens to cognition when people try to perform more than one task at a time have found that the mind and brain were not designed for heavy-duty multitasking.7

• It takes more time to get tasks completed if you switch between them than if you do them one at a time.
• You make more errors when you switch between tasks than if you do one task at a time.
• If the tasks are complex, these time and error penalties increase.
• Each task switch might waste only 1/10th of a second, but if you do a lot of switching in a day, it can add up to a loss of 40% of your productivity.

Because of the lack of focused attention caused by multitasking, I included a specific question (#34 of 50) among those recommended for investigating human errors in Chapter 6.


GOOD DOCUMENTATION PRACTICES: DATA INTEGRITY AND HUMAN ERROR

In August 2015, the European Union (EU) banned the marketing of about 700 Indian-made generic drugs for alleged manipulation of clinical trial data. This largest-ever EU-wide suspension of sales and distribution of generic drugs ordered by the European Commission was applicable to all 28 member-nations.8 Recently, there has been a dramatic escalation in the number of FDA warning letters, World Health Organization (WHO) notices of concern, and EU statements of noncompliance in which false or misleading information has been identified during inspections. Failure to properly manage data integrity applies equally to paper and electronic data. It can arise from poor systematic control of data management systems due to a lack of knowledge, from human error, or from intentionally hidden, falsified, or misleading data.

What is Data Integrity?

Data integrity is a global mandatory requirement for the regulated healthcare industry.9 It is also a basic element of good documentation practices, one of the most fundamental pillars of any quality management system, including CGMP. Developing and bringing a medical product to market involves different players and activities; therefore, robustness and accuracy of the data submitted by manufacturers to regulatory authorities are crucial. The data must be comprehensive, complete, accurate, and true to ensure the quality of studies supporting applications for medical products to be placed on the market. Complete, consistent, and accurate data must be attributable, legible, contemporaneously recorded, original or a true copy, and accurate (ALCOA).10 Table 10.1 depicts these basic characteristics. Data integrity also must comply with good manufacturing practices (GMPs), good clinical practices (GCPs), and good laboratory practices (GLPs).

In recent years, however, data integrity issues have been jeopardizing the regulatory compliance status of organizations. In many instances, data integrity problems are created by sloppy documentation practices or incidents that cause the loss of data, but regulators tend to label those situations as fraud. Moreover, such situations demonstrate a lack of commitment.


Table 10.1 ALCOA principles for good documentation practices.

Characteristic | Meaning
Attributable | Establishes who performed an action and when. Traceable to an individual.
Legible | Data must be recorded permanently in a durable medium and be readable by human beings. Traceable changes.
Contemporaneous | Activities must be recorded at the time they occur (when the activity is performed or the information is obtained).
Original | The information must be the original record (first capture of the data) or a certified true copy, not transcribed data.
Accurate | The data reflect true information.
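To make some of the characteristics in Table 10.1 concrete for electronic records, here is a small, hedged Python sketch of an append-only log. It is an illustration only, not a compliant system: the `Entry` and `RecordLog` names, the field names, and the sample values are all assumptions introduced here. Each entry carries who recorded it and when (attributable, contemporaneous), entries cannot be altered after creation, and a correction is stored as a new entry that points back to the original, so the first capture and the change both remain traceable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)              # frozen: an entry cannot be altered once created
class Entry:
    value: str
    recorded_by: str                 # attributable: traceable to an individual
    recorded_at: datetime            # contemporaneous: stamped when recorded
    corrects: Optional[int] = None   # index of the entry this one corrects
    reason: Optional[str] = None     # why the correction was made


class RecordLog:
    """Append-only log: originals are never overwritten, corrections are new entries."""

    def __init__(self):
        self._entries = []

    def record(self, value, user, corrects=None, reason=None):
        if not user:
            raise ValueError("every entry must be attributable to a user")
        if corrects is not None:
            if not reason:
                raise ValueError("a correction needs a documented reason")
            if not 0 <= corrects < len(self._entries):
                raise IndexError("no such entry to correct")
        stamp = datetime.now(timezone.utc)
        self._entries.append(Entry(value, user, stamp, corrects, reason))
        return len(self._entries) - 1

    def entries(self):
        return tuple(self._entries)  # read-only view of the full history


# Usage with hypothetical values: the original result stays in the log.
log = RecordLog()
first = log.record("pH 6.8", user="analyst_01")
log.record("pH 6.9", user="analyst_01", corrects=first, reason="transcription error")
```

The design choice worth noting is that the log has no update or delete method at all, which is the software counterpart of not erasing or overwriting a paper record.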

Data Integrity and Human Error

Data integrity is a critical element of an organization’s quality program. Reducing the risk of human error in our manufacturing and laboratory processes will ensure that we comply with data integrity laws and regulations while building quality into our everyday practices and securing the quality and safety of our products. Regulators and quality auditors do not distinguish between human error or sloppiness and data falsification and fraud when assessing the impact of data integrity failures, as demonstrated in the following excerpt from a 2015 FDA warning letter:

In correspondence with the agency, you indicate that no malicious data integrity patterns and practices were found. Also, you state that no intentional activity to disguise, misrepresent, or replace failing data with passing data was identified and no evidence of file deletion or manipulation was found. Your response and comments focus primarily on the issue of intent, and do not adequately address the seriousness of the CGMP violations found during the inspection.

Regulatory Impact

In recent years, a string of FDA-issued warning letters for data integrity violations has been published on the agency’s website. For example, from January 2015 to May 15, 2016, 21 out of 28 warning letters given to drug manufacturers involved data integrity issues. The intense scrutiny and harsh enforcement actions taken by the FDA and other regulatory bodies during the past few years apparently are not stopping other companies from continuing to engage in unacceptable behaviors.

As recently as September 16, 2021, the United States was notifying sponsors of new drug applications (NDAs) and abbreviated new drug applications (ANDAs) that clinical and bioanalytical studies conducted by two Indian clinical research organizations (CROs) were not acceptable because of data integrity concerns, and the studies must be repeated. The FDA stated that “this action is part of our continued vigilance and commitment to data integrity and protecting consumers from products that may put them at risk. In this case, while approved drugs were impacted, affected applications were also successfully identified while still under review by the agency.”11

Some data integrity breaches during FDA inspections are shocking. They range from backdating records in the presence of two FDA inspectors12 to documenting microbial results on a certificate of analysis when the testing was never performed.13

Between 2015 and 2016, major regulatory bodies, such as the European Medicines Agency (EMA), FDA, WHO, and the Pharmaceutical Inspection Co-operation Scheme (PIC/S), published guidance documents on the topic of data integrity and data management. In August 2016, the EMA and the PIC/S announced the publication of a new GMP data integrity draft guidance document, which was finally made effective July 1, 2021, under the title of “Good Practices for Data Management and Integrity in Regulated GMP/GDP Environments.”14 Data from the testing, manufacturing, packaging, distribution, and monitoring of drugs are used by regulators to review the quality, safety, and efficacy of drugs, so ensuring the integrity and completeness of such data is important. This document addresses the assessment of risk to data integrity, risk management strategies, design and control of electronic and paper-based documentation systems, and ensuring data integrity of outside contractors. It appears that regulators are taking a closer look at data integrity industry-wide.

The FDA released its own data integrity draft guidance document in April 2016 (updated December 2018), which relies on numerous prior guidances. It reaffirms the critical role of quality functions and quality professionals in ensuring the integrity of data:

• For recording data, manufacturing, or testing steps, numbered and controlled forms must be issued and reconciled by quality assurance (QA).
• Any findings of data integrity violations and “removing at all levels individuals responsible for [data integrity] problems from current GMP (CGMP) positions” must be disclosed to the FDA.
• Before batch release, QA must review the audit trail and electronic testing.
• Control strategies must be in place to ensure that all original lab records (paper and electronic) are reviewed (by a person), and all test results are appropriately reported.
• The immediate and irreversible recording of electronic testing data is imperative (including after completing each high-performance liquid chromatography [HPLC] testing sequence versus recording only at the end of the day).

Commitment From All

Data integrity enables good decision-making by manufacturers and regulatory authorities. It is a fundamental mandatory requirement of the medical products quality system, applying equally to manual (paper) and electronic systems. To ensure data integrity, senior management must engage in the promotion of a quality culture along with the implementation of appropriate organizational and technical controls. It requires participation and commitment by staff at all levels within the organization, by the organization’s suppliers, and by its distributors. Data integrity is a basic element of good documentation practices, one of the most fundamental pillars of any quality management system, including CGMP.

Upper management, and especially quality leaders, at every regulated organization must ensure that everyone is accountable for their actions, including having proper documentation of activities performed. Unfortunately, most regulated organizations only react to data integrity issues after regulators discover them. An outrageous example of this can be found in a warning letter15 issued in July 2014 in which the FDA required an organization to “identify the specific managers in place who participated in, facilitated, encouraged, or failed to stop subordinates from falsifying data in CGMP records, and determine the extent of top and middle management’s involvement in or awareness of data manipulation.” In the same inspection, the FDA also discovered that “your firm falsified documents designed to demonstrate the effectiveness of cGMP training…. That a senior manager was engaged in the falsification of documents is troubling and raises questions about validity of documents generated by your firm.”


Senior management, especially those with quality management responsibilities, should ensure that data integrity risk is assessed, mitigated, and communicated in accordance with the principles of quality risk management. The effort and resources assigned to data integrity measures should be commensurate with the risk to product quality; it should also be balanced with other QA resource demands. Where long-term measures are identified to achieve the desired state of control, interim measures should be implemented to mitigate risk and monitored for effectiveness.

11 Final Thoughts

Once we all agree that eliminating all human error is impossible (to do that, we would first have to eliminate all humans), our efforts should address two areas:

1. Reducing the probability of human error from the onset

2. When the unavoidable error occurs, implementing barriers to detect those human errors and/or minimize their impact on the quality of our processes

Using CAPA concepts, these mitigation efforts consist of two parts:

1. The preventive part must encompass important human factors such as better supervision, better procedures and working instructions, and more-effective training efforts. Make your processes and documents as error-proof as you can. Do not hesitate to overuse mistake-proofing features, also known by the Japanese term poka-yoke.
2. For the reactive part, you must improve your investigations. Don’t accept human error as the root cause. Think twice before using retraining as a corrective action. Identify:
   a. Contributing root causes
   b. Situational factors
   c. Latent factors
   d. Absent or insufficient control barriers

Companies often act as if workers make mistakes simply because they forget the instructions. They believe that retraining will help workers to not forget in the future. This lack of understanding of human error is one of the root causes of our lack of effectiveness when trying to fix human-caused defects.

To succeed at error control and reduction, we must consider the influence the following factors have on behavior and performance:

• Design of facilities and equipment
• Information content and format (procedures, work instructions, and job aids)
• Training
• Method of work: supervision and management controls (including adequate resources and clear roles and responsibilities)
• Process of communications

Do not operate from memory. Read, execute, and document is the best recipe to minimize most of the human errors created by lapses of memory. Finally, but not less important, we must monitor the performance of the humans involved in our processes. Simple statistical tools such as the analysis of proportions, or chi-squared, can help. The methodical identification of the best performer (who can be used for benchmarking purposes) and the not-so-best can help the organization to improve whole processes, from job description adequacies to the best way to deliver effective training.

There are three areas in which we must concentrate our effort to reduce, effectively and dramatically, the impact of so-called human error on the bottom line of the regulated industries. For doing the right thing the first time, we need:

1. Better user-centered documents (working instructions, specifications, and procedures) with clear, complete, and comprehensive instructions
2. Better training to ensure that workers understand why they are doing what they are doing, why they must always follow instructions, and what happens when instructions are not followed
3. Better supervision to ensure that workers always follow procedures and working instructions while performing any function in a GMP-regulated environment

Steps in reducing human errors include:

• Addressing the conditions and reducing the stressors that increase the frequency of errors


• Designing facilities and equipment to prevent slips and lapses from occurring or to increase the chances of detecting and correcting them
• Driving out complexity and designing jobs to avoid the need for tasks that involve very complex decisions, diagnoses, or calculations, for example, by writing procedures for rare events requiring decisions and actions
• Ensuring proper supervision, particularly for inexperienced staff or tasks in which there is a need for independent checking
• Checking that job aids such as procedures and instructions are clear, concise, available, up to date, and accepted by users
• Considering the possibility of human error when undertaking risk assessments
• Thinking about the different causes of human errors during incident investigations in order to introduce measures to reduce the risk of a repeat incident
• Monitoring to ensure that measures taken to reduce error are effective
• Enhancing process (barrier) controls and poka-yokes
• Removing latent failures
• Making people accountable within a positive and blame-free environment toward errors and mistakes
• Understanding the hows and whys of human errors

To reduce violations, managers could:

• Take steps to increase the chances of violations being detected using routine monitoring, internal audits, and so on
• Make rules and procedures relevant and practical, and eliminate unnecessary rules or instructions
• Train by explaining the reasons behind certain rules or procedures and their relevance
• Provide more training and better control (for example, supervisory presence) for abnormal and emergency situations to minimize exceptional violations
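The earlier suggestion to monitor performance with simple statistical tools such as the analysis of proportions or chi-squared can be sketched in a few lines of Python. This is only one possible illustration: the function name, the operator labels, and the error counts are invented for the example, and the critical values are the standard upper 5% points of the chi-squared distribution. The sketch tests whether the error proportions of several operators are plausibly the same.

```python
# Upper 5% critical values of the chi-squared distribution, by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_squared_proportions(counts):
    """Chi-squared test of homogeneity for error proportions.

    counts: list of (errors, opportunities) pairs, one per operator.
    Returns (statistic, degrees_of_freedom).
    """
    total_errors = sum(errors for errors, _ in counts)
    total_n = sum(n for _, n in counts)
    pooled = total_errors / total_n          # overall error proportion
    if pooled in (0.0, 1.0):
        raise ValueError("test undefined when no errors (or only errors) occur")
    statistic = 0.0
    for errors, n in counts:
        expected_err = n * pooled            # errors expected if all operators were alike
        expected_ok = n * (1 - pooled)
        statistic += (errors - expected_err) ** 2 / expected_err
        statistic += ((n - errors) - expected_ok) ** 2 / expected_ok
    return statistic, len(counts) - 1


# Hypothetical data: errors found in 200 reviewed batch records per operator.
operators = {"A": (4, 200), "B": (6, 200), "C": (20, 200)}
stat, dof = chi_squared_proportions(list(operators.values()))
differs = stat > CRITICAL_05[dof]            # True: proportions differ at the 5% level
```

A significant result only says that the error proportions differ; it does not say why. In the spirit of this chapter, it is a prompt to compare conditions, documents, and supervision between the best performer and the others, not a reason to assign blame.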


We learned that lack of attention and lapse of memory play a significant role in all categories of human error. In regulated industries, these factors should be negligible because workers are not supposed to rely on memory for correct performance. Batch records, device master records, and history files exist for a purpose.

If you want to improve processes performed by humans at all levels, you must remember what Reason (1990) wrote:

• Fallibility is part of the human condition.
• We can’t change the human condition.
• We can change the conditions under which people work.
• Human beings will always make errors.
• Naming, blaming, and shaming have no remedial value.

Everyone can make errors no matter how well trained and motivated they are. Sometimes, we are “set up” by the system to fail. The challenge is to develop error-tolerant systems and prevent errors from occurring. Reducing human error involves far more than taking disciplinary action against an individual. There is a range of measures that provide more-effective controls, including the design of the job and equipment, procedures, and training. Paying attention to individual attitudes and motivations, design features of the job, and the organization will help to reduce violations. Chapter 6 contains a diagnostic tool for investigating human errors.
And finally, we must establish an order for error control from the best, ideal options to the least effective barriers:

• Error- or mistake-proof: make the error impossible (the computer form cannot be saved if some information is missing, the microwave cannot operate if the door is open, the production tank will unlock and open only if scanned containers of ingredients match the system’s bill of material for the lot)
• Error prevention: signals and alarms (red underlining of incorrect spelling in Microsoft Word)
• Minimize the impact of error (inspection and test, or double verification of the addition of components to a mixing tank)
• Reduce reliance on memory and vigilance

One final recommendation: if you are going to make only one change to reduce human error prevalence, improve supervision. Table 11.1 contains some key recommendations to follow when investigating and fixing human errors.
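Two of the mistake-proofing examples above (a form that cannot be saved with missing information, and a tank that unlocks only when the scanned containers match the bill of material) can be sketched in Python. This is a hedged illustration under stated assumptions: the field names, ingredient codes, and function names are hypothetical, and a real system would write to a database and drive an actual interlock instead of returning values.

```python
from collections import Counter

REQUIRED_FIELDS = ("lot", "operator", "quantity_kg")   # hypothetical form fields


def save_form(form):
    """Refuse to save while any required field is empty (the error is made impossible)."""
    missing = [name for name in REQUIRED_FIELDS if form.get(name) in (None, "")]
    if missing:
        raise ValueError("cannot save, missing: " + ", ".join(missing))
    return dict(form)   # stands in for writing to a real system


def tank_may_unlock(scanned_containers, bill_of_material):
    """Unlock only if the scanned containers match the lot's bill of material exactly."""
    # Counter compares codes and quantities, regardless of scanning order.
    return Counter(scanned_containers) == Counter(bill_of_material)


bom = ["ING-001", "ING-002", "ING-002"]                # hypothetical ingredient codes
complete_scan = tank_may_unlock(["ING-002", "ING-001", "ING-002"], bom)
short_scan = tank_may_unlock(["ING-001", "ING-002"], bom)
```

Here `complete_scan` is true and `short_scan` is false, because one container of the second ingredient was never scanned. Both functions sit at the top of the order of error control: they do not warn the operator about a possible error, they block the action until the condition is met.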


Table 11.1 Human error investigation and prevention do’s and don’ts.

Do

• Investigate every human error up to its root cause(s)
• Search for precursors of the human error (working from memory?)
• Improve your work instructions and records by enhancing document format (imperative tone, graphic elements, clear and comprehensive content)
• Improve your training system
• Measure the effectiveness of your training efforts

Don’t

• Use human error as root cause
• Use retraining as the default corrective action for human failures
• Assume that your employees are lazy and careless about their jobs

Endnotes

Chapter 1

1. Linda T. Kohn, Janet M. Corrigan, Molla S. Donaldson, eds. To Err Is Human: Building a Safer Health System. (Washington, DC: Institute of Medicine Committee on Quality of Health Care in America, 2000).

2. P. Davies. “First State Hospital Report Card Issued.” The Wall Street Journal. January 20, 2005, D-5.

3. George A. Peters, and Barbara J. Peters. Human Error: Causes and Control. (Boca Raton, FL: CRC Press, 2006).

4. Michael Daniel. “Study Suggests Medical Errors Now Third Leading Cause of Death in the U.S.: Physicians advocate for changes in how deaths are reported to better reflect reality.” Johns Hopkins Medicine, May 3, 2016, http://www.hopkinsmedicine.org/news/media/releases/study_suggests_medical_errors_now_third_leading_cause_of_death_in_the_us.

5. P. Garnerin, B. Pellet-Meier, P. Chopard, T. Perneger, and P. Bonnabry. “Measuring Human-Error Probabilities in Drug Preparation: A Pilot Simulation Study.” European Journal of Clinical Pharmacology 63, no. 8 (2007): 769–76.

6. World Health Organization. “WHO Launches Global Effort to Halve Medication-Related Errors in 5 Years.” March 29, 2017, http://www.who.int/mediacentre/news/releases/2017/medication-related-errors/en/.

7. Center for Chemical Process Safety. Human Factors Methods for Improving Performance in the Process Industries. (Hoboken, NJ: Wiley-Interscience, 2007).

8. James Reason. Human Error. (New York: Cambridge University Press, 1990).

9. Don Norman. The Design of Everyday Things. (New York: Basic Books, 2013).

10. Leila Abboud. “Mental Illness Said to Affect One-Quarter of Americans.” The Wall Street Journal, June 7, 2005, D-1 and D-7.

11. Thomas R. Krause, ed. Current Issues in Behavior-Based Safety: How to Make Continuous Improvement a Reality. (Ojai, CA: Behavioral Science Technology, Inc., 1999).

12. Andrew D. ShamRao. “Shaping a Safety Culture.” Safety Online, accessed Sept. 13, 2021, https://www.safetyonline.com/doc/shaping-a-safety-culture-0001.

13. Health and Safety Executive. “Flixborough (Nypro UK) Explosion 1st June 1974,” accessed Sept. 14, 2021, http://www.hse.gov.uk/comah/sragtech/caseflixboroug74.htm.

14. “Tenerife: Remembering the World’s Deadliest Aviation Disaster.” CBS News.com, March 17, 2017, https://www.cbsnews.com/news/tenerife-remembering-the-worlds-deadliest-aviation-disaster/.

15. Julia Jacobo and Bianca Seidman. “A Brief History of the Three Mile Island Reactor Plant Known for 1979 Reactor Accident.” ABC News.com, May 31, 2017, http://abcnews.go.com/US/history-mile-island-nuclear-plant-1979-reactor-accident/story?id=47731028.

16. Encyclopaedia Britannica Online, s.v. “Bhopal Disaster,” accessed Sept. 14, 2021, https://www.britannica.com/event/Bhopal-disaster.

17. Bethan Bell. “Zeebrugge Herald of Free Enterprise Disaster Remembered.” BBC News, March 6, 2017, http://www.bbc.com/news/uk-england-39116394.

18. National Aeronautics and Space Administration, “The Case for Safety: The North Sea Piper Alpha Disaster,” accessed Sept. 14, 2021, https://sma.nasa.gov/docs/default-source/safety-messages/safetymessage-2013-05-06piperalpha.pdf?sfvrsn=3daf1ef8_6.

19. Encyclopaedia Britannica Online, s.v. “Chernobyl Disaster,” accessed Sept. 14, 2021, https://www.britannica.com/event/Chernobyl-disaster.

20. Environmental Protection Agency website. “Exxon Valdez Spill Profile,” last modified Nov. 29, 2022, https://www.epa.gov/emergency-response/exxon-valdez-spill-profile.

21. National Transportation Safety Board, “Pipeline Accident Report: San Juan Gas Company, Inc./Enron Corp Propane Gas Explosion in San Juan, Puerto Rico, on November 21, 1996,” accessed Feb. 6, 2023, https://www.ntsb.gov/investigations/AccidentReports/Reports/PAR9701.pdf.

22. History.com. “Challenger Explosion,” last modified September 11, 2020, https://www.history.com/topics/challenger-disaster.

23. “Report of the Presidential Commission on the Space Shuttle Challenger Accident,” accessed Sept. 14, 2021, https://history.nasa.gov/rogersrep/v1ch5.htm.

24. “BP America Refinery Explosion.” Chemical Safety and Hazard Investigation Board website, accessed Sept. 14, 2021, http://www.csb.gov/bp-america-refinery-explosion/.

25. Alan D. Swain, and H. E. Guttmann. Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications. (Washington, DC: NUREG/CR-1278, 1983).

26. Center for Chemical Process Safety. Human Factors Methods for Improving Performance in the Process Industries. (Hoboken, NJ: Wiley-Interscience, 2007).

27. A. Kolaczkowski, J. Forester, E. Lois, and S. Cooper. Good Practices for Implementing Human Reliability Analysis. NUREG-1792 Final Report. (Washington, DC: U.S. Nuclear Regulatory Commission, 2005).

28. Kyung S. Park. Human Reliability: Analysis, Prediction, and Prevention of Human Errors. Vol. 7 of Advances in Human Factors/Ergonomics (Amsterdam: Elsevier Science Publishers, 1987).

29. Lucian L. Leape. “Error in Medicine.” Journal of the American Medical Association 272, no. 23 (1994): 1851–57.

30. James Reason. “Beyond the Organizational Accident: The Need for ‘Error Wisdom’ on the Frontline.” Quality and Safety in Healthcare 13, Supplement II (2004): ii28–ii33.

31. Sidney W. A. Dekker. Ten Questions About Human Error: A New View of Human Factors and System Safety. (Mahwah, NJ: Lawrence Erlbaum Associates, 2005).

32. Lucy A. Suchman. Plans and Situated Actions: The Problem of Human-Machine Communication. (New York: Cambridge University Press, 1987).

33. David A. Graham. “Rumsfeld’s Knowns and Unknowns: The Intellectual History of a Quip.” The Atlantic, March 14, 2014, https://www.theatlantic.com/politics/archive/2014/03/rumsfelds-knowns-and-unknowns-the-intellectual-history-of-a-quip/359719/.

34. “Proper Storage of Warheads.” Weird Universe, accessed Sept. 14, 2021, http://www.weirduniverse.net/blog/comments/proper_storage_of_warheads.

35. Zachary Brennan. “Human Factors Studies for Generic Combo Products: FDA Offers Draft Guidance.” Regulatory Focus, Jan. 13, 2017, http://www.raps.org/Regulatory-Focus/News/2017/01/13/26591/Human-Factors-Studies-for-Generic-Combo-Products-FDA-Offers-Draft-Guidance/.

36. AAMI website, accessed Sept. 14, 2021, http://www.aami.org.

Chapter 2

1. Jens Rasmussen. “Skills, Rules, and Knowledge: Signals, Signs, and Symbols and Other Distinctions in Human Performance Models.” IEEE Transactions on Systems, Man, and Cybernetics SMC-13, no. 3 (1983): 257–67.

2. James Reason, and Alan Hobbs. Managing Maintenance Error: A Practical Guide. (Burlington, VT: CRC Press, 2003).

3. U.S. Department of Energy. Human Performance Improvement Handbook: Volume 1: Concepts and Principles. (Washington, DC: Technical Standards Program, 2009).

4. Institute of Nuclear Power Operations (INPO). An Analysis of Root Causes in 1983 Significant Event Reports. INPO 84-027. (Atlanta, GA: INPO, 1984).

Chapter 4

1. Edgar H. Schein. The Corporate Culture Survival Guide. (San Francisco: Jossey-Bass, 2009).

2. Warren E. Leary. “Poor Management by NASA is Blamed for Mars Failure.” New York Times, March 29, 2000, http://www.nytimes.com/2000/03/29/us/poor-management-by-nasa-is-blamed-for-mars-failure.html.

3. Neal Gabler. “Inside Costco: The Magic in the Warehouse.” Fortune.com, Dec. 3, 2016, https://fortune.com/longform/costco-wholesale-shopping/.

4. William Bridges, and Revonda Tew. Human Factors Missing from Process Safety Management. 6th Global Congress on Process Safety, San Antonio, Texas, 2010.

5. Karl E. Weick, Kathleen M. Sutcliffe, and David Obstfeld. “Organizing for High Reliability: Processes of Collective Mindfulness.” In: Research in Organizational Behavior, vol. 21, edited by Robert I. Sutton and Barry M. Staw, 81–124. (Greenwich, CT: JAI, 1999).

6. Janette Edmonds, ed. Human Factors in the Chemical and Process Industries: Making It Work in Practice. (Cambridge, MA: Elsevier, 2016).

7. American Psychiatric Association. DSM-5: Diagnostic and Statistical Manual of Mental Disorders, 5th ed. (Washington, DC: APA, 2013).

8. “Adult ADHD (Attention Deficit Hyperactivity Disorder).” Anxiety and Depression Association of America website, accessed Sept. 15, 2021, https://adaa.org/understanding-anxiety/related-illnesses/other-related-conditions/adult-adhd.

9. U.S. Food and Drug Administration. “CFR Code of Federal Regulations Title 21,” last modified Nov. 29, 2022, https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=211.25.

10. Eudralex. “The Rules Governing Medicinal Products in the European Union,” accessed Sept. 15, 2021, https://ec.europa.eu/health/sites/health/files/files/eudralex/vol-4/2014-03_chapter_2.pdf.

11. Francesca Gino. “Are You Too Stressed to Be Productive? Or Not Stressed Enough?” Harvard Business Review, accessed Sept. 15, 2021, https://hbr.org/2016/04/are-you-too-stressed-to-be-productive-or-not-stressed-enough.

12. Quintus R. Jett, and Jennifer M. George. “Work Interrupted: A Closer Look at the Role of Interruptions in Organizational Life.” Academy of Management Review 28, no. 3 (2003): 494–507.

13. Simon Folkard, and David A. Lombardi. “Modelling the Impact of the Components of Long Work Hours on Injuries and ‘Accidents’.” American Journal of Industrial Medicine 49, no. 11 (2006): 953–63.

14. Center for Chemical Process Safety. Human Factors Methods for Improving Performance in the Process Industries. (Hoboken, NJ: Wiley-Interscience, 2007).

15. Christopher D. Wickens, John D. Lee, Yili Liu, and Sallie E. Gordon Becker. An Introduction to Human Factors Engineering. 2nd ed. (Upper Saddle River, NJ: Pearson Prentice Hall, 2004).

16. Food and Drug Administration (FDA). CGMP for Finished Pharmaceuticals (21 CFR 211). (Washington, DC: FDA).

17. Center for Chemical Process Safety. Human Factors Methods for Improving Performance in the Process Industries. (Hoboken, NJ: Wiley-Interscience, 2007).

18. John Wood. “The Best Fonts to Use in Print, Online, and Email.” American Writers and Artists Institute website, Oct. 6, 2011, https://www.awai.com/2011/10/the-best-fonts-to-use-in-print-online-and-email/.

19. D.B. Friedman, and L. Hoffman-Goetz. “A Systematic Review of Readability and Comprehension Instruments Used for Print and Web-Based Cancer Information.” Health Education & Behavior 33 (2006): 352–73.

20. International Organization for Standardization. (2017). General requirements for the competence of testing and calibration laboratories (ISO Standard No. 17025:2017). (Geneva, Switzerland: ISO, 2017).

21. International Organization for Standardization. (2016). Quality management systems–Requirements for regulatory purposes (ISO Standard No. 13485:2016). (Geneva, Switzerland: ISO, 2016).

22. International Organization for Standardization. (2015). Quality management systems–requirements (ISO Standard No. 9001:2015). (Geneva, Switzerland: ISO, 2015).

23. D.J. Bryan, and H. Angel. Retention and Fading of Military Skills: Literature Review. Canada Department of National Defense (2000).

24. Food and Drug Administration (FDA). Quality Systems Approach to Pharmaceutical CGMP Regulations. Guidance for Industry (Washington, DC: FDA, 2006).

25. International Organization for Standardization. (2015). Quality management systems–requirements (ISO Standard No. 9001:2015). (Geneva, Switzerland: ISO, 2015).

26. International Organization for Standardization. (2016). Medical devices–quality management systems–requirements for regulatory purposes (ISO Standard No. 13485:2016). (Geneva, Switzerland: ISO, 2016).

27. Donald L. Kirkpatrick, and James D. Kirkpatrick. Evaluating Training Programs, 3rd ed. (San Francisco: Berrett-Koehler Publishers, 2006).

28. Food and Drug Administration (FDA). GMP regulation for the manufacture of human finished drugs (21 CFR 211) (Washington, DC: FDA).

Chapter 6

1. An in-depth discussion of this topic can be found in the author’s Handbook of Investigation and Effective CAPA Systems, Third ed., published by ASQ Quality Press, 2021. See Rodríguez-Pérez, J. (2021).

2. Food and Drug Administration (FDA). Code of Federal Regulations (21 CFR 820.100) (Washington, DC: FDA).

Chapter 7

1. Jose Rodriguez-Perez. Handbook of Investigation and Effective CAPA Systems (Milwaukee: ASQ Quality Press, 2010).

Chapter 8

1. John W. Senders, and Neville P. Moray. Human Error: Cause, Prediction and Reduction. (Hillsdale, NJ: CRC Press, 1991).

2. Food and Drug Administration (FDA). Q9 Quality Risk Management. Guidance for Industry. (Washington, DC: FDA, 2006).

3. Nancy R. Tague. The Quality Toolbox, Second ed. (Milwaukee: ASQ Quality Press, 2005).

4. Barry Kirwan. “Human Reliability Assessment.” In Evaluation of Human Work, Third ed. Edited by John R. Wilson and Nigel Corlett. (Boca Raton, FL: CRC Press, 2005).

Chapter 9

1. Brigette M. Hales, and Peter J. Pronovost. “The Checklist—A Tool for Error Management and Performance Improvement.” Journal of Critical Care 21, no. 3 (2006): 231–35.

2. Robert L. Helmreich. “On Error Management: Lessons from Aviation.” BMJ 320, no. 7237 (2000): 781–85.

3. Richard L. Sears. “A New Look at Accident Contributors and the Implications of Operational and Training Procedures: Influence of Training, Operational and Maintenance Practices on Flight Safety.” In Proceedings of the Flight Safety Foundation’s 38th Annual International Air Safety Seminar, 29–51. (Arlington, VA: Flight Safety Institute, 1985).

4. Daniel Boorman. “Today’s Electronic Checklists Reduce Likelihood of Crew Errors and Help Prevent Mishaps.” ICAO Journal 56 (2001): 17–20, 36.

5. Alan M. Wolff, Sally A. Taylor, and Janette F. McCabe. “Using Checklists and Reminders in Clinical Pathways to Improve Hospital Inpatient Care.” The Medical Journal of Australia 181, no. 8 (2004): 428–31.

6. M.J. Sharps, C.A. Wilson-Leff, and J.L. Price. “Relational and Item-Specific Information as Determinants of Category Superiority Effects.” The Journal of General Psychology 122, no. 3 (1995): 271–85.

7. Atul Gawande. The Checklist Manifesto: How to Get Things Right. (New York: Metropolitan Books, 2009). The book was preceded in December 2007 by an article titled “The Checklist” published in The New Yorker, https://www.newyorker.com/magazine/2007/12/10/the-checklist, accessed Sept. 15, 2021.

8. Shigeo Shingo. Zero Quality Control: Source Inspection and the Poka-Yoke System. (New York: Productivity Press, 1986).

9. Productivity Press Development Team. Mistake-Proofing for Operators: The ZQC System. Shopfloor Series. (New York: Productivity Press, 1997).

10. Linda T. Kohn, Janet M. Corrigan, and Molla S. Donaldson, eds. To Err Is Human: Building a Safer Health System. (Washington, DC: Institute of Medicine Committee on Quality of Health Care in America, 2000).

11. L.S. Jenson, A.F. Merry, C.S. Webster, J. Weller, and L. Larson. “Evidence-Based Strategies for Preventing Drug Administration Errors During Anaesthesia.” Anaesthesia 59 (2004): 493–504.

12. Dilip Kothari, Suman Gupta, Chetan Sharma, and Saroj Kothari. “Medication Error in Anaesthesia and Critical Care: A Cause for Concern.” Indian Journal of Anaesthesia 54, no. 3 (2010): 187–92, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933474/, accessed Sept. 16, 2021.

Chapter 10

1. B. Kantowitz, and R. Sorkin. Human Factors. (New York: John Wiley & Sons, 1983).

2. M. Ament, A. Cox, A. Blandford, and D. Brumby. “Working Memory Load Affects Device-Specific but Not Task-Specific Error Rates.” In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 91–96. (Austin, TX: Cognitive Science Society, 2010).

3. J. Gregory Trafton, Erik M. Altmann, and Raj M. Ratwani. “A Memory for Goals Model of Sequence Errors.” In Proceedings of the 9th International Conference of Cognitive Modeling (paper 225). (Manchester, England, 2009).

4. Simon Y. Li, Ann Blandford, Paul Cairns, and Richard M. Young. “The Effect of Interruptions on Post-completion and Other Procedural Errors: An Account Based on the Activation-Based Goal Memory Model.” Journal of Experimental Psychology: Applied 14, no. 4 (2008): 314–28.

5. Gary Keller, and Jay Papasan. The One Thing: The Surprisingly Simple Truth Behind Extraordinary Results. (Austin, TX: Bard Press, 2013).

6. New York Times online. “Should Cellphone Use by Drivers Be Illegal?” accessed Sept. 16, 2021, https://roomfordebate.blogs.nytimes.com/2009/07/18/should-cellphone-use-by-drivers-be-illegal/.

7. American Psychological Association website. “Multitasking: Switching Costs,” last modified March 20, 2006, http://www.apa.org/research/action/multitask.aspx.

8. PTI. “European Union Bans 700 Generic Drugs for Manipulation of Trials by GVK Biosciences.” July 25, 2015, http://www.financialexpress.com/industry/european-union-bans-700-generic-drugs-for-manipulation-of-trials-by-gvk-biosciences/107418/.

9. Jose Rodriguez-Perez. Data Integrity and Compliance (Milwaukee: ASQ Quality Press, 2019).

10. U.S. Food and Drug Administration, “Data Integrity and Compliance With CGMP Guidance for Industry,” accessed Sept. 16, 2021, https://www.fda.gov/downloads/drugs/guidances/ucm495891.pdf.

11. U.S. Food and Drug Administration. “Notification to Pharmaceutical Companies,” accessed Sept. 28, 2021, https://www.fda.gov/drugs/drug-safety-and-availability/notification-pharmaceutical-companies-clinical-and-bioanalytical-studies-conducted-panexcell.

12. U.S. Food and Drug Administration document, accessed Sept. 16, 2021, https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/EnforcementActivitiesbyFDA/UCM382514.pdf.

13. Alexander Gaffney. “India’s Data Integrity Problems.” Regulatory Focus, Feb. 3, 2015, https://www.raps.org/regulatory-focus%E2%84%A2/news-articles/2015/2/india-s-data-integrity-problems.

14. Pharmaceutical Inspection Co-operation Scheme website, accessed Sept. 29, 2021, http://www.picscheme.org/.

15. U.S. Food and Drug Administration. “Inspections, Compliance, Enforcement, and Criminal Investigations,” accessed Nov. 7, 2022, http://web.archive.org/web/20151022163659/http://www.fda.gov/ICECI/EnforcementActions/WarningLetters/2014/ucm409898.htm.

Bibliography

Abboud, Leila. 2005. “Mental Illness Said to Affect One-Quarter of Americans.” The Wall Street Journal, June 7, D-1 and D-7.

Ament, M., A. Cox, A. Blandford, and D. Brumby. 2010. “Working Memory Load Affects Device-Specific but Not Task-Specific Error Rates.” In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 91–96. Austin, TX: Cognitive Science Society.

American Psychiatric Association. 2013. DSM-5: Diagnostic and Statistical Manual of Mental Disorders. 5th ed. Washington, DC: APA.

Boorman, Daniel. 2001. “Today’s Electronic Checklists Reduce Likelihood of Crew Errors and Help Prevent Mishaps.” ICAO Journal 56: 17–20, 36.

Bridges, William, and Revonda Tew. 2010. Human Factors Missing from Process Safety Management. 6th Global Congress on Process Safety, San Antonio, TX.

Bryan, D. J., and H. Angel. 2000. Retention and Fading of Military Skills: Literature Review. Canada Department of National Defense.

Center for Chemical Process Safety. 2007. Human Factors Methods for Improving Performance in the Process Industries. Hoboken, NJ: Wiley-Interscience.

Davies, P. 2005. “First State Hospital Report Card Issued.” The Wall Street Journal, January 20, D-5.

Dekker, Sidney W. A. 2005. Ten Questions About Human Error: A New View of Human Factors and System Safety. Mahwah, NJ: Lawrence Erlbaum Associates.

Edmonds, Janette, ed. 2016. Human Factors in the Chemical and Process Industries: Making It Work in Practice. Cambridge, MA: Elsevier.

Folkard, Simon, and David A. Lombardi. 2006. “Modelling the Impact of the Components of Long Work Hours on Injuries and ‘Accidents’.” American Journal of Industrial Medicine 49, no. 11: 953–63.

Food and Drug Administration. 1996. Do It by Design: An Introduction to Human Factors in Medical Devices. Washington, DC: FDA.

———. 2006. Q9 Quality Risk Management. Guidance for Industry. Washington, DC: FDA.

———. 2016. Applying Human Factors and Usability Engineering to Medical Devices. Guidance for Industry. Washington, DC: FDA.

———. 2017. Comparative Analyses and Related Comparative Use Human Factors Studies for a Drug-Device Combination Product Submitted in an ANDA. Draft Guidance for Industry. Washington, DC: FDA.

Friedman, D. B., and L. Hoffman-Goetz. 2006. “A Systematic Review of Readability and Comprehension Instruments Used for Print and Web-Based Cancer Information.” Health Education & Behavior 33: 352–73.

Garnerin, P., B. Pellet-Meier, P. Chopard, T. Perneger, and P. Bonnabry. 2007. “Measuring Human-Error Probabilities in Drug Preparation: A Pilot Simulation Study.” European Journal of Clinical Pharmacology 63, no. 8: 769–76.

Gawande, Atul. 2009. The Checklist Manifesto: How to Get Things Right. New York: Metropolitan Books.

Hales, Brigette M., and Peter J. Pronovost. 2006. “The Checklist—A Tool for Error Management and Performance Improvement.” Journal of Critical Care 21, no. 3: 231–35.

Helmreich, Robert L. 2000. “On Error Management: Lessons from Aviation.” BMJ 320, no. 7237: 781–85.

Institute of Nuclear Power Operations (INPO). 1984. An Analysis of Root Causes in 1983 Significant Event Reports. INPO 84-027. Atlanta: INPO.

International Organization for Standardization (ISO). 1999. ISO 9355-1:1999 (R2015). Ergonomic requirements for the design of displays and control actuators—Part 1: Human interactions with displays and control actuators. Geneva: ISO.

———. 1999. ISO 9355-2:1999 (R2015). Ergonomic requirements for the design of displays and control actuators—Part 2: Displays. Geneva: ISO.

———. 2006. ISO 9355-3:2006 (R2015). Ergonomic requirements for the design of displays and control actuators—Part 3: Control actuators. Geneva: ISO.

———. 2015. ISO 9001:2015 Quality management systems—requirements. Geneva: ISO.

———. 2016. ISO 13485:2016 Medical devices—Quality management systems—Requirements for regulatory purposes. Geneva: ISO.

———. 2017. ISO/IEC 17025:2017 General requirements for the competence of testing and calibration laboratories. Geneva: ISO.

Jenson, L. S., A. F. Merry, C. S. Webster, J. Weller, and L. Larson. 2004. “Evidence Based Strategies for Preventing Drug Administration Errors During Anaesthesia.” Anaesthesia 59: 493–504.

Jett, Quintus R., and Jennifer M. George. 2003. “Work Interrupted: A Closer Look at the Role of Interruptions in Organizational Life.” Academy of Management Review 28, no. 3: 494–507.

Kantowitz, Barry H., and Robert D. Sorkin. 1983. Human Factors: Understanding People–System Relationships. New York: John Wiley & Sons.

Keller, Gary, and Jay Papasan. 2013. The One Thing: The Surprisingly Simple Truth Behind Extraordinary Results. Austin, TX: Bard Press.

Kirkpatrick, Donald L., and James D. Kirkpatrick. 2006. Evaluating Training Programs. 3rd ed. San Francisco: Berrett-Koehler Publishers.

Kirwan, Barry. 2005. “Human Reliability Assessment.” In Evaluation of Human Work. 3rd ed. Edited by John R. Wilson and Nigel Corlett. Boca Raton, FL: CRC Press.

Kohn, Linda T., Janet M. Corrigan, and Molla S. Donaldson, eds. 2000. To Err Is Human: Building a Safer Health System. Washington, DC: Institute of Medicine Committee on Quality of Health Care in America.

Kolaczkowski, A., J. Forester, E. Lois, and S. Cooper. 2005. Good Practices for Implementing Human Reliability Analysis (HRA). NUREG-1792 Final Report. Washington, DC: U.S. Nuclear Regulatory Commission.

Kothari, Dilip, Suman Gupta, Chetan Sharma, and Saroj Kothari. 2010. “Medication Error in Anaesthesia and Critical Care: A Cause for Concern.” Indian Journal of Anaesthesia 54, no. 3: 187–92. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933474/.

Krause, Thomas R., ed. 1999. Current Issues in Behavior-Based Safety: How to Make Continuous Improvement a Reality. Ojai, CA: Behavioral Science Technology, Inc.

Leape, Lucian L. 1994. “Error in Medicine.” Journal of the American Medical Association 272, no. 23: 1851–57.

LeBoeuf, Michael. 1985. The Greatest Management Principle in the World. New York: Putnam.

Li, Simon Y. W., Ann Blandford, Paul Cairns, and Richard M. Young. 2008. “The Effect of Interruptions on Post-completion and Other Procedural Errors: An Account Based on the Activation-Based Goal Memory Model.” Journal of Experimental Psychology: Applied 14, no. 4: 314–28.

Norman, Don A. 1988. The Psychology of Everyday Things. New York: Basic Books.

———. 2013. The Design of Everyday Things. New York: Basic Books.

Park, Kyung S. 1987. Human Reliability: Analysis, Prediction, and Prevention of Human Errors. Vol. 7 of Advances in Human Factors/Ergonomics, Gavriel Salvendy, ed. Amsterdam: Elsevier Science Publishers.

Peters, George A., and Barbara J. Peters. 2006. Human Error: Causes and Control. Boca Raton, FL: CRC Press.

Productivity Press Development Team. 1997. Mistake-Proofing for Operators: The ZQC System. Shopfloor Series. New York: Productivity Press.

Rasmussen, Jens. 1983. “Skills, Rules, and Knowledge: Signals, Signs, and Symbols, and Other Distinctions in Human Performance Models.” IEEE Transactions on Systems, Man, and Cybernetics SMC-13, no. 3: 257–67.

Reason, James. 1990. Human Error. New York: Cambridge University Press.

———. 2004. “Beyond the Organizational Accident: The Need for ‘Error Wisdom’ on the Frontline.” Quality and Safety in Healthcare 13 (Supplement II): ii28–ii33.

Reason, James, and Alan Hobbs. 2003. Managing Maintenance Error: A Practical Guide. Burlington, VT: CRC Press.

Rodríguez-Pérez, J. 2010. CAPA for the FDA-Regulated Industry. Milwaukee: ASQ Quality Press.

Rodríguez-Pérez, J. 2016. Handbook of Investigation and Effective CAPA Systems. Second ed. Milwaukee: ASQ Quality Press.

Rodríguez-Pérez, J. 2019. Data Integrity and Compliance. Milwaukee: ASQ Quality Press.

Rodríguez-Pérez, J. 2021. Handbook of Investigation and Effective CAPA Systems. Third ed. Milwaukee: ASQ Quality Press.

Sears, Richard L. 1985. “A New Look at Accident Contributors and the Implications of Operational and Training Procedures: Influence of Training, Operational and Maintenance Practices on Flight Safety.” In Proceedings of the Flight Safety Foundation’s 38th Annual International Air Safety Seminar, 29–51. Arlington, VA: Flight Safety Institute.

Senders, John W., and Neville P. Moray. 1991. Human Error: Cause, Prediction and Reduction. Hillsdale, NJ: CRC Press.

Sharps, M. J., C. A. Wilson-Leff, and J. L. Price. 1995. “Relational and Item-Specific Information as Determinants of Category Superiority Effects.” The Journal of General Psychology 122, no. 3: 271–85.

Shingo, Shigeo. 1986. Zero Quality Control: Source Inspection and the Poka-Yoke System. New York: Productivity Press.

Suchman, Lucy A. 1987. Plans and Situated Actions: The Problem of Human-Machine Communication. Learning in Doing: Social, Cognitive, and Computational Perspectives series. New York: Cambridge University Press.

Swain, Alan D., and H. E. Guttmann. 1983. Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications. Washington, DC: NUREG/CR-1278.

Tague, Nancy R. 2005. The Quality Toolbox. Second ed. Milwaukee: ASQ Quality Press.

Trafton, J. Gregory, Erik M. Altmann, and Raj M. Ratwani. 2009. “A Memory for Goals Model of Sequence Errors.” In Proceedings of the 9th International Conference of Cognitive Modeling (paper 225). Manchester, England.

US Department of Energy. 2009. Human Performance Improvement Handbook: Volume 1: Concepts and Principles. Washington, DC: Technical Standards Program.

Weick, Karl E., and Kathleen M. Sutcliffe. 2007. Managing the Unexpected. 2nd ed. San Francisco: Jossey-Bass.

Weick, Karl E., Kathleen M. Sutcliffe, and David Obstfeld. 1999. “Organizing for High Reliability: Processes of Collective Mindfulness.” In: Research in Organizational Behavior, vol. 21, edited by Robert I. Sutton and Barry M. Staw, 81–124. Greenwich, CT: JAI.

Wickens, Christopher D., John D. Lee, Yili Liu, and Sallie E. Gordon Becker. 2004. An Introduction to Human Factors Engineering. 2nd ed. Upper Saddle River, NJ: Pearson Prentice Hall.

Wolff, Alan M., Sally A. Taylor, and Janette F. McCabe. 2004. “Using Checklists and Reminders in Clinical Pathways to Improve Hospital Inpatient Care.” The Medical Journal of Australia 181, no. 8: 428–31.

Index

Note: Page numbers in italics indicate figures and tables.

A

absent-mindedness, 45
accountability concept, 134
active failures, 37–39, 147, 148–149
activity analysis, 80–82
adequate investigation concept, 67
affordance, for error, 199–200
ALCOA principles for documentation, 224, 225t
American Psychiatric Association (APA), 69
An Introduction to Human Factors Engineering (Wickens et al.), 84–85
AND relationships, 190
anticipation errors, 221
Anxiety and Depression Association of America (ADAA), 69
Applying Human Factors and Usability Engineering to Medical Devices (FDA), 26
artificial constraints, 200
Association for the Advancement of Medical Instrumentation (AAMI), 27–28
at-risk behavior, 8, 10
attachments, for written procedures, 100
attention, motivation and, 68–70
attention, principles based on, 86
attention deficit hyperactivity disorder (ADHD), 69. See also conduct disorder
attention failures, 36, 45–46, 232. See also slips and lapses
automotive safety, 209

B

backdating, 52
barriers. See control barriers
behavioral characteristics, 20
behavioral consequences, 136–138
behavior-based compliance, 57–66, 59t
behaviors, nontechnical failure and, 8
Bhopal (India) disaster, 12
bilingual documents, 89
blame, in human failures, 6, 13, 61, 134, 141, 220, 231
Boeing 777 electronic checklist, 203
branching, in procedures, 105–106

C

CAPA (corrective and preventive actions), 151–155, 156–158t
  double-digit rule, 155
  fault tree analysis and, 189–190
  human errors and retraining, 219–220
  human factors and, 66–67
  plan preparation, 151–155, 156–158t
  root causes and, 163–164
  systematic thinking and, 60
  verification of change, 153

caregiver health literacy, 214
catastrophic error, 148–149
Center for Chemical Process Safety, 18–19
Centers for Disease Control and Prevention (CDC), 3
Challenger space shuttle disaster, 16, 61
Checklist Manifesto (Gawande), 203–204
checklists, 13–14, 19, 24, 43, 102, 159, 202–204
Chernobyl nuclear disaster, 8, 13–14
chronology, investigation plan, 142, 143–144
chunking, checklists and, 203
clarity principle, 101–102
coherence principle, 101, 105–106
commission errors, 41, 46t, 210, 211t
communication, coordination and, 40
comparative task analysis, 26–27
competence
  defined, 74, 110
  ISO standards for, 110–112
  performance and, 109–123
  training and, 41
competence management, 74–78
compliance, and quality culture, 57–66
comprehensive instructions example, 107t
computer–human interface, 83–84, 130, 130–131t, 132
conditional statements (if/then), 104–105, 104t, 105t
conduct disorder, 6–7, 69
conscious workspace, 68–69
consequences, behavioral, 47, 51, 136–138
control barrier analysis, 149, 150t
control barriers, 20, 139, 142, 144, 151–152, 157–158t, 163–164
coordination and communication, 40
corner-cutting violations, 52, 70
correctness principle, 101, 108–109
Costco Wholesale, 61
CREAM, 192
crew resource management, 12
criticality (task parameter), 81
culture, organizational, 63–64
current good manufacturing practices (CGMP), 60

D

data integrity
  commitment to, 226–227
  defined, 224
  electronic records and, 62–63
  good documentation practice and, 134–136
  human error and, 224–228, 225t
  regulatory impact and, 225–227
decoupling, defined, 209
deep structures, defined, 205
defects vs. errors, 5–6
design, human error and, 6
design, user-centered, 212, 215, 230
design for recovery, 215
design principles, 200, 205, 205t
device–user interface. See computer–human interface
display design, principles of, 84–87
distracted driving, 222
Do It By Design (FDA), 26–27
document design, 88–89, 130, 185
document redesign, 209–210
documentation practices, data integrity and, 224–228, 225t
double-digit rule, 155
drug administration safety, 216–217
DSM-5 Diagnostic and Statistical Manual of Mental Disorders, 6

E

economy principle, 101, 108
effectiveness evaluation, 154–155
electronic records, vs. hard copy, 62–63
Enron Corporation, 15
ergonomic factors, environmental, 128–129, 129t
error analysis, 81
error control, 209–217, 232
error correction, 207
error detection, 201
error management
  approach to, 20–21
  hierarchy of actions, 200–202
error prediction, 17–19
error prevention, 200–201, 232
error producing conditions (EPC), 32
error recovery, 201

error reduction, 109–110, 183–185, 200
error tolerance, 201
error-prone situations, 19–20
error-proof operation design, 83
error-provoking conditions, 49
errors. See also human errors
  of commission, 37
  vs. defects, 5–6
  detection of, 201
  learning from, 138–139
  of omission, 37
  predictability of, 17–19
  as symptoms, 1
European Medicines Agency (EMA), 226
European Union (EU), 224
exceptional violations, 52, 53–54
execution errors, 34, 46t
expertise, deference to, 66
Exxon Valdez oil spill, 14–15

F

facilities and equipment design and maintenance, 123–129, 128f
failure, preoccupation with, 65
failure mode and effects analysis (FMEA), 185–187, 187–188
family health literacy, 214
fatigue and shift work, 40, 71–72
FDA warning letters, 134, 135–136
FDA-regulated industry, 25–29
5S visual system, 206
Flixborough disaster, 11
FMEA (failure mode and effects analysis), 185–187, 187–188
FMECA (failure mode, effects, and criticality analysis), 185–187
Food and Drug Administration (FDA)
  Guidance for Industry, 119
  human factors and, 25–27, 131
  pharmaceutical CGMP, 72–73, 87–88
  risk management tools, 185, 187
  software validation, 153
  training requirements, 109, 119
  warning letters, 64, 134, 135–136, 224–226
forcing functions, 207
forms, procedures and, 87–109


FTA (fault tree analysis), 189, 189f, 190 Fukushima Daiichi nuclear disaster, 12, 14

G Gawande, Atul The Checklist Manifesto, 203–204 generic drugs, 26–27 “Global Patient Safety Challenge on Medication Safety” (WHO), 3–4 goal conflict and procedural deviation, 25 Greatest Management Principle in the World (LeBoeuf), 137 group competence, 77 Guidance for Industry: Quality Systems Approach to Pharmaceutical CGMP Regulations, 119 gulfs of execution and evaluation, 200

H hard copy vs. electronic records, 62–63 healthcare industry. See medical errors HEART (human error assessment and reduction techniques), 19, 192, 197 Herald of Free Enterprise ferry disaster, 13 hierarchy of actions, human error and, 201 high-reliability organizations (HRO), 64–65 hospital medical errors, 2–3 HRA tools, 19 human error investigation CAPA plan, 151–155 control barrier analysis, 149, 150t diagnostic tool for, 159–162 elements of, 156–158t investigation framework, 142–155, 156–158t overview, 141–142 questionnaire, 24


human error probabilities (HEP), 3, 193–194 human error probability reduction checklists and, 202–204 examples of effective actions, 209–217, 211t hierarchy of actions, 200–202 mistake-proofing, 204–207, 208–209, 208t overview, 199–200 human errors attention and motivation and, 68–69 causes of, 19–21 classification of, 31–49 conduct disorder and, 6–7 data integrity and, 224–228, 225t versus defects, 5–7 defined, 2, 31, 32 effective procedures and forms, 89–90, 108 elimination of, 200 failure to follow procedures and, 21–25 four main groups, 37 good documentation and, 224–228, 225t high price across industries, 4–5 human factors and, 25–29, 31–32 human reliability and error prediction, 17–19 introduction to, 1–7 investigation and prevention, 233t medical safety and, 2–4 multitasking and, 222–223 normalcy and impairment, 48 organizational response to, 133–139 personal accountability and, 133–139 psychology and classification of, 31–49 randomness of, 57, 184 reduction techniques, 192 retraining and, 219–220 risk assessment of, 183–197 safety and human factors, 8–17 situational factors and, 36 statistics related to, 7–8 steps for reducing, 230–231 types of, 35f unintentional, 31

human factors attention and motivation, 68–70 behavior-based compliance and quality culture, 57–66 categories, 39–41 defined, 2, 31 design issues, 126 domains model, 57, 58f error-provoking conditions and, 49 examples in manufacturing operations, 129–132 facilities and equipment design and maintenance, 123–129 FDA-regulated industry and, 25–29 human error and, 8–17, 31–32 management of, 66–67 motivation and attention, 68–70 overview, 57 people engagement, 67–68 procedures and forms, 87–109 safety and, 8–17 supervision and staffing, 70–79 task design, 79–87 training, competence, and performance, 109–123 workplace involvement, 68–70 human factors framework, 66–67 human failures adequate investigation of, 67 classification of, 32–39 defined, 2 management systems perspective, 134 types of, 33f human performance fixing root causes, 166–181 overview, 163–166 risk assessment and, 183–184 three levels of, 32–33, 33f human reliability, error prediction and, 17–19 human reliability analysis (HRA), 18–19, 190–191 human reliability analysis event tree (HRAET), 194 human reliability engineering, 32 human reliability factors, root causes and, 173–175 human violation, 2 human–computer interface, 83–87, 130, 130–131t, 132


I if/then statements, 104–105, 104t, 105t impairment, 48, 62 incentive programs, 70 individual competence, 77–78 influencing factors, errors and, 49 information, improved access to, 215 informative inspection, 206–207 input failure, 43, 48, 127 Institute of Medicine (IOM), 2–3, 63 instructions, incorrect or inaccurate, 23–24 intentional errors, 34. See also violations intentional noncompliance, 51–54 International Civil Aviation Organization (ICAO), 49 interpretations, simplification of, 65 interruptions, errors following, 221–223 interview process, 145–147 interview vs. interrogation, 144–145 investigation plan, 142–143 ISO 9001:2015 standard, 112, 119 ISO 9355 standard, 126–127 ISO 10018:2020 standard, 68 ISO 13485:2016 standard, 59, 109 ISO 13485:2018 standard, 111, 119 ISO/IEC 17025:2017 standard, 110

J job design, 82–83 Johns Hopkins University, 3

K Kirkpatrick model for training evaluation, 120–123, 121t knowledge, mistake-proofing, 200 knowledge-based errors, 32–33, 33f, 34, 36, 47–48

L labeling comparison, 26–27 laboratory documentation, 210

lapses, 34, 36, 41, 46t latent factors, 142, 144, 148–149, 159 latent failures, 37–39, 148–149 leadership, management vs., 59 lean principles, misapplication of, 25 learning process, 75 LeBoeuf, Michael The Greatest Management Principle in the World, 137 legibility and comprehension, 93 long-term memory, 220

M maintenance errors, 49, 125–126 management vs. leadership, 59 manufacturing industry procedures, 23, 66–67, 87 manufacturing operations, examples in, 129–132 Mars Polar Lander failure, 61 medical devices, 25–28, 42, 59 medical errors high price of human error, 2–5 human error statistics, 7 procedures, 21–25, 216 memory, reliance on, 211, 212, 220–222 memory failures, 43, 230, 232 memory principles, 86–87 mental model principles, 86 misidentification errors, 42 mistake-proofing, 204–209, 208t Mistake-Proofing for Operators: The ZQC System (Shingo), 207 mistakes. See also errors; human errors; human failures categories of, 34, 46–48 defined, 34, 36, 46 detection of, 206–207, 208t prevention of, 207–209, 208t mortality rates, medical errors and, 2–3 Morton Thiokol, 16 motivation and attention, 68–70 motivational human error, 70 multitasking, 222–223


N National Aeronautics and Space Administration (NASA), 16, 25, 61, 193–194 National Institutes of Health (NIH), 6–7 natural constraints, 200 necessary violations, 37, 52, 54 negative reinforcement, 136–137 nondetection errors, 42 nonverbal communication, 146–147 normalcy and impairment, 48 Norman, Donald The Psychology of Everyday Things, 6, 199 nuclear disasters, 8, 12, 13–14, 28–29

O Occidental Petroleum company, 13 omission errors, 41, 44, 46t, 210, 211t on-site investigation, 144–145 operating procedures, 21–25 operation design, error-proof, 83 operational task analysis, 114 operations, sensitivity to, 66 OR relationships, 190 organizational change, 40 organizational competence, 76–77 organizational culture, 40 organizational error, 70

P parallelism, in error reduction, 184 patient health literacy, 214 penalty (behavioral consequence), 136 people engagement, 67–68 perceptual principles, 85 performance, training and competence, 109–123 performance influencing factors (PIF), 32 performance shaping factors (PSF), 17, 32, 163 perseveration errors, 221 personal accountability, 133–139 personal performance, 167–169

person–process interface, 32 Pharmaceutical Inspection Co-operation Scheme (PIC/S), 226 physical comparison, 27–28 Piper Alpha explosion, 13 plan errors, 34 poka-yoke, 210. See also mistake-proofing positive reinforcement, 136–137 PRA (probabilistic risk assessment), 196 preventive efforts, 229 proactive investigation, 190 procedure models, 22f procedure violations, 21–25 procedures and instructions basic steps for, 95–96 effective formats, 96–101, 97f, 100f how to design and write, 89–94 overview, 87–89 referencing and branching in, 105–106 root causes and, 175–179 who should write, 94–95 writing, 40, 96–101 writing principles, 101–109 process industries, human factors model, 57, 58f process safety management, 8 prospective memory, 43 The Psychology of Everyday Things (Norman), 6, 199 punishment, 134, 136

Q QMS vs. behavior-based quality, 59–60, 59g quality culture, 57–66, 65f quality systems, 9f, 10f quantitative and qualitative analysis, 191–194 quantitative information, 104

R radioactivity, medical use of, 28–29 Rasmussen, Jens, 31, 32


reactive investigation, 190, 229 readability and comprehension, 93 readability principle, 101, 103–105 Reason, James, 31 recognition errors, 34, 41–42, 184 recovery, design for, 215 redundancy, 184 reference listed drugs (RLD), 26–27 referencing, in procedures, 105–106 resilience, commitment to, 66 resilience, organizational, 138–139 retraining, 219–223 retrieval failures, 43, 221 revision history, 101 rework procedure information, 210 Río Piedras (Puerto Rico) explosion, 15–16 risk assessment error reduction and prevention, 184–185 human reliability analysis (HRA), 190–191 overview, 183–184 quantitative and qualitative analysis, 191–194 THERP, 194–197 risk management tools, 185–190 risk priority number (RPN), 188 root causes of complex issues, 190 examples for fixing, 166–181 identification of, 147 related to human performance, 163–166 routine violations, 52 rule-based errors, 32–33, 33f, 34, 36, 46–48 Rumsfeld, Donald, 23–24

S sabotage, 54 safety culture aspects associated with, 65f building, 63–66 characteristics of, 9f FDA warning letter, 64 human factors and, 8–17 two parts of, 62

safety precautions, 99 Sandia Laboratories, 194 scoping document, 114 self-checks, 207 sensory memory, 220 sequence errors, 37, 221 shift work, 40, 71–72 Shingo, Shigeo, 204, 206 Mistake-Proofing for Operators: The ZQC System, 207 short-term memory, 220 Sinegal, Jim, 61 situational factors, 36, 147–148 situational violations, 52, 53 skill retention, 114–116, 117f, 118f skill-based errors, 32–33, 33f, 34, 41–46 skills, defined, 34 sleep loss, 40, 71–72 slips and lapses, 34, 41, 45–46, 46t software validation, 153 source inspection, 207 Space Shuttle Challenger accident, 61 staffing levels, 40, 70–79, 72–74 standardization, 212–213 storage failure, 43 successive checks, 207 Suchman, Lucy A., 22–23 supervision and management, 70–79, 79f, 166, 180–181 Swain, Allan D., 194 Swiss cheese model of system failure, 17, 20, 39, 148–149 system improvement, 20–21 systems perspective, in human failures management, 134

T task analysis, 80–82 task complexity, 19, 79–80 task design, 79–87, 175–179 task saturation, 222 task switching, 222 team competence, 77 team training, 214 Texas City Refinery explosion, 17 technology errors, 215 Tenerife Airport collision, 11–12


THERP (technique for human error rate prediction), 19, 192, 194–197 Three Mile Island mechanical failure, 12, 83–87 time pressure, 40 timing errors, 37 To Err is Human (IOM), 63, 211 top events, 190 training competence and performance, 40, 109–123 deficiencies in, 112–113 defined, 109 effectiveness of, 118–123, 121t effectiveness verification, 18 gap analysis and, 114 plans for, 114 root causes and, 169–172 typical program errors, 112–113 training effectiveness evaluation, Kirkpatrick model for, 120–121 training needs analysis (TNA), 113–114

U UDA model, 115–116, 117t, 118t unintentional human error, 31, 34 “unknown known” statement, 23–24 U.S. Army Research Institute’s Users’ Decision AID (UDA) model, 115–116 U.S. Centers for Disease Control and Prevention (CDC), 3 U.S. Department of Health and Human Services, 3 U.S. Institute of Medicine (IOM), 211 U.S. National Transportation Safety Board (NTSB), 16 U.S. Nuclear Regulatory Commission, 194–195 usability engineering, 132 user-centered design, 211, 212

V vigilance decrement, 42, 69, 213–214 violations categories of, 36–37, 51, 52–54 common examples of, 47 defined, 32, 33f, 36 error-provoking conditions and, 49 intentional, 8, 34 procedural, 21 visual aids, 107t visual systems, in mistake prevention, 206

W Wickens, Christopher, 84–85 wide structures, defined, 205 work instructions, 21–22, 95–101, 139, 148, 159–160 work redesign, 83 worker safety, 213 working from memory, 220–222 working instructions (batch record), 164–165 workload and staffing levels, 72–74 workplace design, 81–82 workplace injuries, 8 workplace involvement, 68–70 workplace safety, 10 World Health Organization (WHO), 3–4 wrong-side errors, 205

Y Yerkes-Dodson law, 73

Z ZQC system, 207